PostgreSQL 慢查询调优利器：auto_explain 扩展详解与实战

2025/3/7 12:26:46 333 0 0 0

大家好，我是你们的数据库老朋友“码农DBA”。今天咱们来聊聊 PostgreSQL 数据库里一个非常有用的扩展——auto_explain，它可以自动记录慢查询的执行计划，方便咱们分析和优化 SQL 语句。相信很多用 PostgreSQL 的开发者都遇到过慢查询的问题，排查起来有时候真是让人头疼。别担心，有了 auto_explain，咱们就能更轻松地找到问题所在，让你的数据库跑得更快！

什么是 auto_explain？

auto_explain 是 PostgreSQL 的一个贡献扩展（contrib module），它能自动记录那些执行时间超过设定阈值的 SQL 语句的执行计划。简单来说，就是当一条 SQL 语句执行得很慢的时候，auto_explain 会自动把它的“执行过程”详细地记录下来，就像拍了一张“X 光片”一样，让咱们能看清楚这条 SQL 语句到底慢在哪里。

你可能会问，PostgreSQL 不是有 EXPLAIN 命令吗？为什么还要用 auto_explain？

EXPLAIN 命令确实可以查看 SQL 语句的执行计划，但是需要你手动执行。而 auto_explain 是自动的，只要配置好了，它就会在后台默默地帮你记录慢查询的执行计划，不用你每次都手动去执行 EXPLAIN，省时省力。

为什么要用 auto_explain？

自动记录，无需手动： auto_explain 会自动记录慢查询，不需要手动执行 EXPLAIN 命令。
方便分析，定位瓶颈： 通过分析 auto_explain 记录的执行计划，可以快速定位慢查询的瓶颈所在。
性能监控，防患未然： auto_explain 可以帮助你监控数据库的性能，及时发现潜在的性能问题。
配置灵活，按需定制： auto_explain 提供了丰富的配置选项，可以根据实际需求进行定制。

auto_explain 安装与配置

1. 安装

auto_explain 通常包含在 PostgreSQL 的 contrib 包中。如果你的 PostgreSQL 没有安装，可以通过以下命令安装：

Debian/Ubuntu:

sudo apt-get install postgresql-contrib

CentOS/RHEL:

sudo yum install postgresql-contrib

其他系统：

请参考对应系统的 PostgreSQL 安装文档。

2. 配置

auto_explain 的配置主要在 postgresql.conf 文件中进行。主要参数如下：

shared_preload_libraries: 需要将 auto_explain 添加到这个参数中，让 PostgreSQL 在启动时加载 auto_explain 扩展。例如：
```
shared_preload_libraries = 'auto_explain'
```
auto_explain.log_min_duration: 设置记录执行计划的最小执行时间阈值。只有执行时间超过这个阈值的 SQL 语句才会被记录执行计划。单位是毫秒（ms）。例如，设置为 1000 表示记录执行时间超过 1 秒的 SQL 语句的执行计划。建议从一个相对较大的值开始，例如1s 或者 500ms, 然后根据实际情况逐步减小。
auto_explain.log_analyze: 是否在执行计划中包含 ANALYZE 信息（实际执行时间、行数等）。建议设置为 on，以便更详细地分析执行计划。
auto_explain.log_verbose: 是否在执行计划中包含 VERBOSE 信息（更详细的输出）。根据需要设置。
auto_explain.log_buffers: 是否在执行计划中包含缓冲区使用信息（BUFFERS）。建议在需要分析 I/O 瓶颈时设置为 on。
auto_explain.log_timing: 是否在执行计划中包含每个节点的执行时间。建议设置为 on。
auto_explain.log_format: 设置执行计划的输出格式。可以是 text、xml、json 或 yaml。默认为 text。
auto_explain.log_nested_statements: 是否记录嵌套语句（例如函数或存储过程中的 SQL 语句）的执行计划。建议设置为 on，以便更全面地分析慢查询。
auto_explain.log_triggers: 是否记录触发器的执行计划。根据需要设置。

配置示例：

shared_preload_libraries = 'auto_explain'

auto_explain.log_min_duration = '1s'
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_nested_statements = on

修改完 postgresql.conf 文件后，需要重启 PostgreSQL 服务才能使配置生效。

3. 其他配置方式

除了在 postgresql.conf 中配置外，还可以通过以下方式进行配置：

会话级别： 使用 SET 命令在当前会话中设置 auto_explain 的参数。例如：
```
SET auto_explain.log_min_duration = '500ms';
```
用户级别： 使用 ALTER USER 命令为特定用户设置 auto_explain 的参数。例如：
```
ALTER USER myuser SET auto_explain.log_min_duration = '2s';
```
数据库级别： 使用 ALTER DATABASE 命令为特定数据库设置 auto_explain 的参数。例如：
```
ALTER DATABASE mydb SET auto_explain.log_min_duration = '3s';
```

auto_explain 日志分析

auto_explain 会将记录的执行计划输出到 PostgreSQL 的日志文件中。日志文件的位置和名称取决于 PostgreSQL 的配置。通常情况下，日志文件位于 PostgreSQL 数据目录下的 log 子目录中。

下面是一个 auto_explain 输出的执行计划示例（log_format = text）：

LOG:  duration: 1532.452 ms  plan:
  Query Text: SELECT * FROM mytable WHERE col1 = 1 AND col2 > 100;
  QUERY PLAN
  -------------------------------------------------------------------------------
  Index Scan using mytable_col1_col2_idx on mytable  (cost=0.43..8.45 rows=1 width=12)
    Index Cond: ((col1 = 1) AND (col2 > 100))
    Buffers: shared hit=5
    Execution Time: 1532.234 ms

从这个执行计划中，我们可以看到以下信息：

duration: SQL 语句的总执行时间（1532.452 毫秒）。
Query Text: SQL 语句的文本。
QUERY PLAN: 执行计划的详细信息。
Index Scan: 使用了索引扫描（mytable_col1_col2_idx）。
Index Cond: 索引条件 ((col1 = 1) AND (col2 > 100))。
Buffers: 缓冲区命中情况（shared hit=5）。
Execution Time: 节点执行时间(1532.234 ms)。

通过分析这些信息，我们可以找出慢查询的瓶颈所在。例如，如果发现某个节点的执行时间特别长，或者缓冲区命中率很低，就可以针对性地进行优化。

常见执行计划节点：

Seq Scan: 全表扫描。通常表示没有使用索引，或者索引不合适。
Index Scan: 索引扫描。表示使用了索引。
Index Only Scan: 仅索引扫描。表示只扫描了索引，没有访问表数据。
Bitmap Heap Scan: 位图堆扫描。通常与位图索引扫描一起使用。
Bitmap Index Scan: 位图索引扫描。使用位图索引进行扫描。
Hash Join: 哈希连接。用于连接两个表。
Merge Join: 归并连接。用于连接两个已排序的表。
Nested Loop: 嵌套循环连接。用于连接两个表，通常在其中一个表较小的情况下使用。
Sort: 排序操作。
Aggregate: 聚合操作（例如 SUM、AVG、COUNT 等）。
Limit: 限制返回的行数。
...

auto_explain 实战案例

下面通过几个实战案例来演示如何使用 auto_explain 进行慢查询优化。

案例 1：未使用索引

假设有一张名为 orders 的表，其中包含 order_id、customer_id、order_date 等字段。现在要查询某个客户在特定日期之后的所有订单：

SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2023-01-01';

如果 orders 表在 customer_id 和 order_date 字段上没有合适的索引，auto_explain 可能会输出以下执行计划：

LOG:  duration: 2345.678 ms  plan:
  Query Text: SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2023-01-01';
  QUERY PLAN
  -------------------------------------------------------------------------------
  Seq Scan on orders  (cost=0.00..12345.67 rows=1000 width=256)
    Filter: ((customer_id = 123) AND (order_date > '2023-01-01'::date))
    Buffers: shared hit=100 read=12245
    Execution Time: 2345.456 ms

从执行计划中可以看到，Seq Scan 表示进行了全表扫描，Buffers: shared hit=100 read=12245 表示读取了大量的数据块。这是典型的未使用索引的情况。

优化方案：

在 customer_id 和 order_date 字段上创建联合索引：

CREATE INDEX idx_orders_customer_id_order_date ON orders (customer_id, order_date);

创建索引后，再次执行相同的查询，auto_explain 可能会输出以下执行计划：

LOG:  duration: 12.345 ms  plan:
  Query Text: SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2023-01-01';
  QUERY PLAN
  -------------------------------------------------------------------------------
  Index Scan using idx_orders_customer_id_order_date on orders  (cost=0.43..8.45 rows=10 width=256)
    Index Cond: ((customer_id = 123) AND (order_date > '2023-01-01'::date))
    Buffers: shared hit=12
    Execution Time: 12.123 ms

可以看到，现在使用了 Index Scan，执行时间大大缩短，缓冲区读取次数也减少了很多。

案例 2：索引选择不当

假设 orders 表在 order_date 字段上有一个单独的索引，而在 customer_id 字段上没有索引。现在执行以下查询：

SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2023-01-01';

auto_explain 可能会输出以下执行计划：

LOG:  duration: 567.890 ms  plan:
  Query Text: SELECT * FROM orders WHERE customer_id = 123 AND order_date > '2023-01-01';
  QUERY PLAN
  -------------------------------------------------------------------------------
  Bitmap Heap Scan on orders  (cost=4.32..1234.56 rows=100 width=256)
    Recheck Cond: (order_date > '2023-01-01'::date)
    Filter: (customer_id = 123)
    Heap Blocks: exact=500
    Buffers: shared hit=512
    ->  Bitmap Index Scan on idx_orders_order_date  (cost=0.00..4.30 rows=100 width=0)
          Index Cond: (order_date > '2023-01-01'::date)
          Buffers: shared hit=12
    Execution Time: 567.678 ms

从执行计划中可以看到，虽然使用了 idx_orders_order_date 索引，但是由于还需要过滤 customer_id，所以进行了 Bitmap Heap Scan。这表明 order_date 上的单独索引并不是最优的选择。

优化方案：

仍然是在 customer_id 和 order_date 字段上创建联合索引：

CREATE INDEX idx_orders_customer_id_order_date ON orders (customer_id, order_date);

创建联合索引后，执行计划会变为 Index Scan，如案例 1 所示。

案例3: 统计信息过期

如果表的统计信息过时，PostgreSQL 的查询优化器可能会生成次优的执行计划。假设我们很久没有对 orders 表执行 ANALYZE 操作了。

SELECT * FROM orders WHERE order_amount > 1000;

auto_explain 输出可能如下:

LOG: duration: 892.551 ms plan:
 Query Text: SELECT * FROM orders WHERE order_amount > 1000;
 QUERY PLAN
 -------------------------------------------------------------------------------
 Seq Scan on orders (cost=0.00..18334.00 rows=1 width=256)
 Filter: (order_amount > 1000)
 Buffers: shared hit=10 read=18324
 Execution Time: 892.321 ms

尽管 order_amount 上可能有索引，但是优化器仍然选择了全表扫描。因为统计信息过时，优化器错误地估计了过滤条件 order_amount > 1000 返回的行数。

优化方案：

ANALYZE orders;

手动执行 ANALYZE 命令，更新 orders 表的统计信息。再次执行查询，auto_explain 的输出可能变成：

LOG: duration: 5.123 ms plan:
 Query Text: SELECT * FROM orders WHERE order_amount > 1000;
 QUERY PLAN
 -------------------------------------------------------------------------------
 Index Scan using idx_orders_order_amount on orders (cost=0.43..8.45 rows=10 width=256)
 Index Cond: (order_amount > 1000)
 Buffers: shared hit=15
 Execution Time: 4.987 ms

更新统计信息后，优化器选择了正确的索引扫描，执行时间显著降低。

总结

auto_explain 是 PostgreSQL 中一个非常有用的扩展，可以帮助你自动记录慢查询的执行计划，方便你分析和优化 SQL 语句。通过合理配置 auto_explain，并结合实际的执行计划进行分析，你可以快速定位慢查询的瓶颈所在，并采取相应的优化措施，提高数据库的性能。

希望今天的分享对大家有所帮助。如果你在使用 auto_explain 的过程中遇到任何问题，或者有其他关于 PostgreSQL 数据库的问题，欢迎在评论区留言，我会尽力解答。咱们下期再见！

码农DBA PostgreSQL 慢查询 auto_explain

PostgreSQL 慢查询调优利器：auto_explain 扩展详解与实战

什么是 auto_explain？

为什么要用 auto_explain？

auto_explain 安装与配置

1. 安装

2. 配置

3. 其他配置方式

auto_explain 日志分析

auto_explain 实战案例

总结

评论点评