Reading EXPLAIN ANALYZE Without Pretending It's Obvious
Jeff Straney
I had a query taking twelve seconds on a table with two million rows. I ran EXPLAIN ANALYZE. I stared at the output for twenty minutes, and it felt like reading a stack trace in a language I didn't write. The row counts were high, the cost numbers were big, and I figured the problem was obvious and I just wasn't seeing it.
The actual problem was that the planner chose a hash join over a nested loop for a query where the inner table had a very selective index that would have made the nested loop fast. The row count estimates were wrong because the statistics on one column were stale. The planner didn't know what I knew about the data, and I didn't know how to tell it.
That was when I decided to actually learn what EXPLAIN ANALYZE output means, rather than pattern-matching on numbers that look bad.
What the output is actually showing you
EXPLAIN ANALYZE gives you two things: the plan the planner chose, and what actually happened when it executed. The plan is the tree of operations. The "actually happened" part is the numbers with actual in them.
The plan reads from inside out. The innermost node executes first. If you see:
Hash Join  (cost=234.50..1890.23 rows=4821 width=64) (actual time=12.3..89.4 rows=5012 loops=1)
  Hash Cond: (orders.customer_id = customers.id)
  ->  Seq Scan on orders  (cost=0.00..890.45 rows=45000 width=32) (actual time=0.1..34.2 rows=45000 loops=1)
  ->  Hash  (cost=145.20..145.20 rows=7144 width=32) (actual time=11.8..11.8 rows=7144 loops=1)
        Buckets: 8192  Batches: 1  Memory Usage: 452kB
        ->  Seq Scan on customers  (cost=0.00..145.20 rows=7144 width=32) (actual time=0.1..6.3 rows=7144 loops=1)
The indented nodes run first. The customers scan feeds the Hash node, which builds a hash table in memory keyed on id. Then orders gets scanned and each row is probed against that hash.
Seq scans are not always wrong
The first instinct when you see a sequential scan is to add an index. Sometimes that's right. Sometimes it isn't.
A sequential scan on a small table is almost always correct. If customers has 7,000 rows, reading all of them can be cheaper than going through an index, because an index scan pays for random page reads on top of reading the index itself. PostgreSQL does this arithmetic for you. If you have a table with a hundred thousand rows and you're pulling 60% of them, a sequential scan is probably still the right call. Indexes help when you're being selective: pulling a small fraction of the rows.
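A quick way to sanity-check that before reaching for CREATE INDEX: see how big the table is and what fraction of it your filter actually keeps. A sketch, with a made-up region column standing in for whatever you're filtering on:

SELECT pg_size_pretty(pg_relation_size('customers'));       -- table size on disk
SELECT count(*) FROM customers;                              -- total rows
SELECT count(*) FROM customers WHERE region = 'northeast';   -- rows the filter keeps (hypothetical column)

If the filter keeps most of the table, an index isn't going to save you.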
If you see a sequential scan on a large table and you're returning a small number of rows, that's worth investigating. If you see a sequential scan on a table with an index that should cover your query, check whether the index is actually being used, and if not, why.
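One way to answer the "why isn't it using my index" question is to make sequential scans artificially expensive for one session and see what the planner does instead. The table and filter here are placeholders:

BEGIN;
SET LOCAL enable_seqscan = off;   -- discourage seq scans for this transaction only
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
ROLLBACK;

If an index scan shows up and it's faster, the planner's cost estimates are steering it wrong. If no index scan shows up at all, the index can't serve that query: wrong columns, wrong operator, or an expression that doesn't match.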
Cost estimates and what they're measuring
The cost= numbers are not milliseconds. They're in arbitrary planner units, anchored to the cost of reading one sequential page (seq_page_cost = 1.0 by default). You can't compare a cost directly to an actual time; what's useful is comparing costs between nodes in the same plan, and comparing estimated rows against actual rows.
(cost=234.50..1890.23) is startup cost and total cost: the cost to produce the first row and the cost to produce all of them. For a query that reads everything, the total cost is what matters. The startup cost matters when the plan may stop early, as with LIMIT or EXISTS, and for anything the executor restarts repeatedly, like the inner side of a nested loop.
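LIMIT is the easiest way to see why startup cost exists: the planner will pick a plan that's more expensive in total if it can start returning rows immediately, because it expects to stop early. A sketch, assuming a hypothetical events.created_at column with an index on it:

EXPLAIN
SELECT * FROM events ORDER BY created_at DESC LIMIT 10;
-- With that index in place, expect a plan with a tiny startup cost that walks
-- the index backwards and stops after ten rows, rather than sorting the whole table.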
If your estimated rows and actual rows are very different, that's the planner's statistics being wrong. For example:
Seq Scan on events (cost=0.00..2234.00 rows=100 width=48) (actual time=0.1..145.2 rows=84000 loops=1)
The planner thought it would get 100 rows. It got 84,000. This is a statistics problem. The planner made decisions based on wrong assumptions about the data distribution. Run ANALYZE on the table and recheck.
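The fix is usually one statement, and it's cheap because ANALYZE samples the table rather than reading all of it. The filter below is a placeholder:

ANALYZE events;   -- refresh the planner's statistics for this table
EXPLAIN ANALYZE SELECT * FROM events WHERE event_type = 'click';
-- The estimated rows should now land in the same neighborhood as the actual rows.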
Hash join vs. nested loop
When the planner chooses between join strategies, it's estimating what will be cheaper based on the row counts it believes to be true.
A nested loop is fast when the inner side is small or has a good index. It executes the inner side once per outer row. If each inner execution is an index lookup returning a handful of rows, a thousand outer rows is nothing. If the inner side is a million-row scan with no usable index, you're in trouble: that's a million rows read per outer row.
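For reference, a fast nested loop looks something like this (the numbers here are invented for illustration): an index scan on the inner side, run once per outer row, which is what the loops count on the inner node is telling you.

Nested Loop  (cost=0.43..845.12 rows=980 width=64) (actual time=0.05..4.8 rows=1012 loops=1)
  ->  Seq Scan on recent_orders  (cost=0.00..35.00 rows=1000 width=32) (actual time=0.01..0.4 rows=1000 loops=1)
  ->  Index Scan using customers_pkey on customers  (cost=0.43..0.80 rows=1 width=32) (actual time=0.004..0.004 rows=1 loops=1000)
        Index Cond: (id = recent_orders.customer_id)

Note loops=1000 on the index scan: the actual time shown is per loop, so the inner side cost about four milliseconds in total, not four microseconds.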
A hash join builds an in-memory hash table from the smaller side, then probes it with rows from the larger side. It's good for medium-to-large datasets where neither side is tiny. It needs memory: if the hash table doesn't fit in work_mem, it spills to disk and you'll see Batches greater than 1 in the plan.
When the planner chooses a hash join for a query where you'd expect a nested loop, check the row count estimates. If it thinks the inner table is large when it's actually small, it may be avoiding a nested loop that would be fast. Running ANALYZE often fixes this.
When the planner is genuinely wrong
Statistics go stale. A table that had even data distribution six months ago might have very skewed data today. The statistics that ANALYZE captures are sampled, not exact, which means unusual distributions can fool them.
Three situations where the planner will get it wrong and you'll need to intervene.
First: data that is heavily skewed toward a value that wasn't common when statistics were last collected. If 90% of your orders now have status = 'completed' and the planner doesn't know that, it assumes the values are spread more evenly and badly underestimates how many rows a filter on status = 'completed' returns.
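If one column's distribution is the problem, you can tell ANALYZE to sample it harder so its most-common-values list actually captures the skew. A sketch on the hypothetical orders.status column:

ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;   -- default target is 100
ANALYZE orders;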
Second: a table that has grown dramatically. Statistics on fast-growing tables go stale quickly unless autoanalyze is keeping up. The pg_stat_user_tables view will tell you when the table was last analyzed.
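Checking that takes one query:

SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
-- n_mod_since_analyze is a rough count of rows changed since statistics were
-- last collected; a big number relative to the table size means stale stats.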
Third: correlated columns. PostgreSQL's statistics treat columns independently by default. If city = 'Pittsburgh' almost always implies state = 'PA', the planner doesn't know that and will underestimate how many rows the combined filter returns. Extended statistics (CREATE STATISTICS) can help here on PostgreSQL 10 or later.
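A minimal sketch of that, assuming a hypothetical addresses table with city and state columns:

CREATE STATISTICS city_state_dep (dependencies) ON city, state FROM addresses;
ANALYZE addresses;   -- extended statistics are only populated by ANALYZE

After this, a filter like city = 'Pittsburgh' AND state = 'PA' gets a row estimate that accounts for the dependency instead of multiplying two selectivities together.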
What to actually do with this
Run EXPLAIN ANALYZE on the slow query. Look for the biggest divergence between estimated rows and actual rows. That is usually where the problem lives.
If estimates are way off: run ANALYZE tablename and try again. If the estimates are right but the query is still slow: look at what operation is taking the most actual time and why. If there's a sequential scan on a large table where you're returning a small number of rows: check whether a missing or unused index is to blame.
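When I'm doing this for real I also ask for buffer counts, which helps separate "this node touched too many pages" from "this node was slow for some other reason". The query below is a placeholder shaped like the one from the top of this post:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > now() - interval '7 days';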
The plan output is telling you what the planner knows and what it decided based on that knowledge. Most of the time, when the decision is wrong, the knowledge was wrong first. Finding where the knowledge diverges from reality is the actual diagnostic skill. The rest follows.
