Slow PostgreSQL queries are rarely a Postgres problem — they are almost always an application design problem. Understanding how PostgreSQL's query planner makes decisions, which indexes it can use and when, and how connection overhead accumulates at scale gives you the tools to systematically eliminate performance bottlenecks rather than guessing at them.
This article walks through the full diagnostic and optimization stack, from reading EXPLAIN ANALYZE output to tuning autovacuum and partitioning time-series tables.
Table of Contents
- How PostgreSQL's Query Planner Works
- Choosing the Right Index Type
- Partial Indexes and Expression Indexes
- Eliminating N+1 Queries with JOINs and CTEs
- Connection Pooling with PgBouncer
- VACUUM and Autovacuum Tuning
- Table Partitioning for Time-Series Data
- Materialized Views for Expensive Aggregations
- Using pg_stat_statements to Find Slow Queries
- A Real-World Diagnostic Checklist
How PostgreSQL's Query Planner Works
When you execute a query, PostgreSQL's planner generates multiple candidate execution plans and picks the one with the lowest estimated cost. Cost is measured in arbitrary units where a sequential disk page read costs 1.0 by default (seq_page_cost) and a random page read costs 4.0 (random_page_cost).
EXPLAIN ANALYZE shows you the plan that was chosen and the actual runtime measurements:
EXPLAIN ANALYZE
SELECT u.id, u.email, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > NOW() - INTERVAL '30 days'
GROUP BY u.id, u.email;

Read the output from the innermost indented node outward — execution flows bottom-up:
HashAggregate (cost=1842.50..1867.50 rows=2500 width=48)
(actual time=23.412..24.108 rows=1821 loops=1)
-> Hash Left Join (cost=625.00..1730.00 rows=22500 width=40)
(actual time=8.201..19.334 rows=21800 loops=1)
Hash Cond: (o.user_id = u.id)
-> Seq Scan on orders (cost=0.00..890.00 rows=45000 width=8)
(actual time=0.012..5.678 rows=45000 loops=1)
-> Hash (cost=594.00..594.00 rows=2500 width=40)
(actual time=7.934..7.935 rows=2500 loops=1)
-> Index Scan using idx_users_created_at on users
(cost=0.42..594.00 rows=2500 width=40)
(actual time=0.022..6.711 rows=2500 loops=1)
Index Cond: (created_at > (now() - '30 days'::interval))

Key things to look for:
- Seq Scan on a large table — the planner chose a full table scan, either because no suitable index exists or because the selectivity estimate was poor.
- Estimated rows vs actual rows diverging by more than 10x — stale statistics. Run ANALYZE table_name.
- Nested Loop with a high loops count — often indicates an N+1 pattern at the query level.
- High actual time at a low-cost node — the cost model is wrong, usually due to a misconfigured random_page_cost on SSDs.
For SSDs, set random_page_cost = 1.1 (or even 1.0). The default of 4.0 causes the planner to systematically prefer sequential scans over index scans on flash storage.
-- Session-level for testing
SET random_page_cost = 1.1;
-- Permanent in postgresql.conf or ALTER SYSTEM
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

Choosing the Right Index Type
PostgreSQL ships with six built-in index types (B-tree, Hash, GiST, SP-GiST, GIN, and BRIN); the four covered below handle the vast majority of workloads. Choosing the wrong one either provides no benefit or imposes unnecessary write overhead.
B-tree
The default. Supports equality, range, and ORDER BY. Use this for most indexed columns:
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

Multi-column B-tree indexes are ordered by the first column, then the second. The index (a, b) can serve queries filtering on a alone, or on a AND b, but not on b alone.
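As a sketch of that column-ordering rule (the index name and queries here are illustrative, not from the original examples):

```sql
-- Hypothetical composite index: user_id is the leading column
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

-- Served by the index: filters on the leading column alone
SELECT * FROM orders WHERE user_id = 42;

-- Served by the index: filters on both columns
SELECT * FROM orders WHERE user_id = 42 AND created_at > '2026-01-01';

-- NOT served: skips the leading column, so the planner must
-- fall back to a seq scan or a different index
SELECT * FROM orders WHERE created_at > '2026-01-01';
```

Running EXPLAIN on each query is the quickest way to confirm which ones the index actually serves.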
GIN (Generalized Inverted Index)
Designed for composite values: jsonb, tsvector (full-text search), and arrays. A GIN index on a jsonb column lets Postgres efficiently answer containment queries:
CREATE INDEX idx_products_attributes ON products USING GIN(attributes);
-- Efficiently uses the GIN index
SELECT * FROM products WHERE attributes @> '{"color": "blue"}';

GIN indexes are slow to build and slow to update but fast to query. Use them for read-heavy JSONB columns and full-text search.
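If containment (@>) is the only operator you need, the jsonb_path_ops operator class is worth knowing about: it builds a smaller, faster GIN index at the cost of supporting fewer operators. A sketch against the same hypothetical products.attributes column:

```sql
-- Smaller GIN index that supports only @> containment queries
-- (no ? key-existence operators)
CREATE INDEX idx_products_attributes_path
    ON products USING GIN(attributes jsonb_path_ops);
```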
GiST (Generalized Search Tree)
A framework for building custom index types. Used most commonly for geometric data, range types, and full-text search when you need nearest-neighbor queries:
CREATE INDEX idx_locations_point ON locations USING GIST(coordinates);
-- KNN search — find 10 nearest locations to a point
SELECT * FROM locations
ORDER BY coordinates <-> ST_MakePoint(-73.98, 40.75)
LIMIT 10;

GiST supports lossy storage (some false positives that require a recheck), which keeps index size manageable for complex data types.
BRIN (Block Range Index)
Tiny, fast to build, appropriate only for naturally ordered data like timestamps or auto-increment IDs in append-only tables:
CREATE INDEX idx_events_occurred_at ON events USING BRIN(occurred_at);

A BRIN index stores min/max values per block range rather than individual row pointers. It can be 10,000x smaller than a B-tree index on the same column. The trade-off: it only helps when the column values are correlated with physical storage order. If rows are inserted out of order, BRIN is useless.
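Before committing to BRIN, you can check how well a column tracks physical row order using the planner's own statistics in the pg_stats view (table and column names here match the events example above):

```sql
-- correlation close to 1.0 (or -1.0) means values follow the
-- physical storage order, so BRIN will prune block ranges well
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'events' AND attname = 'occurred_at';
```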
Partial Indexes and Expression Indexes
Partial Indexes
A partial index only indexes rows that match a WHERE clause. This makes the index smaller, faster to scan, and cheaper to maintain:
-- Only index unprocessed jobs — the common query case
CREATE INDEX idx_jobs_pending ON jobs(created_at)
WHERE status = 'pending';
-- The query must include the same condition to use the index
SELECT * FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 100;

This is particularly effective for status columns where most rows have a "terminal" state (completed, archived, deleted) but queries almost always filter for active records.
Expression Indexes
An expression index stores the result of a function or expression rather than a raw column value. This lets the planner use an index for queries involving function calls:
-- Without this index, lower(email) causes a seq scan
CREATE INDEX idx_users_email_lower ON users(lower(email));
-- Now uses the index
SELECT * FROM users WHERE lower(email) = lower('User@Example.com');Expression indexes also work for JSONB field extraction:
CREATE INDEX idx_orders_metadata_region
ON orders((metadata->>'region'));
SELECT * FROM orders WHERE metadata->>'region' = 'us-east';

Eliminating N+1 Queries with JOINs and CTEs
An N+1 query occurs when you fetch N rows and then execute one additional query per row. It is the single most common cause of preventable database load in production applications.
The pattern looks like this in application code:
// N+1: one query for posts, then one per post for author
const posts = await db.post.findMany();
for (const post of posts) {
post.author = await db.user.findUnique({ where: { id: post.authorId } });
}

At 100 posts, this is 101 queries. At 1,000 posts behind a slow network, it is catastrophic.
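This batching pattern is what most ORM eager-loading features do under the hood: fetch the parent rows, collect the foreign keys, and resolve them in one extra query with ANY, collapsing N+1 round trips to two. A sketch in raw SQL ($1 stands for the array of author IDs collected from the first query):

```sql
-- Query 1: fetch the posts
SELECT id, title, author_id
FROM posts
ORDER BY created_at DESC
LIMIT 100;

-- Query 2: fetch every referenced author in a single round trip
SELECT id, name, email
FROM users
WHERE id = ANY($1);
```

A JOIN goes one step further and collapses the work into a single round trip.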
The fix is a single JOIN:
SELECT
p.id,
p.title,
p.created_at,
u.id AS author_id,
u.name AS author_name,
u.email AS author_email
FROM posts p
JOIN users u ON u.id = p.author_id
ORDER BY p.created_at DESC
LIMIT 100;

For complex aggregations across multiple related tables, use CTEs (Common Table Expressions) to structure the query without sacrificing readability:
WITH recent_orders AS (
SELECT user_id, COUNT(*) AS order_count, SUM(total) AS lifetime_value
FROM orders
WHERE created_at > NOW() - INTERVAL '90 days'
GROUP BY user_id
),
user_segments AS (
SELECT u.id, u.email, u.tier,
COALESCE(ro.order_count, 0) AS order_count,
COALESCE(ro.lifetime_value, 0) AS lifetime_value
FROM users u
LEFT JOIN recent_orders ro ON ro.user_id = u.id
)
SELECT * FROM user_segments
WHERE lifetime_value > 500
ORDER BY lifetime_value DESC;

Note: CTEs in PostgreSQL 12+ are inlined by default (the planner can optimize through them). If you need to force materialization as an optimization fence, add MATERIALIZED:
WITH recent_orders AS MATERIALIZED (
SELECT ...
)

Connection Pooling with PgBouncer
PostgreSQL creates a new OS process for every connection. At 500 concurrent connections, you are running 500 processes — even if most are idle. This consumes memory (roughly 5–10MB per connection) and adds process scheduling overhead.
PgBouncer sits between your application and Postgres, maintaining a small pool of actual server connections and multiplexing many application connections through them.
PgBouncer offers three pooling modes:
- Session mode — a server connection is assigned for the lifetime of the client connection. Functionally transparent but provides limited benefit.
- Transaction mode — a server connection is held only for the duration of a transaction, then returned to the pool. This is the recommended mode and supports most workloads.
- Statement mode — a server connection is held only for a single statement. Incompatible with multi-statement transactions and prepared statements.
A production-grade PgBouncer configuration for transaction mode:
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
server_idle_timeout = 600
client_idle_timeout = 0

With default_pool_size = 25, PgBouncer maintains at most 25 actual Postgres connections while handling up to 1,000 application connections. Size your pool based on (number_of_cpu_cores * 2) + effective_spindle_count. For most cloud databases, 20–50 server connections is the right range regardless of application concurrency.
Be aware: transaction mode is incompatible with session-scoped features such as session-level SET, session-level advisory locks (pg_advisory_lock), and LISTEN/NOTIFY. The transaction-scoped variants (SET LOCAL, pg_advisory_xact_lock) are safe. If you rely on the session-scoped features, either use session mode for those specific connections or route them around PgBouncer.
VACUUM and Autovacuum Tuning
PostgreSQL uses MVCC (Multi-Version Concurrency Control). When you update or delete a row, the old version is not immediately removed — it is marked as dead and left in place until VACUUM cleans it up. Uncleaned dead rows ("bloat") cause table and index scans to read unnecessary pages, degrading performance over time.
Autovacuum runs automatically but its default configuration is tuned for small databases. On high-write tables, it often cannot keep up.
Check bloat and autovacuum activity:
SELECT relname, n_live_tup, n_dead_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

If dead_pct is consistently above 10% on a large table, autovacuum is not keeping up. Tune it per-table with storage parameters:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01, -- trigger at 1% dead rows (default 20%)
autovacuum_vacuum_cost_delay = 2, -- less throttling (default 2ms since PG 12, 20ms before)
autovacuum_vacuum_threshold = 500 -- also trigger at 500 dead rows
);For tables that receive heavy write load during off-peak hours, consider scheduling a manual VACUUM ANALYZE via pg_cron:
SELECT cron.schedule('vacuum-orders', '0 3 * * *', 'VACUUM ANALYZE orders');

Table Partitioning for Time-Series Data
As append-only tables like events, logs, and audit trails grow into the hundreds of millions of rows, even indexed queries slow down because index trees become deep and vacuum cannot keep up with the entire table. Declarative table partitioning splits the table into child tables (partitions) that can be independently indexed, vacuumed, and dropped.
CREATE TABLE events (
id BIGSERIAL,
user_id BIGINT NOT NULL,
event_type VARCHAR(100) NOT NULL,
payload JSONB,
occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (occurred_at);
-- Create monthly partitions
CREATE TABLE events_2026_01
PARTITION OF events
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE events_2026_02
PARTITION OF events
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- Index each partition independently
CREATE INDEX ON events_2026_01(user_id, occurred_at);
CREATE INDEX ON events_2026_02(user_id, occurred_at);

(On PostgreSQL 11 and later, creating the index once on the parent events table propagates it to every partition, current and future.)

The planner uses partition pruning: a query with a WHERE occurred_at > '2026-01-01' clause only scans the relevant partitions. Dropping old data becomes a DDL operation (DROP TABLE events_2025_06) rather than a massive DELETE, which is instantaneous and generates no bloat.
Automate partition creation with pg_partman. Manual partition creation does not scale.
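A minimal sketch of that automation, assuming pg_partman 4.x installed into its default partman schema (the named arguments below match the 4.x create_parent signature; 5.x changed it, so check the version you run):

```sql
CREATE EXTENSION pg_partman;

-- Register events for automatic monthly native partitioning,
-- pre-creating 3 future partitions ahead of the current month
SELECT partman.create_parent(
    p_parent_table := 'public.events',
    p_control      := 'occurred_at',
    p_type         := 'native',
    p_interval     := 'monthly',
    p_premake      := 3
);

-- Run maintenance (create upcoming partitions, detach expired ones)
-- nightly via pg_cron
SELECT cron.schedule('partman-maintenance', '0 4 * * *',
                     'SELECT partman.run_maintenance()');
```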
Materialized Views for Expensive Aggregations
A materialized view stores the result of a query on disk. Unlike a regular view (which re-executes the query on every access), a materialized view is refreshed explicitly. Use them for dashboards, reporting queries, and any aggregation that is expensive to compute and does not need real-time data:
CREATE MATERIALIZED VIEW monthly_revenue AS
SELECT
DATE_TRUNC('month', created_at) AS month,
SUM(total) AS revenue,
COUNT(*) AS order_count,
AVG(total) AS avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month DESC;
CREATE UNIQUE INDEX ON monthly_revenue(month);Refresh the view on a schedule or after relevant mutations:
-- Concurrent refresh does not lock reads
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue;

CONCURRENTLY requires a unique index and is slightly slower than a blocking refresh, but it allows reads to continue against the old data during the refresh. In production, always prefer CONCURRENTLY.
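Since pg_cron is already in play for vacuum scheduling above, the same mechanism can drive the refresh; the job name and 15-minute interval here are illustrative:

```sql
-- Refresh the dashboard aggregate every 15 minutes without blocking reads
SELECT cron.schedule('refresh-monthly-revenue', '*/15 * * * *',
                     'REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_revenue');
```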
Using pg_stat_statements to Find Slow Queries
pg_stat_statements tracks execution statistics for every unique query shape executed against the database. Enable it in postgresql.conf:
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all

Then find your top offenders by total execution time:
SELECT
LEFT(query, 100) AS query_preview,
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
round(stddev_exec_time::numeric, 2) AS stddev_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Sort by total_exec_time rather than mean_exec_time. A query that takes 500ms but runs once a day is less impactful than a query that takes 10ms but runs 100,000 times a day.
After optimizing, reset the counters:
SELECT pg_stat_statements_reset();

A Real-World Diagnostic Checklist
When a Postgres-backed application starts slowing down, work through this checklist in order:
1. Check pg_stat_statements for total execution time by query. Identify the top 5 queries consuming the most cumulative time.
2. Run EXPLAIN ANALYZE on each slow query. Look for sequential scans on large tables, row estimate errors, and nested loops with high loop counts.
3. Check for missing indexes with pg_stat_user_tables. If the seq_scan count is high on a large table, an index is likely missing.
4. Check index usage with pg_stat_user_indexes. Indexes with idx_scan = 0 after sufficient runtime are wasting write overhead and should be dropped.
5. Check for bloat by comparing n_dead_tup to n_live_tup in pg_stat_user_tables. High dead tuple ratios indicate autovacuum is not keeping up.
6. Check connection counts with SELECT state, count(*) FROM pg_stat_activity GROUP BY state. If idle connections are in the hundreds, deploy PgBouncer.
7. Verify random_page_cost is set appropriately for your storage (1.1 for SSD, 4.0 for spinning disk).
8. Check for lock contention in pg_stat_activity, filtered on wait_event_type = 'Lock'. Long-running transactions holding locks are a common source of cascading slowdowns.
9. Review autovacuum settings on your highest-write tables. Per-table tuning is almost always necessary in production.
10. Consider partitioning if your largest tables exceed 50 million rows and are time-series in nature.
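The lock-contention check can be run directly against pg_stat_activity; every column used here is standard, though the exact output depends on your workload:

```sql
-- Sessions currently waiting on a lock, oldest transactions first
SELECT pid, state, wait_event_type, wait_event,
       now() - xact_start AS xact_age,
       LEFT(query, 80)   AS query_preview
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY xact_age DESC;
```

Cross-referencing the blocked PIDs against long-running idle-in-transaction sessions usually identifies the lock holder.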
Systematic diagnosis beats guessing. The data Postgres exposes in its system views is comprehensive enough to identify the root cause of virtually any performance problem without touching application code.