
Why is my Postgres p99 latency spiking during traffic bursts even though CPU looks fine?
Most teams don’t discover their Postgres latency problem at idle—they discover it when the pager goes off during a traffic burst, dashboards show p99 latency blowing past SLOs, and yet CPU on the database node is happily cruising at 40–60%. If that sounds familiar, you’re probably not CPU-bound at all. You’re running into a mix of I/O, concurrency, and connection management bottlenecks that only show up under bursty load.
This guide breaks down the most common reasons your Postgres p99 latency spikes during traffic bursts while CPU looks fine, how to confirm each one with metrics, and practical ways to fix them. I’ll also show where a fast memory layer like Redis fits in when Postgres alone can’t deliver sub-millisecond responses for hot, read-heavy, or AI-style workloads.
The Quick Overview
- What It Is: A deep dive into latency failure modes in Postgres—especially why p95/p99 get ugly under bursts even when CPU isn’t saturated—and how to reduce tail latency using tuning, architecture changes, and a Redis-powered fast memory layer.
- Who It Is For: Backend engineers, SREs, and platform teams who own API latency SLOs and manage Postgres in production (self‑managed, RDS, Cloud SQL, Aurora, etc.).
- Core Problem Solved: Understanding and fixing the gap between “Postgres looks healthy” and “p99 latency is breaking the app” during traffic spikes.
Quick Answer: Your Postgres p99 latency usually spikes during bursts not because the CPU is saturated, but because you’re blocked on disk I/O, locks, connection thrash, or queueing. Fixes include reducing concurrent blocking, controlling max connections, managing queue depth, and offloading hot paths to a fast memory layer like Redis.
How Postgres p99 latency really behaves
Average and p50 latency tend to look fine up to a point; what kills you is the tail—the slowest 1–5% of queries. During bursts, a few things combine:
- More concurrent work → longer wait on locks and I/O
- Queueing in the connection pool or server → p99 skyrockets while p50 barely moves
- Background maintenance (autovacuum, checkpoints) → introduces jitter and I/O pressure
- Application behavior changes under burst (N+1 queries, slow transactions, retries)
Because these are mostly about wait time, they don’t always drive CPU to 90–100%. You can be I/O‑bound, lock‑bound, or connection‑bound while CPU looks “fine.”
A practical way to think about it:
- If CPU is low but p99 is high, ask: “What are my queries waiting on?” not “Do I need a bigger instance?”
Common root causes when CPU looks fine
1. Disk I/O is your real bottleneck
Postgres is designed to be durable and consistent. That means:
- WAL (write‑ahead log) fsyncs
- Random reads for cache misses
- Checkpoints flushing dirty pages
When your workload bursts, more pages are touched and more writes accumulate. If your working set doesn’t fit comfortably in memory, the storage layer becomes the choke point.
Symptoms:
- CPU 30–60%, but:
- Long
read_io_waitorwrite_io_waitreported in cloud metrics - Disk throughput and IOPS near provisioned limits (EBS, PD, etc.)
- Long
pg_stat_statementsshows some queries occasionally taking 100–500 ms while usually <10 ms.iostat/ cloud disk metrics show high queue depth and high avg I/O latency.
How to confirm:
Check on the DB node (self-managed) during a burst:
iostat -x 1
Watch for:
awaitconsistently > 5–10 msutilnear 100%- Large
avgqu-sz
On managed services, look at your disk latency and IOPS limits; if they flatline during bursts, you’re I/O-bound.
Fixes:
- Move to faster storage (e.g., provisioned IOPS, local SSD) or increase IOPS.
- Increase instance RAM so more of your working set fits in memory.
- Clean up unnecessary indexes and bad query plans that amplify random I/O.
- Keep hot, high‑QPS reads in Redis as a fast memory layer:
- Session data
- Product catalog metadata
- Feature flags
- Precomputed aggregates
Redis keeps that hot data in memory and serves it with sub‑millisecond latency, so Postgres isn’t forced to hit disk for every burst of traffic.
2. Connection storms and pool misconfiguration
Postgres doesn’t like thousands of concurrent backend processes. Each connection has overhead; too many connections increase context switching and contention, and when backends are busy, new requests are forced to wait in queues (poolers, driver) instead of using CPU.
Symptoms:
- CPU under 70–80%
- Very high connection count:
SELECT count(*) FROM pg_stat_activity;shows hundreds or thousands of rows.
- Application logs show:
- “timeout waiting for connection from pool”
- spiky request latency even though DB isn’t pegged
- p99 duration seen mostly in “acquiring DB connection”, not query execution.
How to confirm:
Check Postgres connections:
SELECT
count(*) AS total,
state,
wait_event_type,
wait_event
FROM pg_stat_activity
GROUP BY state, wait_event_type, wait_event
ORDER BY total DESC;
Check your pooler settings:
- For Java: HikariCP
maximumPoolSize,connectionTimeout - For Node:
maxandconnectionTimeoutMillis - For PG Bouncer / Pgpool:
max_client_conn,default_pool_size
If the pool is frequently at max and your app waits for a free connection, that wait time hits your p99.
Fixes:
- Use a single external pooler (e.g., PgBouncer in transaction mode) instead of many per-service pools.
- Right-size max connections:
- Reduce Postgres
max_connectionsto something reasonable (e.g., 200–400), and use PgBouncer to fan in thousands of app connections.
- Reduce Postgres
- Tune pool timeout: better to fail fast than let requests sit in a queue for hundreds of ms.
- Offload read‑heavy endpoints to Redis:
- Serve hot reads from Redis Cloud or Redis Software.
- Only hit Postgres for writes or cache misses.
- This reduces concurrent DB connections during bursts.
3. Lock contention and long transactions
Postgres maintains strong transactional guarantees. When one transaction holds a row/page lock longer than usual, other transactions pile up behind it. Under bursts, small delays become queues quickly.
Symptoms:
- p95 is okay, p99/99.9 are terrible
- Spiky latencies correlated with:
- batch jobs
- background cron tasks
- “admin” API calls updating lots of rows
pg_locksshows multiple queries waiting on locks.
How to confirm:
Check for waiting queries:
SELECT
now() - query_start AS running_for,
pid,
state,
wait_event_type,
wait_event,
query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY running_for DESC
LIMIT 20;
And see who’s blocking:
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query,
blocked.wait_event_type,
blocked.wait_event
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocked.pid <> blocking.pid
JOIN pg_locks blocked_locks
ON blocked_locks.pid = blocked.pid
JOIN pg_locks blocking_locks
ON blocking_locks.pid = blocking.pid
WHERE NOT blocked_locks.granted
AND blocking_locks.granted
AND blocked_locks.locktype = blocking_locks.locktype
AND blocked_locks.relation = blocking_locks.relation;
Fixes:
- Shorten transactions:
- avoid long‑running explicit transactions in API calls
- keep “read → compute → write” as tight as possible
- Break large updates/deletes into smaller batches.
- Add appropriate indexes to avoid row-level lock escalation patterns.
- Move hot-path state (sessions, ephemeral counters, queues) out of Postgres and into Redis data structures:
- Use Redis stream/queue structures for workflows.
- Use counters in Redis for rate limiting or metrics.
- This reduces the amount of transactional work that can block table rows.
4. Autovacuum and bloat side effects
Autovacuum keeps your tables and indexes healthy, but when it falls behind you get:
- Table and index bloat → more pages to scan → longer queries under load.
- Bigger tables mean more I/O and more cache misses.
- When autovacuum finally kicks in on a bloated table, it can amplify I/O jitter.
Symptoms:
- p99 latency bad on a subset of queries hitting a specific table.
pg_stat_user_tablesshows highn_dead_tup.- Autovacuum runs for a long time on large relations.
How to confirm:
Check for bloat signals:
SELECT
schemaname,
relname,
n_dead_tup,
last_autovacuum,
last_vacuum,
n_live_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;
Watch autovacuum activity:
SELECT
pid,
now() - xact_start AS running_for,
query
FROM pg_stat_activity
WHERE query ILIKE '%autovacuum%';
Fixes:
- Tune autovacuum to be more aggressive on hot tables:
- lower
autovacuum_vacuum_scale_factor - set per-table autovacuum settings for critical tables.
- lower
- Run manual
VACUUM (FULL)orREINDEXduring maintenance windows for heavily bloated indices. - Avoid storing high-churn ephemeral data in Postgres:
- Sessions, carts, temp tokens → Redis key–value or hashes.
- Event streams → Redis streams or a dedicated log system.
- By moving high-churn hot data out of Postgres, autovacuum has less work and is less likely to fall behind.
5. Query plans that collapse under skewed or bursty traffic
The query plan that works fine at steady-state can become terrible when the data distribution changes or the cache is cold. During bursts you might:
- Hit paths that are rarely used (e.g., “filter by promo code AND region AND category”).
- See planner pick nested loop joins that blow up with certain parameter values.
- Suffer from stale statistics.
Symptoms:
- One or a few queries responsible for most of the p99 tail.
- Those queries usually fast, but occasionally 10–100x slower under load.
How to confirm:
Enable and inspect pg_stat_statements:
SELECT
query,
calls,
mean_time,
max_time,
stddev_time
FROM pg_stat_statements
ORDER BY max_time DESC
LIMIT 20;
Then EXPLAIN (ANALYZE, BUFFERS) on problematic queries with typical “slow” parameters.
Fixes:
- Create appropriate indexes (covering indexes, partial indexes).
- Analyze tables more frequently; adjust
default_statistics_targetif needed. - Avoid query patterns that force sequential scans on large tables during bursts.
- Cache expensive derived data or semantic results in Redis:
- Precompute aggregates per account / tenant, store in Redis JSON or hashes.
- For AI use cases, use Redis as a vector database + semantic cache so expensive LLM-based fetches aren’t re-run on each burst.
6. Queueing in the app tier, not the database
Sometimes the DB is fine; it’s your application that’s queueing:
- Worker queues build up, causing “time to start work” to explode.
- Thread pools saturated.
- Backpressure not applied, so requests pile up.
Because the DB gets fewer actual active queries per second than you think, CPU stays low—but end‑user latency is awful.
Symptoms:
- DB metrics look okay (latency, CPU, active sessions).
- App metrics show:
- high time in “waiting for worker”
- high request concurrency
- p99 latency spikes correlate with spikes in app queue length, not DB metrics.
Fixes:
- Add proper backpressure and queue size limits.
- Scale horizontally at the app layer.
- Move latency-critical reads to Redis:
- Redis Cloud or Redis Software can serve thousands to millions of ops/sec with sub‑ms latency.
- Postgres is then used for durable writes and less time-sensitive reads.
Instrumentation: measuring p99 correctly
You can’t fix tail latency you can’t see. Two key practices:
- Histograms, not averages
Use histograms for DB latency, similar to Redis’s v2 Prometheus metrics that expose latency histograms for read/write commands and use histogram_quantile() for p99/p99.9.
In Prometheus/Grafana, you’d typically do something like:
histogram_quantile(
0.99,
sum(rate(http_request_duration_seconds_bucket{service="api", endpoint=~"/users.*"}[5m]))
by (le)
)
For Postgres, use your driver / ORM metrics to emit histograms per query type or per DB pool.
- Separate DB time vs app time
Instrument:
- time to get a DB connection
- query execution time
- time spent in business logic / serialization
This separation is crucial to distinguish “waiting for pool” vs “Postgres is slow” vs “app is doing too much work.”
Where Redis fits: protecting Postgres from burst load
When your latency SLOs are tight and traffic bursts are non‑negotiable (product launches, campaigns, recommendation calls, AI agents), Postgres alone often can’t deliver sub‑millisecond p99 without heroic tuning.
A fast memory layer solves this by absorbing burst traffic on hot data:
- Read-through or write-through caching:
- Keep hot rows and aggregates in Redis Cloud / Redis Software.
- Use Postgres as source of truth; Redis as speed layer.
- Real-time features:
- Session data, leaderboards, rate limiting, ephemeral events modeled using Redis data structures.
- Avoids transactional overhead on Postgres.
- AI workloads:
- Use Redis as a vector database + semantic search for embeddings.
- Use Redis LangCache as a fully managed semantic cache to:
- lower LLM costs
- reduce latency for repeated / similar queries
- Data freshness:
- Instead of pure cache-aside (which can go stale under burst), use CDC-style sync tools (e.g., Redis Data Integration) to keep Redis in near-real-time sync with Postgres.
This approach means that when traffic spikes:
- Redis handles the majority of hot reads at sub‑ms speed.
- Postgres handles a much smaller, more stable volume of writes and cold reads.
- p99 for end-user APIs stays low, even though Postgres itself might still have occasional slow queries.
Practical checklist: debugging Postgres p99 spikes during bursts
When you see p99 latency spiking while CPU looks fine, run through this in order:
- Check connection count vs instance size
SELECT count(*) FROM pg_stat_activity;- Are you using PgBouncer? Are pools right-sized?
- Look at disk metrics
- Are IOPS / throughput maxed?
- Is average I/O latency high during bursts?
- Inspect lock waits
pg_stat_activitywithwait_event_type='Lock'- Identify and fix long-running or blocking queries.
- Review autovacuum and bloat
- High dead tuples? Long-running autovacuum on hot tables?
- Identify outlier queries
- Use
pg_stat_statementsto sort bymax_timeandstddev_time.
- Use
- Separate DB vs app wait
- Is time spent waiting for a DB connection, or inside the query?
- Offload hot paths
- Move high-QPS, low-cardinality, or expensive reads to Redis.
- Use Redis for sessions, queues, counters, AI retrieval.
Summary
Your Postgres p99 latency is spiking during traffic bursts even though CPU looks fine because you’re hitting wait states—I/O, locks, connection queues, and app-level queueing—not a raw CPU limit. Under burst load, those waits compound and tail latency explodes while average metrics look deceptively okay.
The durable fix is a mix of:
- better pooling and connection limits,
- disk and autovacuum tuning,
- query and schema improvements,
- and architectural changes that move bursty, hot, or ephemeral workloads into a fast memory layer.
Redis—whether Redis Cloud, Redis Software on‑prem, or Redis Open Source—provides that memory layer. It keeps hot data, real‑time features, and AI retrieval in memory with sub‑millisecond latency, while Postgres focuses on being a reliable system of record. When you pair the two and instrument them with proper histograms and p99/p99.9 analysis, you can keep your latency SLOs intact even during the nastiest traffic bursts.