The Error
You run a query on a PostgreSQL standby server. It runs for a minute, then dies:
ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
This only happens on standbys โ never on the primary. If you're seeing it on a read replica in production, here's exactly what's happening and how to fix it.
Root Cause
PostgreSQL streaming replication replays WAL (Write-Ahead Log) records from the primary onto the standby. A WAL record occasionally requires the standby to clean up old row versions โ dead tuples โ that your query still needs to read.
PostgreSQL faces a choice: delay WAL replay (building replication lag) or cancel your query. It picks the query by default. The grace period before cancellation is controlled by max_standby_streaming_delay, which ships at just 30 seconds.
The usual suspects: long-running reports, nightly analytics jobs, or any OLAP query that scans millions of rows on the standby.
Fix 1: Increase the Conflict Delay (Quickest Fix)
Give PostgreSQL more time before it pulls the trigger on conflicting queries. Edit postgresql.conf on the standby server:
# postgresql.conf on standby
max_standby_streaming_delay = 300s # default: 30s โ bump to 5 minutes
max_standby_archive_delay = 300s # also raise if you use archive recovery
Reload without restarting:
sudo -u postgres psql -c "SELECT pg_reload_conf();"
Or via systemd:
sudo systemctl reload postgresql
Setting this to -1 tells PostgreSQL to wait indefinitely โ WAL replay pauses until your query finishes. That's fine when a few minutes of replication lag is acceptable; it's a bad idea if your standby is also serving failover.
Fix 2: Enable hot_standby_feedback (Best for Read-Heavy Standbys)
This one attacks the root cause rather than the symptom. With hot_standby_feedback = on, the standby continuously tells the primary which transaction IDs it's still using. The primary holds off vacuuming those rows until the standby is done.
Edit postgresql.conf on the standby server:
# postgresql.conf on standby
hot_standby_feedback = on
Reload config:
sudo -u postgres psql -c "SELECT pg_reload_conf();"
The catch: the primary delays autovacuum for rows the standby references. If your standby runs queries spanning 30+ minutes, table bloat on the primary can grow fast. Watch pg_stat_user_tables.n_dead_tup on the primary after enabling this โ a jump in dead tuples is your warning sign.
Fix 3: Set Timeouts on the Application Side
Sometimes the conflict comes from a transaction left open โ a connection that started a transaction, ran a query, then stalled waiting on application logic. Set limits before running anything expensive:
-- At the start of your session
SET idle_in_transaction_session_timeout = '5min';
SET statement_timeout = '10min';
-- Then run your query
SELECT * FROM large_analytics_table WHERE created_at > NOW() - INTERVAL '30 days';
This won't stop WAL conflicts, but it prevents abandoned transactions from making the problem worse.
Fix 4: Retry Logic in Application Code
For apps that query the standby directly, catch the error and retry. Most conflict cancellations are transient โ a brief pause is all it takes:
import psycopg2
from psycopg2 import OperationalError
import time
def query_with_retry(conn, sql, retries=3, delay=2):
for attempt in range(retries):
try:
cur = conn.cursor()
cur.execute(sql)
return cur.fetchall()
except OperationalError as e:
if 'canceling statement due to conflict with recovery' in str(e):
conn.rollback()
if attempt < retries - 1:
time.sleep(delay)
continue
raise
raise Exception("Max retries exceeded")
Fix 5: Move Long Queries Off the Standby
Sometimes the real answer is topology. Route heavy analytics queries to a dedicated replica that sits outside your hot-standby pool โ or schedule them against the primary during off-peak hours.
For serious reporting workloads (think multi-hour aggregations), a logical replica or a purpose-built analytics database like Redshift or BigQuery removes the problem entirely. Streaming standbys weren't designed for that kind of query pattern.
Verification Steps
Once you've applied a fix, confirm it actually stuck.
Check current settings on the standby:
sudo -u postgres psql -c "SHOW max_standby_streaming_delay;"
sudo -u postgres psql -c "SHOW hot_standby_feedback;"
Watch replication lag on the primary โ especially after enabling hot_standby_feedback or bumping the delay. You don't want the standby drifting 10 minutes behind:
-- Run on PRIMARY
SELECT
application_name,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
Re-run the query that was failing. If it completes cleanly, you're done.
Tail the standby logs to confirm no remaining conflict messages:
sudo tail -f /var/log/postgresql/postgresql-*.log | grep -i conflict
Which Fix Should You Use?
- Fastest, lowest risk: Raise
max_standby_streaming_delayto 300โ600s. Five minutes of grace period eliminates most query cancellations. - Read-heavy standbys: Turn on
hot_standby_feedback, then keep an eye on primary table bloat. - Best combined approach:
hot_standby_feedback = onplusmax_standby_streaming_delay = 300sas a safety net. This covers edge cases where the feedback signal arrives too late. - Heavy analytics: Dedicated replica or a separate analytics DB โ don't fight the replication model.
For the majority of setups, hot_standby_feedback = on with max_standby_streaming_delay = 300s on the standby clears the error for good.

