Fix PostgreSQL 'canceling statement due to conflict with recovery' on Standby Server

The Error

You run a query on a PostgreSQL standby server. It runs for a minute, then dies:

ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.

This only happens on standbys — never on the primary. If you're seeing it on a read replica in production, here's exactly what's happening and how to fix it.

Root Cause

PostgreSQL streaming replication replays WAL (Write-Ahead Log) records from the primary onto the standby. A WAL record occasionally requires the standby to clean up old row versions — dead tuples — that your query still needs to read.

PostgreSQL faces a choice: delay WAL replay (building replication lag) or cancel your query. It picks the query by default. The grace period before cancellation is controlled by max_standby_streaming_delay, which ships at just 30 seconds.

The usual suspects: long-running reports, nightly analytics jobs, or any OLAP query that scans millions of rows on the standby.

Fix 1: Increase the Conflict Delay (Quickest Fix)

Give PostgreSQL more time before it pulls the trigger on conflicting queries. Edit postgresql.conf on the standby server:

# postgresql.conf on standby
max_standby_streaming_delay = 300s   # default: 30s — bump to 5 minutes
max_standby_archive_delay = 300s     # also raise if you use archive recovery

Reload without restarting:

sudo -u postgres psql -c "SELECT pg_reload_conf();"

Or via systemd:

sudo systemctl reload postgresql

Setting this to -1 tells PostgreSQL to wait indefinitely — WAL replay pauses until your query finishes. That's fine when a few minutes of replication lag is acceptable; it's a bad idea if your standby is also serving failover.

Fix 2: Enable hot_standby_feedback (Best for Read-Heavy Standbys)

This one attacks the root cause rather than the symptom. With hot_standby_feedback = on, the standby continuously tells the primary which transaction IDs it's still using. The primary holds off vacuuming those rows until the standby is done.

Edit postgresql.conf on the standby server:

# postgresql.conf on standby
hot_standby_feedback = on

Reload config:

sudo -u postgres psql -c "SELECT pg_reload_conf();"

The catch: the primary delays autovacuum for rows the standby references. If your standby runs queries spanning 30+ minutes, table bloat on the primary can grow fast. Watch pg_stat_user_tables.n_dead_tup on the primary after enabling this — a jump in dead tuples is your warning sign.

Fix 3: Set Timeouts on the Application Side

Sometimes the conflict comes from a transaction left open — a connection that started a transaction, ran a query, then stalled waiting on application logic. Set limits before running anything expensive:

-- At the start of your session
SET idle_in_transaction_session_timeout = '5min';
SET statement_timeout = '10min';

-- Then run your query
SELECT * FROM large_analytics_table WHERE created_at > NOW() - INTERVAL '30 days';

This won't stop WAL conflicts, but it prevents abandoned transactions from making the problem worse.

Fix 4: Retry Logic in Application Code

For apps that query the standby directly, catch the error and retry. Most conflict cancellations are transient — a brief pause is all it takes:

import psycopg2
from psycopg2 import OperationalError
import time

def query_with_retry(conn, sql, retries=3, delay=2):
    for attempt in range(retries):
        try:
            cur = conn.cursor()
            cur.execute(sql)
            return cur.fetchall()
        except OperationalError as e:
            if 'canceling statement due to conflict with recovery' in str(e):
                conn.rollback()
                if attempt < retries - 1:
                    time.sleep(delay)
                    continue
            raise
    raise Exception("Max retries exceeded")

Fix 5: Move Long Queries Off the Standby

Sometimes the real answer is topology. Route heavy analytics queries to a dedicated replica that sits outside your hot-standby pool — or schedule them against the primary during off-peak hours.

For serious reporting workloads (think multi-hour aggregations), a logical replica or a purpose-built analytics database like Redshift or BigQuery removes the problem entirely. Streaming standbys weren't designed for that kind of query pattern.

Verification Steps

Once you've applied a fix, confirm it actually stuck.

Check current settings on the standby:

sudo -u postgres psql -c "SHOW max_standby_streaming_delay;"
sudo -u postgres psql -c "SHOW hot_standby_feedback;"

Watch replication lag on the primary — especially after enabling hot_standby_feedback or bumping the delay. You don't want the standby drifting 10 minutes behind:

-- Run on PRIMARY
SELECT
  application_name,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  write_lag,
  flush_lag,
  replay_lag
FROM pg_stat_replication;

Re-run the query that was failing. If it completes cleanly, you're done.

Tail the standby logs to confirm no remaining conflict messages:

sudo tail -f /var/log/postgresql/postgresql-*.log | grep -i conflict

Which Fix Should You Use?

Fastest, lowest risk: Raise max_standby_streaming_delay to 300–600s. Five minutes of grace period eliminates most query cancellations.
Read-heavy standbys: Turn on hot_standby_feedback, then keep an eye on primary table bloat.
Best combined approach: hot_standby_feedback = on plus max_standby_streaming_delay = 300s as a safety net. This covers edge cases where the feedback signal arrives too late.
Heavy analytics: Dedicated replica or a separate analytics DB — don't fight the replication model.

For the majority of setups, hot_standby_feedback = on with max_standby_streaming_delay = 300s on the standby clears the error for good.