The Error
Your transaction is running at SERIALIZABLE isolation and suddenly dies with:
ERROR: could not serialize access due to concurrent update
SQLSTATE: 40001
PostgreSQL rolls the transaction back immediately. No partial state. No warning. Just gone.
This fires when two or more concurrent transactions read and write the same rows in a pattern that can't be replicated by running them one at a time. PostgreSQL detects the conflict and aborts one of them โ intentionally. This is not a bug. It's the database enforcing a guarantee you asked for.
Why This Happens
Under the hood, PostgreSQL's SERIALIZABLE level uses Serializable Snapshot Isolation (SSI). SSI continuously tracks read/write dependencies across live transactions. When it spots a dependency cycle โ a pattern that no serial ordering could produce โ it sacrifices one transaction to break it.
Classic example of a dangerous cycle:
- Transaction A reads rows 1โ5 and writes row 6
- Transaction B reads row 6 and writes rows 1โ5
- Each transaction's write depends on the other's read โ PostgreSQL aborts one
With REPEATABLE READ, the trigger is simpler: you read a row, another transaction commits a change to that row, and then you try to write it. PostgreSQL won't let both writes stand.
Step-by-Step Fix
Step 1: Add Retry Logic (Required)
Every application using SERIALIZABLE transactions must retry on 40001. Full stop. This isn't an edge case to handle someday โ it's a fundamental contract of using this isolation level.
Python with psycopg2:
import psycopg2
from psycopg2 import OperationalError
import time
import random
def run_serializable_transaction(conn, operation, max_retries=5):
for attempt in range(max_retries):
try:
with conn.cursor() as cur:
cur.execute("BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE")
operation(cur)
conn.commit()
return # success
except OperationalError as e:
conn.rollback()
if e.pgcode == '40001': # serialization_failure
if attempt setTimeout(resolve, delay));
continue;
}
throw err;
} finally {
client.release();
}
}
}
Step 2: Reduce Conflict Surface
Retries keep your app correct, but they're expensive under heavy load. Cut down how often conflicts happen in the first place.
Keep transactions short. A transaction open for 2 seconds under 50 concurrent users is a collision waiting to happen. Move calculation and validation into application code before you open the transaction:
-- Bad: slow work inside the transaction window
BEGIN;
SELECT pg_sleep(2); -- simulating slow processing
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- Better: calculate values first, then write fast
BEGIN;
UPDATE accounts SET balance = 900 WHERE id = 1;
UPDATE accounts SET balance = 1100 WHERE id = 2;
COMMIT;
Access rows in a consistent order. When multiple transactions touch the same rows, always update them in the same sequence โ lowest primary key first works well. This alone can cut serialization failures by 30โ50% in write-heavy workloads:
-- Always update lower ID first
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1; -- lower id first
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
Lock early with SELECT FOR UPDATE. If you know you're going to write a row, lock it on the read. This turns a potential abort into a wait โ other transactions queue up rather than crash:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- other transactions writing row 1 will now wait here
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;
Step 3: Question Whether You Need SERIALIZABLE
If 40001 shows up constantly, it's worth asking: do you actually need this isolation level?
-- Check current transaction isolation level
SHOW transaction_isolation;
-- Check database default
SHOW default_transaction_isolation;
-- Drop down for a session
SET default_transaction_isolation = 'read committed';
Most applications run fine on READ COMMITTED โ PostgreSQL's default โ combined with SELECT FOR UPDATE where you need row-level protection. Reserve SERIALIZABLE for genuine read-modify-write cycles spanning multiple rows where the combined result must look atomic. For straightforward balance transfers or counter updates, READ COMMITTED + explicit locks is cheaper and conflict-free.
Step 4: Monitor Serialization Failures in Production
Before tuning anything, get the baseline numbers:
-- Rollbacks and conflicts per database
SELECT datname,
xact_commit,
xact_rollback,
conflicts
FROM pg_stat_database
WHERE datname = current_database();
-- Tables with high dead tuple counts (sign of heavy update churn)
SELECT relname, n_dead_tup, n_live_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
A high xact_rollback count relative to xact_commit points directly at serialization pressure. Log retry attempts in your application too โ spikes in retry logs often reveal hot rows before your monitoring does.
Verify the Fix Works
Reproduce the error deliberately in a dev environment. Open two terminals:
-- Terminal 1: start but don't commit yet
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM accounts WHERE id = 1;
-- leave open, switch to terminal 2
-- Terminal 2: commit a conflicting write
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM accounts WHERE id = 1;
UPDATE accounts SET balance = 999 WHERE id = 1;
COMMIT; -- succeeds
-- Terminal 1: now try to write the same row
UPDATE accounts SET balance = 888 WHERE id = 1;
COMMIT; -- fails with 40001
With working retry logic, your application should silently succeed on the next attempt. Check your logs: you want to see retry attempts logged and then a successful commit โ not a 40001 bubbling up to the user.
Quick Reference
- SQLSTATE 40001 = serialization_failure โ always safe to retry
- SQLSTATE 40P01 = deadlock_detected โ also retriable, different root cause
- Always rollback before retrying โ never retry on an open, failed transaction
- Use exponential backoff with jitter (e.g. 100ms, 200ms, 400ms + random offset) to avoid retry storms under load
- Log every retry so you can find contention hot spots before they become incidents

