Fix PostgreSQL 'SSL SYSCALL error: EOF detected' When Connection Drops Unexpectedly

What Just Happened

You're running a query — or maybe just sitting idle — and suddenly the connection drops with:

SSL SYSCALL error: EOF detected
server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or during processing of the request.

Translation: the client's SSL layer hit end-of-file on the socket. The TCP connection closed without the other end sending a proper TLS close_notify alert first, so libpq reports an abrupt EOF. Your app, psql, or connection pooler had zero warning: the connection just vanished mid-flight.

A clean disconnect sends a shutdown signal first. EOF doesn't. That means something at the OS level killed the connection without telling SSL about it. The usual suspects: a network blip, server resource exhaustion, an SSL cert issue, or a load balancer quietly axing idle connections.
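To see what "EOF" means at the socket level, here is a minimal sketch using a local socket pair. It leaves TLS out entirely (an assumption made to keep it self-contained); the point is only that a peer which closes without any goodbye shows up as an empty read, which is the condition libpq's SSL layer reports. The function name is made up for illustration.

```python
import socket

def read_after_abrupt_close() -> bytes:
    """Return what recv() yields once the peer has closed without notice."""
    ours, peer = socket.socketpair()
    try:
        peer.close()            # the peer vanishes: no goodbye message of any kind
        return ours.recv(1024)  # returns b"" (EOF) instead of blocking
    finally:
        ours.close()

if __name__ == "__main__":
    print(repr(read_after_abrupt_close()))
```

With TLS on top, a well-behaved peer would send close_notify before closing; when the read comes back empty without it, the client has no way to tell a crash from a dropped connection.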

Reproduce and Confirm the Error

Before jumping to fixes, nail down exactly when this happens. Run a quick connection test:

psql "host=your-db-host dbname=mydb user=myuser sslmode=require" -c "SELECT version();"

Fails instantly? The problem is connection establishment, most likely a cert or SSL mode mismatch. Succeeds here, but longer-lived sessions die after sitting idle for a minute or two? You're dealing with a timeout or keepalive issue.

Check the PostgreSQL logs on the server for the matching disconnect:

sudo grep -i "ssl\|EOF\|connection" /var/log/postgresql/postgresql-*.log | tail -50

On RDS, pull logs from the AWS Console or via CLI:

aws rds download-db-log-file-portion \
  --db-instance-identifier mydb \
  --log-file-name error/postgresql.log \
  --output text

Quick Fixes — Try These First

1. Check if the server is actually running

sudo systemctl status postgresql
sudo journalctl -u postgresql -n 50 --no-pager

Linux's OOM killer silently terminating PostgreSQL is more common than you'd think — especially on servers with less than 4 GB RAM:

sudo dmesg | grep -i "oom\|killed" | tail -20
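If you'd rather script that check, the sketch below scans dmesg text for OOM kills of PostgreSQL processes. It assumes the kernel's usual "Killed process <pid> (<name>)" wording, and the sample lines are made up for illustration, not real log output.

```python
import re

# Matches the kernel's "Killed process <pid> (<name>)" OOM message.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")

def oom_victims(dmesg_text: str, name_prefix: str = "postgres"):
    """Return (pid, process_name) pairs for OOM kills matching name_prefix."""
    victims = []
    for line in dmesg_text.splitlines():
        match = OOM_KILL.search(line)
        if match and match.group(2).startswith(name_prefix):
            victims.append((int(match.group(1)), match.group(2)))
    return victims

# Illustrative sample only.
SAMPLE = """\
[1234.5] Out of memory: Killed process 4321 (postgres) total-vm:900000kB
[1300.0] Out of memory: Killed process 777 (java) total-vm:2000000kB
"""

if __name__ == "__main__":
    print(oom_victims(SAMPLE))
```

Feed it the output of `sudo dmesg`; an empty list means the OOM killer didn't take out a PostgreSQL process in the window the kernel log still covers.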

2. Enable TCP keepalives on the client side

Idle connections are vulnerable. Firewalls, NAT gateways, and load balancers that see no traffic for 60–300 seconds will silently drop the connection — and SSL registers that as an EOF. TCP keepalives send small probe packets to keep the connection warm. Set them in your connection string:

# psql connection string
psql "host=db-host dbname=mydb user=myuser \
  keepalives=1 \
  keepalives_idle=60 \
  keepalives_interval=10 \
  keepalives_count=5"

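Drivers that don't wrap libpq use their own option names; node-postgres (pg), for example, takes keepAlive and keepAliveInitialDelayMillis. For libpq-based drivers such as Python's psycopg2, pass the same parameters to connect():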

# Python psycopg2
import psycopg2
conn = psycopg2.connect(
    host="db-host",
    dbname="mydb",
    user="myuser",
    keepalives=1,
    keepalives_idle=60,
    keepalives_interval=10,
    keepalives_count=5
)
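Under the hood, those keepalives_* parameters end up as TCP socket options. The sketch below sets the same options on a plain socket with Python's standard library and estimates how long a dead peer goes unnoticed. The function names are made up, and the per-socket tuning constants are platform-specific (they exist on Linux), hence the getattr guard.

```python
import socket

def apply_keepalives(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive probes on sock; return the options actually set."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}
    # Tuning constants are not available on every platform, so skip missing ones.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied

def detection_seconds(idle=60, interval=10, count=5):
    """Rough worst-case time before a dead peer is noticed on an idle connection."""
    return idle + interval * count
```

With the values used above, the first probe goes out after 60 seconds of silence and the connection is declared dead after 5 unanswered probes 10 seconds apart, so roughly 110 seconds in total.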

3. Match SSL modes between client and server

Mismatched SSL expectations are a surprisingly common trigger. First, see what the server actually requires:

psql -c "SHOW ssl;"
psql -c "SELECT name, setting FROM pg_settings WHERE name LIKE 'ssl%';"

Then align your client's sslmode accordingly:

# If server has ssl=on and requires SSL:
export PGSSLMODE=require

# If server allows but doesn't require:
export PGSSLMODE=prefer
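To keep the modes straight, here is a small lookup table summarizing what each libpq sslmode does. It's a simplified summary rather than a full specification (for instance, require starts verifying the CA when a root certificate file is present on the client), and the helper name is made up for illustration.

```python
# Simplified summary of libpq's sslmode values.
# "verifies" is what the client checks about the server certificate.
SSLMODES = {
    "disable":     {"encrypted": "never",                  "verifies": None},
    "allow":       {"encrypted": "only if server insists", "verifies": None},
    "prefer":      {"encrypted": "if server supports it",  "verifies": None},
    "require":     {"encrypted": "always",                 "verifies": None},
    "verify-ca":   {"encrypted": "always",                 "verifies": "CA chain"},
    "verify-full": {"encrypted": "always",                 "verifies": "CA chain + hostname"},
}

def guaranteed_encrypted(sslmode: str) -> bool:
    """True if this mode never falls back to a plaintext connection."""
    return SSLMODES[sslmode]["encrypted"] == "always"
```

The practical takeaway: prefer and allow can quietly end up unencrypted, while only verify-ca and verify-full care whether the server certificate is valid.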

Permanent Fix — Based on Root Cause

Root Cause A: Load balancer / firewall idle timeout

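AWS Network Load Balancers drop TCP flows that have been idle for 350 seconds by default. (The 60-second figure often quoted belongs to ALB, which only handles HTTP and can't sit in front of PostgreSQL.) RDS Proxy's default idle client timeout is 1,800 seconds. Nginx's stream proxy_timeout defaults to 10 minutes. Any of these can silently cut a connection, and the SSL layer on each end sees that as an unexpected EOF.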

The fix: enable keepalives server-side, with tcp_keepalives_idle set comfortably below the shortest idle timeout on the path, so probes go out before any device gives up on the connection:

-- ALTER SYSTEM writes postgresql.auto.conf; these can also be set per role with ALTER ROLE ... SET:
ALTER SYSTEM SET tcp_keepalives_idle = 60;
ALTER SYSTEM SET tcp_keepalives_interval = 10;
ALTER SYSTEM SET tcp_keepalives_count = 5;
SELECT pg_reload_conf();
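A quick sanity check for those numbers might look like the sketch below. The function name and warning wording are made up; path_idle_timeout stands for whatever the shortest idle timeout is between client and server, which you have to find out yourself.

```python
def keepalive_problems(idle, interval, count, path_idle_timeout):
    """Return a list of warnings; an empty list means the settings look safe."""
    problems = []
    if idle <= 0:
        # 0 tells PostgreSQL to use the operating system default.
        problems.append("tcp_keepalives_idle is 0: the OS default applies "
                        "(commonly 7200 s on Linux)")
    elif idle >= path_idle_timeout:
        problems.append(f"first probe at {idle}s is not earlier than the "
                        f"{path_idle_timeout}s idle timeout on the path")
    if interval <= 0 or count <= 0:
        problems.append("interval or count is 0: the OS default applies")
    return problems

if __name__ == "__main__":
    # Values from the ALTER SYSTEM statements, checked against a 350 s timeout.
    print(keepalive_problems(60, 10, 5, path_idle_timeout=350))
```

The rule it encodes is simple: the first probe has to leave before the most impatient device on the path decides the connection is dead.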

Root Cause B: PgBouncer or pgpool-II cutting the connection

Connection poolers close idle server-side connections on their own schedule — which may not align with what your app expects. Check PgBouncer's server_idle_timeout:

# In pgbouncer.ini:
server_idle_timeout = 600     # seconds, default 600
server_lifetime = 3600        # max connection lifetime
client_idle_timeout = 0       # 0 = no timeout

# Reload:
psql -p 6432 pgbouncer -c "RELOAD;"
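To compare those settings against how long your app leaves connections idle, you can read them with Python's configparser. This is a sketch under assumptions: the helper names are made up, the fallbacks are the defaults quoted above, and inline comments are enabled so the snippet above parses as written.

```python
import configparser

def pooler_timeouts(ini_text: str) -> dict:
    """Extract the idle/lifetime settings from pgbouncer.ini text."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#", ";"),
                                       interpolation=None)
    parser.read_string(ini_text)
    section = parser["pgbouncer"]
    return {
        "server_idle_timeout": section.getint("server_idle_timeout", fallback=600),
        "server_lifetime": section.getint("server_lifetime", fallback=3600),
        "client_idle_timeout": section.getint("client_idle_timeout", fallback=0),
    }

def will_drop_idle_client(timeouts: dict, app_idle_seconds: int) -> bool:
    """True if the pooler would close a client that idles this long."""
    limit = timeouts["client_idle_timeout"]
    return limit != 0 and app_idle_seconds > limit

SAMPLE_INI = """\
[pgbouncer]
server_idle_timeout = 600     # seconds, default 600
server_lifetime = 3600        # max connection lifetime
client_idle_timeout = 0       # 0 = no timeout
"""

if __name__ == "__main__":
    print(pooler_timeouts(SAMPLE_INI))
```

With client_idle_timeout at 0 the pooler never drops idle clients, so any EOF on that leg has to come from somewhere else on the path.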

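Also verify PgBouncer's TLS settings on both legs. The app-to-PgBouncer and PgBouncer-to-PostgreSQL connections are separate TLS sessions, so each setting has to agree with its own peer: client_tls_sslmode with the sslmode your app uses, and server_tls_sslmode with what PostgreSQL's pg_hba.conf demands. If your app connects with sslmode=require but PgBouncer has client TLS disabled, or PostgreSQL only accepts hostssl connections while PgBouncer talks plain TCP, connections get rejected or dropped:

# pgbouncer.ini: TLS on both legs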
server_tls_sslmode = require
client_tls_sslmode = require
client_tls_cert_file = /etc/pgbouncer/client.crt
client_tls_key_file = /etc/pgbouncer/client.key

Root Cause C: Expired or mismatched SSL certificates

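Expired certs are sneaky. When either side validates the other's certificate (sslmode=verify-ca or verify-full on the client, or client certificate checks on the server), an expired cert makes the handshake collapse partway through, and the side that didn't abort may see only an EOF instead of a descriptive error. Check the expiry date first: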

# Check cert expiry on the PostgreSQL server:
openssl x509 -in /etc/postgresql/16/main/server.crt -noout -dates

# Check remotely:
openssl s_client -connect your-db-host:5432 -starttls postgres </dev/null 2>/dev/null \
  | openssl x509 -noout -dates
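If you want to turn that check into a number you can alert on, the sketch below parses the notAfter line from the openssl output using the standard library's ssl.cert_time_to_seconds. The function name is made up, and it expects the text exactly as `openssl x509 -noout -dates` prints it.

```python
import ssl
import time

def days_until_expiry(openssl_dates_output: str, now_epoch: float) -> float:
    """Days from now_epoch until the certificate's notAfter date (negative if expired)."""
    for line in openssl_dates_output.splitlines():
        if line.startswith("notAfter="):
            not_after = line.split("=", 1)[1].strip()
            expires = ssl.cert_time_to_seconds(not_after)
            return (expires - now_epoch) / 86400
    raise ValueError("no notAfter= line found")

if __name__ == "__main__":
    sample = "notBefore=Jan  5 09:34:43 2017 GMT\nnotAfter=Jan  5 09:34:43 2018 GMT\n"
    print(days_until_expiry(sample, time.time()))
```

A negative result means the cert has already expired; alerting a few weeks before zero leaves time to rotate it calmly.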

If it's expired, replace it and restart (PostgreSQL 10 and later also re-read SSL files on a reload). A self-signed replacement, for dev/test environments only:

# Self-signed (dev/test only); openssl prompts for the subject fields:
sudo openssl req -new -x509 -days 365 -nodes \
  -out /etc/postgresql/16/main/server.crt \
  -keyout /etc/postgresql/16/main/server.key
sudo chmod 600 /etc/postgresql/16/main/server.key
sudo chown postgres:postgres /etc/postgresql/16/main/server.*
sudo systemctl restart postgresql

Root Cause D: Server ran out of memory or file descriptors

When the server runs low on memory, Linux's OOM killer starts terminating processes, and PostgreSQL backends are fair game. Hitting the file descriptor limit is a different failure with a similar symptom: new connections fail, and existing ones can drop without warning.

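# Check the limits of the running postmaster:
grep -i "open files" /proc/$(pgrep -o postgres)/limits

# Raise limits in /etc/security/limits.conf
# (applies to PAM login sessions only, not to systemd-managed services):
postgres soft nofile 65536
postgres hard nofile 65536

# For a systemd-managed server, set it in postgresql.service
# (or a drop-in), then run daemon-reload and restart the service:
[Service]
LimitNOFILE=65536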
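To confirm the new limit actually reached the running server, you can parse /proc/<pid>/limits instead of eyeballing it. The sketch below works on the file's text; the function name is made up and the sample line is illustrative.

```python
def open_files_limit(limits_text: str):
    """Return (soft, hard) for 'Max open files'; None stands for 'unlimited'."""
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft, hard = line[len(label):].split()[:2]
            to_int = lambda v: None if v == "unlimited" else int(v)
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' line found")

# Illustrative sample in the layout /proc/<pid>/limits uses.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max open files            1024                 524288               files
"""

if __name__ == "__main__":
    print(open_files_limit(SAMPLE))
```

If the soft limit still shows the old value after a restart, the setting was applied in a place the service doesn't read.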

Verify the Fix Worked

Don't just assume it's fixed. Hold a connection open through a full idle period and confirm it survives:

# Keep a connection open for 5 minutes and check it's still alive:
psql "host=db-host dbname=mydb keepalives=1 keepalives_idle=30" \
  -c "SELECT pg_sleep(300); SELECT 'still alive';"

# Monitor active connections during the test:
psql -c "SELECT pid, state, wait_event, query_start, state_change \
         FROM pg_stat_activity WHERE datname='mydb';"

If you get 'still alive' back instead of the SSL EOF error, the fix held.

Tips

Working in a cloud environment adds another layer of complexity — subnets, security groups, and NACLs all affect whether keepalive probes even reach the database. If you need to sanity-check IP ranges or verify two hosts are in the same network segment, the Subnet Calculator on ToolCraft is handy for this — runs entirely in the browser, no data sent anywhere.

One more thing worth adding to your monitoring: alert when pg_stat_activity shows a spike in idle connections whose state_change timestamp is several minutes old. That pattern often appears 5–10 minutes before connections start dropping with EOF errors, and catching it early saves a lot of scrambling.
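The alert logic itself can be tiny. Here is a sketch that takes rows of (pid, state, state_change) as selected from pg_stat_activity and returns the PIDs that have been idle too long. The function name and the 5-minute threshold are assumptions; tune the threshold to your own idle timeouts.

```python
from datetime import datetime, timedelta, timezone

def stale_idle_pids(rows, now, threshold=timedelta(minutes=5)):
    """Return PIDs of sessions that have sat in state 'idle' longer than threshold."""
    return [pid for pid, state, changed in rows
            if state == "idle" and changed is not None and now - changed > threshold]

if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [
        (101, "idle", now - timedelta(minutes=12)),
        (102, "active", now - timedelta(minutes=30)),
        (103, "idle", now - timedelta(seconds=20)),
    ]
    print(stale_idle_pids(rows, now))
```

Count the result per minute and alert on a sudden jump rather than on any single stale session, since a few long-idle connections are normal for most pools.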