How to Fix PostgreSQL 'FATAL: the database system is starting up' Error

beginner | PostgreSQL | 2026-04-13 | PostgreSQL (all versions), Linux (Ubuntu, CentOS), Docker, Kubernetes

Error Message

FATAL: the database system is starting up
#postgresql #devops #database-recovery #docker #troubleshooting

TL;DR: The Quick Fix

Seeing this error means your PostgreSQL service is running but isn't ready for traffic yet. Usually, the engine is busy replaying Write-Ahead Logs (WAL) to ensure your data is consistent after a crash or a sudden restart.

  • Be patient: Most of the time, the database just needs a minute to finish its internal checks.
  • Watch the logs: Run tail -f /var/log/postgresql/postgresql-15-main.log (adjust the version number to match your installation) to track the recovery progress in real time.
  • Test readiness: Use the pg_isready utility to check status without cluttering your logs with failed connection attempts.

Why This Error Happens

PostgreSQL enters a protective recovery mode during startup. It scans the pg_wal directory (pg_xlog in versions before 10) for changes that were logged but not yet written to the main data files, and replays them. This safety measure triggers after hard reboots, power failures, or when the postmaster process is killed abruptly with kill -9.

The database refuses connections until it reaches a "consistent state." How long that takes depends mainly on how much WAL accumulated since the last checkpoint rather than on total database size; on a 500GB database with high write volume, it can take several minutes. If you are running a Hot Standby replica, you will also see this message until the standby has received enough WAL data from the primary to allow read-only queries.

Step 1: Monitor the Startup Progress

Instead of guessing how long it will take, check the logs. They provide specific details, such as the exact WAL location being replayed or how much work remains.

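# Debian/Ubuntu systems (replace 15 with your major version)
sudo tail -f /var/log/postgresql/postgresql-15-main.log

# RHEL/CentOS systems (default log files are named per weekday, e.g. postgresql-Mon.log)
sudo tail -f /var/lib/pgsql/data/log/postgresql-*.log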

# Docker containers
docker logs -f <container_name>

Keep an eye out for these specific log markers:

LOG:  database system was interrupted; last known up at 2023-10-27 10:00:00 UTC
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at 0/1A2B3C8
LOG:  redo done at 0/1A2B4D0
LOG:  database system is ready to accept connections
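
If you would rather have a script tell you which phase the server is in, the markers can be matched mechanically. The following is a minimal sketch, not an official tool: the phase labels and the function name startup_phase are this example's own naming, and the marker list covers only the common crash-recovery and standby messages.

```python
from typing import Iterable

# Substrings PostgreSQL writes to its server log during startup, mapped to
# a coarse phase label. The labels are this sketch's own naming.
MARKERS = [
    ("database system is ready to accept connections", "ready"),
    ("redo done at", "redo finished"),
    ("consistent recovery state reached at", "consistent state reached"),
    ("redo starts at", "replaying WAL"),
    ("automatic recovery in progress", "crash recovery started"),
    ("database system was interrupted", "crash detected"),
]


def startup_phase(log_lines: Iterable[str]) -> str:
    """Return the phase implied by the most recent recognised log line."""
    phase = "unknown"
    for line in log_lines:
        for needle, label in MARKERS:
            if needle in line:
                phase = label
                break
    return phase
```

Feed it the lines of the server log (for example, the output of open(path).readlines()) and it reports the latest phase it recognises, or "unknown" if none of the markers appear.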

Step 2: Use pg_isready for Health Checks

Rather than letting your application fail on its first connection attempt, poll the server with the pg_isready utility. It returns standard shell exit codes, which makes it a good fit for CI/CD pipelines and startup scripts, and it lets you check the database status quietly.

pg_isready -h localhost -p 5432

The exit codes tell the story:

  • 0: Ready. The server is accepting connections.
  • 1: Busy. The server is rejecting connections (likely still starting up).
  • 2: No response. The server is down or there is a network issue.
  • 3: No attempt. The check itself could not run (for example, because of invalid parameters).
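
The same check can be driven from Python when a startup script needs to block until the server is ready. This is a rough sketch under a few assumptions: pg_isready is on the PATH, the host, port, attempt count, and delay are placeholder defaults, and the helper names run_pg_isready and wait_until_ready are hypothetical. Exit code 3 (no attempt made, for example because of invalid parameters) is included because retrying cannot fix it.

```python
import subprocess
import time
from typing import Callable

# Exit codes documented for pg_isready.
STATUS = {
    0: "accepting connections",
    1: "rejecting connections (for example, still starting up)",
    2: "no response",
    3: "no attempt made (for example, invalid parameters)",
}


def run_pg_isready(host: str = "localhost", port: int = 5432) -> int:
    """Run pg_isready quietly and return its exit code."""
    result = subprocess.run(
        ["pg_isready", "-q", "-h", host, "-p", str(port)],
        check=False,
    )
    return result.returncode


def wait_until_ready(
    probe: Callable[[], int] = run_pg_isready,
    attempts: int = 30,
    delay: float = 2.0,
) -> bool:
    """Poll until the probe returns 0 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        code = probe()
        if code == 0:
            return True
        print(f"attempt {attempt}: {STATUS.get(code, 'unknown status')}")
        if code == 3:
            return False  # retrying cannot fix bad parameters
        time.sleep(delay)
    return False
```

Passing the probe as a parameter keeps the polling loop testable without a running server; in normal use, calling wait_until_ready() with no arguments shells out to pg_isready.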

Step 3: Implement Smart Retry Logic

In Docker or Kubernetes environments, apps often boot faster than the database. Your code should anticipate that the database might not be available the exact millisecond the container starts.

This Python example uses a simple fixed-delay retry loop to handle the "starting up" phase gracefully:

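import time

import psycopg2


def connect_with_retry(retries=15, delay=5):
    """Connect to PostgreSQL, retrying while the server is still starting up."""
    for attempt in range(1, retries + 1):
        try:
            # Connection parameters are placeholders; adjust them for your setup.
            return psycopg2.connect("dbname=test user=postgres password=secret host=db")
        except psycopg2.OperationalError as e:
            # Only retry the startup error; re-raise anything else unchanged.
            if "starting up" not in str(e):
                raise
            print(f"Postgres is still recovering (attempt {attempt}/{retries}). "
                  f"Retrying in {delay} seconds...")
            time.sleep(delay)
    raise TimeoutError("Could not connect to PostgreSQL before retries ran out.")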
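
If many application instances start at once, a growing delay is gentler on the database than a constant one. The helper below is a minimal sketch of exponential backoff with optional jitter; the name backoff_delays and its default values are illustrative assumptions, not part of psycopg2 or PostgreSQL.

```python
import random
from typing import List


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0,
                   jitter: float = 0.0) -> List[float]:
    """Return one sleep duration per attempt, doubling up to a cap.

    jitter is a fraction (0.0 to 1.0) of each delay that may be added at
    random, so that many clients do not all retry at the same instant.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * 2 ** attempt)
        delay += random.uniform(0, jitter * delay)
        delays.append(delay)
    return delays
```

To use it, iterate over the returned list in the retry loop and pass each value to time.sleep() in place of a constant delay.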

Step 4: Tune Your Config to Speed Up Recovery

If crash recovery routinely takes 10+ minutes, your configuration is likely letting too much WAL accumulate between checkpoints. You can shrink this window by adjusting how often PostgreSQL takes "checkpoints."

Open postgresql.conf and look for these settings:

# Increase checkpoint frequency to shorten recovery windows
checkpoint_timeout = 5min 
max_wal_size = 2GB
min_wal_size = 1GB

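PostgreSQL starts a checkpoint when checkpoint_timeout elapses or when the WAL written since the last checkpoint approaches max_wal_size, whichever comes first. Raising max_wal_size gives a write-heavy database more room between checkpoints and smooths out I/O, but it also leaves more WAL to replay after a crash. If fast startup matters most, keep both values moderate and avoid a checkpoint_timeout of 30 minutes or more, which can leave a very large backlog to be replayed.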
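
To get a feel for the trade-off, a back-of-envelope estimate helps. The sketch below assumes a steady WAL generation rate and treats max_wal_size as a hard cap, which it is not (it is a soft limit, and settings such as checkpoint_completion_target also affect the real figure). The function name and the 5 MB/s rate are hypothetical, so treat the result as an order-of-magnitude guide only.

```python
def estimated_replay_mb(wal_rate_mb_per_s: float,
                        checkpoint_timeout_s: float,
                        max_wal_size_mb: float) -> float:
    """Rough upper bound on WAL written between two checkpoints, in MB.

    A checkpoint is triggered by whichever limit is hit first, so the
    estimate is the smaller of the time-based and size-based volumes.
    """
    written_until_timeout = wal_rate_mb_per_s * checkpoint_timeout_s
    return min(written_until_timeout, max_wal_size_mb)


# Assumed workload: 5 MB/s of WAL, max_wal_size = 2GB (2048 MB).
five_minutes = estimated_replay_mb(5.0, 5 * 60, 2048.0)
thirty_minutes = estimated_replay_mb(5.0, 30 * 60, 2048.0)
print(five_minutes, thirty_minutes)
```

With these assumed numbers, a 5-minute timeout is the limit that triggers first, while at 30 minutes the size limit takes over, so raising checkpoint_timeout alone stops mattering once max_wal_size becomes the binding constraint.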

Handling the Error in Docker/Kubernetes

In containerized setups, this error often triggers unnecessary restart loops. Use a healthcheck in your docker-compose.yml to make sure dependent services wait for the database to be fully operational.

services:
  db:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  app:
    depends_on:
      db:
        condition: service_healthy

Final Verification

Once you believe the database is up, verify it with these three steps:

  • Check pg_isready and confirm it returns exit code 0.
  • Run a simple query via psql: psql -U postgres -c "SELECT 1;".
  • Scan the logs for the message: database system is ready to accept connections.

If the error persists for over 20 minutes without log activity, check your disk space with df -h. A full disk can stall or abort the recovery process, leaving the database stuck in the startup phase until space is freed.
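
The disk check can also be scripted so that a monitoring job flags the problem before recovery stalls. This is a small sketch using only the Python standard library; the 95% threshold and the helper names are arbitrary assumptions, and you would pass the path of your own data directory (its location varies by distribution).

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Return how full the filesystem containing path is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path: str, threshold: float = 95.0) -> bool:
    """True when usage on the filesystem is at or above the threshold."""
    return disk_usage_percent(path) >= threshold
```

For example, is_nearly_full("/path/to/your/data/directory") returns True once the volume holding the data directory crosses the threshold.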
