Fix 'MASTERDOWN Link with MASTER is down' When Redis Replica Refuses Reads

intermediate | Redis | 2026-06-20 | Redis 4.0+ on Linux (Ubuntu/Debian/CentOS); also affects Redis on Docker, Kubernetes, and managed services like ElastiCache

Error Message

MASTERDOWN Link with MASTER is down and replica-serve-stale-data is set to 'no'
#redis #replication #replica #masterdown

What happened

Your app was reading from a Redis replica. The replica lost its connection to the master, and now every read command is throwing:

MASTERDOWN Link with MASTER is down and replica-serve-stale-data is set to 'no'

The culprit is replica-serve-stale-data no on the replica. With that setting, a replica that has lost its master (or is still running its initial sync) rejects every data command, reads included. Only a short list of administrative commands such as INFO, REPLICAOF, CONFIG, and AUTH still works. It's a safety feature to prevent stale data from reaching clients, but it takes your app down hard if you haven't planned for it.

Why the replica lost its connection

Don't jump straight to the fix. Figure out what broke first, or you'll be back here in an hour. Common causes:

  • Master Redis process crashed or was restarted
  • Network partition between replica and master hosts
  • A Sentinel failover promoted a different node to master and the replica still points to the old master's IP
  • Master is overloaded and not responding to PING within repl-timeout (default: 60 seconds)
  • A firewall rule blocked the replication port (default 6379)

Start by checking the replica's logs:

# On the replica host (adjust the path if your log lives elsewhere)
tail -n 100 /var/log/redis/redis-server.log

# Replication state as the replica sees it
redis-cli -h replica-host INFO replication

Look for lines like Connection with master lost, Connecting to MASTER, or MASTER abort. These tell you when the link dropped and give you a starting point for the root cause.
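A busy log makes those lines easy to miss. The following is a minimal sketch of a filter, assuming the three phrases quoted above are the ones your Redis version logs; the function name replication_events is made up for illustration, and the log path in the usage comment is the same assumed path as above.

```shell
# Print only replication-related events from a Redis log read on stdin.
# The patterns are the three phrases quoted in the text; extend them to
# match whatever your Redis version actually writes.
replication_events() {
  grep -E 'Connection with master lost|Connecting to MASTER|MASTER abort'
}

# Usage (log path is an assumption, adjust for your install):
#   tail -n 1000 /var/log/redis/redis-server.log | replication_events
```

The timestamp on the first matching line is usually the one to compare against deploys, failovers, or network changes.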

Quick fix: get reads working immediately

Need reads back right now? Set replica-serve-stale-data to yes at runtime. No restart required:

redis-cli -h replica-host CONFIG SET replica-serve-stale-data yes

The replica immediately starts serving whatever data it has in memory, even if it's stale. Whether that's acceptable depends on what you store. For caches, a few minutes of lag is usually fine. For financial state or distributed locks, probably not.

On Redis 4.x and earlier, the config key had a different name (the replica terminology arrived in Redis 5.0):

# Redis 4.x and below
redis-cli CONFIG SET slave-serve-stale-data yes

Fix the actual connection problem

1. Check if master is reachable

# From the replica host, test basic connectivity
redis-cli -h master-host -p 6379 PING
# Expected: PONG

# Check if port is open
nc -zv master-host 6379

No PONG? That's a network or firewall problem. Sort that out before anything else.

2. Verify replication status

redis-cli -h replica-host INFO replication

Focus on these fields:

role:slave
master_host:192.168.1.10
master_port:6379
master_link_status:down          # This is the problem
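master_last_io_seconds_ago:-1
master_sync_in_progress:0
master_link_down_since_seconds:X # Only shown while the link is down

When master_link_status is down, the replica has no working connection to master_host:master_port. While the link is down, master_last_io_seconds_ago reads -1; check master_link_down_since_seconds to see how long ago the link died. That number is useful for correlating with deployment logs or monitoring alerts.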
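If you check these fields often, a small parser saves some squinting. This is a sketch only: info_field and link_summary are hypothetical helper names, and it relies on master_link_down_since_seconds, a field Redis reports in INFO replication while the link is down. Note that INFO lines end in CRLF, which is why the carriage return is stripped first.

```shell
# Extract one field from `INFO replication` output read on stdin.
# INFO lines end in CRLF, so carriage returns are removed before matching.
info_field() {
  tr -d '\r' | awk -v key="$1" \
    'index($0, key ":") == 1 { print substr($0, length(key) + 2) }'
}

# Summarise the link state from `INFO replication` output on stdin.
# Runs in a subshell so its variables do not leak into the caller.
link_summary() (
  info=$(cat)
  status=$(printf '%s\n' "$info" | info_field master_link_status)
  if [ "$status" = "up" ]; then
    echo "link up"
  else
    down_for=$(printf '%s\n' "$info" | info_field master_link_down_since_seconds)
    echo "link ${status:-unknown}, down for ${down_for:-?}s"
  fi
)

# Usage:
#   redis-cli -h replica-host INFO replication | link_summary
```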

3. Re-point the replica if the master IP changed

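Sentinel failovers move the master role to a different host. Sentinel normally reconfigures the surviving replicas itself, but a replica that was unreachable during the failover, or one that Sentinel does not monitor, can keep pointing at the old address. Update it:

# Redis 5.0+
redis-cli -h replica-host REPLICAOF new-master-host 6379

# Redis 4.x and earlier
redis-cli -h replica-host SLAVEOF new-master-host 6379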

4. Force reconnect if master is back but link didn't recover

Occasionally the master recovers but the replica sits there stuck, not reconnecting on its own. This two-step resets the link:

redis-cli -h replica-host REPLICAOF NO ONE
redis-cli -h replica-host REPLICAOF master-host 6379

The replica briefly becomes a standalone master, then re-establishes replication. If it has fallen outside the master's replication backlog, expect a full resync instead of a partial one.
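After re-pointing or resetting the link, it can take a while for the replica to finish syncing. Here is a rough sketch of a polling helper, not a definitive implementation: wait_for_link and the REDIS_CLI override are names invented for this example, and the defaults (30 attempts, 2 seconds apart) are arbitrary.

```shell
# REDIS_CLI names the command (or shell function) used to talk to Redis.
# Override it to plug in a wrapper that adds auth or TLS options.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Poll a replica until master_link_status is "up".
# Args: host, max attempts (default 30), seconds between attempts (default 2).
# Returns 0 once the link is up, 1 if it never comes up.
wait_for_link() (
  host="$1"; attempts="${2:-30}"; delay="${3:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$REDIS_CLI" -h "$host" INFO replication | tr -d '\r' |
      awk -F: '$1 == "master_link_status" { print $2 }')
    [ "$status" = "up" ] && exit 0
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)

# Usage:
#   wait_for_link replica-host && echo "replica is linked again"
```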

Permanent fix: decide the right value for replica-serve-stale-data

The right value depends entirely on what you're storing and how wrong a stale read would actually be:

  • Set to yes (default): the replica serves stale data when the master is unreachable. Good for caches where a few minutes of lag is acceptable. Your app stays up during replication gaps.
  • Set to no: the replica refuses all reads when disconnected. Use this only when stale reads are genuinely dangerous: session validation, distributed locks, financial state. Expect errors during any master downtime.
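If you keep the setting at no, the client has to decide what to do when MASTERDOWN comes back. One option, sketched below under stated assumptions, is to retry the read against the master. get_with_fallback and the REDIS_CLI override are hypothetical names, the host names are placeholders, and the pattern match assumes redis-cli prints the error either bare or with an "(error) " prefix. A stored value that itself starts with MASTERDOWN would be misread as an error.

```shell
# REDIS_CLI names the command (or shell function) used to talk to Redis.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# GET a key from the replica; if the replica answers MASTERDOWN,
# retry the same GET against the master.
# Args: key, replica host (placeholder default), master host (placeholder default).
get_with_fallback() (
  key="$1"; replica="${2:-replica-host}"; master="${3:-master-host}"
  reply=$("$REDIS_CLI" -h "$replica" GET "$key")
  case "$reply" in
    MASTERDOWN*|"(error) MASTERDOWN"*)
      "$REDIS_CLI" -h "$master" GET "$key" ;;
    *)
      printf '%s\n' "$reply" ;;
  esac
)

# Usage:
#   get_with_fallback some-key replica-host master-host
```

The fallback only helps when the master is still reachable from the client; if the master itself is down, the second GET fails too and the caller still needs an error path.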

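Persist the setting so it survives restarts. If you already changed it with CONFIG SET, write the running configuration back to the redis.conf the server was started with:

# Persist the runtime value to the config file
redis-cli -h replica-host CONFIG REWRITE

Or edit /etc/redis/redis.conf on the replica and bounce Redis. Keep comments on their own lines; redis.conf does not accept a trailing comment after a directive:

# In redis.conf on the replica (yes or no, based on your choice)
replica-serve-stale-data yes

# Then restart (the unit may be named redis-server on Debian/Ubuntu)
sudo systemctl restart redis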

Also tune repl-timeout if connections are flaky

A busy master under heavy write load can take 10 to 20 seconds to respond. If your replica keeps dropping the master link without an obvious cause, raise the timeout:

# In redis.conf (comments must sit on their own lines)
# Default is 60 seconds
repl-timeout 120
# How often the master pings its replicas, in seconds (default 10)
repl-ping-replica-period 10
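One constraint worth checking when you change these: repl-timeout has to stay larger than repl-ping-replica-period, otherwise a quiet link looks like a timeout. A minimal sanity check might look like this; check_repl_timeout is a made-up helper, and you would feed it the values reported by CONFIG GET.

```shell
# Succeeds when repl-timeout is larger than the ping period.
# $1 = repl-timeout, $2 = repl-ping-replica-period (both in seconds)
check_repl_timeout() {
  if [ "$1" -gt "$2" ]; then
    echo "ok: repl-timeout $1s > ping period $2s"
  else
    echo "warning: repl-timeout $1s must exceed ping period $2s" >&2
    return 1
  fi
}

# Usage:
#   check_repl_timeout 120 10
```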

Verification

Before marking this resolved, confirm the replica is actually synced and reads are flowing:

# Check replication link
redis-cli -h replica-host INFO replication | grep master_link_status
# Expected: master_link_status:up

# Confirm reads work
redis-cli -h replica-host GET some-key

# Confirm the config value stuck
redis-cli -h replica-host CONFIG GET replica-serve-stale-data

On Redis 4.x and earlier, use CONFIG GET slave-serve-stale-data instead.

Write a test key on the master and read it back from the replica to confirm replication is actually flowing end-to-end:

redis-cli -h master-host SET test-replication ok
redis-cli -h replica-host GET test-replication
# Expected: "ok"
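Replication is asynchronous, so a GET issued immediately after the SET can occasionally miss the write. The sketch below wraps the same check in a short retry loop with a unique value, so an old test key cannot produce a false pass. check_replication and REDIS_CLI are illustrative names, and the 60-second expiry on the test key is an arbitrary choice.

```shell
# REDIS_CLI names the command (or shell function) used to talk to Redis.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Write a unique marker on the master and poll the replica until it appears.
# Args: master host, replica host, max tries (default 10), delay in seconds (default 1).
# Returns 0 when the replica returns the marker, 1 otherwise.
check_replication() (
  master="$1"; replica="$2"; tries="${3:-10}"; delay="${4:-1}"
  key="test-replication"
  value="ok-$(date +%s)-$$"
  # EX 60 lets the test key expire on its own after a minute.
  "$REDIS_CLI" -h "$master" SET "$key" "$value" EX 60 >/dev/null || exit 1
  i=0
  while [ "$i" -lt "$tries" ]; do
    [ "$("$REDIS_CLI" -h "$replica" GET "$key")" = "$value" ] && exit 0
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)

# Usage:
#   check_replication master-host replica-host && echo "replication OK"
```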

If you're using ElastiCache or a managed Redis

You can't touch redis.conf on ElastiCache. Open the AWS console, find the parameter group attached to your cluster, set replica-serve-stale-data, and save. On Redis 6.x and above, the change applies without a reboot. On older versions, a cluster node reboot is required, so expect a brief read interruption during the restart.
