Fix Liveness Probe Failed: context deadline exceeded Causing Constant Pod Restarts in Kubernetes

intermediate | Kubernetes | 2026-05-15 | Kubernetes 1.20+, any cloud provider (EKS, GKE, AKS) or on-prem cluster, Linux nodes

Error Message

Liveness probe failed: Get "http://10.0.0.1:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
#liveness-probe #health-check #pod #restart #kubernetes

The Situation

It's 2 AM. PagerDuty fires. You check the cluster and see pods restarting every 60–90 seconds. kubectl get pods shows a restart count climbing fast. You pull the events and there it is:

Warning  Unhealthy  pod/api-server-7d9f4b8c6-xk2pq  Liveness probe failed: Get "http://10.0.0.1:8080/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Kubernetes thinks your app is dead. It kills the container, restarts it, waits for startup, and then the probe fails again before the app is ready. You're in a loop, and it won't break itself.

Why This Happens

The kubelet runs the liveness probe by sending an HTTP GET to your container. If no response with a status code from 200 to 399 arrives within timeoutSeconds (default: 1 second), the probe fails. After failureThreshold consecutive failures (default: 3), Kubernetes restarts the container.
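
To make the arithmetic concrete, here is a rough sketch of how long an unresponsive container survives before the restart decision. The function name is illustrative, and the model is a simplification: it assumes probes start exactly once per period, that each failed attempt costs the full timeout, and it ignores kubelet jitter.

```python
def seconds_until_restart(period_seconds=10, timeout_seconds=1, failure_threshold=3):
    """Approximate time from the first failing probe to the restart decision.

    The first failure is recorded after one timeout; each further failure
    is recorded one period later, until failure_threshold is reached.
    """
    if failure_threshold < 1:
        raise ValueError("failure_threshold must be at least 1")
    return (failure_threshold - 1) * period_seconds + timeout_seconds


if __name__ == "__main__":
    # Kubernetes defaults: 10s period, 1s timeout, 3 failures.
    print(seconds_until_restart())          # 21
    # A more forgiving configuration: 15s period, 5s timeout, 3 failures.
    print(seconds_until_restart(15, 5, 3))  # 35
```

With the defaults, a hung health endpoint gets the container killed in roughly 21 seconds, which is why a brief stall under load can be enough to start a restart loop.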

Four things cause this. Most are fixable in under 10 minutes.

  • Probe timeout too tight: Your app sometimes takes longer than 1 second to respond to /healthz, especially under load or during JVM/GC pauses.
  • App is slow to start: The probe fires before the app is ready to accept connections. initialDelaySeconds wasn't set, or was set to something optimistic like 5 seconds.
  • Resource starvation: The container is CPU-throttled. The health endpoint can't respond in time because the process is waiting for CPU cycles.
  • Health endpoint doing too much: Your /healthz route checks DB connections, runs queries, or calls external services. Fine on a quiet server; fatal under any real load.

Diagnose First

Don't touch the probe config yet. Confirm what's actually failing.

Check events and restart count

# See restart count
kubectl get pods -n your-namespace

# See probe failure events
kubectl describe pod api-server-7d9f4b8c6-xk2pq -n your-namespace | grep -A 20 Events

Check current probe config

kubectl get deployment api-server -n your-namespace -o yaml | grep -A 20 livenessProbe

Test the health endpoint from inside the pod

# Exec into the pod before it restarts
kubectl exec -it api-server-7d9f4b8c6-xk2pq -n your-namespace -- sh

# Hit the endpoint and time it
time wget -qO- http://localhost:8080/healthz

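If the response takes longer than your timeoutSeconds (1 second by default), that is your problem. If the request fails outright (connection refused or an error status), the app is not listening on that port or the health handler itself is broken.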
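
A single wget run can miss intermittent slowness. The sketch below repeats the request a few times with the same timeout the probe uses. It assumes Python 3 is available wherever you run it (inside the pod if the image ships it, or from a workstation through a port-forward, which adds some network overhead). The function name and the opener parameter are illustrative, not part of any Kubernetes tooling.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe_once(url, timeout_seconds=1.0, opener=urllib.request.urlopen):
    """Issue one GET and return (passed, elapsed_seconds, detail)."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout_seconds) as response:
            status = response.status
        # The kubelet treats 200-399 as success.
        passed, detail = 200 <= status < 400, f"HTTP {status}"
    except urllib.error.HTTPError as exc:
        # urlopen raises on 4xx/5xx; those count as probe failures.
        passed, detail = False, f"HTTP {exc.code}"
    except (OSError, http.client.HTTPException) as exc:
        # Timeouts, refused connections, and malformed responses.
        passed, detail = False, f"{type(exc).__name__}: {exc}"
    return passed, time.monotonic() - started, detail


if __name__ == "__main__":
    for attempt in range(5):
        passed, elapsed, detail = probe_once("http://localhost:8080/healthz")
        print(f"attempt={attempt} passed={passed} elapsed={elapsed:.3f}s {detail}")
        time.sleep(1)
```

If some attempts pass and others time out, you are most likely looking at load or throttling rather than a broken handler.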

Check resource pressure

kubectl top pod api-server-7d9f4b8c6-xk2pq -n your-namespace
kubectl top node

CPU near the limit? Throttling is likely slowing every request, including health checks.

Fix 1: Tune the Probe Timing (Most Common Fix)

Kubernetes ships with probe defaults designed for fast, simple apps. A 1-second timeout and 10-second interval are too aggressive for most production services. Adjust your deployment:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30    # Wait 30s after container starts before first probe
  periodSeconds: 15          # Check every 15s (not every 10s)
  timeoutSeconds: 5          # Allow 5s for response (not default 1s)
  failureThreshold: 3        # Still restart after 3 consecutive failures
  successThreshold: 1

Then apply:

kubectl apply -f deployment.yaml

JVM apps often need initialDelaySeconds: 60 or higher; Spring Boot with a full context load can take 45 seconds on a cold start. For apps under heavy load, timeoutSeconds: 5 is a safe starting point; go to 10 if you're still seeing failures.

Fix 2: Add a Startup Probe (enabled by default since Kubernetes 1.18, GA in 1.20)

Slow startup is its own problem. Fighting it with a large initialDelaySeconds is a guess: you're hardcoding a number that will be wrong in CI and wrong on a degraded node. Use a startup probe instead. It holds off the liveness probe until the startup probe has succeeded:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30       # Try for up to 5 minutes (30 * 10s)
  periodSeconds: 10
  timeoutSeconds: 5

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 15
  timeoutSeconds: 5
  failureThreshold: 3

The startup probe runs first. The moment it succeeds once, it stops and liveness takes over. No guessing required.
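
The startup budget is failureThreshold multiplied by periodSeconds. As a small sketch of that arithmetic (the helper names are made up for illustration), you can work backwards from the slowest startup you are willing to tolerate:

```python
import math


def startup_budget_seconds(failure_threshold, period_seconds=10):
    """Total time the startup probe allows before the container is restarted."""
    return failure_threshold * period_seconds


def startup_failure_threshold(max_startup_seconds, period_seconds=10):
    """Smallest failureThreshold whose budget covers max_startup_seconds."""
    if max_startup_seconds <= 0 or period_seconds <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(max_startup_seconds / period_seconds)


if __name__ == "__main__":
    print(startup_budget_seconds(30, 10))     # 300 seconds, i.e. 5 minutes
    print(startup_failure_threshold(45, 10))  # 5 attempts cover a 45s cold start
```

Leaving headroom above the computed minimum is cheap, because the startup probe stops as soon as it succeeds once.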

Fix 3: Fix the Health Endpoint Itself

A slow /healthz is often the sneakiest culprit. Strip out any logic that doesn't belong there. The liveness probe has one job: confirm the process is alive. Checking database connectivity, running cache pings, or validating external APIs belongs in /readyz behind a readiness probe β€” not here.

A correct liveness endpoint looks like this:

// Go
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("ok"))
})

// Node.js
app.get('/healthz', (req, res) => res.status(200).send('ok'));

# Python/FastAPI
@app.get("/healthz")
def healthz():
    return {"status": "ok"}

It should return 200 in under 5 milliseconds. If yours doesn't, strip it down until it does.
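
To show the liveness/readiness split in one place, here is a minimal sketch using only the Python standard library. The dependencies_ready function is a hypothetical placeholder for whatever database or cache checks your service needs, and the /readyz path follows the convention mentioned above.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def dependencies_ready():
    """Hypothetical placeholder: put database or cache checks here."""
    return True


def route(path):
    """Map a request path to (status_code, body)."""
    if path == "/healthz":
        # Liveness: no I/O, no dependencies, just proof the process can answer.
        return 200, b"ok"
    if path == "/readyz":
        # Readiness: dependency checks belong here, not in /healthz.
        return (200, b"ready") if dependencies_ready() else (503, b"not ready")
    return 404, b"not found"


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = route(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep probe traffic out of the request log.


if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8080), ProbeHandler).serve_forever()
```

When a dependency goes down, /readyz starts returning 503 and the pod is taken out of rotation, while /healthz keeps answering and the container is not restarted.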

Fix 4: Increase Resource Limits

CPU throttling is easy to miss because the app looks healthy in every other way. When a container hits its CPU limit, the kernel throttles it, and suddenly a 2ms health check takes 2 seconds. Check whether this is happening:

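# cgroup v1 path; on cgroup v2 nodes the file is /sys/fs/cgroup/cpu.stat
kubectl exec api-server-7d9f4b8c6-xk2pq -n your-namespace -- cat /sys/fs/cgroup/cpu/cpu.stat | grep throttled

If throttled_time (throttled_usec on cgroup v2) is non-zero and growing, your app is CPU-starved. Raise the limit: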

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"    # Raise this if throttled
    memory: "512Mi"

Start by doubling the CPU limit, redeploy, and recheck throttled_time.
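
A raw throttled counter is hard to judge on its own. One way to read it, sketched below, is to compare nr_throttled against nr_periods from the same cpu.stat file, which gives the share of scheduler periods in which the container hit its limit. The function names are illustrative and the sample numbers are invented for the example.

```python
def parse_cpu_stat(text):
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def throttled_fraction(stats):
    """Fraction of enforcement periods in which the container was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        return 0.0
    return stats.get("nr_throttled", 0) / periods


if __name__ == "__main__":
    # Invented sample in cgroup v1 format; paste real cpu.stat output instead.
    sample = "nr_periods 1000\nnr_throttled 250\nthrottled_time 5000000000\n"
    stats = parse_cpu_stat(sample)
    print(f"throttled in {throttled_fraction(stats):.0%} of periods")
```

Take two readings a minute apart and compare them; counters that keep climbing while the fraction stays high suggest the limit is too low for the workload.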

Fix 5: Switch to a TCP or Exec Probe

Sometimes the simplest probe is the right one. TCP just checks if the port is open: no HTTP overhead, no handler code to worry about. Exec runs a command inside the container:

# TCP probe: confirms port is listening
livenessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 20
  timeoutSeconds: 5

# Exec probe: runs a command inside the container
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 10

The exec pattern works well for apps that write /tmp/healthy on startup and delete it when they want to signal a problem. Coarse, but reliable.
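
On the application side, the marker-file pattern might look like the sketch below. The helper names are hypothetical; the only thing the probe cares about is whether cat can read /tmp/healthy, since a missing file makes cat exit non-zero.

```python
import os

HEALTH_FILE = "/tmp/healthy"  # Must match the path in the exec probe.


def mark_healthy(path=HEALTH_FILE):
    """Create the marker file; call this once startup has finished."""
    with open(path, "w") as marker:
        marker.write("ok\n")


def mark_unhealthy(path=HEALTH_FILE):
    """Remove the marker so the next exec probe fails."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass  # Already removed; nothing to do.


def is_marked_healthy(path=HEALTH_FILE):
    """Mirror of what the probe checks: does the marker exist?"""
    return os.path.exists(path)
```

Call mark_healthy after initialization completes and mark_unhealthy from whatever code detects an unrecoverable state, such as a deadlock watchdog.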

Verify the Fix

# Watch pods stabilize
kubectl get pods -n your-namespace -w

# Confirm restart count stops climbing
kubectl get pods -n your-namespace
# RESTARTS column should stay flat

# Check events are clean
kubectl describe pod <pod-name> -n your-namespace | tail -20
# Should show no Unhealthy warnings

Wait at least 3–5 probe cycles before calling it stable. With periodSeconds: 15, that's 45 to 75 seconds of clean output before you close the laptop.
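
If you would rather not eyeball the RESTARTS column, a small script can compare two snapshots. This sketch assumes you capture the output of kubectl get pods -n your-namespace -o json twice, a few probe cycles apart, and pass both strings in. The function names are illustrative.

```python
import json


def restart_counts(pods_json):
    """Return {pod_name: total restartCount} from `kubectl get pods -o json` output."""
    counts = {}
    for pod in json.loads(pods_json).get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        counts[pod["metadata"]["name"]] = sum(
            status.get("restartCount", 0) for status in statuses
        )
    return counts


def still_restarting(before, after):
    """Names of pods whose restart count grew between two snapshots."""
    return sorted(
        name for name, count in after.items() if count > before.get(name, 0)
    )
```

An empty list from still_restarting after the waiting period is the scripted equivalent of a flat RESTARTS column.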

Prevention

  • Never ship default probe values to production. The defaults (1s timeout, 10s period) are designed for demos, not real workloads. Tune them per service.
  • Keep liveness and readiness separate. Liveness = is the process alive. Readiness = is it safe to receive traffic. Mixing them causes cascading restarts when a dependency goes down.
  • Load test your health endpoint before deploying. Run ab -n 1000 -c 50 http://localhost:8080/healthz and check p99 latency. If it's over 200ms, fix it before Kubernetes finds out.
  • Set terminationGracePeriodSeconds high enough for in-flight requests to complete. 30 seconds is a reasonable starting point for most APIs.
  • Review probe config in code review like any other production setting. Skipping it is how 2 AM pages happen.
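
If your load-testing tool gives you raw latencies rather than percentiles, the p99 check from the list above can be sketched in a few lines. This uses the nearest-rank method, which is one common definition among several, and the 200 ms budget is the threshold suggested above. The function names are illustrative.

```python
import math


def percentile(samples, percent):
    """Nearest-rank percentile: the smallest sample covering `percent` of the data."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[rank - 1]


def within_budget(latencies_ms, budget_ms=200, percent=99):
    """True if the chosen percentile of the latencies stays within the budget."""
    return percentile(latencies_ms, percent) <= budget_ms
```

Checking p99 rather than the average matters here because a single slow response per probe window is enough to count as a failure.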
