Fix Readiness Probe Failed: HTTP Probe Failed with statuscode: 500 in Kubernetes

The Error

Readiness probe failed: HTTP probe failed with statuscode: 500

I hit this after deploying a new service version: the pods started fine but sat at 0/1 Running and never got traffic. The Service kept routing to the old pods. The readiness probe was hitting the health endpoint, getting a 500 back, and Kubernetes quietly kept the new pods out of rotation.

Why This Happens

Kubernetes uses readiness probes to gatekeep traffic. If the probe endpoint returns a status outside the 200–399 range failureThreshold times in a row, the pod is marked not ready (or never becomes ready) and is left out of the Service's endpoint list. The container is not restarted; it just stops receiving requests through the Service.
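The success rule is simple enough to state as code. This is a minimal sketch of how an HTTP probe response is classified, not the kubelet's actual implementation; the function name probe_result is made up for illustration:

```shell
# Hypothetical helper: mirrors the HTTP probe rule.
# A status code >= 200 and < 400 counts as success; anything else is a failure.
probe_result() {
  code=$1
  if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
    echo "Success"
  else
    echo "Failure"
  fi
}

probe_result 200   # Success
probe_result 204   # Success
probe_result 404   # Failure
probe_result 500   # Failure
```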

Common reasons the health endpoint returns 500:

The app is mid-initialization (DB handshake, cache warmup, config parsing still in progress)
A dependency — Postgres, Redis, an upstream API — is down or unreachable from the pod
The health endpoint itself throws an unhandled exception
An env var or secret is missing, empty, or pointing at the wrong host
The probe is checking the wrong path or port entirely

Step-by-Step Fix

Step 1: Check Pod Status and Events

kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

Scroll to the Events section at the bottom of the describe output. You'll see repeated lines like:

Warning  Unhealthy  5s    kubelet  Readiness probe failed: HTTP probe failed with statuscode: 500

While you're there, check the probe config that's printed — confirm the path, port, and timing values actually match what your app exposes.

Step 2: Check Application Logs

kubectl logs <pod-name> -n <namespace>
# Multiple containers in the pod?
kubectl logs <pod-name> -c <container-name> -n <namespace>
# Live tail:
kubectl logs -f <pod-name> -n <namespace>

The 500 almost always has a cause logged by the app itself. Look for stack traces, connection refused errors, or "missing required env var" messages in the first few seconds after startup.
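If the log is noisy, a small filter helps surface the likely culprits. The sketch below is one way to do it; the function name find_startup_errors, the pattern list, and the sample log lines are all assumptions, so adjust the patterns to whatever your framework actually logs:

```shell
# Hypothetical helper: reads log lines on stdin and keeps the ones that
# look like startup failures. The pattern list is an assumption, not a standard.
find_startup_errors() {
  grep -iE 'exception|traceback|connection refused|timed out|missing required|unable to'
}

# Demo with made-up log lines (real usage pipes kubectl logs instead):
printf '%s\n' \
  'INFO  server listening on :8080' \
  'ERROR connection refused: db:5432' \
  'FATAL missing required env var DATABASE_URL' | find_startup_errors
```

Against a real pod, pipe the logs through it: kubectl logs <pod-name> -n <namespace> | find_startup_errors. If the container has already restarted, adding --previous to kubectl logs shows the output of the prior instance.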

Step 3: Hit the Health Endpoint Manually

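Exec into the pod and curl the probe path yourself, substituting the port and path from your probe config. Skip the guessing. If the image ships without a shell or curl, use kubectl port-forward to the pod and curl from your own machine instead: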

kubectl exec -it <pod-name> -n <namespace> -- sh
# Inside the pod:
curl -v http://localhost:8080/health

Read the response body. A 500 from a Spring Boot app might say "Unable to acquire JDBC Connection". A Node.js service might spit out a stack trace. The body is the actual diagnosis — don't ignore it.
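To capture the status code and the body in one non-interactive call, curl's -w option can append the status after the body. The helper below is a sketch that splits the two apart; report_response is a hypothetical name, and the sample response is made up so the parsing can be seen without a cluster:

```shell
# Hypothetical helper: expects the response body followed by the HTTP status
# code on the final line, which is what this curl invocation emits:
#   curl -s -w '\n%{http_code}' http://localhost:8080/health | report_response
report_response() {
  response=$(cat)
  status=$(printf '%s\n' "$response" | tail -n 1)   # last line is the status
  body=$(printf '%s\n' "$response" | sed '$d')      # everything before it
  echo "status: $status"
  echo "body: $body"
}

# Demo with a made-up response:
printf '%s\n%s' '{"error":"Unable to acquire JDBC Connection"}' '500' | report_response
```

From outside the pod, the same pipeline works through kubectl exec <pod-name> -n <namespace> -- curl -s -w '\n%{http_code}' http://localhost:8080/health | report_response, assuming curl exists in the image.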

Step 4: Verify the Probe Configuration

Pull the full deployment YAML:

kubectl get deployment <deployment-name> -n <namespace> -o yaml

Find the readinessProbe block:

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 3

Three mistakes I see over and over:

Wrong path: the app serves /healthz but the probe checks /health. Normally that shows up as statuscode: 404, but a catch-all error handler can turn it into a 500
Wrong port: the app listens on 3000, the probe is hitting 8080
initialDelaySeconds too low: JVM apps routinely need 45–60 seconds to be ready, but the probe fires after 10

Step 5: Check Environment Variables and Secrets

# No -it here: the output is piped, so no TTY is needed
kubectl exec <pod-name> -n <namespace> -- env | grep -i db
kubectl exec <pod-name> -n <namespace> -- env | grep -i redis

If DATABASE_URL is empty or points at localhost instead of your actual DB service, every health check that tries to open a connection will fail. Verify the Secret exists and is wired into the pod (values in the Secret's YAML output are base64-encoded):

kubectl get secret <secret-name> -n <namespace> -o yaml
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Environment"
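Grepping works when you know what to look for. When there is a fixed list of variables the app needs, a small check is less error-prone. This is a sketch only: check_required_env is a hypothetical helper, and REDIS_URL is an assumed variable name used for the demo:

```shell
# Hypothetical helper: prints each named variable that is unset or empty
# and returns non-zero if any were missing.
check_required_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name-}"   # indirect lookup of the variable called $name
    if [ -z "$value" ]; then
      echo "MISSING: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Demo: DATABASE_URL is set, REDIS_URL is not.
DATABASE_URL='postgres://db.example.svc:5432/app'
unset REDIS_URL
check_required_env DATABASE_URL REDIS_URL || echo "fix the pod spec before retrying"
```

To use it for real, paste the function into a shell opened with kubectl exec and call it with the variable names your app actually reads.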

Step 6: Temporarily Increase initialDelaySeconds

Slow-starting app? Bump the delay before rolling a fix:

kubectl patch deployment <deployment-name> -n <namespace> --type='json' \
  -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds", "value": 60}]'

Or go interactive:

kubectl edit deployment <deployment-name> -n <namespace>

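This buys time to diagnose the real issue without a cascade of probe failures tanking the rollout. Two caveats on the kubectl patch command above: it targets the first container in the pod spec (index 0), and a JSON Patch replace fails if initialDelaySeconds isn't already set on the probe, in which case use "op": "add" instead.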

Step 7: Fix Downstream Dependency Issues

If the 500 is coming from the app trying to ping Postgres or Redis and failing, test connectivity from inside the pod first:

# Postgres:
kubectl exec -it <pod-name> -n <namespace> -- nc -zv <db-host> 5432
# Redis:
kubectl exec -it <pod-name> -n <namespace> -- nc -zv <redis-host> 6379

Can't reach it? The probe config isn't your problem. Check NetworkPolicies, Service DNS resolution, and whether the DB pod itself is in a ready state. Fix the connectivity; the probe result will sort itself out.

Verify the Fix

Watch the pod come ready in real time:

kubectl get pods -n <namespace> -w

READY should flip from 0/1 to 1/1 within a couple of probe cycles. Then confirm the pod actually joined the endpoint list:

kubectl get endpoints <service-name> -n <namespace>

Its IP should show up there. Do a quick end-to-end test:

kubectl port-forward svc/<service-name> 8080:80 -n <namespace>
# In a second terminal (port-forward stays in the foreground):
curl http://localhost:8080/

Tips for Next Time

Split liveness from readiness: liveness should only check if the process is alive — not hanging, not deadlocked. Readiness checks if it can serve traffic. Bundling DB connectivity into liveness causes unnecessary pod restarts when your DB hiccups at 2am.
Keep health endpoints cheap: don't run a SELECT 1 query on every probe hit. At 10-second intervals across 20 replicas, that's 120 DB queries per minute just from health checks. A connection pool check is enough.
Use startupProbe for slow starters: startupProbe has been enabled by default since Kubernetes 1.18 and stable since 1.20. Use it instead of inflating initialDelaySeconds: liveness and readiness probes are held off until the startup probe succeeds, giving your app room to breathe without risking missed failures later.
Don't set failureThreshold to 1: one transient blip shouldn't yank a pod from rotation. Three failures is a reasonable floor for most services.
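The probe-load arithmetic from the second tip is worth redoing for your own replica count and period. A minimal sketch, with probes_per_minute as a hypothetical helper name:

```shell
# Hypothetical helper: probe requests per minute across all replicas.
# Usage: probes_per_minute <replicas> <periodSeconds>
probes_per_minute() {
  echo $(( $1 * 60 / $2 ))
}

probes_per_minute 20 10   # prints 120
```

If every probe hit runs a query, that number is also your baseline DB query rate from health checks alone.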

# A solid probe setup for a typical web service
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
  successThreshold: 1
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 15
  failureThreshold: 5
startupProbe:
  httpGet:
    path: /health/live
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
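The startupProbe values above set how long the app gets to start: roughly failureThreshold times periodSeconds, after which the kubelet restarts the container. A quick sketch of that calculation, with startup_budget_seconds as a hypothetical helper name:

```shell
# Hypothetical helper: worst-case seconds a startupProbe allows before the
# container is restarted. Ignores initialDelaySeconds (none is set above).
# Usage: startup_budget_seconds <failureThreshold> <periodSeconds>
startup_budget_seconds() {
  echo $(( $1 * $2 ))
}

startup_budget_seconds 30 10   # prints 300
```

With failureThreshold: 30 and periodSeconds: 10, the app has about five minutes to pass its first startup check. Size that window to your slowest realistic cold start rather than the typical one.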