Fixing 'Cannot evict pod as it would violate the pod's disruption budget' in Kubernetes

The Problem: The Maintenance Deadlock

Ever tried to drain a node only to watch your terminal hang indefinitely? You're likely performing standard node upkeep—perhaps patching an OS or resizing an instance—but the process won't budge. You run kubectl drain <node-name>, and instead of the node emptying, you see this error looping every few seconds:

error when evicting pods/"api-gateway-v2-7c4d98bf9-xyz1" (assigned to "worker-node-01"): 
Cannot evict pod as it would violate the pod's disruption budget.

evicting pod default/api-gateway-v2-7c4d98bf9-xyz1
error when evicting pods/"api-gateway-v2-7c4d98bf9-xyz1": Cannot evict pod...

Kubernetes behaves this way because a PodDisruptionBudget (PDB) is doing its job: limiting how many of your app's pods can be taken down at once by voluntary disruptions such as a drain. However, an overly strict configuration can turn that safety net into a barricade that stops the cluster from managing its own nodes.

Step 1: Identify the Blocking PDB

Start by finding the specific PDB putting the brakes on your drain. Run this command in the affected namespace:

kubectl get pdb -n <your-namespace>

Focus on the ALLOWED DISRUPTIONS column. If that value is 0, the Eviction API rejects eviction requests for the healthy pods governed by that budget, and kubectl drain keeps retrying until the budget changes.

NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
api-gateway-pdb  1               N/A               0                     12d
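If you aren't sure which namespace holds the culprit, a cluster-wide scan can help. The following is a minimal sketch rather than an official tool: blocked_pdbs is a hypothetical helper name, and it assumes the three-column layout produced by the custom-columns query shown in the usage comment.

```shell
# blocked_pdbs: read "NAMESPACE NAME ALLOWED" rows on stdin and print
# namespace/name for every PDB whose allowed disruptions are 0.
# (Hypothetical helper; the column layout is an assumption set by the
# custom-columns query in the usage example below.)
blocked_pdbs() {
  # NF >= 3 skips blank or short rows, which awk would otherwise treat as 0.
  awk 'NF >= 3 && $3 == 0 { print $1 "/" $2 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get pdb --all-namespaces --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,ALLOWED:.status.disruptionsAllowed' \
#     | blocked_pdbs
```

Keeping the filter separate from the kubectl call means you can try it on saved output before pointing it at a real cluster.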

Step 2: Why are Disruptions Blocked?

A value of zero allowed disruptions usually boils down to one of three common scenarios:

1. The Single-Replica Trap

If you have replicas: 1 in your Deployment and a PDB set to minAvailable: 1, you've created a deadlock. To move the pod, Kubernetes must delete the current one. But deleting it would drop the count to 0, violating your "minimum 1" rule. The system won't kill the pod to make room for a new one because that would cause a temporary outage.

2. The Healthy Capacity Gap

Imagine a Deployment with 3 replicas and a PDB requiring minAvailable: 2. If one pod is already failing, perhaps due to a CrashLoopBackOff or an ImagePullBackOff, you only have 2 healthy pods. Since you are already at your required minimum, Kubernetes refuses to evict either of the remaining healthy pods to prevent further degradation.

3. The '100%' Configuration Error

Setting minAvailable: 100% tells Kubernetes that you never want a single pod to go offline for maintenance. This is rarely what you actually want for a distributed system, as it makes node upgrades impossible without manual intervention.
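All three scenarios come down to the same subtraction. Here is a simplified model of it, assuming an integer minAvailable; allowed_disruptions is a hypothetical helper for illustration, and the authoritative number is always the one reported in the PDB's status.

```shell
# allowed_disruptions HEALTHY MIN_AVAILABLE
# Prints how many pods may be evicted right now: healthy pods minus the
# required minimum, never below zero. Simplified model for illustration.
allowed_disruptions() {
  healthy=$1
  min_available=$2
  allowed=$((healthy - min_available))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

allowed_disruptions 1 1   # single replica, minAvailable: 1   -> 0
allowed_disruptions 2 2   # healthy count equals the minimum  -> 0
allowed_disruptions 3 3   # minAvailable: 100% of 3 replicas  -> 0
allowed_disruptions 2 1   # one spare healthy pod             -> 1
```

Only the last case leaves any headroom, which is why the fixes in the next step either add healthy pods or lower the required minimum.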

Step 3: Resolve the Blockage

Choose the path that fits your current risk tolerance.

Solution A: Scale Up Temporarily (Cleanest Fix)

The safest way to unblock the drain is to give the budget some breathing room. If your PDB requires 1 available pod and you only have 1, scale the deployment to 2. Once the second pod is Ready on a different node, ALLOWED DISRUPTIONS will flip to 1, and the drain will proceed automatically.

kubectl scale deployment api-gateway --replicas=2 -n <namespace>
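Scaling up only helps once the new pod is actually Ready, and it's easy to forget to scale back down afterwards. As a rough sketch, the sequence might look like the following. scale_for_drain is a hypothetical helper, and the deployment name, namespace, and 120s timeout are placeholder assumptions. Setting KUBECTL=echo prints the commands instead of running them.

```shell
# scale_for_drain DEPLOYMENT NAMESPACE REPLICAS
# Scales the Deployment, then blocks until the rollout reports ready.
# Hypothetical helper; the 120s timeout is an arbitrary assumption.
scale_for_drain() {
  kubectl_cmd=${KUBECTL:-kubectl}
  "$kubectl_cmd" scale deployment "$1" --replicas="$3" -n "$2" &&
    "$kubectl_cmd" rollout status "deployment/$1" -n "$2" --timeout=120s
}

# Example:                 scale_for_drain api-gateway default 2
# After the drain is done: scale_for_drain api-gateway default 1
```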

Solution B: Swap to maxUnavailable

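For smaller workloads, maxUnavailable is often more intuitive. A budget of maxUnavailable: 1 permits one pod to be evicted at a time whatever the replica count, whereas a fixed minAvailable can quietly drop to zero allowed disruptions when the Deployment is scaled down. A PDB accepts only one of the two fields, so remove minAvailable when you add maxUnavailable. Keep in mind that with a single replica, this setting lets the drain evict your only pod.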

# Open the PDB for editing
kubectl edit pdb api-gateway-pdb -n <namespace>

# Replace minAvailable with:
maxUnavailable: 1
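If you'd rather not open an editor mid-maintenance, the same change can probably be applied non-interactively. This is a sketch under assumptions: switch_to_max_unavailable is a hypothetical helper, and it relies on a JSON merge patch, where setting a field to null removes it. Try it against a non-production PDB first.

```shell
# switch_to_max_unavailable PDB NAMESPACE
# Replaces minAvailable with maxUnavailable: 1 via a JSON merge patch.
# In a merge patch, setting a field to null removes that field.
switch_to_max_unavailable() {
  kubectl_cmd=${KUBECTL:-kubectl}
  "$kubectl_cmd" patch pdb "$1" -n "$2" --type merge \
    -p '{"spec":{"minAvailable":null,"maxUnavailable":1}}'
}

# Example: switch_to_max_unavailable api-gateway-pdb default
# Dry run: KUBECTL=echo switch_to_max_unavailable api-gateway-pdb default
```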

Solution C: The 'Nuclear' Option (Quickest Fix)

If you're in the middle of a maintenance window and can tolerate a 30-second blip, just delete the PDB. With the budget gone, kubectl drain will evict the pod on its next retry. Make sure you have the PDB's manifest on hand before deleting it, so you can recreate the budget once the node is back in service.

kubectl delete pdb api-gateway-pdb -n <namespace>

Step 4: Verification

Check the PDB status one last time. You want to see ALLOWED DISRUPTIONS at 1 or higher.

kubectl get pdb -n <namespace>

NAME             MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
api-gateway-pdb  N/A             1                 1                     12d

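Your kubectl drain process should now resume and evict the remaining pods, which their controllers recreate on other nodes. The node itself was already cordoned (SchedulingDisabled) when the drain started, so once the drain completes it is safe to shut the node down.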
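If you'd rather not re-run the check by hand, a small polling loop can watch the budget for you. Treat this as a sketch: wait_for_budget is a hypothetical helper, the 30 tries and 2-second interval are arbitrary assumptions, and it reads the PDB's status.disruptionsAllowed field through kubectl's jsonpath output.

```shell
# wait_for_budget PDB NAMESPACE [TRIES]
# Polls the PDB status until at least one disruption is allowed.
# Returns 0 on success, 1 if TRIES polls (default 30, 2s apart) run out.
# Hypothetical helper; the retry count and interval are assumptions.
wait_for_budget() {
  kubectl_cmd=${KUBECTL:-kubectl}
  tries=${3:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    allowed=$("$kubectl_cmd" get pdb "$1" -n "$2" \
      -o jsonpath='{.status.disruptionsAllowed}')
    if [ "${allowed:-0}" -ge 1 ] 2>/dev/null; then
      echo "$1: $allowed disruption(s) allowed"
      return 0
    fi
    i=$((i + 1))
    sleep 2
  done
  echo "$1: still blocked after $tries checks" >&2
  return 1
}

# Example: wait_for_budget api-gateway-pdb default
```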

Lessons for Next Time

Avoid 100% minAvailable: It's a recipe for stuck clusters. Stick to percentages like 50% or 80%, and remember that Kubernetes rounds the required pod count up.
Single Pods Don't Need PDBs: If an app can't be scaled to 2+ replicas, a PDB with minAvailable: 1 just creates a headache during upgrades.
Use maxUnavailable: 1: For small workloads, this is generally more resilient than calculating minAvailable values, because it keeps working as the replica count changes.
Monitor Allowed Disruptions: Use Prometheus alerts (for example, on the kube-state-metrics metric kube_poddisruptionbudget_status_pod_disruptions_allowed) to find PDBs with 0 allowed disruptions before you start your maintenance window.
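The percentage advice above has one catch: a percentage minAvailable is rounded up to a whole pod, so small replica counts can still end up with no headroom. A quick way to sanity-check a value before committing it, using a hypothetical required_pods helper that does integer ceiling division:

```shell
# required_pods PERCENT REPLICAS
# Prints the pod count a percentage minAvailable demands, rounded up
# to the next whole pod (integer ceiling division).
required_pods() {
  echo $(( ($1 * $2 + 99) / 100 ))
}

required_pods 50 7    # -> 4
required_pods 80 2    # -> 2, so a 2-replica app still has 0 allowed disruptions
required_pods 80 5    # -> 4, leaving 1 allowed disruption
```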