The Error
You check your cluster and find pods stuck in Evicted state. Running kubectl describe pod gives you this:
Status: Failed
Reason: Evicted
Message: The node was low on resource: memory. Threshold quantity: 100Mi, available: 95Mi.
The kubelet killed your pod to protect node stability. This isn't the same as OOMKilled, which happens when a container blows past its own memory limit. Eviction is the kubelet acting before things get that bad. When free node memory drops below a configured threshold (100Mi by default), the kubelet starts clearing pods preemptively.
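When many pods were evicted at once, it can help to pull the numbers out of the message instead of reading each one. A minimal sketch, assuming the message keeps the wording shown above; `parse_eviction_message` is a hypothetical helper name, not a kubectl feature:

```shell
# Hypothetical helper: reads eviction messages on stdin and prints
# "<resource> <threshold> <available>" for every line that matches the
# wording shown above. Lines in any other format are silently skipped.
parse_eviction_message() {
  sed -n 's/.*low on resource: \([a-z-]*\)\. Threshold quantity: \([^,]*\), available: \([^.]*\)\..*/\1 \2 \3/p'
}

# Intended use against a live cluster (not run here):
#   kubectl describe pod <pod-name> -n <namespace> | parse_eviction_message
```

For the message above this prints the resource, the threshold, and the available amount on one line, which is easy to sort or count across many pods.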
Why This Happens
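The kubelet monitors available memory against its eviction thresholds. Once memory.available dips below 100Mi, it begins evicting pods. It ranks candidates by whether their memory usage exceeds their requests, then by pod priority, then by how far usage sits above requests. In practice that means BestEffort pods go first, then Burstable pods running above their requests, and Guaranteed pods last. Your pod was unlucky enough to be in the firing line.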
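The Kubernetes documentation describes the eviction ranking as considering, in order, whether a pod's usage exceeds its requests, the pod's priority, and how far usage sits above requests. The sketch below imitates that ordering on hand-written sample data. The function name and the input format are made up for illustration, and the real kubelet logic handles more cases than this:

```shell
# Hypothetical helper: ranks pods the way the documented eviction order suggests.
# Input lines:  <pod> <priority> <memory_usage_Mi> <memory_request_Mi>
# Output:       pod names, most likely eviction candidate first.
rank_for_eviction() {
  awk '{
    over = ($3 > $4) ? 0 : 1               # 0 sorts first: usage exceeds request
    printf "%d %d %d %s\n", over, $2, $4 - $3, $1
  }' |
    sort -k1,1n -k2,2n -k3,3n |            # over-request first, low priority first, biggest excess first
    awk '{ print $4 }'
}
```

A BestEffort pod has a request of 0, so any usage at all puts it in the "exceeds requests" group, which is why those pods tend to go first.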
What typically gets you here:
- Pods with no requests or limits: the scheduler has no idea how much memory they actually need
- A memory leak in your app or one of its dependencies
- The node is genuinely too small for the workload it's running
- A noisy-neighbor pod ballooning far past its requests right before its own OOMKill
- Too many pods crammed onto one node because resource requests were missing
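Since missing requests show up twice in that list, it is worth checking how many pods have none. A rough sketch, assuming kubectl's custom-columns output prints `<none>` for a missing field; `pods_without_memory_request` is a hypothetical helper name:

```shell
# Hypothetical filter: reads "<namespace> <pod> <memory-requests>" lines and
# prints namespace/pod for pods where no container declares a memory request.
pods_without_memory_request() {
  awk '$3 == "<none>" { print $1 "/" $2 }'
}

# Intended use against a live cluster (not run here):
#   kubectl get pods -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,MEM:.spec.containers[*].resources.requests.memory' \
#     | pods_without_memory_request
```

A pod where only some containers set a request still prints the values it has, so this catches only pods with no memory request at all.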
Step 1: Identify What Got Evicted and Why
Start by listing all evicted pods across every namespace:
kubectl get pods -A --field-selector=status.phase=Failed | grep Evicted
Pull the full eviction message for a specific pod:
kubectl describe pod <pod-name> -n <namespace>
Then check the node that was under pressure:
kubectl describe node <node-name> | grep -A 10 'Conditions:'
You're looking for MemoryPressure: True. Also run this to see current resource consumption at the node level:
kubectl top nodes
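If evictions are spread over several nodes, counting them per node shows where the pressure concentrates. A small sketch, assuming evicted pods carry `Evicted` in `.status.reason`; the helper name is hypothetical:

```shell
# Hypothetical helper: reads "<node> <reason>" lines and prints
# "<count> <node>" for evicted pods, busiest node first.
count_evictions_per_node() {
  awk '$2 == "Evicted" { n[$1]++ } END { for (node in n) print n[node], node }' |
    sort -k1,1nr -k2,2
}

# Intended use against a live cluster (not run here):
#   kubectl get pods -A --field-selector=status.phase=Failed --no-headers \
#     -o custom-columns='NODE:.spec.nodeName,REASON:.status.reason' \
#     | count_evictions_per_node
```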
Step 2: Clean Up Evicted Pods
Evicted pods don't go away on their own. They sit in Failed state and eat up API object slots. Clear them out:
# Delete all Failed pods (evicted ones included) in a namespace
kubectl delete pods -n <namespace> --field-selector=status.phase=Failed
# Or across all namespaces
kubectl delete pods -A --field-selector=status.phase=Failed
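The field selector above matches every Failed pod, not only evicted ones, so failed Job pods you still want to inspect would be deleted too. A more cautious sketch that prints one delete command per evicted pod for review; the helper name is hypothetical and it assumes `.status.reason` is `Evicted` for these pods:

```shell
# Hypothetical helper: reads "<namespace> <pod> <reason>" lines and prints
# a kubectl delete command for each evicted pod. Nothing is deleted here.
evicted_delete_commands() {
  awk '$3 == "Evicted" { printf "kubectl delete pod %s -n %s\n", $2, $1 }'
}

# Intended use against a live cluster (not run here):
#   kubectl get pods -A --field-selector=status.phase=Failed --no-headers \
#     -o custom-columns='NS:.metadata.namespace,POD:.metadata.name,REASON:.status.reason' \
#     | evicted_delete_commands
```

Review the printed commands first, then pipe the same output to `sh` once the list looks right.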
Step 3: Set Resource Requests and Limits
No resources block? Fix that first. The scheduler treats request-less pods as zero-cost and happily stacks them onto already-busy nodes. Those pods are also first in line when eviction starts.
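apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app          # required: must match the template labels
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"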
Set requests to what your app uses under normal load; check with kubectl top pod over a few days. Set limits to the ceiling you're comfortable with. Getting this wrong is the single biggest driver of eviction problems.
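One way to turn "a few days of kubectl top pod" into numbers is to append each reading to a file and summarize it. A minimal sketch, assuming the readings use the usual `NAME CPU MEMORY` columns with memory reported in Mi; the helper name and the samples file are hypothetical:

```shell
# Hypothetical helper: reads saved `kubectl top pod --no-headers` lines
# (<name> <cpu> <memory>Mi) and prints the min, average, and max memory seen.
summarize_memory_samples() {
  awk '{
    sub(/Mi$/, "", $3); v = $3 + 0
    if (NR == 1 || v < min) min = v
    if (NR == 1 || v > max) max = v
    sum += v
  } END {
    if (NR > 0) printf "min=%dMi avg=%dMi max=%dMi\n", min, sum / NR, max
  }'
}

# Intended use (not run here), with samples collected on a schedule:
#   kubectl top pod <pod-name> -n <namespace> --no-headers >> samples.txt
#   summarize_memory_samples < samples.txt
```

The average is a reasonable starting point for the request and the max gives a floor for the limit, though how much headroom to add on top is a judgment call.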
Step 4: Check Node Capacity vs. Allocated Resources
See how much is actually allocated on the troubled node:
kubectl describe node <node-name> | grep -A 8 'Allocated resources'
Example output:
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource  Requests     Limits
  --------  --------     ------
  cpu       1850m (46%)  3200m (80%)
  memory    6Gi (95%)    8Gi (127%)
Memory requests at 95% with limits at 127%? That node is overcommitted. Any spike will trigger eviction. You need to either scale the node up, add nodes to the cluster, or move some pods elsewhere.
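To run the same check across many nodes, the memory row can be reduced to a single line. A sketch that assumes the describe output keeps the column layout shown in the example; `memory_commitment` is a hypothetical helper name:

```shell
# Hypothetical helper: reads `kubectl describe node` output and reports the
# memory request and limit percentages from the "Allocated resources" table.
memory_commitment() {
  awk '$1 == "memory" {
    req = $3; lim = $5
    gsub(/[()%]/, "", req); gsub(/[()%]/, "", lim)
    over = (lim + 0 > 100) ? "yes" : "no"   # limits above 100% of allocatable
    printf "requests=%d%% limits=%d%% overcommitted=%s\n", req, lim, over
  }'
}

# Intended use against a live cluster (not run here):
#   kubectl describe node <node-name> | memory_commitment
```

The Capacity and Allocatable sections print `memory:` with a trailing colon, so only the Allocated resources row matches.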
Step 5: Find the Memory Hog
Sort pods by memory consumption to find what's eating the most:
kubectl top pods -A --sort-by=memory | head -20
A pod consistently running near or over its request is either configured with too-low values or has a leak. Cross-reference actual usage with what's configured:
kubectl top pod <pod-name> -n <namespace>
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.spec.containers[*].resources}'
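Comparing the two outputs by eye gets tedious when units differ. A small sketch that expresses usage as a percentage of the request; both function names are hypothetical, and only the binary suffixes Ki, Mi, and Gi are handled:

```shell
# Hypothetical helper: converts a Ki/Mi/Gi quantity to whole Mi.
to_mi() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *)   return 1 ;;                      # plain bytes and decimal units are not handled
  esac
}

# Hypothetical helper: usage_pct <usage> <request> prints usage as a
# whole-number percentage of the request (rounded down).
usage_pct() {
  used=$(to_mi "$1") || return 1
  requested=$(to_mi "$2") || return 1
  [ "$requested" -gt 0 ] || return 1
  echo $(( used * 100 / requested ))
}
```

For example, `usage_pct 480Mi 512Mi` prints 93, while `usage_pct 300Mi 256Mi` prints 117, which flags a pod running above its request.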
Step 6: Protect Critical Pods with QoS
Kubernetes derives a Quality of Service class from your resource config. To get Guaranteed status (last to be evicted), set requests equal to limits for both CPU and memory on every container in the pod:
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "500m"
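The QoS rules are easy to get subtly wrong, so here is a simplified sketch of how the class falls out of one container's values. `qos_class` is a hypothetical helper that covers a single-container pod only and compares quantities as plain strings, so `0.5` and `500m` would not count as equal:

```shell
# Hypothetical helper: qos_class <req_cpu> <req_mem> <lim_cpu> <lim_mem>
# Pass "" for any value that is not set. Single-container pods only.
qos_class() {
  req_cpu=${1:-}; req_mem=${2:-}; lim_cpu=${3:-}; lim_mem=${4:-}
  # When only a limit is set, Kubernetes uses it as the request too.
  req_cpu=${req_cpu:-$lim_cpu}
  req_mem=${req_mem:-$lim_mem}
  if [ -z "$req_cpu$req_mem$lim_cpu$lim_mem" ]; then
    echo BestEffort
  elif [ -n "$lim_cpu" ] && [ -n "$lim_mem" ] &&
       [ "$req_cpu" = "$lim_cpu" ] && [ "$req_mem" = "$lim_mem" ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}
```

On a running pod, the class Kubernetes actually assigned is available with `kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.qosClass}'`.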
For anything truly critical, back that up with a PriorityClass:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
preemptionPolicy: PreemptLowerPriority
globalDefault: false
---
# In your Deployment spec:
spec:
  template:
    spec:
      priorityClassName: high-priority
Step 7: Scale Up or Add Nodes (If Needed)
Sometimes the node is just too small. Add capacity on managed clusters:
# EKS: update node group
eksctl scale nodegroup --cluster=<cluster> --nodes=5 --name=<nodegroup>
# GKE: resize node pool
gcloud container clusters resize <cluster> --node-pool=<pool> --num-nodes=5
# Or let the cluster autoscaler handle it (this checks that it is deployed)
kubectl get deployment cluster-autoscaler -n kube-system
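Picking a node count is simple arithmetic once you know the total memory requests and each node's allocatable memory. A back-of-the-envelope sketch; the helper name, the sample numbers, and the idea of a target utilization percentage are assumptions for illustration rather than a Kubernetes rule:

```shell
# Hypothetical helper: nodes_needed <total_requests_Mi> <allocatable_Mi_per_node> <target_pct>
# Prints the smallest node count that keeps memory requests at or below
# target_pct of allocatable memory, rounding up.
nodes_needed() {
  total_requests_mi=$1
  allocatable_mi=$2
  target_pct=$3
  usable=$(( allocatable_mi * target_pct ))      # Mi times percent, per node
  echo $(( (total_requests_mi * 100 + usable - 1) / usable ))
}
```

With 24576Mi of total requests, 6400Mi allocatable per node, and an 80% target, `nodes_needed 24576 6400 80` prints 5.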
Verify the Fix
# Confirm no more MemoryPressure on nodes
kubectl get nodes
# STATUS should be Ready (MemoryPressure is a node condition, not a STATUS value)
# Check node conditions
kubectl describe node <node-name> | grep MemoryPressure
# Expected: MemoryPressure False
# Watch pods stay Running
kubectl get pods -n <namespace> -w
# Confirm no evicted pods remaining
kubectl get pods -A --field-selector=status.phase=Failed
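Checking nodes one at a time does not scale, so the condition can be read for every node in one pass. A sketch with a hypothetical helper name; it expects one `<node> <status>` pair per line:

```shell
# Hypothetical helper: reads "<node> <MemoryPressure status>" lines and
# prints the nodes that currently report MemoryPressure=True.
nodes_under_memory_pressure() {
  awk '$2 == "True" { print $1 }'
}

# Intended use against a live cluster (not run here):
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}' \
#     | nodes_under_memory_pressure
```

Empty output means no node is reporting memory pressure at the moment.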
Prevent It From Happening Again
- Always set resource requests, no exceptions. Even small pods. Use a LimitRange in each namespace so nothing slips through without them.
- Run VPA in recommendation mode first. Vertical Pod Autoscaler will watch your pods for a few days and suggest realistic request values before you commit to them.
- Alert on MemoryPressure before it bites you. You don't want to discover evictions from a broken deployment at 2 AM.
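- Add PodDisruptionBudgets for stateful or critical services so voluntary disruptions, such as node drains while you resize or rebalance the cluster, respect a minimum number of available replicas. The kubelet's node-pressure eviction does not honor PDBs, so they complement requests and priorities rather than replace them.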
- Enforce resource specs at the namespace level with LimitRange. The example below injects default requests and limits into any container that doesn't specify them:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: my-namespace
spec:
  limits:
    - default:
        memory: 512Mi
        cpu: 500m
      defaultRequest:
        memory: 256Mi
        cpu: 250m
      type: Container

