The Silent Connection Killer
Three hours. That's how long I spent debugging a microservice that couldn't reach its database. The logs were flooded with one specific error, but nothing on the application side seemed wrong. Pods were running. The service was reachable. DNS resolved fine.
```
dial tcp 10.244.1.45:5432: i/o timeout
```
In Kubernetes, an i/o timeout means the connection attempt got no answer at all: the packet left the pod, and either the reply never came back or the packet was silently dropped somewhere along the route. That's the key difference from connection refused, which means the packet arrived, the port was closed, and the target actively said "no" with a TCP reset (RST). A timeout usually points to a firewall or a NetworkPolicy dropping packets without sending a reset back.
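To tell the two failure modes apart from inside a pod without parsing error strings, a small helper can time a single raw connect attempt. This is a sketch, not the tooling from the incident: `probe` is a hypothetical name, and it assumes bash built with `/dev/tcp` support plus GNU coreutils `timeout`, which exits with status 124 when it has to kill the command (busybox builds may report a different status).

```bash
# probe HOST PORT [SECONDS]
# Makes one TCP connect attempt and prints "open", "refused" or "timeout".
# Assumes bash with /dev/tcp support and GNU coreutils `timeout`,
# which exits with status 124 when it kills the command.
probe() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
  case $? in
    0)   echo open ;;      # handshake completed
    124) echo timeout ;;   # no SYN-ACK and no RST before the deadline
    *)   echo refused ;;   # failed fast: usually an RST; other immediate
                           # errors such as "No route to host" land here too
  esac
}
```

Run from a debug pod, `probe 10.244.1.45 5432` should print `timeout` while a policy is silently dropping packets; with no policy in the way, a pod that was up but had nothing listening on 5432 would print `refused` almost immediately.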
How I Debugged the Timeout
For pod-to-pod timeouts, I run through a short checklist to narrow down the cause. Start simple, then work inward.
1. Verify the Pod is Actually Listening
The first check: is the target pod alive and listening on that port at all? I spun up a netshoot debug pod in the same namespace and hit the pod IP directly, with no DNS and no Service layer in between.
```bash
kubectl run tmp-shell --rm -i --tty --image nicolaka/netshoot -- /bin/bash
# Inside the pod:
nc -zv 10.244.1.45 5432
```
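Another timeout. Same problem. That ruled out DNS, the Service, and the application code entirely: a database that wasn't listening would have answered with connection refused, not silence. This was a network-layer issue.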
2. Check for Existing Network Policies
Next, I checked for any policies applied to the namespace.
```bash
kubectl get networkpolicy -n production
```
There it was: a policy named default-deny-ingress. And that's the gotcha many developers miss: the moment a pod is selected by any NetworkPolicy, it becomes isolated for the direction that policy covers (ingress, egress, or both), and only traffic that some policy explicitly allows gets through. Someone had locked down the namespace for security but never added an exception for my new service.
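A default-deny ingress policy usually has the shape sketched below: an empty `podSelector`, which matches every pod in the namespace, and an `Ingress` policy type with no rules, so nothing is admitted. This is a typical example, not a copy of the policy in my cluster; compare it with the real one via `kubectl get networkpolicy default-deny-ingress -n production -o yaml`. It's wrapped in a hypothetical shell function, `deny_all_ingress_policy`, so the manifest can be diffed or piped into `kubectl apply -f -`.

```bash
# deny_all_ingress_policy NAMESPACE
# Prints a typical default-deny ingress NetworkPolicy (hypothetical example).
deny_all_ingress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: $1
spec:
  podSelector: {}   # empty selector = every pod in the namespace
  policyTypes:
  - Ingress         # no ingress rules listed, so nothing is admitted
EOF
}
```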
The Fix: Write a Specific Allow Rule
Don't delete the deny policy. That's a security regression. The right move is to write a narrow allow rule that opens only the exact traffic you need.
The "Allow Ingress" Policy
I wrote a NetworkPolicy targeting the database pod (the receiver) to permit traffic from the API pod on port 5432.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-access
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web-api
    ports:
    - protocol: TCP
      port: 5432
```
Don't Forget Egress
Timeouts can happen on the sending side too. If your source pod is selected by an Egress policy, it might be blocked from initiating the connection in the first place. My API pod was, and its egress rules were just as restrictive. I added this snippet:
```yaml
# Partial snippet for the API's Egress policy
egress:
- to:
  - podSelector:
      matchLabels:
        app: postgres
  ports:
  - protocol: TCP
    port: 5432
```
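A full egress policy built around that snippet might look like the sketch below. Assumptions to check against your own cluster: the function `api_egress_policy` and the policy name `allow-api-egress` are hypothetical, and the DNS rule assumes CoreDNS pods labeled `k8s-app: kube-dns` in `kube-system` (the `kubernetes.io/metadata.name` namespace label is set automatically on recent Kubernetes versions). The DNS rule matters because a pod with locked-down egress can't resolve Service names either, which shows up as a lookup failure rather than a dial timeout.

```bash
# api_egress_policy NAMESPACE
# Prints a hypothetical complete egress policy for the API pods:
# Postgres on 5432, plus DNS so Service names still resolve.
api_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-egress
  namespace: $1
spec:
  podSelector:
    matchLabels:
      app: web-api
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432
  # DNS: namespaceSelector and podSelector in one entry are ANDed
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
EOF
}
```

Piping it through `api_egress_policy production | kubectl apply --dry-run=server -f -` validates the manifest against the API server before you apply it for real.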
Verifying the Fix
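Then one more test. The new rule only admits pods labeled app=web-api, and a pod started with a plain kubectl run gets a run=<name> label instead, so the probe pod needs the API's label (for example, kubectl run with --labels=app=web-api) or the check has to run from an API pod: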
```
$ nc -zv 10.244.1.45 5432
Connection to 10.244.1.45 5432 port [tcp/postgresql] succeeded!
```
Instant success. No more timeout.
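A success only proves the allow rule works. The mirror-image check, confirming that a pod without the app=web-api label still times out, shows the deny policy is intact. A minimal sketch, assuming the production namespace and the pod IP from above; `run_probe` and the `np-probe-` pod name prefix are hypothetical, and `KUBECTL` can be overridden for a dry run.

```bash
# run_probe LABELS
# Starts a throwaway netshoot pod carrying LABELS and tries a 3-second
# TCP connect to the database pod IP used in this post.
# KUBECTL can be overridden (e.g. KUBECTL=echo for a dry run).
run_probe() {
  "${KUBECTL:-kubectl}" run "np-probe-$$" -n production --rm -i \
    --restart=Never --labels="$1" --image=nicolaka/netshoot -- \
    nc -zv -w 3 10.244.1.45 5432
}
```

`run_probe app=web-api` should report success, while `run_probe app=something-else` should time out after about three seconds.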
What I Changed Going Forward
Kubernetes networking is invisible until it breaks. After this incident, I made three habits stick:
- Standardize pod labels: Every pod gets app, role, and env labels from day one. NetworkPolicies can only select what they can find.
- Document the deny policy: A default deny is the right call for production. But you need a clear onboarding checklist so new services don't silently time out on their first deploy.
- Verify CIDR ranges before committing: When policies involve external IPs or specific subnets, I double-check ranges with the Subnet Calculator on ToolCraft. Mistyping a CIDR block, like writing /24 when you meant /32, can accidentally allow 255 extra IPs or block an entire segment.
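The arithmetic behind that warning: a /N block covers 2^(32-N) IPv4 addresses, so a /24 spans 256 addresses where a /32 spans exactly one. When no calculator is handy, a POSIX shell sketch like this prints the range a CIDR actually covers. `cidr_range` and `to_dotted` are hypothetical helper names; they assume well-formed IPv4 input and do no validation.

```bash
# to_dotted N: convert a 32-bit integer back to dotted-quad notation.
to_dotted() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# cidr_range CIDR: print the first and last address and the block size.
cidr_range() {
  local ip=${1%/*} prefix=${1#*/} o1 o2 o3 o4 addr size first last
  # Split the dotted quad into its four octets.
  IFS=. read -r o1 o2 o3 o4 <<EOF
$ip
EOF
  addr=$(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
  size=$(( 1 << (32 - prefix) ))
  # Clearing the host bits gives the network (first) address.
  first=$(( addr - addr % size ))
  last=$(( first + size - 1 ))
  echo "$(to_dotted "$first") - $(to_dotted "$last") (size $size)"
}
```

For example, `cidr_range 10.0.5.7/24` prints `10.0.5.0 - 10.0.5.255 (size 256)`, while `cidr_range 10.0.5.7/32` prints `10.0.5.7 - 10.0.5.7 (size 1)`.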
Still seeing dial tcp: i/o timeout after all this? Dig into the CNI logs on the worker nodes (Calico and Cilium both run a per-node agent with its own logs). Occasionally the culprit is a stale iptables rule or a routing table mismatch at the node level: rare, but it happens in long-running clusters after partial upgrades.
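To pull the agent log from the node that hosts the failing pod, something like the sketch below works. It is a hypothetical helper, and the defaults are assumptions: agents in `kube-system` (Calico installed by the Tigera operator uses `calico-system` instead), Calico agent pods labeled `k8s-app=calico-node`, and Cilium agents labeled `k8s-app=cilium`.

```bash
# cni_agent_logs NODE [SELECTOR]
# Prints the last 200 log lines of the CNI agent pod scheduled on NODE.
# Expects exactly one match, since agents run as a DaemonSet.
# CNI_NAMESPACE and KUBECTL can be overridden.
cni_agent_logs() {
  local node=$1
  local selector=${2:-k8s-app=calico-node}
  local ns=${CNI_NAMESPACE:-kube-system}
  local pod
  pod=$("${KUBECTL:-kubectl}" get pods -n "$ns" -l "$selector" \
    --field-selector "spec.nodeName=$node" -o name)
  if [ -z "$pod" ]; then
    echo "no pod matching $selector on node $node" >&2
    return 1
  fi
  "${KUBECTL:-kubectl}" logs -n "$ns" "$pod" --tail=200
}
```

For a Cilium cluster, call it as `cni_agent_logs worker-2 k8s-app=cilium`; for an operator-managed Calico install, set `CNI_NAMESPACE=calico-system` first.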

