The Error Scenario
One minute your microservices are communicating perfectly; the next, your logs are flooded with connection failures. This usually happens when a Pod tries to reach another service but hits a wall. Instead of a successful handshake, you see this error:
dial tcp: lookup my-api-service on 10.96.0.10:53: no such host
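This message tells us two things. First, your Pod is correctly trying to use the Kubernetes DNS service (typically mapped to 10.96.0.10). Second, the lookup came back empty: the resolver did not get an address for my-api-service under any of the names it tried. A DNS server that is completely unreachable more often surfaces as an i/o timeout instead.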
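To make the anatomy of that message concrete, here is a minimal sketch that splits it into its parts. The helper name parse_lookup_error is hypothetical, and the pattern assumes the exact Go-style wording shown above; other runtimes phrase lookup failures differently.

```shell
# Hypothetical helper: split a Go-style lookup error into its parts.
# Assumes the "lookup <host> on <ip>:<port>: <reason>" wording shown above.
parse_lookup_error() {
  printf '%s\n' "$1" |
    sed -n 's/.*lookup \([^ ]*\) on \([0-9.]*\):\([0-9]*\): \(.*\)$/host=\1 server=\2 port=\3 reason=\4/p'
}

parse_lookup_error 'dial tcp: lookup my-api-service on 10.96.0.10:53: no such host'
# prints: host=my-api-service server=10.96.0.10 port=53 reason=no such host
```

The server and port tell you which resolver was asked; the host tells you which name was actually queried, which matters once search paths come into play.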
Why DNS Fails in Kubernetes
In clusters built with kubeadm, 10.96.0.10 is the standard ClusterIP for the kube-dns service. When lookups fail, the root cause typically stems from one of these four areas:
- CoreDNS Instability: The DNS pods are crashing, stuck in a CrashLoopBackOff, or overloaded.
- Namespace Scoping: You are trying to reach a service in a different namespace without using its full name.
- Network Policy Blocks: A policy is preventing your Pod from sending traffic to the kube-system namespace on port 53.
- Configuration Drifts: The resolv.conf file inside your container doesn't point to the correct DNS service IP.
Quick Fix: Refresh CoreDNS
Start with the basics. If CoreDNS has encountered a transient glitch or a memory leak, a quick rollout restart often clears the path. Check the status of your DNS pods first.
# Check if CoreDNS pods are running (you should see at least 2 for HA)
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Trigger a fresh rollout
kubectl rollout restart deployment coredns -n kube-system
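A restart is only useful if you confirm that the new pods actually come up. The sketch below wraps the restart in a function that also waits for the rollout and prints recent logs. The function name refresh_coredns and the 120s timeout are my own choices, and it assumes the Deployment is named coredns with the k8s-app=kube-dns label, as in the commands above.

```shell
# Sketch: restart CoreDNS, wait for the rollout, then show recent logs.
# Assumes a Deployment named "coredns" labelled k8s-app=kube-dns;
# some distributions name or label it differently.
refresh_coredns() {
  kubectl rollout restart deployment coredns -n kube-system &&
  kubectl rollout status deployment coredns -n kube-system --timeout=120s &&
  kubectl logs -n kube-system -l k8s-app=kube-dns --tail=20
}
```

Call refresh_coredns from a shell that has cluster access. If the rollout status step times out, the logs step is skipped, which is a hint to look at kubectl describe for the pods instead.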
Deep Dive Troubleshooting
1. Use the Fully Qualified Domain Name (FQDN)
Kubernetes DNS resolution relies on a specific hierarchy: <service>.<namespace>.svc.cluster.local. If your application resides in namespace-a and wants to talk to my-api in namespace-b, a simple request to http://my-api will fail because the Pod's search path only expands short names against namespace-a and the cluster-wide suffixes, none of which point at a Service in namespace-b.
The Fix: Always use the full internal address for cross-namespace traffic:
# Change this:
my-api-service
# To this:
my-api-service.target-namespace.svc.cluster.local
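To see why the short name misses, it helps to list the candidates a resolver builds from the search path. This is a simplified sketch: expand_search is a hypothetical helper, and it ignores the ndots option and the final attempt at the bare name.

```shell
# Simplified model of search-path expansion for a short name.
# Ignores ndots and the final bare-name attempt.
expand_search() {
  name=$1
  shift
  for domain in "$@"; do
    printf '%s.%s\n' "$name" "$domain"
  done
}

# A Pod in namespace-a looking up "my-api":
expand_search my-api namespace-a.svc.cluster.local svc.cluster.local cluster.local
# prints:
#   my-api.namespace-a.svc.cluster.local
#   my-api.svc.cluster.local
#   my-api.cluster.local
```

None of the candidates is my-api.namespace-b.svc.cluster.local, so a Service that lives in namespace-b is never found unless the namespace is part of the name you request.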
2. Debug with a Dedicated Network Tool
If you aren't sure whether the problem is in your code or in the cluster, spin up a debug Pod. Use the busybox:1.28 image specifically. The nslookup shipped in several later BusyBox releases has known problems with DNS search paths and can give you false negatives during testing.
kubectl run dns-test --rm -it --image=busybox:1.28 -- /bin/sh
# Test internal cluster resolution
nslookup kubernetes.default
# Test your specific service
nslookup my-api-service.my-namespace.svc.cluster.local
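If nslookup kubernetes.default fails, your entire DNS subsystem is likely down. If it works but your specific service fails, confirm that the Service exists in the namespace you queried and that its name is spelled correctly (kubectl get svc -n my-namespace). For a headless Service, also check the selector, because it only gets address records for Pods that match.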
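The two-step check can be folded into one small function that you paste into the debug Pod's shell. The name dns_triage is hypothetical, and the sketch assumes that nslookup exits with a non-zero status when a lookup fails.

```shell
# Hypothetical triage helper for use inside the debug Pod.
# Assumes nslookup exits non-zero when a name does not resolve.
dns_triage() {
  # $1: fully qualified name of the Service to test
  if ! nslookup kubernetes.default >/dev/null 2>&1; then
    echo "cluster DNS is failing: inspect CoreDNS"
  elif ! nslookup "$1" >/dev/null 2>&1; then
    echo "cluster DNS works, but $1 does not resolve: check that Service"
  else
    echo "$1 resolves"
  fi
}
```

For example, dns_triage my-api-service.my-namespace.svc.cluster.local prints one line telling you which half of the problem to chase.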
3. Inspect the Pod's resolv.conf
Each Pod has a /etc/resolv.conf file that dictates where it sends DNS queries. If this file is misconfigured, the Pod will never find the DNS server. Run this command to see what your Pod sees:
kubectl exec <pod-name> -- cat /etc/resolv.conf
A healthy configuration should look like this:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
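If the nameserver IP doesn't match your kube-dns Service IP (check it with kubectl get svc kube-dns -n kube-system), the kubelet's clusterDNS setting is probably out of sync with the cluster, or the Pod overrides it through a custom dnsPolicy or dnsConfig.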
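That comparison is easy to script. The sketch below reads the first nameserver from a Pod's resolv.conf and compares it with the ClusterIP of the kube-dns Service. Both function names are hypothetical, and it assumes the DNS Service is called kube-dns in kube-system and that the Pod is in your current namespace.

```shell
# Print the first nameserver found in resolv.conf content on stdin.
first_nameserver() {
  awk '$1 == "nameserver" { print $2; exit }'
}

# Compare a Pod's nameserver with the kube-dns ClusterIP.
# $1: Pod name in the current namespace (placeholder).
check_pod_dns() {
  pod_ns=$(kubectl exec "$1" -- cat /etc/resolv.conf | first_nameserver)
  svc_ip=$(kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}')
  if [ "$pod_ns" = "$svc_ip" ]; then
    echo "match: $pod_ns"
  else
    echo "mismatch: pod uses $pod_ns, kube-dns is $svc_ip"
  fi
}
```

Run check_pod_dns <pod-name> against a failing Pod; a mismatch line points at the kubelet or Pod DNS settings rather than at CoreDNS itself.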
4. Check for Restrictive Network Policies
Security-hardened clusters often use NetworkPolicies to restrict traffic. If your CNI plugin enforces them (Calico and Cilium both do), ensure your application Pods are allowed to send egress traffic to the kube-system namespace on port 53 (UDP and TCP). Without this, the DNS query is dropped before it ever reaches CoreDNS.
# Example egress rule for DNS access
egress:
- to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53
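The rule above is only the egress section of a policy. As a sketch of how it might sit inside a complete manifest, the function below prints a NetworkPolicy for a namespace you pass in. The function name dns_egress_policy and the policy name allow-dns-egress are placeholders of mine, and the empty podSelector applies the rule to every Pod in that namespace, which may be broader than you want.

```shell
# Print a complete NetworkPolicy that allows DNS egress to kube-system.
# $1: namespace the policy should live in.
# "allow-dns-egress" is a placeholder name; podSelector {} selects all Pods.
dns_egress_policy() {
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: $1
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
EOF
}
```

Review the output first, then apply it with dns_egress_policy my-namespace | kubectl apply -f - once it matches your setup.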
Prevention: Avoiding CIDR Collisions
A common trap during cluster setup is overlapping IP ranges. If your Service CIDR (e.g., 10.96.0.0/12) overlaps with your corporate VPN or local data center network, DNS packets might be routed out of the cluster instead of to CoreDNS. I recommend using an IP Subnet Calculator during the planning phase to ensure your 10.96.0.0/12 range is completely isolated from your physical infrastructure.
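If you would rather script the check than use a calculator, an overlap test for two IPv4 CIDR blocks fits in a few lines of shell. This is a rough sketch with hypothetical function names; it assumes well-formed IPv4 input and a shell with 64-bit arithmetic, and it does no validation.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
  base=$(ip_to_int "${1%/*}")
  bits=${1#*/}
  size=$(( 1 << (32 - bits) ))
  first=$(( base / size * size ))
  echo "$first $(( first + size - 1 ))"
}

# Succeed (exit 0) when the two CIDR blocks share at least one address.
cidrs_overlap() {
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

if cidrs_overlap 10.96.0.0/12 10.100.0.0/16; then
  echo "overlap"
fi
# prints: overlap
```

10.96.0.0/12 spans 10.96.0.0 through 10.111.255.255, so a VPN range such as 10.100.0.0/16 collides with it while 192.168.0.0/16 does not.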
Final Verification
Once you've applied a fix, verify it by querying the service directly from the application container. You want to see a valid ClusterIP returned in the response:
kubectl exec <pod-name> -- nslookup my-api-service
# Success looks like this:
Server: 10.96.0.10
Address: 10.96.0.10:53
Name: my-api-service.default.svc.cluster.local
Address: 10.105.22.45
Once that final Address line shows a ClusterIP, the dial tcp lookup errors should stop and your services can resume communicating.
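After a CoreDNS restart or a policy change, resolution can take a few seconds to recover, so a single check may be misleading. Here is a small retry sketch. The function name wait_for_dns and the DNS_RETRY_DELAY variable are hypothetical, and it assumes nslookup is available inside the application container and exits non-zero on failure.

```shell
# Retry a lookup from inside a Pod until it succeeds or attempts run out.
# $1: Pod name, $2: Service name, $3: max attempts (default 10).
# DNS_RETRY_DELAY (seconds between attempts, default 3) is a made-up knob.
wait_for_dns() {
  attempts=${3:-10}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if kubectl exec "$1" -- nslookup "$2" >/dev/null 2>&1; then
      echo "resolved after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "${DNS_RETRY_DELAY:-3}"
  done
  echo "still failing after $attempts attempts" >&2
  return 1
}
```

A call such as wait_for_dns <pod-name> my-api-service 5 gives the cluster a short grace period before you conclude the fix did not work.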

