# Troubleshooting DNS in Kubernetes Cheatsheet

Your application is running in a Kubernetes cluster and it starts experiencing DNS resolution issues. If that's your case, these are the steps I have found most helpful when investigating them.

I hope they help you too, and remember: it's always DNS.

Beyond this cheatsheet, the official Kubernetes docs have a more comprehensive guide.

# 1. Use a container with the necessary utilities

Quickly create a Pod whose container includes the utilities you will need, such as nslookup and dig. (You can skip this step if you already have a container with these utilities.)

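```shell
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    # If this image cannot be pulled, the current Kubernetes docs use
    # registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 instead.
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
EOF
```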
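Before running the commands in the next steps, you may want to wait until the Pod is actually Ready, and remove it once you are done. A minimal sketch, assuming bash and the Pod name and namespace from the manifest above; the helper names wait_for_dnsutils and remove_dnsutils are hypothetical:

```shell
# Hypothetical helpers around the dnsutils Pod created above (bash).

# Block until the Pod reports the Ready condition, or fail after 60 seconds.
wait_for_dnsutils() {
  kubectl wait --for=condition=Ready pod/dnsutils -n default --timeout=60s
}

# Delete the Pod when you no longer need it.
remove_dnsutils() {
  kubectl delete pod dnsutils -n default
}
```

For example, wait_for_dnsutils && kubectl exec -ti dnsutils -- cat /etc/resolv.conf only runs the exec once the Pod is ready.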

# 2. Check the DNS configuration applied to Pods

Check the contents of the /etc/resolv.conf file inside the container.

```
$ kubectl exec -ti dnsutils -- cat /etc/resolv.conf
nameserver 10.43.0.10
search default.svc.cluster.local svc.cluster.local cluster.local {... might have other domains here}
options ndots:5
```
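The search and ndots lines control how names are expanded before they reach kube-dns. As a simplified model of the glibc stub resolver (other resolvers, such as musl in Alpine-based images, differ in details), a name with fewer dots than ndots is tried with each search domain appended before being tried as-is, a name with at least ndots dots is tried as-is first, and a name ending in a dot is never expanded. The hypothetical bash helper below prints the candidate names in the order they would be queried until one resolves:

```shell
# Hypothetical helper: print, in order, the names a glibc-style resolver
# would query for NAME, given the ndots value and search domains from
# /etc/resolv.conf. Simplified model; real resolvers have more edge cases.
# Usage: candidate_names NAME NDOTS [SEARCH_DOMAIN...]
candidate_names() {
  local name=$1 ndots=$2 domain
  shift 2

  # A trailing dot marks the name as fully qualified: no search list.
  if [[ $name == *. ]]; then
    echo "$name"
    return
  fi

  # Count the dots by deleting every other character.
  local dots=${name//[^.]/}

  # Names with at least ndots dots are tried as-is first.
  if (( ${#dots} >= ndots )); then
    echo "$name."
  fi

  # Each search domain is appended in the order listed in resolv.conf.
  for domain in "$@"; do
    echo "$name.$domain."
  done

  # Short names are tried as-is only after the search list is exhausted.
  if (( ${#dots} < ndots )); then
    echo "$name."
  fi
}

# With the resolv.conf shown above, google.com (1 dot, fewer than 5) is
# tried as google.com.default.svc.cluster.local. first and google.com. last.
candidate_names google.com 5 default.svc.cluster.local svc.cluster.local cluster.local
```

This is why a lookup of an external name from a Pod can produce several queries to kube-dns, and why a short internal name such as my-service.my-namespace still resolves: its second candidate is my-service.my-namespace.svc.cluster.local.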

You want to make sure the nameserver IP matches the cluster IP of the kube-dns service in the kube-system namespace, and that the service is up and running:

```
$ kubectl get service kube-dns -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.43.0.10   <none>        53/UDP,53/TCP,9153/TCP   337d
```
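The comparison can be scripted so you don't have to eyeball the two IPs. A sketch, assuming bash and the dnsutils Pod from step 1; the function name check_nameserver is hypothetical. One caveat: if your cluster uses NodeLocal DNSCache, the Pod's nameserver may legitimately be a link-local address rather than the kube-dns cluster IP.

```shell
# Hypothetical helper: compare the first nameserver in the dnsutils Pod's
# /etc/resolv.conf with the ClusterIP of the kube-dns Service.
# Assumes the Pod is named dnsutils in the default namespace (step 1).
check_nameserver() {
  local pod_ns svc_ip
  pod_ns=$(kubectl exec dnsutils -n default -- cat /etc/resolv.conf |
    awk '$1 == "nameserver" { print $2; exit }')
  svc_ip=$(kubectl get service kube-dns -n kube-system \
    -o jsonpath='{.spec.clusterIP}')

  if [[ -n $pod_ns && $pod_ns == "$svc_ip" ]]; then
    echo "OK: Pod uses kube-dns at $svc_ip"
  else
    echo "MISMATCH: Pod nameserver '$pod_ns', kube-dns ClusterIP '$svc_ip'"
    return 1
  fi
}
```

To confirm the service actually has backends, kubectl get endpoints kube-dns -n kube-system should list at least one address.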

# 3. Check if you can resolve the address

Using the dnsutils container you created in the first step, attempt to resolve a domain with nslookup:

```
$ kubectl exec -ti dnsutils -- nslookup google.com
Server:         10.43.0.10
Address:        10.43.0.10#53

Non-authoritative answer:
Name:   google.com
Address: 216.58.210.46
Name:   google.com
Address: 2a00:1450:4009:800::200e
```

Replace google.com with the name you are having trouble with. This can be another external domain or an internal Service name such as my-service.my-namespace.
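To check several names in one go, for example an external domain next to an internal Service, a loop like the following can help. It is a sketch assuming bash and the dnsutils Pod from step 1: the function name resolve_all is hypothetical, it uses dig +search +short (dig ignores the resolv.conf search list unless +search is given), and it treats a failed command or empty output as a failed lookup.

```shell
# Hypothetical helper: resolve each argument from inside the dnsutils Pod
# and print "ok" or "FAIL" per name. Returns 1 if any lookup failed.
resolve_all() {
  local name answer failed=0
  for name in "$@"; do
    # +search applies the search list from /etc/resolv.conf;
    # +short prints only the answer records.
    if answer=$(kubectl exec dnsutils -n default -- dig +search +short "$name" 2>/dev/null) &&
       [[ -n $answer && $answer != ";;"* ]]; then
      echo "ok   $name -> ${answer//$'\n'/ }"
    else
      echo "FAIL $name"
      failed=1
    fi
  done
  return "$failed"
}
```

For example, resolve_all google.com my-service.my-namespace, where my-service.my-namespace stands for your own Service.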

If it does not resolve, make sure the kube-dns service is healthy and that the Pods behind it are running.

Also check whether those Pods, the ones labeled k8s-app=kube-dns, are reporting any errors:

```shell
kubectl logs -n kube-system -l k8s-app=kube-dns --tail 50
```
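To see at a glance which of those Pods are unhealthy, a jsonpath query can be filtered with awk. A sketch, assuming bash and that each DNS Pod has a single container (typical for CoreDNS, but check your cluster); the function name unhealthy_dns_pods is hypothetical:

```shell
# Hypothetical helper: print the names of kube-dns Pods (label
# k8s-app=kube-dns) that are not Running or whose first container is not
# ready. Pods with no container status yet (e.g. Pending) are also printed.
unhealthy_dns_pods() {
  kubectl get pods -n kube-system -l k8s-app=kube-dns \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{" "}{.status.containerStatuses[0].ready}{"\n"}{end}' |
    awk '$2 != "Running" || $3 != "true" { print $1 }'
}
```

Any Pod it prints is worth a closer look with kubectl describe pod -n kube-system followed by the Pod name.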

# 4. Trace the resolution and check for latencies

You can use the dig tool to trace the DNS resolution process and inspect the latency of each network hop.

For example, you can look up google.com with the dig google.com command. In the output below, you can see that it resolves to the expected IP, 216.58.210.46, and how long the query took (Query time):

```
$ kubectl exec -ti dnsutils -- dig google.com

; <<>> DiG 9.11.6-P1 <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29263
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 9

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 0fbb77fd5accad54 (echoed)
;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             28      IN      A       216.58.210.46

;; AUTHORITY SECTION:
google.com.             28      IN      NS      ns3.google.com.
google.com.             28      IN      NS      ns2.google.com.
google.com.             28      IN      NS      ns4.google.com.
google.com.             28      IN      NS      ns1.google.com.

;; ADDITIONAL SECTION:
ns2.google.com.         28      IN      AAAA    2001:4860:4802:34::a
ns1.google.com.         28      IN      AAAA    2001:4860:4802:32::a
ns4.google.com.         28      IN      AAAA    2001:4860:4802:38::a
ns1.google.com.         28      IN      A       216.239.32.10
ns3.google.com.         28      IN      A       216.239.36.10
ns4.google.com.         28      IN      A       216.239.38.10
ns2.google.com.         28      IN      A       216.239.34.10
ns3.google.com.         28      IN      AAAA    2001:4860:4802:36::a

;; Query time: 2 msec
;; SERVER: 10.43.0.10#53(10.43.0.10)
;; WHEN: Fri Nov 13 19:19:03 UTC 2020
;; MSG SIZE  rcvd: 517
```
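A single query time says little about intermittent slowness, so it can help to repeat the same lookup a few times and compare. A sketch, assuming bash and the dnsutils Pod from step 1; the function names query_time_ms and sample_query_times are hypothetical. Keep in mind that repeated queries may be answered from a cache.

```shell
# Hypothetical helper: read dig output on stdin and print the query time
# in milliseconds, taken from the ";; Query time: N msec" line.
query_time_ms() {
  awk '/^;; Query time:/ { print $4 }'
}

# Run the same lookup several times from the dnsutils Pod and print each
# query time. Defaults: google.com, 5 runs.
sample_query_times() {
  local name=${1:-google.com} runs=${2:-5} i
  for (( i = 1; i <= runs; i++ )); do
    kubectl exec dnsutils -n default -- dig "$name" | query_time_ms
  done
}
```

For example, sample_query_times my-service.my-namespace.svc.cluster.local 10 prints ten query times for an internal name (the Service name is a placeholder).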

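If you want to go deeper, use dig +trace to see the intermediate steps of the resolution. Note that with +trace, dig follows the delegation chain itself starting from the root servers, so most of those queries do not go through kube-dns: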

```shell
kubectl exec -ti dnsutils -- dig +trace google.com
```

# Using Rancher?

I have been hit by an issue where Pods suddenly start getting IPs assigned in the 172.17.0.0 range.

When this happens, those Pods are on the wrong network and thus unreachable from other Pods. Eventually this can affect one or all of the kube-dns Pods, which results in intermittent or permanent DNS resolution failures until the Pods are recreated.
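To spot affected Pods before they cause trouble, you can list Pod IPs across all namespaces and filter for that range. A sketch, assuming bash and that neither your Pod CIDR nor any node IP (used by hostNetwork Pods) legitimately falls in 172.17.0.0/16; the function name pods_in_wrong_range is hypothetical:

```shell
# Hypothetical helper: print "namespace name ip" for every Pod whose IP
# starts with 172.17., the unexpected range described above.
pods_in_wrong_range() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.podIP}{"\n"}{end}' |
    awk '$3 ~ /^172\.17\./'
}
```

If kube-dns Pods show up in the list, deleting them with kubectl delete pod -n kube-system -l k8s-app=kube-dns gets them recreated, assuming they are managed by a Deployment as is usual.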