Misadventures with Kube DNS

Feb 9, 2017


Kubernetes DNS is a neat way to handle simple service discovery in a kube cluster. The general idea is:

watch the kube API for Services and Pods (source here)
run a golang DNS server (using SkyDNS) to serve records pointing to those Services and Pods
front that DNS server w/ a caching proxy (DNSmasq)
point all pods (w/ the property dnsPolicy: ClusterFirst, the default) at the proxy by overriding their /etc/resolv.conf with one that points only to the DNS proxy (implemented with the cluster-dns Kubelet option; gory details here). Notably, kube-dns itself doesn’t use ClusterFirst.
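
To make that last step concrete, here is a rough sketch of the kind of resolv.conf a ClusterFirst pod ends up with. The function name and the example IP are made up for illustration, and the search list and ndots option reflect my understanding of the kubelet's usual output rather than anything shown in this post:

```python
def render_pod_resolv_conf(cluster_dns_ip, namespace,
                           cluster_domain="cluster.local", host_search=()):
    """Build resolv.conf text for a ClusterFirst pod (illustrative only)."""
    # Assumption: cluster search domains come first, then whatever the host had.
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
        *host_search,
    ]
    lines = [
        f"nameserver {cluster_dns_ip}",  # the only nameserver: the DNS proxy
        "search " + " ".join(search),
        "options ndots:5",
    ]
    return "\n".join(lines) + "\n"
```

Calling render_pod_resolv_conf("10.96.0.10", "default") (a hypothetical cluster-dns address) yields a file whose single nameserver line is the proxy, which is why every pod lookup lands on kube-dns first.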

On my home lab “cluster” (okay, so it’s a single NUC) I had some trouble getting kube-dns working. Once I got the pod network (I used Calico) and kube-proxy both working properly, it became clear that DNS wasn’t working. I didn’t have any services using it for discovery yet, but I wasn’t able to resolve external names. For a long time I conflated this with not being able to route externally - discovering that the trouble was DNS, not kube-proxy, led me to ponder what was unusual about my DNS setup.

As it turns out, a few weeks prior I had installed dnsmasq on the machine now hosting my cluster for a project that never took off. Knowing that kube-dns typically listens on port 53 (the canonical Deployment is here), I assumed that kube-dns and my local dnsmasq were fighting over binding to the host’s port 53. I uninstalled dnsmasq, and things worked perfectly. Success!

Or so I thought. My assumption that the two services were fighting for a port was wrong - kube-dns doesn’t run in the host network namespace, it binds to port 53 on a pod IP. It wasn’t until I ran into a similar issue at work this week that I realized the true cause.

In a Kubernetes deployment I’ve been a part of building at work, we had seen strange behavior: every external DNS query took exactly two seconds to respond, but only the first time you made it. Talk about red flags! First off, nothing takes an exact amount of time in computing without somehow being related to timeouts - especially not network calls. These DNS responses were taking between 2.000 and ~2.050 seconds without fail. If you repeated a query immediately after, a response would come nearly instantly.
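
If you want to check for the same pattern yourself, something like the sketch below would do. It is not what we ran at the time; the helper names, the 0.1 second slack, and the injectable resolve/clock parameters are all my own choices for illustration:

```python
import socket
import time


def time_lookup(name, resolve=socket.gethostbyname, clock=time.monotonic):
    """Return how long one lookup of `name` takes, in seconds."""
    start = clock()
    try:
        resolve(name)
    except OSError:
        pass  # a failed lookup still tells us how long we waited
    return clock() - start


def looks_like_timeout(first, second, timeout=2.0, slack=0.1):
    """True when the first lookup sat just past `timeout` and the repeat was fast."""
    return timeout <= first <= timeout + slack and second < slack
```

Time the same external name twice in a row with time_lookup and feed both durations to looks_like_timeout. A first lookup that lands in a narrow band just above a round number, followed by a near-instant repeat, is the signature described above.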

This issue had been filed as a curiosity - perhaps we had not provisioned enough CPU to kube-dns? Maybe it was time to set up the Kube DNS autoscaler? Whatever it was, it was something to look into in the future - iterating on performance could come after we’d established some basic functionality with the app we’re working on.

Today I was doing just that - iterating on basic app functionality, when I reached a bunch of stack traces of

getaddrinfo: Name or service not known

when the app was trying to look up the IP of a database that was inside our network. Eventually I dusted off the two-second DNS issue - given how fast the exception was being thrown, perhaps we weren’t waiting around long enough to get lookups?

A few hours of stracing the application and kube-dns, liberally tcpdumping, and talking through the flow with an excellent coworker later, I had a solid picture of our two-second timeouts. Remember our golang DNS server and caching proxy? The flow of a lookup through our kube-dns setup, for a name that wasn’t in cache and wasn’t a kube record, looked like:

query comes in to the proxy
proxy falls through and sends query to the golang service
golang service forwards the query back to the proxy
proxy dutifully forwards it to the golang service again
wait two seconds
golang service forwards request to our internal DNS servers, gets a response, and responds to the proxy

What the hell? Why did it loop like that? Why did we wait for two seconds? (Our hosts were configured with a resolution timeout of one second, not two!) Remember from above that kube-dns itself doesn’t use ClusterFirst for DNS resolving (that wouldn’t make sense) - so what does it do? It gets the host’s /etc/resolv.conf mounted in instead. Our host’s /etc/resolv.conf looked something like:

search foo-datacenter.example.net bar-datacenter.example.net baz-datacenter.example.net
nameserver 127.0.0.1
nameserver 10.1.1.1         # our internal DNS servers
nameserver 10.2.2.2
nameserver 10.3.3.3

The root of the problem is here: the 127.0.0.1 entry. On the host itself, that points to a local dnscache instance. Inside the kube-dns pod’s network namespace, though, 127.0.0.1 is the pod. And the nameserver listening on port 53 is…the DNS caching proxy. The golang service is where external DNS lookups ultimately happen, so it was dutifully first trying 127.0.0.1…and looping requests back to itself. This is also what happened in my home setup - it wasn’t the host’s dnsmasq itself that was causing trouble, it was that installing dnsmasq on Ubuntu automatically sets /etc/resolv.conf to point to localhost:

$ # without dnsmasq
$ cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.1.1

$ sudo apt install dnsmasq
Reading package lists... Done
...
Setting up dnsmasq (2.75-1ubuntu0.16.04.1) ...
$ cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.0.1

At work, I fixed the problem by deploying a second configuration to /etc/kubernetes/resolv.conf that didn’t include 127.0.0.1, and pointing the kubelet to it with the resolv-conf option.
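
The second file can be derived from the host's own rather than written by hand. Here is a minimal sketch of that idea, assuming all you need is to drop loopback nameserver lines; the function name is hypothetical and this is not the tooling we actually used:

```python
import ipaddress


def strip_loopback_nameservers(resolv_conf_text):
    """Return resolv.conf text with loopback `nameserver` lines removed."""
    kept = []
    for line in resolv_conf_text.splitlines():
        # Ignore trailing comments when deciding what the line says.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            try:
                if ipaddress.ip_address(fields[1]).is_loopback:
                    continue  # inside a pod this would point back at the pod
            except ValueError:
                pass  # not a plain IP literal; keep the line untouched
        kept.append(line)
    return "\n".join(kept) + "\n"
```

Run against the host file shown earlier, this keeps the search line and the three internal servers and drops only 127.0.0.1. The result is what would get written to /etc/kubernetes/resolv.conf for the kubelet to hand to pods.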

There are two more interesting mysteries here:

why did the lookups only loop around once?
why two seconds for the timeout?

The answers can be found in the code! The DNS lookups are ultimately handled by the miekg/dns golang library, which has a neat feature to save extra lookups if the same one is already in flight. When the second lookup came through, it just held on to it rather than forwarding it back to the proxy, knowing it’d have an answer soon.
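
The general technique is sometimes called single-flight. The sketch below is a Python illustration of the idea only, not the miekg/dns implementation; the class and method names are mine:

```python
import threading


class SingleFlight:
    """Share one in-flight call per key among concurrent callers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> (done event, result holder)

    def do(self, key, fn):
        with self._lock:
            entry = self._calls.get(key)
            leader = entry is None
            if leader:
                entry = (threading.Event(), {})
                self._calls[key] = entry
        done, box = entry
        if not leader:
            done.wait()  # a duplicate: park until the first caller finishes
        else:
            try:
                box["value"] = fn()
            except Exception as exc:
                box["error"] = exc
            finally:
                with self._lock:
                    del self._calls[key]
                done.set()
        if "error" in box:
            raise box["error"]
        return box["value"]
```

With something like this in front of the upstream query, the looped-back copy of a lookup becomes a waiter on the original instead of a new forward. That would explain a single trip around the loop rather than an endless one: the duplicate just sits there until the first attempt times out and moves on to the next nameserver.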

The two-second timeout comes from the golang runtime’s strong separation from Unix tooling - rather than use glibc functions for DNS lookup, it has its own - thus ignoring the settings we had in /etc/resolv.conf. The service’s code defines a read timeout of two seconds for DNS lookups.

So there you have it! Your DNS setup outside of Kubernetes can have profound implications on whether or not kube-dns works. The world inside of Kubernetes is quite nicely formalized, but the expectations kube has of your environment are less explicit.


blog.sophaskins.net | personal essays