/etc/resolv.conf: the search and ndots options

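First, some background on why I had to look into /etc/resolv.conf. A service running in our k8s cluster ran into trouble. After careful investigation we found that DNS resolution was broken in one of its Pods: the domain login.example.com resolved to an IP address that actually came from an unrelated wildcard record, *.ichenfu.com. After some digging, the cause turned out to be a colleague's mistake when configuring kubelet on one machine: the clusterDomain setting, which should have been c2.ichenfu.com, had been written as c1.ichenfu.com. So the question is: why would this misconfiguration make DNS return a wrong and completely unrelated address? Let's work through it step by step.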

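First, let's reconstruct the scenario. By default, when kubelet starts a Pod, it injects a DNS configuration into it. The /etc/resolv.conf in the affected Pod looked like this: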

nameserver 10.254.0.2
search default.svc.c1.ichenfu.com svc.c1.ichenfu.com c1.ichenfu.com localdomain
options ndots:5

The configuration file (Corefile) of CoreDNS, which provides DNS resolution for the k8s cluster, was:

.:53 {
        errors
        health
        kubernetes cluster.local c2.ichenfu.com in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        reload
        loadbalance
    }

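A quick explanation of this CoreDNS configuration: roughly, it accepts all queries and enables the kubernetes plugin. When a query falls under the cluster.local or c2.ichenfu.com zones (or the reverse-lookup zones in-addr.arpa and ip6.arpa), CoreDNS looks it up in the k8s data and returns the corresponding record. Queries for any other domain are proxied to the local DNS servers listed in /etc/resolv.conf.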
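
To make the zone check concrete, here is a minimal Python sketch of which part of the Corefile would handle a given query name. It is a simplified model, not CoreDNS's actual code: the zone list is copied from the kubernetes line above, the function names are hypothetical, and fallthrough, caching and reverse-lookup details are ignored.

```python
# Simplified model of the routing decision in the Corefile above.
# Assumption: only the kubernetes zone list and the catch-all proxy are modeled.

KUBERNETES_ZONES = ["cluster.local", "c2.ichenfu.com", "in-addr.arpa", "ip6.arpa"]


def in_zone(qname: str, zone: str) -> bool:
    """True if qname equals zone or is a subdomain of it (case-insensitive)."""
    name = qname.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    # Match on a label boundary, so "evilc2.ichenfu.com" is NOT inside "c2.ichenfu.com".
    return name == zone or name.endswith("." + zone)


def route(qname: str, zones=KUBERNETES_ZONES) -> str:
    """Return which part of the Corefile would answer qname."""
    for zone in zones:
        if in_zone(qname, zone):
            return "kubernetes"  # answered from k8s data
    return "forward"  # handled by `proxy . /etc/resolv.conf`


if __name__ == "__main__":
    for name in [
        "kubernetes.default.svc.c2.ichenfu.com",
        "login.example.com.default.svc.c2.ichenfu.com",
        "login.example.com.default.svc.c1.ichenfu.com",
    ]:
        print(f"{name:50} -> {route(name)}")
```

With the misconfigured c1.ichenfu.com suffix, a name that looks "cluster-internal" is not inside any zone CoreDNS serves, so it is forwarded.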

So the question is: why was a query for login.example.com ultimately answered by the *.ichenfu.com record?

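The root cause is the Pod's resolv.conf, specifically the two lines "search default.svc.c1.ichenfu.com svc.c1.ichenfu.com c1.ichenfu.com localdomain" and "options ndots:5". Together they mean that for any name containing fewer than 5 dots, the resolver first tries the name with each domain from the search list appended, in order, and only if none of those returns an answer does it finally query the name itself.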
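
The ordering rule can be sketched in a few lines of Python. This is a hedged model based on the resolv.conf(5) text quoted later in this post (a name with at least ndots dots is tried as-is first, otherwise the search list comes first), plus the standard resolver convention that a trailing dot marks a name as already absolute. The function name candidates is made up for illustration; it lists the full order of attempts, whereas a real resolver stops at the first name that gets an answer.

```python
SEARCH = [
    "default.svc.c1.ichenfu.com",
    "svc.c1.ichenfu.com",
    "c1.ichenfu.com",
    "localdomain",
]


def candidates(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Return the query names a search/ndots resolver would try, in order."""
    if name.endswith("."):
        # Trailing dot: the name is absolute, the search list is not used.
        return [name.rstrip(".")]
    with_search = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [name] + with_search
    # Too few dots: every search domain first, the bare name last.
    return with_search + [name]


if __name__ == "__main__":
    for qname in candidates("login.example.com", SEARCH, ndots=5):
        print(qname)
```

With ndots:5, login.example.com has only 2 dots, so the four search-list names come before the name itself.
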
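So for login.example.com, the resolver first queries login.example.com.default.svc.c1.ichenfu.com, then login.example.com.svc.c1.ichenfu.com, then login.example.com.c1.ichenfu.com, then login.example.com.localdomain, and only if none of these resolves does it finally query login.example.com. Normally, after these 4 wasted queries it still gets the correct result (with the correct c2.ichenfu.com suffix, CoreDNS answers the three cluster-suffixed names with NXDOMAIN). In this case, however, because of the misconfiguration, the very first name, login.example.com.default.svc.c1.ichenfu.com, does not fall under either of the zones CoreDNS serves (cluster.local and c2.ichenfu.com). CoreDNS therefore forwards it to the local DNS, which resolves it recursively as usual, and the query ends up matching the *.ichenfu.com wildcard record. Because that first query already gets an answer, the resolver stops there and never queries login.example.com itself.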
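
The whole failure can be reproduced with a toy simulation. Everything in it is hypothetical: the IPs come from documentation ranges, the upstream "zone" holds just one exact record and one wildcard, the wildcard is simplified to match any name below ichenfu.com, and CoreDNS is modeled as returning NXDOMAIN for every name inside its zones (no in-cluster records are needed here).

```python
"""Toy simulation of the misconfigured lookup (all data is hypothetical)."""
from typing import Optional

SEARCH_C1 = ["default.svc.c1.ichenfu.com", "svc.c1.ichenfu.com", "c1.ichenfu.com", "localdomain"]
NDOTS = 5
COREDNS_ZONES = ["cluster.local", "c2.ichenfu.com"]

# Hypothetical upstream data: one exact record and one wildcard (*.ichenfu.com).
UPSTREAM_EXACT = {"login.example.com": "198.51.100.10"}
UPSTREAM_WILDCARD = {"ichenfu.com": "192.0.2.1"}


def in_zone(name: str, zone: str) -> bool:
    return name == zone or name.endswith("." + zone)


def upstream(name: str) -> Optional[str]:
    """Recursive answer from the hypothetical upstream DNS; None means NXDOMAIN."""
    if name in UPSTREAM_EXACT:
        return UPSTREAM_EXACT[name]
    for parent, ip in UPSTREAM_WILDCARD.items():
        # Simplification: the wildcard matches any name strictly below `parent`.
        if name.endswith("." + parent):
            return ip
    return None


def coredns(name: str) -> Optional[str]:
    """Names in CoreDNS's zones get NXDOMAIN here; everything else is forwarded."""
    if any(in_zone(name, zone) for zone in COREDNS_ZONES):
        return None
    return upstream(name)


def resolve(name: str, search=SEARCH_C1, ndots: int = NDOTS) -> tuple[str, Optional[str]]:
    """Try names in search/ndots order (trailing dots not handled) and stop at the first answer."""
    if name.count(".") >= ndots:
        order = [name] + [f"{name}.{d}" for d in search]
    else:
        order = [f"{name}.{d}" for d in search] + [name]
    for qname in order:
        answer = coredns(qname)
        if answer is not None:
            return qname, answer
    return name, None


if __name__ == "__main__":
    print(resolve("login.example.com"))
```

With the c1 search list, the first candidate is forwarded and caught by the wildcard; passing a c2 search list instead makes the three cluster-suffixed names fail inside CoreDNS, localdomain fail upstream, and the bare name return its real record.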

After the kubelet configuration was corrected, the problem went away.

Finally, let's look at what the resolv.conf documentation actually says about these options:

search Search list for host-name lookup.

The search list is normally determined from the local domain name; by default, it contains only the local domain name. This may be changed by listing the desired domain search path following the search keyword with spaces or tabs separating the names. Resolver queries having fewer than ndots dots (default is 1) in them will be attempted using each component of the search path in turn until a match is found. For environments with multiple subdomains please read options ndots:n below to avoid man-in-the-middle attacks and unnecessary traffic for the root-dns-servers. Note that this process may be slow and will generate a lot of network traffic if the servers for the listed domains are not local, and that queries will time out if no server is available for one of the domains.
The search list is currently limited to six domains with a total of 256 characters.

options option …

where option is one of the following:

ndots:n

sets a threshold for the number of dots which must appear in a name given to res_query(3) (see resolver(3)) before an initial absolute query will be made. The default for n is 1, meaning that if there are any dots in a name, the name will be tried first as an absolute name before any search list elements are appended to it. The value for this option is silently capped to 15.
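
As a small illustration of those documented defaults, here is a minimal Python parser for just the nameserver, search and options ndots:n keywords. It is a sketch, not glibc's parser: the names are hypothetical, comment handling is simplified, and every other option is ignored. It applies the default of 1 and the silent cap at 15 described above, and keeps the last search line if there are several.

```python
from dataclasses import dataclass, field

NDOTS_DEFAULT = 1  # "The default for n is 1"
NDOTS_MAX = 15     # "The value for this option is silently capped to 15"


@dataclass
class ResolvConf:
    nameservers: list[str] = field(default_factory=list)
    search: list[str] = field(default_factory=list)
    ndots: int = NDOTS_DEFAULT


def parse_resolv_conf(text: str) -> ResolvConf:
    """Parse only the nameserver, search and `options ndots:n` keywords."""
    conf = ResolvConf()
    for raw in text.splitlines():
        line = raw.strip()
        # Simplification: treat any line starting with '#' or ';' as a comment.
        if not line or line[0] in "#;":
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            conf.nameservers.append(args[0])
        elif keyword == "search":
            conf.search = list(args)  # if repeated, the last instance wins
        elif keyword == "options":
            for opt in args:
                if opt.startswith("ndots:"):
                    value = opt[len("ndots:"):]
                    if value.isdigit():
                        conf.ndots = min(int(value), NDOTS_MAX)
    return conf


POD_RESOLV_CONF = """\
nameserver 10.254.0.2
search default.svc.c1.ichenfu.com svc.c1.ichenfu.com c1.ichenfu.com localdomain
options ndots:5
"""

if __name__ == "__main__":
    print(parse_resolv_conf(POD_RESOLV_CONF))
```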

That's what the documentation says, at least. But does it really behave that way in practice? Let's test it with the host command:

/ # host -v www.baidu.com
Trying "www.baidu.com.default.svc.c1.ichenfu.com"
Trying "www.baidu.com.svc.c1.ichenfu.com"
Trying "www.baidu.com.c1.ichenfu.com"
Trying "www.baidu.com.localdomain"
Trying "www.baidu.com"
...

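It does. This means that with the default configuration, resolving domain names inside a container is quite expensive: most domain names contain fewer than 5 dots, so most lookups of external domains take 5 queries (4 search-list candidates plus the name itself) before they succeed.
So when deploying workloads, you may need, depending on the situation, to explicitly inject an external DNS configuration into the Pod.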
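
The cost claim is simple arithmetic, sketched below under a stated assumption: every search-list candidate fails and only the name itself has an answer. The count is per record type, so a client that asks for both A and AAAA records would send roughly twice as many queries. The function name is hypothetical.

```python
SEARCH_LEN = 4  # the Pod's search line lists 4 domains


def query_names_sent(name: str, search_len: int, ndots: int) -> int:
    """Names tried for one record type, assuming only the bare name has an answer."""
    if name.endswith("."):
        return 1  # trailing dot: absolute name, search list not used
    if name.count(".") >= ndots:
        return 1  # tried as-is first, and it succeeds
    return search_len + 1  # every search candidate first, then the name itself


if __name__ == "__main__":
    for name in ["www.baidu.com", "login.example.com", "www.baidu.com."]:
        for ndots in (5, 1):
            print(name, f"ndots:{ndots}", query_names_sent(name, SEARCH_LEN, ndots))
```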

References:

  1. https://linux.die.net/man/5/resolv.conf