Today we ran into a rather odd problem in our environment: mounting a CephFS on a virtual machine failed with a puzzling "failed: No such process" error. Here is what it looked like:

]# mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
mount: mount mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ on /tmp/data failed: No such process

Strange. Why would mount report "No such process"? My first guess was a kernel module loading problem, so I checked which modules were loaded:

]# lsmod |grep ceph
ceph 358802 0
libceph 306625 1 ceph
libcrc32c 12644 1 libceph
dns_resolver 13140 1 libceph

So it is not a kernel module problem: the modules are loaded, and if a module were missing, mount would report "unknown filesystem type" rather than the error above.
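
One way to double-check that reasoning is to look at /proc/filesystems, which lists the filesystem types the running kernel has registered. The helper below is a minimal sketch: the name fs_registered and its optional second argument (an alternative file to read, handy for testing) are illustrative additions, not part of any existing tool.

```shell
# fs_registered NAME [FILE]
# Succeeds if NAME appears as a filesystem type in FILE
# (default: /proc/filesystems). Each line there is either
# "nodev<TAB>name" or "<TAB>name", so the type is the last field.
fs_registered() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' \
        "${2:-/proc/filesystems}"
}
```

For example, `fs_registered ceph && echo "ceph is registered"` would confirm that the mount failure is not caused by a missing filesystem type.
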
What next? Let's run the mount under strace and look at the system calls:

]# strace -f mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
...
stat("/sbin/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
stat("/sbin/fs.d/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
stat("/sbin/fs/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
mount("mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/", "/tmp/data", "ceph", MS_MGC_VAL, NULL) = -1 ESRCH (No such process)
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb3a36e4000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2502
read(3, "", 4096) = 0
close(3) = 0
...

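The problem is in the mount system call itself, which really does return ESRCH, and "No such process" is simply the message for that errno. A quick search showed that this error is mostly associated with kill, when the target process cannot be found, so why would mount return it? I was about to read the relevant code, but first I checked the kernel log with dmesg and found something: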

]# dmesg
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Initializing cgroup subsys cpuacct
...
[ 126.629937] Key type dns_resolver registered
[ 126.685004] Key type ceph registered
[ 126.685817] libceph: loaded (mon/osd proto 15/24)
[ 126.717441] ceph: loaded (mds proto 32)
[ 126.718859] libceph: resolve 'mon1.ichenfu.com' (ret=-3): failed
[ 126.718862] libceph: parse_ips bad ip 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789'
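
These messages come from the kernel's own resolution path, which is separate from the resolver that ordinary programs use. To compare the two, each monitor name can be resolved from user space with getent. This is a rough sketch: mon_hosts and check_mon_dns are hypothetical helpers that only handle the host:port,host:port:/path form used here, not bracketed IPv6 addresses.

```shell
# mon_hosts SOURCE
# Print one monitor host per line from a CephFS mount source such as
# "mon1:6789,mon2:6789:/". Assumes no bracketed IPv6 addresses.
mon_hosts() {
    src=${1%%:/*}                       # drop the trailing ":/path"
    printf '%s\n' "$src" | tr ',' '\n' | sed 's/:[0-9]*$//'
}

# check_mon_dns SOURCE
# Resolve each monitor host through the user-space resolver (NSS).
check_mon_dns() {
    mon_hosts "$1" | while read -r host; do
        if getent hosts "$host" >/dev/null; then
            echo "ok:   $host"
        else
            echo "FAIL: $host"
        fi
    done
}
```

Running `check_mon_dns mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/` shows whether user space can resolve every monitor. If it can while the kernel still logs a resolve failure, the problem lies in the kernel-side lookup rather than in the DNS configuration.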

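The kernel messages contain an important clue: hostname resolution appears to be failing. But I had verified the local DNS configuration, and resolution itself works fine, so why this error? Let's first try mounting by IP address to see whether resolution really is the problem: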

]# mount -v -t ceph 192.168.1.100:6789,192.168.1.101:6789,192.168.1.102:6789:/ /tmp/data
mount: 192.168.1.100:6789,192.168.1.101:6789,192.168.1.102:6789:/ mounted on /tmp/data.

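Success! So it is definitely a name resolution problem, which gives us something concrete to look for in the code. Searching the kernel source for the keywords "resolve" and "failed" from the message resolve 'mon1.ichenfu.com' (ret=-3): failed leads to net/ceph/messenger.c: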

static int ceph_dns_resolve_name(const char *name, size_t namelen,
		struct ceph_entity_addr *addr, char delim, const char **ipend)
{
	const char *end, *delim_p;
	char *colon_p, *ip_addr = NULL;
	int ip_len, ret;

	/*
	 * The end of the hostname occurs immediately preceding the delimiter or
	 * the port marker (':') where the delimiter takes precedence.
	 */
	delim_p = memchr(name, delim, namelen);
	colon_p = memchr(name, ':', namelen);

	if (delim_p && colon_p)
		end = delim_p < colon_p ? delim_p : colon_p;
	else if (!delim_p && colon_p)
		end = colon_p;
	else {
		end = delim_p;
		if (!end) /* case: hostname:/ */
			end = name + namelen;
	}

	if (end <= name)
		return -EINVAL;

	/* do dns_resolve upcall */
	// Call dns_query to perform the DNS lookup
	ip_len = dns_query(current->nsproxy->net_ns,
			   NULL, name, end - name, NULL, &ip_addr, NULL, false);
	if (ip_len > 0)
		ret = ceph_pton(ip_addr, ip_len, addr, -1, NULL);
	else
		// On failure this always returns -ESRCH, so the value that
		// dns_query actually returned in ip_len is lost
		ret = -ESRCH;

	kfree(ip_addr);

	*ipend = end;

	pr_info("resolve '%.*s' (ret=%d): %s\n", (int)(end - name), name,
			ret, ret ? "failed" : ceph_pr_addr(addr));

	return ret;
}

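This should be where ESRCH comes from, but we still lack information: we do not know what dns_query actually returned. So let's look at the implementation of dns_query, which lives in net/dns_resolver/dns_query.c and belongs to the dns_resolver module: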

int dns_query(struct net *net,
	      const char *type, const char *name, size_t namelen,
	      const char *options, char **_result, time64_t *_expiry,
	      bool invalidate)
{
	struct key *rkey;
	struct user_key_payload *upayload;
	const struct cred *saved_cred;
	size_t typelen, desclen;
	char *desc, *cp;
	int ret, len;

	// Log on function entry
	kenter("%s,%*.*s,%zu,%s",
	       type, (int)namelen, (int)namelen, name, namelen, options);

	if (!name || namelen == 0)
		return -EINVAL;

	/* construct the query key description as "[<type>:]<name>" */
	typelen = 0;
	desclen = 0;
	if (type) {
		typelen = strlen(type);
		if (typelen < 1)
			return -EINVAL;
		desclen += typelen + 1;
	}

	if (namelen < 3 || namelen > 255)
		return -EINVAL;
	desclen += namelen + 1;

	desc = kmalloc(desclen, GFP_KERNEL);
	if (!desc)
		return -ENOMEM;

	cp = desc;
	if (type) {
		memcpy(cp, type, typelen);
		cp += typelen;
		*cp++ = ':';
	}
	memcpy(cp, name, namelen);
	cp += namelen;
	*cp = '\0';

	if (!options)
		options = "";
	// Kernel debug log
	kdebug("call request_key(,%s,%s)", desc, options);

	/* make the upcall, using special credentials to prevent the use of
	 * add_key() to preinstall malicious redirections
	 */
	saved_cred = override_creds(dns_resolver_cache);
	rkey = request_key_net(&key_type_dns_resolver, desc, net, options);
	revert_creds(saved_cred);
	kfree(desc);
	if (IS_ERR(rkey)) {
		ret = PTR_ERR(rkey);
		goto out;
	}

	down_read(&rkey->sem);
	set_bit(KEY_FLAG_ROOT_CAN_INVAL, &rkey->flags);
	rkey->perm |= KEY_USR_VIEW;

	ret = key_validate(rkey);
	if (ret < 0)
		goto put;

	/* If the DNS server gave an error, return that to the caller */
	ret = PTR_ERR(rkey->payload.data[dns_key_error]);
	if (ret)
		goto put;

	upayload = user_key_payload_locked(rkey);
	len = upayload->datalen;

	if (_result) {
		ret = -ENOMEM;
		*_result = kmemdup_nul(upayload->data, len, GFP_KERNEL);
		if (!*_result)
			goto put;
	}

	if (_expiry)
		*_expiry = rkey->expiry;

	ret = len;
put:
	up_read(&rkey->sem);
	if (invalidate)
		key_invalidate(rkey);
	key_put(rkey);
out:
	// Log on function exit
	kleave(" = %d", ret);
	return ret;
}
EXPORT_SYMBOL(dns_query);

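Leaving the overall logic aside for now, look at the places where the function logs. If we can turn on the debug logging, we will get more detail. kenter and kleave are two macros: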

/*
 * debug tracing
 */
extern unsigned int dns_resolver_debug;

#define kdebug(FMT, ...)				\
do {							\
	if (unlikely(dns_resolver_debug))		\
		printk(KERN_DEBUG "[%-6.6s] "FMT"\n",	\
		       current->comm, ##__VA_ARGS__);	\
} while (0)

#define kenter(FMT, ...) kdebug("==> %s("FMT")", __func__, ##__VA_ARGS__)
#define kleave(FMT, ...) kdebug("<== %s()"FMT"", __func__, ##__VA_ARGS__)

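In other words, when dns_resolver_debug is non-zero, debug messages are emitted through printk. dns_resolver_debug is most likely passed in as a module parameter, and net/dns_resolver/dns_key.c does contain the parameter definition: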

...
MODULE_DESCRIPTION("DNS Resolver");
MODULE_AUTHOR("Wang Lei");
MODULE_LICENSE("GPL");

unsigned int dns_resolver_debug;
module_param_named(debug, dns_resolver_debug, uint, 0644);
MODULE_PARM_DESC(debug, "DNS Resolver debugging mask");
...

dns_resolver_debug is controlled by the debug parameter, so this is easy: reload the module by hand with the debug parameter set:

]# rmmod ceph                        # unload the dependent ceph module first
]# rmmod libceph                     # unload libceph
]# rmmod dns_resolver                # unload dns_resolver
]# modprobe dns_resolver debug=1     # load dns_resolver with debug=1
]# modprobe ceph                     # load the ceph module
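
Because the debug parameter is declared with mode 0644, it should also be exposed as a root-writable file under /sys/module/dns_resolver/parameters/, so reloading the modules may not be strictly necessary. This was not part of the original troubleshooting, so treat it as a sketch: set_dns_debug is a hypothetical helper, and its second argument exists only so the function can be pointed at a different file.

```shell
# set_dns_debug VALUE [FILE]
# Write VALUE to the dns_resolver "debug" module parameter.
# FILE defaults to the sysfs path that module_param_named(debug, ..., 0644)
# is expected to create; writing to it requires root.
set_dns_debug() {
    param=${2:-/sys/module/dns_resolver/parameters/debug}
    if [ ! -w "$param" ]; then
        echo "cannot write $param (module not loaded, or not root?)" >&2
        return 1
    fi
    printf '%s\n' "$1" > "$param"
}
```

The idea is to call `set_dns_debug 1`, repeat the mount, read dmesg, and then call `set_dns_debug 0` to silence the tracing again.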

Try the mount again and check dmesg:

]# mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
mount: mount mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ on /tmp/data failed: No such process
]# dmesg
...
[ 3056.185724] [mount ] ==> dns_query((null),mon1.ichenfu.com,16,(null))
[ 3056.185733] [mount ] call request_key(,mon1.ichenfu.com,)
[ 3056.185910] [mount ] <== dns_query() = -2
[ 3056.185916] libceph: resolve 'mon1.ichenfu.com' (ret=-3): failed
[ 3056.185921] libceph: parse_ips bad ip 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789'
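
Two different error numbers appear in this log: dns_query returned -2, while libceph still reports ret=-3. Kernel functions return negated errno values, so both can be decoded by hand. The lookup below is a small sketch covering only the codes that show up in this post (values as defined in the Linux errno-base.h header); errno_text is a made-up helper name.

```shell
# errno_text N
# Map a (possibly negated) errno value to its name and message.
# Only the handful of codes relevant to this post are listed.
errno_text() {
    case "${1#-}" in
        2)  echo "ENOENT: No such file or directory" ;;
        3)  echo "ESRCH: No such process" ;;
        12) echo "ENOMEM: Cannot allocate memory" ;;
        22) echo "EINVAL: Invalid argument" ;;
        *)  echo "unknown errno: $1"; return 1 ;;
    esac
}
```

`errno_text -2` prints the ENOENT line and `errno_text -3` prints the ESRCH line. The -3 is the fixed -ESRCH that ceph_dns_resolve_name substitutes for any dns_query failure, which is why the message printed by mount hides the real cause.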

dns_query returned -2. Checking this errno with perror gives: OS error code 2: No such file or directory. A file was not found, but which file? Let's read the module's documentation at https://www.kernel.org/doc/Documentation/networking/dns_resolver.txt

========

OVERVIEW

The DNS resolver module provides a way for kernel services to make DNS queries
by way of requesting a key of key type dns_resolver. These queries are
upcalled to userspace through /sbin/request-key.

These routines must be supported by userspace tools dns.upcall, cifs.upcall and
request-key. It is under development and does not yet provide the full feature
set. The features it does support include:

(*) Implements the dns_resolver key_type to contact userspace.

It does not yet support the following AFS features:

(*) Dns query support for AFSDB resource record.

This code is extracted from the CIFS filesystem.

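The module gives the kernel a way to query DNS records, and the queries are carried out in user space through /sbin/request-key. In other words, the module depends on the /sbin/request-key program.
I checked the machine and, sure enough, that program was missing. A quick search showed that it is provided by the keyutils package. After running yum install -y keyutils, the problem was solved.
The documentation also mentions a configuration file, /etc/request-key.conf, so let's see what it contains: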

...
#OP     TYPE    DESCRIPTION     CALLOUT INFO    PROGRAM ARG1 ARG2 ARG3 ...
#====== ======= =============== =============== ===============================
create  dns_resolver *  *               /sbin/key.dns_resolver %k
...

So there is a further dependency on /sbin/key.dns_resolver, but that is also part of the keyutils package.
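
Both helpers can be checked up front on other machines before mounting by hostname. A minimal sketch, where missing_helpers is a hypothetical function name and the paths to check are passed in by the caller:

```shell
# missing_helpers PATH...
# Print every PATH that is not an executable regular file.
# Returns non-zero if at least one is missing.
missing_helpers() {
    rc=0
    for p in "$@"; do
        if [ ! -f "$p" ] || [ ! -x "$p" ]; then
            echo "missing: $p"
            rc=1
        fi
    done
    return $rc
}
```

For the case in this post that would be `missing_helpers /sbin/request-key /sbin/key.dns_resolver || echo "install keyutils"`.
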
Well, the problem is finally solved.