The "failed: No such process" error when mounting CephFS

Today we ran into a rather odd problem in our environment: mounting a CephFS on a virtual machine failed with "failed: No such process". This is what it looked like:

]# mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
mount: mount mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ on /tmp/data failed: No such process

Why "No such process"? My first guess was a kernel module loading problem, so I checked which modules were loaded:

]# lsmod |grep ceph
ceph                  358802  0
libceph               306625  1 ceph
libcrc32c              12644  1 libceph
dns_resolver           13140  1 libceph

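The kernel modules were not the problem: they were already loaded, and a missing module would normally produce "unknown filesystem type" rather than the error above.
So what next? I ran the mount under strace to look at the actual system calls: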

]# strace -f mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
...
stat("/sbin/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
stat("/sbin/fs.d/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
stat("/sbin/fs/mount.ceph", 0x7ffd39101680) = -1 ENOENT (No such file or directory)
mount("mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/", "/tmp/data", "ceph", MS_MGC_VAL, NULL) = -1 ESRCH (No such process)
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb3a36e4000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2502
read(3, "", 4096)                       = 0
close(3)                                = 0
...

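The failure is in the mount system call itself, which really does return ESRCH, and the message for that errno is the "No such process" we saw. A quick search showed that this error is mostly associated with kill() when the target process cannot be found, so why would mount return it? I was about to read the relevant code, but first checked the kernel log with dmesg and found something: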

]# dmesg
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
...
[  126.629937] Key type dns_resolver registered
[  126.685004] Key type ceph registered
[  126.685817] libceph: loaded (mon/osd proto 15/24)
[  126.717441] ceph: loaded (mds proto 32)
[  126.718859] libceph: resolve 'mon1.ichenfu.com' (ret=-3): failed
[  126.718862] libceph: parse_ips bad ip 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789'

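The kernel messages point at name resolution. But I had checked the local DNS configuration, and resolution from user space worked fine, so why this error? To confirm that resolution really was the culprit, I tried mounting by IP address: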

]# mount -v -t ceph 192.168.1.100:6789,192.168.1.101:6789,192.168.1.102:6789:/ /tmp/data
mount: 192.168.1.100:6789,192.168.1.101:6789,192.168.1.102:6789:/ mounted on /tmp/data.

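Success! So the problem was definitely name resolution, and I now knew where to look in the code. Searching the kernel source for the keywords "resolve" and "failed" from the message resolve 'mon1.ichenfu.com' (ret=-3): failed led to net/ceph/messenger.c: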

static int ceph_dns_resolve_name(const char *name, size_t namelen,
		struct ceph_entity_addr *addr, char delim, const char **ipend)
{
	const char *end, *delim_p;
	char *colon_p, *ip_addr = NULL;
	int ip_len, ret;

	/*
	 * The end of the hostname occurs immediately preceding the delimiter or
	 * the port marker (':') where the delimiter takes precedence.
	 */
	delim_p = memchr(name, delim, namelen);
	colon_p = memchr(name, ':', namelen);

	if (delim_p && colon_p)
		end = delim_p < colon_p ? delim_p : colon_p;
	else if (!delim_p && colon_p)
		end = colon_p;
	else {
		end = delim_p;
		if (!end) /* case: hostname:/ */
			end = name + namelen;
	}

	if (end <= name)
		return -EINVAL;

	/* do dns_resolve upcall */
	// call dns_query to look the name up via DNS
	ip_len = dns_query(current->nsproxy->net_ns,
			   NULL, name, end - name, NULL, &ip_addr, NULL, false);
	if (ip_len > 0)
		ret = ceph_pton(ip_addr, ip_len, addr, -1, NULL);
	else
		// on failure, return -ESRCH; the value dns_query actually returned in ip_len is lost here
		ret = -ESRCH;

	kfree(ip_addr);

	*ipend = end;

	pr_info("resolve '%.*s' (ret=%d): %s\n", (int)(end - name), name,
			ret, ret ? "failed" : ceph_pr_addr(addr));

	return ret;
}
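
To make the hostname-extraction step concrete, here is a rough user-space sketch in shell of what the code above does before it calls dns_query: it cuts the monitor string at the first ':' or delimiter. The function name first_mon_host is made up for illustration, and ',' is assumed to be the delimiter libceph passes in; the kernel itself does this in C with memchr.

```shell
#!/bin/sh
# User-space illustration only; not the kernel implementation.

# Print the host part of a monitor string: everything before the first
# ':' (port marker) or ',' (assumed list delimiter), whichever comes first.
# With neither present, the whole string is treated as the hostname.
first_mon_host() {
    printf '%s\n' "${1%%[:,]*}"
}

host=$(first_mon_host 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789')
echo "host=$host namelen=${#host}"
```

For the mount string used in this post, this yields mon1.ichenfu.com with a length of 16, which is the name and namelen handed to dns_query.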

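This is where ESRCH comes from, but it still does not tell us enough, because the value dns_query actually returned is not visible. So let's look at dns_query, which is implemented in net/dns_resolver/dns_query.c as part of the dns_resolver module: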

int dns_query(struct net *net,
	      const char *type, const char *name, size_t namelen,
	      const char *options, char **_result, time64_t *_expiry,
	      bool invalidate)
{
	struct key *rkey;
	struct user_key_payload *upayload;
	const struct cred *saved_cred;
	size_t typelen, desclen;
	char *desc, *cp;
	int ret, len;

	// log on function entry
	kenter("%s,%*.*s,%zu,%s",
	       type, (int)namelen, (int)namelen, name, namelen, options);

	if (!name || namelen == 0)
		return -EINVAL;

	/* construct the query key description as "[<type>:]<name>" */
	typelen = 0;
	desclen = 0;
	if (type) {
		typelen = strlen(type);
		if (typelen < 1)
			return -EINVAL;
		desclen += typelen + 1;
	}

	if (namelen < 3 || namelen > 255)
		return -EINVAL;
	desclen += namelen + 1;

	desc = kmalloc(desclen, GFP_KERNEL);
	if (!desc)
		return -ENOMEM;

	cp = desc;
	if (type) {
		memcpy(cp, type, typelen);
		cp += typelen;
		*cp++ = ':';
	}
	memcpy(cp, name, namelen);
	cp += namelen;
	*cp = '\0';

	if (!options)
		options = "";
	// kernel debug log
	kdebug("call request_key(,%s,%s)", desc, options);

	/* make the upcall, using special credentials to prevent the use of
	 * add_key() to preinstall malicious redirections
	 */
	saved_cred = override_creds(dns_resolver_cache);
	rkey = request_key_net(&key_type_dns_resolver, desc, net, options);
	revert_creds(saved_cred);
	kfree(desc);
	if (IS_ERR(rkey)) {
		ret = PTR_ERR(rkey);
		goto out;
	}

	down_read(&rkey->sem);
	set_bit(KEY_FLAG_ROOT_CAN_INVAL, &rkey->flags);
	rkey->perm |= KEY_USR_VIEW;

	ret = key_validate(rkey);
	if (ret < 0)
		goto put;

	/* If the DNS server gave an error, return that to the caller */
	ret = PTR_ERR(rkey->payload.data[dns_key_error]);
	if (ret)
		goto put;

	upayload = user_key_payload_locked(rkey);
	len = upayload->datalen;

	if (_result) {
		ret = -ENOMEM;
		*_result = kmemdup_nul(upayload->data, len, GFP_KERNEL);
		if (!*_result)
			goto put;
	}

	if (_expiry)
		*_expiry = rkey->expiry;

	ret = len;
put:
	up_read(&rkey->sem);
	if (invalidate)
		key_invalidate(rkey);
	key_put(rkey);
out:
	// log on function exit
	kleave(" = %d", ret);
	return ret;
}
EXPORT_SYMBOL(dns_query);
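
The first half of dns_query only builds the key description "[<type>:]<name>" that is passed to request_key. As a small illustration, the same rule can be mimicked in shell. This is a sketch, not the kernel's code: dns_key_desc is a hypothetical helper, an empty first argument stands in for type == NULL, and the type "afsdb" in the usage note is only an example value.

```shell
#!/bin/sh
# Build the description string "[<type>:]<name>" the way dns_query() does,
# including its length limits on the name (3 to 255).
dns_key_desc() {
    qtype=$1
    name=$2
    # ${#name} counts characters, which equals bytes for plain ASCII hostnames.
    if [ "${#name}" -lt 3 ] || [ "${#name}" -gt 255 ]; then
        echo "invalid name length: ${#name}" >&2
        return 1
    fi
    if [ -n "$qtype" ]; then
        printf '%s:%s\n' "$qtype" "$name"
    else
        printf '%s\n' "$name"
    fi
}

# libceph passes no type, so the description is just the hostname.
dns_key_desc '' 'mon1.ichenfu.com'
```

With a type, for example dns_key_desc afsdb example.com, the helper prints afsdb:example.com; a name shorter than three characters is rejected, mirroring the -EINVAL path.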

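Leaving the overall logic aside for now, look at where the function logs: if the debug logging can be switched on, we get more detail. kenter and kleave are two macros: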

/*
 * debug tracing
 */
extern unsigned int dns_resolver_debug;

#define	kdebug(FMT, ...)				\
do {							\
	if (unlikely(dns_resolver_debug))		\
		printk(KERN_DEBUG "[%-6.6s] "FMT"\n",	\
		       current->comm, ##__VA_ARGS__);	\
} while (0)

#define kenter(FMT, ...) kdebug("==> %s("FMT")", __func__, ##__VA_ARGS__)
#define kleave(FMT, ...) kdebug("<== %s()"FMT"", __func__, ##__VA_ARGS__)

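So when dns_resolver_debug is non-zero, the debug messages are emitted through printk. dns_resolver_debug is most likely passed in as a module parameter, and net/dns_resolver/dns_key.c does contain the module's parameter definitions: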

...
MODULE_DESCRIPTION("DNS Resolver");
MODULE_AUTHOR("Wang Lei");
MODULE_LICENSE("GPL");

unsigned int dns_resolver_debug;
module_param_named(debug, dns_resolver_debug, uint, 0644);
MODULE_PARM_DESC(debug, "DNS Resolver debugging mask");
...

dns_resolver_debug is controlled by the debug parameter, so the rest is simple: reload the module by hand with debug set:

]# rmmod ceph           # unload ceph first, since it depends on libceph
]# rmmod libceph        # unload libceph
]# rmmod dns_resolver   # unload dns_resolver
]# modprobe dns_resolver debug=1    # load dns_resolver with debug=1
]# modprobe ceph    # load the ceph module
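
Because the parameter is declared with mode 0644, it should also be exposed through sysfs, which may help where unloading the modules is inconvenient. The sketch below only reads the value. The path is an assumption derived from module_param_named(debug, ...) and the module name dns_resolver, and read_module_param is a hypothetical helper.

```shell
#!/bin/sh
# Read a module parameter from sysfs; prints "unavailable" when the file is
# missing or unreadable (module not loaded, or parameter not exported).
read_module_param() {
    param_file=$1
    if [ -r "$param_file" ]; then
        cat "$param_file"
    else
        echo "unavailable"
    fi
}

# Assumed location of the dns_resolver "debug" parameter.
DNS_DEBUG=/sys/module/dns_resolver/parameters/debug
echo "dns_resolver debug: $(read_module_param "$DNS_DEBUG")"

# To switch debugging on at runtime (as root), something like:
#   echo 1 > "$DNS_DEBUG"
```

If the file exists and is writable by root, writing 1 to it should have the same effect as reloading with debug=1, without touching the ceph modules.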

Then try the mount again and check dmesg:

]# mount -t ceph mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ /tmp/data
mount: mount mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789:/ on /tmp/data failed: No such process
]# dmesg
...
[ 3056.185724] [mount ] ==> dns_query((null),mon1.ichenfu.com,16,(null))
[ 3056.185733] [mount ] call request_key(,mon1.ichenfu.com,)
[ 3056.185910] [mount ] <== dns_query() = -2
[ 3056.185916] libceph: resolve 'mon1.ichenfu.com' (ret=-3): failed
[ 3056.185921] libceph: parse_ips bad ip 'mon1.ichenfu.com:6789,mon2.ichenfu.com:6789,mon3.ichenfu.com:6789'
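
The return values in these log lines are negated errno codes. As a reading aid, here is a minimal lookup limited to the codes that appear in this post; errno_name is a hypothetical helper, and the numbers are the standard Linux values with their usual glibc messages.

```shell
#!/bin/sh
# Map a (possibly negated) errno value to its name and message.
# Only the codes that show up in this investigation are covered.
errno_name() {
    case "${1#-}" in
        2)  echo "ENOENT (No such file or directory)" ;;
        3)  echo "ESRCH (No such process)" ;;
        12) echo "ENOMEM (Cannot allocate memory)" ;;
        22) echo "EINVAL (Invalid argument)" ;;
        *)  echo "unknown errno $1" ;;
    esac
}

errno_name -2   # value returned by dns_query()
errno_name -3   # value logged by libceph
```

The two codes differ because ceph_dns_resolve_name replaces any dns_query failure with -ESRCH, so the more informative ENOENT only becomes visible once dns_resolver debugging is on.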

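dns_query returned -2, and perror gives the meaning of this errno: OS error code 2: No such file or directory. (libceph maps any dns_query failure to -ESRCH, which is why its own log line shows ret=-3.) So a file was not found, but which file? Time to read the module's documentation, at https://www.kernel.org/doc/Documentation/networking/dns_resolver.txt: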

========
OVERVIEW
========

The DNS resolver module provides a way for kernel services to make DNS queries
by way of requesting a key of key type dns_resolver. These queries are
upcalled to userspace through /sbin/request-key.

These routines must be supported by userspace tools dns.upcall, cifs.upcall and
request-key. It is under development and does not yet provide the full feature
set. The features it does support include:

(*) Implements the dns_resolver key_type to contact userspace.

It does not yet support the following AFS features:

(*) Dns query support for AFSDB resource record.

This code is extracted from the CIFS filesystem.

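The module gives the kernel a way to query DNS records, and the queries are carried out by /sbin/request-key in user space. In other words, the module depends on the /sbin/request-key program.
I checked the machine and, sure enough, the program was not there. A quick search showed that it is provided by the keyutils package; after yum install -y keyutils the problem went away.
The documentation also mentions a configuration file, /etc/request-key.conf, so let's look at what it contains: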

...
#OP     TYPE    DESCRIPTION     CALLOUT INFO    PROGRAM ARG1 ARG2 ARG3 ...
#====== ======= =============== =============== ===============================
create  dns_resolver *          *               /sbin/key.dns_resolver %k
...
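
To see which program request-key would run for a dns_resolver key, the PROGRAM column can be pulled out of the config. This is a small sketch assuming the whitespace-separated column layout shown above; dns_resolver_handlers is a hypothetical helper name.

```shell
#!/bin/sh
# Print the handler program(s) configured for dns_resolver keys.
# Columns: OP TYPE DESCRIPTION CALLOUT-INFO PROGRAM ARGS...
# Comment lines are skipped because their first field is never "create".
dns_resolver_handlers() {
    awk '$1 == "create" && $2 == "dns_resolver" { print $5 }' "$1"
}

conf=/etc/request-key.conf
if [ -r "$conf" ]; then
    dns_resolver_handlers "$conf"
else
    echo "$conf not found (is keyutils installed?)"
fi
```

On a system configured like the one in this post, the function prints /sbin/key.dns_resolver.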

It turns out /sbin/key.dns_resolver is needed as well, but that is also part of the keyutils package.
So the problem was finally solved.
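
As a closing sketch, a quick pre-flight check for the two user-space helpers identified above could have saved most of this investigation. The helper name missing_helpers is made up for illustration, and the paths are the ones from this post; other distributions may install them elsewhere.

```shell
#!/bin/sh
# Print every given path that is not an executable file.
missing_helpers() {
    for path in "$@"; do
        [ -x "$path" ] || echo "$path"
    done
}

missing=$(missing_helpers /sbin/request-key /sbin/key.dns_resolver)
if [ -n "$missing" ]; then
    echo "missing DNS upcall helpers (install keyutils?):"
    echo "$missing"
else
    echo "DNS upcall helpers are present"
fi
```

Running this before mounting CephFS by hostname shows right away whether the kernel's DNS upcall has anything to call.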