Fully Manual Ceph Deployment: OSD Deployment

After the monitors are deployed, the next step is to deploy the Ceph OSDs. The OSDs are the heart of Ceph's actual storage: only once OSDs exist can data be stored. As before, this walkthrough deploys 3 OSDs on a single machine; a real production environment would have far more OSDs spread across far more machines.
It is assumed that all of the required Ceph packages are already installed on the machine. For simplicity, each OSD here is given just a data directory; in production, an OSD normally maps to a dedicated device.

First, as before, edit the configuration file and add the OSD-related settings to /etc/ceph/ceph.conf:

[osd]
run_dir = /data0/$name
osd data = /data0/$name
osd journal = /data0/$name/journal
osd max object name len = 256
osd max object namespace len = 64

Each OSD's storage path is placed under /data0/$name, where $name expands to the daemon name such as osd.0, and its journal is a plain file inside that directory. The two osd max object ... len settings are lowered because the backing local filesystem (ext4, for example) limits extended attribute and file name lengths.
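The original session does not show it, but if the /data0/osd.N directories do not already exist they need to be created before the --mkfs step below; a minimal sketch using the names from this walkthrough:

[root@test ~]# mkdir -p /data0/osd.0 /data0/osd.1 /data0/osd.2   # data directories that ceph-osd --mkfs will populate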

Each OSD also needs a UUID, so generate three of them:

[root@test ~]# uuidgen 
63970d8a-597a-4123-9767-097f88bbcd00
[root@test ~]# uuidgen 
ba380679-0a81-4bbc-a5f0-27fd93137c78
[root@test ~]# uuidgen 
7545c284-1cda-4cad-b23a-2e7c81cb8a47

Run ceph osd create to allocate the OSD IDs:

[root@test ~]# ceph osd create 63970d8a-597a-4123-9767-097f88bbcd00
0
[root@test ~]# ceph osd create ba380679-0a81-4bbc-a5f0-27fd93137c78
1
[root@test ~]# ceph osd create 7545c284-1cda-4cad-b23a-2e7c81cb8a47
2
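The number printed by each ceph osd create call is the OSD ID the cluster assigned, 0 through 2 here. When repeating this for many OSDs, the UUID generation and ID allocation can be scripted; a minimal sketch (the loop and variable names are illustrative, not part of the original session):

for n in 1 2 3; do
    uuid=$(uuidgen)                  # one UUID per OSD
    id=$(ceph osd create "$uuid")    # the cluster returns the next free OSD ID
    echo "osd.$id -> $uuid"
done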

Run ceph-osd -i {num} --mkfs --mkkey --osd-uuid {uuid} to initialize each OSD's data directory:

[root@test ~]# ceph-osd -i 0 --mkfs --mkkey --osd-uuid 63970d8a-597a-4123-9767-097f88bbcd00
2017-02-09 09:34:29.961006 7ff9e6ef3800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-02-09 09:34:30.268810 7ff9e6ef3800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-02-09 09:34:30.277939 7ff9e6ef3800 -1 filestore(/data0/osd.0) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
2017-02-09 09:34:30.607425 7ff9e6ef3800 -1 created object store /data0/osd.0 for osd.0 fsid def5bc47-3d8a-4ca0-9cd6-77243339ab0f
2017-02-09 09:34:30.607495 7ff9e6ef3800 -1 auth: error reading file: /data0/osd.0/keyring: can't open /data0/osd.0/keyring: (2) No such file or directory
2017-02-09 09:34:30.607747 7ff9e6ef3800 -1 created new key in keyring /data0/osd.0/keyring

[root@test ~]# ceph-osd -i 1 --mkfs --mkkey --osd-uuid ba380679-0a81-4bbc-a5f0-27fd93137c78
2017-02-09 09:34:40.141883 7fcac9aef800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-02-09 09:34:40.470945 7fcac9aef800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-02-09 09:34:40.481165 7fcac9aef800 -1 filestore(/data0/osd.1) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
2017-02-09 09:34:40.810678 7fcac9aef800 -1 created object store /data0/osd.1 for osd.1 fsid def5bc47-3d8a-4ca0-9cd6-77243339ab0f
2017-02-09 09:34:40.810743 7fcac9aef800 -1 auth: error reading file: /data0/osd.1/keyring: can't open /data0/osd.1/keyring: (2) No such file or directory
2017-02-09 09:34:40.810982 7fcac9aef800 -1 created new key in keyring /data0/osd.1/keyring

[root@test ~]# ceph-osd -i 2 --mkfs --mkkey --osd-uuid 7545c284-1cda-4cad-b23a-2e7c81cb8a47
2017-02-09 09:34:51.793012 7fa0fc68b800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-02-09 09:34:52.116001 7fa0fc68b800 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
2017-02-09 09:34:52.124229 7fa0fc68b800 -1 filestore(/data0/osd.2) could not find #-1:7b3f43c4:::osd_superblock:0# in index: (2) No such file or directory
2017-02-09 09:34:52.447702 7fa0fc68b800 -1 created object store /data0/osd.2 for osd.2 fsid def5bc47-3d8a-4ca0-9cd6-77243339ab0f
2017-02-09 09:34:52.447761 7fa0fc68b800 -1 auth: error reading file: /data0/osd.2/keyring: can't open /data0/osd.2/keyring: (2) No such file or directory
2017-02-09 09:34:52.447980 7fa0fc68b800 -1 created new key in keyring /data0/osd.2/keyring
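The aio and missing-keyring messages above are expected on a freshly created directory; the lines that matter are "created object store" and "created new key in keyring". To double-check what --mkfs produced, the data directory can be inspected directly (a quick sanity check, assuming the FileStore on-disk layout used here):

[root@test ~]# ls /data0/osd.0            # should contain keyring, journal, fsid, whoami, current/, ...
[root@test ~]# cat /data0/osd.0/whoami    # prints the OSD ID, 0 in this case
[root@test ~]# cat /data0/osd.0/fsid      # prints the UUID passed via --osd-uuid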

Add the authorization for each OSD:

[root@test ~]# ceph auth add osd.0 osd 'allow *' mon 'allow profile osd' -i /data0/osd.0/keyring
added key for osd.0
[root@test ~]# ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' -i /data0/osd.1/keyring
added key for osd.1
[root@test ~]# ceph auth add osd.2 osd 'allow *' mon 'allow profile osd' -i /data0/osd.2/keyring
added key for osd.2
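This registers each OSD's key with the monitors, granting it full access to OSD operations and the osd profile on the monitors. To confirm a key was registered correctly (a quick check, not part of the original steps):

[root@test ~]# ceph auth get osd.0    # shows the key and capabilities registered for osd.0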

At this point the OSD preparation is complete and the OSDs could be started, but before starting them it is also worth initializing the CRUSH map.

[root@test ~]# ceph osd crush add-bucket node1 host     # add a host bucket named node1
added bucket node1 type host to crush map
[root@test ~]# ceph osd crush add-bucket node2 host     # add a host bucket named node2
added bucket node2 type host to crush map
[root@test ~]# ceph osd crush move node1 root=default   # move node1 under the default root
moved item id -2 name 'node1' to location {root=default} in crush map
[root@test ~]# ceph osd crush move node2 root=default   # move node2 under the default root
moved item id -3 name 'node2' to location {root=default} in crush map
[root@test ~]# ceph osd crush add osd.0 1.0 host=node1  # add osd.0 to node1 with weight 1.0
add item id 0 name 'osd.0' weight 1 at location {host=node1} to crush map
[root@test ~]# ceph osd crush add osd.1 1.0 host=node1  # add osd.1 to node1 with weight 1.0
add item id 1 name 'osd.1' weight 1 at location {host=node1} to crush map
[root@test ~]# ceph osd crush add osd.2 1.0 host=node2  # add osd.2 to node2 with weight 1.0
add item id 2 name 'osd.2' weight 1 at location {host=node2} to crush map
[root@test ~]# ceph osd tree    # show the current OSD tree
ID WEIGHT  TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.00000 root default
-2 2.00000     host node1
 0 1.00000         osd.0     down        0          1.00000
 1 1.00000         osd.1     down        0          1.00000
-3 1.00000     host node2
 2 1.00000         osd.2     down        0          1.00000

Two virtual host buckets, node1 and node2, are added here; osd.0 and osd.1 go under node1 and osd.2 under node2. This is needed because Ceph's default CRUSH rule places replicas on different hosts, i.e. at least one copy must live on another machine, so a second host bucket is required when everything runs on one box. With multiple real machines in production this is not an issue. An alternative for single-node test clusters is sketched below.
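Instead of creating virtual host buckets, a single-node test cluster can also tell CRUSH to spread replicas across OSDs rather than across hosts. A possible approach (not used in this walkthrough; the option normally has to be in place before the initial CRUSH map is generated, otherwise the existing rule has to be edited by hand):

[global]
osd crush chooseleaf type = 0    # choose replicas at the OSD level instead of the host level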

Finally, start all of the OSDs:

[root@test ~]# ceph-osd -i 0
starting osd.0 at :/0 osd_data /data0/osd.0 /data0/osd.0/journal
[root@test ~]# ceph-osd -i 1
starting osd.1 at :/0 osd_data /data0/osd.1 /data0/osd.1/journal
[root@test ~]# ceph-osd -i 2
starting osd.2 at :/0 osd_data /data0/osd.2 /data0/osd.2/journal
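ceph-osd forks into the background after printing the starting line. Once all three daemons are running, the OSDs should report as up and in; a quick way to confirm (not from the original session):

[root@test ~]# ceph osd stat    # should report 3 osds: 3 up, 3 in
[root@test ~]# ceph osd tree    # each osd should now show as up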

[root@test ~]# ceph -s    # check the cluster status
cluster def5bc47-3d8a-4ca0-9cd6-77243339ab0f
 health HEALTH_OK
 monmap e1: 3 mons at {mon0=10.67.15.100:6789/0,mon1=10.67.15.100:6790/0,mon2=10.67.15.100:6791/0}
        election epoch 4, quorum 0,1,2 mon0,mon1,mon2
 osdmap e18: 3 osds: 3 up, 3 in
        flags sortbitwise,require_jewel_osds
  pgmap v29: 64 pgs, 1 pools, 0 bytes data, 0 objects
        11089 MB used, 647 GB / 693 GB avail
              64 active+clean

The ceph -s output shows health HEALTH_OK, which means the cluster is healthy.
With that, a simple Ceph cluster with 3 monitors and 3 OSDs is up and running.
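As a final sanity check, an object can be written and read back through the pool shown in the pgmap above; a minimal sketch (the pool name rbd and the object name test-obj are assumptions, adjust them to whatever pool exists in your cluster):

[root@test ~]# rados -p rbd put test-obj /etc/hosts    # write a small object into the pool
[root@test ~]# rados -p rbd ls                         # list objects; test-obj should appear
[root@test ~]# ceph osd map rbd test-obj               # show which PG and OSDs hold the object
[root@test ~]# rados -p rbd rm test-obj                # clean up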