In this particular cluster we have 2 pools:

The device class "hdd-big" is specific to this cluster as it used to
contain 2.5" and 3.5" HDDs in different pools.
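
For reference, the device classes and the pools defined on top of them can be
listed from any node with an admin keyring; a small sketch (the class names
shown elsewhere in these notes are "ssd" and "hdd-big"):

```
# list the CRUSH device classes known to the cluster
ceph osd crush class ls

# show the pools and the CRUSH rules they use
ceph osd pool ls detail
```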
### [old] Analysing the ceph cluster configuration

Taking the view from the old cluster, the following items are
### Analysing the rook configurations

Taking the opposite view, we can also check out a running rook cluster
and the rook disaster recovery documentation to identify what to
modify.

Let's have a look at the secrets first:

```
cluster-peer-token-rook-ceph kubernetes.io/rook 2 320d
default-token-xm9xs kubernetes.io/service-account-token 3 320d
rook-ceph-admin-keyring kubernetes.io/rook 1 320d
rook-ceph-admission-controller kubernetes.io/tls 3 29d
rook-ceph-cmd-reporter-token-5mh88 kubernetes.io/service-account-token 3 320d
rook-ceph-config kubernetes.io/rook 2 320d
rook-ceph-crash-collector-keyring kubernetes.io/rook 1 320d
rook-ceph-mgr-a-keyring kubernetes.io/rook 1 320d
rook-ceph-mgr-b-keyring kubernetes.io/rook 1 320d
rook-ceph-mgr-token-ktt2m kubernetes.io/service-account-token 3 320d
rook-ceph-mon kubernetes.io/rook 4 320d
rook-ceph-mons-keyring kubernetes.io/rook 1 320d
rook-ceph-osd-token-8m6lb kubernetes.io/service-account-token 3 320d
rook-ceph-purge-osd-token-hznnk kubernetes.io/service-account-token 3 320d
rook-ceph-rgw-token-wlzbc kubernetes.io/service-account-token 3 134d
rook-ceph-system-token-lxclf kubernetes.io/service-account-token 3 320d
rook-csi-cephfs-node kubernetes.io/rook 2 320d
rook-csi-cephfs-plugin-sa-token-hkq2g kubernetes.io/service-account-token 3 320d
rook-csi-cephfs-provisioner kubernetes.io/rook 2 320d
rook-csi-cephfs-provisioner-sa-token-tb78d kubernetes.io/service-account-token 3 320d
rook-csi-rbd-node kubernetes.io/rook 2 320d
rook-csi-rbd-plugin-sa-token-dhhq6 kubernetes.io/service-account-token 3 320d
rook-csi-rbd-provisioner kubernetes.io/rook 2 320d
rook-csi-rbd-provisioner-sa-token-lhr4l kubernetes.io/service-account-token 3 320d
```

TBC

### Creating additional resources after the cluster is bootstrapped

To let rook know what should be there, we already create the two
`CephBlockPool` instances that match the existing pools:

```
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: one
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
  deviceClass: ssd
```

And for the hdd based pool:

```
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: hdd
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
  deviceClass: hdd-big
```

Saving both of these, separated by `---`, in `ceph-blockpools.yaml` and
applying it:

```
kubectl -n rook-ceph apply -f ceph-blockpools.yaml
```
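A quick way to check that both resources were accepted (the actual ceph pools
only appear once the cluster is running):

```
kubectl -n rook-ceph get cephblockpool
```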
### Configuring ceph after the operator deployment

It is important to note that we use the ceph image version
v14.2.21, which is the same version as the native cluster.
### rook v1.8 is incompatible with ceph nautilus

After deploying the rook operator, the following error message is
printed in its logs:

```
2022-09-03 15:14:03.543925 E | ceph-cluster-controller: failed to reconcile CephCluster "rook-ceph/rook-ceph". failed to reconcile cluster "rook-ceph": failed to configure local ceph cluster: failed the ceph version check: the version does not meet the minimum version "15.2.0-0 octopus"
```

So we need to downgrade to rook v1.7. Using `helm search repo
rook/rook-ceph --versions` we identify that the latest usable version
is `v1.7.11`.

We start the downgrade process using:

```
helm upgrade --install --namespace rook-ceph --create-namespace --version v1.7.11 rook-ceph rook/rook-ceph
```

After the downgrade the operator starts the canary monitors and
continues to bootstrap the cluster.
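
To follow what the operator does during the bootstrap, its logs can simply be
tailed:

```
kubectl -n rook-ceph logs -f deploy/rook-ceph-operator
```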
### The ceph-toolbox

To be able to view the current cluster status, we also deploy the
ceph-toolbox, using the image matching the operator version:

```
spec:
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: rook-ceph-tools
    image: rook/ceph:v1.7.11
    command: ["/bin/bash"]
    args: ["-m", "-c", "/usr/local/bin/toolbox.sh"]
    imagePullPolicy: IfNotPresent

      tolerationSeconds: 5
```
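Once the toolbox pod is running, ceph commands can be run through it, for
example:

```
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
```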
### Checking the deployments

After the rook-operator finished deploying, the following deployments
are visible in kubernetes:

```
[17:25] blind:~% kubectl -n rook-ceph get deployment
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
csi-cephfsplugin-provisioner        2/2     2            2           21m
csi-rbdplugin-provisioner           2/2     2            2           21m
rook-ceph-crashcollector-server48   1/1     1            1           2m3s
rook-ceph-crashcollector-server52   1/1     1            1           2m24s
rook-ceph-crashcollector-server53   1/1     1            1           2m2s
rook-ceph-crashcollector-server56   1/1     1            1           2m17s
rook-ceph-crashcollector-server57   1/1     1            1           2m1s
rook-ceph-mgr-a                     1/1     1            1           2m3s
rook-ceph-mon-a                     1/1     1            1           10m
rook-ceph-mon-b                     1/1     1            1           8m3s
rook-ceph-mon-c                     1/1     1            1           5m55s
rook-ceph-mon-d                     1/1     1            1           5m33s
rook-ceph-mon-e                     1/1     1            1           4m32s
rook-ceph-operator                  1/1     1            1           102m
rook-ceph-tools                     1/1     1            1           17m
```

Relevant for us are the mgr, mon and operator deployments. To stop the
cluster, we will shut down the deployments in the following order:

* rook-ceph-operator: to prevent it from recovering the other deployments
### Data / configuration comparison

Logging into a host that is running mon-a, we find the following data
in it:

```
[17:36] server56.place5:/var/lib/rook# find
.
./mon-a
./mon-a/data
./mon-a/data/keyring
./mon-a/data/min_mon_release
./mon-a/data/store.db
./mon-a/data/store.db/LOCK
./mon-a/data/store.db/000006.log
./mon-a/data/store.db/000004.sst
./mon-a/data/store.db/CURRENT
./mon-a/data/store.db/MANIFEST-000005
./mon-a/data/store.db/OPTIONS-000008
./mon-a/data/store.db/OPTIONS-000005
./mon-a/data/store.db/IDENTITY
./mon-a/data/kv_backend
./rook-ceph
./rook-ceph/crash
./rook-ceph/crash/posted
./rook-ceph/log
```

Which is pretty similar to the native nodes:

```
[17:37:50] red3.place5:/var/lib/ceph/mon/ceph-red3# find
.
./sysvinit
./keyring
./min_mon_release
./kv_backend
./store.db
./store.db/1959645.sst
./store.db/1959800.sst
./store.db/OPTIONS-3617174
./store.db/2056973.sst
./store.db/3617348.sst
./store.db/OPTIONS-3599785
./store.db/MANIFEST-3617171
./store.db/1959695.sst
./store.db/CURRENT
./store.db/LOCK
./store.db/2524598.sst
./store.db/IDENTITY
./store.db/1959580.sst
./store.db/2514570.sst
./store.db/1959831.sst
./store.db/3617346.log
./store.db/2511347.sst
```
### Checking how monitors are created on native ceph

To prepare for the migration we take a step back and verify how
monitors are created in the native cluster. The script used for
monitor creation can be found on
[code.ungleich.ch](https://code.ungleich.ch/ungleich-public/ungleich-tools/src/branch/master/ceph/ceph-mon-create-start)
and contains the following logic:

* Get the "mon." key
* Get the monmap
* Run ceph-mon --mkfs using the monmap and keyring
* Start the monitor

In theory we could re-use these steps on a rook deployed monitor to
join our existing cluster.
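
Condensed into shell, that logic looks roughly like this (a sketch, not the
actual script; `$MON_ID` and the temporary paths are placeholders):

```
MON_ID=$(hostname -s)
MON_DATA=/var/lib/ceph/mon/ceph-$MON_ID

# fetch the mon. key and the current monmap from the running cluster
ceph auth get mon. -o /tmp/mon-keyring
ceph mon getmap -o /tmp/monmap

# initialise the mon store and start the daemon
mkdir -p "$MON_DATA"
ceph-mon -i "$MON_ID" --mkfs --monmap /tmp/monmap --keyring /tmp/mon-keyring
ceph-mon -i "$MON_ID"
```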
### Checking the toolbox and monitor pods for migration

When the ceph-toolbox is deployed, we get a ceph.conf and a keyring in
/etc/ceph. The keyring is actually the admin keyring and allows us to
make modifications to the ceph cluster. The ceph.conf points to the
monitors and does not contain an fsid.

The ceph-toolbox gets this information via a configmap
("rook-ceph-mon-endpoints") and a secret ("rook-ceph-mon").

The monitor pods on the other hand have an empty ceph.conf and no
admin keyring deployed.
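
Both objects can be inspected directly if needed:

```
kubectl -n rook-ceph get configmap rook-ceph-mon-endpoints -o yaml
kubectl -n rook-ceph get secret rook-ceph-mon -o yaml
```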
### Try 1: recreating a monitor inside the existing cluster

Let's try to reuse an existing monitor and join it into the existing
cluster. For this we will first shut down the rook-operator to
prevent it from interfering with our migration. Then we
modify the relevant configmaps and secrets and import the settings
from the native cluster.

Lastly we will patch one of the monitor pods, inject the monmap from
the native cluster and then restart it.

Let's give it a try. First we shut down the rook-ceph-operator:

```
% kubectl -n rook-ceph scale --replicas=0 deploy/rook-ceph-operator
deployment.apps/rook-ceph-operator scaled
```

Then we patch the mon deployments to not run a monitor, but only
sleep:

```
for mon in a b c d e; do
    kubectl -n rook-ceph patch deployment rook-ceph-mon-${mon} -p \
      '{"spec": {"template": {"spec": {"containers": [{"name": "mon", "command": ["sleep", "infinity"], "args": []}]}}}}';

    kubectl -n rook-ceph patch deployment rook-ceph-mon-$mon --type='json' -p '[{"op":"remove", "path":"/spec/template/spec/containers/0/livenessProbe"}]'
done
```
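The rollout of the patched mon deployments can be watched with the following
command (assuming the usual `app=rook-ceph-mon` label on the mon pods):

```
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -w
```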
Now the pod is restarted and when we execute into it, we will see that
no monitor is running in it:

```
% kubectl -n rook-ceph exec -ti rook-ceph-mon-a-c9f8f554b-2fkhm -- sh
Defaulted container "mon" out of: mon, chown-container-data-dir (init), init-mon-fs (init)
sh-4.2# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   4384   664 ?        Ss   19:44   0:00 sleep infinity
root         7  0.0  0.0  11844  2844 pts/0    Ss   19:44   0:00 sh
root        13  0.0  0.0  51752  3384 pts/0    R+   19:44   0:00 ps aux
sh-4.2#
```

Now for this pod to work with our existing cluster, we want to import
the monmap and join the monitor to the native cluster. As with any
mon, the data is stored below `/var/lib/ceph/mon/ceph-a/`.

Before importing the monmap, let's have a look at the different rook
configurations that influence the ceph components.
### Looking at the ConfigMap in detail: rook-ceph-mon-endpoints

As the name says, it contains the list of monitor endpoints:

```
kubectl -n rook-ceph edit configmap rook-ceph-mon-endpoints
...

csi-cluster-config-json: '[{"clusterID":"rook-ceph","monitors":["[2a0a:e5c0:0:15::fc2]:6789"...
data: b=[2a0a:e5c0:0:15::9cd9]:6789,....
mapping: '{"node":{"a":{"Name":"server56","Hostname":"server56","Address":"2a0a:e5c0::...
```

As eventually we want the cluster and csi to use the in-cluster
monitors, we don't need to modify it right away.
### Looking at Secrets in detail: rook-ceph-admin-keyring

The first interesting secret is **rook-ceph-admin-keyring**, which
contains the admin keyring. It still holds the keyring generated for
the new rook cluster, so we can edit this secret and replace it with
the client.admin keyring from our native cluster.

We encode the original admin keyring using:

```
cat ceph.client.admin.keyring | base64 -w 0; echo ""
```

And then we update the secret:

```
kubectl -n rook-ceph edit secret rook-ceph-admin-keyring
```

[done]
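
Instead of editing interactively, the secret can also be patched in one go (a
sketch, assuming the secret's data key is `keyring` and that
`ceph.client.admin.keyring` is the native cluster's keyring file):

```
kubectl -n rook-ceph patch secret rook-ceph-admin-keyring \
  -p "{\"data\":{\"keyring\":\"$(base64 -w0 < ceph.client.admin.keyring)\"}}"
```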
### Looking at Secrets in detail: rook-ceph-config

This secret contains two keys, **mon_host** and
**mon_initial_members**. The **mon_host** key is a list of monitor
addresses. The **mon_initial_members** key only contains the monitor
names: a, b, c, d and e.

The environment variable **ROOK_CEPH_MON_HOST** in the monitor
deployment is set to the **mon_host** key of that secret, so monitors
will read from it.
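
To check the current values, the keys can be decoded directly:

```
kubectl -n rook-ceph get secret rook-ceph-config \
  -o jsonpath='{.data.mon_host}' | base64 -d; echo
kubectl -n rook-ceph get secret rook-ceph-config \
  -o jsonpath='{.data.mon_initial_members}' | base64 -d; echo
```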
### Looking at Secrets in detail: rook-ceph-mon

This secret contains the following interesting keys:

* ceph-secret: the admin key (just the base64 key, no section around
  it) [done]
* ceph-username: "client.admin"
* fsid: the ceph cluster fsid
* mon-secret: the key of the [mon.] section

It is important to use `echo -n` when inserting the keys or the fsid,
as shown in the sketch below.

[done]
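
For instance, replacing the fsid with the native cluster's fsid could look
like this (a sketch; `NATIVE_FSID` is a placeholder for the real value):

```
NATIVE_FSID=00000000-0000-0000-0000-000000000000   # placeholder

kubectl -n rook-ceph patch secret rook-ceph-mon \
  -p "{\"data\":{\"fsid\":\"$(echo -n "$NATIVE_FSID" | base64 -w0)\"}}"
```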
### Looking at Secrets in detail: rook-ceph-mons-keyring

Contains the key "keyring" with the [mon.] and [client.admin]
sections:

```
[mon.]
        key = ...

[client.admin]
        key = ...
        caps mds = "allow"
        caps mgr = "allow *"
        caps mon = "allow *"
        caps osd = "allow *"
```

We encode the combined file using `base64 -w0 < ~/mon-and-client`.

[done]
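
Again this can be applied non-interactively (a sketch, assuming
`~/mon-and-client` holds the combined sections shown above and that the
secret's data key is `keyring`):

```
kubectl -n rook-ceph patch secret rook-ceph-mons-keyring \
  -p "{\"data\":{\"keyring\":\"$(base64 -w0 < ~/mon-and-client)\"}}"
```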
### Importing the monmap

Getting the current monmap from the native cluster:

```
ceph mon getmap -o monmap-20220903

scp root@old-monitor:monmap-20220903 .
```
Adding it into the mon pod:

```
kubectl cp monmap-20220903 rook-ceph/rook-ceph-mon-a-6c46d4694-kxm5h:/tmp
```

Moving the old mon db away:

```
cd /var/lib/ceph/mon/ceph-a
mkdir _old
mv [a-z]* _old/
```
Recreating the mon fails, as the volume is mounted directly onto it:

```
% ceph-mon -i a --mkfs --monmap /tmp/monmap-20220903 --keyring /tmp/mon-key
2022-09-03 21:44:48.268 7f1a738f51c0 -1 '/var/lib/ceph/mon/ceph-a' already exists and is not empty: monitor may already exist

% mount | grep ceph-a
/dev/sda1 on /var/lib/ceph/mon/ceph-a type ext4 (rw,relatime)
```

We can work around this by creating all monitors on pods with other
names. So we create mon b to e on the mon-a pod and mon-a on any
other pod.

On rook-ceph-mon-a:

```
for mon in b c d e; do
    ceph-mon -i $mon --mkfs --monmap /tmp/monmap-20220903 --keyring /tmp/mon-key;
done
```

On rook-ceph-mon-b:

```
mon=a
ceph-mon -i $mon --mkfs --monmap /tmp/monmap-20220903 --keyring /tmp/mon-key
```

Then we export the newly created mon dbs:

```
for mon in b c d e; do
    kubectl cp rook-ceph/rook-ceph-mon-a-6c46d4694-kxm5h:/var/lib/ceph/mon/ceph-$mon ceph-$mon;
done
```

And mon a from the mon-b pod:

```
for mon in a; do
    kubectl cp rook-ceph/rook-ceph-mon-b-57d888dd9f-w8jkh:/var/lib/ceph/mon/ceph-$mon ceph-$mon;
done
```
And finally we test it by importing the mon db to mon-a:

```
kubectl cp ceph-a rook-ceph/rook-ceph-mon-a-6c46d4694-kxm5h:/var/lib/ceph/mon/
```

And the other mons:

```
kubectl cp ceph-b rook-ceph/rook-ceph-mon-b-57d888dd9f-w8jkh:/var/lib/ceph/mon/
```
### Re-enabling the rook-operator

Now we scale the rook-ceph-operator deployment back up:

```
kubectl -n rook-ceph scale --replicas=1 deploy/rook-ceph-operator
```

The operator sees the monitors running (they are still only running a
shell):

```
2022-09-03 22:29:26.725915 I | op-mon: mons running: [d e a b c]
```
Triggering recreation of mon-a:

```
% kubectl -n rook-ceph delete deployment rook-ceph-mon-a
deployment.apps "rook-ceph-mon-a" deleted
```
The recreated mon-a successfully connected to the cluster:

```
  services:
    mon: 6 daemons, quorum red1,red2,red3,server4,server3,a (age 8s)
    mgr: red3(active, since 8h), standbys: red2, red1, server4
    osd: 46 osds: 46 up, 46 in
```

A bit later:

```
    mon: 8 daemons, quorum (age 2w), out of quorum: red1, red2, red3, server4, server3, a, c, d
    mgr: red3(active, since 8h), standbys: red2, red1, server4
    osd: 46 osds: 46 up, 46 in
```

And a little bit later also the mgr joined the cluster:

```
  services:
    mon: 8 daemons, quorum red2,red3,server4,server3,a,c,d,e (age 46s)
    mgr: red3(active, since 9h), standbys: red1, server4, a, red2
    osd: 46 osds: 46 up, 46 in
```

And a few minutes later all mons joined successfully:

```
    mon: 8 daemons, quorum red3,server4,server3,a,c,d,e,b (age 31s)
    mgr: red3(active, since 105s), standbys: red1, server4, a, red2
    osd: 46 osds: 46 up, 46 in
```
We also need to ensure the toolbox is updated/recreated:

```
kubectl -n rook-ceph delete pods rook-ceph-tools-5cf88dd58f-fwwlc
```
### Retiring the old monitors

### The actual migration

At this point we have 2 ceph clusters:

* A new one in rook
* The old/native one

The next steps are:

* Replace the fsid in secrets/rook-ceph-mon with that of the old
  cluster.

## Changelog