Additional details for rook migration

Using BGP and calico, the kubernetes cluster is set up "as usual"
(in ungleich terms).
### Ceph.conf change
Originally our ceph.conf contained:
```
public network = 2a0a:e5c0:0:0::/64
cluster network = 2a0a:e5c0:0:0::/64
```
As of today these settings are removed and all daemons have been
restarted, allowing the native cluster to talk to the kubernetes cluster.
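A minimal sketch of that change, assuming the daemons run under systemd
and GNU sed is available (adjust to your init system and config
management):
```
# drop the public/cluster network lines from ceph.conf on every node
sed -i -E '/^(public|cluster) network/d' /etc/ceph/ceph.conf

# restart the local ceph daemons so they pick up the change
systemctl restart ceph-mon.target ceph-mgr.target ceph-osd.target
```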
### Setting up rook
Usually we deploy rook via argocd. However, as we want to be able to
intervene manually with ease, we will first bootstrap rook via helm
directly and turn off various services:
```
helm repo add rook https://charts.rook.io/release
helm repo update
```
We will use rook 1.8, as it is the last version to support Ceph
nautilus, which is our current ceph version. The latest 1.8 version is
1.8.10 at the moment.
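To double check which chart versions the repo offers (a quick sanity
check, not strictly necessary), the versions can be listed:
```
# list published rook-ceph chart versions and filter for the 1.8 series
helm search repo rook/rook-ceph --versions | grep v1.8
```
The install then pins exactly that version: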
```
helm upgrade --install --namespace rook-ceph --create-namespace --version v1.8.10 rook-ceph rook/rook-ceph
```
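After the chart is installed, the rook operator pod should come up in
the `rook-ceph` namespace; a quick check (plain kubectl, nothing rook
specific):
```
# only the operator should be running; no ceph daemons exist yet
kubectl -n rook-ceph get pods
```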
### Joining the 2 clusters, step 1: monitors and managers
In the first step we want to add rook-based monitors and managers
and replace the native ones. For rook to be able to talk to our
existing cluster, it needs to know:
* the current monitors/managers ("the monmap")
* the right keys to talk to the existing cluster
* the fsid
As we are using v1.8, we will follow
[the guidelines for disaster recovery of rook
1.8](https://www.rook.io/docs/rook/v1.8/ceph-disaster-recovery.html).
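As a sketch, the fsid and the relevant keys can be read from the native
cluster with standard ceph commands (assuming an admin keyring is
available on the node):
```
# the cluster fsid rook needs to adopt
ceph fsid

# the current monitors, i.e. the monmap in human readable form
ceph mon dump

# the admin key (further keys can be fetched the same way)
ceph auth get client.admin
```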
Later we will need to create all the configurations so that rook knows
about the different pools.
### Rook: CephCluster
Rook has a configuration of type `CephCluster` that typically looks
something like this:
```
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # see the "Cluster Settings" section below for more details on which image of ceph to run
    image: quay.io/ceph/ceph:{{ .Chart.AppVersion }}
  dataDirHostPath: /var/lib/rook
  mon:
    count: 5
    allowMultiplePerNode: false
  storage:
    useAllNodes: true
    useAllDevices: true
    onlyApplyOSDPlacement: false
  mgr:
    count: 1
    modules:
      - name: pg_autoscaler
        enabled: true
  network:
    ipFamily: "IPv6"
    dualStack: false
  crashCollector:
    disable: false
    # Uncomment daysToRetain to prune ceph crash entries older than the
    # specified number of days.
    daysToRetain: 30
```
For migrating, we don't want rook to create any OSDs in the first
stage. So we will replace `useAllNodes: true` with `useAllNodes: false`
and `useAllDevices: true` with `useAllDevices: false`.
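Once the adjusted `CephCluster` is applied, it can be verified that rook
did not create any OSD pods (the `app=rook-ceph-osd` label is what rook
normally sets on them; treat this as a sketch):
```
# expect no output while the OSDs are still run by the native cluster
kubectl -n rook-ceph get pods -l app=rook-ceph-osd
```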
### Extracting a monmap
To get access to the existing monmap, we can export it from the native
cluster using `ceph-mon -i {mon-id} --extract-monmap {map-path}`.
More details can be found on the [documentation for adding and
removing ceph
monitors](https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/).
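A hedged example for a monitor named `mon1` (names and paths are
assumptions; the monitor must be stopped while its store is read):
```
# stop the monitor before touching its store
systemctl stop ceph-mon@mon1

# extract the current monmap
ceph-mon -i mon1 --extract-monmap /tmp/monmap

# inspect the extracted map
monmaptool --print /tmp/monmap

# start the monitor again
systemctl start ceph-mon@mon1
```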
### Rook and Ceph pools
Rook uses `CephBlockPool` to describe ceph pools as follows:
```
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: hdd
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
  deviceClass: hdd
```
In this particular cluster we have 2 pools:
- `one` (ssd based, device class = ssd)
- `hdd` (hdd based, device class = hdd-big)
The device class "hdd-big" is specific to this cluster as it used to
contain 2.5" and 3.5" HDDs in different pools.
## Changelog