title: [WIP] Migrating Ceph Nautilus into Kubernetes + Rook
---
pub_date: 2022-08-27
---
author: ungleich storage team
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract: How we move our Ceph clusters into kubernetes
---
body:

## Introduction

At ungleich we are running multiple Ceph clusters. Some of them are running
Ceph Nautilus (14.x) based on [Devuan](https://www.devuan.org/). Our newer
Ceph Pacific (16.x) clusters run on [Rook](https://rook.io/) on
[Kubernetes](https://kubernetes.io/) on top of
[Alpine Linux](https://alpinelinux.org/). In this blog article we will
describe how to migrate Ceph/Native/Devuan to Ceph/k8s+rook/Alpine Linux.

## Work in Progress [WIP]

This blog article is work in progress. The migration planning has started,
but the migration itself has not been finished yet. This article will
document the different paths we take for the migration.

## The Plan

To continue operating the cluster during the migration, the following steps
are planned:

* Set up a k8s cluster that can potentially communicate with the existing
  ceph cluster
* Use the [disaster recovery](https://rook.io/docs/rook/v1.9/Troubleshooting/disaster-recovery/)
  guidelines from rook to modify the rook configuration to use the previous
  fsid
* Spin up ceph monitors and ceph managers in rook
* Retire the existing monitors
* Shut down a ceph OSD node, remove its OS disk, boot it with Alpine Linux
* Join the node into the k8s cluster
* Have rook pick up the existing disks and start the OSDs
* Repeat if successful
* Migrate to ceph pacific

### Original cluster

The ceph cluster we want to migrate lives in the 2a0a:e5c0::/64 network.
Ceph is using:

```
public network = 2a0a:e5c0:0:0::/64
cluster network = 2a0a:e5c0:0:0::/64
```

### Kubernetes cluster networking inside the ceph network

To be able to communicate with the existing OSDs, we will be using sub
networks of 2a0a:e5c0::/64 for kubernetes. As these networks are part of the
link assigned network 2a0a:e5c0::/64, we will use BGP routing on the existing
ceph nodes to create more specific routes into the kubernetes cluster.

As we plan to use either [cilium](https://cilium.io/) or
[calico](https://www.tigera.io/project-calico/) as the CNI, we can configure
kubernetes to directly BGP peer with the existing Ceph nodes.

## The setup

### Kubernetes Bootstrap

As usual we bootstrap 3 control plane nodes using kubeadm. The proxy for the
API resides in a different kubernetes cluster. We run

```
kubeadm init --config kubeadm.yaml
```

on the first node and then join the other two control plane nodes. As usual,
the workers are joined last.

### k8s Networking / CNI

For this setup we are using calico as described in the
[ungleich kubernetes manual](https://redmine.ungleich.ch/projects/open-infrastructure/wiki/The_ungleich_kubernetes_infrastructure#section-23).

```
VERSION=v3.23.3
helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version $VERSION --create-namespace
```

### BGP Networking on the old nodes

To be able to import the BGP routes from kubernetes, all old / native hosts
will run bird. The installation and configuration is as follows:

```
apt-get update
apt-get install -y bird2

router_id=$(hostname | sed 's/server//')

cat > /etc/bird/bird.conf <<EOF
router id $router_id;

protocol device {
}

protocol kernel {
    ipv6 { export all; };
}

protocol bgp k8s {
    local as 65530;
    neighbor range 2a0a:e5c0::/64 as 65533;
    dynamic name "k8s_";

    ipv6 {
        import filter { if net.len > 64 then accept; else reject; };
        export none;
    };
}
EOF
/etc/init.d/bird restart
```

The router id must be adjusted for every host. As all hosts have a unique
number, we use that one as the router id.

The bird configuration allows dynamic peers, so that any k8s node in the
network can peer with the old servers. We also use a filter to avoid
importing /64 routes, as they overlap with the on-link route.
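
To sanity check this on a native host, bird can parse the configuration
without restarting and birdc can be used to inspect the sessions once the
kubernetes nodes start peering. A minimal sketch (the protocol name `k8s_1`
is simply the first dynamically spawned session and only exists after a peer
has connected):

```
# Validate /etc/bird/bird.conf without (re)starting the daemon
bird -p

# List all protocols; one k8s_N instance should appear per peering k8s node
birdc show protocols

# Inspect the routes learned from a single dynamic peer
birdc show route protocol k8s_1
```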

### BGP networking in Kubernetes

Calico supports BGP peering and we use a rather standard calico
configuration:

```
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65533
  serviceClusterIPs:
    - cidr: 2a0a:e5c0:0:aaaa::/108
  serviceExternalIPs:
    - cidr: 2a0a:e5c0:0:aaaa::/108
```

Plus for each server and router we create a BGPPeer:

```
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: serverXX
spec:
  peerIP: 2a0a:e5c0::XX
  asNumber: 65530
  keepOriginalNextHop: true
```

We apply the whole configuration using calicoctl:

```
./calicoctl create -f - < ~/vcs/k8s-config/bootstrap/p5-cow/calico-bgp.yaml
```

And a few seconds later we can observe the routes on the old / native hosts:

```
bird> show protocols
Name       Proto      Table      State  Since         Info
device1    Device     ---        up     23:09:01.393
kernel1    Kernel     master6    up     23:09:01.393
k8s        BGP        ---        start  23:09:01.393  Passive
k8s_1      BGP        ---        up     23:33:01.215  Established
k8s_2      BGP        ---        up     23:33:01.215  Established
k8s_3      BGP        ---        up     23:33:01.420  Established
k8s_4      BGP        ---        up     23:33:01.215  Established
k8s_5      BGP        ---        up     23:33:01.215  Established
```
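
The sessions can also be cross-checked from the kubernetes side, assuming
calicoctl is available as a local binary as in the apply step above:

```
# Show the BGPPeer objects created above
./calicoctl get bgppeer

# Show the BGP session state of the local node (run as root on a k8s node)
./calicoctl node status
```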

### Testing networking

To verify that the new cluster is working properly, we can deploy a tiny test
deployment and see if it is globally reachable:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.20.0-alpine
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
```

Using curl to access the sample service from the outside shows that
networking is working:

```
% curl -v http://[2a0a:e5c0:0:aaaa::e3c9]
*   Trying 2a0a:e5c0:0:aaaa::e3c9:80...
* Connected to 2a0a:e5c0:0:aaaa::e3c9 (2a0a:e5c0:0:aaaa::e3c9) port 80 (#0)
> GET / HTTP/1.1
> Host: [2a0a:e5c0:0:aaaa::e3c9]
> User-Agent: curl/7.84.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.20.0
< Date: Sat, 27 Aug 2022 22:35:49 GMT
< Content-Type: text/html
< Content-Length: 612
< Last-Modified: Tue, 20 Apr 2021 16:11:05 GMT
< Connection: keep-alive
< ETag: "607efd19-264"
< Accept-Ranges: bytes
<
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
* Connection #0 to host 2a0a:e5c0:0:aaaa::e3c9 left intact
```
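
The address used above is the service IP assigned by kubernetes; it can be
looked up with kubectl, and on a native host bird should show a matching,
more specific route. A small sketch, assuming kubectl access to the cluster:

```
# On the k8s side: the ClusterIP assigned to the test service
kubectl get service nginx-service

# On a native host: check that bird has imported a route covering it
birdc show route for 2a0a:e5c0:0:aaaa::e3c9
```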

So far we have found one issue:

* Sometimes the old/native servers can reach the service, sometimes they get
  a timeout.

In old calico notes on github it is mentioned that overlapping pod and node
networks might be a problem. Additionally, we cannot use kubeadm to
initialise the pod subnet as a proper subnet of the node subnet:

```
[00:15] server57.place5:~# kubeadm init --service-cidr 2a0a:e5c0:0:cccc::/108 --pod-network-cidr 2a0a:e5c0::/100
I0829 00:16:38.659341   19400 version.go:255] remote version is much newer: v1.25.0; falling back to: stable-1.24
podSubnet: Invalid value: "2a0a:e5c0::/100": the size of pod subnet with mask 100 is smaller than the size of node subnet with mask 64
To see the stack trace of this error execute with --v=5 or higher
[00:16] server57.place5:~#
```

### Networking 2022-09-03

* Instead of trying to merge the cluster networks, we will use separate
  ranges.
* According to the [ceph users mailing list discussion](https://www.spinics.net/lists/ceph-users/msg73421.html)
  it is actually not necessary for mons/osds to be in the same network. In
  fact, we might be able to remove these settings completely.

So today we start with

* podSubnet: 2a0a:e5c0:0:14::/64
* serviceSubnet: 2a0a:e5c0:0:15::/108

Using BGP and calico, the kubernetes cluster is set up "as usual" (in
ungleich terms).

## Changelog

### 2022-09-03

* Next try for the migration started

### 2022-08-29

* Added kubernetes/kubeadm bootstrap issue

### 2022-08-27

* The initial release of this blog article
* Added k8s bootstrapping guide

## Follow up or questions

You can join the discussion about this migration in the matrix room
`#kubernetes:ungleich.ch`. If you don't have a matrix account, you can join
using our chat on https://chat.with.ungleich.ch.