title: [WIP] Migrating Ceph Nautilus into Kubernetes + Rook
---
pub_date: 2022-08-27
---
author: ungleich storage team
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:

How we move our Ceph clusters into kubernetes
---
body:

## Introduction

At ungleich we run multiple Ceph clusters. Some of them run Ceph
Nautilus (14.x) natively on [Devuan](https://www.devuan.org/). Our
newer Ceph Pacific (16.x) clusters run on [Rook](https://rook.io/) on
[Kubernetes](https://kubernetes.io/) on top of
[Alpine Linux](https://alpinelinux.org/).

In this blog article we describe how we migrate from
Ceph/Native/Devuan to Ceph/k8s+Rook/Alpine Linux.

## Work in Progress [WIP]

This blog article is a work in progress. The migration planning has
started, but the migration itself has not been finished yet. This
article will document the different paths we take during the migration.

## The Plan

To continue operating the cluster during the migration, the following
steps are planned:

* Set up a k8s cluster that can potentially communicate with the
  existing Ceph cluster
* Use the [disaster recovery](https://rook.io/docs/rook/v1.9/Troubleshooting/disaster-recovery/)
  guidelines from Rook to modify the Rook configuration so that it
  uses the previous fsid
* Spin up Ceph monitors and Ceph managers in Rook
* Retire the existing monitors
* Shut down a Ceph OSD node, remove its OS disk, boot it with Alpine
  Linux
* Join the node into the k8s cluster
* Have Rook pick up the existing disks and start the OSDs
* Repeat if successful
* Migrate to Ceph Pacific

### Original cluster

The Ceph cluster we want to migrate lives in the 2a0a:e5c0::/64
network. Ceph is configured with:

```
public network = 2a0a:e5c0:0:0::/64
cluster network = 2a0a:e5c0:0:0::/64
```

### Kubernetes cluster networking inside the Ceph network

To be able to communicate with the existing OSDs, we will use
subnetworks of 2a0a:e5c0::/64 for Kubernetes. As these networks
are part of the on-link network 2a0a:e5c0::/64, we will use BGP
routing on the existing Ceph nodes to install more specific routes
into the Kubernetes cluster.

As we plan to use either [cilium](https://cilium.io/) or
[calico](https://www.tigera.io/project-calico/) as the CNI, we can
configure Kubernetes to peer directly via BGP with the existing Ceph
nodes.

## The setup

### Kubernetes Bootstrap

As usual we bootstrap 3 control plane nodes using kubeadm. The proxy
for the API resides in a different Kubernetes cluster.

We run

```
kubeadm init --config kubeadm.yaml
```

on the first node, then join the other two control plane nodes. As
usual, the workers join last.

### k8s Networking / CNI

For this setup we are using calico as described in the
[ungleich kubernetes
manual](https://redmine.ungleich.ch/projects/open-infrastructure/wiki/The_ungleich_kubernetes_infrastructure#section-23).

```
VERSION=v3.23.3
helm repo add projectcalico https://docs.projectcalico.org/charts
helm upgrade --install --namespace tigera calico projectcalico/tigera-operator --version "$VERSION" --create-namespace
```

### BGP Networking on the old nodes

To be able to import the BGP routes from Kubernetes, all old / native
hosts will run bird. The installation and configuration are as follows:

```
apt-get update
apt-get install -y bird2

# Derive the router id from the numeric part of the hostname
router_id=$(hostname | sed 's/server//')

cat > /etc/bird/bird.conf <<EOF

router id $router_id;

log syslog all;
protocol device {
}
# We are only interested in IPv6, skip another section for IPv4
protocol kernel {
    ipv6 { export all; };
}
protocol bgp k8s {
    local as 65530;
    # Accept sessions from any k8s node in the on-link network
    neighbor range 2a0a:e5c0::/64 as 65533;
    dynamic name "k8s_"; direct;

    ipv6 {
        import filter { if net.len > 64 then accept; else reject; };
        export none;
    };
}
EOF
/etc/init.d/bird restart
```

The router id must be unique for every host. As all hosts have a
unique number, we use that one as the router id.
The bird configuration uses dynamic peers (`neighbor range`), so that
any k8s node in the network can peer with the old servers.

We also use an import filter that only accepts prefixes longer than
/64. This avoids receiving /64 routes, which would overlap with the
on-link route.

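The effect of the import filter can be illustrated outside of bird. The following is a minimal shell sketch and not part of our setup: the helper name `accepts_prefix` is hypothetical and the example prefixes are made up for illustration. It mirrors the rule `net.len > 64`, so only prefixes more specific than the on-link /64 pass.

```shell
#!/bin/sh
# Hypothetical helper mirroring the bird import filter
# "if net.len > 64 then accept; else reject;".
# Takes a prefix in CIDR notation, returns 0 (accept) or 1 (reject).
accepts_prefix() {
    len="${1##*/}"                  # text after the last "/" is the prefix length
    case "$len" in
        ''|*[!0-9]*) return 1 ;;    # missing or non-numeric length: reject
    esac
    [ "$len" -gt 64 ]
}

# Example prefixes (made up for illustration)
for prefix in 2a0a:e5c0::/64 2a0a:e5c0:0:0:1::/80 2a0a:e5c0::1:0/112; do
    if accepts_prefix "$prefix"; then
        echo "accept $prefix"
    else
        echo "reject $prefix"
    fi
done
```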
### BGP networking in Kubernetes

Calico supports BGP peering and we use a rather standard calico
configuration:

```
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 65533
  serviceClusterIPs:
  - cidr: 2a0a:e5c0:aaaa::/108
  serviceExternalIPs:
  - cidr: 2a0a:e5c0:aaaa::/108
```

In addition, we create a BGPPeer for each server and router:

```
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: serverXX
spec:
  peerIP: 2a0a:e5c0::XX
  asNumber: 65530
  keepOriginalNextHop: true
```

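As the BGPPeer objects differ only in the host number, they could also be generated. The following is a minimal sketch, assuming that the XX in the name and in the peer address is the same per-host number. The function name `render_bgp_peer` and the host numbers 1, 2 and 3 are only examples. The output could be collected in a file and passed to calicoctl.

```shell
#!/bin/sh
# Hypothetical helper: print one BGPPeer manifest for the host number $1.
# Assumes that "XX" in serverXX and in 2a0a:e5c0::XX is the same number.
render_bgp_peer() {
    cat <<EOF
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: server$1
spec:
  peerIP: 2a0a:e5c0::$1
  asNumber: 65530
  keepOriginalNextHop: true
EOF
}

# Example host numbers, to be replaced with the real ones
for number in 1 2 3; do
    render_bgp_peer "$number"
done
```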
We apply the whole configuration using calicoctl:

```
./calicoctl create -f - < ~/vcs/k8s-config/bootstrap/p5-cow/calico-bgp.yaml
```

A few seconds later we can observe the established BGP sessions on
the old / native hosts:

```
bird> show protocols
Name       Proto      Table      State  Since         Info
device1    Device     ---        up     23:09:01.393
kernel1    Kernel     master6    up     23:09:01.393
k8s        BGP        ---        start  23:09:01.393  Passive
k8s_1      BGP        ---        up     23:33:01.215  Established
k8s_2      BGP        ---        up     23:33:01.215  Established
k8s_3      BGP        ---        up     23:33:01.420  Established
k8s_4      BGP        ---        up     23:33:01.215  Established
k8s_5      BGP        ---        up     23:33:01.215  Established
```

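Checking every old host by hand gets tedious. The following is a minimal sketch, assuming the `show protocols` output format shown above and dynamic session names starting with `k8s_`. The function name `count_down_sessions` is hypothetical and the sample input is made up. It reads the output on stdin and prints the number of dynamic sessions that are not in the Established state.

```shell
#!/bin/sh
# Hypothetical helper: read "show protocols" output on stdin and print
# how many dynamic k8s_* BGP sessions are NOT Established.
count_down_sessions() {
    awk '$1 ~ /^k8s_/ && $NF != "Established" { down++ } END { print down + 0 }'
}

# Made-up sample input; on a real host the input would come from
# "birdc show protocols" instead.
count_down_sessions <<'EOF'
k8s      BGP  ---  start  23:09:01.393  Passive
k8s_1    BGP  ---  up     23:33:01.215  Established
k8s_2    BGP  ---  start  23:33:01.215  Active
EOF
```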
## Changelog

### 2022-08-27

* The initial release of this blog article

## Follow up or questions

You can join the discussion about this migration in the Matrix room
`#kubernetes:ungleich.ch`. If you don't have a Matrix account, you
can join using our chat at https://chat.with.ungleich.ch.