From 1f4f926b2e251d91bc9963f69e3cfd91e48643d0 Mon Sep 17 00:00:00 2001
From: Nico Schottelius
Date: Mon, 5 Sep 2022 10:39:01 +0200
Subject: [PATCH] ++blog update

---
 .../contents.lr | 55 ++++++++++++++++---
 1 file changed, 48 insertions(+), 7 deletions(-)

diff --git a/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr b/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr
index 896c395..06a1cd5 100644
--- a/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr
+++ b/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr
@@ -1091,21 +1091,62 @@ We also need to ensure the toolbox is being updated/recreated:
 kubectl -n rook-ceph delete pods rook-ceph-tools-5cf88dd58f-fwwlc
 ```
-### Retiring the old monitors
+### Original monitors vanish
+Did not add bgp peering.
+Cannot reach ceph through the routers.
-### The actual migration
+Seems like rook did remove them.
-At this point we have 2 ceph clusters:
+Updating the ceph.conf for the native nodes:
-* A new one in rook
-* The old/native one
+```
+mon host = rook-ceph-mon-a.rook-ceph.svc..,
+```
-The next steps are:
+### Post monitor migration issue 1: OSDs start crashing
-Replace fsid in secrets/rook-ceph-mon with that of the old one.
+A day after the monitor migration some OSDs start to crash. Checking
+out the debug log we found the following error:
+```
+2022-09-05 10:24:02.881 7fe005ce7700 -1 Processor -- bind unable to bind to v2:[2a0a:e5c0::225:b3ff:fe20:3554]:7300/3712937 on any port in range 6800-7300: (99) Cannot assign requested address
+2022-09-05 10:24:02.881 7fe005ce7700 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
+2022-09-05 10:24:07.897 7fe005ce7700 -1 Processor -- bind unable to bind to v2:[2a0a:e5c0::225:b3ff:fe20:3554]:7300/3712937 on any port in range 6800-7300: (99) Cannot assign requested address
+2022-09-05 10:24:07.897 7fe005ce7700 -1 Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
+2022-09-05 10:24:07.897 7fe0127b1700 -1 received signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
+2022-09-05 10:24:07.897 7fe0127b1700 -1 osd.49 100709 *** Got signal Interrupt ***
+2022-09-05 10:24:07.897 7fe0127b1700 -1 osd.49 100709 *** Immediate shutdown (osd_fast_shutdown=true) ***
+```
+
+Trying to bind to an IPv6 address that is **not** on the system.
+
+https://tracker.ceph.com/issues/24602
+
+Calico/CNI does IP rewriting and thus tells the OSD the wrong IPv6
+address.
+
+Adding
+
+```
+public_addr = 2a0a:e5c0::92e2:baff:fe26:642c
+```
+
+to the node. Verifying the binding after restarting the crashing OSD:
+
+```
+[10:35:06] server4.place5:/var/log/ceph# netstat -lnpW | grep 3717792
+tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6821 :::* LISTEN 3717792/ceph-osd
+tcp6 0 0 :::6822 :::* LISTEN 3717792/ceph-osd
+tcp6 0 0 :::6823 :::* LISTEN 3717792/ceph-osd
+tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6816 :::* LISTEN 3717792/ceph-osd
+tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6817 :::* LISTEN 3717792/ceph-osd
+tcp6 0 0 :::6818 :::* LISTEN 3717792/ceph-osd
+tcp6 0 0 :::6819 :::* LISTEN 3717792/ceph-osd
+tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6820 :::* LISTEN 3717792/ceph-osd
+unix 2 [ ACC ] STREAM LISTENING 16880318 3717792/ceph-osd /var/run/ceph/ceph-osd.49.asok
+```
 ## Changelog
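
The netstat check in the patch above could also be scripted. The following is a minimal sketch, not part of the original post: it parses `netstat -lnpW` style output and separates the ports bound to the configured `public_addr` from the ports bound to the wildcard address. The names `listening_sockets` and `SAMPLE` are hypothetical and exist only for illustration; the sample lines are copied from the patch.

```python
# Sketch: check which address a ceph-osd process is listening on.
# SAMPLE is copied from the netstat output in the patch; the function
# name and structure are illustrative assumptions, not a Ceph tool.

SAMPLE = """\
tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6821 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 :::6822 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 :::6823 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6816 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6817 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 :::6818 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 :::6819 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6820 :::* LISTEN 3717792/ceph-osd
unix 2 [ ACC ] STREAM LISTENING 16880318 3717792/ceph-osd /var/run/ceph/ceph-osd.49.asok
"""


def listening_sockets(netstat_output, process):
    """Return (address, port) pairs of TCP listeners owned by `process`.

    `process` is the "PID/program" string from netstat's last column,
    for example "3717792/ceph-osd".
    """
    sockets = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip anything that is not a TCP socket line (prompt, unix sockets).
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[-2] != "LISTEN" or fields[-1] != process:
            continue
        # The local address is the 4th column; the port follows the last colon.
        address, _, port = fields[3].rpartition(":")
        sockets.append((address, int(port)))
    return sockets


if __name__ == "__main__":
    public_addr = "2a0a:e5c0::92e2:baff:fe26:642c"
    sockets = listening_sockets(SAMPLE, "3717792/ceph-osd")
    on_public = sorted(port for addr, port in sockets if addr == public_addr)
    on_wildcard = sorted(port for addr, port in sockets if addr == "::")
    print("bound to public_addr:", on_public)
    print("bound to wildcard:", on_wildcard)
```

If no port shows up under `public_addr` after a restart, the OSD is presumably still using the address it was given before the `ceph.conf` change.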