++blog update

2022-09-05 10:39:01 +02:00 · 1f4f926b2e
commit 1f4f926b2e
parent 4b4138e5cb
1 changed file with 48 additions and 7 deletions
--- a/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr
+++ b/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr
@@ -1091,21 +1091,62 @@ We also need to ensure the toolbox is being updated/recreated:
 kubectl  -n rook-ceph delete pods rook-ceph-tools-5cf88dd58f-fwwlc
 ```

-### Retiring the old monitors

+### Original monitors vanish

+We did not add BGP peering.
+Ceph cannot be reached through the routers.

-### The actual migration
+It seems that Rook removed the original monitors.

-At this point we have 2 ceph clusters:
+Update `mon host` in the ceph.conf on the native nodes:

-* A new one in rook
-* The old/native one
+```
+mon host            = rook-ceph-mon-a.rook-ceph.svc..,
+```

-The next steps are:
+### Post monitor migration issue 1: OSDs start crashing

-Replace fsid in secrets/rook-ceph-mon with that of the old one.
+A day after the monitor migration, some OSDs started to crash. In the
+debug log we found the following error:

+```
+2022-09-05 10:24:02.881 7fe005ce7700 -1  Processor -- bind unable to bind to v2:[2a0a:e5c0::225:b3ff:fe20:3554]:7300/3712937 on any port in range 6800-7300: (99) Cannot assign requested address
+2022-09-05 10:24:02.881 7fe005ce7700 -1  Processor -- bind was unable to bind. Trying again in 5 seconds
+2022-09-05 10:24:07.897 7fe005ce7700 -1  Processor -- bind unable to bind to v2:[2a0a:e5c0::225:b3ff:fe20:3554]:7300/3712937 on any port in range 6800-7300: (99) Cannot assign requested address
+2022-09-05 10:24:07.897 7fe005ce7700 -1  Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
+2022-09-05 10:24:07.897 7fe0127b1700 -1 received  signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
+2022-09-05 10:24:07.897 7fe0127b1700 -1 osd.49 100709 *** Got signal Interrupt ***
+2022-09-05 10:24:07.897 7fe0127b1700 -1 osd.49 100709 *** Immediate shutdown (osd_fast_shutdown=true) ***
+```
+
+The OSD is trying to bind to an IPv6 address that is **not** configured on the system.
+
+https://tracker.ceph.com/issues/24602
+
+Calico/CNI rewrites IP addresses, so the OSD is told the wrong IPv6
+address.
+
+To fix this, we set

+```
+public_addr = 2a0a:e5c0::92e2:baff:fe26:642c
+```

+on the node. After restarting the crashing OSD, we verify the binding:
+
+```
+[10:35:06] server4.place5:/var/log/ceph# netstat -lnpW | grep 3717792
+tcp6       0      0 2a0a:e5c0::92e2:baff:fe26:642c:6821 :::*                    LISTEN      3717792/ceph-osd
+tcp6       0      0 :::6822                 :::*                    LISTEN      3717792/ceph-osd
+tcp6       0      0 :::6823                 :::*                    LISTEN      3717792/ceph-osd
+tcp6       0      0 2a0a:e5c0::92e2:baff:fe26:642c:6816 :::*                    LISTEN      3717792/ceph-osd
+tcp6       0      0 2a0a:e5c0::92e2:baff:fe26:642c:6817 :::*                    LISTEN      3717792/ceph-osd
+tcp6       0      0 :::6818                 :::*                    LISTEN      3717792/ceph-osd
+tcp6       0      0 :::6819                 :::*                    LISTEN      3717792/ceph-osd
+tcp6       0      0 2a0a:e5c0::92e2:baff:fe26:642c:6820 :::*                    LISTEN      3717792/ceph-osd
+unix  2      [ ACC ]     STREAM     LISTENING     16880318 3717792/ceph-osd     /var/run/ceph/ceph-osd.49.asok
+```

 ## Changelog