diff --git a/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr b/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr
index 06a1cd5..7c35ea7 100644
--- a/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr
+++ b/content/u/blog/2022-08-27-migrating-ceph-nautilus-into-kubernetes-with-rook/contents.lr
@@ -1148,8 +1148,48 @@ tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6820 :::*
 unix 2 [ ACC ] STREAM LISTENING 16880318 3717792/ceph-osd /var/run/ceph/ceph-osd.49.asok
 ```
 
+### Post monitor migration issue 1: OSDs start crashing
+
+After roughly a week an OSD on the native cluster started to fail on
+restart with the following error:
+
+```
+ unable to parse addrs
+in 'rook-ceph-mon-a.rook-ceph.svc.p5-cow.k8s.ooo,
+rook-ceph-mon-b.rook-ceph.svc.p5-cow.k8s.ooo,
+rook-ceph-mon-c.rook-ceph.svc.p5-cow.k8s.ooo,
+rook-ceph-mon-d.rook-ceph.svc.p5-cow.k8s.ooo,
+rook-ceph-mon-e.rook-ceph.svc.p5-cow.k8s.ooo'
+```
+
+Checking the cluster, it seems rook has replaced mon-a with mon-f:
+
+```
+[22:38] blind:~% kubectl -n rook-ceph get svc
+NAME                       TYPE        CLUSTER-IP             EXTERNAL-IP   PORT(S)             AGE
+csi-cephfsplugin-metrics   ClusterIP   2a0a:e5c0:0:15::f2ac   <none>        8080/TCP,8081/TCP   7d5h
+csi-rbdplugin-metrics      ClusterIP   2a0a:e5c0:0:15::5fc2   <none>        8080/TCP,8081/TCP   7d5h
+rook-ceph-mgr              ClusterIP   2a0a:e5c0:0:15::c31c   <none>        9283/TCP            7d5h
+rook-ceph-mon-b            ClusterIP   2a0a:e5c0:0:15::9cd9   <none>        6789/TCP,3300/TCP   7d5h
+rook-ceph-mon-c            ClusterIP   2a0a:e5c0:0:15::fc2    <none>        6789/TCP,3300/TCP   7d5h
+rook-ceph-mon-d            ClusterIP   2a0a:e5c0:0:15::b029   <none>        6789/TCP,3300/TCP   7d5h
+rook-ceph-mon-e            ClusterIP   2a0a:e5c0:0:15::8c86   <none>        6789/TCP,3300/TCP   7d5h
+rook-ceph-mon-f            ClusterIP   2a0a:e5c0:0:15::2833   <none>        6789/TCP,3300/TCP   3d13h
+```
+
+At this moment it is unclear why ceph does it, but if the native hosts
+had already been migrated, this would probably not have caused an
+issue. However as long as ceph.conf files are deployed with static
+references to the monitors, this problem might repeat.
+
+
 ## Changelog
 
+### 2022-09-10
+
+* Added missing monitor description
+
+
 ### 2022-09-03
 
 * Next try starting for migration
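
The failure mode described in the added section (a static monitor list in ceph.conf that no longer matches the monitors rook is running) can be detected before an OSD restart trips over it. The following is a minimal Python sketch and not part of the patch. The ceph.conf excerpt is an assumption reconstructed from the hostnames in the error message, the service names are copied from the `kubectl` output, and `parse_mon_hosts` and `stale_monitors` are hypothetical helper names.

```python
# Assumed ceph.conf excerpt: the real file is not shown in the post.
SAMPLE_CONF = """
[global]
mon host = rook-ceph-mon-a.rook-ceph.svc.p5-cow.k8s.ooo, rook-ceph-mon-b.rook-ceph.svc.p5-cow.k8s.ooo, rook-ceph-mon-c.rook-ceph.svc.p5-cow.k8s.ooo, rook-ceph-mon-d.rook-ceph.svc.p5-cow.k8s.ooo, rook-ceph-mon-e.rook-ceph.svc.p5-cow.k8s.ooo
"""

# Monitor service names as listed by `kubectl -n rook-ceph get svc`.
LIVE_SERVICES = [
    "rook-ceph-mon-b",
    "rook-ceph-mon-c",
    "rook-ceph-mon-d",
    "rook-ceph-mon-e",
    "rook-ceph-mon-f",
]


def parse_mon_hosts(ceph_conf_text):
    """Return the entries of the 'mon host' (or 'mon_host') option."""
    for line in ceph_conf_text.splitlines():
        key, sep, value = line.partition("=")
        # Ceph accepts both "mon host" and "mon_host" as the option name.
        if sep and key.strip().replace(" ", "_") == "mon_host":
            return [h.strip() for h in value.split(",") if h.strip()]
    return []


def stale_monitors(ceph_conf_text, live_services):
    """Return configured monitors whose first DNS label is not a live service."""
    live = set(live_services)
    return [
        host
        for host in parse_mon_hosts(ceph_conf_text)
        if host.split(".")[0] not in live
    ]


print(stale_monitors(SAMPLE_CONF, LIVE_SERVICES))
# prints ['rook-ceph-mon-a.rook-ceph.svc.p5-cow.k8s.ooo']
```

A non-empty result means at least one native host still points at a monitor that has since been replaced, which matches the situation after mon-a was swapped for mon-f.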