rook-migration: name missing

Nico Schottelius 2022-09-10 22:43:40 +02:00
parent 1f4f926b2e
commit 69bae862e8

@ -1148,8 +1148,48 @@ tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6820 :::*
unix 2 [ ACC ] STREAM LISTENING 16880318 3717792/ceph-osd /var/run/ceph/ceph-osd.49.asok
```
### Post monitor migration issue 1: OSDs start crashing
After roughly a week, an OSD on the native cluster started failing on
restart with the following error:
```
unable to parse addrs
in 'rook-ceph-mon-a.rook-ceph.svc.p5-cow.k8s.ooo,
rook-ceph-mon-b.rook-ceph.svc.p5-cow.k8s.ooo,
rook-ceph-mon-c.rook-ceph.svc.p5-cow.k8s.ooo,
rook-ceph-mon-d.rook-ceph.svc.p5-cow.k8s.ooo,
rook-ceph-mon-e.rook-ceph.svc.p5-cow.k8s.ooo'
```
Checking the cluster, it seems rook has replaced mon-a with mon-f:
```
[22:38] blind:~% kubectl -n rook-ceph get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-cephfsplugin-metrics ClusterIP 2a0a:e5c0:0:15::f2ac <none> 8080/TCP,8081/TCP 7d5h
csi-rbdplugin-metrics ClusterIP 2a0a:e5c0:0:15::5fc2 <none> 8080/TCP,8081/TCP 7d5h
rook-ceph-mgr ClusterIP 2a0a:e5c0:0:15::c31c <none> 9283/TCP 7d5h
rook-ceph-mon-b ClusterIP 2a0a:e5c0:0:15::9cd9 <none> 6789/TCP,3300/TCP 7d5h
rook-ceph-mon-c ClusterIP 2a0a:e5c0:0:15::fc2 <none> 6789/TCP,3300/TCP 7d5h
rook-ceph-mon-d ClusterIP 2a0a:e5c0:0:15::b029 <none> 6789/TCP,3300/TCP 7d5h
rook-ceph-mon-e ClusterIP 2a0a:e5c0:0:15::8c86 <none> 6789/TCP,3300/TCP 7d5h
rook-ceph-mon-f ClusterIP 2a0a:e5c0:0:15::2833 <none> 6789/TCP,3300/TCP 3d13h
```
At this point it is unclear why mon-a was replaced. Had the native
hosts already been migrated, the replacement would probably not have
caused an issue. However, as long as ceph.conf files are deployed with
static references to the monitors, this problem might repeat.
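
One way to catch such a mismatch before an OSD restart is to compare
the monitor names in ceph.conf with the services that currently exist.
The following is only a minimal sketch: the function name
`stale_mon_hosts`, the sample configuration and the assumption that
monitors are listed as plain hostnames (no ports, no `[v2:...]`
address vectors) are illustrative and not taken from the actual
deployment. The list of live services is copied from the `kubectl`
output above.

```python
import configparser


def stale_mon_hosts(ceph_conf_text, live_services):
    """Return the mon host entries whose first DNS label is not a live service.

    Hypothetical helper: assumes an ini-style ceph.conf with the monitors
    listed as comma separated plain hostnames.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ceph_conf_text)

    stale = []
    for section in parser.sections():
        for key, value in parser.items(section):
            # Accept both "mon host" and "mon_host" spellings of the key.
            if key.replace(" ", "_") != "mon_host":
                continue
            for host in value.split(","):
                host = host.strip()
                # "rook-ceph-mon-a.rook-ceph.svc..." -> "rook-ceph-mon-a"
                if host and host.split(".")[0] not in live_services:
                    stale.append(host)
    return stale


# Assumed ceph.conf content, built from the hostnames in the error message.
SAMPLE_CONF = """
[global]
mon host = rook-ceph-mon-a.rook-ceph.svc.p5-cow.k8s.ooo,
    rook-ceph-mon-b.rook-ceph.svc.p5-cow.k8s.ooo,
    rook-ceph-mon-c.rook-ceph.svc.p5-cow.k8s.ooo,
    rook-ceph-mon-d.rook-ceph.svc.p5-cow.k8s.ooo,
    rook-ceph-mon-e.rook-ceph.svc.p5-cow.k8s.ooo
"""

# Monitor services as listed by `kubectl -n rook-ceph get svc` above.
LIVE_SERVICES = {
    "rook-ceph-mon-b",
    "rook-ceph-mon-c",
    "rook-ceph-mon-d",
    "rook-ceph-mon-e",
    "rook-ceph-mon-f",
}

if __name__ == "__main__":
    for host in stale_mon_hosts(SAMPLE_CONF, LIVE_SERVICES):
        print("stale monitor reference:", host)
```

With the sample data only the mon-a entry is reported, which matches
the replacement of mon-a by mon-f seen above. Running such a check on
the native hosts would only flag the drift; the ceph.conf files still
need to be redeployed by hand.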
## Changelog
### 2022-09-10
* Added missing monitor description
### 2022-09-03
* Next try starting for migration