++blog update
parent 4b4138e5cb
commit 1f4f926b2e
1 changed file with 48 additions and 7 deletions
|
@ -1091,21 +1091,62 @@ We also need to ensure the toolbox is being updated/recreated:
|
||||||
kubectl -n rook-ceph delete pods rook-ceph-tools-5cf88dd58f-fwwlc
|
||||||
```
|
||||||
|
|
||||||
### Retiring the old monitors
|
|
||||||
|
|
||||||
|
### Original monitors vanish
|
||||||
|
|
||||||
|
We did not add BGP peering.
Ceph cannot be reached through the routers.
|
||||||
|
|
||||||
### The actual migration
|
It seems that Rook removed the original monitors.
|
||||||
|
|
||||||
At this point we have 2 ceph clusters:
|
We updated the ceph.conf on the native nodes:
|
||||||
|
|
||||||
* A new one in rook
|
```
|
||||||
* The old/native one
|
mon host = rook-ceph-mon-a.rook-ceph.svc..,
|
||||||
|
```
|
||||||
|
|
||||||
The next steps are:
|
### Post monitor migration issue 1: OSDs start crashing
|
||||||
|
|
||||||
Replace fsid in secrets/rook-ceph-mon with that of the old one.
|
A day after the monitor migration, some OSDs started to crash. In the
debug log we found the following error:
|
||||||
|
|
||||||
|
```
2022-09-05 10:24:02.881 7fe005ce7700 -1 Processor -- bind unable to bind to v2:[2a0a:e5c0::225:b3ff:fe20:3554]:7300/3712937 on any port in range 6800-7300: (99) Cannot assign requested address
2022-09-05 10:24:02.881 7fe005ce7700 -1 Processor -- bind was unable to bind. Trying again in 5 seconds
2022-09-05 10:24:07.897 7fe005ce7700 -1 Processor -- bind unable to bind to v2:[2a0a:e5c0::225:b3ff:fe20:3554]:7300/3712937 on any port in range 6800-7300: (99) Cannot assign requested address
2022-09-05 10:24:07.897 7fe005ce7700 -1 Processor -- bind was unable to bind after 3 attempts: (99) Cannot assign requested address
2022-09-05 10:24:07.897 7fe0127b1700 -1 received signal: Interrupt from Kernel ( Could be generated by pthread_kill(), raise(), abort(), alarm() ) UID: 0
2022-09-05 10:24:07.897 7fe0127b1700 -1 osd.49 100709 *** Got signal Interrupt ***
2022-09-05 10:24:07.897 7fe0127b1700 -1 osd.49 100709 *** Immediate shutdown (osd_fast_shutdown=true) ***
```
|
||||||
|
|
||||||
|
The OSD is trying to bind to an IPv6 address that is **not** configured on the system.
|
||||||
|
|
||||||
|
https://tracker.ceph.com/issues/24602
|
||||||
|
|
||||||
|
Calico/CNI rewrites IP addresses, so the OSD is told the wrong IPv6
address.
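
To confirm that the address from the log really is missing on the host, it can be compared against the addresses the kernel reports. The following is a minimal sketch, assuming iproute2's `ip` is installed; `has_addr` is a hypothetical helper name, not part of Ceph or Rook. It does a plain string comparison, so the address must be written in the same compressed form that `ip` prints.

```shell
# has_addr ADDR
# Reads the output of `ip -6 -o addr show` on stdin and succeeds (exit 0)
# only if ADDR is one of the listed addresses. Hypothetical helper.
has_addr() {
    awk -v want="$1" '
        # Field 4 of the one-line format is "address/prefixlen".
        { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage on the affected node, with the address taken from the OSD log:
#   ip -6 -o addr show | has_addr 2a0a:e5c0::225:b3ff:fe20:3554 \
#       || echo "address is not configured on this host"
```

If the check fails for the address in the bind error, the OSD was handed an address that the host does not own, which matches the `(99) Cannot assign requested address` message.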
|
||||||
|
|
||||||
|
We added the following to the node:

```
public_addr = 2a0a:e5c0::92e2:baff:fe26:642c
```

After restarting the crashing OSD, we verified the binding:
|
||||||
|
|
||||||
|
```
[10:35:06] server4.place5:/var/log/ceph# netstat -lnpW | grep 3717792
tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6821 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 :::6822 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 :::6823 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6816 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6817 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 :::6818 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 :::6819 :::* LISTEN 3717792/ceph-osd
tcp6 0 0 2a0a:e5c0::92e2:baff:fe26:642c:6820 :::* LISTEN 3717792/ceph-osd
unix 2 [ ACC ] STREAM LISTENING 16880318 3717792/ceph-osd /var/run/ceph/ceph-osd.49.asok
```
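
When several OSDs need checking, the same verification can be scripted. This is a rough sketch that assumes the `netstat -lnpW` column layout shown above; `listen_addrs` is a hypothetical helper name used only for illustration.

```shell
# listen_addrs PID
# Reads `netstat -lnpW` output on stdin and prints the local address:port
# of every TCP socket owned by PID. Hypothetical helper.
listen_addrs() {
    awk -v pid="$1" '
        # $1 is the protocol, $4 the local address, $NF is "PID/program".
        $1 ~ /^tcp/ && index($NF, pid "/") == 1 { print $4 }
    '
}

# Usage on the node, with the PID of the restarted OSD:
#   netstat -lnpW | listen_addrs 3717792
```

Lines that start with the configured `public_addr` show that the OSD bound to it. In the listing above, ports 6818, 6819, 6822 and 6823 still listen on the wildcard address `:::`.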
|
||||||
|
|
||||||
## Changelog
|
||||||
|
|
||||||
|
|