[[!meta title="Building an IPv6 only kubernetes cluster"]]
## Introduction

For a few weeks now I have been working on my pet project: creating a
production-ready kubernetes cluster that runs in an IPv6-only
environment.

As the complexity and the challenges of this project are rather
interesting, I decided to start documenting them in this blog post.

The
[ungleich-k8s](https://code.ungleich.ch/ungleich-public/ungleich-k8s)
repository contains all snippets and the latest code.
## Objective

The kubernetes cluster should support the following workloads:

* Matrix chat instances (Synapse+postgres+nginx+element)
* Virtual machines (via kubevirt)
* Storage for internal and external consumers, using Ceph

## Components

The following is a list of the components that I am using so far. This
might change along the way, but I wanted to note down early what I
selected and why.

### OS: Alpine Linux

The operating system of choice to run the k8s cluster is
[Alpine Linux](https://www.alpinelinux.org/), as it is small, stable
and supports both docker and cri-o.

### Container management: docker

Originally I started with [cri-o](https://cri-o.io/). However, using
cri-o together with kubevirt and calico results in an overlayfs being
mounted on / of the host, which breaks the host completely (see below
for details).

Docker, while being deprecated, at least allows me to get kubevirt
running, generally speaking.

### Networking: IPv6 only, calico

I wanted to go with [cilium](https://cilium.io/) first, because it
goes down the eBPF route from the get-go. However, cilium does not yet
support native, automated BGP peering with the upstream
infrastructure, so managing node / IP network peering becomes a
tedious, manual and error-prone task. Cilium is on the way to improve
this, but it is not there yet.

[Calico](https://www.projectcalico.org/), on the other hand, still
relies on ip(6)tables and kube-proxy for forwarding traffic, but has
had proper BGP support for a long time. Calico also aims to add eBPF
support, however at the moment it does not support IPv6 yet (bummer!).

### Storage: rook

[Rook](https://rook.io/) seems to be the first choice if you look at
who is providing storage in the k8s world. It looks rather solid, even
though some of its knobs are not yet clear to me.

Rook, in my opinion, is a direct alternative to running cephadm, which
requires systemd running on your hosts. Given Alpine Linux, that will
never be the case.

### Virtualisation

[Kubevirt](https://kubevirt.io/) seems to provide a good
interface. Mid-term, kubevirt is projected to replace
[OpenNebula](https://opennebula.io/) at
[ungleich](https://ungleich.ch).

## Challenges
### cri-o + calico + kubevirt = broken host

So this is a rather funky one. If you deploy cri-o and calico,
everything works. If you then deploy kubevirt, the **virt-handler**
pod fails to come up with the error message:

    Error: path "/var/run/kubevirt" is mounted on "/" but it is not a shared mount.

On the Internet there are two recommendations for fixing this:

* Fix the systemd unit for docker: obviously, as this setup uses
  neither systemd nor docker, this is not applicable...
* Issue **mount --make-shared /**

The second command has a very strange side effect: after issuing it,
the contents of a calico pod are mounted as an overlayfs **on / of the
host**. This covers /proc, so things like **ps**, **mount** and
co. fail, and basically the whole system becomes unusable until a
reboot.

This is fully reproducible. I first suspected the tmpfs on / to be the
issue and used real disks instead of booting over the network to check
it, but even a regular ext4 on / causes the exact same problem.
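
Before issuing **mount --make-shared** anywhere, it helps to see what
propagation type a mount currently has. A small sketch, assuming a
Linux host, reading the optional-fields column of
/proc/self/mountinfo:

```shell
# Print the mount point plus the first optional field for /, /sys and
# /run. In /proc/self/mountinfo field 5 is the mount point and field 7
# holds optional tags such as "shared:N"; a private mount has no such
# tag, so field 7 is the "-" separator instead.
awk '$5 == "/" || $5 == "/sys" || $5 == "/run" {print $5, $7}' \
    /proc/self/mountinfo
```

A line like `/ shared:1` means / is already a shared mount; `/ -`
means it is private.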
### docker + calico + kubevirt = other shared mounts

Now, given that cri-o + calico + kubevirt does not lead to the
expected result, what does the same setup with docker look like? With
docker, the calico node pods fail to come up if /sys is not a shared
mount, and the virt-handler pods fail if /run is not a shared mount.

Two funky findings:

Issuing the following commands makes both work:

    mount --make-shared /sys
    mount --make-shared /run

The paths are totally different between docker and cri-o, even though
the mapped host paths in the pod descriptions are the same. And why is
/sys not being shared not a problem for calico under cri-o?

## Log
### Status 2021-06-06

Today is the first day of publishing these findings, so this blog
article will still lack quite some information. If you are curious and
want to know more than what is published yet, you can find me on
Matrix in the **#hacking:ungleich.ch** room.

### What works so far

* Spawning IPv6-only pods works
* Spawning IPv6-only services works
* BGP peering and ECMP routes with the upstream infrastructure work
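
For the pods and services to be IPv6-only, the cluster has to be
bootstrapped with IPv6 CIDRs from the start. A minimal sketch,
assuming kubeadm is used; the prefixes below are documentation
examples, not the ones used in this cluster, and the /108 service
subnet reflects the kube-apiserver limit of at most 2^20 service IPs:

```shell
# Hypothetical kubeadm ClusterConfiguration for an IPv6-only cluster:
# pod and service subnets are IPv6 prefixes only, no IPv4 ranges.
# 2001:db8::/32 is the IPv6 documentation prefix; substitute your own.
cat > kubeadm-v6.yaml <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  podSubnet: 2001:db8:42::/56
  serviceSubnet: 2001:db8:43::/108
EOF
```

This file would then be passed to **kubeadm init --config
kubeadm-v6.yaml** on the first control plane node.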
Here's an output of the upstream bird process for the routes from k8s:

    bird> show route
    Table master6:
    2a0a:e5c0:13:e2::/108    unicast [place7-server1 23:45:21.589] * (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                             unicast [place7-server3 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                             unicast [place7-server4 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                             unicast [place7-server2 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
    2a0a:e5c0:13:e1:176b:eaa6:6d47:1c40/122 unicast [place7-server1 23:45:21.589] * (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                             unicast [place7-server4 23:45:21.591] (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                             unicast [place7-server3 23:45:21.591] (100) [AS65534i]
            via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                             unicast [place7-server2 23:45:21.589] (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
    2a0a:e5c0:13:e1:e0d1:d390:343e:8480/122 unicast [place7-server1 23:45:21.589] * (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                             unicast [place7-server3 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                             unicast [place7-server4 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                             unicast [place7-server2 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
    2a0a:e5c0:13::/48        unreachable [v6 2021-05-16] * (200)
    2a0a:e5c0:13:e1:9b19:7142:bebb:4d80/122 unicast [place7-server1 23:45:21.589] * (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                             unicast [place7-server3 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                             unicast [place7-server4 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                             unicast [place7-server2 2021-06-05] (100) [AS65534i]
            via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
    bird>
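
In the output above each node announces a /122 block for its local
pods, while the service range is a single /108. The block sizes work
out as follows:

```shell
# 128-122 = 6 host bits per per-node pod block, 128-108 = 20 host bits
# for the whole service range.
echo $(( 1 << (128 - 122) ))   # addresses per per-node /122 pod block: 64
echo $(( 1 << (128 - 108) ))   # addresses in the /108 service range: 1048576
```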
### What doesn't work

* Rook does not format/spin up all disks
* Deleting all rook components fails (**kubectl delete -f
  cluster.yaml** hangs forever)
* Spawning VMs fails with **error: unable to recognize "vmi.yaml": no matches for kind "VirtualMachineInstance" in version "kubevirt.io/v1"**

[[!tag kubernetes ipv6]]