ungleich-k8s/archive/v3/README.md
2021-10-18 15:15:52 +02:00

9.4 KiB

IPv6 only kubernetes clusters

This project is testing, deploying and using IPv6 only k8s clusters.

Docs

Working

  • networking (calico)
  • ceph with rook (cephfs, rbd)
  • letsencrypt (nginx, certbot, homemade)
  • k8s test on arm64
  • CI/CD using flux
  • Chart repository (chartmuseum)
  • Git repository (gitea)

Not (yet) working or tested

  • proxy for pulling images only
    • configure a proxy on crio
    • setup a proxy in the cluster (?)
  • virtualisation (VMs, kubevirt)
  • network policies
  • Prometheus for the cluster
  • Maybe LoadBalancer support (our ClusterIP already does that though)
  • (Other) DNS entrys for services
  • Internal backup / snapshots
  • External backup (rsync, rbd mirror, etc.)

Cluster setup

  • Calico CNI with BGP peering to our upstream infrastructure
  • Rook for RBD and CephFS support

The following steps are a full walk through on setting up the IPv6 only kubernetes cluster "c2.k8s.ooo".

Initialise the master with kubeadm

We are using a custom kubeadm.conf to

  • configure the cgroupdriver (for alpine)
  • configure the IP addresses
  • configure the DNS domain (c2.k8s.ooo)
kubeadm init --config k8s/c2/kubeadm.yaml

Adding worker nodes

kubeadm join [2a0a:e5c0:13:0:225:b3ff:fe20:38cc]:6443 --token cfrita.. \
        --discovery-token-ca-cert-hash sha256:...

Verifying that all nodes joined:

% kubectl get nodes
NAME       STATUS   ROLES                  AGE     VERSION
server47   Ready    control-plane,master   2m25s   v1.21.1
server48   Ready    <none>                 66s     v1.21.1
server49   Ready    <none>                 24s     v1.21.1
server50   Ready    <none>                 19s     v1.21.1

Configuring networking

  • This customised calico.yaml enables IPv6
kubectl apply -f cni-calico/calico.yaml

After applying, check that all calico pods are up and running:

% kubectl -n kube-system get pods
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-b656ddcfc-5kfg6   0/1     Running   4          3m27s
calico-node-975vh                         1/1     Running   3          3m28s
calico-node-gbnvj                         1/1     Running   2          3m28s
calico-node-qjm5v                         0/1     Running   4          113s
calico-node-xxxmk                         1/1     Running   4          3m28s
coredns-558bd4d5db-56dv9                  1/1     Running   0          8m51s
coredns-558bd4d5db-hsspb                  1/1     Running   0          8m51s
etcd-server47                             1/1     Running   0          9m9s
kube-apiserver-server47                   1/1     Running   0          9m4s
kube-controller-manager-server47          1/1     Running   0          9m4s
kube-proxy-5g5qm                          1/1     Running   0          8m51s
kube-proxy-85mck                          1/1     Running   0          7m8s
kube-proxy-b95sv                          1/1     Running   0          7m13s
kube-proxy-mpjkm                          1/1     Running   0          7m55s
kube-scheduler-server47                   1/1     Running   0          9m10s

Often you will have some pods crashing in the beginning and you might need to make mounts shared (if they are not) like this:

mount --make-shared /sys
mount --make-shared /run

(above mounts are necessary for Alpine Linux)

Getting calicoctl

To configure calico, we need calicoctl, which we can run in yet-another-pod as following:

kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml

And we alias it for easier usage:

alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"

Adding BGP peering

We need to tell calico with which BGP peers to peer with. For this we use the bgp-c2.yaml file, which has configurations fitting for our cluster:

calicoctl create -f - < cni-calico/bgp-c2.yaml

At this point all nodes should be peering with our upstream infrastructure. We can confirm this on the upstream side, where we also run bird:

% birdc show route
BIRD 2.0.7 ready.
Table master6:
2a0a:e5c0:13:e1:f4c5:ab65:a67f:53c0/122 unicast [place7-srever1 20:04:14.222] * (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server3 20:04:14.224] (100) [AS65534i]
	via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server2 20:04:14.222] (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
                     unicast [place7-server4 20:04:14.221] (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
2a0a:e5c0:13:e2::/108 unicast [place7-server1 20:04:14.222] * (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server2 20:04:14.222] (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
                     unicast [place7-server3 20:04:14.113] (100) [AS65534i]
	via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server4 20:04:14.221] (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
2a0a:e5c0:13:e1:176b:eaa6:6d47:1c40/122 unicast [place7-server1 20:04:14.222] * (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server2 20:04:14.222] (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
                     unicast [place7-server3 20:04:14.221] (100) [AS65534i]
	via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server4 20:04:14.221] (100) [AS65534i]
	via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
2a0a:e5c0:13::/48    unreachable [v6 2021-05-16] * (200)

Testing the cluster

At this point we should have a functioning k8s cluster, now we should test whether it works using a simple nginx deployment:

Do NOT use https://k8s.io/examples/application/deployment.yaml. It contains an outdated nginx container that has no IPv6 listener. You will get results such as

% curl http://[2a0a:e5c0:13:bbb:176b:eaa6:6d47:1c41]
curl: (7) Failed to connect to 2a0a:e5c0:13:bbb:176b:eaa6:6d47:1c41 port 80: Connection refused

if you use that deployment. Instead use something on the line of the included nginx-test-deployment.yaml:

kubectl apply -f generic/nginx-test-deployment.yaml

Let's see whether the pods are coming up:

% kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
nginx-deployment-95d596f7b-484mz   1/1     Running   0          13s
nginx-deployment-95d596f7b-4wfkp   1/1     Running   0          13s

And the associated service:

% kubectl get svc
NAME            TYPE        CLUSTER-IP              EXTERNAL-IP   PORT(S)   AGE
kubernetes      ClusterIP   2a0a:e5c0:13:e2::1      <none>        443/TCP   16m
nginx-service   ClusterIP   2a0a:e5c0:13:e2::4412   <none>        80/TCP    34s

It is up and running, let's curl it!

% curl -I http://[2a0a:e5c0:13:e2::4412]
HTTP/1.1 200 OK
Server: nginx/1.20.0
Date: Mon, 14 Jun 2021 18:08:29 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 20 Apr 2021 16:11:05 GMT
Connection: keep-alive
ETag: "607efd19-264"
Accept-Ranges: bytes

Perfect. Let's delete it again:

kubectl delete -f generic/nginx-test-deployment.yaml

Next steps

While above is already a fully running k8s cluster, we do want to have support for PersistentVolumeclaims. See the rook documentation on how to achieve the next step.

High available control plan

Above steps result in a single control plane node, however for production setups, three nodes should be in the control plane.

The guide for creating HA clusters referes to an external load balancer that

Secrets

Generating them inside the cluster

Handled via https://github.com/mittwald/kubernetes-secret-generator

helm repo add mittwald https://helm.mittwald.de
helm repo update
helm upgrade --install kubernetes-secret-generator mittwald/kubernetes-secret-generator

Generating / creating secrets:

apiVersion: v1
kind: Secret
metadata:
  name: string-secret
  annotations:
    secret-generator.v1.mittwald.de/autogenerate: password
data:
  username: c29tZXVzZXI=
  • Advantage: passwords are only in the cluster
  • Disadvantage: passwords are only in the cluster

CI/CD

What we want

  • Package everything into one git repository (charts, kustomize, etc.)
  • Be usable for multiple clusters
  • Easily apply cross cluster

What we don't want / what is problematic

  • Uploading charts to something like chartmuseum
    • Is redundant - we have a version in git
    • Is manual (could probably be automated)

ArgoCD

Looks too big, too complex, too complicated.

FluxCD2

Looks ok, handling of helm is ok, but does not feel intuitive. Seems to be more orientated on "kustomizing helm charts".

Helmfile

helmfile seems to do most of what we need.

The IPv4 "problem"

  • Clusters are IPv6 only
  • Need to have one or more services to map IPv4
  • Maybe outside haproxy w/ generic ssl/sni/host mapping
    • Could even be inside haproxy service

Flux + Chartmuseum

  • For automatic deployments, we can use flux
  • To be able to use flux with our charts, we need a Chartmuseum
  • To access a private chartmuseum, we need a shared secret
  • Thus we probably do need sops or similar

-alternative-

  • Using kustomize, local resources can be used