## IPv6 only kubernetes clusters
This project is about testing, deploying and using IPv6-only k8s clusters.
## Docs
* [Setting up the cluster with calico](v3-calico/README.md)
* [Bootstrapping Rook](rook/README.md)
## Working
* networking (calico)
* ceph with rook (cephfs, rbd)
* letsencrypt (nginx, certbot, homemade)
* k8s test on arm64
* CI/CD using flux
* Chart repository (chartmuseum)
* Git repository (gitea)
## Not (yet) working or tested
* proxy for pulling images only
* configure a proxy on crio
* setup a proxy in the cluster (?)
* virtualisation (VMs, kubevirt)
* network policies
* Prometheus for the cluster
* Maybe LoadBalancer support (our ClusterIP already does that though)
* (Other) DNS entries for services
* Internal backup / snapshots
* External backup (rsync, rbd mirror, etc.)
## Cluster setup
* Calico CNI with BGP peering to our upstream infrastructure
* Rook for RBD and CephFS support
The following steps are a full walk through on setting up the
IPv6 only kubernetes cluster "c2.k8s.ooo".
### Initialise the master with kubeadm
We use a custom kubeadm configuration (k8s/c2/kubeadm.yaml) to
* configure the cgroup driver (for Alpine)
* configure the IP addresses
* configure the DNS domain (c2.k8s.ooo)
```
kubeadm init --config k8s/c2/kubeadm.yaml
```
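Because the cluster is IPv6-only, the pod and service subnets in the kubeadm configuration have to fit into the site prefix and must not overlap. Below is a minimal Python sketch for sanity-checking such an address plan before running `kubeadm init`. The /48 and the /108 service subnet are taken from the outputs further down in this README. The pod subnet `2a0a:e5c0:13:e1::/64` is an assumption, since only /122 blocks from it are visible in the routes. Check the real values in k8s/c2/kubeadm.yaml.

```python
import ipaddress

SITE_PREFIX = "2a0a:e5c0:13::/48"
POD_SUBNET = "2a0a:e5c0:13:e1::/64"  # assumption, see k8s/c2/kubeadm.yaml
SERVICE_SUBNET = "2a0a:e5c0:13:e2::/108"


def check_address_plan(site: str, pods: str, services: str) -> list[str]:
    """Return a list of problems found in an IPv6-only address plan."""
    site_net = ipaddress.ip_network(site)
    pod_net = ipaddress.ip_network(pods)
    svc_net = ipaddress.ip_network(services)
    problems = []
    for name, net in (("pod subnet", pod_net), ("service subnet", svc_net)):
        if net.version != 6:
            problems.append(f"{name} {net} is not IPv6")
        elif not net.subnet_of(site_net):
            problems.append(f"{name} {net} is outside {site_net}")
    if pod_net.overlaps(svc_net):
        problems.append(f"pod subnet {pod_net} overlaps service subnet {svc_net}")
    # kube-apiserver (at least in v1.21, as used here) rejects IPv6
    # service ranges larger than /108.
    if svc_net.version == 6 and svc_net.prefixlen < 108:
        problems.append(f"service subnet {svc_net} is larger than /108")
    return problems
```

An empty list means the plan passes these checks. The /108 limit may differ in other Kubernetes releases.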
### Adding worker nodes
```
kubeadm join [2a0a:e5c0:13:0:225:b3ff:fe20:38cc]:6443 --token cfrita.. \
--discovery-token-ca-cert-hash sha256:...
```
Verifying that all nodes joined:
```
% kubectl get nodes
NAME       STATUS   ROLES                  AGE     VERSION
server47   Ready    control-plane,master   2m25s   v1.21.1
server48   Ready    <none>                 66s     v1.21.1
server49   Ready    <none>                 24s     v1.21.1
server50   Ready    <none>                 19s     v1.21.1
```
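You can script this check instead of reading the table by eye. The following minimal Python sketch parses the plain `kubectl get nodes` output shown above and lists the nodes whose STATUS is not `Ready`. It assumes the default column layout, in which no field contains spaces.

```python
def nodes_not_ready(kubectl_output: str) -> list[str]:
    """Return the names of nodes whose STATUS column is not exactly 'Ready'."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    # Drop a copied shell prompt such as "% kubectl get nodes".
    if lines and lines[0].lstrip().startswith("%"):
        lines = lines[1:]
    header = lines[0].split()
    name_col, status_col = header.index("NAME"), header.index("STATUS")
    not_ready = []
    for line in lines[1:]:
        fields = line.split()
        if fields[status_col] != "Ready":
            not_ready.append(fields[name_col])
    return not_ready
```

In shell scripts, kubectl can also block until the condition holds, for example `kubectl wait --for=condition=Ready nodes --all --timeout=300s`.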
### Configuring networking
* This customised calico.yaml enables IPv6
```
kubectl apply -f cni-calico/calico.yaml
```
After applying, check that all calico pods are up and running:
```
% kubectl -n kube-system get pods
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-b656ddcfc-5kfg6   0/1     Running   4          3m27s
calico-node-975vh                         1/1     Running   3          3m28s
calico-node-gbnvj                         1/1     Running   2          3m28s
calico-node-qjm5v                         0/1     Running   4          113s
calico-node-xxxmk                         1/1     Running   4          3m28s
coredns-558bd4d5db-56dv9                  1/1     Running   0          8m51s
coredns-558bd4d5db-hsspb                  1/1     Running   0          8m51s
etcd-server47                             1/1     Running   0          9m9s
kube-apiserver-server47                   1/1     Running   0          9m4s
kube-controller-manager-server47          1/1     Running   0          9m4s
kube-proxy-5g5qm                          1/1     Running   0          8m51s
kube-proxy-85mck                          1/1     Running   0          7m8s
kube-proxy-b95sv                          1/1     Running   0          7m13s
kube-proxy-mpjkm                          1/1     Running   0          7m55s
kube-scheduler-server47                   1/1     Running   0          9m10s
```
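In the output above, `calico-kube-controllers` and one `calico-node` pod still report `0/1`. They are running but not yet ready. The following minimal Python sketch picks such pods out of plain `kubectl get pods` output. It flags any pod that is not `Running` or that has fewer ready containers than total.

```python
def pods_not_ready(kubectl_output: str) -> list[str]:
    """Return pods that are not Running or have fewer ready containers than total.

    Only NAME, READY and STATUS are read. They come before RESTARTS, whose
    value may contain spaces in newer kubectl versions, so splitting on
    whitespace is safe for these columns.
    """
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    # Drop a copied shell prompt such as "% kubectl -n kube-system get pods".
    if lines and lines[0].lstrip().startswith("%"):
        lines = lines[1:]
    header = lines[0].split()
    name_i = header.index("NAME")
    ready_i = header.index("READY")
    status_i = header.index("STATUS")
    not_ready = []
    for line in lines[1:]:
        fields = line.split()
        ready, total = (int(n) for n in fields[ready_i].split("/"))
        if fields[status_i] != "Running" or ready < total:
            not_ready.append(fields[name_i])
    return not_ready
```

To wait for the calico pods in a script, `kubectl -n kube-system wait --for=condition=Ready pods --all --timeout=300s` does the same check on the cluster side.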
Some pods often crash at first; in that case you may need to make the
following mounts shared (if they are not already):
```
mount --make-shared /sys
mount --make-shared /run
```
(These shared mounts are necessary on Alpine Linux.)
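You can read whether a mount is already shared from `/proc/self/mountinfo`. There, shared mounts carry a `shared:N` tag among the optional fields (the format is described in proc(5)). The following minimal Python sketch checks this. It only looks at the last entry for a given mount point, because later entries are stacked on top of earlier ones.

```python
def is_shared_mount(mountinfo: str, mount_point: str) -> bool:
    """Return True if the topmost mount at mount_point has shared propagation.

    Each mountinfo line looks like:
    ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTIONS [OPTIONAL...] - FSTYPE SOURCE SUPEROPTS
    """
    shared = False
    for line in mountinfo.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[4] != mount_point:
            continue
        try:
            separator = fields.index("-", 6)
        except ValueError:
            continue  # malformed line without the "-" separator
        # Optional fields sit between the mount options and the "-" separator.
        shared = any(tag.startswith("shared:") for tag in fields[6:separator])
    return shared


def read_mountinfo(path: str = "/proc/self/mountinfo") -> str:
    """Read the mount table of the current process (Linux only)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

On a node, `is_shared_mount(read_mountinfo(), "/sys")` and the same call for `/run` show whether the `mount --make-shared` commands are still needed. `findmnt -o TARGET,PROPAGATION /sys` gives the same information.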
### Getting calicoctl
To configure calico, we need calicoctl, which we can run as yet
another pod:
```
kubectl apply -f https://docs.projectcalico.org/manifests/calicoctl.yaml
```
And we alias it for easier usage:
```
alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"
```
### Adding BGP peering
We need to tell calico which BGP peers to peer with. For this we use
the bgp-c2.yaml file, which contains the configuration for our
cluster:
```
calicoctl create -f - < cni-calico/bgp-c2.yaml
```
At this point all nodes should be peering with our upstream
infrastructure.
We can confirm this on the upstream side, where we also run bird:
```
% birdc show route
BIRD 2.0.7 ready.
Table master6:
2a0a:e5c0:13:e1:f4c5:ab65:a67f:53c0/122 unicast [place7-server1 20:04:14.222] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server3 20:04:14.224] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server2 20:04:14.222] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
                     unicast [place7-server4 20:04:14.221] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
2a0a:e5c0:13:e2::/108 unicast [place7-server1 20:04:14.222] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server2 20:04:14.222] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
                     unicast [place7-server3 20:04:14.113] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server4 20:04:14.221] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
2a0a:e5c0:13:e1:176b:eaa6:6d47:1c40/122 unicast [place7-server1 20:04:14.222] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                     unicast [place7-server2 20:04:14.222] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
                     unicast [place7-server3 20:04:14.221] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
                     unicast [place7-server4 20:04:14.221] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
2a0a:e5c0:13::/48    unreachable [v6 2021-05-16] * (200)
```
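In the output above, each prefix is learned from all four nodes (place7-server1 to place7-server4). This covers both the /122 entries, which match Calico's default IPv6 IPAM block size, and the /108 service range. The following minimal Python sketch parses `birdc show route` output and reports any prefix that is missing an expected peer. It assumes the BIRD 2 layout shown above, in which continuation lines omit the prefix.

```python
import ipaddress
from collections import defaultdict


def peers_per_prefix(birdc_output: str) -> dict[str, set[str]]:
    """Map each unicast prefix in `birdc show route` output to the peers it was learned from."""
    routes: dict[str, set[str]] = defaultdict(set)
    prefix = None
    for line in birdc_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        try:
            ipaddress.ip_network(tokens[0], strict=False)
        except ValueError:
            rest = tokens  # continuation line ("unicast ...", "via ...") or header
        else:
            prefix, rest = tokens[0], tokens[1:]
        # Route lines look like: unicast [PEER TIME] ...
        if prefix and len(rest) >= 2 and rest[0] == "unicast" and rest[1].startswith("["):
            routes[prefix].add(rest[1][1:])
    return dict(routes)


def missing_peers(routes: dict[str, set[str]], expected: set[str]) -> dict[str, set[str]]:
    """Return, per prefix, the expected peers that did not announce it."""
    return {p: expected - peers for p, peers in routes.items() if expected - peers}
```

On the upstream router, feeding the real output together with `{"place7-server1", "place7-server2", "place7-server3", "place7-server4"}` into `missing_peers` should return an empty dict while all nodes are peering.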
### Testing the cluster
At this point we should have a functioning k8s cluster; now let's
test whether it works using a simple nginx deployment.
Do *NOT* use https://k8s.io/examples/application/deployment.yaml. It
contains an outdated nginx container that has no IPv6 listener. You
will get results such as
```
% curl http://[2a0a:e5c0:13:bbb:176b:eaa6:6d47:1c41]
curl: (7) Failed to connect to 2a0a:e5c0:13:bbb:176b:eaa6:6d47:1c41 port 80: Connection refused
```
if you use that deployment. Instead, use something along the lines of the
included **nginx-test-deployment.yaml**:
```
kubectl apply -f generic/nginx-test-deployment.yaml
```
Let's see whether the pods are coming up:
```
% kubectl get pods
NAME                               READY   STATUS    RESTARTS   AGE
nginx-deployment-95d596f7b-484mz   1/1     Running   0          13s
nginx-deployment-95d596f7b-4wfkp   1/1     Running   0          13s
```
And the associated service:
```
% kubectl get svc
NAME            TYPE        CLUSTER-IP              EXTERNAL-IP   PORT(S)   AGE
kubernetes      ClusterIP   2a0a:e5c0:13:e2::1      <none>        443/TCP   16m
nginx-service   ClusterIP   2a0a:e5c0:13:e2::4412   <none>        80/TCP    34s
```
It is up and running; let's curl it!
```
% curl -I http://[2a0a:e5c0:13:e2::4412]
HTTP/1.1 200 OK
Server: nginx/1.20.0
Date: Mon, 14 Jun 2021 18:08:29 GMT
Content-Type: text/html
Content-Length: 612
Last-Modified: Tue, 20 Apr 2021 16:11:05 GMT
Connection: keep-alive
ETag: "607efd19-264"
Accept-Ranges: bytes
```
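The same check can also run from a script. IPv6 literals must be wrapped in brackets inside URLs (RFC 3986), which is why the curl command above uses `http://[...]`. The following minimal Python sketch uses only the standard library. `service_url` and `head_status` are hypothetical helper names, and the request only succeeds from a host that can route to the service network.

```python
import ipaddress
import urllib.request


def service_url(ip: str, port: int = 80, path: str = "/") -> str:
    """Build an http:// URL, bracketing IPv6 literals as RFC 3986 requires."""
    addr = ipaddress.ip_address(ip)
    host = f"[{addr}]" if addr.version == 6 else str(addr)
    authority = host if port == 80 else f"{host}:{port}"
    return f"http://{authority}{path}"


def head_status(url: str, timeout: float = 5.0) -> int:
    """Send a HEAD request (like `curl -I`) and return the HTTP status code.

    urllib raises urllib.error.HTTPError for 4xx/5xx answers and
    urllib.error.URLError if the address is unreachable.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

When the service answers as in the curl output above, `head_status(service_url("2a0a:e5c0:13:e2::4412"))` would be expected to return 200.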
Perfect. Let's delete it again:
```
kubectl delete -f generic/nginx-test-deployment.yaml
```
### Next steps
While the above is already a fully running k8s cluster, we also want
support for **PersistentVolumeClaims**. See [the rook
documentation](rook/README.md) on how to achieve the next step.
## Highly available control plane
The steps above result in a single control plane node; for production
setups, however, the control plane should consist of three nodes.
The [guide for creating HA
clusters](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/)
refers to an external load balancer that
## Secrets
### Generating them inside the cluster
Handled via https://github.com/mittwald/kubernetes-secret-generator
```
helm repo add mittwald https://helm.mittwald.de
helm repo update
helm upgrade --install kubernetes-secret-generator mittwald/kubernetes-secret-generator
```
Creating a secret with an auto-generated password:
```
apiVersion: v1
kind: Secret
metadata:
  name: string-secret
  annotations:
    secret-generator.v1.mittwald.de/autogenerate: password
data:
  username: c29tZXVzZXI=
```
* Advantage: passwords are only in the cluster
* Disadvantage: passwords are only in the cluster
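Values under `data:` are base64-encoded, so `c29tZXVzZXI=` in the secret above is simply the string `someuser`. The annotation asks the generator to fill in the `password` key. The following small Python sketch encodes and decodes such values by hand. Kubernetes also accepts plain strings under `stringData:`.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Encode a plain string for a Secret's `data:` field (base64 of the UTF-8 bytes)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a Secret `data:` value back to a string, rejecting invalid base64."""
    return base64.b64decode(encoded, validate=True).decode("utf-8")
```

For example, `decode_secret_value("c29tZXVzZXI=")` returns `"someuser"`.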
## CI/CD
### What we want
* Package everything into one git repository (charts, kustomize, etc.)
* Be usable for multiple clusters
* Easily apply cross cluster
### What we don't want / what is problematic
* Uploading charts to something like chartmuseum
  * Is redundant - we have a version in git
  * Is manual (could probably be automated)
### ArgoCD
Looks too big, too complex, too complicated.
### FluxCD2
Looks ok, handling of helm is ok, but does not feel intuitive. Seems
to be more orientated on "kustomizing helm charts".
### Helmfile
[helmfile](https://github.com/roboll/helmfile/) seems to do most of
what we need.
## The IPv4 "problem"
* Clusters are IPv6 only
* Need to have one or more services to map IPv4
  * Maybe outside haproxy w/ generic ssl/sni/host mapping
  * Could even be **inside** haproxy service
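The haproxy idea comes down to choosing an IPv6 backend from the hostname a client asks for: the TLS SNI for HTTPS, or the Host header for plain HTTP. The following minimal Python sketch shows that lookup. The hostnames and backend addresses in `ROUTES` are hypothetical. In haproxy itself, the SNI would typically be matched in TCP mode with the `req.ssl_sni` fetch.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: hostname -> IPv6 address of a service in the cluster.
ROUTES = {
    "git.example.com": "2a0a:e5c0:13:e2::10",
    "*.apps.example.com": "2a0a:e5c0:13:e2::20",
}


def pick_backend(hostname: str, routes: dict) -> Optional[str]:
    """Return the IPv6 backend for hostname, trying an exact match, then a wildcard."""
    host = hostname.rstrip(".").lower()
    backend = routes.get(host)
    if backend is None:
        # A wildcard covers exactly one extra leading label, as in TLS certificates.
        _, _, parent = host.partition(".")
        backend = routes.get(f"*.{parent}") if parent else None
    if backend is not None and ipaddress.ip_address(backend).version != 6:
        raise ValueError(f"backend {backend} for {hostname} is not IPv6")
    return backend
```

Returning `None` leaves it to the proxy to decide what happens to unknown hostnames, for example rejecting the connection.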
## Flux + Chartmuseum
* For automatic deployments, we can use flux
* To be able to use flux with our charts, we need a Chartmuseum
* To access a private chartmuseum, we need a shared secret
* Thus we probably do need sops or similar
Alternatively:
* Using kustomize, local resources can be used