++blog: disk based booting

2021-08-31 18:01:37 +02:00 · 2021-08-31 18:01:37 +02:00 · 9c45ac817d
commit 9c45ac817d
parent 54133b70ef
1 changed files with 131 additions and 0 deletions
--- a/content/u/blog/2021-08-31-datacenterlight-bye-bye-netboot/contents.lr
+++ b/content/u/blog/2021-08-31-datacenterlight-bye-bye-netboot/contents.lr
@@ -0,0 +1,131 @@
+title: Bye, bye netboot
+---
+pub_date: 2021-08-31
+---
+author: ungleich infrastructure team
+---
+twitter_handle: ungleich
+---
+_hidden: no
+---
+_discoverable: yes
+---
+abstract:
+Data Center Light servers are switching to disk-based boot
+---
+body:
+
+## Introduction
+
+Since the very beginning of the [Data Center Light
+project](/u/projects/data-center-light), our servers have been
+*somewhat stateless* and have booted their operating system from the
+network.
+
+From today on, this changes: our servers are switching to booting
+from a disk (SSD/NVMe/HDD). While this may seem counterintuitive for a
+growing data center, let us explain why it makes sense for us.
+
+## Netboot in a nutshell
+
+There are different ways to netboot a server. In every case, the
+server loads an executable from the network, typically via TFTP or
+HTTP, and then hands over execution to it.
+
+The first option is to load the kernel and later switch to an
+NFS-based filesystem. If the filesystem is read-write, you usually need
+one location per server; alternatively, you mount it read-only and
+possibly apply an overlay for runtime configuration.
+
+The second option is to load the kernel and an initramfs into memory
+and stay inside the initramfs. The advantage of this approach is that
+no NFS server is needed; the drawback is that the whole operating
+system resides in memory.
+
+The second option is what we have used in Data Center Light for the
+last couple of years.
+
+## Netboot history at Data Center Light
+
+Originally, all our servers started with IPv4 PXE-based
+netboot. However, as our data center is, generally speaking, IPv6 only,
+the IPv4 DHCP+TFTP combination is an extra maintenance burden and also
+a hindrance for network debugging: in a single-stack, IPv6-only
+network, things are much easier to debug. There is no need to look at
+two routing tables and no need to work around DHCP settings that might
+interfere with what one wants to achieve via IPv6.
+
+As the IPv4 addresses increasingly became technical debt in our
+infrastructure, we started flashing our network cards with
+[iPXE](https://ipxe.org/), which allows even older network cards to
+boot in IPv6-only networks.
+
+In an IPv6-only netboot environment, it is also easier to run
+active-active routers, as hosts are not assigned DHCP leases. They
+assign addresses themselves, which scales much better.
+
+## Migrating away from netbooting
+
+So why are we migrating away from netbooting, even after we migrated
+to IPv6-only networking? There are several reasons.
+
+First, on power failure, netbooted hosts lose their state. The
+operating system that is loaded is the same for every server and needs
+some configuration post-boot. We have solved this using
+[cdist](https://www.cdi.st/); however, the authentication-trigger
+mechanism is non-trivial if you want to keep your netboot images and
+build steps public.
+
+The second reason is state synchronisation: as we have multiple
+boot servers, we need to maintain the same state on multiple
+machines. That is solvable via CI/CD pipelines; however, the level of
+automation on build servers is rather low, because the number of OS
+changes is low.
+
+The third and main point is our ongoing migration towards
+[Kubernetes](https://kubernetes.io/). Originally, our servers would boot
+up and get configured to provide Ceph storage or to be a virtualisation
+host. The amount of binaries to keep in our in-memory image was tiny, in
+the best case around 150MB. With the migration towards Kubernetes, every
+node downloads container images, which can be comparatively huge
+(gigabytes of data) and, with an in-memory root, end up in RAM. The
+pivot_root workarounds that are required for initramfs usage are just an
+additional minor point that made us question our current setup.
+
+## Automating disk-based boot
+
+We have servers from a variety of brands, and each of them comes with a
+variety of disk controllers: from simple pass-through SATA controllers
+to full-fledged hardware RAID with onboard cache and a battery for
+protecting the cache - everything is in the mix.
+
+So it is not easily possible to prepare a stack of disks somewhere
+else and then add them, as the disk controller might add some (RAID0)
+metadata to them.
+
+To work around this problem, we insert the future boot disk into the
+netbooted server, install the operating system from the running
+environment and, at the next maintenance window, ensure that the
+server is actually booting from it.
+
+If you are curious about how this works, you can check out the scripts
+that we use for
+[Devuan/Debian](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/debian-devuan-install-on-disk.sh)
+and
+[Alpine Linux](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/alpine-install-on-disk.sh).
+
+## The road continues
+
+While a data center needs to be stable, it also needs to adapt to
+newer technologies or different flows. Disk-based boot is our
+current solution on our path towards Kubernetes, but who
+knows - in the future things might look different again.
+
+If you want to join the discussion, we have a
+[Hacking and Learning
+(#hacking-and-learning:ungleich.ch)](/u/projects/open-chat/) channel
+on Matrix for an open exchange.
+
+Oh, and in case [you were wondering what we did
+today](https://twitter.com/ungleich/status/1432627966316584968), we
+switched to disk-based booting ;-).