++blog: disk based booting

2021-08-31 18:01:37 +02:00 · 2021-08-31 18:01:37 +02:00 · 9c45ac817d
commit 9c45ac817d
parent 54133b70ef
1 changed files with 131 additions and 0 deletions
--- a/content/u/blog/2021-08-31-datacenterlight-bye-bye-netboot/contents.lr
+++ b/content/u/blog/2021-08-31-datacenterlight-bye-bye-netboot/contents.lr
@@ -0,0 +1,131 @@
 title: Bye, bye netboot
 ---
 pub_date: 2021-08-31
 ---
 author: ungleich infrastructure team
 ---
 twitter_handle: ungleich
 ---
 _hidden: no
 ---
 _discoverable: yes
 ---
 abstract:
 Data Center Light servers are switching to disk based boot
 ---
 body:
 ## Introduction
 Since the very beginning of the [Data Center Light
 project](/u/projects/data-center-light) our servers have been
 *somewhat stateless* and have booted their operating system from the
 network.
 From today on this changes: our servers are switching to booting from
 a disk (SSD/NVMe/HDD). While this may at first seem counterintuitive for
 a growing data center, let us explain why it makes sense for us.
 ## Netboot in a nutshell
 There are different variants of how to netboot a server. In every
 case, the server loads an executable from the network, typically via
 TFTP or HTTP, and then hands over execution to it.
 The first option is to load the kernel and then later switch to an NFS
 based filesystem. If the filesystem is read-write, you usually need
 one location per server. Alternatively you mount it read-only and
 possibly apply an overlay for runtime configuration.
 The second option is to load the kernel and an initramfs into memory
 and stay inside the initramfs. The advantage of this approach is that
 no NFS server is needed. The drawback is that the whole operating
 system resides in memory.
 The second option is what we used in Data Center Light for the last
 couple of years.
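
To make the second option more concrete, here is a minimal sketch of how a boot script for it could be generated. It is an illustration only, not the configuration used in Data Center Light: the function name, the file names `vmlinuz` and `initramfs.img`, and the example URL are assumptions. The emitted `kernel`, `initrd` and `boot` lines are standard iPXE script commands. Network configuration is left out, and some firmware setups need additional kernel arguments.

```python
def render_ipxe_script(base_url: str,
                       kernel: str = "vmlinuz",
                       initramfs: str = "initramfs.img") -> str:
    """Render an iPXE script that loads a kernel and an initramfs over HTTP.

    All names and the URL layout are hypothetical. No root= argument is
    passed, because in this variant the system stays inside the initramfs.
    """
    base = base_url.rstrip("/")
    lines = [
        "#!ipxe",                        # marks the file as an iPXE script
        f"kernel {base}/{kernel}",       # download the kernel image
        f"initrd {base}/{initramfs}",    # download the initramfs
        "boot",                          # hand over execution to the kernel
    ]
    return "\n".join(lines) + "\n"
```

Calling `render_ipxe_script("http://boot.example.com/images/")` returns the script as a string, which a boot server could then serve to the netbooting hosts.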
 ## Netboot history at Data Center Light
 Originally all our servers started with IPv4 PXE based
 netboot. However, as our data center is generally speaking IPv6 only,
 the IPv4 DHCP+TFTP combination is an extra maintenance burden and also
 a hindrance for network debugging: a single stack IPv6 only network
 is much easier to debug. There is no need to look at two
 routing tables and no need to work around DHCP settings that might
 interfere with what one wants to achieve via IPv6.
 As the IPv4 addresses became more of a technical debt in our
 infrastructure, we started flashing our network cards with
 [iPXE](https://ipxe.org/), which allows even older network cards to
 boot in IPv6 only networks.
 Also, in an IPv6 only netboot environment it is easier to run
 active-active routers, as hosts are not assigned DHCP leases. They
 assign addresses themselves, which scales much better.
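
How a host can assign itself an address without any lease can be sketched in a few lines. The source does not say which method the hosts use; the example below assumes the classic modified EUI-64 scheme of stateless address autoconfiguration, in which the host combines the /64 prefix announced by the router with an identifier derived from its own MAC address. The function name, prefix and MAC address are made up for illustration.

```python
import ipaddress


def slaac_eui64(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive an IPv6 address from a /64 prefix and a MAC address.

    Uses the modified EUI-64 scheme (an assumption, see the text above):
    flip the universal/local bit of the first octet and insert ff:fe
    between the two halves of the MAC address.
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch expects a /64 prefix")
    octets = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    interface_id = (
        bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:6]
    )
    # Indexing a network with an integer yields network address + offset.
    return network[int.from_bytes(interface_id, "big")]
```

Because every host computes its address from information it already has, two routers can announce the same prefix without having to share any lease state.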
 ## Migrating away from netbooting
 So why are we migrating away from netbooting, even after we migrated
 to IPv6 only networking? There are multiple aspects:
 On power failure, netbooted hosts lose their state. The operating
 system that is loaded is the same for every server and needs some
 configuration post-boot. We have solved this using
 [cdist](https://www.cdi.st/). However, the mechanism that authenticates
 a host and triggers its configuration is non-trivial if you want to
 keep your netboot images and build steps public.
 The second reason is state synchronisation: as we have multiple
 boot servers, we need to maintain the same state on multiple
 machines. That is solvable via CI/CD pipelines. However, the level of
 automation on build servers is rather low, because the number of OS
 changes is low.
 The third and main point is our ongoing migration towards
 [Kubernetes](https://kubernetes.io/). Originally our servers would
 boot up and get configured to provide Ceph storage or to be a
 virtualisation host. The set of binaries to keep in our in-memory
 image was tiny, in the best case around 150MB. With the migration
 towards Kubernetes, every node downloads container images, which
 can be comparatively huge (gigabytes of data). The additional pivot_root
 workarounds that are required for initramfs usage are just an
 additional minor point that made us question our current setup.
 ## Automating disk based boot
 We have servers from a variety of brands and each of them comes with a
 variety of disk controllers: from simple pass-through SATA controllers
 to full-fledged hardware RAID with onboard cache and a battery for
 protecting the cache - everything is in the mix.
 So it is not easily possible to prepare a stack of disks somewhere
 else and then add them, as the disk controller might add some (RAID0)
 metadata to them.
 To work around this problem, we insert the disk that will become the
 boot disk into the netbooted server, install the operating system
 from the running environment and at the next maintenance window
 ensure that the server is actually booting from it.
 If you are curious about how this works, you can check out the scripts
 that we use for
 [Devuan/Debian](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/debian-devuan-install-on-disk.sh)
 and
 [Alpine Linux](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/alpine-install-on-disk.sh).
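
The last step, making sure that a server really runs from its disk after the maintenance window, can be checked from the running system. The following is a minimal sketch and not part of the scripts linked above. It assumes that a netbooted host shows a memory-backed filesystem type (`rootfs`, `tmpfs` or `ramfs`) for its root mount, while a disk based host shows a regular filesystem such as ext4. All names are hypothetical.

```python
# Filesystem types that live in RAM (assumption of this sketch).
MEMORY_FS_TYPES = {"rootfs", "tmpfs", "ramfs"}


def root_filesystem(mounts_text: str) -> tuple:
    """Return (device, fstype) of the last entry mounted on "/".

    Expects text in the format of /proc/mounts. Later entries are mounted
    on top of earlier ones, so the last match is the effective root.
    """
    root = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "/":
            root = (fields[0], fields[2])
    if root is None:
        raise ValueError("no root mount found")
    return root


def boots_from_disk(mounts_text: str) -> bool:
    """True if the effective root filesystem is not memory-backed."""
    _, fstype = root_filesystem(mounts_text)
    return fstype not in MEMORY_FS_TYPES
```

On a live Linux system the input would be the content of `/proc/mounts`, for example `boots_from_disk(open("/proc/mounts").read())`.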
 ## The road continues
 While a data center needs to be stable, it also needs to adapt to
 newer technologies or different flows. Disk based boot is our
 current solution on our path towards Kubernetes, but who
 knows - in the future things might look different again.
 If you want to join the discussion, we have a
 [Hacking and Learning
 (#hacking-and-learning:ungleich.ch)](/u/projects/open-chat/) channel
 on Matrix for an open exchange.
 Oh and in case [you were wondering what we did
 today](https://twitter.com/ungleich/status/1432627966316584968), we
 switched to disk based booting ;-).