++blog: disk based booting

Nico Schottelius 2021-08-31 18:01:37 +02:00
parent 54133b70ef
commit 9c45ac817d

title: Bye, bye netboot
---
pub_date: 2021-08-31
---
author: ungleich infrastructure team
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:
Data Center Light servers are switching to disk based boot
---
body:
## Introduction
Since the very beginning of the [Data Center Light
project](/u/projects/data-center-light) our servers have been
*somewhat stateless* and booted their operating system from the
network.
From today on this changes: our servers are switched to boot from
a local disk (SSD/NVMe/HDD). While this may seem counterintuitive for
a growing data center, let us explain why it makes sense for us.
## Netboot in a nutshell
There are different variants of how to netboot a server. In every
case, the server loads an executable from the network, typically via
TFTP or HTTP, and then hands over execution to it.
The first option is to load the kernel and then switch to an NFS
based root filesystem. If the filesystem is read-write, you usually
need one export per server; alternatively you mount it read-only and
possibly apply an overlay for runtime configuration.
The second option is to load the kernel and an initramfs into memory
and stay inside the initramfs. The advantage of this approach is that
no NFS server is needed; the downside is that the whole operating
system resides in memory.
The second option is what we used in Data Center Light for the last
couple of years.
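To make the two variants concrete, here is a minimal Python sketch that renders an iPXE boot script for each of them. The base URL, NFS server and export path are hypothetical placeholders, and the image file names (`vmlinuz`, `initramfs.img`) are assumptions. A real setup usually needs additional kernel parameters, for example for network configuration.

```python
# Sketch: render iPXE scripts for the two netboot variants.
# The base URL, NFS server, export path and file names are hypothetical.


def nfsroot_script(base_url: str, nfs_server: str, export: str) -> str:
    """Variant 1: load only a kernel, then use an NFS export as root filesystem."""
    # root=/dev/nfs and nfsroot= are standard Linux kernel parameters; "ro"
    # mounts the root read-only so that all servers can share one export.
    # Network configuration parameters are omitted; they depend on the setup.
    cmdline = f"root=/dev/nfs nfsroot={nfs_server}:{export} ro"
    return "\n".join([
        "#!ipxe",
        f"kernel {base_url}/vmlinuz {cmdline}",
        "boot",
    ]) + "\n"


def initramfs_script(base_url: str) -> str:
    """Variant 2: load kernel plus initramfs; the whole OS stays in RAM."""
    # No root= parameter: the kernel runs /init from the initramfs
    # and the system never switches to another root filesystem.
    return "\n".join([
        "#!ipxe",
        f"kernel {base_url}/vmlinuz",
        f"initrd {base_url}/initramfs.img",
        "boot",
    ]) + "\n"


if __name__ == "__main__":
    print(initramfs_script("http://boot.example.org/images/current"))
```

The first variant only needs a kernel and an NFS export; the second adds an `initrd` line and drops `root=` entirely, because the system never leaves the initramfs.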
## Netboot history at Data Center Light
Originally all our servers started with IPv4 PXE based
netboot. However, as our data center is, generally speaking, IPv6
only, the IPv4 DHCP+TFTP combination is an extra maintenance burden
and also a hindrance for network debugging: in a single-stack,
IPv6-only network, things are much easier to debug. There is no need
to look at two routing tables and no need to work around DHCP settings
that might interfere with what one wants to achieve via IPv6.
As the IPv4 addresses became more and more of a technical debt in our
infrastructure, we started flashing our network cards with
[iPXE](https://ipxe.org/), which allows even older network cards to
boot in IPv6-only networks.
An IPv6-only netboot environment also makes it easier to run
active-active routers, as hosts are not assigned DHCP leases. Instead,
they configure their addresses themselves, which scales much better.
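Why no lease state is needed can be illustrated with the classic stateless address autoconfiguration (SLAAC) scheme: the /64 prefix comes from router advertisements, and the interface identifier is derived from the MAC address using modified EUI-64 (RFC 4291). This is an illustration only; many systems use privacy or stable-privacy addresses (RFC 4941, RFC 7217) instead, and the prefix and MAC below are made up.

```python
import ipaddress


def eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive a SLAAC address from a /64 prefix and a MAC (modified EUI-64)."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("SLAAC requires a /64 prefix")
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two MAC halves to get a 64-bit interface ID.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # Indexing a network adds the offset to its network address.
    return network[int.from_bytes(interface_id, "big")]
```

For the prefix `2001:db8:1:2::/64` and MAC `52:54:00:12:34:56`, this yields `2001:db8:1:2:5054:ff:fe12:3456`. Since the address is a pure function of the advertised prefix and the host's MAC, any router advertising the same prefix works, and no router has to remember leases.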
## Migrating away from netbooting
So why are we migrating away from netbooting, even after we migrated
to IPv6 only networking? There are multiple aspects:
The first reason is that netbooted hosts lose their state on power
failure. The operating system that is loaded is the same for every
server and needs some configuration post-boot. We have solved this
using [cdist](https://www.cdi.st/); however, the mechanism that
authenticates a freshly booted host and triggers its configuration is
non-trivial if you want to keep your netboot images and build steps
public.
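A hypothetical HMAC-based trigger shows where the difficulty lies. This is not how cdist works; it is a generic sketch in which a booted host signs a trigger request with a per-host key and the configuration server verifies it. The verification itself is simple. The hard part is getting `key` onto a stateless host: it cannot be baked into a netboot image whose contents and build steps are public.

```python
import hashlib
import hmac


def sign_trigger(hostname: str, timestamp: int, key: bytes) -> str:
    """Token a booted host would attach to its (hypothetical) trigger request."""
    message = f"{hostname}:{timestamp}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_trigger(
    hostname: str, timestamp: int, token: str, key: bytes, now: int, max_age: int = 300
) -> bool:
    """Server side: accept only fresh requests carrying a valid HMAC."""
    if abs(now - timestamp) > max_age:
        return False  # too old or too far in the future: reject to limit replays
    expected = sign_trigger(hostname, timestamp, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, token)
```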
The second reason is state synchronisation: as we have multiple
boot servers, we need to maintain the same state on multiple
machines. That is solvable via CI/CD pipelines; however, the level of
automation on our build servers is rather low, because OS changes are
infrequent.
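One cheap safety net, whichever pipeline distributes the images, is to compare content hashes across boot servers. The sketch below assumes each boot server's image directory is available locally (for example through a hypothetical rsync or mount step) and reports every file that is missing on some server or differs between them.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def diverging_files(roots: list[Path]) -> list[str]:
    """List relative paths that are missing on some root or differ in content."""
    digests: dict[str, dict[Path, str]] = {}  # relative path -> {root: sha256}
    for root in roots:
        for path in root.rglob("*"):
            if path.is_file():
                rel = path.relative_to(root).as_posix()
                digests.setdefault(rel, {})[root] = sha256_file(path)
    return sorted(
        rel
        for rel, per_root in digests.items()
        if len(per_root) < len(roots) or len(set(per_root.values())) > 1
    )
```

For example, `diverging_files([Path("/srv/mirror/boot1"), Path("/srv/mirror/boot2")])` returns an empty list when both servers serve identical files; the directory names are hypothetical.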
The third and main point is our ongoing migration towards
[Kubernetes](https://kubernetes.io/). Originally our servers would
boot up and get configured either to provide Ceph storage or to act as
a virtualisation host. The amount of binaries to keep in our in-memory
image was tiny: around 150MB in the best case. With the migration
towards Kubernetes, every node downloads container images, which
can be comparably huge (gigabytes of data). The pivot_root
workarounds that are required when running from an initramfs are only
a minor additional point that made us question our current setup.
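The pivot_root issue comes from a kernel restriction: according to the pivot_root(2) manual page, the initial rootfs (initramfs) cannot be pivot_root()ed, which is why container runtimes need workarounds when the host runs from an initramfs. A small Python sketch can detect this situation by looking at the filesystem type mounted on `/` in `/proc/mounts`; treating `rootfs`, `ramfs` and `tmpfs` as memory-backed is an assumption of this sketch.

```python
from typing import Optional

# Filesystem types this sketch treats as "root lives in RAM" (an assumption).
MEMORY_BACKED = {"rootfs", "ramfs", "tmpfs"}


def root_fstype(mounts_text: str) -> Optional[str]:
    """Return the type of the topmost filesystem mounted on "/"."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts format: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[1] == "/":
            fstype = fields[2]  # later entries are mounted on top of earlier ones
    return fstype


def root_is_in_memory(mounts_text: str) -> bool:
    """True if the root filesystem appears to be RAM-backed."""
    return root_fstype(mounts_text) in MEMORY_BACKED
```

On a live system, `root_is_in_memory(open("/proc/mounts").read())` would be true on an initramfs-only host and false once the root filesystem lives on a disk.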
## Automating disk based boot
We have servers from a variety of brands, and each of them comes with
one of a variety of disk controllers: from simple pass-through SATA
controllers to full-fledged hardware RAID with onboard cache and a
battery for protecting the cache. Everything is in the mix.
So we cannot easily prepare a stack of disks somewhere else and then
insert them, as the disk controller might add its own (RAID0) metadata
to them.
To work around this problem, we insert the disk that will become the
boot disk into the netbooted server, install the operating system
onto it from the running environment, and at the next maintenance
window ensure that the server actually boots from it.
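Picking the right target disk is the risky part of installing onto a disk inside a running, netbooted server. As an illustration (not the logic of the scripts linked below), the following sketch parses the JSON output of util-linux `lsblk` and only accepts whole disks that carry no partitions, no filesystem or RAID signature and no mountpoint. The selection policy is an assumption; behind a hardware RAID controller, the device the operating system sees may be a logical volume rather than the physical disk.

```python
import json
import subprocess


def lsblk_json() -> str:
    """Return lsblk output as JSON (-J) with a selected set of columns (-o)."""
    return subprocess.run(
        ["lsblk", "-J", "-o", "NAME,TYPE,FSTYPE,MOUNTPOINT"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


def candidate_disks(lsblk_output: str) -> list[str]:
    """Return whole disks that look blank: no partitions, signature or mount."""
    candidates = []
    for dev in json.loads(lsblk_output).get("blockdevices", []):
        if dev.get("type") != "disk":
            continue  # skip loop devices, optical drives, ...
        if dev.get("children"):
            continue  # already partitioned or used by RAID/LVM
        if dev.get("fstype") or dev.get("mountpoint"):
            continue  # carries a filesystem/RAID signature or is mounted
        candidates.append("/dev/" + dev["name"])
    return candidates
```

On a live system, `candidate_disks(lsblk_json())` lists the candidates; an installer would still want a human to confirm the choice before partitioning.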
If you are curious how this works, you can check out the scripts that
we use for
[Devuan/Debian](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/debian-devuan-install-on-disk.sh)
and
[Alpine Linux](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/alpine-install-on-disk.sh).
## The road continues
While a data center needs to be stable, it also needs to adapt to
newer technologies and different workflows. Disk based booting is our
current solution on our path towards Kubernetes, but who knows: in
the future things might look different again.
If you want to join the discussion, we have a
[Hacking and Learning
(#hacking-and-learning:ungleich.ch)](/u/projects/open-chat/) channel
on Matrix for an open exchange.
Oh and in case [you were wondering what we did
today](https://twitter.com/ungleich/status/1432627966316584968), we
switched to disk based booting ;-).