title: Bye, bye netboot
---
pub_date: 2021-08-31
---
author: ungleich infrastructure team
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:
Data Center Light servers are switching to disk-based boot
---
body:

## Introduction

Since the very beginning of the [Data Center Light
project](/u/projects/data-center-light) our servers have been
*somewhat stateless* and have booted their operating system from the
network.

From today on this changes: our servers are switching to boot from
a disk (SSD/NVMe/HDD). While this may at first seem counter-intuitive
for a growing data center, let us explain why it makes sense for us.

## Netboot in a nutshell

There are different variants of how to netboot a server. In either
case, the server loads an executable from the network, typically via
TFTP or HTTP, and then hands over execution to it.

The first option is to load the kernel and later switch to an NFS
based filesystem. If the filesystem is read-write, you usually need
one location per server; if you mount it read-only, you possibly apply
an overlay for runtime configuration.

The second option is to load the kernel and an initramfs into memory
and stay inside the initramfs. The advantage of this approach is that
no NFS server is needed, but the whole operating system has to live
in memory.

The second option is what we used in Data Center Light for the last
couple of years.

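To make the difference concrete, here is a minimal sketch of what the
`init` inside the initramfs roughly does in each variant. Hostnames,
paths and the variant switch are made up for illustration; this is not
our actual boot code, and network setup is omitted.

```sh
#!/bin/sh
# Minimal sketch of an initramfs /init for the two netboot variants.
# Assumes busybox-style mount/switch_root; network setup is omitted.

VARIANT=initramfs   # or: nfs

if [ "$VARIANT" = nfs ]; then
    # Variant 1: mount the real root filesystem over NFS (read-only)
    # and hand execution over to the init that lives on it.
    mount -t nfs -o ro bootserver.example.com:/srv/netboot/rootfs /mnt/root
    exec switch_root /mnt/root /sbin/init
else
    # Variant 2: stay inside the initramfs - the in-memory filesystem
    # *is* the root filesystem, so simply continue with the init that
    # was shipped in the image. The whole OS stays in RAM.
    exec /sbin/init
fi
```
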
## Netboot history at Data Center Light

Originally all our servers started with IPv4 PXE-based
netboot. However, as our data center is, generally speaking, IPv6-only,
the IPv4 DHCP+TFTP combination means extra maintenance and is also a
hindrance for network debugging: in a single-stack, IPv6-only network,
things are much easier to debug. No need to look at two routing
tables, no need to work around DHCP settings that might interfere with
what one wants to achieve via IPv6.

As the IPv4 addresses became more of a technical debt in our
infrastructure, we started flashing our network cards with
[ipxe](https://ipxe.org/), which allows even older network cards to
boot in IPv6-only networks.

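For illustration, the entry point for such a netboot can be a tiny
iPXE script served over plain HTTP. The sketch below shows how one
might publish it on a boot server; addresses, paths and filenames are
invented for the example and are not our actual configuration.

```sh
# Hypothetical example: publish a minimal iPXE script on the boot server.
# A NIC flashed with iPXE configures IPv6, fetches this script over HTTP
# and then loads kernel and initramfs from the same server.
cat > /var/www/netboot/boot.ipxe <<'EOF'
#!ipxe
kernel http://[2001:db8:1::10]/netboot/vmlinuz console=ttyS0,115200
initrd http://[2001:db8:1::10]/netboot/initramfs
boot
EOF
```
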
An IPv6-only netboot environment also makes it easier to run
active-active routers, as hosts are not assigned DHCP leases: they
assign addresses to themselves (SLAAC), which scales much more nicely.

## Migrating away from netbooting

So why are we migrating away from netbooting, even after we migrated
to IPv6-only networking? There are multiple aspects:

On power failure, netbooted hosts lose their state. The operating
system that is loaded is the same for every server and needs some
configuration post-boot. We have solved this using
[cdist](https://www.cdi.st/), however the authentication-trigger
mechanism is non-trivial if you want to keep your netboot images and
build steps public.

The second reason is state synchronisation: as we run multiple boot
servers, we need to maintain the same state on multiple machines. That
is solvable via CI/CD pipelines, however the level of automation on
the build servers is rather low, because the number of OS changes is
low.

The third and main point is our ongoing migration towards
[kubernetes](https://kubernetes.io/). Originally our servers would
boot up and get configured to provide ceph storage or to act as a
virtualisation host. The amount of binaries to keep in our in-memory
image was tiny, in the best case around 150MB. With the migration
towards kubernetes, every node downloads its containers, which can be
comparatively huge (gigabytes of data). The pivot_root workarounds
that are required when running from an initramfs are just an
additional minor point that made us question our current setup.

## Automating disk-based boot

We have servers from a variety of brands and each of them comes with a
variety of disk controllers: from simple pass-through SATA controllers
to full-fledged hardware RAID with onboard cache and a battery
protecting that cache - everything is in the mix.

So it is not easily possible to prepare a stack of disks somewhere
else and then just add them, as the disk controller might add its own
(RAID0) metadata to them.

To work around this problem, we insert the disk that will become the
boot disk into the netbooted server, install the operating system from
the running environment, and at the next maintenance window ensure
that the server actually boots from it.

If you are curious how this works, you can check out the scripts that
we use for
[Devuan/Debian](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/debian-devuan-install-on-disk.sh)
and
[Alpine Linux](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/alpine-install-on-disk.sh).

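In very rough strokes, the approach looks like the sketch below. This
is a simplified illustration for the Debian/Devuan case with made-up
device names, not the linked scripts themselves, which also deal with
partitioning variants, controllers, serial consoles and more.

```sh
#!/bin/sh
# Simplified sketch: install a bootable Debian/Devuan onto a disk from
# the running, netbooted (in-memory) system. Device names are examples.
set -e

DISK=/dev/sda
TARGET=/mnt/target

# Partition the disk: a tiny BIOS boot partition for grub, rest for /.
parted -s "$DISK" mklabel gpt \
    mkpart bios 1MiB 3MiB set 1 bios_grub on \
    mkpart root 3MiB 100%
mkfs.ext4 -F "${DISK}2"

# Bootstrap the base system into the new root filesystem.
mkdir -p "$TARGET"
mount "${DISK}2" "$TARGET"
debootstrap stable "$TARGET" http://deb.debian.org/debian

# Make it bootable: bind-mount the pseudo filesystems, then install the
# kernel and grub from inside the chroot.
for fs in dev proc sys; do mount --bind "/$fs" "$TARGET/$fs"; done
chroot "$TARGET" apt-get update
chroot "$TARGET" env DEBIAN_FRONTEND=noninteractive \
    apt-get install -y linux-image-amd64 grub-pc
chroot "$TARGET" grub-install "$DISK"
chroot "$TARGET" update-grub

# At the next maintenance window, the server is switched to boot from
# this disk instead of the network.
```
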
## The road continues

While a data center needs to be stable, it also needs to adapt to
newer technologies and different workflows. Disk-based boot is our
current solution on our path towards kubernetes, but who knows - in
the future things might look different again.

If you want to join the discussion, we have a
[Hacking and Learning
(#hacking-and-learning:ungleich.ch)](/u/projects/open-chat/) channel
on Matrix for an open exchange.

Oh, and in case [you were wondering what we did
today](https://twitter.com/ungleich/status/1432627966316584968), we
switched to disk-based booting - that case is full of SSDs, not 1'000
CHF banknotes.