++blog: disk based booting
parent
54133b70ef
commit
9c45ac817d
1 changed files with 131 additions and 0 deletions
|
@ -0,0 +1,131 @@
|
|||
title: Bye, bye netboot
|
||||
---
|
||||
pub_date: 2021-08-31
|
||||
---
|
||||
author: ungleich infrastructure team
|
||||
---
|
||||
twitter_handle: ungleich
|
||||
---
|
||||
_hidden: no
|
||||
---
|
||||
_discoverable: yes
|
||||
---
|
||||
abstract:
|
||||
Data Center Light servers are switching to disk based boot
|
||||
---
|
||||
body:
|
||||
|
||||
## Introduction
|
||||
|
||||
Since the very beginning of the [Data Center Light
|
||||
project](/u/projects/data-center-light) our servers have been
|
||||
*somewhat stateless* and booted their operating system from the
|
||||
network.
|
||||
|
||||
From today on this changes and our servers are switched to boot from
|
||||
a disk (SSD/NVMe/HDD). While this may at first seem counterintuitive for a
growing data center, let us explain why this makes sense for us.
|
||||
|
||||
## Netboot in a nutshell
|
||||
|
||||
There are different variants of how to netboot a server. In either
|
||||
case, the server loads an executable from the network, typically via
|
||||
TFTP or HTTP, and then hands over execution to it.
|
||||
|
||||
The first option is to load the kernel and then later switch to an NFS
|
||||
based filesystem. If the filesystem is read-write, you usually need
|
||||
one location per server; alternatively, you mount it read-only and possibly apply
|
||||
an overlay for runtime configuration.
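
To make the two NFS variants more concrete, here is a minimal sketch of the kernel command line each one would use. The parameters `root=/dev/nfs`, `nfsroot=` and `ip=dhcp` are the standard Linux NFS-root options; the server address, export path and hostname are hypothetical, and this is not taken from our own setup.

```python
from typing import Optional


def nfsroot_cmdline(server: str, export: str,
                    hostname: Optional[str] = None,
                    read_only: bool = True) -> str:
    """Build a Linux kernel command line for an NFS root filesystem.

    All values passed in are illustrative assumptions, not real hosts.
    """
    # A read-write root needs its own directory per server, while a
    # read-only root can be shared by every server.
    path = export if hostname is None else f"{export}/{hostname}"
    mode = "ro" if read_only else "rw"
    return f"root=/dev/nfs nfsroot={server}:{path} ip=dhcp {mode}"


# Shared read-only root (an overlay would be applied later, at runtime).
shared = nfsroot_cmdline("192.0.2.10", "/srv/nfsroot")

# One read-write location per server.
per_host = nfsroot_cmdline("192.0.2.10", "/srv/nfsroot",
                           hostname="server1", read_only=False)
```

The only difference between the variants is the exported path and the `ro`/`rw` flag, which is why the read-write case multiplies the number of directories the NFS server has to hold.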
|
||||
|
||||
The second option is to load the kernel and an initramfs into memory
|
||||
and stay inside the initramfs. The advantage of this approach is that
|
||||
no NFS server is needed, but the whole operating system then resides in
memory.
|
||||
|
||||
The second option is what we have used in Data Center Light for the last
|
||||
couple of years.
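
As an illustration of the second option, the following sketch generates a small iPXE script that loads a kernel and an initramfs over HTTP and then boots them. The iPXE commands `kernel`, `initrd` and `boot` exist as shown; the base URL, file names and kernel arguments are hypothetical and do not describe our actual boot servers. The sketch also assumes that the network interface has already been configured.

```python
def ipxe_script(base_url: str,
                kernel: str = "vmlinuz",
                initramfs: str = "initramfs.img",
                kernel_args: str = "") -> str:
    """Render an iPXE script that boots a kernel plus initramfs over HTTP.

    File names and URL are placeholders chosen for this example.
    """
    lines = [
        "#!ipxe",
        # Load the kernel and pass optional arguments to it.
        f"kernel {base_url}/{kernel} {kernel_args}".rstrip(),
        # Load the initramfs that contains the whole operating system.
        f"initrd {base_url}/{initramfs}",
        # Hand over execution to the loaded kernel.
        "boot",
    ]
    return "\n".join(lines) + "\n"


script = ipxe_script("http://boot.example.com/alpine",
                     kernel_args="console=ttyS0")
```

Because the system never switches to another root filesystem, everything the server needs has to be inside that one initramfs file.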
|
||||
|
||||
## Netboot history at Data Center Light
|
||||
|
||||
Originally all our servers started with IPv4 PXE based
|
||||
netboot. However, as our data center is, generally speaking, IPv6 only,
|
||||
the IPv4 DHCP+TFTP combination is an extra maintenance burden and also a
|
||||
hindrance for network debugging: if you are in a single stack IPv6
|
||||
only network, things are much easier to debug. No need to look at two
|
||||
routing tables, no need to work around DHCP settings that might
|
||||
interfere with what one wants to achieve via IPv6.
|
||||
|
||||
As the IPv4 addresses became more of a technical debt in our
|
||||
infrastructure, we started flashing our network cards with
|
||||
[ipxe](https://ipxe.org/), which allows even older network cards to
|
||||
boot in IPv6 only networks.
|
||||
|
||||
Also, in an IPv6 only netboot environment, it is easier to run
|
||||
active-active routers, as hosts are not assigned DHCP leases. They
|
||||
assign addresses themselves, which scales much better.
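
One way a host can assign its own address without any lease is stateless address autoconfiguration with a modified EUI-64 interface identifier derived from its MAC address. The sketch below shows that derivation. It is an assumption for illustration only: hosts may equally well use stable-privacy or temporary addresses, and the prefix and MAC address are documentation examples.

```python
import ipaddress


def slaac_eui64_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Derive an IPv6 address from a /64 prefix and a MAC address
    using the modified EUI-64 scheme."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("expected a /64 prefix")

    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a 48-bit MAC address")

    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe in the middle to get a 64-bit interface identifier.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])

    # Combine the prefix (upper 64 bits) with the identifier (lower 64 bits).
    combined = int(network.network_address) | int.from_bytes(interface_id, "big")
    return ipaddress.IPv6Address(combined)


address = slaac_eui64_address("2001:db8::/64", "52:54:00:12:34:56")
```

No router has to remember anything about the host here, which is what makes running several routers side by side straightforward.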
|
||||
|
||||
## Migrating away from netbooting
|
||||
|
||||
So why are we migrating away from netbooting, even after we migrated
|
||||
to IPv6 only networking? There are multiple aspects:
|
||||
|
||||
On power failure, netbooted hosts lose their state. The operating
|
||||
system that is loaded is the same for every server and needs some
|
||||
configuration post-boot. We have solved this using
|
||||
[cdist](https://www.cdi.st/); however, the authentication-trigger
|
||||
mechanism is non-trivial if you want to keep your netboot images and
|
||||
build steps public.
|
||||
|
||||
The second reason is state synchronisation: as we have multiple
|
||||
boot servers, we need to maintain the same state on multiple
|
||||
machines. That is solvable via CI/CD pipelines; however, the level of
|
||||
automation on build servers is rather low, because the number of OS
changes is low.
|
||||
|
||||
The third and main point is our ongoing migration towards
|
||||
[Kubernetes](https://kubernetes.io/). Originally our servers would
|
||||
boot up and get configured to provide Ceph storage or to be a
|
||||
virtualisation host. The amount of binaries to keep in our in-memory
|
||||
image was tiny, in the best case around 150 MB. With the migration
|
||||
towards Kubernetes, every node downloads container images, which
|
||||
can be comparatively huge (gigabytes of data). The additional pivot_root
|
||||
workarounds that are required for initramfs usage are just an
|
||||
additional minor point that made us question our current setup.
|
||||
|
||||
## Automating disk based boot
|
||||
|
||||
We have servers from a variety of brands and each of them comes with a
|
||||
variety of disk controllers: from simple pass-through SATA controllers
|
||||
to full-fledged hardware RAID with onboard cache and a battery for
|
||||
protecting the cache - everything is in the mix.
|
||||
|
||||
So it is not easily possible to prepare a stack of disks somewhere
and then add them, as the disk controller might write some (RAID0)
metadata to them.
|
||||
|
||||
To work around this problem, we insert the disk that will become the
boot disk into the netbooted server, install the
|
||||
operating system from the running environment and at the next
|
||||
maintenance window ensure that the server is actually booting from it.
|
||||
|
||||
If you are curious about how this works, you can check out the scripts that
|
||||
we use for
|
||||
[Devuan/Debian](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/debian-devuan-install-on-disk.sh)
|
||||
and
|
||||
[Alpine Linux](https://code.ungleich.ch/ungleich-public/ungleich-tools/-/blob/master/alpine-install-on-disk.sh).
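
The linked scripts are the authoritative version. Purely as an illustration of the general idea of installing onto a disk from a running system, here is a dry-run sketch that only builds the list of commands for a Debian-style install and does not execute anything. The tools `parted`, `mkfs.ext4`, `mount`, `debootstrap` and `grub-install` are real; the single-partition layout, the BIOS/msdos label, the mount point and the device name are assumptions made for this example. Steps such as installing a kernel, writing `/etc/fstab` and generating the GRUB configuration are left out.

```python
import shlex
from typing import List


def partition_path(disk: str, number: int = 1) -> str:
    """Return the device path of a partition on the given disk."""
    # Disks whose name ends in a digit (e.g. /dev/nvme0n1) use a "p"
    # separator, classic names like /dev/sda do not.
    separator = "p" if disk[-1].isdigit() else ""
    return f"{disk}{separator}{number}"


def install_plan(disk: str, target: str = "/mnt", suite: str = "stable",
                 mirror: str = "http://deb.debian.org/debian") -> List[List[str]]:
    """Build (but do not run) the commands for a minimal disk install."""
    part = partition_path(disk)
    return [
        # One partition spanning the whole disk (assumed layout).
        ["parted", "-s", disk, "mklabel", "msdos",
         "mkpart", "primary", "ext4", "1MiB", "100%"],
        ["mkfs.ext4", part],
        ["mount", part, target],
        # Install a base system into the mounted partition.
        ["debootstrap", suite, target, mirror],
        # Put the boot loader on the disk itself.
        ["grub-install", f"--boot-directory={target}/boot", disk],
    ]


def render_plan(disk: str) -> str:
    """Render the plan as shell-quoted lines for review."""
    return "\n".join(shlex.join(cmd) for cmd in install_plan(disk))
```

Calling `render_plan("/dev/sdX")` returns the commands as text, so the plan can be reviewed before anyone runs it by hand during a maintenance window.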
|
||||
|
||||
## The road continues
|
||||
|
||||
While a data center needs to be stable, it also needs to adapt to
|
||||
newer technologies or different flows. The disk based boot is our
|
||||
current solution on our path towards the Kubernetes migration, but who
|
||||
knows - in the future things might look different again.
|
||||
|
||||
If you want to join the discussion, we have a
|
||||
[Hacking and Learning
|
||||
(#hacking-and-learning:ungleich.ch)](/u/projects/open-chat/) channel
|
||||
on Matrix for an open exchange.
|
||||
|
||||
Oh, and in case [you were wondering what we did
|
||||
today](https://twitter.com/ungleich/status/1432627966316584968), we
|
||||
switched to disk based booting ;-).
|