title: IPv6 only netboot in Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
How we switched from IPv4 netboot to IPv6 netboot
---
body:
In our [previous blog
article](/u/blog/datacenterlight-spring-network-cleanup)
we wrote about our motivation for the
big spring network cleanup. In this blog article we show how we
started reducing the complexity by removing our dependency on IPv4.
## IPv6 first
If you found our blog, you are probably aware: everything at
ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS
names for remote access have IPv6 (AAAA) records and there are only
rare exceptions where we use IPv4 for our infrastructure.
## IPv4 only Netboot
One of the big exceptions to this paradigm used to be how we boot our
servers. Because our second big paradigm is sustainability, we use a
lot of 2nd (or 3rd) generation hardware. We actually share this
passion with our friends from
[e-durable](https://recycled.cloud/), because sustainability is
something that we need to employ today and not tomorrow.
But back to the netbooting topic: so far, we have mainly
relied on onboard network cards for netbooting.
## Onboard network cards
We used these network cards for multiple reasons:

* They exist in virtually any server
* They usually have a ROM containing a PXE capable firmware
* They allow us to separate real traffic (on the fiber cards) from internal traffic

However, using the onboard devices also comes with a couple of disadvantages:

* Their ROM is often outdated
* They require additional cabling
## Cables
Let's have a look at the cabling situation first. Virtually all of
our servers are connected to the network using 2x 10 Gbit/s fiber cards.
On the one hand this provides a fast connection, but on the other hand
it provides us with something even better: distances.
Our data centers employ a non-standard design due to the re-use of
existing factory halls. This means distances between servers and
switches can be up to 100m. With fiber, we can easily achieve these
distances.
Additionally, having fewer cables provides a simpler infrastructure
that is easier to analyse.
## Disabling onboard network cards
So can we somehow get rid of the copper cables and switch to fiber
only? It turns out that the fiber cards we use (mainly Intel X520s)
have their own ROM. So we started disabling the onboard network cards
and tried booting from the fiber cards. This worked until we wanted to
move the lab setup to production...
## Bonding (LACP) and VLAN tagging
Our servers use bonding (802.3ad) for redundant connections to the
switches and VLAN tagging on top of the bonded devices to isolate
client traffic. On the switch side we realised this using
configurations like
```
interface Port-Channel33
   switchport mode trunk
   mlag 33
...
interface Ethernet33
   channel-group 33 mode active
```
But that does not work if the network ROM does not bring up an
LACP enabled link at boot, on top of which it would have to do VLAN
tagging. The ROM in our network cards **would** have allowed VLAN
tagging alone though.
To fix this problem, we reconfigured our switches as follows:
```
interface Port-Channel33
   switchport trunk native vlan 10
   switchport mode trunk
   port-channel lacp fallback static
   port-channel lacp fallback timeout 20
   mlag 33
```
This basically does two things:

* If there are no LACP frames, fall back to a static (non-LACP)
  configuration
* Accept untagged traffic and map it to VLAN 10 (one of our boot networks)

Great, our servers can now netboot from fiber! But we are not done
yet...
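
The effect of these two settings can be sketched as a small model in Python. This is a simplified illustration, not how the switch is implemented: `PortState`, `port_mode` and `classify_vlan` are hypothetical names, and the exact timer semantics depend on the switch vendor. Only the values 10 and 20 are taken from the configuration above.

```python
from dataclasses import dataclass
from typing import Optional

NATIVE_VLAN = 10         # from "switchport trunk native vlan 10"
FALLBACK_TIMEOUT_S = 20  # from "port-channel lacp fallback timeout 20"


@dataclass
class PortState:
    """Hypothetical, simplified view of one port-channel member."""
    seconds_since_link_up: float
    lacp_frames_seen: bool


def port_mode(state: PortState) -> str:
    """Return the (simplified) mode the port-channel operates in."""
    if state.lacp_frames_seen:
        return "lacp"
    # Assumption: without LACP frames the switch waits for the timeout
    # and then treats the link as a static (non-LACP) member.
    if state.seconds_since_link_up >= FALLBACK_TIMEOUT_S:
        return "static-fallback"
    return "waiting"


def classify_vlan(vlan_tag: Optional[int]) -> int:
    """Untagged frames land in the native VLAN, tagged frames keep their tag."""
    return NATIVE_VLAN if vlan_tag is None else vlan_tag
```

In this model, a freshly booted server whose ROM sends neither LACP frames nor VLAN tags ends up, after the timeout, on a static link in VLAN 10, which is what the netboot needs.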
## IPv6 only netbooting
So how do we convince these network cards to do IPv6 netboot? Can we
actually do that at all? Our first approach was to put a custom build of
[iPXE](https://ipxe.org/) on a USB stick. We generated that
iPXE image using the **rebuild-ipxe.sh** script
from the
[ungleich-tools](https://code.ungleich.ch/ungleich-public/ungleich-tools)
repository. It turns out that using a USB stick works pretty well for most
situations.
## ROMs are not ROMs
As you can imagine, the ROM of the X520 cards does not contain IPv6
netboot support. So are we back at square one? No, we are not. Because
the X520s have something that the onboard devices did not
consistently have: **a rewritable memory area**.
Let's take 2 steps back here first: A ROM is a **read only memory**
chip. Emphasis on **read only**. However, modern network cards and a
lot of devices that support on-device firmware do actually have a
memory (flash) area that can be written to. And that is what aids us
in our situation.
## ipxe + flbtool + x520 = fun
Trying to write iPXE into the X520 cards initially failed, because the
network card did not recognise the format of the iPXE ROM file.
Luckily the folks in the iPXE community had already spotted that problem
AND fixed it: the format used in these cards is called FLB. And there
is [flbtool](https://github.com/devicenull/flbtool/), which allows you
to wrap the iPXE ROM file into the FLB format. If you want to
try it yourself (at your own risk!), it basically involves:
* Get the current ROM from the card (try bootutil64e)
* Extract the contents from the rom using flbtool
* This will output some sections/parts
* Locate one part that you want to overwrite with iPXE (a previous PXE
section is very suitable)
* Replace the .bin file with your iPXE rom
* Adjust the .json file to match the length of the new binary
* Build a new .flb file using flbtool
* Flash it onto the card
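
For the length step in the list above, a small Python helper can report the size of the new binary and sanity-check it first. This is only a sketch under assumptions: it relies on the general PC option ROM layout (signature bytes `0x55 0xAA`, followed by a size byte counting 512-byte blocks) and does not know anything about the JSON layout that flbtool writes, so the value still has to be entered by hand. `describe_rom` and `describe_rom_file` are hypothetical names.

```python
ROM_SIGNATURE = b"\x55\xaa"  # first two bytes of a PC option ROM image
BLOCK_SIZE = 512             # header byte 2 counts 512-byte blocks


def describe_rom(data: bytes) -> dict:
    """Return the actual and the header-declared length of a ROM image."""
    if len(data) < 3 or data[:2] != ROM_SIGNATURE:
        raise ValueError("missing 0x55 0xAA option ROM signature")
    return {
        "file_length": len(data),                 # bytes in the file
        "declared_length": data[2] * BLOCK_SIZE,  # bytes claimed by the header
    }


def describe_rom_file(path: str) -> dict:
    """Convenience wrapper: read the image from disk and describe it."""
    with open(path, "rb") as handle:
        return describe_rom(handle.read())
```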
While this is a bit of work, it is worth it for us, because...:
## IPv6 only netboot over fiber
With the modified ROM, basically loading iPXE at start, we can now
boot our servers in IPv6 only networks. On our infrastructure side, we
added two **tiny** things:
We use ISC DHCP with the following configuration file:
```
option dhcp6.bootfile-url code 59 = string;
option dhcp6.bootfile-url "http://[2a0a:e5c0:0:6::46]/ipxescript";
subnet6 2a0a:e5c0:0:6::/64 {}
```
(that is the complete configuration!)
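
Two details of this configuration are easy to get wrong: the IPv6 literal in the URL must be wrapped in square brackets, and in this setup the boot server address lies inside the announced `/64`. A minimal Python sketch to check both, using only the standard library; `bootfile_host` and `served_from_boot_subnet` are hypothetical helper names, the URL and prefix are copied from the configuration above.

```python
import ipaddress
from urllib.parse import urlsplit

BOOTFILE_URL = "http://[2a0a:e5c0:0:6::46]/ipxescript"
BOOT_SUBNET = ipaddress.ip_network("2a0a:e5c0:0:6::/64")


def bootfile_host(url: str) -> ipaddress.IPv6Address:
    """Extract the IPv6 literal from a bootfile URL (brackets are stripped)."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError("URL has no host part")
    # Raises ValueError if the host is a name or an IPv4 address.
    return ipaddress.IPv6Address(host)


def served_from_boot_subnet(url: str, subnet: ipaddress.IPv6Network) -> bool:
    """True if the boot server address is inside the given subnet."""
    return bootfile_host(url) in subnet


print(bootfile_host(BOOTFILE_URL))
print(served_from_boot_subnet(BOOTFILE_URL, BOOT_SUBNET))
```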
And we use radvd to announce that other configuration information is
available, which tells clients that they can query the DHCPv6 server:
```
interface bond0.10
{
    AdvSendAdvert on;
    MinRtrAdvInterval 3;
    MaxRtrAdvInterval 5;
    AdvDefaultLifetime 600;
    # IPv6 netbooting
    AdvOtherConfigFlag on;
    prefix 2a0a:e5c0:0:6::/64 { };
    RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
    DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; } ;
};
```
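
What `AdvOtherConfigFlag on` changes on the wire is a single bit in the router advertisement. The following Python sketch decodes the two relevant flags as defined in RFC 4861 (M = `0x80`, O = `0x40` in the flags byte). It is an illustration of the standard, not a description of how iPXE is implemented; `ra_flags` and `dhcpv6_mode` are hypothetical names.

```python
M_FLAG = 0x80  # "Managed address configuration": addresses via DHCPv6
O_FLAG = 0x40  # "Other configuration": only extra options via DHCPv6


def ra_flags(managed: bool, other: bool) -> int:
    """Build the flags byte of a router advertisement (M and O bits only)."""
    return (M_FLAG if managed else 0) | (O_FLAG if other else 0)


def dhcpv6_mode(flags: int) -> str:
    """Return which kind of DHCPv6 exchange the flags invite a client to do."""
    if flags & M_FLAG:
        return "stateful"   # addresses and options from DHCPv6
    if flags & O_FLAG:
        return "stateless"  # options only, address from the prefix
    return "none"


# The radvd configuration above: managed flag at its default (off), other on.
print(dhcpv6_mode(ra_flags(managed=False, other=True)))
```

With the configuration shown, the sketch yields `stateless`: clients form their address from the advertised prefix and ask the DHCPv6 server only for extra options such as the bootfile URL.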
## Take away
Being able to reduce cables was one big advantage in the beginning.
Switching to IPv6 only netboot does not seem like a big simplification
at first glance, besides being able to remove IPv4 in server
networks.
However, as you will see in
[the next blog posts](/u/blog/datacenterlight-active-active-routing/),
switching to IPv6 only netbooting is actually a key element in
reducing complexity in our network.