title: IPv6 only netboot in Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
How we switched from IPv4 netboot to IPv6 netboot
---
body:

In our [previous blog article](/u/blog/datacenterlight-spring-network-cleanup), we wrote about our motivation for the big spring network cleanup. In this blog article, we show how we started reducing complexity by removing our dependency on IPv4.

## IPv6 first

If you found our blog, you are probably aware that everything at ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS entries for remote access have IPv6 (AAAA) records, and we use IPv4 for our infrastructure only in rare exceptions.

## IPv4 only netboot

One of the big exceptions to this paradigm used to be how we boot our servers. Because our second big paradigm is sustainability, we use a lot of 2nd (or 3rd) generation hardware. We actually share this passion with our friends from [e-durable](https://recycled.cloud/), because sustainability is something we need to act on today, not tomorrow.

But back to netbooting: until now, we mainly relied on the onboard network cards for it.

## Onboard network cards

We used these network cards for multiple reasons:

* they exist in virtually every server
* they usually have a ROM containing PXE capable firmware
* they allow us to separate real traffic (on the fiber cards) from internal traffic

However, using the onboard devices also comes with a couple of disadvantages:

* Their ROM is often outdated
* They require additional cabling

## Cables

Let's have a look at the cabling situation first. Virtually all of our servers are connected to the network using 2x 10 Gbit/s fiber cards.

On the one hand this provides a fast connection, but on the other hand it gives us something even better: the ability to span long distances.

Our data centers employ a non-standard design due to the re-use of existing factory halls. This means distances between servers and switches can be up to 100m. With fiber, we can easily cover these distances.

Additionally, having fewer cables provides a simpler infrastructure that is easier to analyse.

## Disabling onboard network cards

So can we somehow get rid of the copper cables and switch to fiber only? It turns out that the fiber cards we use (mainly Intel X520s) have their own ROM. So we started disabling the onboard network cards and tried booting from the fiber cards. This worked until we wanted to move the lab setup to production...

## Bonding (LACP) and VLAN tagging

Our servers use bonding (802.3ad) for redundant connections to the switches and VLAN tagging on top of the bonded devices to isolate client traffic. On the switch side, we realised this using configurations like:

```
interface Port-Channel33
   switchport mode trunk
   mlag 33

...
interface Ethernet33
   channel-group 33 mode active
```

But that does not work if the network ROM does not bring up an LACP enabled link at boot time, on top of which it would then do the VLAN tagging.

The ROM in our network cards **would** have allowed VLAN tagging alone, though.
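To make the difference concrete, here is a minimal Python sketch of what "VLAN tagging alone" means on the wire: inserting an IEEE 802.1Q tag (TPID `0x8100`, followed by a 16 bit field holding priority, drop eligibility and the 12 bit VLAN ID) after the MAC addresses. The helper names `add_vlan_tag` and `vlan_of` are hypothetical and only illustrate the frame format; the sketch does not speak LACP at all, which is exactly the part the boot ROM is missing.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tagged frame


def add_vlan_tag(frame: bytes, vid: int, pcp: int = 0, dei: int = 0) -> bytes:
    """Insert an 802.1Q tag after destination and source MAC (hypothetical helper)."""
    if len(frame) < 14:
        raise ValueError("frame too short to be an Ethernet frame")
    if not 0 < vid < 4095:
        raise ValueError("VLAN ID must be between 1 and 4094")
    # Tag Control Information: 3 bits priority, 1 bit drop eligible, 12 bits VLAN ID
    tci = (pcp << 13) | (dei << 12) | vid
    tag = struct.pack("!HH", TPID_8021Q, tci)
    # Bytes 0..11 are the two MAC addresses; the original EtherType follows the tag
    return frame[:12] + tag + frame[12:]


def vlan_of(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != TPID_8021Q:
        return None
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF


# Example: an empty-payload IPv6 frame (EtherType 0x86DD), tagged for VLAN 10
untagged = bytes(6) + bytes(6) + struct.pack("!H", 0x86DD)
tagged = add_vlan_tag(untagged, vid=10)
print(vlan_of(untagged), vlan_of(tagged))
```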

To fix this problem, we reconfigured our switches as follows:

```
interface Port-Channel33
   switchport trunk native vlan 10
   switchport mode trunk
   port-channel lacp fallback static
   port-channel lacp fallback timeout 20
   mlag 33
```

This basically does two things:

* If no LACP frames are received within the fallback timeout (20 seconds in our configuration), fall back to a static (non-LACP) configuration
* Accept untagged traffic and map it to VLAN 10 (one of our boot networks)
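The two behaviours can be modelled in a few lines of Python. This is a simplified sketch, not Arista's implementation: it assumes the fallback timeout is counted in seconds from link up, treats any received LACP frame as "this port is now LACP", and ignores MLAG and the actual LACP state machine. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NATIVE_VLAN = 10       # "switchport trunk native vlan 10"
FALLBACK_TIMEOUT = 20  # "port-channel lacp fallback timeout 20" (assumed: seconds)


@dataclass
class PortChannelMember:
    """Hypothetical, simplified model of one member port of Port-Channel33."""
    link_up_at: float
    last_lacpdu_at: Optional[float] = None

    def receive_lacpdu(self, now: float) -> None:
        self.last_lacpdu_at = now

    def mode(self, now: float) -> str:
        # Once the server speaks LACP (e.g. Linux bonding after boot),
        # the port takes part in the LACP bundle.
        if self.last_lacpdu_at is not None:
            return "lacp"
        # No LACP frames yet: after the timeout, forward as a static port
        if now - self.link_up_at >= FALLBACK_TIMEOUT:
            return "static"
        return "waiting"


def ingress_vlan(tag_vid: Optional[int]) -> int:
    """Untagged frames (as sent by the boot ROM) land in the native VLAN."""
    return NATIVE_VLAN if tag_vid is None else tag_vid


port = PortChannelMember(link_up_at=0.0)
print(port.mode(5.0), port.mode(25.0))  # boot ROM phase: no LACP frames
port.receive_lacpdu(60.0)               # the booted OS brings up the bond
print(port.mode(61.0), ingress_vlan(None), ingress_vlan(20))
```

The point of the model: the netbooting ROM only ever sees the "static" phase with untagged frames in VLAN 10, while the booted operating system later gets the full LACP and tagged trunk behaviour on the same port.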

Great, our servers can now netboot from fiber! But we are not done yet...

## IPv6 only netbooting

So how do we convince these network cards to do IPv6 netboot? Can we actually do that at all? Our first approach was to put a custom build of [ipxe](https://ipxe.org/) on a USB stick. We generated that ipxe image using the **rebuild-ipxe.sh** script from the [ungleich-tools](https://code.ungleich.ch/ungleich-public/ungleich-tools) repository. It turns out that using a USB stick works pretty well for most situations.

## ROMs are not ROMs

As you can imagine, the ROM of the X520 cards does not contain IPv6 netboot support. So are we back at square one? No, we are not, because the X520s have something that the onboard devices did not consistently have: **a rewritable memory area**.

Let's take two steps back first: a ROM is a **read only memory** chip. Emphasis on **read only**. However, modern network cards, like many other devices that support on-device firmware, actually have a memory (flash) area that can be written to. And that is what helps us in our situation.

## ipxe + flbtool + x520 = fun

Trying to write ipxe into the X520 cards initially failed, because the network card did not recognise the format of the ipxe ROM file.

Luckily, the folks in the ipxe community had already spotted that problem AND fixed it: the format used by these cards is called FLB, and [flbtool](https://github.com/devicenull/flbtool/) allows you to wrap the ipxe ROM file into the FLB format. If you want to try it yourself (at your own risk!), it basically involves:

* Get the current ROM from the card (try bootutil64e)
* Extract the contents of the ROM using flbtool; this outputs several sections/parts
* Locate one part that you want to overwrite with iPXE (a previous PXE section is very suitable)
* Replace the .bin file with your iPXE ROM
* Adjust the .json file to match the length of the new binary
* Build a new .flb file using flbtool
* Flash it onto the card
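Before replacing the .bin file, a quick sanity check of the iPXE image can save a failed flash. The Python sketch below is not part of flbtool; it only relies on the PCI expansion ROM header layout (signature `0x55 0xAA`, then the image size in 512 byte blocks at offset 2) and returns the byte length you would compare against the length recorded in the .json file. Which JSON field holds that length depends on flbtool's output, so check it by hand; the function name is hypothetical.

```python
ROM_SIGNATURE = b"\x55\xaa"  # every PCI expansion ROM image starts with these bytes
BLOCK_SIZE = 512             # header byte 2 holds the image size in 512 byte blocks


def check_option_rom(data: bytes) -> int:
    """Validate a PCI option ROM header and return the file length in bytes.

    Hypothetical helper for illustration; it does not touch the FLB container.
    """
    if len(data) < 3 or data[:2] != ROM_SIGNATURE:
        raise ValueError("not a PCI option ROM (missing 0x55AA signature)")
    declared = data[2] * BLOCK_SIZE
    if declared > len(data):
        raise ValueError(f"header declares {declared} bytes, file has only {len(data)}")
    return len(data)
```

Usage would look like `check_option_rom(Path("ipxe.rom").read_bytes())` with `from pathlib import Path`, where `ipxe.rom` is a placeholder for your iPXE build output.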

While this is a bit of work, it is worth it for us, because...

## IPv6 only netboot over fiber

With the modified ROM, which basically loads iPXE at start, we can now boot our servers in IPv6 only networks. On the infrastructure side, we added two **tiny** things.

We use the ISC DHCP server (dhcpd) with the following configuration file:

```
option dhcp6.bootfile-url code 59 = string;

option dhcp6.bootfile-url "http://[2a0a:e5c0:0:6::46]/ipxescript";

subnet6 2a0a:e5c0:0:6::/64 {}
```
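On the wire, the boot file URL travels as DHCPv6 option 59 (OPT_BOOTFILE_URL, defined in RFC 5970): a 2 byte option code, a 2 byte length and the URL as plain bytes without a terminating NUL. The Python sketch below encodes the option roughly as a server would and checks that the HTTP server address lies inside the served subnet. The helper names are hypothetical; dhcpd does the real work.

```python
import ipaddress
import struct
from urllib.parse import urlsplit

OPT_BOOTFILE_URL = 59  # DHCPv6 option code from RFC 5970


def encode_bootfile_url(url: str) -> bytes:
    """Encode a DHCPv6 Boot File URL option: code, length, then the URL bytes."""
    payload = url.encode("ascii")
    return struct.pack("!HH", OPT_BOOTFILE_URL, len(payload)) + payload


def server_in_subnet(url: str, subnet: str) -> bool:
    """Check that the host in the URL (an IPv6 literal) lies inside the subnet."""
    host = urlsplit(url).hostname  # strips the [ ] around the IPv6 literal
    return ipaddress.ip_address(host) in ipaddress.ip_network(subnet)


url = "http://[2a0a:e5c0:0:6::46]/ipxescript"
option = encode_bootfile_url(url)
print(option[:4].hex(), server_in_subnet(url, "2a0a:e5c0:0:6::/64"))
```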

(that is the complete configuration!)

We also use radvd to set the "other configuration" (O) flag in our router advertisements, indicating that clients can query the DHCPv6 server for additional information:

```
interface bond0.10
{
        AdvSendAdvert on;
        MinRtrAdvInterval 3;
        MaxRtrAdvInterval 5;
        AdvDefaultLifetime 600;

        # IPv6 netbooting
        AdvOtherConfigFlag on;

        prefix 2a0a:e5c0:0:6::/64 { };

        RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
        DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; };
};
```
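`AdvOtherConfigFlag on` sets the O bit in the router advertisement while the M ("managed") bit stays off, so clients keep using SLAAC for their addresses and only ask DHCPv6 for other information, such as the boot file URL. As a sketch of how a client could interpret this, the Python code below decodes the fixed RA header laid out in RFC 4861 (flags byte at offset 5, M = `0x80`, O = `0x40`). The function name is hypothetical, and the example header only approximates what radvd would send (checksum left at zero).

```python
import struct

ICMPV6_ROUTER_ADVERTISEMENT = 134
FLAG_MANAGED = 0x80  # M: obtain addresses via DHCPv6
FLAG_OTHER = 0x40    # O: obtain other configuration (DNS, boot file URL, ...) via DHCPv6


def dhcpv6_mode(ra: bytes) -> str:
    """Decide how a client should use DHCPv6, based on an RA's fixed header."""
    # type(1) code(1) checksum(2) cur_hop_limit(1) flags(1) router_lifetime(2)
    msg_type, _code, _csum, _hop, flags, _lifetime = struct.unpack("!BBHBBH", ra[:8])
    if msg_type != ICMPV6_ROUTER_ADVERTISEMENT:
        raise ValueError("not a router advertisement")
    if flags & FLAG_MANAGED:
        return "stateful"   # addresses and other information from DHCPv6
    if flags & FLAG_OTHER:
        return "stateless"  # SLAAC addresses, other information from DHCPv6
    return "none"


# Approximate RA header with AdvOtherConfigFlag on and AdvDefaultLifetime 600
# (hop limit 64, reachable time and retransmit timer 0, checksum not computed)
ra = struct.pack("!BBHBBHII", ICMPV6_ROUTER_ADVERTISEMENT, 0, 0, 64, FLAG_OTHER, 600, 0, 0)
print(dhcpv6_mode(ra))
```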

## Take away

Being able to reduce cables was one big advantage in the beginning.

At first glance, switching to IPv6 only netboot does not seem like a big simplification, apart from being able to remove IPv4 from server networks.

However, as you will see in [the next blog posts](/u/blog/datacenterlight-active-active-routing/), switching to IPv6 only netbooting is actually a key element in reducing complexity in our network.