title: IPv6 only netboot in Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract: How we switched from IPv4 netboot to IPv6 netboot
---
body:

In our [previous blog article](/u/blog/datacenterlight-spring-network-cleanup)
we wrote about our motivation for the big spring network cleanup. In this
article we show how we started reducing complexity by removing our
dependency on IPv4.

## IPv6 first

If you have found our blog, you are probably aware: everything at ungleich
is IPv6 first. Many of our networks are IPv6 only, all DNS entries for
remote access have IPv6 (AAAA) entries, and there are only rare exceptions
where we use IPv4 in our infrastructure.

## IPv4 only Netboot

One of the big exceptions to this paradigm used to be how we boot our
servers. Because our second big paradigm is sustainability, we use a lot of
2nd (or 3rd) generation hardware. We share this passion with our friends
from [e-durable](https://recycled.cloud/), because sustainability is
something we need to practise today, not tomorrow.

But back to the netbooting topic: so far, we mainly relied on onboard
network cards for netbooting.

## Onboard network cards

We used these network cards for multiple reasons:

* they exist in virtually any server
* they usually have a ROM containing a PXE capable firmware
* they allow us to separate real traffic (on the fiber cards) from
  internal traffic

However, using the onboard devices also comes with a couple of
disadvantages:

* their ROM is often outdated
* they require additional cabling

## Cables

Let's have a look at the cabling situation first. Virtually all of our
servers are connected to the network with 2x 10 Gbit/s fiber cards. On one
side this provides a fast connection, but on the other side it gives us
something even better: distance. Our data centers employ a non-standard
design due to the re-use of existing factory halls.
This means distances between servers and switches can be up to 100m. With
fiber, we can easily achieve these distances. Additionally, having fewer
cables makes for a simpler infrastructure that is easier to analyse.

## Disabling onboard network cards

So can we somehow get rid of the copper cables and switch to fiber only?
It turns out that the fiber cards we use (mainly Intel X520s) have their
own ROM. So we started disabling the onboard network cards and tried
booting from the fiber cards. This worked, until we wanted to move the lab
setup to production...

## Bonding (LACP) and VLAN tagging

Our servers use bonding (802.3ad) for redundant connections to the
switches and VLAN tagging on top of the bonded devices to isolate client
traffic. On the switch side we realised this using configurations like

```
interface Port-Channel33
   switchport mode trunk
   mlag 33
...
interface Ethernet33
   channel-group 33 mode active
```

But that does not work if the network ROM at boot time does not bring up
an LACP enabled link on top of which it does the VLAN tagging. The ROM in
our network cards **would** have allowed plain VLAN tagging, though. To
fix this problem, we reconfigured our switches as follows:

```
interface Port-Channel33
   switchport trunk native vlan 10
   switchport mode trunk
   port-channel lacp fallback static
   port-channel lacp fallback timeout 20
   mlag 33
```

This basically does two things:

* if there are no LACP frames, fall back to a static (non LACP)
  configuration
* accept untagged traffic and map it to VLAN 10 (one of our boot networks)

Great, our servers can now netboot from fiber! But we are not done yet...

## IPv6 only netbooting

So how do we convince these network cards to do IPv6 netboot? Can we
actually do that at all? Our first approach was to put a custom build of
[ipxe](https://ipxe.org/) on a USB stick. We generated that ipxe image
using the **rebuild-ipxe.sh** script from the
[ungleich-tools](https://code.ungleich.ch/ungleich-public/ungleich-tools)
repository.
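For readers who want to build such an image themselves: the core of the
process is compiling iPXE with an embedded script that chains to an HTTP
boot server. A minimal sketch, not the exact contents of
**rebuild-ipxe.sh** (the embedded script and the server address are
examples; depending on the iPXE version, IPv6 support may need to be
enabled via NET_PROTO_IPV6 in config/general.h):

```
# Fetch iPXE and build an EFI binary with an embedded startup script
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src

# embed.ipxe: configure the NIC, then chain-load from our boot server
cat > embed.ipxe <<'EOF'
#!ipxe
dhcp
chain http://[2a0a:e5c0:0:6::46]/ipxescript
EOF

make bin-x86_64-efi/ipxe.efi EMBED=embed.ipxe
```

The resulting ipxe.efi can be copied onto a FAT formatted USB stick as
EFI/BOOT/BOOTX64.EFI to make the stick bootable on UEFI systems.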
It turns out that using a USB stick works pretty well in most situations.

## ROMs are not ROMs

As you can imagine, the ROM of the X520 cards does not contain IPv6
netboot support. So are we back at square one? No, we are not, because the
X520s have something that the onboard devices did not consistently have:
**a rewritable memory area**.

Let's take two steps back here first: a ROM is a **read only memory**
chip. Emphasis on **read only**. However, modern network cards and a lot
of devices that support on-device firmware actually have a memory (flash)
area that can be written to. And that is what aids us in our situation.

## ipxe + flbtool + x520 = fun

Trying to write ipxe into the X520 cards initially failed, because the
network card did not recognise the format of the ipxe rom file. Luckily,
the folks in the ipxe community had already spotted that problem AND fixed
it: the format used in these cards is called FLB, and there is
[flbtool](https://github.com/devicenull/flbtool/), which allows you to
wrap the ipxe rom file in the FLB format. For those who want to try it
themselves (at your own risk!), it basically involves:

* get the current ROM from the card (try bootutil64e)
* extract the contents from the ROM using flbtool; this will output some
  sections/parts
* locate the part that you want to overwrite with iPXE (a previous PXE
  section is very suitable)
* replace the .bin file with your iPXE rom
* adjust the .json file to match the length of the new binary
* build a new .flb file using flbtool
* flash it onto the card

While this is a bit of work, it is worth it for us, because...

## IPv6 only netboot over fiber

With the modified ROM, which basically loads iPXE at start, we can now
boot our servers in IPv6 only networks.
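In shell form, the steps above look roughly like this. Caveat: the
flbtool invocations and the part name are assumptions for illustration
(its actual interface is described in the flbtool README); the BootUtil
options are Intel's documented save/restore flags:

```
# 1. Save the current flash contents of the first NIC (Intel BootUtil)
./bootutil64e -NIC=1 -SAVEIMAGE -FILE=x520.flb

# 2. Unpack the FLB container into its parts (invocation assumed)
flbtool extract x520.flb parts/

# 3. Overwrite the PXE part's payload with the iPXE rom, then edit the
#    accompanying .json so the recorded length matches the new binary
cp ipxe.rom parts/<pxe-part>.bin

# 4. Repack the parts into a new FLB image and flash it back
flbtool build parts/ x520-ipxe.flb
./bootutil64e -NIC=1 -RESTOREIMAGE -FILE=x520-ipxe.flb
```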
On our infrastructure side, we added two **tiny** things. First, we use
ISC dhcpd with the following configuration file:

```
option dhcp6.bootfile-url code 59 = string;
option dhcp6.bootfile-url "http://[2a0a:e5c0:0:6::46]/ipxescript";
subnet6 2a0a:e5c0:0:6::/64 {}
```

(that is the complete configuration!)

Second, we use radvd to set the "other configuration" flag in router
advertisements, which tells clients that they can query the DHCPv6 server:

```
interface bond0.10
{
    AdvSendAdvert on;
    MinRtrAdvInterval 3;
    MaxRtrAdvInterval 5;
    AdvDefaultLifetime 600;

    # IPv6 netbooting
    AdvOtherConfigFlag on;

    prefix 2a0a:e5c0:0:6::/64 { };

    RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
    DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; };
};
```

## Take away

Being able to reduce the cabling was one big advantage in the beginning.
Switching to IPv6 only netboot does not seem like a big simplification at
first, besides allowing us to remove IPv4 from the server networks.
However, as you will see in
[the next blog post](/u/blog/datacenterlight-active-active-routing/),
switching to IPv6 only netbooting is actually a key element in reducing
complexity in our network.
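As a closing illustration: the **ipxescript** that the bootfile-url above
points at is itself just a plain text iPXE script. A minimal sketch, with
placeholder kernel and initrd paths rather than our production setup:

```
#!ipxe
kernel http://[2a0a:e5c0:0:6::46]/vmlinuz console=ttyS0
initrd http://[2a0a:e5c0:0:6::46]/initramfs.gz
boot
```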