++ three new blog articles

2021-05-01 11:34:11 +02:00 · 2021-05-01 11:34:11 +02:00 · 251677cf57
commit 251677cf57
parent 5d05a28e7d
3 changed files with 541 additions and 0 deletions
--- a/content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr
+++ b/content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr
@ -0,0 +1,219 @@
+title: IPv6 only netboot in Data Center Light
+---
+pub_date: 2021-05-01
+---
+author: Nico Schottelius
+---
+twitter_handle: NicoSchottelius
+---
+_hidden: no
+---
+_discoverable: no
+---
+abstract:
+How we switched from IPv4 netboot to IPv6 netboot
+---
+body:
+
+In our [previous blog
+article](/u/blog/datacenterlight-spring-network-cleanup)
+ we wrote about our motivation for the
+big spring network cleanup. In this blog article we show how we
+started reducing the complexity by removing our dependency on IPv4.
+
+## IPv6 first
+
+If you found our blog, you are probably aware: everything at
+ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS
+entries for remote access have IPv6 (AAAA) entries and there are only
+rare exceptions when we utilise IPv4.
+
+## IPv4 only Netboot
+
+One of the big exceptions to this paradigm used to be how we boot our
+servers. Because our second big paradigm is sustainability, we use a
+lot of 2nd (or 3rd) generation hardware. We actually share this
+passion with our friends from
+[e-durable](https://recycled.cloud/), because sustainability is
+something that we need to employ today and not tomorrow.
+But back to netbooting: so far, we have mainly relied on onboard
+network cards for it.
+
+## Onboard network cards
+
+We used these network cards for multiple reasons:
+
+* they exist in virtually every server
+* they usually have a ROM containing PXE-capable firmware
+* they allow us to separate real traffic (on the fiber cards) from internal traffic
+
+However, using the onboard devices also comes with a couple of disadvantages:
+
+* their ROM is often outdated
+* they require additional cabling
+
+## Cables
+
+Let's have a look at the cabling situation first. Virtually all of
+our servers are connected to the network using 2x 10 Gbit/s fiber cards.
+
+On the one hand this provides a fast connection, but on the other hand
+it provides us with something even better: distance.
+
+Our data centers employ a non-standard design due to the re-use of
+existing factory halls. This means distances between servers and
+switches can be up to 100m. With fiber, we can easily achieve these
+distances.
+
+Additionally, having fewer cables provides a simpler infrastructure
+that is easier to analyse.
+
+## Reducing complexity 1
+
+So can we somehow get rid of the copper cables and switch to fiber
+only? It turns out that the fiber cards we use (mainly Intel X520s)
+have their own ROM. So we started disabling the onboard network cards
+and tried booting from the fiber cards. This worked until we wanted to
+move the lab setup to production...
+
+## Bonding (LACP) and VLAN tagging
+
+Our servers use bonding (802.3ad) for redundant connections to the
+switches and VLAN tagging on top of the bonded devices to isolate
+client traffic. On the switch side we realised this using
+configurations like
+
+```
+interface Port-Channel33
+   switchport mode trunk
+   mlag 33
+
+...
+interface Ethernet33
+   channel-group 33 mode active
+```
+
+But that does not work if the network ROM does not bring up an
+LACP-enabled link at boot, on top of which it would do VLAN tagging.
+
+The ROM in our network cards **would** have allowed VLAN tagging alone
+though.
+
+To fix this problem, we reconfigured our switches as follows:
+
+```
+interface Port-Channel33
+   switchport trunk native vlan 10
+   switchport mode trunk
+   port-channel lacp fallback static
+   port-channel lacp fallback timeout 20
+   mlag 33
+```
+
+This basically does two things:
+
+* If there are no LACP frames, fall back to a static (non-LACP)
+  configuration
+* Accept untagged traffic and map it to VLAN 10 (one of our boot networks)
+
+Great, our servers can now netboot from fiber! But we are not done
+yet...
+
+## IPv6 only netbooting
+
+So how do we convince these network cards to do IPv6 netboot? Can we
+actually do that at all? Our first approach was to put a custom build of
+[ipxe](https://ipxe.org/) on a USB stick. We generated that
+ipxe image using the **rebuild-ipxe.sh** script
+from the
+[ungleich-tools](https://code.ungleich.ch/ungleich-public/ungleich-tools)
+repository. It turns out that using a USB stick works pretty well in
+most situations.
+
+## ROMs are not ROMs
+
+As you can imagine, the ROM of the X520 cards does not contain IPv6
+netboot support. So are we back to square one? No, we are not. Because
+the X520s have something that the onboard devices did not
+consistently have: **a rewritable memory area**.
+
+Let's take two steps back here first: A ROM is a **read only memory**
+chip. Emphasis on **read only**. However, modern network cards and a
+lot of devices that support on-device firmware do actually have a
+memory (flash) area that can be written to. And that is what aids us
+in our situation.
+
+## ipxe + flbtool + x520 = fun
+
+Trying to write ipxe into the X520 cards initially failed, because the
+network card did not recognise the format of the ipxe rom file.
+
+Luckily the folks in the ipxe community already spotted that problem
+AND fixed it: The format used in these cards is called FLB. And there
+is [flbtool](https://github.com/devicenull/flbtool/), which allows you
+to wrap the ipxe rom file into the FLB format. If you want to
+try it yourself (at your own risk!), it basically involves:
+
+* Get the current ROM from the card (try bootutil64e)
+* Extract the contents from the rom using flbtool
+* This will output some sections/parts
+* Locate one part that you want to overwrite with iPXE (a previous PXE
+  section is very suitable)
+* Replace the .bin file with your iPXE rom
+* Adjust the .json file to match the length of the new binary
+* Build a new .flb file using flbtool
+* Flash it onto the card
+
+While this is a bit of work, it is worth it for us, because...
+
+## IPv6 only netboot over fiber
+
+With the modified ROM, basically loading iPXE at start, we can now
+boot our servers in IPv6 only networks. On our infrastructure side, we
+added two **tiny** things:
+
+We use ISC dhcp with the following configuration file:
+
+```
+option dhcp6.bootfile-url code 59 = string;
+
+option dhcp6.bootfile-url "http://[2a0a:e5c0:0:6::46]/ipxescript";
+
+subnet6 2a0a:e5c0:0:6::/64 {}
+```
+
+(that is the complete configuration!)
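+
+The file behind that URL is an ordinary iPXE script. As a rough sketch
+(the kernel and initramfs file names below are placeholders chosen for
+illustration, not our actual setup), it could be as small as:
+
+```
+#!ipxe
+# Placeholder file names: adjust to wherever your kernel and initramfs live
+kernel http://[2a0a:e5c0:0:6::46]/vmlinuz
+initrd http://[2a0a:e5c0:0:6::46]/initramfs
+boot
+```
+
+Note the square brackets around the IPv6 address: they are required in
+URLs, both here and in the DHCPv6 option above. Depending on your
+firmware and kernel, additional kernel command line arguments may be
+needed.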
+
+And we use radvd to set the "other configuration" flag in its router
+advertisements, which tells clients that they can query the DHCPv6
+server for additional information:
+
+```
+interface bond0.10
+{
+  AdvSendAdvert on;
+  MinRtrAdvInterval 3;
+  MaxRtrAdvInterval 5;
+  AdvDefaultLifetime 600;
+
+  # IPv6 netbooting
+  AdvOtherConfigFlag on;
+
+  prefix 2a0a:e5c0:0:6::/64      { };
+
+  RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b  { AdvRDNSSLifetime 6000; };
+  DNSSL place5.ungleich.ch {  AdvDNSSLLifetime 6000; } ;
+};
+```
+
+## Take away
+
+Being able to reduce cables was one big advantage in the beginning.
+
+Switching to IPv6 only netboot does not seem like a big simplification
+at first, besides being able to remove IPv4 in server
+networks.
+
+However, as you will see in
+[the next blog posts](/u/blog/datacenterlight-active-active-routing/),
+switching to IPv6 only netbooting is actually a key element in
+reducing complexity in our network.