++ three new blog articles
This commit is contained in:
parent
5d05a28e7d
commit
251677cf57
3 changed files with 541 additions and 0 deletions
219
content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr
Normal file
@ -0,0 +1,219 @@
title: IPv6 only netboot in Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
How we switched from IPv4 netboot to IPv6 netboot
---
body:

In our [previous blog
article](/u/blog/datacenterlight-spring-network-cleanup)
we wrote about our motivation for the
big spring network cleanup. In this blog article we show how we
started reducing the complexity by removing our dependency on IPv4.

## IPv6 first

If you found our blog, you are probably aware that everything at
ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS
entries for remote access have IPv6 (AAAA) records, and we use IPv4
only in rare exceptions.

## IPv4 only Netboot

One of the big exceptions to this paradigm used to be how we boot our
servers. Because our second big paradigm is sustainability, we use a
lot of 2nd (or 3rd) generation hardware. We actually share this
passion with our friends from
[e-durable](https://recycled.cloud/), because sustainability is
something that we need to practise today and not tomorrow.
But back to the netbooting topic: so far, we have mainly relied on
onboard network cards for netbooting.

## Onboard network cards

We used these network cards for multiple reasons:

* they exist in virtually any server
* they usually have a ROM containing a PXE capable firmware
* they allow us to split traffic: real traffic on the fiber cards and
  internal traffic on the onboard cards

However, using the onboard devices also comes with a couple of disadvantages:

* Their ROM is often outdated
* They require additional cabling

## Cables

Let's have a look at the cabling situation first. Virtually all of
our servers are connected to the network using 2x 10 Gbit/s fiber cards.

On the one hand this provides a fast connection, but on the other hand
it provides us with something even better: distances.

Our data centers employ a non-standard design due to the re-use of
existing factory halls. This means distances between servers and
switches can be up to 100m. With fiber, we can easily achieve these
distances.

Additionally, having fewer cables provides a simpler infrastructure
that is easier to analyse.

## Reducing complexity 1

So can we somehow get rid of the copper cables and switch to fiber
only? It turns out that the fiber cards we use (mainly Intel X520s)
have their own ROM. So we started disabling the onboard network cards
and tried booting from the fiber cards. This worked until we wanted to
move the lab setup to production...

## Bonding (LACP) and VLAN tagging

Our servers use bonding (802.3ad) for redundant connections to the
switches and VLAN tagging on top of the bonded devices to isolate
client traffic. On the switch side we realised this using
configurations like this:

```
interface Port-Channel33
   switchport mode trunk
   mlag 33

...
interface Ethernet33
   channel-group 33 mode active
```

But that does not work if the network ROM does not create an
LACP-enabled link at boot, on top of which it would then do VLAN
tagging.

The ROM in our network cards **would** have allowed VLAN tagging alone
though.

To fix this problem, we reconfigured our switches as follows:

```
interface Port-Channel33
   switchport trunk native vlan 10
   switchport mode trunk
   port-channel lacp fallback static
   port-channel lacp fallback timeout 20
   mlag 33
```

This basically does two things:

* If there are no LACP frames, fall back to a static (non-LACP)
  configuration
* Accept untagged traffic and map it to VLAN 10 (one of our boot networks)

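
To make the two effects more concrete, here is a small Python sketch
that models them. It is a deliberately simplified toy model and not
switch code: the class `PortChannel` and its methods are hypothetical
names, and the exact timing behaviour of a real switch may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PortChannel:
    """Toy model of the two switch settings above (hypothetical, simplified)."""

    native_vlan: int = 10        # switchport trunk native vlan 10
    fallback_timeout: int = 20   # port-channel lacp fallback timeout 20

    def mode(self, lacp_seen: bool, seconds_waiting: float) -> str:
        """Return the assumed link mode for a member port."""
        if lacp_seen:
            return "lacp"        # normal bonded operation
        if seconds_waiting >= self.fallback_timeout:
            return "static"      # no LACP frames: fall back to static
        return "waiting"         # still waiting for LACP frames

    def vlan_for(self, tag: Optional[int]) -> int:
        """Map a frame to a VLAN: untagged frames land in the native VLAN."""
        return self.native_vlan if tag is None else tag


port = PortChannel()
# A network ROM sends no LACP frames and no VLAN tags while booting:
print(port.mode(lacp_seen=False, seconds_waiting=25), port.vlan_for(None))
```

In this model, a booting card that speaks neither LACP nor VLAN
tagging ends up on a static link in VLAN 10, while the fully booted
server with bonding and tagging behaves as before.
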
Great, our servers can now netboot from fiber! But we are not done
yet...

## IPv6 only netbooting

So how do we convince these network cards to do IPv6 netboot? Can we
actually do that at all? Our first approach was to put a custom build of
[ipxe](https://ipxe.org/) on a USB stick. We generated that
iPXE image using the **rebuild-ipxe.sh** script
from the
[ungleich-tools](https://code.ungleich.ch/ungleich-public/ungleich-tools)
repository. It turns out that using a USB stick works pretty well for
most situations.

## ROMs are not ROMs

As you can imagine, the ROM of the X520 cards does not contain IPv6
netboot support. So are we back at square one? No, we are not. Because
the X520s have something that the onboard devices did not
consistently have: **a rewritable memory area**.

Let's take two steps back here first: A ROM is a **read only memory**
chip. Emphasis on **read only**. However, modern network cards and a
lot of devices that support on-device firmware do actually have a
memory (flash) area that can be written to. And that is what aids us
in our situation.

## ipxe + flbtool + x520 = fun

Trying to write iPXE into the X520 cards initially failed, because the
network card did not recognise the format of the iPXE ROM file.

Luckily, the folks in the iPXE community had already spotted that
problem AND fixed it: The format used in these cards is called FLB.
And there is [flbtool](https://github.com/devicenull/flbtool/), which
allows you to wrap the iPXE ROM file into the FLB format. If you want
to try it yourself (at your own risk!), it basically involves:

* Get the current ROM from the card (try bootutil64e)
* Extract the contents from the ROM using flbtool
  * This will output some sections/parts
* Locate one part that you want to overwrite with iPXE (a previous PXE
  section is very suitable)
* Replace the .bin file with your iPXE ROM
* Adjust the .json file to match the length of the new binary
* Build a new .flb file using flbtool
* Flash it onto the card

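
The "adjust the .json file" step can be sketched in Python as shown
below. This is only an illustration under loud assumptions: the layout
of the .json files written by flbtool is not described in this
article, so the helper `sync_length` is hypothetical, it assumes a
flat JSON object, and the name of the length field must be looked up
in your own extracted file and passed in as `length_key`.

```python
import json
from pathlib import Path


def sync_length(meta_path: str, bin_path: str, length_key: str) -> int:
    """Store the byte size of bin_path in meta_path under length_key.

    Hypothetical helper: length_key is NOT taken from flbtool's
    documentation, check the extracted .json file for the real name.
    """
    size = Path(bin_path).stat().st_size
    meta = json.loads(Path(meta_path).read_text())
    if length_key not in meta:
        # Refuse to guess: a wrong field could produce a broken image.
        raise KeyError(f"{length_key!r} not found in {meta_path}")
    meta[length_key] = size
    Path(meta_path).write_text(json.dumps(meta, indent=2) + "\n")
    return size
```

Failing on an unknown key is a deliberate choice here: silently adding
a new field would leave the original length untouched, and a
mismatched length is exactly what this step is meant to avoid.
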
While this is a bit of work, it is worth it for us, because...

## IPv6 only netboot over fiber

With the modified ROM, which basically loads iPXE at start, we can now
boot our servers in IPv6 only networks. On our infrastructure side, we
added two **tiny** things:

We use ISC DHCP with the following configuration file:

```
option dhcp6.bootfile-url code 59 = string;

option dhcp6.bootfile-url "http://[2a0a:e5c0:0:6::46]/ipxescript";

subnet6 2a0a:e5c0:0:6::/64 {}
```

(that is the complete configuration!)

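
For readers curious what option 59 looks like on the wire, here is a
minimal Python sketch. Option code 59 is the boot file URL option
defined in RFC 5970; like other DHCPv6 options it consists of a
16 bit option code, a 16 bit length and the data, here the URL
string. The function names `encode_bootfile_url` and `decode_option`
are made up for this illustration, and the sketch only covers this
single option, not a full DHCPv6 message.

```python
import struct
from typing import Tuple

OPT_BOOTFILE_URL = 59  # DHCPv6 boot file URL option (RFC 5970)


def encode_bootfile_url(url: str) -> bytes:
    """Encode the URL as option code, option length and URL bytes."""
    data = url.encode("ascii")
    # "!HH": two unsigned 16 bit integers in network byte order
    return struct.pack("!HH", OPT_BOOTFILE_URL, len(data)) + data


def decode_option(raw: bytes) -> Tuple[int, str]:
    """Decode one option back into (code, url)."""
    code, length = struct.unpack("!HH", raw[:4])
    return code, raw[4:4 + length].decode("ascii")


raw = encode_bootfile_url("http://[2a0a:e5c0:0:6::46]/ipxescript")
print(decode_option(raw))
```
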
And we use radvd to announce that other configuration information is
available, which tells clients that they can query the DHCPv6 server:

```
interface bond0.10
{
    AdvSendAdvert on;
    MinRtrAdvInterval 3;
    MaxRtrAdvInterval 5;
    AdvDefaultLifetime 600;

    # IPv6 netbooting
    AdvOtherConfigFlag on;

    prefix 2a0a:e5c0:0:6::/64 { };

    RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
    DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; } ;
};
```

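
The effect of `AdvOtherConfigFlag on` can be illustrated with a short
Python sketch of the two flag bits in a router advertisement, as
defined in RFC 4861: M (managed address configuration) and O (other
configuration). The helper names are hypothetical, and the sketch
assumes that `AdvManagedFlag` is left at its default of off, as in the
configuration above.

```python
M_FLAG = 0x80  # Managed address configuration (RFC 4861)
O_FLAG = 0x40  # Other configuration (RFC 4861)


def ra_flags(managed: bool, other: bool) -> int:
    """Build the M/O bits of the flags byte of a router advertisement."""
    return (M_FLAG if managed else 0) | (O_FLAG if other else 0)


def client_behaviour(flags: int) -> str:
    """Describe, in simplified form, what a client may do with the flags."""
    if flags & M_FLAG:
        return "DHCPv6 for addresses and other information"
    if flags & O_FLAG:
        return "SLAAC addresses plus DHCPv6 for other information"
    return "SLAAC only"


# AdvOtherConfigFlag on, AdvManagedFlag assumed off:
print(client_behaviour(ra_flags(managed=False, other=True)))
```

With only the O bit set, the address still comes from the announced
prefix, and DHCPv6 is only asked for additional information such as
the boot file URL.
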
## Take away

Being able to reduce cables was one big advantage in the beginning.

Switching to IPv6 only netboot does not seem like a big simplification
at first, besides being able to remove IPv4 in server
networks.

However, as you will see in
[the next blog posts](/u/blog/datacenterlight-active-active-routing/),
switching to IPv6 only netbooting is actually a key element in
reducing complexity in our network.