Commit 5c374069 authored by Nico Schottelius's avatar Nico Schottelius

++ dcl spring

parent 251677cf
```diff
@@ -157,5 +157,6 @@ our thanks to Pablo Neira Ayuso, who gave very important input for
 session based firewalls and session synchronisation.
 So active-active routing seems not to have a straight forward
-solution. Read in the [next blog article](/) on how we solved the
-challenge in the end.
+solution. Read in the [next blog
+article](/u/blog/datacenterlight-redundant-routing-infrastructure) on
+how we solved the challenge in the end.
```
```diff
@@ -26,7 +26,7 @@ started reducing the complexity by removing our dependency on IPv4.
 When you found our blog, you are probably aware: everything at
 ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS
 entries for remote access have IPv6 (AAAA) entries and there are only
-rare exceptions when we utilise IPv4.
+rare exceptions when we utilise IPv4 for our infrastructure.
 
 ## IPv4 only Netboot
```
```diff
@@ -68,7 +68,7 @@ distances.
 Additionally, have less cables provides a simpler infrastructure
 that is easier to analyse.
 
-## Reducing complexity 1
+## Disabling onboard network cards
 
 So can we somehow get rid of the copper cables and switch to fiber
 only? It turns out that the fiber cards we use (mainly Intel X520's)
```
title: Redundant routing infrastructure at Data Center Light
pub_date: 2021-05-01
author: Nico Schottelius
twitter_handle: NicoSchottelius
_hidden: no
_discoverable: no
In case you have missed the previous articles, you can
get [an introduction to the Data Center Light spring network
cleanup](/u/blog/datacenterlight-spring-network-cleanup),
see [how we switched to IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot)
or read about [the active-active routing
challenge](/u/blog/datacenterlight-active-active-routing/).

In this article we will show how we finally solved the routing issue,
conceptually as well as practically.
## Active-active or active-passive routing?

In the [previous blog article](/u/blog/datacenterlight-active-active-routing/)
we reasoned that active-active routing, even with session
synchronisation, does not have a straightforward solution in our
case. However, in the
[first blog article](/u/blog/datacenterlight-spring-network-cleanup)
we reasoned that active-passive routers with VRRP and keepalived are
not stable enough either.
So which path should we take? Or is there another solution?
## Active-Active-Passive Routing

Let us introduce Active-Active-Passive routing: something that sounds
strange at first, but is going to make sense in a moment.

We do want multiple active routers, but we do not want to have to
deal with session synchronisation, which is not only tricky, but due
to its complexity can also be a source of error.

So what we are looking for is active-active routing without state
synchronisation. While this sounds like a contradiction, if we loosen
our requirements a little bit, we are able to support multiple active
routers without session synchronisation by using **routing
priorities**.
## Active-Active routing with routing priorities
Let's assume for a moment that all involved hosts (servers, clients,
routers, etc.) know about multiple routes for outgoing and incoming
traffic. Let's assume also for a moment that **we can prioritise**
those routes. Then we can create a deterministic routing path that
does not need session synchronisation.
## Steering outgoing traffic
Let's have a first look at the outgoing traffic. Can we announce
multiple routers in a network, but have the servers and clients
**prefer** one of the routers? The answer is yes!
If we check out the manpage of radvd.conf(5), we find a
setting named **AdvDefaultPreference**:

```
AdvDefaultPreference low|medium|high
```
Using this attribute, two routers can both actively announce
themselves, but clients in the network will prefer the one with the
higher preference setting.
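As a sketch, a minimal radvd.conf for the preferred router could look
like the following (the interface name and prefix are illustrative,
not taken from our actual configuration):

```
interface eth1
{
    AdvSendAdvert on;
    MaxRtrAdvInterval 5;
    AdvDefaultPreference high;   # clients prefer this router over a "low" one
    prefix 2a0a:e5c0:0:a::/64
    {
    };
};
```

The second router would carry the same configuration with
`AdvDefaultPreference low`.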
### Replacing radvd with bird
At this point a short side note: we have been using radvd for some
years in the Data Center Light. However, recently radvd started
to crash from time to time on our
[Alpine Linux]( based routers:
```
[717424.727125] device eth1 left promiscuous mode
[1303962.899600] radvd[24196]: segfault at 63f42258 ip 00007f6bdd59353b sp 00007ffc63f421b8 error 4 in[7f6bdd558000+48000]
[1303962.899609] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
[1458460.511006] device eth0 entered promiscuous mode
[1458460.511168] radvd[27905]: segfault at 4dfce818 ip 00007f94ec1fd53b sp 00007ffd4dfce778 error 4 in[7f94ec1c2000+48000]
[1458460.511177] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
```
Unfortunately it seems that either the addresses timed out or that
radvd was able to send a message de-announcing itself prior to the
crash, causing all clients to withdraw their addresses. This is
especially problematic if you run a [ceph]( cluster
and the servers suddenly don't have IP addresses anymore...
While we have not yet investigated the root cause of this, we had a very
easy solution: as all of our routers run
[bird](, which also supports sending router
advertisements, we replaced radvd with bird. The configuration is
actually pretty simple:
```
protocol radv {
    # Internal
    interface "eth1.5" {
        max ra interval 5;        # Fast failover with more routers
        other config yes;         # dhcpv6 boot
        default preference high;

        rdnss {
            lifetime 3600;
            ns 2a0a:e5c0:0:a::a;
            ns 2a0a:e5c0:0:a::b;
        };

        dnssl {
            lifetime 3600;
            domain "";
        };
    };
}
```
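On the second router, the radv configuration is the same except for
the preference setting (a sketch, assuming the same interface layout
on both routers):

```
protocol radv {
    interface "eth1.5" {
        max ra interval 5;
        other config yes;
        default preference low;   # clients only use this router if the primary disappears
    };
}
```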
## Steering incoming traffic
As the internal and the upstream routers are in the same data center,
we can use an IGP like OSPF to distribute the routes to the internal
routers. And OSPF actually has this very neat metric called **cost**.
So for the router that sets the **default preference high** for the
outgoing routes, we keep the cost at 10; for the router that
sets the **default preference low** we set the cost to 20. The actual
bird configuration on a router looks like this:
```
define ospf_cost = 10;

protocol ospf v3 ospf6 {
    instance id 0;
    ipv6 {
        import all;
        export none;
    };
    area 0 {
        interface "eth1.*" {
            authentication cryptographic;
            password "weshouldhaveremovedthisfortheblogpost";
            cost ospf_cost;
        };
    };
}
```
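On the standby router, the only line that differs is the cost
definition (sketch):

```
define ospf_cost = 20;   # higher than the primary's 10: upstream routers prefer the primary
```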
## Incoming + Outgoing = symmetric paths
With both directions under our control, we have now enabled symmetric
routing: as long as the first router is alive, all traffic in both
directions will be handled by it.
## Failover scenario
In case the first router fails, clients have a low lifetime of 15
seconds (3 × **max ra interval**) for their routes and they will fail
over to the 2nd router
automatically. Existing sessions will not continue to work, but that
is ok for our setup. When the first router with the higher priority
comes back, there will be again an interruption, but clients will
automatically change their paths.
And so will the upstream routers, as OSPF quickly detects failed
neighbours and updates the routes accordingly.
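If we read the bird documentation correctly, the 15 seconds come from
the radv defaults: the advertised router lifetime defaults to three
times **max ra interval**. It could also be pinned explicitly (a
sketch, using the same interface name as above):

```
protocol radv {
    interface "eth1.5" {
        max ra interval 5;
        default lifetime 15;   # clients drop this default route 15s after the last RA
    };
}
```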
## IPv6 enables active-active-passive routing architectures
At ungleich it almost always comes back to the topic of IPv6, and
for a good reason. You might remember our claim in the
[IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot) article
that this reduces complexity? If you look at the above example,
you might not spot it directly, but going IPv6 only is actually an
enabler for our setup:
We **only deploy router advertisements** using bird. We are **not using DHCPv4**
or **IPv4** for accessing our servers. Both routers run a dhcpv6
service in parallel, with the "boot server" pointing to themselves.
Besides being nice and clean,
our whole active-active-passive routing setup **would not work with
IPv4**, because dhcpv4 servers do not have the same functionality to
provide routing priorities.
## Take away
You can see that trying to solve one problem ("unreliable redundant
router setup") entailed a slew of changes, but in the end made our
infrastructure much simpler:
* No dual stack
* No private IPv4 addresses
* No actively communicating keepalived
* Two daemons less to maintain (keepalived, radvd)
We also avoided complex state synchronisation and deployed only Open
Source Software to address our problems. Furthermore, hardware that
looked unusable in modern IPv6 networks can be upgraded with
Open Source Software (iPXE), enabling us to provide a more sustainable
infrastructure.
We hope you enjoyed our spring cleanup blog series. The next one will
be coming, because IT infrastructures always evolve. Until then:
feel free to join our Open Source Chat
and take part in the discussion.