title: Redundant routing infrastructure at Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
---
body:

In case you have missed the previous articles, you can get
[an introduction to the Data Center Light spring cleanup](/u/blog/datacenterlight-spring-network-cleanup),
see [how we switched to IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot)
or read about [the active-active routing problems](/u/blog/datacenterlight-active-active-routing/).

In this article we will show how we finally solved the routing issue,
conceptually as well as practically.

## Active-active or passive-active routing?

In the [previous blog article](/u/blog/datacenterlight-active-active-routing/)
we reasoned that active-active routing, even with session
synchronisation, does not have a straightforward solution in our
case. However, in the
[first blog article](/u/blog/datacenterlight-spring-network-cleanup)
we reasoned that active-passive routers with VRRP and keepalived are
not stable enough either.

So which path should we take? Or is there another solution?

## Active-Active-Passive Routing

Let us introduce Active-Active-Passive routing. It sounds strange at
first, but it is going to make sense in the next few minutes.

We do want multiple active routers, but we do not want to have to
deal with session synchronisation, which is not only tricky, but due
to its complexity can also be a source of error.

So what we are looking for is active-active routing without state
synchronisation. While this sounds like a contradiction, if we loosen
our requirement a little bit, we are able to support multiple active
routers without session synchronisation by using **routing
priorities**.

## Active-Active routing with routing priorities

Let's assume for a moment that all involved hosts (servers, clients,
routers, etc.) know about multiple routes for outgoing and incoming
traffic.
Let's assume also for a moment that **we can prioritise** those
routes. Then we can create a deterministic routing path that does not
need session synchronisation.

## Steering outgoing traffic

Let's first look at the outgoing traffic. Can we announce multiple
routers in a network, but have the servers and clients **prefer** one
of the routers? The answer is yes!

If we check out the manpage of
[radvd.conf(5)](https://linux.die.net/man/5/radvd.conf) we find a
setting named **AdvDefaultPreference**:

```
AdvDefaultPreference low|medium|high
```

Using this attribute, two routers can both actively announce
themselves, but clients in the network will prefer the one with the
higher preference setting.

### Replacing radvd with bird

At this point a short side note: we have been using radvd for some
years in the Data Center Light. However, recently on our
[Alpine Linux based routers](https://alpinelinux.org/), radvd started
to crash from time to time:

```
[717424.727125] device eth1 left promiscuous mode
[1303962.899600] radvd[24196]: segfault at 63f42258 ip 00007f6bdd59353b sp 00007ffc63f421b8 error 4 in ld-musl-x86_64.so.1[7f6bdd558000+48000]
[1303962.899609] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
[1458460.511006] device eth0 entered promiscuous mode
[1458460.511168] radvd[27905]: segfault at 4dfce818 ip 00007f94ec1fd53b sp 00007ffd4dfce778 error 4 in ld-musl-x86_64.so.1[7f94ec1c2000+48000]
[1458460.511177] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
```

Unfortunately it seems that either the addresses timed out or that
radvd was able to send a message de-announcing itself prior to the
crash, causing all clients to withdraw their addresses.
This is especially problematic if you run a [ceph](https://ceph.io/)
cluster and the servers don't have IP addresses anymore...

While we have not yet investigated the full cause of this, we had a
very easy solution: as all of our routers run
[bird](https://bird.network.cz/) and it also supports sending router
advertisements, we replaced radvd with bird. The configuration is
actually pretty simple:

```
protocol radv {
  # Internal
  interface "eth1.5" {
    max ra interval 5;  # Fast failover with more routers
    other config yes;   # dhcpv6 boot
    default preference high;
  };
  rdnss {
    lifetime 3600;
    ns 2a0a:e5c0:0:a::a;
    ns 2a0a:e5c0:0:a::b;
  };
  dnssl {
    lifetime 3600;
    domain "place5.ungleich.ch";
  };
}
```

## Steering incoming traffic

As the internal and the upstream routers are in the same data center,
we can use an IGP like OSPF to distribute the routes to the internal
routers. And OSPF actually has this very neat metric called **cost**.

So for the router that sets the **default preference high** for the
outgoing routes, we keep the cost at 10. For the router that sets the
**default preference low**, we set the cost to 20. The actual bird
configuration on a router looks like this:

```
define ospf_cost = 10;
...

protocol ospf v3 ospf6 {
  instance id 0;

  ipv6 {
    import all;
    export none;
  };

  area 0 {
    interface "eth1.*" {
      authentication cryptographic;
      password "weshouldhaveremovedthisfortheblogpost";
      cost ospf_cost;
    };
  };
}
```

## Incoming + Outgoing = symmetric paths

With both directions under our control, we now have symmetric
routing: incoming and outgoing traffic take the same path. Thus as
long as the first router is alive, all traffic will be handled by the
first router.

## Failover scenario

In case the first router fails, the clients' routes have a short
lifetime of 15 seconds (3x **max ra interval**), so they will fail
over to the second router automatically. Existing sessions will not
continue to work, but that is ok for our setup.
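
As a rough sketch, the second router that takes over in this scenario
could look like the following. This assumes it mirrors the first
router's configuration and differs only in the two priority settings.
The values `low` and `20` come from the text above, the interface name
and timers are copied from the first router, and anything elided there
stays elided here:

```
# Second router (sketch): lower priority in both directions

define ospf_cost = 20;        # higher cost than 10: upstream prefers the first router
...

protocol radv {
  # Internal
  interface "eth1.5" {
    max ra interval 5;        # same timer, so failover time stays the same
    other config yes;
    default preference low;   # clients prefer the router announcing "high"
  };
  # rdnss and dnssl blocks assumed to be identical to the first router
}
```

The OSPF protocol block itself would stay unchanged, because it only
refers to `ospf_cost`.
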
When the first router with the higher priority comes back, there will
again be an interruption, but clients will automatically change their
paths.

The upstream routers will do the same, as OSPF quickly detects which
routers are alive and updates the routes accordingly.

## IPv6 enables active-active-passive routing architectures

At ungleich it almost always comes back to the topic of IPv6, albeit
for a good reason. You might remember that we claimed in the
[IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot) article
that going IPv6 only reduces complexity? If you look at the above
example, you might not spot it directly, but going IPv6 only is
actually an enabler for our setup:

We **only deploy router advertisements** using bird. We are **not
using DHCPv4** or **IPv4** for accessing our servers. Both routers run
a dhcpv6 service in parallel, with the "boot server" pointing to
themselves.

Besides being nice and clean, our whole active-active-passive routing
setup **would not work with IPv4**, because dhcpv4 servers do not
have the same functionality to provide routing priorities.

## Take away

You can see that trying to solve one problem ("unreliable redundant
router setup") entailed a slew of changes, but in the end made our
infrastructure much simpler:

* No dual stack
* No private IPv4 addresses
* No actively communicating keepalived
* Two fewer daemons to maintain (keepalived, radvd)

We also avoided complex state synchronisation and deployed only Open
Source Software to address our problems. Furthermore, hardware that
looked unusable in modern IPv6 networks can be upgraded with Open
Source Software (ipxe), which enables us to provide more sustainable
infrastructure.

We hope you enjoyed our spring cleanup blog series. The next one will
be coming, because IT infrastructures always evolve. Until then:
feel free to [join our Open Source Chat](https://chat.with.ungleich.ch)
and join the discussion.