title: Redundant routing infrastructure at Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:

---
body:

In case you have missed the previous articles, you can
get [an introduction to the Data Center Light spring
cleanup](/u/blog/datacenterlight-spring-network-cleanup),
see [how we switched to IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot)
or read about [the active-active routing
problems](/u/blog/datacenterlight-active-active-routing/).

In this article we will show how we finally solved the routing issue
conceptually as well as practically.

## Active-active or active-passive routing?

In the [previous blog article](/u/blog/datacenterlight-active-active-routing/)
we reasoned that active-active routing, even with session
synchronisation, does not have a straightforward solution in our
case. However, in the
[first blog article](/u/blog/datacenterlight-spring-network-cleanup)
we reasoned that active-passive routers with VRRP and keepalived are
not stable enough either.

So which path should we take? Or is there another solution?

## Active-Active-Passive Routing

Let us introduce Active-Active-Passive routing. It sounds
strange at first, but it is going to make sense in the next few
minutes.

We do want multiple active routers, but we do not want to have to
deal with session synchronisation, which is not only tricky but, due
to its complexity, can also be a source of errors.

So what we are looking for is active-active routing without state
synchronisation. While this sounds like a contradiction, if we loosen
our requirements a little bit, we are able to support multiple active
routers without session synchronisation by using **routing
priorities**.

## Active-Active routing with routing priorities

Let's assume for a moment that all involved hosts (servers, clients,
routers, etc.) know about multiple routes for outgoing and incoming
traffic. Let's also assume for a moment that **we can prioritise**
those routes. Then we can create a deterministic routing path that
does not need session synchronisation.

## Steering outgoing traffic

Let's have a first look at the outgoing traffic. Can we announce
multiple routers in a network, but have the servers and clients
**prefer** one of the routers? The answer is yes!
If we check out the man page of
[radvd.conf(5)](https://linux.die.net/man/5/radvd.conf) we find a
setting that is named **AdvDefaultPreference**:

```
AdvDefaultPreference low|medium|high
```

Using this attribute, two routers can both actively announce
themselves, but clients in the network will prefer the one with the
higher preference setting.

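
To make the selection rule more tangible, here is a minimal Python
sketch of how a client could choose its default router from several
router advertisements. It is an illustration only, not how any
particular operating system implements it: the router names `router1`
and `router2`, the `RouterAdvertisement` class and the tie-breaking by
list order are assumptions made for this example.

```python
from dataclasses import dataclass

# Ranking of the three preference values (low < medium < high).
PREFERENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class RouterAdvertisement:
    router: str      # hypothetical router name
    preference: str  # "low", "medium" or "high"
    lifetime: int    # router lifetime in seconds; 0 means "not a default router"


def pick_default_router(advertisements):
    """Return the name of the preferred default router, or None if there is none."""
    usable = [ra for ra in advertisements if ra.lifetime > 0]
    if not usable:
        return None
    # max() returns the first maximal element, so ties keep list order
    # (an assumption of this sketch, real clients may decide differently).
    return max(usable, key=lambda ra: PREFERENCE_RANK[ra.preference]).router


advertisements = [
    RouterAdvertisement("router2", "low", 15),
    RouterAdvertisement("router1", "high", 15),
]
print(pick_default_router(advertisements))
```

Both routers are announced, but the sketch always returns the one with
the higher preference as long as its lifetime has not expired.
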
### Replacing radvd with bird

At this point a short side note: We have been using radvd for some
years at Data Center Light. However, recently on our
[Alpine Linux based routers](https://alpinelinux.org/), radvd started
to crash from time to time:

```
[717424.727125] device eth1 left promiscuous mode
[1303962.899600] radvd[24196]: segfault at 63f42258 ip 00007f6bdd59353b sp 00007ffc63f421b8 error 4 in ld-musl-x86_64.so.1[7f6bdd558000+48000]
[1303962.899609] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
[1458460.511006] device eth0 entered promiscuous mode
[1458460.511168] radvd[27905]: segfault at 4dfce818 ip 00007f94ec1fd53b sp 00007ffd4dfce778 error 4 in ld-musl-x86_64.so.1[7f94ec1c2000+48000]
[1458460.511177] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
```

Unfortunately it seems that either the addresses timed out or that
radvd was able to send a message de-announcing itself prior to the
crash, causing all clients to withdraw their addresses. This is
especially problematic if you run a [ceph](https://ceph.io/) cluster
and the servers don't have IP addresses anymore...

While we have not yet investigated the root cause of this, we had a very
easy solution: as all of our routers run
[bird](https://bird.network.cz/) and it also supports sending router
advertisements, we replaced radvd with bird. The configuration is
actually pretty simple:

```
protocol radv {
    # Internal
    interface "eth1.5" {
        max ra interval 5;       # Fast failover with more routers
        other config yes;        # dhcpv6 boot
        default preference high;
    };
    rdnss {
        lifetime 3600;
        ns 2a0a:e5c0:0:a::a;
        ns 2a0a:e5c0:0:a::b;
    };
    dnssl {
        lifetime 3600;
        domain "place5.ungleich.ch";
    };
}
```

## Steering incoming traffic

As the internal and the upstream routers are in the same data center,
we can use an IGP like OSPF to distribute the routes to the internal
routers. And OSPF actually has this very neat metric called **cost**.
So for the router that sets the **default preference high** for the
outgoing routes, we keep the cost at 10; for the router that
sets the **default preference low** we set the cost at 20. The actual
bird configuration on a router looks like this:

```
define ospf_cost = 10;
...

protocol ospf v3 ospf6 {
    instance id 0;

    ipv6 {
        import all;
        export none;
    };

    area 0 {
        interface "eth1.*" {
            authentication cryptographic;
            password "weshouldhaveremovedthisfortheblogpost";
            cost ospf_cost;
        };
    };
}
```

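
The effect of the cost can be illustrated with a small Python sketch.
It is a strong simplification of what OSPF really does: we assume that
the upstream router only compares the cost that each internal router
(hypothetically named `router1` and `router2`) attaches to its internal
interface, and that all other link costs are equal.

```python
def pick_next_hop(costs, alive):
    """Return the alive router with the lowest cost, or None if none is alive.

    costs: dict mapping a router name to its configured OSPF interface cost
    alive: set of router names that are currently reachable
    """
    candidates = {router: cost for router, cost in costs.items() if router in alive}
    if not candidates:
        return None
    # The path with the lowest cost wins.
    return min(candidates, key=candidates.get)


# Costs as described in the text: 10 on the preferred router, 20 on the other one.
costs = {"router1": 10, "router2": 20}

print(pick_next_hop(costs, alive={"router1", "router2"}))
print(pick_next_hop(costs, alive={"router2"}))
```

With both routers alive the sketch selects the router with cost 10,
and it only falls back to the router with cost 20 once the first one
is gone.
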
## Incoming + Outgoing = symmetric paths

With both directions under our control, we now have symmetric
routing. Thus, as long as the first router is alive,
all traffic will be handled by the first router.

## Failover scenario

In case the first router fails, clients have a short lifetime of 15
seconds (3x **max ra interval**)
for their routes and they will fail over to the 2nd router
automatically. Existing sessions will not continue to work, but that
is ok for our setup. When the first router with the higher priority
comes back, there will again be an interruption, but clients will
automatically change their paths.

And so will the upstream routers, as OSPF quickly detects which
routers are alive and updates the routes accordingly.

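
The whole failover cycle can be played through in a few lines of
Python. This is only a model under assumptions: the router names are
made up, the preference is encoded as a number, and we look at the
state after the router lifetime has expired and OSPF has converged.

```python
MAX_RA_INTERVAL = 5                     # seconds, as in the radv configuration
ROUTER_LIFETIME = 3 * MAX_RA_INTERVAL   # lifetime of the default route on clients

# Hypothetical router names with the settings described in the text.
ROUTERS = {
    "router1": {"preference": 2, "ospf_cost": 10},  # default preference high
    "router2": {"preference": 0, "ospf_cost": 20},  # default preference low
}


def active_paths(alive):
    """Return (outgoing, incoming) router for the given set of alive routers."""
    candidates = [name for name in ROUTERS if name in alive]
    if not candidates:
        return (None, None)
    # Clients prefer the highest preference, upstream prefers the lowest cost.
    outgoing = max(candidates, key=lambda name: ROUTERS[name]["preference"])
    incoming = min(candidates, key=lambda name: ROUTERS[name]["ospf_cost"])
    return (outgoing, incoming)


print("router lifetime:", ROUTER_LIFETIME, "seconds")
# Normal operation, failure of router1, return of router1.
for alive in ({"router1", "router2"}, {"router2"}, {"router1", "router2"}):
    outgoing, incoming = active_paths(alive)
    print(sorted(alive), "->", "outgoing:", outgoing, "incoming:", incoming)
```

In all three states both directions end up on the same router, which
is why no session synchronisation is needed. Only the sessions that
exist at the moment of the switch are interrupted.
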
## IPv6 enables active-active-passive routing architectures

At ungleich it almost always comes back to the topic of IPv6, albeit
for a good reason. You might remember that we claimed in the
[IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot) article
that going IPv6 only reduces complexity. If you look at the above example,
you might not spot it directly, but going IPv6 only is actually an
enabler for our setup:

We **only deploy router advertisements** using bird. We are **not using DHCPv4**
or **IPv4** for accessing our servers. Both routers run a dhcpv6
service in parallel, each with the "boot server" pointing to itself.

Besides being nice and clean,
our whole active-active-passive routing setup **would not work with
IPv4**, because dhcpv4 servers do not have the same functionality to
provide routing priorities.

## Take away

You can see that trying to solve one problem ("unreliable redundant
router setup") entailed a slew of changes, but in the end made our
infrastructure much simpler:

* No dual stack
* No private IPv4 addresses
* No actively communicating keepalived
* Two fewer daemons to maintain (keepalived, radvd)

We also avoided complex state synchronisation and deployed only Open
Source Software to address our problems. Furthermore, hardware that
looked unusable in modern IPv6 networks can also be upgraded with
Open Source Software (ipxe) and enables us to provide more sustainable
infrastructures.

We hope you enjoyed our spring cleanup blog series. The next one will
be coming, because IT infrastructures always evolve. Until then:
feel free to [join our Open Source Chat](https://chat.with.ungleich.ch)
and join the discussion.