++ dcl spring
parent
251677cf57
commit
5c37406988
3 changed files with 227 additions and 4 deletions
@ -157,5 +157,6 @@ our thanks to Pablo Neira Ayuso, who gave very important input for
 session based firewalls and session synchronisation.
 
 So active-active routing seems not to have a straight forward
-solution. Read in the [next blog article](/) on how we solved the
-challenge in the end.
+solution. Read in the [next blog
+article](/u/blog/datacenterlight-redundant-routing-infrastructure) on
+how we solved the challenge in the end.

@ -26,7 +26,7 @@ started reducing the complexity by removing our dependency on IPv4.
 When you found our blog, you are probably aware: everything at
 ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS
 entries for remote access have IPv6 (AAAA) entries and there are only
-rare exceptions when we utilise IPv4.
+rare exceptions when we utilise IPv4 for our infrastructure.
 
 ## IPv4 only Netboot
 

@ -68,7 +68,7 @@ distances.
 Additionally, have less cables provides a simpler infrastructure
 that is easier to analyse.
 
-## Reducing complexity 1
+## Disabling onboard network cards
 
 So can we somehow get rid of the copper cables and switch to fiber
 only? It turns out that the fiber cards we use (mainly Intel X520's)

@ -0,0 +1,222 @@
title: Redundant routing infrastructure at Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:

---
body:

In case you have missed the previous articles, you can
get [an introduction to the Data Center Light spring
cleanup](/u/blog/datacenterlight-spring-network-cleanup),
see [how we switched to IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot)
or read about [the active-active routing
problems](/u/blog/datacenterlight-active-active-routing/).

In this article we will show how we finally solved the routing issue
conceptually as well as practically.

## Active-active or active-passive routing?

In the [previous blog article](/u/blog/datacenterlight-active-active-routing/)
we reasoned that active-active routing, even with session
synchronisation, does not have a straightforward solution in our
case. However, in the
[first blog article](/u/blog/datacenterlight-spring-network-cleanup)
we reasoned that active-passive routers with VRRP and keepalived are
not stable enough either.

So which path should we take? Or is there another solution?

## Active-Active-Passive Routing

Let us introduce Active-Active-Passive routing. It sounds strange at
first, but it is going to make sense in the next few minutes.

We do want multiple active routers, but we do not want to have to
deal with session synchronisation, which is not only tricky, but due
to its complexity can also be a source of errors.

So what we are looking for is active-active routing without state
synchronisation. While this sounds like a contradiction, if we loosen
our requirement a little bit, we are able to support multiple active
routers without session synchronisation by using **routing
priorities**.

## Active-Active routing with routing priorities

Let's assume for a moment that all involved hosts (servers, clients,
routers, etc.) know about multiple routes for outgoing and incoming
traffic. Let's also assume for a moment that **we can prioritise**
those routes. Then we can create a deterministic routing path that
does not need session synchronisation.

## Steering outgoing traffic

Let's have a first look at the outgoing traffic. Can we announce
multiple routers in a network, but have the servers and clients
**prefer** one of the routers? The answer is yes!
If we check the man page of
[radvd.conf(5)](https://linux.die.net/man/5/radvd.conf) we find a
setting that is named **AdvDefaultPreference**:

```
AdvDefaultPreference low|medium|high
```

Using this attribute, two routers can both actively announce
themselves, but clients in the network will prefer the one with the
higher preference setting.

### Replacing radvd with bird

At this point a short side note: We have been using radvd for some
years in the Data Center Light. However, recently on our
[Alpine Linux based routers](https://alpinelinux.org/), radvd started
to crash from time to time:

```
[717424.727125] device eth1 left promiscuous mode
[1303962.899600] radvd[24196]: segfault at 63f42258 ip 00007f6bdd59353b sp 00007ffc63f421b8 error 4 in ld-musl-x86_64.so.1[7f6bdd558000+48000]
[1303962.899609] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
[1458460.511006] device eth0 entered promiscuous mode
[1458460.511168] radvd[27905]: segfault at 4dfce818 ip 00007f94ec1fd53b sp 00007ffd4dfce778 error 4 in ld-musl-x86_64.so.1[7f94ec1c2000+48000]
[1458460.511177] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
```

Unfortunately it seems that either the addresses timed out or that
radvd was able to send a message de-announcing itself prior to the
crash, causing all clients to withdraw their addresses. This is
especially problematic if you run a [ceph](https://ceph.io/) cluster
and the servers don't have IP addresses anymore...

While we have not yet investigated the full cause of this, we had a very
easy solution: as all of our routers run
[bird](https://bird.network.cz/) and it also supports sending router
advertisements, we replaced radvd with bird. The configuration is
actually pretty simple:

```
protocol radv {
    # Internal
    interface "eth1.5" {
        max ra interval 5;        # Fast failover with more routers
        other config yes;         # dhcpv6 boot
        default preference high;  # Clients prefer this router
    };
    rdnss {
        lifetime 3600;
        ns 2a0a:e5c0:0:a::a;
        ns 2a0a:e5c0:0:a::b;
    };
    dnssl {
        lifetime 3600;
        domain "place5.ungleich.ch";
    };
}
```
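
For comparison, here is a minimal sketch of what the same block might look like on the second router. It assumes that the second router uses the same interface name and otherwise identical settings, which the configuration above does not confirm; only the preference value is changed.

```
protocol radv {
    # Internal (assumption: same interface name as on the first router)
    interface "eth1.5" {
        max ra interval 5;       # Same interval, so clients notice a dead router equally fast
        other config yes;        # dhcpv6 boot
        default preference low;  # Clients only use this router if the preferred one is gone
    };
    # Assumption: rdnss and dnssl blocks are identical to the first router
    # and are left out of this sketch.
}
```

Both routers keep announcing themselves all the time. The preference value alone decides which one the clients use as long as both are alive.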

## Steering incoming traffic

As the internal and the upstream routers are in the same data center,
we can use an IGP like OSPF to distribute the routes to the internal
routers. And OSPF actually has this very neat metric called **cost**.
So for the router that sets the **default preference high** for the
outgoing routes, we keep the cost at 10; for the router that
sets the **default preference low**, we set the cost at 20. The actual
bird configuration on a router looks like this:

```
define ospf_cost = 10;
...

protocol ospf v3 ospf6 {
    instance id 0;

    ipv6 {
        import all;
        export none;
    };

    area 0 {
        interface "eth1.*" {
            authentication cryptographic;
            password "weshouldhaveremovedthisfortheblogpost";
            cost ospf_cost;
        };
    };
}
```
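
On the router that announces itself with the low preference, only the cost definition would need to differ. This is a sketch under the assumption that the rest of the OSPF configuration is shared unchanged between both routers:

```
# Second router: higher cost, so upstream routers prefer the first router
# for incoming traffic as long as it is alive.
define ospf_cost = 20;
```

Keeping the cost in a single definition means the protocol block itself can stay the same on both machines.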

## Incoming + Outgoing = symmetric paths

With both directions under our control, we have now enabled symmetric
routing. Thus, as long as the first router is alive,
all traffic in both directions will be handled by the first router.

## Failover scenario

In case the first router fails, clients have a low lifetime of 15
seconds (3x **max ra interval**)
for their routes and they will fail over to the 2nd router
automatically. Existing sessions will not continue to work, but that
is ok for our setup. When the first router with the higher priority
comes back, there will again be an interruption, but clients will
automatically change their paths.

And so will the upstream routers, as OSPF quickly updates which
routers are alive and which routes are available.
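
The 15 seconds mentioned above follow from the router lifetime defaulting to three times **max ra interval**. As a sketch, the same value could also be written out explicitly. The `default lifetime` line is not part of the configuration shown in this article; it is added here only for illustration and assumes the option behaves as described in the bird documentation.

```
protocol radv {
    interface "eth1.5" {
        max ra interval 5;
        # Assumption: spelled out for illustration only. If omitted,
        # bird uses 3 * max ra interval, which is also 15 seconds.
        default lifetime 15;
        default preference high;
    };
}
```

A shorter interval makes clients drop a dead router sooner, at the price of more frequent router advertisements on the network.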

## IPv6 enables active-active-passive routing architectures

At ungleich it almost always comes back to the topic of IPv6, albeit
for a good reason. You might remember that we claimed in the
[IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot) article
that going IPv6 only reduces complexity. If you look at the above example,
you might not spot it directly, but going IPv6 only is actually an
enabler for our setup:

We **only deploy router advertisements** using bird. We are **not using DHCPv4**
or **IPv4** for accessing our servers. Both routers run a dhcpv6
service in parallel, with the "boot server" pointing to themselves.

Besides being nice and clean,
our whole active-active-passive routing setup **would not work with
IPv4**, because dhcpv4 servers do not have the same functionality to
provide routing priorities.

## Take away

You can see that trying to solve one problem ("unreliable redundant
router setup") entailed a slew of changes, but in the end made our
infrastructure much simpler:

* No dual stack
* No private IPv4 addresses
* No actively communicating keepalived
* Two fewer daemons to maintain (keepalived, radvd)

We also avoided complex state synchronisation and deployed only Open
Source Software to address our problems. Furthermore, hardware that
looked unusable in modern IPv6 networks can also be upgraded with
Open Source Software (ipxe) and enables us to provide more sustainable
infrastructures.

We hope you enjoyed our spring cleanup blog series. The next one will
be coming, because IT infrastructures always evolve. Until then:
feel free to [join our Open Source Chat](https://chat.with.ungleich.ch)
and join the discussion.