From 5c37406988e130a36a5cbcd234b1d0c612f1525b Mon Sep 17 00:00:00 2001 From: Nico Schottelius Date: Sat, 1 May 2021 12:27:15 +0200 Subject: [PATCH] ++ dcl spring --- .../contents.lr | 5 +- .../contents.lr | 4 +- .../contents.lr | 222 ++++++++++++++++++ 3 files changed, 227 insertions(+), 4 deletions(-) create mode 100644 content/u/blog/datacenterlight-redundant-routing-infrastructure/contents.lr diff --git a/content/u/blog/datacenterlight-active-active-routing/contents.lr b/content/u/blog/datacenterlight-active-active-routing/contents.lr index b203cb4..994815c 100644 --- a/content/u/blog/datacenterlight-active-active-routing/contents.lr +++ b/content/u/blog/datacenterlight-active-active-routing/contents.lr @@ -157,5 +157,6 @@ our thanks to Pablo Neira Ayuso, who gave very important input for session based firewalls and session synchronisation. So active-active routing seems not to have a straight forward -solution. Read in the [next blog article](/) on how we solved the -challenge in the end. +solution. Read in the [next blog +article](/u/blog/datacenterlight-redundant-routing-infrastructure) on +how we solved the challenge in the end. diff --git a/content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr b/content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr index 0d42d34..a2fbd73 100644 --- a/content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr +++ b/content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr @@ -26,7 +26,7 @@ started reducing the complexity by removing our dependency on IPv4. When you found our blog, you are probably aware: everything at ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS entries for remote access have IPv6 (AAAA) entries and there are only -rare exceptions when we utilise IPv4. +rare exceptions when we utilise IPv4 for our infrastructure. ## IPv4 only Netboot @@ -68,7 +68,7 @@ distances. Additionally, have less cables provides a simpler infrastructure that is easier to analyse. 
-## Reducing complexity 1 +## Disabling onboard network cards So can we somehow get rid of the copper cables and switch to fiber only? It turns out that the fiber cards we use (mainly Intel X520's) diff --git a/content/u/blog/datacenterlight-redundant-routing-infrastructure/contents.lr b/content/u/blog/datacenterlight-redundant-routing-infrastructure/contents.lr new file mode 100644 index 0000000..c0ce350 --- /dev/null +++ b/content/u/blog/datacenterlight-redundant-routing-infrastructure/contents.lr @@ -0,0 +1,222 @@ +title: Redundant routing infrastructure at Data Center Light +--- +pub_date: 2021-05-01 +--- +author: Nico Schottelius +--- +twitter_handle: NicoSchottelius +--- +_hidden: no +--- +_discoverable: no +--- +abstract: + +--- +body: + +In case you have missed the previous articles, you can +get [an introduction to the Data Center Light spring +cleanup](/u/blog/datacenterlight-spring-network-cleanup), +see [how we switched to IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot) +or read about [the active-active routing +problems](/u/blog/datacenterlight-active-active-routing/). + +In this article we will show how we finally solved the routing issue +conceptually as well as practically. + +## Active-active or passive-active routing? + +In the [previous blog article](/u/blog/datacenterlight-active-active-routing/) +we reasoned that active-active routing, even with session +synchronisation does not have a straight forward solution in our +case. However in the +[first blog article](/u/blog/datacenterlight-spring-network-cleanup) +we reasoned that active-passive routers with VRRP and keepalived are +not stable enough either + +So which path should we take? Or is there another solution? + +## Active-Active-Passive Routing + +Let us introduce Active-Active-Passive routing. Something that sounds +strange in the first place, but is going to make sense in the next +minutes. 
+ +We do want multiple active routers, but we do not want to have to +deal with session synchronisation, which is not only tricky, but due +to its complexity can also be a source of error. + +So what we are looking for is active-active routing without state +synchronisation. While this sounds like a contradiction, if we loosen +our requirement a little bit, we are able to support multiple active +routers without session synchronisation by using **routing +priorities**. + +## Active-Active routing with routing priorities + +Let's assume for a moment that all involved hosts (servers, clients, +routers, etc.) know about multiple routes for outgoing and incoming +traffic. Let's also assume for a moment that **we can prioritise** +those routes. Then we can create a deterministic routing path that +does not need session synchronisation. + +## Steering outgoing traffic + +Let's have a first look at the outgoing traffic. Can we announce +multiple routers in a network, but have the servers and clients +**prefer** one of the routers? The answer is yes! +If we check out the manpage of +[radvd.conf(5)](https://linux.die.net/man/5/radvd.conf) we find a +setting that is named **AdvDefaultPreference**: + +``` +AdvDefaultPreference low|medium|high +``` + +Using this attribute, two routers can both actively announce +themselves, but clients in the network will prefer the one with the +higher preference setting. + +### Replacing radvd with bird + +At this point a short side note: We have been using radvd for some +years in the Data Center Light.
However recently on our +[Alpine Linux based routers](https://alpinelinux.org/), radvd started +to crash from time to time: + +``` +[717424.727125] device eth1 left promiscuous mode +[1303962.899600] radvd[24196]: segfault at 63f42258 ip 00007f6bdd59353b sp 00007ffc63f421b8 error 4 in ld-musl-x86_64.so.1[7f6bdd558000+48000] +[1303962.899609] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89 +... +[1458460.511006] device eth0 entered promiscuous mode +[1458460.511168] radvd[27905]: segfault at 4dfce818 ip 00007f94ec1fd53b sp 00007ffd4dfce778 error 4 in ld-musl-x86_64.so.1[7f94ec1c2000+48000] +[1458460.511177] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89 +... +``` + +Unfortunately it seems that either the addresses timed out or that +radvd was able to send a message de-announcing itself prior to the +crash, causing all clients to withdraw their addresses. This is +especially problematic, if you run a [ceph](https://ceph.io/) cluster +and the servers don't have IP addresses anymore... + +While we did not yet investigate the full cause of this, we had a very +easy solution: as all of our routers run +[bird](https://bird.network.cz/) and it also supports sending router +advertisements, we replaced radvd with bird. 
The configuration is +actually pretty simple: + +``` +protocol radv { + # Internal + interface "eth1.5" { + max ra interval 5; # Fast failover with more routers + other config yes; # dhcpv6 boot + default preference high; + }; + rdnss { + lifetime 3600; + ns 2a0a:e5c0:0:a::a; + ns 2a0a:e5c0:0:a::b; + }; + dnssl { + lifetime 3600; + domain "place5.ungleich.ch"; + }; +} +``` + + +## Steering incoming traffic + +As the internal and the upstream routers are in the same data center, +we can use an IGP like OSPF to distribute the routes to the internal +routers. And OSPF actually has this very neat metric called **cost**. +So for the router that sets the **default preference high** for the +outgoing routes, we keep the cost at 10, for the router that +sets the **default preference low** we set the cost at 20. The actual +bird configuration on a router looks like this: + +``` +define ospf_cost = 10; +... + +protocol ospf v3 ospf6 { + instance id 0; + + ipv6 { + import all; + export none; + }; + + area 0 { + interface "eth1.*" { + authentication cryptographic; + password "weshouldhaveremovedthisfortheblogpost"; + cost ospf_cost; + }; + }; +} +``` + +## Incoming + Outgoing = symmetric paths + +With both directions under our control, we now have enabled symmetric +routing in both directions. Thus as long as the first router is alive, +all traffic will be handled by the first router. + +## Failover scenario + +In case the first router fails, clients have a short lifetime of 15 +seconds (3x **max ra interval**) +for their routes and they will fail over to the 2nd router +automatically. Existing sessions will not continue to work, but that +is ok for our setup. When the first router with the higher priority +comes back, there will again be an interruption, but clients will +automatically change their paths. + +And so will the upstream routers, as OSPF is a quick protocol that +updates alive routers and routes.
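The failover behaviour described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the names `Router`, `pick_default_router` and `simulate` are hypothetical, real clients do this selection inside their network stack, and the only values taken from the article are the 5 second **max ra interval** and the resulting 15 second route lifetime.

```python
from dataclasses import dataclass

# Preference values as used in router advertisements: high beats medium beats low.
PREFERENCE_RANK = {"low": 0, "medium": 1, "high": 2}

MAX_RA_INTERVAL = 5                    # seconds, from the bird configuration
ROUTER_LIFETIME = 3 * MAX_RA_INTERVAL  # 15 seconds, as stated in the text


@dataclass
class Router:
    name: str        # hypothetical label, not a real hostname
    preference: str  # "low", "medium" or "high"
    last_ra: float   # time (in seconds) the last advertisement was received


def pick_default_router(routers, now):
    """Return the name of the most preferred router whose route has not expired."""
    alive = [r for r in routers if now - r.last_ra < ROUTER_LIFETIME]
    if not alive:
        return None
    return max(alive, key=lambda r: PREFERENCE_RANK[r.preference]).name


def simulate():
    """Assumed scenario: router1 fails right after t=0 and returns at t=20."""
    first = Router("router1", "high", last_ra=0.0)
    second = Router("router2", "low", last_ra=0.0)
    timeline = []
    for now in range(0, 31, MAX_RA_INTERVAL):
        second.last_ra = float(now)      # router2 keeps announcing itself
        if now >= 20:
            first.last_ra = float(now)   # router1 is back and announces again
        timeline.append((now, pick_default_router([first, second], now)))
    return timeline


if __name__ == "__main__":
    for now, router in simulate():
        print(f"t={now:2d}s default router: {router}")
```

In this simplified model the client keeps using router1 until its route expires 15 seconds after the last advertisement, switches to router2, and moves back as soon as router1 announces itself again.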
+ + +## IPv6 enables active-active-passive routing architectures + +At ungleich it almost always comes back to the topic of IPv6, albeit +for a good reason. You might remember that we claimed in the +[IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot) article +that this reduces complexity? If you look at the above example, +you might not spot it directly, but going IPv6 only is actually an +enabler for our setup: + +We **only deploy router advertisements** using bird. We are **not using DHCPv4** +or **IPv4** for accessing our servers. Both routers run a dhcpv6 +service in parallel, with the "boot server" pointing to themselves. + +Besides being nice and clean, +our whole active-active-passive routing setup **would not work with +IPv4**, because dhcpv4 servers do not have the same functionality to +provide routing priorities. + +## Take away + +You can see that trying to solve one problem ("unreliable redundant +router setup") entailed a slew of changes, but in the end made our +infrastructure much simpler: + +* No dual stack +* No private IPv4 addresses +* No actively communicating keepalived +* Two fewer daemons to maintain (keepalived, radvd) + +We also avoided complex state synchronisation and deployed only Open +Source Software to address our problems. Furthermore, hardware that +looked unusable in modern IPv6 networks can also be upgraded with +Open Source Software (ipxe) and enables us to provide more sustainable +infrastructures. + +We hope you enjoyed our spring cleanup blog series. The next one will +be coming, because IT infrastructures always evolve. Until then: +feel free to [join our Open Source Chat](https://chat.with.ungleich.ch) +and join the discussion.