title: Active-Active Routing Paths in Data Center Light
---
pub_date: 2019-11-08
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
---
body:
From our last two blog articles (a, b) you probably already know that
it is spring network cleanup in [Data Center Light](https://datacenterlight.ch).

In the [first blog article]() we described where we started, and in
the [second blog article]() you could see how we switched our
infrastructure to IPv6-only netboot.

In this article we dive a bit deeper into the details of our
network architecture and the problems we face with active-active
routers.
## Network architecture

Let's have a look at a simplified (!) diagram of the network:

... IMAGE

Doesn't look that simple, does it? Let's break it down into small
pieces.
## Upstream routers

We have a set of **upstream routers** which work stateless. They don't
have any stateful firewall rules, so both of them can work actively
without state synchronisation. These are fast routers: besides
forwarding, they also do **BGP peering** with the data center
upstreams.

Overall, the upstream routers are very simple machines, mostly running
bird and forwarding packets all day. They also provide a DNS service
(resolving and authoritative), because they are always up and can
announce service IPs via BGP or via OSPF to our network.
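
To illustrate, a minimal bird configuration for such an upstream
router could look roughly like this (a sketch only, not our actual
configuration; the AS numbers, addresses and interface names are
documentation placeholders):

```
# sketch of an upstream router's bird configuration (bird 1.x syntax)
# AS numbers and 2001:db8::1 are placeholders

protocol ospf internal {
    # learn the internal networks from the internal routers
    area 0 {
        interface "eth1" { };
    };
}

protocol bgp upstream1 {
    # peer with one of the data center upstreams
    local as 65534;
    neighbor 2001:db8::1 as 64496;
    import all;
    export where source = RTS_OSPF || source = RTS_STATIC;
}
```

Note that there is no firewall or connection tracking anywhere in this
picture: the upstream routers only learn and announce routes and
forward packets.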

## Internal routers

The internal routers, on the other hand, provide **stateful routing**,
**IP address assignments** and **netboot services**. They are a bit
more complicated than the upstream routers, but they carry only
a small routing table.
## Communication between the routers

All routers employ OSPF and BGP for route exchange. Thus the two
upstream routers learn about the internal networks (IPv6 only, as
usual) from the internal routers.
## Sessions

Sessions in networking are almost always evil. You need to store
them (at high speed), you need to maintain them (updating, deleting)
and, if you run multiple routers, you even need to synchronise them.

In our case the internal routers do have session handling, as they
provide a stateful firewall. As we are using a multi-router setup,
things can go really wrong if the wrong routes are used.

Let's have a look at this in a bit more detail.
## The good path

IMAGE2: good

If a server sends out a packet via router1 and router1 eventually
receives the answer, everything is fine. The returning packet matches
the state entry that was created by the outgoing packet and the
internal router forwards the packet.
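
This kind of stateful forwarding can be sketched with nftables on
Linux (a minimal illustration, not our production rule set; the
prefix is a placeholder):

```
# minimal stateful forwarding sketch (nftables)
table inet filter {
    chain forward {
        type filter hook forward priority 0; policy drop;

        # the outgoing packet creates a conntrack entry;
        # the answer matches it as "established" and is forwarded
        ct state established,related accept

        # new connections from the servers' network are allowed out
        # (2001:db8:1::/64 is a placeholder prefix)
        ip6 saddr 2001:db8:1::/64 ct state new accept
    }
}
```

The crucial detail is the `policy drop` plus `ct state` match: a
returning packet is only forwarded by the router that holds the
matching conntrack entry.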

## The bad path

IMAGE3: bad

However, if the answer comes back via router2 while the state entry
only exists on router1, there is no matching state entry on router2:
strict stateful filtering classifies the packet as invalid and drops
it.
## Routing paths

If we want to do active-active routing, the server can choose
either internal router for sending out the packet. The internal
routers again have two upstream routers. So, with the return path
included, the following paths exist for a packet:

Outgoing paths:

* servers->router1->upstream router1->internet
* servers->router1->upstream router2->internet
* servers->router2->upstream router1->internet
* servers->router2->upstream router2->internet

And the returning paths are:

* internet->upstream router1->router1->servers
* internet->upstream router1->router2->servers
* internet->upstream router2->router1->servers
* internet->upstream router2->router2->servers

So on average, 50% of the return paths will hit the right router.
However, neither the servers nor the upstream routers use load
balancing like ECMP, so once an incorrect path has been chosen, the
packet loss is 100%.
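
The 50% figure is easy to check with a tiny simulation (illustrative
only; the router names are placeholders):

```python
import random

ROUTERS = ["router1", "router2"]

def match_rate(trials=100_000, seed=42):
    """Fraction of packets whose return path hits the internal
    router that holds the state entry, when the outgoing and the
    returning router are chosen independently at random."""
    rng = random.Random(seed)
    hits = sum(
        rng.choice(ROUTERS) == rng.choice(ROUTERS)
        for _ in range(trials)
    )
    return hits / trials

print(match_rate())  # close to 0.5: half the return paths would drop
```

Of course, real path selection is not random per packet, which is
exactly why a single wrong choice means a fully broken flow rather
than 50% loss spread over all packets.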

## Session synchronisation

In the first article we talked a bit about keepalived and how it
helps to operate routers in an active-passive mode. This did not
turn out to be the most reliable method. Can we do better with
active-active routers and session synchronisation?

Linux supports this via
[conntrackd](http://conntrack-tools.netfilter.org/). However,
conntrackd supports active-active routers on a **flow-based** level,
but not on a **packet-based** level. The difference is that the
following will not work in active-active routers with conntrackd:

```
#1 Packet (in the original direction) updates state in Router R1 ->
submit state to R2
#2 Packet (in the reply direction) arrive to Router R2 before state
coming from R1 has been digested.

With strict stateful filtering, Packet #2 will be dropped and it will
trigger a retransmission.
```

(quote from Pablo Neira Ayuso, see below for more details)
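
For reference, flow-based state synchronisation between two routers
is configured along these lines in `conntrackd.conf` (a rough sketch;
the addresses, interface names and multicast group are placeholders,
and a real configuration needs more options):

```
# rough conntrackd.conf sketch for state sync between two routers
# (placeholder addresses/interfaces; see the conntrack-tools docs)
Sync {
    Mode FTFW {
        # reliable, acknowledged state replication
    }
    Multicast {
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 192.168.100.1
        Interface eth2
    }
}
```

Even with this in place, the race described in the quote above
remains: synchronisation happens per flow update, not per packet.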

Some of you will mumble something like **latency** in their heads
right now. If the return packet is guaranteed to arrive after state
synchronisation, then everything is fine. However, if the reply is
faster than the state synchronisation, packets will get dropped.

In reality, this will work for packets coming from and going to the
Internet. However, in our setup the upstream routers route between
different data center locations, which are in the sub-microsecond
latency range, i.e. LAN speed, because they are interconnected with
dark fiber links. There the reply can easily beat the state
synchronisation.
## Take away

Before moving on to the next blog article, we would like to express
our thanks to Pablo Neira Ayuso, who gave very important input on
session-based firewalls and session synchronisation.

So active-active routing does not seem to have a straightforward
solution. Read in the [next blog
article](/u/blog/datacenterlight-redundant-routing-infrastructure)
how we solved the challenge in the end.