title: Active-Active Routing Paths in Data Center Light
---
pub_date: 2019-11-08
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:

---
body:

From our last two blog articles (a, b) you probably already know that
it is spring network cleanup in [Data Center Light](https://datacenterlight.ch).

In the [first blog article]() we described where we started, and in
the [second blog article]() you could see how we switched our
infrastructure to IPv6-only netboot.

In this article we will dive a bit more into the details of our
network architecture and which problems we face with active-active
routers.

## Network architecture

Let's have a look at a simplified (!) diagram of the network:

... IMAGE

Doesn't look that simple, does it? Let's break it down into small
pieces.

## Upstream routers

We have a set of **upstream routers** that are stateless. They don't
have any stateful firewall rules, so both of them can work actively
without state synchronisation. These are fast routers and, besides
forwarding, both of them do **BGP peering** with our data center
upstreams.

Overall, the upstream routers are very simple machines, mostly running
bird and forwarding packets all day. They also provide a DNS service
(resolving and authoritative), because they are always up and can
announce service IPs via BGP or via OSPF to our network.

## Internal routers

The internal routers, on the other hand, provide **stateful routing**,
**IP address assignments** and **netboot services**. They are a bit
more complicated than the upstream routers, but they carry only
a small routing table.

## Communication between the routers

All routers employ OSPF and BGP for route exchange. Thus the two
upstream routers learn about the internal networks (IPv6 only, as
usual) from the internal routers.

## Sessions

Sessions in networking are almost always an evil. You need to store
them (at high speed), you need to maintain them (updating, deleting)
and if you run multiple routers, you even need to synchronise them.

In our case the internal routers do have session handling, as they are
providing a stateful firewall. As we are using a multi router setup,
things can go really wrong if the wrong routes are being used.

Let's have a look at this a bit more in detail.

## The good path

IMAGE2: good

If a server sends out a packet via router1 and router1 eventually
receives the answer, everything is fine. The returning packet matches
the state entry that was created by the outgoing packet and the
internal router forwards the packet.
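
The state matching can be sketched as a toy model. This is only an illustration, not how our routers are configured: the class `StatefulRouter`, its methods and the documentation-prefix addresses are made up for this example, and a real firewall tracks far more than a source/destination pair.

```python
from dataclasses import dataclass, field


@dataclass
class StatefulRouter:
    """Hypothetical, heavily simplified stateful firewall."""
    name: str
    # Flows seen in the outgoing direction, stored as (src, dst) pairs.
    states: set = field(default_factory=set)

    def forward_outgoing(self, src: str, dst: str) -> bool:
        # Outgoing packets are always allowed and create a state entry.
        self.states.add((src, dst))
        return True

    def forward_reply(self, src: str, dst: str) -> bool:
        # A reply is only forwarded if the reversed flow is known here.
        return (dst, src) in self.states


router1 = StatefulRouter("router1")
router2 = StatefulRouter("router2")

# Example addresses from the IPv6 documentation prefix (assumed values).
server, remote = "2001:db8::10", "2001:db8:ffff::1"

router1.forward_outgoing(server, remote)

reply_via_router1 = router1.forward_reply(remote, server)
reply_via_router2 = router2.forward_reply(remote, server)
print(reply_via_router1, reply_via_router2)
```

In this sketch only router1 knows the flow, so a reply arriving at router2 finds no matching entry.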

## The bad path

IMAGE3: bad

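However, if the answer comes back via router2, there is no matching
state entry on that router and the stateful firewall drops the packet.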

## Routing paths

If we want to do active-active routing, the server can choose between
either internal router for sending out the packet. The internal
routers in turn can choose between two upstream routers. So with the
return path included, the following paths exist for a packet:

Outgoing paths:

* servers->router1->upstream router1->internet
* servers->router1->upstream router2->internet
* servers->router2->upstream router1->internet
* servers->router2->upstream router2->internet

And the returning paths are:

* internet->upstream router1->router1->servers
* internet->upstream router1->router2->servers
* internet->upstream router2->router1->servers
* internet->upstream router2->router2->servers

So on average, 50% of the combinations of outgoing and returning path
will hit the right internal router on return. However, servers as well
as upstream routers are not using load balancing like ECMP, so a flow
sticks to the path it has chosen. Once an incorrect path has been
chosen, the packet loss for that flow is 100%.
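
The 50% figure can be checked by enumerating all combinations. The following is a small sketch that only uses the router names from the lists above; it assumes that every outgoing path can be combined with every returning path with equal probability.

```python
from itertools import product

internal = ["router1", "router2"]
upstream = ["upstream router1", "upstream router2"]

# Outgoing: servers -> internal router -> upstream router -> internet
outgoing = list(product(internal, upstream))
# Returning: internet -> upstream router -> internal router -> servers
returning = list(product(upstream, internal))

round_trips = list(product(outgoing, returning))

# A round trip is fine if the reply passes the same internal router
# that created the state entry on the way out.
matching = [
    (out, ret) for out, ret in round_trips
    if out[0] == ret[1]
]

share = len(matching) / len(round_trips)
print(len(round_trips), len(matching), share)
```

Which upstream router is used does not matter for the result, because only the internal routers keep state.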

## Session synchronisation

In the first article we talked a bit about keepalived and that
it helps to operate routers in an active-passive mode. This did not
turn out to be the most reliable method. Can we do better with
active-active routers and session synchronisation?

Linux supports this using
[conntrackd](http://conntrack-tools.netfilter.org/). However,
conntrackd supports active-active routers only on a **flow-based**
level, not on a **packet-based** level. The difference is that the
following will not work in active-active routers with conntrackd:

```
#1 Packet (in the original direction) updates state in Router R1 ->
   submit state to R2
#2 Packet (in the reply direction) arrive to Router R2 before state
   coming from R1 has been digested.

With strict stateful filtering, Packet #2 will be dropped and it will
trigger a retransmission.
```
(quote from Pablo Neira Ayuso, see below for more details)

Some of you will mumble something like **latency** in your heads right
now. If the return packet is guaranteed to arrive after state
synchronisation, then everything is fine. However, if the reply is
faster than the state synchronisation, packets will get dropped.

In reality, this will work for packets going to and coming from the
Internet. However, in our setup the upstream routers also route between
different data center locations, which are in the sub-microsecond
latency area, i.e. LAN speed, because they are interconnected with
dark fiber links.
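
The race between the reply and the state synchronisation can be written down as a simple comparison. This is a minimal sketch: the function name and all numbers are assumptions chosen for illustration, not measurements from our network or values documented for conntrackd.

```python
def reply_is_accepted(round_trip_s: float, sync_delay_s: float) -> bool:
    """The reply passes the second router only if the state arrived first."""
    return round_trip_s > sync_delay_s


# Assumed example values, in seconds.
SYNC_DELAY = 0.001        # hypothetical time to synchronise the state
INTERNET_RTT = 0.020      # hypothetical round trip to an Internet host
DARK_FIBER_RTT = 0.000001 # hypothetical round trip between locations

internet_ok = reply_is_accepted(INTERNET_RTT, SYNC_DELAY)
dark_fiber_ok = reply_is_accepted(DARK_FIBER_RTT, SYNC_DELAY)
print(internet_ok, dark_fiber_ok)
```

With these assumed numbers the Internet reply arrives after the state, while the reply over the dark fiber link arrives before it and would be dropped.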


## Take away

Before moving on to the next blog article, we would like to express
our thanks to Pablo Neira Ayuso, who gave very important input for
session based firewalls and session synchronisation.

So active-active routing does not seem to have a straightforward
solution. Read the [next blog
article](/u/blog/datacenterlight-redundant-routing-infrastructure) to
see how we solved the challenge in the end.