title: Active-Active Routing Paths in Data Center Light
---
pub_date: 2019-11-08
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
---
body:

From our last two blog articles (a, b) you probably already know that it is spring network cleanup in [Data Center Light](https://datacenterlight.ch).

In the [first blog article]() we described where we started, and in the [second blog article]() you could see how we switched our infrastructure to IPv6-only netboot.

In this article we will dive a bit more into the details of our network architecture and the problems we face with active-active routers.

## Network architecture

Let's have a look at a simplified (!) diagram of the network:

... IMAGE

Doesn't look that simple, does it? Let's break it down into small pieces.

## Upstream routers

We have a set of **upstream routers** which work statelessly. They don't have any stateful firewall rules, so both of them can work actively without state synchronisation. Moreover, both of them peer with the data center upstreams. These are fast routers and, besides forwarding, they also do **BGP peering** with our upstreams.

Overall, the upstream routers are very simple machines, mostly running bird and forwarding packets all day. They also provide a DNS service (resolving and authoritative), because they are always up and can announce service IPs via BGP or via OSPF to our network.

## Internal routers

The internal routers, on the other hand, provide **stateful routing**, **IP address assignments** and **netboot services**. They are a bit more complicated than the upstream routers, but they carry only a small routing table.

## Communication between the routers

All routers employ OSPF and BGP for route exchange. Thus the two upstream routers learn about the internal networks (IPv6 only, as usual) from the internal routers.

## Sessions

Sessions in networking are almost always an evil.
You need to store them (at high speed), you need to maintain them (updating, deleting) and, if you run multiple routers, you even need to synchronise them.

In our case the internal routers do have session handling, as they are providing a stateful firewall. As we are using a multi-router setup, things can go really wrong if the wrong routes are being used. Let's have a look at this in a bit more detail.

## The good path

IMAGE2: good

If a server sends out a packet via router1 and router1 eventually receives the answer, everything is fine. The returning packet matches the state entry that was created by the outgoing packet and the internal router forwards the packet.

## The bad path

IMAGE3: bad

However, if the answer arrives at router2 instead, router2 has no state entry that matches the returning packet and, with strict stateful filtering, drops it.

## Routing paths

If we want to do active-active routing, the server can choose between either internal router for sending out the packet. The internal routers, in turn, can choose between two upstream routers. So with the return path included, the following paths exist for a packet:

Outgoing paths:

* servers->router1->upstream router1->internet
* servers->router1->upstream router2->internet
* servers->router2->upstream router1->internet
* servers->router2->upstream router2->internet

And the returning paths are:

* internet->upstream router1->router1->servers
* internet->upstream router1->router2->servers
* internet->upstream router2->router1->servers
* internet->upstream router2->router2->servers

So on average, 50% of the routes will hit the right router on return, that is, the internal router that holds the state entry. However, servers as well as upstream routers are not using load balancing like ECMP, so once an incorrect path has been chosen, the packet loss is 100%.

## Session synchronisation

In the first article we talked a bit about keepalived and that it helps to operate routers in an active-passive mode. This did not turn out to be the most reliable method. Can we do better with active-active routers and session synchronisation?

Linux supports this using [conntrackd](http://conntrack-tools.netfilter.org/).
However, conntrackd supports active-active routers on a **flow-based** level, but not on a **packet-based** level. The difference is that the following will not work in active-active routers with conntrackd:

```
#1 Packet (in the original direction) updates state in Router R1 ->
   submit state to R2
#2 Packet (in the reply direction) arrive to Router R2 before state
   coming from R1 has been digested.

With strict stateful filtering, Packet #2 will be dropped and it will
trigger a retransmission.
```

(quote from Pablo Neira Ayuso, see below for more details)

Some of you will mumble something like **latency** in your heads right now. If the return packet is guaranteed to arrive after state synchronisation, then everything is fine. However, if the reply is faster than the state synchronisation, packets will get dropped.

In reality, this will work for packets coming from and going to the Internet. However, in our setup the upstream routers also route between different data center locations, which are in the sub-microsecond latency range, i.e. LAN speed, because they are interconnected with dark fiber links.

## Take away

Before moving on to the next blog article, we would like to express our thanks to Pablo Neira Ayuso, who gave very important input on session-based firewalls and session synchronisation.

So active-active routing does not seem to have a straightforward solution. Read in the [next blog article](/u/blog/datacenterlight-redundant-routing-infrastructure) how we solved the challenge in the end.
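
As a recap of the routing paths and the synchronisation race described above, here is a minimal Python sketch. It is a toy model, not our router configuration: the function names and the `reply_delay` / `sync_delay` parameters are made up for illustration, and the timing check only mirrors the rule "a reply that arrives before the state has been synchronised is dropped".

```python
from itertools import product

# Router names as used in the path lists above.
INTERNAL = ("router1", "router2")
UPSTREAM = ("upstream router1", "upstream router2")


def outgoing_paths():
    """All paths servers -> internal router -> upstream router -> internet."""
    return [("servers", r, u, "internet") for r, u in product(INTERNAL, UPSTREAM)]


def returning_paths():
    """All paths internet -> upstream router -> internal router -> servers."""
    return [("internet", u, r, "servers") for u, r in product(UPSTREAM, INTERNAL)]


def reply_accepted(out_path, ret_path):
    """Without state sync, only the internal router that saw the
    outgoing packet holds a state entry and accepts the reply."""
    return out_path[1] == ret_path[2]


def reply_accepted_with_sync(out_path, ret_path, reply_delay, sync_delay):
    """Hypothetical model of flow-based sync: the other internal router
    accepts the reply only if it arrives after the state has been synced.
    Both delays are in the same (arbitrary) time unit."""
    if reply_accepted(out_path, ret_path):
        return True
    return reply_delay > sync_delay


def accepted_fraction():
    """Fraction of (outgoing, returning) path combinations whose reply
    hits the internal router holding the state entry."""
    combos = list(product(outgoing_paths(), returning_paths()))
    accepted = sum(reply_accepted(o, r) for o, r in combos)
    return accepted / len(combos)


if __name__ == "__main__":
    print(accepted_fraction())
```

With 4 outgoing and 4 returning paths there are 16 combinations, and in 8 of them the reply reaches the internal router that created the state, which is where the 50% comes from. The second function shows why latency matters: a slow reply from the Internet passes, while a reply at LAN speed can overtake the synchronisation and get dropped.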