title: Data Center Light: Spring network cleanup
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract: From today on ungleich offers free, encrypted IPv6 VPNs for hackerspaces
---
body:

## Introduction

Spring is the time for cleanup: cleaning up your apartment, removing
dust from the cabinet, letting the light shine through the windows,
or, as in our case, improving the networking situation.

In this article we describe where we started and what the typical
setup in our data center used to be.

## Best practice

When we started [Data Center Light](https://datacenterlight.ch) in
2017, we oriented ourselves towards "best practice" for networking.
We started with IPv6 only networks and used an RFC 1918 network
(10.0.0.0/8) for internal IPv4 routing.

We also started with 2 routers for every network to provide
redundancy.

## Router redundancy

So what do you do when you have two routers? In the Linux world the
software [keepalived](https://keepalived.org/) is very popular for
providing redundant routing using the
[VRRP protocol](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol).

## Active-Passive

While VRRP is designed to allow multiple (not only two) routers to
co-exist in a network, its design is basically active-passive: you
have one active router and n passive routers, in our case 1
additional.

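
The active-passive idea can be illustrated with a small Python sketch. It is not how keepalived is implemented, only a simplified model under a few assumptions: every router has a priority, the reachable router with the highest priority is the active one, and tie-breaking between equal priorities is left out. The router names and priority values are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str          # hypothetical router name
    priority: int      # higher priority wins the election
    alive: bool = True


def elect_master(routers):
    """Return the reachable router with the highest priority, or None."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.priority)


routers = [Router("router1", 200), Router("router2", 100)]
print(elect_master(routers).name)   # router1 is the active router

routers[0].alive = False            # router1 fails
print(elect_master(routers).name)   # router2 takes over
```

With two routers this yields exactly one active and one passive router, as long as both agree on who is reachable.
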
## Keepalived: a closer look

A typical keepalived configuration in our network looked like this:

```
vrrp_instance router_v4 {
    interface INTERFACE
    virtual_router_id 2
    priority PRIORITY
    advert_int 1
    virtual_ipaddress {
        10.0.0.1/22 dev eth1.5 # Internal
    }
    notify_backup "/usr/local/bin/vrrp_notify_backup.sh"
    notify_fault "/usr/local/bin/vrrp_notify_fault.sh"
    notify_master "/usr/local/bin/vrrp_notify_master.sh"
}

vrrp_instance router_v6 {
    interface INTERFACE
    virtual_router_id 1
    priority PRIORITY
    advert_int 1
    virtual_ipaddress {
        2a0a:e5c0:1:8::48/128 dev eth1.8 # Transfer for routing from outside
        2a0a:e5c0:0:44::7/64 dev bond0.18 # zhaw
        2a0a:e5c0:2:15::7/64 dev bond0.20 #
    }
}
```

This is a template that we distribute via [cdist](https://cdi.st). The
strings INTERFACE and PRIORITY are replaced by cdist. The interface
field defines which interface to use for VRRP communication and the
priority field determines which of the routers is the active one.

So far, so good. However, let's have a look at a tiny detail of this
configuration file:

```
    notify_backup "/usr/local/bin/vrrp_notify_backup.sh"
    notify_fault "/usr/local/bin/vrrp_notify_fault.sh"
    notify_master "/usr/local/bin/vrrp_notify_master.sh"
```

These three lines basically say: "start something if you are the
master" and "stop something in case you are not". And why did we do
this? Because of stateful services.

## Stateful services

A typical shell script that we would call contains lines like this:

```
/etc/init.d/radvd stop
/etc/init.d/dhcpd stop
```

(or `start` in the case of the master version)

In earlier days, this even included OpenVPN, which was running on our
first generation router version. But more about OpenVPN later.

The reason why we stopped and started dhcp and radvd is to make the
clients of the network use the active router. We used radvd to
provide IPv6 addresses as the primary access method to servers. And
we used dhcp mainly to allow servers to netboot.
The active router would carry state (firewall!)
and thus the flow of packets always needs to go through the active
router.

Restarting radvd on a different machine keeps the IPv6 addresses the
same, as clients assign them themselves using EUI-64. In the case of
dhcp (IPv4) we could have used hardcoded IPv4 addresses via a mapping
of MAC address to IPv4 address, but we decided against it. The main
reason is that dhcp clients re-request their previous lease, and even
if an IPv4 address changes, it is not really of importance.

During a failover this would lead to an interruption of a few seconds
and to sessions being re-established. Given that routers are usually
rather stable and restarting them is not a daily task, we initially
accepted this.

## Keepalived/VRRP changes

One of the trickier things is making changes to keepalived. Because
keepalived uses the *number of addresses and routes* to verify that a
received VRRP packet matches its configuration, adding or deleting IP
addresses and routes causes a problem:

While only one router has been updated, the number of IP addresses or
routes differs between the two. This causes both routers to ignore
each other's VRRP messages, and both routers think they should be the
master.

As a result, both routers receive client and outside traffic. If a
packet was sent out by router1 but its reply is received by router2,
the firewall (nftables) on router2 does not recognise the returning
packet and, because nftables is configured *stateful*, drops it.

However, not only changes to the configuration can trigger this
problem, but also any communication problem between the two routers.
Since 2017 we have experienced multiple times that keepalived was
unable to send messages to or receive messages from the other router,
and thus both of them again became master.

## Take away

While in theory keepalived should improve reliability, in practice
the number of problems we had due to double master situations made us
question whether the keepalived concept is the right fit for us.
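
To illustrate why a double master situation breaks stateful filtering, here is a deliberately simplified Python model. It is not nftables and it ignores ports, protocols and timeouts; the class name, the router names and the addresses (taken from the IPv6 documentation prefix) are assumptions made purely for this sketch.

```python
class StatefulFirewall:
    """Toy connection tracker: only replies to known outbound flows pass."""

    def __init__(self, name):
        self.name = name
        self.flows = set()

    def outbound(self, src, dst):
        # Remember the flow so that the reply can be matched later.
        self.flows.add((src, dst))

    def inbound(self, src, dst):
        # A reply is accepted only if this router saw the original packet.
        return (dst, src) in self.flows


router1 = StatefulFirewall("router1")
router2 = StatefulFirewall("router2")

client, server = "2001:db8::10", "2001:db8:ffff::1"

# Both routers believe they are master; the request leaves via router1.
router1.outbound(client, server)

print(router1.inbound(server, client))  # True: router1 knows the flow
print(router2.inbound(server, client))  # False: router2 drops the reply
```

As long as only one router is active, every reply meets the state that the request created. With two masters, that is no longer guaranteed.
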
You can read how we evolved from this setup in [the next blog article](/u/blog/datacenterlight-ipv6-only-netboot/).