++ three new blog articles
This commit is contained in:
parent 5d05a28e7d
commit 251677cf57
3 changed files with 541 additions and 0 deletions

161 content/u/blog/datacenterlight-active-active-routing/contents.lr (new file)
@@ -0,0 +1,161 @@
title: Active-Active Routing Paths in Data Center Light
---
pub_date: 2019-11-08
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
---
body:

From our last two blog articles you probably already know that it is
spring network cleanup time at [Data Center Light](https://datacenterlight.ch).

In the [first blog article](/u/blog/datacenterlight-spring-network-cleanup/)
we described where we started and in the
[second blog article](/u/blog/datacenterlight-ipv6-only-netboot/) you
could see how we switched our infrastructure to IPv6 only netboot.

In this article we dive a bit deeper into the details of our network
architecture and the problems we face with active-active routers.

## Network architecture

Let's have a look at a simplified (!) diagram of the network:

[IMAGE: simplified network diagram]

Doesn't look that simple, does it? Let's break it down into small
pieces.

## Upstream routers

We have a set of **upstream routers** which operate statelessly. They
don't have any stateful firewall rules, so both of them can work
actively without state synchronisation. These are fast routers: besides
forwarding, they also do **BGP peering** with the data center upstreams.

Overall, the upstream routers are very simple machines, mostly running
bird and forwarding packets all day. They also provide DNS service
(resolving and authoritative), because they are always up and can
announce service IPs via BGP or OSPF to our network.
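
To give an idea of the kind of configuration involved, here is a
minimal sketch of what such a bird (v2) BGP session could look like.
The ASNs, addresses and filter are made-up placeholders, not our actual
configuration:

```
# Sketch of a bird2 BGP session on an upstream router.
# ASNs and the neighbor address are illustrative placeholders.
protocol bgp upstream1 {
    local as 65001;
    neighbor 2001:db8:42::1 as 65000;

    ipv6 {
        # learn routes from the upstream
        import all;
        # announce our own and internally learned routes
        export where source ~ [ RTS_STATIC, RTS_OSPF, RTS_BGP ];
    };
}
```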

## Internal routers

The internal routers on the other hand provide **stateful routing**,
**IP address assignments** and **netboot services**. They are a bit
more complicated than the upstream routers, but they carry only a small
routing table.

## Communication between the routers

All routers employ OSPF and BGP for route exchange. Thus the two
upstream routers learn about the internal networks (IPv6 only, as
usual) from the internal routers.
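
As a hedged sketch of the OSPF side, a bird2 stanza for exchanging the
internal routes could look roughly like this (interface name and cost
are placeholders):

```
# Sketch of a bird2 OSPFv3 instance for internal route exchange.
# Interface pattern and cost are illustrative placeholders.
protocol ospf v3 internal {
    ipv6 {
        import all;
        export all;
    };
    area 0 {
        interface "eth1.*" {
            cost 10;
        };
    };
}
```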

## Sessions

Sessions in networking are almost always evil. You need to store them
(at high speed), you need to maintain them (updating, deleting) and if
you run multiple routers, you even need to synchronise them.

In our case the internal routers do have session handling, as they
provide a stateful firewall. As we are using a multi-router setup,
things can go really wrong if the wrong routes are being used.
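
To make the statefulness concrete, here is a minimal nftables sketch of
a stateful forward chain as an internal router might run it; the chain
layout and the interface name are illustrative assumptions, not our
actual ruleset:

```
# Hypothetical stateful forward chain; names are illustrative.
table inet filter {
    chain forward {
        type filter hook forward priority 0; policy drop;

        # replies pass only if this router holds the state entry
        ct state established,related accept

        # new outgoing connections from the server network are allowed
        iifname "servers" ct state new accept
    }
}
```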

Let's have a look at this in a bit more detail.

## The good path

[IMAGE: the good path]

If a server sends out a packet via router1 and router1 eventually
receives the answer, everything is fine. The returning packet matches
the state entry that was created by the outgoing packet and the
internal router forwards the packet.

## The bad path

[IMAGE: the bad path]

However, if the server sends out a packet via router1 but the answer
comes back via router2, the stateful firewall on router2 finds no
matching state entry and drops the returning packet.

## Routing paths

If we want to go active-active routing, the server can choose either
internal router for sending out the packet. The internal routers again
have two upstream routers. So with the return path included, the
following paths exist for a packet:

Outgoing paths:

* servers->router1->upstream router1->internet
* servers->router1->upstream router2->internet
* servers->router2->upstream router1->internet
* servers->router2->upstream router2->internet

And the returning paths are:

* internet->upstream router1->router1->servers
* internet->upstream router1->router2->servers
* internet->upstream router2->router1->servers
* internet->upstream router2->router2->servers

So on average, 50% of the returning packets will hit the right internal
router. However, since neither the servers nor the upstream routers use
load balancing such as ECMP, once an incorrect path has been chosen,
packet loss for that flow is 100%.
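
The 50% figure can be checked by enumerating the combinations: a flow
only survives if the returning packet passes through the same stateful
internal router that saw the outgoing packet.

```
# Enumerate all outgoing x returning path combinations and count how
# many return through the internal router that holds the state entry.
from itertools import product

internal = ["router1", "router2"]
upstream = ["upstream1", "upstream2"]

# outgoing: (internal router, upstream router)
out_paths = list(product(internal, upstream))
# returning: (upstream router, internal router)
ret_paths = list(product(upstream, internal))

combos = list(product(out_paths, ret_paths))
ok = [(o, r) for o, r in combos if o[0] == r[1]]

print(len(combos))            # 16 combinations in total
print(len(ok) / len(combos))  # 0.5 -> 50% hit the right router
```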

## Session synchronisation

In the first article we talked a bit about keepalived and how it helps
to operate routers in an active-passive mode. This did not turn out to
be the most reliable method. Can we do better with active-active
routers and session synchronisation?

Linux supports this using
[conntrackd](http://conntrack-tools.netfilter.org/). However,
conntrackd supports active-active routers on a **flow based** level,
but not on a **packet based** level. The difference is that the
following will not work in active-active routers with conntrackd:

```
#1 Packet (in the original direction) updates state in Router R1 ->
submit state to R2
#2 Packet (in the reply direction) arrive to Router R2 before state
coming from R1 has been digested.

With strict stateful filtering, Packet #2 will be dropped and it will
trigger a retransmission.
```

(quote from Pablo Neira Ayuso, see below for more details)

Some of you will mumble something like **latency** in their head right
now. If the return packet is guaranteed to arrive after state
synchronisation, then everything is fine. However, if the reply is
faster than the state synchronisation, packets will get dropped.

In practice, this works for packets coming from and going to the
Internet. However, in our setup the upstream routers route between
different data center locations, which are in the sub-microsecond
latency range - i.e. LAN speed - because they are interconnected with
dark fiber links.

## Take away

Before moving on to the next blog article, we would like to express our
thanks to Pablo Neira Ayuso, who gave very important input on
session-based firewalls and session synchronisation.

So active-active routing does not seem to have a straightforward
solution. Read in the [next blog article](/) how we solved the
challenge in the end.

219 content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr (new file)
@@ -0,0 +1,219 @@
title: IPv6 only netboot in Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:

How we switched from IPv4 netboot to IPv6 netboot
---
body:

In our [previous blog
article](/u/blog/datacenterlight-spring-network-cleanup)
we wrote about our motivation for the big spring network cleanup. In
this blog article we show how we started reducing complexity by
removing our dependency on IPv4.

## IPv6 first

If you have found our blog, you are probably aware: everything at
ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS
entries for remote access have IPv6 (AAAA) entries and there are only
rare exceptions where we utilise IPv4.

## IPv4 only Netboot

One of the big exceptions to this paradigm used to be how we boot our
servers. Because our second big paradigm is sustainability, we use a
lot of 2nd (or 3rd) generation hardware. We share this passion with our
friends from [e-durable](https://recycled.cloud/), because
sustainability is something that we need to practice today, not
tomorrow. But back to the netbooting topic: for netbooting we have so
far mainly relied on onboard network cards.

## Onboard network cards

We used these network cards for multiple reasons:

* they exist in virtually any server
* they usually have a ROM containing a PXE capable firmware
* they allow us to split real traffic (on fiber cards) from internal traffic

However, using the onboard devices also comes with a couple of
disadvantages:

* their ROM is often outdated
* they require additional cabling

## Cables

Let's have a look at the cabling situation first. Virtually all of our
servers are connected to the network with 2x 10 Gbit/s fiber cards.

On one side this provides a fast connection, but on the other side it
provides us with something even better: distance.

Our data centers employ a non-standard design due to the re-use of
existing factory halls. This means distances between servers and
switches can be up to 100m. With fiber, we can easily achieve these
distances.

Additionally, having fewer cables gives a simpler infrastructure that
is easier to analyse.

## Reducing complexity 1

So can we somehow get rid of the copper cables and switch to fiber
only? It turns out that the fiber cards we use (mainly Intel X520s)
have their own ROM. So we started disabling the onboard network cards
and tried booting from the fiber cards. This worked until we wanted to
move the lab setup to production...

## Bonding (LACP) and VLAN tagging

Our servers use bonding (802.3ad) for redundant connections to the
switches and VLAN tagging on top of the bonded devices to isolate
client traffic. On the switch side we realised this with configurations
like:

```
interface Port-Channel33
   switchport mode trunk
   mlag 33

...
interface Ethernet33
   channel-group 33 mode active
```

But that does not work if the network card's boot ROM does not create
an LACP-enabled link on top of which to do the VLAN tagging.

The ROM in our network cards **would** have allowed VLAN tagging alone,
though.

To fix this problem, we reconfigured our switches as follows:

```
interface Port-Channel33
   switchport trunk native vlan 10
   switchport mode trunk
   port-channel lacp fallback static
   port-channel lacp fallback timeout 20
   mlag 33
```

This basically does two things:

* if there are no LACP frames, fall back to a static (non-LACP)
  configuration
* accept untagged traffic and map it to VLAN 10 (one of our boot
  networks)

Great, our servers can now netboot from fiber! But we are not done
yet...

## IPv6 only netbooting

So how do we convince these network cards to do IPv6 netboot? Can we
actually do that at all? Our first approach was to put a custom build
of [ipxe](https://ipxe.org/) on a USB stick. We generated that ipxe
image using the **rebuild-ipxe.sh** script from the
[ungleich-tools](https://code.ungleich.ch/ungleich-public/ungleich-tools)
repository. It turns out using a USB stick works pretty well for most
situations.
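
For illustration, an embedded iPXE script for such a custom build can
be as small as the following sketch; the server address matches the
boot URL shown further down, but treat the exact script as an
assumption rather than our verbatim build:

```
#!ipxe
# Sketch: configure the interface (SLAAC/DHCPv6), then chainload the
# boot script from an IPv6 HTTP server.
dhcp
chain http://[2a0a:e5c0:0:6::46]/ipxescript
```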

## ROMs are not ROMs

As you can imagine, the ROM of the X520 cards does not contain IPv6
netboot support. So are we back at square one? No, we are not, because
the X520s have something that the onboard devices did not consistently
have: **a rewritable memory area**.

Let's take two steps back here first: a ROM is a **read only memory**
chip. Emphasis on **read only**. However, modern network cards and a
lot of devices with on-device firmware do actually have a memory
(flash) area that can be written to. And that is what aids us in our
situation.

## ipxe + flbtool + x520 = fun

Trying to write ipxe into the X520 cards initially failed, because the
network card did not recognise the format of the ipxe rom file.

Luckily the folks in the ipxe community had already spotted that
problem AND fixed it: the format used in these cards is called FLB, and
there is [flbtool](https://github.com/devicenull/flbtool/), which
allows you to wrap the ipxe rom file into the FLB format. For those who
want to try it themselves (at your own risk!), it basically involves:

* getting the current ROM from the card (try bootutil64e)
* extracting the contents of the rom using flbtool
* this will output some sections/parts
* locating the part that you want to overwrite with iPXE (a previous
  PXE section is very suitable)
* replacing the .bin file with your iPXE rom
* adjusting the .json file to match the length of the new binary
* building a new .flb file using flbtool
* flashing it onto the card
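
The "adjust the .json file" step can be scripted. The following sketch
patches a section's metadata to the size of the new binary; note that
the `length` field name is an assumption on our side - check the
metadata files flbtool actually emits before relying on it:

```
# Sketch: update the (assumed) "length" field in a section's .json
# metadata to match the size of the replacement iPXE .bin file.
import json
from pathlib import Path

def update_section_length(json_path: str, bin_path: str) -> int:
    meta = json.loads(Path(json_path).read_text())
    new_len = Path(bin_path).stat().st_size
    meta["length"] = new_len  # field name is an assumption
    Path(json_path).write_text(json.dumps(meta, indent=2))
    return new_len
```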

While this is a bit of work, it is worth it for us, because...:

## IPv6 only netboot over fiber

With the modified ROM, which basically loads iPXE at start, we can now
boot our servers in IPv6 only networks. On the infrastructure side, we
added two **tiny** things.

We use ISC dhcpd with the following configuration file:

```
option dhcp6.bootfile-url code 59 = string;

option dhcp6.bootfile-url "http://[2a0a:e5c0:0:6::46]/ipxescript";

subnet6 2a0a:e5c0:0:6::/64 {}
```

(that is the complete configuration!)

And we use radvd to announce the "other configuration" flag, indicating
that clients can query the DHCPv6 server:

```
interface bond0.10
{
        AdvSendAdvert on;
        MinRtrAdvInterval 3;
        MaxRtrAdvInterval 5;
        AdvDefaultLifetime 600;

        # IPv6 netbooting
        AdvOtherConfigFlag on;

        prefix 2a0a:e5c0:0:6::/64 { };

        RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
        DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; };
};
```

## Take away

Being able to reduce cables was one big advantage to begin with.

Switching to IPv6 only netboot does not seem like a big simplification
at first, besides being able to remove IPv4 in server networks.

However, as you will see in
[the next blog post](/u/blog/datacenterlight-active-active-routing/),
switching to IPv6 only netbooting is actually a key element in reducing
complexity in our network.

161 content/u/blog/datacenterlight-spring-network-cleanup/contents.lr (new file)
@@ -0,0 +1,161 @@
title: Data Center Light: Spring network cleanup
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:

How we started our spring network cleanup and what our original router
setup looked like
---
body:

## Introduction

Spring is the time for cleanup: cleaning up your apartment, removing
dust from the cabinet, letting the light shine through the windows, or,
like in our case, improving the networking situation.

In this article we give an introduction to where we started and what
the typical setup used to be in our data center.

## Best practice

When we started [Data Center Light](https://datacenterlight.ch) in
2017, we oriented ourselves towards "best practice" for networking. We
started with IPv6 only networks and used RFC 1918 space (10/8) for
internal IPv4 routing.

And we started with two routers for every network to provide
redundancy.

## Router redundancy

So what do you do when you have two routers? In the Linux world the
software [keepalived](https://keepalived.org/) is very popular for
providing redundant routing using the
[VRRP protocol](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol).

## Active-Passive

While VRRP is designed to allow multiple (not only two) routers to
co-exist in a network, its design is basically active-passive: you have
one active router and n passive routers, in our case one additional.

## Keepalived: a closer look

A typical keepalived configuration in our network looked like this:

```
vrrp_instance router_v4 {
    interface INTERFACE
    virtual_router_id 2
    priority PRIORITY
    advert_int 1
    virtual_ipaddress {
        10.0.0.1/22 dev eth1.5 # Internal
    }
    notify_backup "/usr/local/bin/vrrp_notify_backup.sh"
    notify_fault "/usr/local/bin/vrrp_notify_fault.sh"
    notify_master "/usr/local/bin/vrrp_notify_master.sh"
}

vrrp_instance router_v6 {
    interface INTERFACE
    virtual_router_id 1
    priority PRIORITY
    advert_int 1
    virtual_ipaddress {
        2a0a:e5c0:1:8::48/128 dev eth1.8 # Transfer for routing from outside
        2a0a:e5c0:0:44::7/64 dev bond0.18 # zhaw
        2a0a:e5c0:2:15::7/64 dev bond0.20 #
    }
}
```

This is a template that we distribute via [cdist](https://cdi.st). The
strings INTERFACE and PRIORITY are replaced by cdist. The interface
field defines which interface to use for VRRP communication and the
priority field determines which of the routers is the active one.

So far, so good. However, let's have a look at a tiny detail of this
configuration file:

```
notify_backup "/usr/local/bin/vrrp_notify_backup.sh"
notify_fault "/usr/local/bin/vrrp_notify_fault.sh"
notify_master "/usr/local/bin/vrrp_notify_master.sh"
```

These three lines basically say: "start something if you are the
master" and "stop something in case you are not". And why did we do
this? Because of stateful services.

## Stateful services

A typical shell script that we would call contains lines like this:

```
/etc/init.d/radvd stop
/etc/init.d/dhcpd stop
```

(or start, in the case of the master version)

In earlier days, this even included openvpn, which was running on our
first generation routers. But more about OpenVPN later.

The reason why we stopped and started dhcpd and radvd is to make the
clients of the network use the active router. We used radvd to provide
IPv6 addresses as the primary access method to servers, and we used
dhcpd mainly to allow servers to netboot. The active router carries
state (firewall!) and thus the flow of packets always needs to go
through the active router.

Restarting radvd on a different machine keeps the IPv6 addresses the
same, as clients assign them themselves using EUI-64. In the case of
dhcp (IPv4) we could have used hardcoded IPv4 addresses via a mapping
from MAC address to IPv4 address, but we opted against this. The main
reason is that dhcp clients re-request their same lease, and even if an
IPv4 address changes, it is not really of importance.
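
The reason the IPv6 addresses survive a radvd restart on another
machine is that EUI-64 derives the interface identifier purely from the
MAC address. A small sketch of that derivation (the prefix here is just
an example network):

```
# Sketch: derive an EUI-64 based IPv6 address from a MAC address.
# Because the address depends only on prefix + MAC, it stays the same
# no matter which router announces the prefix.
import ipaddress

def eui64_address(prefix: str, mac: str) -> str:
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                             # flip universal/local bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]  # insert ff:fe in the middle
    iid = int.from_bytes(bytes(eui), "big")
    net = ipaddress.IPv6Network(prefix)
    return str(ipaddress.IPv6Address(int(net.network_address) | iid))

print(eui64_address("2a0a:e5c0:0:6::/64", "00:11:22:33:44:55"))
# -> 2a0a:e5c0:0:6:211:22ff:fe33:4455
```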

During a failover this would lead to a few seconds of interruption and
re-established sessions. Given that routers are usually rather stable
and restarting them is not a daily task, we initially accepted this.

## Keepalived/VRRP changes

One of the more tricky things is making changes to keepalived. Because
keepalived uses the *number of addresses and routes* to verify that a
received VRRP packet matches its configuration, adding or deleting IP
addresses and routes causes a problem:

While one router is being updated, the number of IP addresses or routes
differs. This causes both routers to ignore the other's VRRP messages,
and both routers think they should be the master.

This leads to the problem that both routers receive client and outside
traffic. The firewall (nftables) then does not recognise returning
packets if they were sent out by router1 but received back by router2
and, because nftables is configured *stateful*, it drops the returning
packet.

However, not only configuration changes can trigger this problem, but
also any communication problem between the two routers. Since 2017 we
have experienced multiple times that keepalived was unable to receive
or send messages from or to the other router, and thus both of them
again became master.

## Take away

While in theory keepalived should improve reliability, in practice the
number of problems due to double-master situations made us question
whether the keepalived concept is the fitting one for us.

You can read how we evolved from this setup in
[the next blog article](/u/blog/datacenterlight-ipv6-only-netboot/).