Merge branch 'master' of code.ungleich.ch:ungleich-public/ungleich-staticcms
This commit is contained in:
commit
bc5fc19ca7
21 changed files with 2216 additions and 2 deletions
BIN
assets/u/image/k8s-v6-v4-dns.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 88 KiB
162
content/u/blog/datacenterlight-active-active-routing/contents.lr
Normal file
@ -0,0 +1,162 @@
title: Active-Active Routing Paths in Data Center Light
---
pub_date: 2019-11-08
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
---
body:

From our last two blog articles (a, b) you probably already know that
it is spring network cleanup in [Data Center Light](https://datacenterlight.ch).

In the [first blog article]() we described where we started and in
the [second blog article]() you could see how we switched our
infrastructure to IPv6 only netboot.

In this article we dive a bit deeper into the details of our
network architecture and the problems we face with active-active
routers.

## Network architecture

Let's have a look at a simplified (!) diagram of the network:

... IMAGE

Doesn't look that simple, does it? Let's break it down into small
pieces.

## Upstream routers

We have a set of **upstream routers** which work stateless. They don't
have any stateful firewall rules, so both of them can be active
without state synchronisation. These are fast routers that, besides
forwarding packets, also do **BGP peering** with our data center
upstreams.

Overall the upstream routers are very simple machines, mostly running
bird and forwarding packets all day. They also provide a DNS service
(resolving and authoritative), because they are always up and can
announce service IPs via BGP or via OSPF to our network.

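As an illustration, the upstream routers' job fits in a few lines of bird
configuration. This is a sketch only: the router id, AS numbers, neighbor
address and export filter below are invented, not our production values.

```
# Illustrative bird 2.x sketch of a stateless upstream router.
router id 198.51.100.1;

protocol device { }

protocol kernel {
    ipv6 { export all; };    # install learned routes into the kernel
}

protocol bgp upstream1 {
    local as 65010;                      # invented ASN
    neighbor 2001:db8:ffff::1 as 65001;  # invented upstream peer
    ipv6 {
        import all;                      # learn routes from the upstream
        export where source = RTS_OSPF;  # announce internally learned nets
    };
}
```
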
## Internal routers

The internal routers on the other hand provide **stateful routing**,
**IP address assignments** and **netboot services**. They are a bit
more complicated compared to the upstream routers, but they carry only
a small routing table.

## Communication between the routers

All routers employ OSPF and BGP for route exchange. Thus the two
upstream routers learn about the internal networks (IPv6 only, as
usual) from the internal routers.

## Sessions

Sessions in networking are almost always an evil. You need to store
them (at high speed), you need to maintain them (updating, deleting)
and if you run multiple routers, you even need to synchronise them.

In our case the internal routers do have session handling, as they are
providing a stateful firewall. As we are using a multi-router setup,
things can go really wrong if the wrong routes are being used.

Let's have a look at this a bit more in detail.

## The good path

IMAGE2: good

If a server sends out a packet via router1 and router1 eventually
receives the answer, everything is fine. The returning packet matches
the state entry that was created by the outgoing packet and the
internal router forwards the packet.

## The bad path

IMAGE3: bad

However, if the answer comes back via router2 instead, there is no
matching state entry on that router, so its stateful firewall drops
the returning packet.

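The drop behaviour comes from ordinary stateful filtering. A generic
nftables sketch (not our production ruleset, and the interface name is a
placeholder) of what each internal router effectively runs:

```
table inet filter {
    chain forward {
        type filter hook forward priority 0; policy drop;

        # Replies are only forwarded if this router holds the state entry;
        # a reply arriving at the other router matches nothing and is dropped.
        ct state established,related accept

        # New flows may only be initiated from the server network.
        iifname "eth1" ct state new accept
    }
}
```
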
## Routing paths

If we want to go active-active routing, the server can choose between
either internal router for sending out the packet. The internal
routers again have two upstream routers. So with the return path
included, the following paths exist for a packet:

Outgoing paths:

* servers->router1->upstream router1->internet
* servers->router1->upstream router2->internet
* servers->router2->upstream router1->internet
* servers->router2->upstream router2->internet

And the returning paths are:

* internet->upstream router1->router 1->servers
* internet->upstream router1->router 2->servers
* internet->upstream router2->router 1->servers
* internet->upstream router2->router 2->servers

So on average, 50% of the routes will hit the right internal router on
return. However, neither the servers nor the upstream routers use load
balancing such as ECMP, so once an incorrect path has been chosen, the
packet loss is 100%.

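The 50% figure follows from simple enumeration: the outgoing packet creates
state on one internal router, and the return path independently arrives at
one of the two. A quick sketch:

```
from itertools import product

internal = ["router1", "router2"]
upstream = ["upstream1", "upstream2"]

# (outgoing internal, outgoing upstream, returning upstream, returning internal)
paths = list(product(internal, upstream, upstream, internal))

# A path only works if the returning internal router is the one
# that created the state entry on the way out.
ok = sum(1 for out_r, _, _, back_r in paths if out_r == back_r)
print(ok / len(paths))  # 0.5
```
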
## Session synchronisation

In the first article we talked a bit about keepalived and how
it helps to operate routers in an active-passive mode. This did not
turn out to be the most reliable method. Can we do better with
active-active routers and session synchronisation?

Linux supports this using
[conntrackd](http://conntrack-tools.netfilter.org/). However,
conntrackd supports active-active routers on a **flow based** level,
but not on a **packet based** level. The difference is that the
following will not work in active-active routers with conntrackd:

```
#1 Packet (in the original direction) updates state in Router R1 ->
submit state to R2
#2 Packet (in the reply direction) arrive to Router R2 before state
coming from R1 has been digested.

With strict stateful filtering, Packet #2 will be dropped and it will
trigger a retransmission.
```
(quote from Pablo Neira Ayuso, see below for more details)

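For reference, flow-based state synchronisation between two routers is
configured in conntrackd roughly as follows; the multicast group, addresses
and interface name are placeholder values taken from the style of the
shipped example configs, not our production setup:

```
Sync {
    Mode FTFW {
        # Reliable mode: resend state updates the peer missed
    }
    Multicast {
        # Dedicated sync link between the two routers (placeholders)
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 192.168.100.1
        Interface eth2
        Checksum on
    }
}
```
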
Some of you will mumble something like **latency** in their head right
now. If the return packet is guaranteed to arrive after state
synchronisation, then everything is fine. However, if the reply is
faster than the state synchronisation, packets will get dropped.

In reality, this will work for packets coming from and going to the
Internet. However, in our setup the upstream routers route between
different data center locations, which are in the sub-microsecond
latency range - i.e. LAN speed, because they are interconnected with
dark fiber links.

## Take away

Before moving on to the next blog article, we would like to express
our thanks to Pablo Neira Ayuso, who gave very important input on
session based firewalls and session synchronisation.

So active-active routing does not seem to have a straightforward
solution. Read in the [next blog
article](/u/blog/datacenterlight-redundant-routing-infrastructure) on
how we solved the challenge in the end.

219
content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr
Normal file

@ -0,0 +1,219 @@
title: IPv6 only netboot in Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
How we switched from IPv4 netboot to IPv6 netboot
---
body:

In our [previous blog
article](/u/blog/datacenterlight-spring-network-cleanup)
we wrote about our motivation for the
big spring network cleanup. In this blog article we show how we
started reducing the complexity by removing our dependency on IPv4.

## IPv6 first

If you have found our blog, you are probably aware: everything at
ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS
entries for remote access have IPv6 (AAAA) entries and there are only
rare exceptions where we utilise IPv4 in our infrastructure.

## IPv4 only netboot

One of the big exceptions to this paradigm used to be how we boot our
servers. Because our second big paradigm is sustainability, we use a
lot of 2nd (or 3rd) generation hardware. We actually share this
passion with our friends from
[e-durable](https://recycled.cloud/), because sustainability is
something that we need to employ today, not tomorrow.
But back to the netbooting topic: for netbooting we have so far mainly
relied on onboard network cards.

## Onboard network cards

We used these network cards for multiple reasons:

* they exist in virtually any server
* they usually have a ROM containing a PXE capable firmware
* they allow us to split real traffic (on the fiber cards) from internal traffic

However, using the onboard devices also comes with a couple of disadvantages:

* their ROM is often outdated
* they require additional cabling

## Cables

Let's have a look at the cabling situation first. Virtually all of
our servers are connected to the network using 2x 10 Gbit/s fiber cards.

On one side this provides a fast connection, but on the other side
it provides us with something even better: distances.

Our data centers employ a non-standard design due to the re-use of
existing factory halls. This means distances between servers and
switches can be up to 100m. With fiber, we can easily achieve these
distances.

Additionally, having fewer cables gives a simpler infrastructure
that is easier to analyse.

## Disabling onboard network cards

So can we somehow get rid of the copper cables and switch to fiber
only? It turns out that the fiber cards we use (mainly Intel X520's)
have their own ROM. So we started disabling the onboard network cards
and tried booting from the fiber cards. This worked until we wanted to
move the lab setup to production...

## Bonding (LACP) and VLAN tagging

Our servers use bonding (802.3ad) for redundant connections to the
switches and VLAN tagging on top of the bonded devices to isolate
client traffic. On the switch side we realised this using
configurations like

```
interface Port-Channel33
   switchport mode trunk
   mlag 33

...
interface Ethernet33
   channel-group 33 mode active
```

But that does not work if the network card's boot ROM does not create
an LACP-enabled link on top of which it should be doing VLAN tagging.

The ROM in our network cards **would** have allowed VLAN tagging alone
though.

To fix this problem, we reconfigured our switches as follows:

```
interface Port-Channel33
   switchport trunk native vlan 10
   switchport mode trunk
   port-channel lacp fallback static
   port-channel lacp fallback timeout 20
   mlag 33
```

This basically does two things:

* If there are no LACP frames, fall back to a static (non-LACP)
  configuration
* Accept untagged traffic and map it to VLAN 10 (one of our boot networks)

Great, our servers can now netboot from fiber! But we are not done
yet...

## IPv6 only netbooting

So how do we convince these network cards to do IPv6 netboot? Can we
actually do that at all? Our first approach was to put a custom build of
[ipxe](https://ipxe.org/) on a USB stick. We generated that
ipxe image using the **rebuild-ipxe.sh** script
from the
[ungleich-tools](https://code.ungleich.ch/ungleich-public/ungleich-tools)
repository. Turns out using a USB stick works pretty well for most
situations.

## ROMs are not ROMs

As you can imagine, the ROM of the X520 cards does not contain IPv6
netboot support. So are we back at square one? No, we are not. Because
the X520's have something that the onboard devices did not
consistently have: **a rewritable memory area**.

Let's take two steps back here first: a ROM is a **read-only memory**
chip. Emphasis on **read only**. However, modern network cards and a
lot of devices that support on-device firmware do actually have a
memory (flash) area that can be written to. And that is what aids us
in our situation.

## ipxe + flbtool + x520 = fun

Trying to write ipxe into the X520 cards initially failed, because the
network card did not recognise the format of the ipxe rom file.

Luckily the folks in the ipxe community already spotted that problem
AND fixed it: the format used in these cards is called FLB. And there
is [flbtool](https://github.com/devicenull/flbtool/), which allows you
to wrap the ipxe rom file into the FLB format. For those who want to
try it yourself (at your own risk!), it basically involves:

* Get the current ROM from the card (try bootutil64e)
* Extract the contents from the rom using flbtool
* This will output some sections/parts
* Locate one part that you want to overwrite with iPXE (a previous PXE
  section is very suitable)
* Replace the .bin file with your iPXE rom
* Adjust the .json file to match the length of the new binary
* Build a new .flb file using flbtool
* Flash it onto the card

While this is a bit of work, it is worth it for us, because...:

## IPv6 only netboot over fiber

With the modified ROM, basically loading iPXE at start, we can now
boot our servers in IPv6 only networks. On our infrastructure side, we
added two **tiny** things:

We use ISC dhcp with the following configuration file:

```
option dhcp6.bootfile-url code 59 = string;

option dhcp6.bootfile-url "http://[2a0a:e5c0:0:6::46]/ipxescript";

subnet6 2a0a:e5c0:0:6::/64 {}
```

(that is the complete configuration!)

And we used radvd to set the "other configuration" flag, indicating
that clients can actually query the DHCPv6 server:

```
interface bond0.10
{
        AdvSendAdvert on;
        MinRtrAdvInterval 3;
        MaxRtrAdvInterval 5;
        AdvDefaultLifetime 600;

        # IPv6 netbooting
        AdvOtherConfigFlag on;

        prefix 2a0a:e5c0:0:6::/64 { };

        RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
        DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; } ;
};
```

## Take away

Being able to reduce cables was one big advantage in the beginning.

Switching to IPv6 only netboot does not seem like a big simplification
at first, besides being able to remove IPv4 in server
networks.

However, as you will see in
[the next blog posts](/u/blog/datacenterlight-active-active-routing/),
switching to IPv6 only netbooting is actually a key element in
reducing complexity in our network.

@ -0,0 +1,222 @@
title: Redundant routing infrastructure at Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
---
body:

In case you have missed the previous articles, you can
get [an introduction to the Data Center Light spring
cleanup](/u/blog/datacenterlight-spring-network-cleanup),
see [how we switched to IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot)
or read about [the active-active routing
problems](/u/blog/datacenterlight-active-active-routing/).

In this article we will show how we finally solved the routing issue,
conceptually as well as practically.

## Active-active or passive-active routing?

In the [previous blog article](/u/blog/datacenterlight-active-active-routing/)
we reasoned that active-active routing, even with session
synchronisation, does not have a straightforward solution in our
case. However, in the
[first blog article](/u/blog/datacenterlight-spring-network-cleanup)
we reasoned that active-passive routers with VRRP and keepalived are
not stable enough either.

So which path should we take? Or is there another solution?

## Active-Active-Passive Routing

Let us introduce Active-Active-Passive routing. Something that sounds
strange at first, but is going to make sense in the next few
minutes.

We do want multiple active routers, but we do not want to have to
deal with session synchronisation, which is not only tricky, but due
to its complexity can also be a source of error.

So what we are looking for is active-active routing without state
synchronisation. While this sounds like a contradiction, if we loosen
our requirement a little bit, we are able to support multiple active
routers without session synchronisation by using **routing
priorities**.

## Active-Active routing with routing priorities

Let's assume for a moment that all involved hosts (servers, clients,
routers, etc.) know about multiple routes for outgoing and incoming
traffic. Let's also assume for a moment that **we can prioritise**
those routes. Then we can create a deterministic routing path that
does not need session synchronisation.

## Steering outgoing traffic

Let's have a first look at the outgoing traffic. Can we announce
multiple routers in a network, but have the servers and clients
**prefer** one of the routers? The answer is yes!
If we check out the manpage of
[radvd.conf(5)](https://linux.die.net/man/5/radvd.conf) we find a
setting named **AdvDefaultPreference**:

```
AdvDefaultPreference low|medium|high
```

Using this attribute, two routers can both actively announce
themselves, but clients in the network will prefer the one with the
higher preference setting.

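In radvd terms, the preferred router's configuration would contain
something like the following sketch; the interface name and prefix are
placeholders, and the second router would use `AdvDefaultPreference low`:

```
interface eth1
{
        AdvSendAdvert on;
        # Clients prefer this router as their default gateway
        AdvDefaultPreference high;
        prefix 2001:db8:1::/64 { };
};
```
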
### Replacing radvd with bird

At this point a short side note: we have been using radvd for some
years in the Data Center Light. However, recently on our
[Alpine Linux based routers](https://alpinelinux.org/), radvd started
to crash from time to time:

```
[717424.727125] device eth1 left promiscuous mode
[1303962.899600] radvd[24196]: segfault at 63f42258 ip 00007f6bdd59353b sp 00007ffc63f421b8 error 4 in ld-musl-x86_64.so.1[7f6bdd558000+48000]
[1303962.899609] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
[1458460.511006] device eth0 entered promiscuous mode
[1458460.511168] radvd[27905]: segfault at 4dfce818 ip 00007f94ec1fd53b sp 00007ffd4dfce778 error 4 in ld-musl-x86_64.so.1[7f94ec1c2000+48000]
[1458460.511177] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
```

Unfortunately it seems that either the addresses timed out or that
radvd was able to send a message de-announcing itself prior to the
crash, causing all clients to withdraw their addresses. This is
especially problematic if you run a [ceph](https://ceph.io/) cluster
and the servers suddenly don't have IP addresses anymore...

While we did not yet investigate the full cause of this, we had a very
easy solution: as all of our routers run
[bird](https://bird.network.cz/) anyway and it also supports sending
router advertisements, we replaced radvd with bird. The configuration
is actually pretty simple:

```
protocol radv {
    # Internal
    interface "eth1.5" {
        max ra interval 5;        # Fast failover with more routers
        other config yes;         # dhcpv6 boot
        default preference high;
    };
    rdnss {
        lifetime 3600;
        ns 2a0a:e5c0:0:a::a;
        ns 2a0a:e5c0:0:a::b;
    };
    dnssl {
        lifetime 3600;
        domain "place5.ungleich.ch";
    };
}
```

## Steering incoming traffic

As the internal and the upstream routers are in the same data center,
we can use an IGP like OSPF to distribute the routes to the internal
routers. And OSPF actually has this very neat metric called **cost**.
So for the router that sets the **default preference high** for the
outgoing routes, we keep the cost at 10; for the router that
sets the **default preference low** we set the cost to 20. The actual
bird configuration on a router looks like this:

```
define ospf_cost = 10;
...

protocol ospf v3 ospf6 {
    instance id 0;

    ipv6 {
        import all;
        export none;
    };

    area 0 {
        interface "eth1.*" {
            authentication cryptographic;
            password "weshouldhaveremovedthisfortheblogpost";
            cost ospf_cost;
        };
    };
}
```

## Incoming + Outgoing = symmetric paths

With both directions under our control, we have now enabled symmetric
routing in both directions. Thus as long as the first router is alive,
all traffic will be handled by the first router.

## Failover scenario

In case the first router fails, clients have a low lifetime of 15
seconds (3x **max ra interval**)
for their routes and they will fail over to the second router
automatically. Existing sessions will not continue to work, but that
is ok for our setup. When the first router with the higher priority
comes back, there will again be an interruption, but clients will
automatically change their paths.

And so will the upstream routers, as OSPF is a quick protocol that
notices failed routers and updates routes.

## IPv6 enables active-active-passive routing architectures

At ungleich it almost always comes back to the topic of IPv6, albeit
for a good reason. You might remember that we claimed in the
[IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot) article
that this reduces complexity? If you look at the above example,
you might not spot it directly, but going IPv6 only is actually an
enabler for our setup:

We **only deploy router advertisements** using bird. We are **not using DHCPv4**
or **IPv4** for accessing our servers. Both routers run a dhcpv6
service in parallel, with the "boot server" pointing to themselves.

Besides being nice and clean,
our whole active-active-passive routing setup **would not work with
IPv4**, because dhcpv4 servers do not have the same functionality to
provide routing priorities.

## Take away

You can see that trying to solve one problem ("unreliable redundant
router setup") entailed a slew of changes, but in the end made our
infrastructure much simpler:

* No dual stack
* No private IPv4 addresses
* No actively communicating keepalived
* Two daemons less to maintain (keepalived, radvd)

We also avoided complex state synchronisation and deployed only Open
Source Software to address our problems. Furthermore, hardware that
looked unusable in modern IPv6 networks can be upgraded with
Open Source Software (ipxe) and enables us to provide more sustainable
infrastructures.

We hope you enjoyed our spring cleanup blog series. The next one will
be coming, because IT infrastructures always evolve. Until then:
feel free to [join our Open Source Chat](https://chat.with.ungleich.ch)
and join the discussion.

@ -0,0 +1,161 @@
title: Data Center Light: Spring network cleanup
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
From today on ungleich offers free, encrypted IPv6 VPNs for hackerspaces
---
body:

## Introduction

Spring is the time for cleanup. Cleaning up your apartment, removing
dust from the cabinet, letting the light shine through the windows,
or, like in our case: improving the networking situation.

In this article we give an introduction to where we started and what
the typical setup used to be in our data center.

## Best practice

When we started [Data Center Light](https://datacenterlight.ch) in
2017, we oriented ourselves by "best practice" for networking. We
started with IPv6 only networks and used RFC1918 networks (10/8) for
internal IPv4 routing.

And we started with two routers for every network to provide
redundancy.

## Router redundancy

So what do you do when you have two routers? In the Linux world the
software [keepalived](https://keepalived.org/)
is very popular to provide redundant routing
using the [VRRP protocol](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol).

## Active-Passive

While VRRP is designed to allow multiple (not only two) routers to
co-exist in a network, its design is basically active-passive: you
have one active router and n passive routers, in our case one
additional router.

## Keepalived: a closer look

A typical keepalived configuration in our network looked like this:

```
vrrp_instance router_v4 {
    interface INTERFACE
    virtual_router_id 2
    priority PRIORITY
    advert_int 1
    virtual_ipaddress {
        10.0.0.1/22 dev eth1.5 # Internal
    }

    notify_backup "/usr/local/bin/vrrp_notify_backup.sh"
    notify_fault "/usr/local/bin/vrrp_notify_fault.sh"
    notify_master "/usr/local/bin/vrrp_notify_master.sh"
}

vrrp_instance router_v6 {
    interface INTERFACE
    virtual_router_id 1
    priority PRIORITY
    advert_int 1
    virtual_ipaddress {
        2a0a:e5c0:1:8::48/128 dev eth1.8 # Transfer for routing from outside
        2a0a:e5c0:0:44::7/64 dev bond0.18 # zhaw
        2a0a:e5c0:2:15::7/64 dev bond0.20 #
    }
}
```

This is a template that we distribute via [cdist](https://cdi.st). The
strings INTERFACE and PRIORITY are replaced via cdist. The interface
field defines which interface to use for VRRP communication and the
priority field determines which of the routers is the active one.

So far, so good. However, let's have a look at a tiny detail of this
configuration file:

```
notify_backup "/usr/local/bin/vrrp_notify_backup.sh"
notify_fault "/usr/local/bin/vrrp_notify_fault.sh"
notify_master "/usr/local/bin/vrrp_notify_master.sh"
```

These three lines basically say: "start something if you are the
master" and "stop something in case you are not". And why did we do
this? Because of stateful services.

## Stateful services

A typical shell script that we would call contained lines like this:

```
/etc/init.d/radvd stop
/etc/init.d/dhcpd stop
```
(or start, in the case of the master version)

In earlier days this even contained openvpn, which was running on our
first generation router version. But more about OpenVPN later.

The reason why we stopped and started dhcp and radvd is to make
|
||||||
|
clients of the network use the active router. We used radvd to provide
|
||||||
|
IPv6 addresses as the primary access method to servers. And we used
|
||||||
|
dhcp mainly to allow servers to netboot. The active router would
|
||||||
|
carry state (firewall!) and thus the flow of packets always need to go
|
||||||
|
through the active router.
|
||||||
|
|
||||||
|
Restarting radvd on a different machine keeps the IPv6 addresses the
|
||||||
|
same, as clients assign then themselves using EUI-64. In case of dhcp
|
||||||
|
(IPv4) we would have used hardcoded IPv4 addresses using a mapping of
|
||||||
|
MAC address to IPv4 address, but we opted out for this. The main
|
||||||
|
reason is that dhcp clients re-request their same leas and even if an
|
||||||
|
IPv4 addresses changes, it is not really of importance.
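
For illustration, a minimal radvd configuration could look like this
(interface and prefix are hypothetical placeholders): because both
routers announce the identical prefix, clients keep the same EUI-64
derived addresses across a failover.

```
# Hypothetical /etc/radvd.conf, identical on both routers:
interface eth1.8
{
    AdvSendAdvert on;
    prefix 2a0a:e5c0:1:8::/64
    {
        AdvOnLink on;
        AdvAutonomous on;
    };
};
```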

During a failover this would lead to a few seconds of interruption and
re-establishing of sessions. Given that routers are usually rather
stable and restarting them is not a daily task, we initially accepted
this.

## Keepalived/VRRP changes

One of the trickier things is making changes to keepalived. Because
keepalived uses the *number of addresses and routes* to verify that a
received VRRP packet matches its configuration, adding or deleting IP
addresses and routes causes a problem:

While one router is being updated, the number of IP addresses or
routes differs. This causes both routers to ignore the other's VRRP
messages, and both routers think they should be the master.

This leads to the problem that both routers receive client and outside
traffic. The firewall (nftables) then fails to recognise returning
packets: if a packet was sent out by router1 but received back by
router2, nftables, being configured *stateful*, drops it.

However, not only configuration changes can trigger this problem, but
also any communication problem between the two routers. Since 2017 we
have experienced multiple times that keepalived was unable to receive
or send messages from the other router, and thus both of them again
became the master.

## Take away

While in theory keepalived should improve reliability, in practice the
number of problems caused by double-master situations made us question
whether the keepalived concept is the fitting one for us.

You can read how we evolved from this setup in
[the next blog article](/u/blog/datacenterlight-ipv6-only-netboot/).
192
content/u/blog/glamp-1-2021/contents.lr
Normal file
@ -0,0 +1,192 @@
title: GLAMP #1 2021
---
pub_date: 2021-07-17
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:

The first un-hack4glarus happens as a camp - Thursday 2021-08-19 to Sunday 2021-08-22.
---
body:

## Tl;DR

Get your tent, connect it to power and 10Gbit/s Internet in the midst
of the Glarner mountains. Happening Thursday 2021-08-19 to Sunday
2021-08-22. Apply for participation by mail (information at the bottom
of the page).

## Introduction

It has been some time since our
[last Hack4Glarus](https://hack4glarus.ch) and we have been missing
all our friends, hackers and participants. At ungleich we have been
watching the development of the coronavirus worldwide and, as you
might know, we have decided against a Hack4Glarus for this summer, as
the Hack4Glarus has been an indoor event so far.

## No Hack4Glarus = GLAMP

However, we want to try a different format that ensures proper
safety. Instead of an indoor Hack4Glarus in Linthal, we introduce the
Glarus Camp (GLAMP for short) to you: an outdoor event with sufficient
space for distancing. As a camping site we can use the surroundings of
the Hacking Villa, supported by the Hacking Villa facilities.

Compared to the Hack4Glarus, the GLAMP will focus more on
*relaxation* and *hangout* than on being a hackathon. We think times
are hard enough to give everyone a break.

## The setting

Many of you know the [Hacking Villa](/u/projects/hacking-villa/) in
Diesbach already, located just next to the pretty waterfall and the
amazing Legler Areal. The villa is connected with 10 Gbit/s to the
[Data Center Light](/u/projects/data-center-light/) and offers a lot
of fun things to do.

## Coronavirus measures beforehand

To ensure safety for everyone, we ask everyone attending to provide
reasonable proof of not spreading the coronavirus in one of the
following ways:

* You have been vaccinated
* You had the coronavirus and have been symptom free for at least 14
  days
* You have been tested with a PCR test (7 days old at maximum) and the
  result was negative

All participants will be required to take a short antigen test on
site.

**Please do not attend if you feel sick, for the safety of everyone else.**

## Coronavirus measures on site

To keep the space safe on site as well, we ask you to follow these
rules:

* Sleep in your own tent
* Wear masks inside the Hacking Villa
* Especially if you are preparing food shared with others
* Keep distance and respect others' safety wishes

## Hacking Villa Facilities

* Fast Internet (what more do you need?)
* A shared, open area outside for hacking
* Toilets and bathroom located inside

## What to bring

* A tent + sleeping equipment
* Fun stuff
* Your computer
* Wifi / IoT / Hacking things
* If you want wired Internet in your tent: a 15m+ Ethernet cable
* WiFi will be provided everywhere

## What is provided

* Breakfast every morning
* A place for a tent
* Power to the tent (Swiss plug)
* WiFi to the tent
* Traditional closing event spaghetti

## What you can find nearby

* A nearby supermarket (2km) reachable by foot, scooter or bike
* A waterfall + barbecue place (~400m)
* Daily attractions such as hacking, hiking, biking, hanging out

## Registration

As the space is limited, we can accommodate about 10 tents (roughly 23
people). To register, send an email to support@ungleich.ch based on
the following template:

```
Subject: GLAMP#1 2021

For each person with you (including yourself):

Non Coronavirus proof:
(see requirements on the glamp page)

Name(s):
(how you want to be called)

Interests:
(will be shown to others at the glamp)

Skills:
(will be shown to others at the glamp)

Food interests:
(we use this for pooling food orders)

What I would like to do:
(will be shown to others at the glamp)

```

The participation fee is 70 CHF/person (to be paid on arrival).

## Time, Date and Location

* Arrival possible from Wednesday 2021-08-18 16:00
* GLAMP#1 starts officially on Thursday 2021-08-19, 10:00
* GLAMP#1 closing lunch Sunday 2021-08-22, 12:00
* GLAMP#1 ends officially on Sunday 2021-08-22, 14:00

Location: [Hacking Villa](/u/projects/hacking-villa/)

## FAQ

### Where do I get Internet?

It is available everywhere at/around the Hacking Villa via WiFi. For
cable based Internet bring a 15m+ Ethernet cable.

### Where do I get Electricity?

You'll get electricity directly to the tent. Additionally the shared
area also has electricity. You can also bring solar panels, if you
like.

### Where do I get food?

Breakfast is provided by us. But what about the rest of the day?
There are a lot of delivery services available, ranging from Pizza,
Tibetan, Thai, Swiss (yes!), etc.

Nearby are 2 Volg supermarkets, the next Coop is in Schwanden, a
bigger Migros in Glarus and a very big Coop can be found in Netstal.
The Volg is reachable by foot, all others are reachable by train or
bike.

There is also a kitchen inside the Hacking Villa for cooking, and a
great barbecue place just next to the waterfall.

### What can I do at the GLAMP?

There are
[alot](http://hyperboleandahalf.blogspot.com/2010/04/alot-is-better-than-you-at-everything.html)
of opportunities at the GLAMP:

You can ...

* just relax and hangout
* hack on the project that you postponed for long
* hike up mountains (up to 3612m! Lower is also possible)
* meet other hackers
* explore the biggest water power plant in Europe (Linth Limmern)
* and much much more!
BIN
content/u/blog/glamp-1-2021/diesback-bg-small.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 380 KiB
Binary file not shown.
After Width: | Height: | Size: 167 KiB
@ -0,0 +1,123 @@
title: Configuring bind to only forward DNS to a specific zone
---
pub_date: 2021-07-25
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:

Want to use BIND for proxying to another server? This is how you do it.
---
body:

## Introduction

In this article we'll show you an easy solution to host DNS zones on
IPv6 only or private DNS servers. The method we use here is **DNS
forwarding** as offered by ISC BIND, but one could also see this as
**DNS proxying**.

## Background

Sometimes you might have a DNS server that is authoritative for DNS
data, but is not reachable by all clients. This might be the case, for
instance, if

* your DNS server is IPv6 only: it won't be directly reachable from
  the IPv4 Internet
* your DNS server is running in a private network, either IPv4 or IPv6

In both cases, you need something that is publicly reachable to
enable clients to access the zone, as shown in the following picture:

![k8s v6 v4 dns](/u/image/k8s-v6-v4-dns.png)

## The problem: Forwarding requires recursive queries

ISC BIND allows forwarding queries to another name server. However, to
do so, it needs to be configured to allow recursive querying.
And if we allow recursive querying by any client, we basically
create an [Open DNS resolver, which can be quite
dangerous](https://www.ncsc.gov.ie/emailsfrom/DDoS/DNS/).

## The solution

ISC BIND by default has a root hints file compiled in, which allows it
to function as a resolver without any additional configuration
files. That is great, but not if you want to prevent it from becoming
an open resolver as described above. But we can easily fix that
problem. Now, let's have a look at a real world use case,
step-by-step:

### Step 1: Global options

In the first step, we set the global options to allow recursion from
anyone, as follows:

```
options {
        directory "/var/cache/bind";

        listen-on-v6 { any; };

        allow-recursion { ::/0; 0.0.0.0/0; };
};
```

However, as mentioned above, this alone would create an open
resolver. To prevent this, let's disable the root hints:

### Step 2: Disable root hints

The root hints are served in the root zone, also known as ".". To
disable it, we give bind an empty file to use:

```
zone "." {
        type hint;
        file "/dev/null";
};
```

Note: in case you do want to allow recursive function for some
clients, **you can create multiple DNS views**.
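
As a rough sketch (the client range here is hypothetical, and note
that once views are used, every zone statement has to live inside a
view), such a split could look like:

```
view "internal" {
        match-clients { 2a0a:e5c0::/29; };
        recursion yes;
};

view "external" {
        match-clients { any; };
        recursion no;
};
```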

### Step 3: The actual DNS file

In our case, we have a lot of IPv6 only kubernetes clusters, which are
named `xx.k8s.ooo` and have a world wide reachable CoreDNS server
built in. In this case, we want to allow the domain c1.k8s.ooo to be
world reachable, so we configure the dual stack server as follows:

```
zone "c1.k8s.ooo" {
        type forward;
        forward only;
        forwarders { 2a0a:e5c0:2:f::a; };
};
```

### Step 4: adjusting the zone file

In case you are running an IPv6 only server, you need to configure the
upstream DNS server. In our case this looks as follows:

```
; The domain: c1.k8s.ooo
c1                          NS   kube-dns.kube-system.svc.c1

; The IPv6 only DNS server
kube-dns.kube-system.svc.c1 AAAA 2a0a:e5c0:2:f::a

; The forwarding IPv4 server
kube-dns.kube-system.svc.c1 A    194.5.220.43
```
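
To check the result, you could query the dual stack server from the
outside (server and record names here are hypothetical placeholders;
c1.k8s.ooo is the zone from this article):

```
dig @dns.example.com some-service.c1.k8s.ooo AAAA
```

If the forwarding works, the answer comes from the otherwise
unreachable CoreDNS server behind it.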

## DNS, IPv6, Kubernetes?

If you are curious to learn more about either of these topics, feel
[free to join us on our chat](/u/projects/open-chat/).
Binary file not shown.
After Width: | Height: | Size: 154 KiB
210
content/u/blog/ipv6-link-local-support-in-browsers/contents.lr
Normal file
@ -0,0 +1,210 @@
title: Support for IPv6 link local addresses in browsers
---
pub_date: 2021-06-14
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:

Tracking the progress of browser support for link local addresses
---
body:

## Introduction

Link local addresses
([fe80::/10](https://en.wikipedia.org/wiki/Link-local_address)) are
used for addressing devices in your local subnet. They can be
automatically generated, and using the IPv6 multicast address
**ff02::1**, all hosts on the local subnet can easily be located.
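
For instance, on a typical Linux host you can discover the neighbours
on the link like this (eth0 being a placeholder for your interface):

```
ping6 -c2 ff02::1%eth0
```

All hosts on that link that answer multicast pings will reply with
their link local addresses.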

However, browsers like Chrome or Firefox do not support **entering
link local addresses inside a URL**, which prevents accessing devices
locally with a browser, for instance for configuring them.

Link local addresses need **zone identifiers** to specify which
network device to use as an outgoing interface. This is because
**you have link local addresses on every interface** and your network
stack does not know on its own which interface to use. So typically a
link local address is something along the lines of
**fe80::fae4:e3ff:fee2:37a4%eth0**, where **eth0** is the zone
identifier.

The problem is becoming more pronounced as the world moves more
and more towards **IPv6 only networks**.

You might not even know the address of your network equipment anymore,
but you can easily locate it using the **ff02::1 multicast
address**. So we need support in browsers to allow network
configuration.

## Status of implementation

The main purpose of this document is to track the status of
link-local address support in the different browsers and related
standards. The current status is:

* Firefox says whatwg did not define it
* Whatwg says the zone id is intentionally omitted and references w3.org
* w3.org has a longer reasoning, but it basically boils down to
  "Firefox and Chrome don't do it and it's complicated and nobody needs it"
* Chromium says it seems not to be worth the effort

Given that chain of events, if either Firefox, Chrome, w3.org or
whatwg were to add support for it, it seems likely that the others
would follow.

## IPv6 link local address support in Firefox

The progress of IPv6 link local addresses for Firefox is tracked
on [the mozilla
bugzilla](https://bugzilla.mozilla.org/show_bug.cgi?id=700999). The
current situation is that Firefox references the lack of
standardisation by whatwg as the reason for not implementing it.
Quoting Valentin Gosu from the Mozilla team:

```
The main reason the zone identifier is not supported in Firefox is
that parsing URLs is hard. You'd think we can just pass whatever
string to the system API and it will work or fail depending on whether
it's valid or not, but that's not the case. In bug 1199430 for example
it was apparent that we need to make sure that the hostname string is
really valid before passing it to the OS.

I have no reason to oppose zone identifiers in URLs as long as the URL
spec defines how to parse them. As such, I encourage you to engage
with the standard at https://github.com/whatwg/url/issues/392 instead
of here.

Thank you!
```

## IPv6 link local address support in whatwg

The situation at [whatwg](https://whatwg.org/) is that there is a
[closed bug report on github](https://github.com/whatwg/url/issues/392)
and [the spec says](https://url.spec.whatwg.org/#concept-ipv6)
that

Support for <zone_id> is intentionally omitted.

That paragraph links to a bug registered at w3.org (see next chapter).

## IPv6 link local address support at w3.org

At [w3.org](https://www.w3.org/) there is a
bug titled
[Support IPv6 link-local
addresses?](https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2)
that is set to status **RESOLVED WONTFIX**. It was closed basically
based on the following statement from Ryan Sleevi:

```
Yes, we're especially not keen to support these in Chrome and have
repeatedly decided not to. The platform-specific nature of <zone_id>
makes it difficult to impossible to validate the well-formedness of
the URL (see https://tools.ietf.org/html/rfc4007#section-11.2 , as
referenced in 6874, to fully appreciate this special hell). Even if we
could reliably parse these (from a URL spec standpoint), it then has
to be handed 'somewhere', and that opens a new can of worms.

Even 6874 notes how unlikely it is to encounter these in practice -
"Thus, URIs including a
ZoneID are unlikely to be encountered in HTML documents. However, if
they do (for example, in a diagnostic script coded in HTML), it would
be appropriate to treat them exactly as above."

Note that a 'dumb' parser may not be sufficient, as the Security Considerations of 6874 note:
"To limit this risk, implementations MUST NOT allow use of this format
except for well-defined usages, such as sending to link-local
addresses under prefix fe80::/10. At the time of writing, this is
the only well-defined usage known."

And also
"An HTTP client, proxy, or other intermediary MUST remove any ZoneID
attached to an outgoing URI, as it has only local significance at the
sending host."

This requires a transformative rewrite of any URLs going out the
wire. That's pretty substantial. Anne, do you recall the bug talking
about IP canonicalization (e.g. http://127.0.0.1 vs
http://[::127.0.0.1] vs http://012345 and friends?) This is
conceptually a similar issue - except it's explicitly required in the
context of <zone_id> that the <zone_id> not be emitted.

There's also the issue that zone_id precludes/requires the use of APIs
that user agents would otherwise prefer to avoid, in order to
'properly' handle the zone_id interpretation. For example, Chromium on
some platforms uses a built in DNS resolver, and so our address lookup
functions would need to define and support <zone_id>'s and map them to
system concepts. In doing so, you could end up with weird situations
where a URL works in Firefox but not Chrome, even though both
'hypothetically' supported <zone_id>'s, because FF may use an OS
routine and Chrome may use a built-in routine and they diverge.

Overall, our internal consensus is that <zone_id>'s are bonkers on
many grounds - the technical ambiguity (and RFC 6874 doesn't really
resolve the ambiguity as much as it fully owns it and just says
#YOLOSWAG) - and supporting them would add a lot of complexity for
what is explicitly and admittedly a limited value use case.
```

This bug references the Mozilla Firefox bug above and
[RFC 3986 (updated by RFC
6874)](https://datatracker.ietf.org/doc/html/rfc6874#section-2).

## IPv6 link local address support in Chrome / Chromium

On the Chrome side there is a
[huge bug
report](https://bugs.chromium.org/p/chromium/issues/detail?id=70762)
which in turn references a huge number of other bugs that request
IPv6 link local support, too.

The bug was closed by cbentzel@chromium.org stating:

```
There are a large number of special cases which are required on core
networking/navigation/etc. and it does not seem like it is worth the
up-front and ongoing maintenance costs given that this is a very
niche - albeit legitimate - need.
```

The bug at chromium has been made un-editable, so it is basically
frozen, although people have added suggestions to the ticket on how to
solve it.

## Work Arounds

### IPv6 link local connect hack

Peter has [documented the IPv6 link local connect
hack](https://website.peterjin.org/wiki/Snippets:IPv6_link_local_connect_hack)
to make Firefox use **fe90:0:[scope id]:[IP address]** to reach
**fe80::[IP address]%[scope id]**. Check out his website for details!

### IPv6 hack using ip6tables

Also from Peter is the hint that you can use newer ip6tables
versions to achieve a similar mapping:

"On modern Linux kernels you can also run

```ip6tables -t nat -A OUTPUT -d fef0::/64 -j NETMAP --to fe80::/64```

if you have exactly one outbound interface, so that fef0::1 translates
to fe80::1"

Thanks again for the pointer!

## Other resources

If you are aware of other resources regarding IPv6 link local support
in browsers, please join the [IPv6.chat](https://IPv6.chat) and let us
know about it.
144
content/u/blog/kubernetes-dns-entries-nat64/contents.lr
Normal file
@ -0,0 +1,144 @@
title: Automatic A and AAAA DNS entries with NAT64 for kubernetes?
---
pub_date: 2021-06-24
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:

Given a kubernetes cluster and NAT64 - how do you create DNS entries?
---
body:

## The DNS kubernetes quiz

Today our blog entry does not (yet) show a solution, but rather a
tricky quiz on creating DNS entries. The problem to solve is the
following:

* How do you make every IPv6 only service in kubernetes also IPv4
  reachable?

Let's see who can solve it first or most elegantly. Below are some
thoughts on how to approach this problem.

## The situation

Assume your kubernetes cluster is IPv6 only and all services
have proper AAAA DNS entries. This allows you
[to directly receive traffic from the
Internet](/u/blog/kubernetes-without-ingress/) to
your kubernetes services.

Now, to make such a service also IPv4 reachable, we can deploy a NAT64
service that maps an IPv4 address outside the cluster to an IPv6
service address inside the cluster:

```
A.B.C.D --> 2001:db8::1
```

So all traffic to that IPv4 address is converted to IPv6 by the
external NAT64 translator.

## The proxy service

Let's say the service running on 2001:db8::1 is named "ipv4-proxy" and
thus reachable at ipv4-proxy.default.svc.example.com.

What we want to achieve is to expose every possible service
inside the cluster **also via IPv4**. For this purpose we have created
an haproxy container that accepts *.svc.example.com and forwards it
via IPv6.

So the actual flow would look like:

```
IPv4 client --[ipv4]--> NAT64 -[ipv6]-> proxy service
                                             |
                                             |
                                             v
IPv6 client ---------------------> kubernetes service
```

## The DNS dilemma

It would be very tempting to create a wildcard DNS entry or to
configure/patch CoreDNS to also include an A entry for every service,
that is:

```
*.svc IN A A.B.C.D
```

So essentially all services resolve to the IPv4 address A.B.C.D. That,
however, would also influence the kubernetes cluster itself, as pods
potentially resolve A entries (not only AAAA) as well.

As the containers / pods do not have any IPv4 address (nor IPv4
routing), access to IPv4 is not possible. There are various outcomes
of this situation:

1. The software in the container does happy eyeballs and tries both
   A/AAAA and uses the working IPv6 connection.

2. The software in the container misbehaves, takes the first record
   and uses IPv4 (nodejs is known to have, or have had, a broken
   resolver that did exactly that).

So adding that wildcard might not be the smartest option. And
additionally, it is unclear whether CoreDNS would support it.

## Alternative automatic DNS entries

The *.svc names in a kubernetes cluster are special in the sense that
they are used for connecting internally. What if CoreDNS (or any other
DNS server) would, instead of using *.svc, use a second subdomain like
*abc*.*namespace*.v4andv6.example.com and generate the same AAAA
record as for the service, plus a static A record as described above?

That could solve the problem. But again, does CoreDNS support that?

## Automated DNS entries in other zones

Instead of fully automatically creating the entries as above, another
option would be to specify DNS entries via annotations in a totally
different zone, if CoreDNS supported this. So let's say we also
have control over example.org, and we could instruct CoreDNS to create
the following entries automatically with an annotation:

```
abc.something.example.org AAAA <same as the service IP>
abc.something.example.org A    <a static IPv4 address A.B.C.D>
```

In theory this might be solved via some scripting, maybe via a DNS
server like PowerDNS?

## Alternative solution with BIND

The BIND DNS server, which is not usually deployed in a kubernetes
cluster, supports **views**. Views enable different replies to the
same query depending on the source IP address. Thus, in theory,
something like the following could be done, assuming a secondary zone
*example.org*:

* If the request comes from the kubernetes cluster, return a CNAME
  back to example.com.
* If the request comes from outside the kubernetes cluster, return an
  A entry with the static IP
* Unsolved: how to match on the AAAA entries (because we don't CNAME
  with the added A entry)
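
A rough named.conf sketch of the view idea (the client range and file
names are placeholders, and it intentionally leaves the AAAA question
above open):

```
view "cluster" {
        // assumed kubernetes pod/service range
        match-clients { 2001:db8:42::/48; };
        zone "example.org" {
                type master;
                file "example.org.cluster";   // contains the CNAMEs
        };
};

view "world" {
        match-clients { any; };
        zone "example.org" {
                type master;
                file "example.org.world";     // contains the static A records
        };
};
```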
|
||||||
|
|
||||||
|
|
||||||
|
## Other solution?

As you can see, mixing dynamic IP generation and coupling it with
static DNS entries for IPv4 resolution is not the easiest of tasks. If
you have a smart idea on how to solve this without manually creating
entries for each and every service,
[give us a shout!](/u/contact)
title: Making kubernetes kube-dns publicly reachable
---
pub_date: 2021-06-13
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:
Looking into IPv6 only DNS provided by kubernetes
---
body:
## Introduction

If you have seen our
[article about running kubernetes
Ingress-less](/u/blog/kubernetes-without-ingress/), you are aware that
we are pushing IPv6 only kubernetes clusters at ungleich.

Today, we are looking at making the "internal" kube-dns service world
reachable using IPv6 and global DNS servers.

## The kubernetes DNS service

If you have a look at your typical k8s cluster, you will notice that
you usually have two coredns pods running:

```
% kubectl -n kube-system get pods -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-558bd4d5db-gz5c7   1/1     Running   0          6d
coredns-558bd4d5db-hrzhz   1/1     Running   0          6d
```

These pods are usually served by the **kube-dns** service:

```
% kubectl -n kube-system get svc -l k8s-app=kube-dns
NAME       TYPE        CLUSTER-IP           EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   2a0a:e5c0:13:e2::a   <none>        53/UDP,53/TCP,9153/TCP   6d1h
```

As you can see, the kube-dns service is running on a publicly
reachable IPv6 address.
## IPv6 only DNS

IPv6 only DNS servers have one drawback: they cannot be reached via DNS
recursion if the resolver is IPv4 only.

At [ungleich we run a variety of
services](https://redmine.ungleich.ch/projects/open-infrastructure/wiki)
to make IPv6 only services usable in the real world. In the case of DNS,
we are using **DNS forwarders**. They act similarly to HTTP
proxies, but for DNS.

So on our main DNS servers, dns1.ungleich.ch, dns2.ungleich.ch
and dns3.ungleich.ch, we have added the following configuration:

```
zone "k8s.place7.ungleich.ch" {
    type forward;
    forward only;
    forwarders { 2a0a:e5c0:13:e2::a; };
};
```

This tells the DNS servers to forward DNS queries that come in for
k8s.place7.ungleich.ch to **2a0a:e5c0:13:e2::a**.

Additionally we have added a **DNS delegation** in the
place7.ungleich.ch zone:

```
k8s    NS    dns1.ungleich.ch.
k8s    NS    dns2.ungleich.ch.
k8s    NS    dns3.ungleich.ch.
```
## Using the kubernetes DNS service in the wild

With this configuration, we can now access IPv6 only
kubernetes services directly from the Internet. Let's first discover
the kube-dns service itself:

```
% dig kube-dns.kube-system.svc.k8s.place7.ungleich.ch. aaaa

; <<>> DiG 9.16.16 <<>> kube-dns.kube-system.svc.k8s.place7.ungleich.ch. aaaa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23274
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: f61925944f5218c9ac21e43960c64f254792e60f2b10f3f5 (good)
;; QUESTION SECTION:
;kube-dns.kube-system.svc.k8s.place7.ungleich.ch. IN AAAA

;; ANSWER SECTION:
kube-dns.kube-system.svc.k8s.place7.ungleich.ch. 27 IN AAAA 2a0a:e5c0:13:e2::a

;; AUTHORITY SECTION:
k8s.place7.ungleich.ch. 13 IN NS kube-dns.kube-system.svc.k8s.place7.ungleich.ch.
```

As you can see, the **kube-dns** service in the **kube-system**
namespace resolves to 2a0a:e5c0:13:e2::a, which is exactly what we
have configured.

At the moment, there is also an etherpad test service
named "ungleich-etherpad" running:

```
% kubectl get svc -l app=ungleichetherpad
NAME                TYPE        CLUSTER-IP              EXTERNAL-IP   PORT(S)    AGE
ungleich-etherpad   ClusterIP   2a0a:e5c0:13:e2::b7db   <none>        9001/TCP   3d19h
```

Let's first verify that it resolves:

```
% dig +short ungleich-etherpad.default.svc.k8s.place7.ungleich.ch aaaa
2a0a:e5c0:13:e2::b7db
```

And if that works, well, then we should also be able to access the
service itself!

```
% curl -I http://ungleich-etherpad.default.svc.k8s.place7.ungleich.ch:9001/
HTTP/1.1 200 OK
X-Powered-By: Express
X-UA-Compatible: IE=Edge,chrome=1
Referrer-Policy: same-origin
Content-Type: text/html; charset=utf-8
Content-Length: 6039
ETag: W/"1797-Dq3+mr7XP0PQshikMNRpm5RSkGA"
Set-Cookie: express_sid=s%3AZGKdDe3FN1v5UPcS-7rsZW7CeloPrQ7p.VaL1V0M4780TBm8bT9hPVQMWPX5Lcte%2BzotO9Lsejlk; Path=/; HttpOnly; SameSite=Lax
Date: Sun, 13 Jun 2021 18:36:23 GMT
Connection: keep-alive
Keep-Alive: timeout=5
```

(Attention: this is a test service and might not be running when you
read this article at a later time.)
## IPv6 vs. IPv4

Could we have achieved the same with IPv4? The answer here is "maybe":
if the kubernetes service is reachable from globally reachable
nameservers via IPv4, then the answer is yes. This could be done via
public IPv4 addresses in the kubernetes cluster, via tunnels, VPNs,
etc.

However, generally speaking, the DNS service of a
kubernetes cluster running on RFC1918 IP addresses is probably not
reachable from globally reachable DNS servers by default.

For IPv6 the case is a bit different: we are using globally reachable
IPv6 addresses in our k8s clusters, so they can potentially be
reached without the need for any tunnel whatsoever. Firewalling
and network policies can obviously prevent access, but if the IP
addresses are properly routed, they will be accessible from the public
Internet.

And this makes things much easier for DNS servers that also
have IPv6 connectivity.

The following picture shows the practical difference between the two
approaches:

![](/u/image/k8s-v6-v4-dns.png)
## Does this make sense?

That clearly depends on your use-case. If you want your service DNS
records to be publicly accessible, then the clear answer is yes.

If your cluster services are intended to be internal only
(see the [previous blog post](/u/blog/kubernetes-without-ingress/)), then
exposing the DNS service to the world might not be the best option.
## Note on security

CoreDNS inside kubernetes is by default configured to allow resolving
for *any* client that can reach it. Thus if you make your kube-dns
service world reachable, you also turn it into an open resolver.

At the time of writing this blog article, the following coredns
configuration **does NOT** correctly block requests:

```
Corefile: |
  .:53 {
      acl k8s.place7.ungleich.ch {
          allow net ::/0
      }
      acl . {
          allow net 2a0a:e5c0:13::/48
          block
      }
      forward . /etc/resolv.conf {
          max_concurrent 1000
      }
  ...
```

Until this is solved, we recommend placing a firewall before your
public kube-dns service to only allow requests from the forwarding DNS
servers.
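
Such a firewall could, for example, be sketched with nftables on a
router in front of the cluster. This is an untested illustration: the
forwarder source addresses below are placeholders standing in for
dns1-dns3, not our real ones.

```
table inet k8s_dns {
    chain forward {
        type filter hook forward priority 0; policy accept;

        # allow DNS to the kube-dns service address only from the
        # forwarding DNS servers; drop everything else on port 53
        ip6 daddr 2a0a:e5c0:13:e2::a meta l4proto { tcp, udp } th dport 53 ip6 saddr != { 2001:db8::1, 2001:db8::2, 2001:db8::3 } drop
    }
}
```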
## More of this

We are discussing
kubernetes and IPv6 related topics in
**the #hacking:ungleich.ch Matrix channel**
([you can signup here if you don't have an
account](https://chat.with.ungleich.ch)) and will post more about our
k8s journey in this blog. Stay tuned!
122
content/u/blog/kubernetes-network-planning-with-ipv6/contents.lr
Normal file
title: Kubernetes Network planning with IPv6
---
pub_date: 2021-06-26
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: no
---
abstract:
Learn which networks are good to use with kubernetes
---
body:
## Introduction

While IPv6 has a huge address space, you will need to specify a
**podCidr** (the network for the pods) and a **serviceCidr** (the
network for the services) for kubernetes. In this blog article we show
our findings and give a recommendation on the "most sensible"
networks to use for kubernetes.

## TL;DR
## Kubernetes limitations

In a typical IPv6 network, you would "just assign a /64" to anything
that needs to be a network. It is somewhat the no-brainer way of
handling networking in IPv6.

However, kubernetes has a limitation:
[the serviceCidr cannot be bigger than a /108 at the
moment](https://github.com/kubernetes/kubernetes/pull/90115).
This is something very atypical for the IPv6 world, but nothing we
cannot handle. There are various pull requests and issues to fix this
behaviour on github, some of them listed below:

* https://github.com/kubernetes/enhancements/pull/1534
* https://github.com/kubernetes/kubernetes/pull/79993
* https://github.com/kubernetes/kubernetes/pull/90115 (this one is
  quite interesting to read)

That said, it is possible to use a /64 for the **podCidr**.
## The "correct way" without the /108 limitation

If kubernetes did not have this limitation, our recommendation would
be to use one /64 for the podCidr and one /64 for the serviceCidr. If
in the future the limitations of kubernetes have been lifted, skip
reading this article and just use two /64's.

Do not be tempted to suggest making /108's the default, even if they
"have enough space", because using /64's allows you to stay with much
simpler network plans.
## Sanity checking the /108

To be able to plan kubernetes clusters, it is important to know where
they should live, especially if you plan on having a lot of kubernetes
clusters. Let's have a short look at the /108 network limitation:

A /108 allows 20 bits to be used for generating addresses, or a total
of 1048576 hosts. This is probably enough for the number of services
in a cluster. Now, can we be consistent and also use a /108 for the
podCidr? Let's assume for the moment that we do exactly that, so we
run a maximum of 1048576 pods at the same time. Assuming each service
consumes on average 4 pods, this would allow one to run 262144
services.

Assuming each pod uses around 0.1 CPUs and 100Mi RAM, if all pods were
to run at the same time, you would need ca. 100'000 CPUs and 100 TB of
RAM. Assuming further that each node contains at most 128 CPUs and
at most 1 TB RAM (quite powerful servers), we would need more than
750 servers just for the CPUs.
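
These back-of-the-envelope numbers are easy to double check; the
per-pod figures (0.1 CPU, 100Mi RAM, 4 pods per service, 128 CPU
nodes) are just the assumptions from the paragraphs above:

```python
# Sanity check of the /108 capacity estimate (20 host bits).
pods = 2 ** 20                    # addresses in a /108: 1048576
services = pods // 4              # assuming ~4 pods per service
cpus = pods * 0.1                 # assuming 0.1 CPU per pod
ram_tib = pods * 100 / 1024 ** 2  # assuming 100Mi per pod, in TiB
servers = cpus / 128              # nodes with 128 CPUs each

print(f"{pods} pods, {services} services")
print(f"~{cpus:.0f} CPUs, ~{ram_tib:.0f} TiB RAM, ~{servers:.0f} servers")
```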

So we can reason that **we can** run kubernetes clusters of quite some
size even with a **podCidr of /108**.

## Organising /108's

Let's assume that we organise all our kubernetes clusters in a single
/64, like 2001:db8:1:2::/64, which looks like this:

```
% sipcalc 2001:db8:1:2::/64
-[ipv6 : 2001:db8:1:2::/64] - 0

[IPV6 INFO]
Expanded Address        - 2001:0db8:0001:0002:0000:0000:0000:0000
Compressed address      - 2001:db8:1:2::
Subnet prefix (masked)  - 2001:db8:1:2:0:0:0:0/64
Address ID (masked)     - 0:0:0:0:0:0:0:0/64
Prefix address          - ffff:ffff:ffff:ffff:0:0:0:0
Prefix length           - 64
Address type            - Aggregatable Global Unicast Addresses
Network range           - 2001:0db8:0001:0002:0000:0000:0000:0000 -
                          2001:0db8:0001:0002:ffff:ffff:ffff:ffff
```

A /108 network on the other hand looks like this:

```
% sipcalc 2001:db8:1:2::/108
-[ipv6 : 2001:db8:1:2::/108] - 0

[IPV6 INFO]
Expanded Address        - 2001:0db8:0001:0002:0000:0000:0000:0000
Compressed address      - 2001:db8:1:2::
Subnet prefix (masked)  - 2001:db8:1:2:0:0:0:0/108
Address ID (masked)     - 0:0:0:0:0:0:0:0/108
Prefix address          - ffff:ffff:ffff:ffff:ffff:ffff:fff0:0
Prefix length           - 108
Address type            - Aggregatable Global Unicast Addresses
Network range           - 2001:0db8:0001:0002:0000:0000:0000:0000 -
                          2001:0db8:0001:0002:0000:0000:000f:ffff
```

Assuming for a moment that we assign a /108, this looks as follows:
70
content/u/blog/kubernetes-production-cluster-1/contents.lr
Normal file
title: ungleich production cluster #1
---
pub_date: 2021-07-05
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: no
---
abstract:
In this blog article we describe our way to our first production
kubernetes cluster.
---
body:
## Introduction

This article is a WIP describing all steps required for our first
production kubernetes cluster and the services that we run in it.

## Setup

### Bootstrapping

* All nodes are running [Alpine Linux](https://alpinelinux.org)
* All nodes are configured using [cdist](https://cdi.st)
* Mainly installing kubeadm, kubectl, crio *and* docker
* At the moment we try to use crio
* The cluster is initialised using **kubeadm init --config
  k8s/c2/kubeadm.yaml** from the [ungleich-k8s repo](https://code.ungleich.ch/ungleich-public/ungleich-k8s)
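
For illustration, the networking part of such a kubeadm configuration
could look roughly like this; a sketch using documentation prefixes,
not a copy of the actual k8s/c2/kubeadm.yaml (which lives in the
ungleich-k8s repo):

```
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
networking:
  # example IPv6 only prefixes; note the /108 limit for services
  podSubnet: 2001:db8:1:2::/64
  serviceSubnet: 2001:db8:1:3::/108
```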
### CNI/Networking

* Calico is installed using **kubectl apply -f
  cni-calico/calico.yaml** from the [ungleich-k8s
  repo](https://code.ungleich.ch/ungleich-public/ungleich-k8s)
* Installing calicoctl using **kubectl apply -f
  https://docs.projectcalico.org/manifests/calicoctl.yaml**
* Aliasing calicoctl: **alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"**
* All nodes BGP peer with our infrastructure using **calicoctl create -f - < cni-calico/bgp-c2.yaml**
### Persistent Volume Claim support

* Provided by rook
* Using customized manifests to support IPv6 from ungleich-k8s

```
for yaml in crds common operator cluster storageclass-cephfs storageclass-rbd toolbox; do
    kubectl apply -f ${yaml}.yaml
done
```
### Flux

Starting with the 2nd cluster?
## Follow up

If you are interested in continuing the discussion,
we are there for you in
**the #kubernetes:ungleich.ch Matrix channel**
[you can signup here if you don't have an
account](https://chat.with.ungleich.ch).

Or if you are interested in an IPv6 only kubernetes cluster,
drop a mail to **support**-at-**ungleich.ch**.
201
content/u/blog/kubernetes-without-ingress/contents.lr
Normal file
title: Building Ingress-less Kubernetes Clusters
---
pub_date: 2021-06-09
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:

---
body:
## Introduction

On [our journey to build and define IPv6 only kubernetes
clusters](https://www.nico.schottelius.org/blog/k8s-ipv6-only-cluster/)
we came across some principles that seem awkward in the IPv6 only
world. Let us today have a look at the *LoadBalancer* and *Ingress*
concepts.

## Ingress

Let's have a look at the [Ingress
definition](https://kubernetes.io/docs/concepts/services-networking/ingress/)
from the kubernetes website:

```
Ingress exposes HTTP and HTTPS routes from outside the cluster to
services within the cluster. Traffic routing is controlled by rules
defined on the Ingress resource.
```

So the Ingress basically routes from outside to inside. But in the
IPv6 world, services are already publicly reachable. It just
depends on your network policy.

### Update 2021-06-13: Ingress vs. Service

As some people pointed out (thanks a lot!), a public service is
**not the same** as an Ingress. An Ingress also has the possibility to
route based on layer 7 information like the path, domain name, etc.

However, if all of the traffic from an Ingress points to a single
IPv6 HTTP/HTTPS Service, effectively the IPv6 service will do the
same, with one hop less.

## Services

Let's have a look at what services in IPv6 only clusters look like:

```
% kubectl get svc
NAME            TYPE        CLUSTER-IP              EXTERNAL-IP   PORT(S)    AGE
etherpad        ClusterIP   2a0a:e5c0:13:e2::a94b   <none>        9001/TCP   19h
nginx-service   ClusterIP   2a0a:e5c0:13:e2::3607   <none>        80/TCP     43h
postgres        ClusterIP   2a0a:e5c0:13:e2::c9e0   <none>        5432/TCP   19h
...
```

All these services are world reachable, depending on your network
policy.

## ServiceTypes

While we are looking at the k8s primitives, let's have a closer
look at the **Service**, specifically at 3 of the **ServiceTypes**
supported by k8s, including their definitions:

### ClusterIP

The k8s website says:

```
Exposes the Service on a cluster-internal IP. Choosing this value
makes the Service only reachable from within the cluster. This is the
default ServiceType.
```

So in the context of IPv6, this sounds wrong. There is nothing that
makes a global IPv6 address "internal", besides possible network
policies. The concept probably comes from the strict difference between
the RFC1918 space usually used in k8s clusters and public IPv4.

This difference does not make a lot of sense in the IPv6 world though.
Seeing **services as public by default** makes much more sense.
And simplifies your clusters a lot.

### NodePort

Let's first have a look at the definition again:

```
Exposes the Service on each Node's IP at a static port (the
NodePort). A ClusterIP Service, to which the NodePort Service routes,
is automatically created. You'll be able to contact the NodePort
Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
```

Conceptually this can be utilised similarly in the IPv6 only world
as in the IPv4 world. However, given that there are enough
addresses available with IPv6, this might not be such an interesting
ServiceType anymore.

### LoadBalancer

Before we have a look at this type, let's take some steps back
first to ...

## ... Load Balancing

There are a variety of possibilities to do load balancing: from simple
round robin, to ECMP based load balancing, to application aware,
potentially weighted load balancing.

So for load balancing, there is usually more than one solution, and
there is likely no one-size-fits-all.

So with this said, let's have a look at the
**ServiceType LoadBalancer** definition:

```
Exposes the Service externally using a cloud provider's load
balancer. NodePort and ClusterIP Services, to which the external load
balancer routes, are automatically created.
```

So whatever the cloud provider offers can be used, and that is a good
thing. However, let's have a look at how you get load balancing for
free in IPv6 only clusters:

## Load Balancing in IPv6 only clusters

So what is the easiest way to do reliable load balancing in a network?
[ECMP (equal cost multi path)](https://en.wikipedia.org/wiki/Equal-cost_multi-path_routing)
comes to mind right away. Given that
kubernetes nodes can BGP peer with the network (upstream or the
switches), this basically gives load balancing to the world for free:

```
                    [ The Internet ]
                           |
[ k8s-node-1 ]-----------[ network ]-----------[ k8s-node-n ]
                         [ ECMP ]
                           |
                    [ k8s-node-2 ]
```

In the real world, on a bird based BGP upstream router,
this looks as follows:
```
[18:13:02] red.place7:~# birdc show route
BIRD 2.0.7 ready.
Table master6:
...
2a0a:e5c0:13:e2::/108 unicast [place7-server1 2021-06-07] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                      unicast [place7-server4 2021-06-08] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                      unicast [place7-server2 2021-06-07] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
                      unicast [place7-server3 2021-06-07] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
...
```

Which results in the following kernel route:

```
2a0a:e5c0:13:e2::/108 proto bird metric 32
        nexthop via 2a0a:e5c0:13:0:224:81ff:fee0:db7a dev eth0 weight 1
        nexthop via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 dev eth0 weight 1
        nexthop via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 dev eth0 weight 1
        nexthop via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc dev eth0 weight 1 pref medium
```

## TL;DR

We know, a TL;DR at the end is not the right thing to do, but hey, we
are at ungleich, aren't we?

In a nutshell, with IPv6 the concepts of **Ingress**,
**Service** and the **LoadBalancer** ServiceType
need to be revised, as IPv6 allows direct access without having
to jump through hoops.

If you are interested in continuing the discussion,
we are there for you in
**the #hacking:ungleich.ch Matrix channel**
[you can signup here if you don't have an
account](https://chat.with.ungleich.ch).

Or if you are interested in an IPv6 only kubernetes cluster,
drop a mail to **support**-at-**ungleich.ch**.
title: Building stateless redundant IPv6 routers
---
pub_date: 2021-04-21
---
author: ungleich virtualisation team
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: no
---
abstract:
It's time for IPv6 in docker, too.
---
body:
```
interface eth1.2
{
    AdvSendAdvert on;
    MinRtrAdvInterval 3;
    MaxRtrAdvInterval 5;
    AdvDefaultLifetime 10;

    prefix 2a0a:e5c0:0:0::/64 { };
    prefix 2a0a:e5c0:0:10::/64 { };

    RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
    DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; };
};
```
title: Accessing IPv4 only hosts via IPv6
---
pub_date: 2021-02-28
---
110
content/u/products/ungleich-sla/contents.lr
Normal file
_discoverable: no
---
_hidden: no
---
title: ungleich SLA levels
---
subtitle: ungleich service level agreements
---
description1:

What is the right SLA (service level agreement) for you? At ungleich
we know that every organisation has individual needs and resources.
Depending on your need, we offer different types of service level
agreements.
## The standard SLA

If not otherwise specified in the product or service you acquired from
us, the standard SLA will apply. This SLA covers standard operations
and is suitable for non-critical deployments. The standard SLA covers:

* Target uptime of all services: 99.9%
* Service level: best effort
* Included for all products
* Support via support@ungleich.ch (answered 9-17 on work days)
* Individual development and support available at a standard rate of 220 CHF/h
* No telephone support

---
feature1_title: Bronze SLA
---
feature1_text:

The business SLA is suited for running regular applications with a
focus on business continuity and individual support. Compared to the
standard SLA it **guarantees you responses within 5 hours** on work
days. You can also **reach our staff at extended hours**.

---
feature2_title: Enterprise SLA
---
feature2_text:

The Enterprise SLA is right for you if you need high availability, but
you don't require instant reaction times from our team.

How this works:

* All services are set up in a high availability setup (additional
  charges for resources apply)
* The target uptime of services: 99.99%

---
feature3_title: High Availability (HA) SLA
---
feature3_text:

If your application is mission critical, this is the right SLA for
you. The **HA SLA** guarantees high availability, multi location
deployments with cross-datacenter backups and fast reaction times,
24 hours per day.

---
offer1_title: Business SLA
---
offer1_text:

* Target uptime of all services: 99.9%
* Service level: guaranteed reaction within 1 business day
* Development/Support: 180 CHF/h
* Telephone support (8-18 on work days)
* Mail support (8-18 on work days)
* Optional out of business hours hotline (360 CHF/h)
* 3'000 CHF/6 months

---
offer1_link: https://ungleich.ch/u/contact/
---
offer2_title: Enterprise SLA
---
offer2_text:

* Requires a high availability setup for all services, with separate pricing
* Service level: reaction within 4 hours
* Telephone support (24x7)
* Services are provided in multiple data centers
* Included out of business hours hotline (180 CHF/h)
* 18'000 CHF/6 months

---
offer2_link: https://ungleich.ch/u/contact/
---
offer3_title: HA SLA
---
offer3_text:

* Uptime guarantees >= 99.99%
* Ticketing system reaction time < 3h
* 24x7 telephone support
* Applications running in multiple data centers
* Minimum monthly fee: 3000 CHF (according to individual service definition)

Individual pricing. Contact us on support@ungleich.ch for an individual
quote and we will get back to you.

---
offer3_link: https://ungleich.ch/u/contact/
Checkout the [SBB
page](https://www.sbb.ch/de/kaufen/pages/fahrplan/fahrplan.xhtml?von=Zurich&nach=Diesbach-Betschwanden)
for the next train.

The address is:

```
Hacking Villa
Hauptstrasse 28
8777 Diesbach
Switzerland
```

---
content1_image: hacking-villa-diesbach.jpg
---
|
||||||
we created the **Hacking & Learning channel** which can be found at
|
we created the **Hacking & Learning channel** which can be found at
|
||||||
**#hacking-and-learning:ungleich.ch**.
|
**#hacking-and-learning:ungleich.ch**.
|
||||||
|
|
||||||
|
## Kubernetes
|
||||||
|
|
||||||
|
Recently (in 2021) we started to run Kubernetes cluster at
|
||||||
|
ungleich. We share our experiences in **#kubernetes:ungleich.ch**.
|
||||||
|
|
||||||
|
## Ceph
|
||||||
|
|
||||||
|
To exchange experiences and trouble shooting for ceph, we are running
|
||||||
|
**#ceph:ungleich.ch**.
|
||||||
|
|
||||||
## cdist
|
## cdist
|
||||||
|
|
||||||
We meet for cdist discussions about using, developing and more
|
We meet for cdist discussions about using, developing and more
|
||||||
|
@ -57,7 +67,7 @@ We discuss topics related to sustainability in
|
||||||
|
|
||||||
## More channels
|
## More channels
|
||||||
|
|
||||||
* The main / hangout channel is **o#town-square:ungleich.ch** (also bridged
|
* The main / hangout channel is **#town-square:ungleich.ch** (also bridged
|
||||||
to Freenode IRC as #ungleich and
|
to Freenode IRC as #ungleich and
|
||||||
[discord](https://discord.com/channels/706144469925363773/706144469925363776))
|
[discord](https://discord.com/channels/706144469925363773/706144469925363776))
|
||||||
* The bi-yearly hackathon Hack4Glarus can be found in
|
* The bi-yearly hackathon Hack4Glarus can be found in
|
||||||
|
|