Merge branch 'master' of code.ungleich.ch:ungleich-public/ungleich-staticcms
This commit is contained in: commit bc5fc19ca7
21 changed files with 2216 additions and 2 deletions
BIN
assets/u/image/k8s-v6-v4-dns.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 88 KiB |
162
content/u/blog/datacenterlight-active-active-routing/contents.lr
Normal file
@@ -0,0 +1,162 @@
title: Active-Active Routing Paths in Data Center Light
---
pub_date: 2019-11-08
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
---
body:

From our last two blog articles (a, b) you probably already know that
it is spring network cleanup in [Data Center Light](https://datacenterlight.ch).

In the [first blog article]() we described where we started and in
the [second blog article]() you could see how we switched our
infrastructure to IPv6 only netboot.

In this article we dive a bit deeper into the details of our
network architecture and the problems we face with active-active
routers.

## Network architecture

Let's have a look at a simplified (!) diagram of the network:

... IMAGE

Doesn't look that simple, does it? Let's break it down into small
pieces.

## Upstream routers

We have a set of **upstream routers** which work stateless. They don't
have any stateful firewall rules, so both of them can work actively
without state synchronisation. Moreover, these are fast routers:
besides forwarding, they also do **BGP peering** with the data center
upstreams.

Overall, the upstream routers are very simple machines, mostly running
bird and forwarding packets all day. They also provide a DNS service
(resolving and authoritative), because they are always up and can
announce service IPs via BGP or via OSPF to our network.

## Internal routers

The internal routers on the other hand provide **stateful routing**,
**IP address assignments** and **netboot services**. They are a bit
more complicated compared to the upstream routers, but they carry only
a small routing table.

## Communication between the routers

All routers employ OSPF and BGP for route exchange. Thus the two
upstream routers learn about the internal networks (IPv6 only, as
usual) from the internal routers.

## Sessions

Sessions in networking are almost always an evil. You need to store
them (at high speed), you need to maintain them (updating, deleting)
and if you run multiple routers, you even need to synchronise them.

In our case the internal routers do have session handling, as they
provide a stateful firewall. As we are using a multi-router setup,
things can go really wrong if the wrong routes are being used.

Let's have a look at this a bit more in detail.

## The good path

IMAGE2: good

If a server sends out a packet via router1 and router1 eventually
receives the answer, everything is fine. The returning packet matches
the state entry that was created by the outgoing packet and the
internal router forwards the packet.

## The bad path

IMAGE3: bad

However, if the answer comes back via router2 instead, router2 has no
matching state entry: its stateful firewall treats the packet as
invalid and drops it.
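To see which sessions a router is currently tracking, the conntrack
CLI from the already mentioned conntrack-tools can dump the state
table; a minimal check on a Linux router might look like this:

```
# dump the connection tracking table on router1
conntrack -L

# show the number of currently tracked sessions
conntrack -C
```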
## Routing paths

If we want to go active-active routing, the server can choose
either internal router for sending out the packet. The internal
routers again have two upstream routers. So with the return path
included, the following paths exist for a packet:

Outgoing paths:

* servers->router1->upstream router1->internet
* servers->router1->upstream router2->internet
* servers->router2->upstream router1->internet
* servers->router2->upstream router2->internet

And the returning paths are:

* internet->upstream router1->router 1->servers
* internet->upstream router1->router 2->servers
* internet->upstream router2->router 1->servers
* internet->upstream router2->router 2->servers

So on average, 50% of the routes will hit the right router on
return. However, neither the servers nor the upstream routers use load
balancing like ECMP, so once an incorrect path has been chosen, the
packet loss is 100%.

## Session synchronisation

In the first article we talked a bit about keepalived and that
it helps to operate routers in an active-passive mode. This did not
turn out to be the most reliable method. Can we do better with
active-active routers and session synchronisation?

Linux supports this using
[conntrackd](http://conntrack-tools.netfilter.org/). However,
conntrackd supports active-active routers on a **flow based** level,
but not on a **packet based** level. The difference is that the
following will not work in active-active routers with conntrackd:

```
#1 Packet (in the original direction) updates state in Router R1 ->
   submit state to R2
#2 Packet (in the reply direction) arrive to Router R2 before state
   coming from R1 has been digested.

With strict stateful filtering, Packet #2 will be dropped and it will
trigger a retransmission.
```
(quote from Pablo Neira Ayuso, see below for more details)

Some of you will mumble something like **latency** in their head right
now. If the return packet is guaranteed to arrive after the state
synchronisation, then everything is fine. However, if the reply is
faster than the state synchronisation, packets will get dropped.

In reality, this works for packets coming from and going to the
Internet. However, in our setup the upstream routers route between
different data center locations, which are in the sub-microsecond
latency area - i.e. LAN speed, because they are interconnected with
dark fiber links.
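For reference, flow-based state replication with conntrackd is
configured roughly like this - a minimal sketch along the lines of the
conntrack-tools example configuration; the addresses and interface
names are placeholders, not our production values:

```
Sync {
    Mode FTFW {
        # reliable mode: state updates missed by the peer are resent
    }
    Multicast {
        # replication channel between the two routers
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 10.0.0.1
        Interface eth2
    }
}
```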
## Take away

Before moving on to the next blog article, we would like to express
our thanks to Pablo Neira Ayuso, who gave very important input on
session based firewalls and session synchronisation.

So active-active routing does not seem to have a straightforward
solution. Read in the [next blog
article](/u/blog/datacenterlight-redundant-routing-infrastructure)
how we solved the challenge in the end.
219
content/u/blog/datacenterlight-ipv6-only-netboot/contents.lr
Normal file
@@ -0,0 +1,219 @@
title: IPv6 only netboot in Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
How we switched from IPv4 netboot to IPv6 netboot
---
body:

In our [previous blog
article](/u/blog/datacenterlight-spring-network-cleanup)
we wrote about our motivation for the
big spring network cleanup. In this blog article we show how we
started reducing the complexity by removing our dependency on IPv4.

## IPv6 first

If you have found our blog, you are probably aware: everything at
ungleich is IPv6 first. Many of our networks are IPv6 only, all DNS
entries for remote access have IPv6 (AAAA) entries and there are only
rare exceptions where we utilise IPv4 for our infrastructure.

## IPv4 only netboot

One of the big exceptions to this paradigm used to be how we boot our
servers. Because our second big paradigm is sustainability, we use a
lot of 2nd (or 3rd) generation hardware. We actually share this
passion with our friends from
[e-durable](https://recycled.cloud/), because sustainability is
something that we need to employ today and not tomorrow.
But back to the netbooting topic: for netbooting we mainly
relied on onboard network cards so far.

## Onboard network cards

We used these network cards for multiple reasons:

* they exist in virtually any server
* they usually have a ROM containing a PXE capable firmware
* they allow us to split real traffic (on the fiber cards) from internal traffic

However, using the onboard devices also comes with a couple of disadvantages:

* their ROM is often outdated
* they require additional cabling

## Cables

Let's have a look at the cabling situation first. Virtually all of
our servers are connected to the network using 2x 10 Gbit/s fiber cards.

On one side this provides a fast connection, but on the other side
it provides us with something even better: distance.

Our data centers employ a non-standard design due to the re-use of
existing factory halls. This means distances between servers and
switches can be up to 100m. With fiber, we can easily achieve these
distances.

Additionally, having fewer cables gives a simpler infrastructure
that is easier to analyse.

## Disabling onboard network cards

So can we somehow get rid of the copper cables and switch to fiber
only? It turns out that the fiber cards we use (mainly Intel X520's)
have their own ROM. So we started disabling the onboard network cards
and tried booting from the fiber cards. This worked until we wanted to
move the lab setup to production...

## Bonding (LACP) and VLAN tagging

Our servers use bonding (802.3ad) for redundant connections to the
switches and VLAN tagging on top of the bonded devices to isolate
client traffic. On the switch side we realised this using
configurations like

```
interface Port-Channel33
   switchport mode trunk
   mlag 33
...
interface Ethernet33
   channel-group 33 mode active
```

But that does not work if the network ROM at boot does not create an
LACP enabled link on top of which it should be doing VLAN tagging.

The ROM in our network cards **would** have allowed VLAN tagging alone
though.

To fix this problem, we reconfigured our switches as follows:

```
interface Port-Channel33
   switchport trunk native vlan 10
   switchport mode trunk
   port-channel lacp fallback static
   port-channel lacp fallback timeout 20
   mlag 33
```

This basically does two things:

* if there are no LACP frames, fall back to a static (non-LACP)
  configuration
* accept untagged traffic and map it to VLAN 10 (one of our boot networks)

Great, our servers can now netboot from fiber! But we are not done
yet...

## IPv6 only netbooting

So how do we convince these network cards to do IPv6 netboot? Can we
actually do that at all? Our first approach was to put a custom build of
[ipxe](https://ipxe.org/) on a USB stick. We generated that
ipxe image using the **rebuild-ipxe.sh** script
from the
[ungleich-tools](https://code.ungleich.ch/ungleich-public/ungleich-tools)
repository. Turns out using a USB stick works pretty well for most
situations.
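The script embedded in such an image is tiny; a minimal sketch, using
the boot server address from the DHCPv6 configuration shown further
below, could look like this:

```
#!ipxe
# configure networking via router advertisements / DHCPv6
dhcp
# then chain-load the real boot script over HTTP
chain http://[2a0a:e5c0:0:6::46]/ipxescript
```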
## ROMs are not ROMs

As you can imagine, the ROM of the X520 cards does not contain IPv6
netboot support. So are we back at square one? No, we are not. Because
the X520's have something that the onboard devices did not
consistently have: **a rewritable memory area**.

Let's take two steps back here first: a ROM is a **read only memory**
chip. Emphasis on **read only**. However, modern network cards and a
lot of devices that support on-device firmware do actually have a
memory (flash) area that can be written to. And that is what aids us
in our situation.

## ipxe + flbtool + x520 = fun

Trying to write ipxe into the X520 cards initially failed, because the
network card did not recognise the format of the ipxe rom file.

Luckily the folks in the ipxe community already spotted that problem
AND fixed it: the format used in these cards is called FLB. And there
is [flbtool](https://github.com/devicenull/flbtool/), which allows you
to wrap the ipxe rom file into the FLB format. For those who want to
try it yourself (at your own risk!), it basically involves:

* Get the current ROM from the card (try bootutil64e)
* Extract the contents from the rom using flbtool
* This will output some sections/parts
* Locate one part that you want to overwrite with iPXE (a previous PXE
  section is very suitable)
* Replace the .bin file with your iPXE rom
* Adjust the .json file to match the length of the new binary
* Build a new .flb file using flbtool
* Flash it onto the card

While this is a bit of work, it is worth it for us, because...:

## IPv6 only netboot over fiber

With the modified ROM, basically loading iPXE at start, we can now
boot our servers in IPv6 only networks. On our infrastructure side, we
added two **tiny** things:

We use ISC dhcp with the following configuration file:

```
option dhcp6.bootfile-url code 59 = string;

option dhcp6.bootfile-url "http://[2a0a:e5c0:0:6::46]/ipxescript";

subnet6 2a0a:e5c0:0:6::/64 {}
```

(that is the complete configuration!)

And we use radvd to announce that there is "other information",
indicating clients can actually query the dhcpv6 server:

```
interface bond0.10
{
        AdvSendAdvert on;
        MinRtrAdvInterval 3;
        MaxRtrAdvInterval 5;
        AdvDefaultLifetime 600;

        # IPv6 netbooting
        AdvOtherConfigFlag on;

        prefix 2a0a:e5c0:0:6::/64 { };

        RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
        DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; } ;
};
```
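To verify that these announcements actually reach a host, one can
solicit router advertisements on the boot VLAN; a quick check,
assuming the ndisc6 package is installed, might be:

```
# solicit and print router advertisements, including the "Other" flag
rdisc6 bond0.10
```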
## Take away

Being able to reduce cables was one big advantage in the beginning.

Switching to IPv6 only netboot does not seem like a big simplification
at first, besides being able to remove IPv4 in server
networks.

However, as you will see in
[the next blog post](/u/blog/datacenterlight-active-active-routing/),
switching to IPv6 only netbooting is actually a key element in
reducing complexity in our network.
@@ -0,0 +1,222 @@
title: Redundant routing infrastructure at Data Center Light
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
---
body:

In case you have missed the previous articles, you can
get [an introduction to the Data Center Light spring
cleanup](/u/blog/datacenterlight-spring-network-cleanup),
see [how we switched to IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot)
or read about [the active-active routing
problems](/u/blog/datacenterlight-active-active-routing/).

In this article we will show how we finally solved the routing issue,
conceptually as well as practically.

## Active-active or active-passive routing?

In the [previous blog article](/u/blog/datacenterlight-active-active-routing/)
we reasoned that active-active routing, even with session
synchronisation, does not have a straightforward solution in our
case. However, in the
[first blog article](/u/blog/datacenterlight-spring-network-cleanup)
we reasoned that active-passive routers with VRRP and keepalived are
not stable enough either.

So which path should we take? Or is there another solution?

## Active-Active-Passive Routing

Let us introduce Active-Active-Passive routing. Something that sounds
strange at first, but is going to make sense in the next
minutes.

We do want multiple active routers, but we do not want to have to
deal with session synchronisation, which is not only tricky, but due
to its complexity can also be a source of error.

So what we are looking for is active-active routing without state
synchronisation. While this sounds like a contradiction, if we loosen
our requirements a little bit, we are able to support multiple active
routers without session synchronisation by using **routing
priorities**.

## Active-Active routing with routing priorities

Let's assume for a moment that all involved hosts (servers, clients,
routers, etc.) know about multiple routes for outgoing and incoming
traffic. Let's also assume for a moment that **we can prioritise**
those routes. Then we can create a deterministic routing path that
does not need session synchronisation.

## Steering outgoing traffic

Let's have a first look at the outgoing traffic. Can we announce
multiple routers in a network, but have the servers and clients
**prefer** one of the routers? The answer is yes!
If we check the manpage of
[radvd.conf(5)](https://linux.die.net/man/5/radvd.conf) we find a
setting that is named **AdvDefaultPreference**:

```
AdvDefaultPreference low|medium|high
```

Using this attribute, two routers can both actively announce
themselves, but clients in the network will prefer the one with the
higher preference setting.
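In radvd terms, the preferred router might carry a configuration like
the following sketch; the interface and prefix mirror the examples
used elsewhere in this series, not necessarily the production values:

```
interface eth1.5
{
        AdvSendAdvert on;
        # make clients prefer this router over its twin
        AdvDefaultPreference high;

        prefix 2a0a:e5c0:0:6::/64 { };
};
```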
### Replacing radvd with bird

At this point a short side note: we have been using radvd for some
years in the Data Center Light. However, recently on our
[Alpine Linux based routers](https://alpinelinux.org/), radvd started
to crash from time to time:

```
[717424.727125] device eth1 left promiscuous mode
[1303962.899600] radvd[24196]: segfault at 63f42258 ip 00007f6bdd59353b sp 00007ffc63f421b8 error 4 in ld-musl-x86_64.so.1[7f6bdd558000+48000]
[1303962.899609] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
[1458460.511006] device eth0 entered promiscuous mode
[1458460.511168] radvd[27905]: segfault at 4dfce818 ip 00007f94ec1fd53b sp 00007ffd4dfce778 error 4 in ld-musl-x86_64.so.1[7f94ec1c2000+48000]
[1458460.511177] Code: 48 09 c8 4c 85 c8 75 0d 49 83 c4 08 eb d4 39 f0 74 0c 49 ff c4 41 0f b6 04 24 84 c0 75 f0 4c 89 e0 41 5c c3 31 c9 0f b6 04 0f <0f> b6 14 0e 38 d0 75 07 48 ff c1 84 c0 75 ed 29 d0 c3 41 54 49 89
...
```

Unfortunately it seems that either the addresses timed out or that
radvd was able to send a message de-announcing itself prior to the
crash, causing all clients to withdraw their addresses. This is
especially problematic if you run a [ceph](https://ceph.io/) cluster
and the servers suddenly don't have IP addresses anymore...

While we have not yet investigated the full cause of this, we had a very
easy solution: as all of our routers run
[bird](https://bird.network.cz/) and it also supports sending router
advertisements, we replaced radvd with bird. The configuration is
actually pretty simple:

```
protocol radv {
        # Internal
        interface "eth1.5" {
                max ra interval 5; # Fast failover with more routers
                other config yes;  # dhcpv6 boot
                default preference high;
        };
        rdnss {
                lifetime 3600;
                ns 2a0a:e5c0:0:a::a;
                ns 2a0a:e5c0:0:a::b;
        };
        dnssl {
                lifetime 3600;
                domain "place5.ungleich.ch";
        };
}
```
## Steering incoming traffic

As the internal and the upstream routers are in the same data center,
we can use an IGP like OSPF to distribute the routes to the internal
routers. And OSPF actually has this very neat metric called **cost**.
So for the router that sets the **default preference high** for the
outgoing routes, we keep the cost at 10; for the router that
sets the **default preference low** we set the cost to 20. The actual
bird configuration on a router looks like this:

```
define ospf_cost = 10;
...

protocol ospf v3 ospf6 {
        instance id 0;

        ipv6 {
                import all;
                export none;
        };

        area 0 {
                interface "eth1.*" {
                        authentication cryptographic;
                        password "weshouldhaveremovedthisfortheblogpost";
                        cost ospf_cost;
                };
        };
}
```

## Incoming + Outgoing = symmetric paths

With both directions under our control, we have now enabled symmetric
routing in both directions. Thus as long as the first router is alive,
all traffic will be handled by the first router.
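Whether the priorities actually took effect can be checked live with
bird's CLI; a quick sketch (the prefix is an example from this series,
command names as in bird 2.x):

```
# on an upstream router: are both internal routers OSPF neighbors?
birdc show ospf neighbors

# which path towards an internal network is currently preferred?
birdc show route for 2a0a:e5c0:0:6::/64
```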
## Failover scenario

In case the first router fails, clients have a low lifetime of 15
seconds (3x **max ra interval**)
for their routes and they will fail over to the 2nd router
automatically. Existing sessions will not continue to work, but that
is ok for our setup. When the first router with the higher priority
comes back, there will again be an interruption, but clients will
automatically change their paths.

And so will the upstream routers, as OSPF is a quick protocol that
updates alive routers and routes.

## IPv6 enables active-active-passive routing architectures

At ungleich it almost always comes back to the topic of IPv6, albeit
for a good reason. You might remember that we claimed in the
[IPv6 only netboot](/u/blog/datacenterlight-ipv6-only-netboot) article
that this reduces complexity? If you look at the above example,
you might not spot it directly, but going IPv6 only is actually an
enabler for our setup:

We **only deploy router advertisements** using bird. We are **not using DHCPv4**
or **IPv4** for accessing our servers. Both routers run a dhcpv6
service in parallel, with the "boot server" pointing to themselves.

## Take away

You can see that trying to solve one problem ("unreliable redundant
router setup") entailed a slew of changes, but in the end made our
infrastructure much simpler:

* No dual stack
* No private IPv4 addresses
* No actively communicating keepalived
* Two daemons less to maintain (keepalived, radvd)

Besides being nice and clean, our whole active-active-passive routing
setup **would not work with IPv4**, because dhcpv4 servers do not have
the same functionality to provide routing priorities.

We also avoided complex state synchronisation and deployed only Open
Source Software to address our problems. Furthermore, hardware that
looked unusable in modern IPv6 networks can be upgraded with
Open Source Software (ipxe) and enables us to provide more sustainable
infrastructures.

We hope you enjoyed our spring cleanup blog series. The next one will
be coming, because IT infrastructures always evolve. Until then:
feel free to [join our Open Source Chat](https://chat.with.ungleich.ch)
and join the discussion.
@@ -0,0 +1,161 @@
title: Data Center Light: Spring network cleanup
---
pub_date: 2021-05-01
---
author: Nico Schottelius
---
twitter_handle: NicoSchottelius
---
_hidden: no
---
_discoverable: no
---
abstract:
Where we started our spring network cleanup: the typical redundant router setup in our data center and its problems
---
body:

## Introduction

Spring is the time for cleanup. Cleaning up your apartment, removing
dust from the cabinet, letting the light shine through the windows,
or, like in our case: improving the networking situation.

In this article we give an introduction of where we started and what
the typical setup used to be in our data center.

## Best practice

When we started [Data Center Light](https://datacenterlight.ch) in
2017, we oriented ourselves on "best practice" for networking. We
started with IPv6 only networks and used an RFC1918 network (10/8) for
internal IPv4 routing.

And we started with 2 routers for every network to provide
redundancy.

## Router redundancy

So what do you do when you have two routers? In the Linux world the
software [keepalived](https://keepalived.org/)
is very popular for providing redundant routing
using the [VRRP protocol](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol).

## Active-Passive

While VRRP is designed to allow multiple (not only two) routers to
co-exist in a network, its design is basically active-passive: you
have one active router and n passive routers, in our case 1
additional.

## Keepalived: a closer look

A typical keepalived configuration in our network looked like this:

```
vrrp_instance router_v4 {
    interface INTERFACE
    virtual_router_id 2
    priority PRIORITY
    advert_int 1
    virtual_ipaddress {
        10.0.0.1/22 dev eth1.5 # Internal
    }
    notify_backup "/usr/local/bin/vrrp_notify_backup.sh"
    notify_fault "/usr/local/bin/vrrp_notify_fault.sh"
    notify_master "/usr/local/bin/vrrp_notify_master.sh"
}

vrrp_instance router_v6 {
    interface INTERFACE
    virtual_router_id 1
    priority PRIORITY
    advert_int 1
    virtual_ipaddress {
        2a0a:e5c0:1:8::48/128 dev eth1.8 # Transfer for routing from outside
        2a0a:e5c0:0:44::7/64 dev bond0.18 # zhaw
        2a0a:e5c0:2:15::7/64 dev bond0.20 #
    }
}
```

This is a template that we distribute via [cdist](https://cdi.st). The
strings INTERFACE and PRIORITY are replaced via cdist. The interface
field defines which interface to use for VRRP communication and the
priority field determines which of the routers is the active one.

So far, so good. However, let's have a look at a tiny detail of this
configuration file:

```
notify_backup "/usr/local/bin/vrrp_notify_backup.sh"
notify_fault "/usr/local/bin/vrrp_notify_fault.sh"
notify_master "/usr/local/bin/vrrp_notify_master.sh"
```

These three lines basically say: "start something if you are the
master" and "stop something in case you are not". And why did we do
this? Because of stateful services.

## Stateful services

A typical shell script that we would call contains lines like this:

```
/etc/init.d/radvd stop
/etc/init.d/dhcpd stop
```
(or start in the case of the master version)

In earlier days, this even contained openvpn, which was running on our
first generation router version. But more about OpenVPN later.

The reason why we stopped and started dhcpd and radvd is to make
clients of the network use the active router. We used radvd to provide
IPv6 addresses as the primary access method to servers. And we used
dhcp mainly to allow servers to netboot. The active router would
carry state (firewall!) and thus the flow of packets always needs to go
through the active router.

Restarting radvd on a different machine keeps the IPv6 addresses the
same, as clients assign them themselves using EUI-64. In the case of dhcp
(IPv4) we could have used hardcoded IPv4 addresses using a mapping of
MAC address to IPv4 address, but we decided against this. The main
reason is that dhcp clients re-request their previous lease, and even if an
IPv4 address changes, it is not really of importance.

During a failover this would lead to a few seconds of interruption and
re-established sessions. Given that routers are usually rather stable
and restarting them is not a daily task, we initially accepted this.

## Keepalived/VRRP changes

One of the more tricky things is changes to keepalived. Because
keepalived uses the *number of addresses and routes* to verify
that a received VRRP packet matches its configuration, adding or
deleting IP addresses and routes causes a problem:

While one router is being updated, the number of IP addresses or routes
differs. This causes both routers to ignore the other's VRRP messages
and both routers think they should be the master process.

This leads to the problem that both routers receive client and outside
traffic. The firewall (nftables) then does not recognise
returning packets if they were sent out by router1 but received back
by router2 and, because nftables is configured *stateful*, it will drop
the returning packet.
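The relevant part of such a stateful nftables ruleset looks roughly
like this - a minimal sketch, not our full production ruleset: a reply
arriving on a router without the matching conntrack entry falls
through to the drop policy.

```
table inet filter {
    chain forward {
        type filter hook forward priority 0; policy drop;

        # replies are only accepted if this router saw the outgoing packet
        ct state established,related accept

        # new connections from the internal network are allowed out
        iifname "eth1.5" ct state new accept
    }
}
```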
However, not only changes to the configuration can trigger this
problem, but also any communication problem between the two
routers. Since 2017 we have experienced it multiple times that keepalived
was unable to receive or send messages from the other router and thus
both of them again became the master process.

## Take away

While in theory keepalived should improve reliability, in practice
the number of problems due to double master situations made us
question whether the keepalived concept is the fitting one for us.

You can read how we evolved from this setup in
[the next blog article](/u/blog/datacenterlight-ipv6-only-netboot/).
192
content/u/blog/glamp-1-2021/contents.lr
Normal file
@@ -0,0 +1,192 @@
title: GLAMP #1 2021
---
pub_date: 2021-07-17
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:
The first un-hack4glarus happens as a camp - Thursday 2021-08-19 to Sunday 2021-08-22.
---
body:

## TL;DR

Get your tent, connect it to power and 10Gbit/s Internet in the midst
of the Glarner mountains. Happening Thursday 2021-08-19 to Sunday 2021-08-22.
Apply for participation by mail (information at the bottom of the page).

## Introduction

It has been some time since our
[last Hack4Glarus](https://hack4glarus.ch) and we have been missing
all our friends, hackers and participants. At ungleich we have been
watching the development of the Coronavirus world wide and, as you
might know, we have decided against a Hack4Glarus for this summer, as
the Hack4Glarus has been an indoor event so far.

## No Hack4Glarus = GLAMP

However, we want to try a different format that ensures proper
safety. Instead of an indoor Hack4Glarus in Linthal, we introduce
the Glarus Camp (or GLAMP for short) to you. An outdoor event with
sufficient space for distancing. As a camping site we can use the
surroundings of the Hacking Villa, supported by the Hacking Villa
facilities.

Compared to the Hack4Glarus, the GLAMP will focus more on
*relaxation* and *hangout* than on being a hackathon. We think times are
hard enough to give everyone a break.

## The setting

Many of you know the [Hacking Villa](/u/projects/hacking-villa/) in
Diesbach already. Located just next to the pretty waterfall and the amazing
Legler Areal, the villa is connected with 10 Gbit/s to the
[Data Center Light](/u/projects/data-center-light/) and offers a lot
of fun things to do.

## Coronavirus measures beforehand

To ensure safety for everyone, we ask everyone attending to provide a
reasonable proof of not spreading the corona virus with one of the
following proofs:

* You have been vaccinated
* You had the corona virus and you are symptom free for at least 14
  days
* You have been tested with a PCR test (7 days old at maximum) and the
  result was negative

All participants will be required to take a short antigen test on
site.

**Please do not attend if you feel sick, for the safety of everyone else.**

## Coronavirus measures on site

To keep the space safe on site as well, we ask you to follow these
rules:

* Sleep in your own tent
* Wear masks inside the Hacking Villa
  * Especially if you are preparing food shared with others
* Keep distance and respect others' safety wishes

## Hacking Villa facilities

* Fast Internet (what more do you need?)
* A shared, open area outside for hacking
* Toilets and bathroom located inside

## What to bring

* A tent + sleeping equipment
* Fun stuff
* Your computer
* Wifi / IoT / Hacking things
* If you want wired Internet in your tent: a 15m+ Ethernet cable
  * WiFi will be provided everywhere

## What is provided

* Breakfast every morning
* A place for a tent
* Power to the tent (Swiss plug)
* WiFi to the tent
* Traditional closing event spaghetti

## What you can find nearby

* A supermarket (2km), reachable by foot, scooter or bike
* A waterfall + barbecue place (~400m)
* Daily attractions such as hacking, hiking, biking, hanging out

## Registration

As the space is limited, we can accommodate about 10 tents (roughly 23
people). To register, send an email to support@ungleich.ch based on
the following template:

```
Subject: GLAMP#1 2021

For each person with you (including yourself):

Non Coronavirus proof:
(see requirements on the glamp page)

Name(s):
(how you want to be called)

Interests:
(will be shown to others at the glamp)

Skills:
(will be shown to others at the glamp)

Food interests:
(we use this for pooling food orders)

What I would like to do:
(will be shown to others at the glamp)

```

The participation fee is 70 CHF/person (to be paid on arrival).

## Time, Date and Location

* Arrival possible from Wednesday 2021-08-18, 16:00
* GLAMP#1 starts officially on Thursday 2021-08-19, 10:00
* GLAMP#1 closing lunch on Sunday 2021-08-22, 12:00
* GLAMP#1 ends officially on Sunday 2021-08-22, 14:00

Location: [Hacking Villa](/u/projects/hacking-villa/)


## FAQ

### Where do I get Internet?

It is available everywhere at/around the Hacking Villa via WiFi. For
cable based Internet bring a 15m+ Ethernet cable.

### Where do I get electricity?

You'll get electricity directly to the tent. Additionally the shared
area also has electricity. You can also bring solar panels, if you
like.

### Where do I get food?

Breakfast is provided by us. But what about the rest of the day?
There are a lot of delivery services available, ranging from Pizza,
Tibetan, Thai, Swiss (yes!), etc.

Nearby are 2 Volg supermarkets, the next Coop is in Schwanden, a bigger
Migros in Glarus and a very big Coop can be found in Netstal. The Volg
is reachable by foot, all others are reachable by train or bike.

There is also a kitchen inside the Hacking Villa for cooking.
And there is a great barbecue place just next to the waterfall.

### What can I do at the GLAMP?

There are
[alot](http://hyperboleandahalf.blogspot.com/2010/04/alot-is-better-than-you-at-everything.html)
of opportunities at the GLAMP:

You can ...

* just relax and hangout
* hack on that project that you postponed for so long
* hike up mountains (up to 3612m! Lower is also possible)
* meet other hackers
* explore the biggest water power plant in Europe (Linth Limmern)
* and much much more!
BIN
content/u/blog/glamp-1-2021/diesback-bg-small.jpg
Normal file
Binary file not shown.
After Width: | Height: | Size: 380 KiB
Binary file not shown.
After Width: | Height: | Size: 167 KiB
@@ -0,0 +1,123 @@
title: Configuring bind to only forward DNS to a specific zone
---
pub_date: 2021-07-25
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:
Want to use BIND for proxying to another server? This is how you do it.
---
body:

## Introduction

In this article we'll show you an easy solution to host DNS zones on
IPv6 only or private DNS servers. The method we use here is **DNS
forwarding** as offered by ISC BIND, but one could also see this as
**DNS proxying**.

## Background

Sometimes you might have a DNS server that is authoritative for DNS
data, but is not reachable by all clients. This might be the case, for
instance, if

* your DNS server is IPv6 only: it won't be directly reachable from
  the IPv4 Internet
* your DNS server is running in a private network, either IPv4 or IPv6

In both cases, you need something that is publicly reachable to
enable clients to access the zone, like shown in the following picture:

![](dns-proxy-forward.png)

## The problem: Forwarding requires recursive queries

ISC BIND allows forwarding queries to another name server. However, to
do so, it needs to be configured to allow handling recursive queries.
And if we allow recursive querying by any client, we basically
create an [Open DNS resolver, which can be quite
dangerous](https://www.ncsc.gov.ie/emailsfrom/DDoS/DNS/).

## The solution

ISC BIND by default has a root hints file compiled in, which allows it
to function as a resolver without any additional configuration
files. That is great in general, but not if we only want it to act as a
forwarder as described above. But we can easily fix that problem. Now,
let's have a look at a real world use case, step-by-step:

### Step 1: Global options

In the first step, we set the global options to allow recursion from
anyone, as follows:

```
options {
        directory "/var/cache/bind";

        listen-on-v6 { any; };

        allow-recursion { ::/0; 0.0.0.0/0; };
};
```

However, as mentioned above, this alone would create an open resolver. To
prevent this, let's disable the root hints:

### Step 2: Disable root hints

The root hints are served in the root zone, also known as ".". To
disable it, we give bind an empty file to use:

```
zone "." {
        type hint;
        file "/dev/null";
};
```

Note: in case you do want to allow recursion for some
clients, **you can create multiple DNS views**.

### Step 3: The actual forward zone

In our case, we have a lot of IPv6 only kubernetes clusters, which are
named `xx.k8s.ooo` and have a world wide reachable CoreDNS server built
in. In this case, we want the domain c1.k8s.ooo to be world
reachable, so we configure the dual stack server as follows:

```
zone "c1.k8s.ooo" {
        type forward;
        forward only;
        forwarders { 2a0a:e5c0:2:f::a; };
};
```

### Step 4: Adjusting the zone file

In case the authoritative server is IPv6 only, you also need to adjust
the delegation in the parent zone: the NS name gets both the IPv6
address of the real server and the IPv4 address of the forwarding
server. In our case this looks as follows:

```
; The domain: c1.k8s.ooo
c1                          NS kube-dns.kube-system.svc.c1

; The IPv6 only DNS server
kube-dns.kube-system.svc.c1 AAAA 2a0a:e5c0:2:f::a

; The forwarding IPv4 server
kube-dns.kube-system.svc.c1 A 194.5.220.43
```
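To check that the forwarding works, one can query the public dual
stack server directly; a quick test with dig (the hostname
test.c1.k8s.ooo is just an example name) might look like:

```
# via IPv4: bind accepts the recursive query and forwards it over IPv6
dig @194.5.220.43 test.c1.k8s.ooo AAAA

# anything outside the forwarded zone should fail (no root hints)
dig @194.5.220.43 ungleich.ch A
```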
## DNS, IPv6, Kubernetes?

If you are curious to learn more about either of these topics, feel
[free to join us on our chat](/u/projects/open-chat/).

Binary file not shown.
After Width: | Height: | Size: 154 KiB
210
content/u/blog/ipv6-link-local-support-in-browsers/contents.lr
Normal file
@@ -0,0 +1,210 @@
title: Support for IPv6 link local addresses in browsers
---
pub_date: 2021-06-14
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:
Tracking the progress of browser support for link local addresses
---
body:

## Introduction

Link local addresses
([fe80::/10](https://en.wikipedia.org/wiki/Link-local_address)) are
used for addressing devices in your local subnet. They can be
automatically generated and, using the IPv6 multicast address
**ff02::1**, all hosts on the local subnet can easily be located.

However, browsers like Chrome or Firefox do not support **entering link
local addresses inside a URL**, which prevents accessing devices
locally with a browser, for instance for configuring them.

Link local addresses need **zone identifiers** to specify which
network device to use as an outgoing interface. This is because
**you have link local addresses on every interface** and your network
stack does not know on its own which interface to use. So typically a
link local address is something along the lines of
**fe80::fae4:e3ff:fee2:37a4%eth0**, where **eth0** is the zone
identifier.
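Outside of browsers, zone identifiers are widely supported; on a
typical Linux host (the interface name is an example) this works fine:

```
# ping all hosts on the local link, then talk to one of them
ping ff02::1%eth0
curl -g 'http://[fe80::fae4:e3ff:fee2:37a4%eth0]/'
```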
The problem is becoming more emphasised as the world moves more
and more towards **IPv6 only networks**.

You might not even know the address of your network equipment anymore,
but you can easily locate it using the **ff02::1 multicast
address**. So we need support in browsers to allow network
configuration.

## Status of implementation

The main purpose of this document is to track the status of the
link-local address support in the different browsers and related
standards. The current status is:

* Firefox says whatwg did not define it
* Whatwg says the zone id is intentionally omitted and references w3.org
* w3.org has a longer reasoning, but it basically boils down to
  "Firefox and Chrome don't do it and it's complicated and nobody needs it"
* Chromium says it seems not to be worth the effort

Given that chain of events, if either Firefox, Chrome, w3.org or
Whatwg were to add support for it, it seems likely that the others
would follow.

## IPv6 link local address support in Firefox

The progress of IPv6 link local addresses for Firefox is tracked
on [the mozilla
bugzilla](https://bugzilla.mozilla.org/show_bug.cgi?id=700999). The
current situation is that Firefox references the lack of
standardisation by whatwg as the reason for not implementing it. Quoting
Valentin Gosu from the Mozilla team:

```
The main reason the zone identifier is not supported in Firefox is
that parsing URLs is hard. You'd think we can just pass whatever
string to the system API and it will work or fail depending on whether
it's valid or not, but that's not the case. In bug 1199430 for example
it was apparent that we need to make sure that the hostname string is
really valid before passing it to the OS.

I have no reason to oppose zone identifiers in URLs as long as the URL
spec defines how to parse them. As such, I encourage you to engage
with the standard at https://github.com/whatwg/url/issues/392 instead
of here.

Thank you!
```

## IPv6 link local address support in whatwg

The situation at [whatwg](https://whatwg.org/) is that there is a
[closed bug report on github](https://github.com/whatwg/url/issues/392)
and [in the spec it says](https://url.spec.whatwg.org/#concept-ipv6)
that

    Support for <zone_id> is intentionally omitted.

That paragraph links to a bug registered at w3.org (see next chapter).


## IPv6 link local address support at w3.org

At [w3.org](https://www.w3.org/) there is a
bug titled
[Support IPv6 link-local
addresses?](https://www.w3.org/Bugs/Public/show_bug.cgi?id=27234#c2)
that is set to status **RESOLVED WONTFIX**. It is closed basically
based on the following statement from Ryan Sleevi:

```
Yes, we're especially not keen to support these in Chrome and have
repeatedly decided not to. The platform-specific nature of <zone_id>
makes it difficult to impossible to validate the well-formedness of
the URL (see https://tools.ietf.org/html/rfc4007#section-11.2 , as
referenced in 6874, to fully appreciate this special hell). Even if we
could reliably parse these (from a URL spec standpoint), it then has
to be handed 'somewhere', and that opens a new can of worms.

Even 6874 notes how unlikely it is to encounter these in practice -
"Thus, URIs including a
ZoneID are unlikely to be encountered in HTML documents. However, if
they do (for example, in a diagnostic script coded in HTML), it would
be appropriate to treat them exactly as above."

Note that a 'dumb' parser may not be sufficient, as the Security Considerations of 6874 note:
"To limit this risk, implementations MUST NOT allow use of this format
except for well-defined usages, such as sending to link-local
addresses under prefix fe80::/10. At the time of writing, this is
the only well-defined usage known."

And also
"An HTTP client, proxy, or other intermediary MUST remove any ZoneID
attached to an outgoing URI, as it has only local significance at the
sending host."

This requires a transformative rewrite of any URLs going out the
wire. That's pretty substantial. Anne, do you recall the bug talking
about IP canonicalization (e.g. http://127.0.0.1 vs
http://[::127.0.0.1] vs http://012345 and friends?) This is
conceptually a similar issue - except it's explicitly required in the
context of <zone_id> that the <zone_id> not be emitted.

There's also the issue that zone_id precludes/requires the use of APIs
that user agents would otherwise prefer to avoid, in order to
'properly' handle the zone_id interpretation. For example, Chromium on
some platforms uses a built in DNS resolver, and so our address lookup
functions would need to define and support <zone_id>'s and map them to
system concepts. In doing so, you could end up with weird situations
where a URL works in Firefox but not Chrome, even though both
'hypothetically' supported <zone_id>'s, because FF may use an OS
routine and Chrome may use a built-in routine and they diverge.

Overall, our internal consensus is that <zone_id>'s are bonkers on
many grounds - the technical ambiguity (and RFC 6874 doesn't really
resolve the ambiguity as much as it fully owns it and just says
#YOLOSWAG) - and supporting them would add a lot of complexity for
what is explicitly and admittedly a limited value use case.
```

This bug references the Mozilla Firefox bug above and
[RFC 3986 (replaced by RFC
6874)](https://datatracker.ietf.org/doc/html/rfc6874#section-2).

## IPv6 link local address support in Chrome / Chromium

On the Chrome side there is a
[huge bug
report](https://bugs.chromium.org/p/chromium/issues/detail?id=70762)
which again references a huge number of other bugs that try to request
IPv6 link local support, too.

The bug was closed by cbentzel@chromium.org stating:

```
There are a large number of special cases which are required on core
networking/navigation/etc. and it does not seem like it is worth the
up-front and ongoing maintenance costs given that this is a very
niche - albeit legitimate - need.
```

The bug at chromium has been made un-editable, so it is basically
frozen, although people have added suggestions to the ticket on how to
solve it.

## Work arounds

### IPv6 link local connect hack

Peter has [documented the IPv6 link local connect
hack](https://website.peterjin.org/wiki/Snippets:IPv6_link_local_connect_hack)
to make firefox use **fe90:0:[scope id]:[IP address]** to reach
**fe80::[IP address]%[scope id]**. Check out his website for details!

### IPv6 hack using ip6tables

Also from Peter is the hint that you can use newer ip6tables
versions to achieve a similar mapping:

"On modern Linux kernels you can also run

```ip6tables -t nat -A OUTPUT -d fef0::/64 -j NETMAP --to fe80::/64```

if you have exactly one outbound interface, so that fef0::1 translates
to fe80::1"

Thanks again for the pointer!

## Other resources

If you are aware of other resources regarding IPv6 link local support
in browsers, please join the [IPv6.chat](https://IPv6.chat) and let us
know about it.
144
content/u/blog/kubernetes-dns-entries-nat64/contents.lr
Normal file
@@ -0,0 +1,144 @@
title: Automatic A and AAAA DNS entries with NAT64 for kubernetes?
---
pub_date: 2021-06-24
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:
Given a kubernetes cluster and NAT64 - how do you create DNS entries?
---
body:

## The DNS kubernetes quiz

Today our blog entry does not (yet) show a solution, but rather a tricky
quiz on creating DNS entries. The problem to solve is the following:

* How do we make every IPv6 only service in kubernetes also IPv4
  reachable?

Let's see who can solve it first or the prettiest. Below are some
thoughts on how to approach this problem.

## The situation

Assume your kubernetes cluster is IPv6 only and all services
have proper AAAA DNS entries. This allows you
[to directly receive traffic from the
Internet](/u/blog/kubernetes-without-ingress/) to
your kubernetes services.

Now to make such a service also IPv4 reachable, we can deploy a NAT64
service that maps an IPv4 address outside the cluster to an IPv6 service
address inside the cluster:

```
A.B.C.D --> 2001:db8::1
```

So all traffic to that IPv4 address is converted to IPv6 by the
external NAT64 translator.

## The proxy service

Let's say the service running on 2001:db8::1 is named "ipv4-proxy" and
thus reachable at ipv4-proxy.default.svc.example.com.

What we want to achieve is to expose every possible service
inside the cluster **also via IPv4**. For this purpose we have created
an haproxy container that accepts requests for *.svc.example.com and
forwards them via IPv6.

So the actual flow would look like:

```
IPv4 client --[ipv4]--> NAT64 --[ipv6]--> proxy service
                                               |
                                               v
IPv6 client ----------------------> kubernetes service
```
## The DNS dilemma
|
||||
|
||||
It would be very tempting to create a wildcard DNS entry or to
|
||||
configure/patch CoreDNS to also include an A entry for every service
|
||||
that is:
|
||||
|
||||
```
|
||||
*.svc IN A A.B.C.D
|
||||
```
|
||||
|
||||
So essentially all services resolve to the IPv4 address A.B.C.D. That
|
||||
however would also influence the kubernetes cluster, as pods
|
||||
potentially resolve A entries (not only AAAA) as well.
|
||||
|
||||
As the containers / pods do not have any IPv4 address (nor IPv4
|
||||
routing), access to IPv4 is not possible. There are various outcomes
|
||||
of this situation:
|
||||
|
||||
1. The software in the container does happy eyeballs and tries both
|
||||
A/AAAA and uses the working IPv6 connection.
|
||||
|
||||
2. The software in the container misbehaves and takes the first record
|
||||
and uses IPv4 (nodejs is known to have or had a broken resolver
|
||||
that did exactly that).
|
||||
|
||||
So adding that wildcard might not be the smartest option. And
|
||||
additionally it is unclear whether coreDNS would support that.
|
||||
|
||||
## Alternative automatic DNS entries
|
||||
|
||||
The *.svc names in a kubernetes cluster are special in the sense that
|
||||
they are used for connecting internally. What if coreDNS (or any other
|
||||
DNS) server would instead of using *.svc, use a second subdomain like
|
||||
*abc*.*namespace*.v4andv6.example.com and generate the same AAAA
|
||||
record as for the service and a static A record like describe above?
|
||||
|
||||
That could solve the problem. But again, does coreDNS support that?
|
||||
|
||||
## Automated DNS entries in other zones

Instead of fully automatically creating the entries as above, another
option would be to specify DNS entries via annotations in a totally
different zone, if CoreDNS supported this. So let's say we also
have control over example.org and we could instruct CoreDNS to create
the following entries automatically with an annotation:

```
abc.something.example.org AAAA <same as the service IP>
abc.something.example.org A    <a static IPv4 address A.B.C.D>
```

In theory this might be solved via some scripting, maybe via a DNS
server like PowerDNS?
## Alternative solution with BIND

The bind DNS server, which is not usually deployed in a kubernetes
cluster, supports **views**. Views enable different replies to the
same query depending on the source IP address. Thus in theory
something like the following could be done, assuming a secondary zone
*example.org*:

* If the request comes from the kubernetes cluster, return a CNAME
  back to example.com.
* If the request comes from outside the kubernetes cluster, return an
  A entry with the static IP.
* Unsolved: how to match on the AAAA entries (because we don't CNAME
  with the added A entry).
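As a rough illustration, a views-based named.conf for this idea could
look like the sketch below; the cluster prefix and the zone file names
are assumptions:

```
// named.conf sketch, assuming 2001:db8:42::/64 is the cluster prefix
view "cluster" {
    match-clients { 2001:db8:42::/64; };
    zone "example.org" {
        type master;
        // zone file answering with CNAMEs back to example.com
        file "db.example.org.cluster";
    };
};

view "world" {
    match-clients { any; };
    zone "example.org" {
        type master;
        // zone file answering with the static A entry A.B.C.D
        file "db.example.org.world";
    };
};
```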
## Other solution?

As you can see, mixing the dynamic IP generation and coupling it with
static DNS entries for IPv4 resolution is not the easiest of tasks. If
you have a smart idea on how to solve this without manually creating
entries for each and every service,
[give us a shout!](/u/contact)
@ -0,0 +1,227 @@
title: Making kubernetes kube-dns publicly reachable
---
pub_date: 2021-06-13
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:
Looking into IPv6 only DNS provided by kubernetes
---
body:
## Introduction

If you have seen our
[article about running kubernetes
Ingress-less](/u/blog/kubernetes-without-ingress/), you are aware that
we are pushing IPv6 only kubernetes clusters at ungleich.

Today, we are looking at making the "internal" kube-dns service world
reachable using IPv6 and global DNS servers.
## The kubernetes DNS service

If you have a look at your typical k8s cluster, you will notice that
you usually have two coredns pods running:

```
% kubectl -n kube-system get pods -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE
coredns-558bd4d5db-gz5c7   1/1     Running   0          6d
coredns-558bd4d5db-hrzhz   1/1     Running   0          6d
```

These pods are usually served by the **kube-dns** service:

```
% kubectl -n kube-system get svc -l k8s-app=kube-dns
NAME       TYPE        CLUSTER-IP           EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   2a0a:e5c0:13:e2::a   <none>        53/UDP,53/TCP,9153/TCP   6d1h
```

As you can see, the kube-dns service is running on a publicly
reachable IPv6 address.
## IPv6 only DNS

IPv6 only DNS servers have one drawback: they cannot be reached via
DNS recursion if the resolving server is IPv4 only.

At [ungleich we run a variety of
services](https://redmine.ungleich.ch/projects/open-infrastructure/wiki)
to make IPv6 only services usable in the real world. In the case of
DNS, we are using **DNS forwarders**. They act similarly to HTTP
proxies, but for DNS.

So on our main DNS servers, dns1.ungleich.ch, dns2.ungleich.ch
and dns3.ungleich.ch, we have added the following configuration:

```
zone "k8s.place7.ungleich.ch" {
        type forward;
        forward only;
        forwarders { 2a0a:e5c0:13:e2::a; };
};
```

This tells the DNS servers to forward DNS queries that come in for
k8s.place7.ungleich.ch to **2a0a:e5c0:13:e2::a**.

Additionally we have added a **DNS delegation** in the
place7.ungleich.ch zone:

```
k8s    NS    dns1.ungleich.ch.
k8s    NS    dns2.ungleich.ch.
k8s    NS    dns3.ungleich.ch.
```
## Using the kubernetes DNS service in the wild

With this configuration, we can now access IPv6 only
kubernetes services directly from the Internet. Let's first discover
the kube-dns service itself:

```
% dig kube-dns.kube-system.svc.k8s.place7.ungleich.ch. aaaa

; <<>> DiG 9.16.16 <<>> kube-dns.kube-system.svc.k8s.place7.ungleich.ch. aaaa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23274
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: f61925944f5218c9ac21e43960c64f254792e60f2b10f3f5 (good)
;; QUESTION SECTION:
;kube-dns.kube-system.svc.k8s.place7.ungleich.ch. IN AAAA

;; ANSWER SECTION:
kube-dns.kube-system.svc.k8s.place7.ungleich.ch. 27 IN AAAA 2a0a:e5c0:13:e2::a

;; AUTHORITY SECTION:
k8s.place7.ungleich.ch. 13 IN NS kube-dns.kube-system.svc.k8s.place7.ungleich.ch.
```

As you can see, the **kube-dns** service in the **kube-system**
namespace resolves to 2a0a:e5c0:13:e2::a, which is exactly what we
have configured.
At the moment, there is also an etherpad test service
named "ungleich-etherpad" running:

```
% kubectl get svc -l app=ungleichetherpad
NAME                TYPE        CLUSTER-IP              EXTERNAL-IP   PORT(S)    AGE
ungleich-etherpad   ClusterIP   2a0a:e5c0:13:e2::b7db   <none>        9001/TCP   3d19h
```

Let's first verify that it resolves:

```
% dig +short ungleich-etherpad.default.svc.k8s.place7.ungleich.ch aaaa
2a0a:e5c0:13:e2::b7db
```

And if that works, well, then we should also be able to access the
service itself!

```
% curl -I http://ungleich-etherpad.default.svc.k8s.place7.ungleich.ch:9001/
HTTP/1.1 200 OK
X-Powered-By: Express
X-UA-Compatible: IE=Edge,chrome=1
Referrer-Policy: same-origin
Content-Type: text/html; charset=utf-8
Content-Length: 6039
ETag: W/"1797-Dq3+mr7XP0PQshikMNRpm5RSkGA"
Set-Cookie: express_sid=s%3AZGKdDe3FN1v5UPcS-7rsZW7CeloPrQ7p.VaL1V0M4780TBm8bT9hPVQMWPX5Lcte%2BzotO9Lsejlk; Path=/; HttpOnly; SameSite=Lax
Date: Sun, 13 Jun 2021 18:36:23 GMT
Connection: keep-alive
Keep-Alive: timeout=5
```

(attention, this is a test service and might not be running when you
read this article at a later time)
## IPv6 vs. IPv4

Could we have achieved the same with IPv4? The answer here is "maybe":
if the kubernetes service is reachable from globally reachable
nameservers via IPv4, then the answer is yes. This could be done via
public IPv4 addresses in the kubernetes cluster, via tunnels, VPNs,
etc.

However, generally speaking, the DNS service of a
kubernetes cluster running on RFC1918 IP addresses is probably not
reachable from globally reachable DNS servers by default.

For IPv6 the case is a bit different: we are using globally reachable
IPv6 addresses in our k8s clusters, so they can potentially be
reached without the need for any tunnel whatsoever. Firewalling
and network policies can obviously prevent access, but if the IP
addresses are properly routed, they will be accessible from the public
Internet.

And this makes things much easier for DNS servers, which also
have IPv6 connectivity.

The following picture shows the practical difference between the two
approaches:

![](/u/image/k8s-v6-v4-dns.png)
## Does this make sense?

That clearly depends on your use-case. If you want your service DNS
records to be publicly accessible, then the clear answer is yes.

If your cluster services are intended to be internal only
(see the [previous blog post](/u/blog/kubernetes-without-ingress/)), then
exposing the DNS service to the world might not be the best option.
## Note on security

CoreDNS inside kubernetes is by default configured to allow resolving
for *any* client that can reach it. Thus if you make your kube-dns
service world reachable, you also turn it into an open resolver.

At the time of writing this blog article, the following coredns
configuration **does NOT** correctly block requests:

```
Corefile: |
    .:53 {
        acl k8s.place7.ungleich.ch {
            allow net ::/0
        }
        acl . {
            allow net 2a0a:e5c0:13::/48
            block
        }
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }
        ...
```
Until this is solved, we recommend placing a firewall in front of your
public kube-dns service that only allows requests from the forwarding
DNS servers.
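As an illustration, such a restriction could look like the following
nftables sketch on a router in front of the cluster; the forwarder
addresses 2001:db8::10-12 are placeholders for dns1-dns3:

```
# nftables sketch: only the forwarding DNS servers may query kube-dns
table inet filter {
    chain forward {
        type filter hook forward priority 0; policy accept;

        # allow the three forwarders (placeholder addresses)
        ip6 saddr { 2001:db8::10, 2001:db8::11, 2001:db8::12 } ip6 daddr 2a0a:e5c0:13:e2::a udp dport 53 accept
        ip6 saddr { 2001:db8::10, 2001:db8::11, 2001:db8::12 } ip6 daddr 2a0a:e5c0:13:e2::a tcp dport 53 accept

        # drop everybody else to avoid running an open resolver
        ip6 daddr 2a0a:e5c0:13:e2::a udp dport 53 drop
        ip6 daddr 2a0a:e5c0:13:e2::a tcp dport 53 drop
    }
}
```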

## More of this

We are discussing
kubernetes and IPv6 related topics in
**the #hacking:ungleich.ch Matrix channel**
([you can signup here if you don't have an
account](https://chat.with.ungleich.ch)) and will post more about our
k8s journey in this blog. Stay tuned!
122
content/u/blog/kubernetes-network-planning-with-ipv6/contents.lr
Normal file

@ -0,0 +1,122 @@
title: Kubernetes Network planning with IPv6
---
pub_date: 2021-06-26
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: no
---
abstract:
Learn which networks are good to use with kubernetes
---
body:
## Introduction

While IPv6 has a huge address space, you will need to specify a
**podCidr** (the network for the pods) and a **serviceCidr** (the
network for the services) for kubernetes. In this blog article we show
our findings and give a recommendation on what the "most sensible"
networks to use for kubernetes are.

## TL;DR
## Kubernetes limitations

In a typical IPv6 network, you would "just assign a /64" to anything
that needs to be a network. It is a bit the IPv6-no-brainer way of
handling networking.

However, kubernetes has a limitation:
[the serviceCidr cannot be bigger than a /108 at the
moment](https://github.com/kubernetes/kubernetes/pull/90115).
This is something very atypical for the IPv6 world, but nothing we
cannot handle. There are various pull requests and issues on github to
fix this behaviour, some of them listed below:

* https://github.com/kubernetes/enhancements/pull/1534
* https://github.com/kubernetes/kubernetes/pull/79993
* https://github.com/kubernetes/kubernetes/pull/90115 (this one is
  quite interesting to read)

That said, it is possible to use a /64 for the **podCidr**.
## The "correct way" without the /108 limitation
|
||||
|
||||
If kubernetes did not have this limitation, our recommendation would
|
||||
be to use one /64 for the podCidr and one /64 for the serviceCidr. If
|
||||
in the future the limitations of kubernetes have been lifted, skip
|
||||
reading this article and just use two /64's.
|
||||
|
||||
Do not be tempted to suggest making /108's the default, even if they
|
||||
"have enough space", because using /64's allows you to stay in much
|
||||
easier network plans.
|
||||
|
||||
## Sanity checking the /108

To be able to plan kubernetes clusters, it is important to know where
they should live, especially if you plan on having a lot of kubernetes
clusters. Let's have a short look at the /108 network limitation:

A /108 allows 20 bits to be used for generating addresses, or a total
of 1048576 hosts. This is probably enough for the number of services
in a cluster. Now, can we be consistent and also use a /108 for the
podCidr? Let's assume for the moment that we do exactly that, so we
run a maximum of 1048576 pods at the same time. Assuming each service
consumes on average 4 pods, this would allow one to run 262144
services (a quick arithmetic check follows below).
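
The numbers above are plain shell arithmetic:

```
% echo $((2 ** (128 - 108)))   # host bits in a /108
1048576
% echo $((1048576 / 4))        # pods divided by ~4 pods per service
262144
```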

Assuming each pod uses around 0.1 CPUs and 100Mi RAM, if all pods were
to run at the same time, you would need ca. 100'000 CPUs and 100 TB
RAM. Assuming further that each node contains at maximum 128 CPUs and
at maximum 1 TB RAM (quite powerful servers), we would need more than
750 servers just for the CPUs.

So we can reason that **we can** run kubernetes clusters of quite some
size even with a **podCidr of /108**.
## Organising /108's

Let's assume that we organise all our kubernetes clusters in a single
/64, like 2001:db8:1:2::/64, which looks like this:

```
% sipcalc 2001:db8:1:2::/64
-[ipv6 : 2001:db8:1:2::/64] - 0

[IPV6 INFO]
Expanded Address        - 2001:0db8:0001:0002:0000:0000:0000:0000
Compressed address      - 2001:db8:1:2::
Subnet prefix (masked)  - 2001:db8:1:2:0:0:0:0/64
Address ID (masked)     - 0:0:0:0:0:0:0:0/64
Prefix address          - ffff:ffff:ffff:ffff:0:0:0:0
Prefix length           - 64
Address type            - Aggregatable Global Unicast Addresses
Network range           - 2001:0db8:0001:0002:0000:0000:0000:0000 -
                          2001:0db8:0001:0002:ffff:ffff:ffff:ffff
```

A /108 network on the other hand looks like this:

```
% sipcalc 2001:db8:1:2::/108
-[ipv6 : 2001:db8:1:2::/108] - 0

[IPV6 INFO]
Expanded Address        - 2001:0db8:0001:0002:0000:0000:0000:0000
Compressed address      - 2001:db8:1:2::
Subnet prefix (masked)  - 2001:db8:1:2:0:0:0:0/108
Address ID (masked)     - 0:0:0:0:0:0:0:0/108
Prefix address          - ffff:ffff:ffff:ffff:ffff:ffff:fff0:0
Prefix length           - 108
Address type            - Aggregatable Global Unicast Addresses
Network range           - 2001:0db8:0001:0002:0000:0000:0000:0000 -
                          2001:0db8:0001:0002:0000:0000:000f:ffff
```

Assuming for a moment that we assign a /108, this looks as follows:
70
content/u/blog/kubernetes-production-cluster-1/contents.lr
Normal file

@ -0,0 +1,70 @@
title: ungleich production cluster #1
---
pub_date: 2021-07-05
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: no
---
abstract:
In this blog article we describe our way to our first production
kubernetes cluster.
---
body:
## Introduction

This article is a WIP describing all steps required for our first
production kubernetes cluster and the services that we run in it.
## Setup

### Bootstrapping

* All nodes are running [Alpine Linux](https://alpinelinux.org)
* All nodes are configured using [cdist](https://cdi.st)
* Mainly installing kubeadm, kubectl, crio *and* docker
* At the moment we try to use crio
* The cluster is initialised using **kubeadm init --config
  k8s/c2/kubeadm.yaml** from the [ungleich-k8s repo](https://code.ungleich.ch/ungleich-public/ungleich-k8s)

### CNI/Networking

* Calico is installed using **kubectl apply -f
  cni-calico/calico.yaml** from the [ungleich-k8s
  repo](https://code.ungleich.ch/ungleich-public/ungleich-k8s)
* Installing calicoctl using **kubectl apply -f
  https://docs.projectcalico.org/manifests/calicoctl.yaml**
* Aliasing calicoctl: **alias calicoctl="kubectl exec -i -n kube-system calicoctl -- /calicoctl"**
* All nodes BGP peer with our infrastructure using **calicoctl create -f - < cni-calico/bgp-c2.yaml**
  (a sketch of such a peer definition follows after this list)
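We cannot reproduce bgp-c2.yaml here, but a calico BGPPeer resource of
that kind could look roughly like this sketch; the peer address and
both ASNs are placeholders:

```
# sketch of a calico BGPPeer, as fed to "calicoctl create -f -"
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: upstream-router
spec:
  peerIP: 2a0a:e5c0:13::1   # placeholder upstream router address
  asNumber: 65533           # placeholder upstream ASN
```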

### Persistent Volume Claim support

* Provided by rook
* Using customized manifests to support IPv6 from ungleich-k8s

```
for yaml in crds common operator cluster storageclass-cephfs storageclass-rbd toolbox; do
    kubectl apply -f ${yaml}.yaml
done
```
### Flux

Starting with the 2nd cluster?

## Follow up

If you are interested in continuing the discussion,
we are there for you in
**the #kubernetes:ungleich.ch Matrix channel**
([you can signup here if you don't have an
account](https://chat.with.ungleich.ch)).

Or if you are interested in an IPv6 only kubernetes cluster,
drop a mail to **support**-at-**ungleich.ch**.
201
content/u/blog/kubernetes-without-ingress/contents.lr
Normal file

@ -0,0 +1,201 @@
title: Building Ingress-less Kubernetes Clusters
---
pub_date: 2021-06-09
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: yes
---
abstract:

---
body:
## Introduction

On [our journey to build and define IPv6 only kubernetes
clusters](https://www.nico.schottelius.org/blog/k8s-ipv6-only-cluster/)
we came across some principles that seem awkward in the IPv6 only
world. Let us today have a look at the *LoadBalancer* and *Ingress*
concepts.
## Ingress

Let's have a look at the [Ingress
definition](https://kubernetes.io/docs/concepts/services-networking/ingress/)
from the kubernetes website:

```
Ingress exposes HTTP and HTTPS routes from outside the cluster to
services within the cluster. Traffic routing is controlled by rules
defined on the Ingress resource.
```

So the ingress basically routes from outside to inside. But, in the
IPv6 world, services are already publicly reachable. It just
depends on your network policy.
### Update 2021-06-13: Ingress vs. Service

As some people pointed out (thanks a lot!), a public service is
**not the same** as an Ingress. An Ingress also has the possibility to
route based on layer 7 information like the path, domain name, etc.

However, if all of the traffic from an Ingress points to a single
IPv6 HTTP/HTTPS Service, the IPv6 service will effectively do the
same, with one hop less.
## Services

Let's have a look at what services in IPv6 only clusters look like:

```
% kubectl get svc
NAME            TYPE        CLUSTER-IP              EXTERNAL-IP   PORT(S)    AGE
etherpad        ClusterIP   2a0a:e5c0:13:e2::a94b   <none>        9001/TCP   19h
nginx-service   ClusterIP   2a0a:e5c0:13:e2::3607   <none>        80/TCP     43h
postgres        ClusterIP   2a0a:e5c0:13:e2::c9e0   <none>        5432/TCP   19h
...
```

All these services are world reachable, depending on your network
policy.
## ServiceTypes

While we are looking at the k8s primitives, let's have a closer
look at the **Service**, specifically at 3 of the **ServiceTypes**
supported by k8s, including their definitions:
### ClusterIP

The k8s website says

```
Exposes the Service on a cluster-internal IP. Choosing this value
makes the Service only reachable from within the cluster. This is the
default ServiceType.
```

So in the context of IPv6, this sounds wrong. There is nothing that
makes a global IPv6 address "internal", besides possible network
policies. The concept probably comes from the strict separation of
the RFC1918 space usually used in k8s clusters from public IPv4 space.

This difference does not make a lot of sense in the IPv6 world though.
Seeing **services as public by default** makes much more sense,
and simplifies your clusters a lot. (A sketch of such a network policy
follows below.)
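
For illustration, a NetworkPolicy along these lines could restrict an
otherwise public IPv6 service to one prefix; the pod label, the port
and the site prefix are assumptions:

```
# hedged sketch: restrict the hypothetical "etherpad" service to a
# single IPv6 prefix instead of leaving it world reachable
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: etherpad-internal-only
spec:
  podSelector:
    matchLabels:
      app: etherpad              # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 2a0a:e5c0:13::/48   # assumed site prefix
      ports:
        - protocol: TCP
          port: 9001
```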

### NodePort

Let's first have a look at the definition again:

```
Exposes the Service on each Node's IP at a static port (the
NodePort). A ClusterIP Service, to which the NodePort Service routes,
is automatically created. You'll be able to contact the NodePort
Service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
```

Conceptually this can be utilised in the IPv6 only world similarly to
how it is in the IPv4 world. However, given that there are enough
addresses available with IPv6, this might not be such an interesting
ServiceType anymore.
### LoadBalancer

Before we have a look at this type, let's take some steps back
first to ...

## ... Load Balancing

There are a variety of possibilities to do load balancing. From simple
round robin, to ECMP based load balancing, to application aware,
potentially weighted load balancing.

So for load balancing, there is usually more than one solution, and
there is likely no one size fits all.

With this said, let's have a look at the
**ServiceType LoadBalancer** definition:

```
Exposes the Service externally using a cloud provider's load
balancer. NodePort and ClusterIP Services, to which the external load
balancer routes, are automatically created.
```

So whatever the cloud provider offers can be used, and that is a good
thing. However, let's have a look at how you get load balancing for
free in IPv6 only clusters:
## Load Balancing in IPv6 only clusters

So what is the easiest way of getting reliable load balancing in a
network?
[ECMP (equal cost multi path)](https://en.wikipedia.org/wiki/Equal-cost_multi-path_routing)
comes to mind right away. Given that
kubernetes nodes can BGP peer with the network (upstream or the
switches), this basically gives load balancing to the world for free:

```
                   [ The Internet ]
                          |
[ k8s-node-1 ]-------[ network ]-------[ k8s-node-n ]
                     [  ECMP   ]
                          |
                   [ k8s-node-2 ]
```
In the real world on a bird based BGP upstream router
this looks as follows:

```
[18:13:02] red.place7:~# birdc show route
BIRD 2.0.7 ready.
Table master6:
...
2a0a:e5c0:13:e2::/108 unicast [place7-server1 2021-06-07] * (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 on eth0
                      unicast [place7-server4 2021-06-08] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 on eth0
                      unicast [place7-server2 2021-06-07] (100) [AS65534i]
        via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc on eth0
                      unicast [place7-server3 2021-06-07] (100) [AS65534i]
        via 2a0a:e5c0:13:0:224:81ff:fee0:db7a on eth0
...
```

Which results in the following kernel route:

```
2a0a:e5c0:13:e2::/108 proto bird metric 32
        nexthop via 2a0a:e5c0:13:0:224:81ff:fee0:db7a dev eth0 weight 1
        nexthop via 2a0a:e5c0:13:0:225:b3ff:fe20:3554 dev eth0 weight 1
        nexthop via 2a0a:e5c0:13:0:225:b3ff:fe20:3564 dev eth0 weight 1
        nexthop via 2a0a:e5c0:13:0:225:b3ff:fe20:38cc dev eth0 weight 1 pref medium
```
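For illustration, one of the BGP sessions producing the routes above
could be configured along these lines in bird 2.x; the protocol name,
the ASNs and the filter are assumptions, not our actual config:

```
# bird 2.x sketch: learn the k8s service network from one node
protocol bgp place7_server1 {
    local as 65533;                                  # assumed router ASN
    neighbor 2a0a:e5c0:13:0:225:b3ff:fe20:3554 as 65534;

    ipv6 {
        # only accept the serviceCidr from the node
        import where net ~ [ 2a0a:e5c0:13:e2::/108 ];
        export none;
    };
}
```

With one such session per node, the kernel installs one nexthop per
node and ECMP spreads the traffic, as shown in the route above.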

## TL;DR

We know, a TL;DR at the end is not the right thing to do, but hey, we
are at ungleich, aren't we?

In a nutshell, with IPv6 the concepts of **Ingress**,
**Service** and the **LoadBalancer** ServiceType
need to be revised, as IPv6 allows direct access without having
to jump through hoops.

If you are interested in continuing the discussion,
we are there for you in
**the #hacking:ungleich.ch Matrix channel**
([you can signup here if you don't have an
account](https://chat.with.ungleich.ch)).

Or if you are interested in an IPv6 only kubernetes cluster,
drop a mail to **support**-at-**ungleich.ch**.
@ -0,0 +1,32 @@
title: Building stateless redundant IPv6 routers
---
pub_date: 2021-04-21
---
author: ungleich virtualisation team
---
twitter_handle: ungleich
---
_hidden: no
---
_discoverable: no
---
abstract:
It's time for IPv6 in docker, too.
---
body:
```
interface eth1.2
{
        AdvSendAdvert on;
        MinRtrAdvInterval 3;
        MaxRtrAdvInterval 5;
        AdvDefaultLifetime 10;

        prefix 2a0a:e5c0:0:0::/64 { };
        prefix 2a0a:e5c0:0:10::/64 { };

        RDNSS 2a0a:e5c0:0:a::a 2a0a:e5c0:0:a::b { AdvRDNSSLifetime 6000; };
        DNSSL place5.ungleich.ch { AdvDNSSLLifetime 6000; } ;
};
```
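
A short note on this radvd snippet: the low **AdvDefaultLifetime** of
10 seconds, combined with a **MaxRtrAdvInterval** of 5 seconds, is
presumably what makes the redundancy stateless here. If one of the
routers dies, its default route expires on the clients within seconds
and traffic simply moves to the remaining router, without any VRRP-like
state synchronisation between the routers.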

@ -1,4 +1,4 @@
-title: Accessing IPv4 only hosts via IPv4
+title: Accessing IPv4 only hosts via IPv6
 ---
 pub_date: 2021-02-28
 ---
110
content/u/products/ungleich-sla/contents.lr
Normal file

@ -0,0 +1,110 @@
_discoverable: no
---
_hidden: no
---
title: ungleich SLA levels
---
subtitle: ungleich service level agreements
---
description1:

What is the right SLA (service level agreement) for you? At ungleich
we know that every organisation has individual needs and resources.
Depending on your needs, we offer different types of service level
agreements.

## The standard SLA

If not otherwise specified in the product or service you acquired from
us, the standard SLA will apply. This SLA covers standard operations
and is suitable for non-critical deployments. The standard SLA covers:

* Target uptime of all services: 99.9%
* Service level: best effort
* Included for all products
* Support via support@ungleich.ch (answered 9-17 on work days)
* Individual development and support available at standard rate of 220 CHF/h
* No telephone support
---
feature1_title: Bronze SLA
---
feature1_text:

The business SLA is suited for running regular applications with a
focus on business continuity and individual support. Compared to the
standard SLA it **guarantees you responses within 5 hours** on work
days. You can also **reach our staff at extended hours**.

---
feature2_title: Enterprise SLA
---
feature2_text:

The Enterprise SLA is right for you if you need high availability, but
you don't require instant reaction times from our team.

How this works:

* All services are set up in a high availability setup (additional
  charges for resources apply)
* The target uptime of services: 99.99%

---
feature3_title: High Availability (HA) SLA
---
feature3_text:
If your application is mission critical, this is the right SLA for
you. The **HA SLA** guarantees high availability, multi location
deployments with cross-datacenter backups and fast reaction times,
24 hours per day.

---
offer1_title: Business SLA
---
offer1_text:

* Target uptime of all services: 99.9%
* Service level: guaranteed reaction within 1 business day
* Development/Support: 180 CHF/h
* Telephone support (8-18 on work days)
* Mail support (8-18 on work days)
* Optional out of business hours hotline (360 CHF/h)
* 3'000 CHF/6 months

---
offer1_link: https://ungleich.ch/u/contact/
---
offer2_title: Enterprise SLA
---
offer2_text:

* Requires a high availability setup for all services, with separate pricing
* Service level: reaction within 4 hours
* Telephone support (24x7)
* Services are provided in multiple data centers
* Included out of business hours hotline (180 CHF/h)
* 18'000 CHF/6 months

---
offer2_link: https://ungleich.ch/u/contact/
---
offer3_title: HA SLA
---
offer3_text:

* Uptime guarantees >= 99.99%
* Ticketing system reaction time < 3h
* 24x7 telephone support
* Applications running in multiple data centers
* Minimum monthly fee: 3000 CHF (according to individual service definition)

Individual pricing. Contact us on support@ungleich.ch for an individual
quote and we will get back to you.

---
offer3_link: https://ungleich.ch/u/contact/

@ -58,6 +58,15 @@ Checkout the [SBB
page](https://www.sbb.ch/de/kaufen/pages/fahrplan/fahrplan.xhtml?von=Zurich&nach=Diesbach-Betschwanden)
for the next train.

The address is:

```
Hacking Villa
Hauptstrasse 28
8777 Diesbach
Switzerland
```

---
content1_image: hacking-villa-diesbach.jpg
---

@ -45,6 +45,16 @@ Specifically for learning new technologies and to exchange knowledge
we created the **Hacking & Learning channel** which can be found at
**#hacking-and-learning:ungleich.ch**.

## Kubernetes

Recently (in 2021) we started to run Kubernetes clusters at
ungleich. We share our experiences in **#kubernetes:ungleich.ch**.

## Ceph

To exchange experiences and troubleshooting tips for ceph, we are running
**#ceph:ungleich.ch**.

## cdist

We meet for cdist discussions about using, developing and more

@ -57,7 +67,7 @@ We discuss topics related to sustainability in

## More channels

-* The main / hangout channel is **o#town-square:ungleich.ch** (also bridged
+* The main / hangout channel is **#town-square:ungleich.ch** (also bridged
   to Freenode IRC as #ungleich and
   [discord](https://discord.com/channels/706144469925363773/706144469925363776))
 * The bi-yearly hackathon Hack4Glarus can be found in