++blog
parent eafcb97d87
commit 2f1d043281
2 changed files with 198 additions and 0 deletions
				
			
		| 
						 | 
					@ -0,0 +1,197 @@
 | 
				
			||||||
 | 
					title: How to build an OpenStack alternative: Step 1, the prototype
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					pub_date: 2020-01-11
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					author: ungleich virtualisation team
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					twitter_handle: ungleich
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					_hidden: no
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					_discoverable: yes
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					abstract:
 | 
				
			||||||
 | 
					The step by step guide for doing it yourself
 | 
				
			||||||
 | 
					---
 | 
				
			||||||
 | 
					body:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
In this article we describe a first prototype of an OpenStack alternative.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Find out what you need
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
When we say we are building an alternative to OpenStack, we have something
specific in mind, which might differ from what you think OpenStack is
for. For us it means running a lot of virtual machines for customers,
with a lot of storage attached, self service and automated payments.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
All code we refer to in this article can be found on
 | 
				
			||||||
 | 
					[code.ungleich.ch](https://code.ungleich.ch/uncloud/uncloud/tree/master/uncloud/hack/hackcloud).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Creating a network
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
The current setup at [Data Center
Light](/u/projects/data-center-light) relies heavily on VLANs. VLANs,
however, have a similar problem to IPv4 addresses: there are not that
many of them. So for our OpenStack replacement we decided to go with
[VXLANs](https://en.wikipedia.org/wiki/Virtual_Extensible_LAN)
instead. We also considered
[SRv6](https://www.segment-routing.net/tutorials/2017-12-05-srv6-introduction/),
but we did not see an advantage for our use case. In fact, VXLANs
seem to be much simpler.
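
To put numbers on "not that many": a VLAN ID is a 12 bit field (with two reserved values), while a VXLAN network identifier is 24 bits wide. The following is only a small arithmetic sketch; the variable names are ours.

```shell
# Number of usable IDs: VLAN IDs are 12 bits wide (0 and 4095 are
# reserved), VXLAN network identifiers (VNIs) are 24 bits wide.
vlan_ids=$(( (1 << 12) - 2 ))
vxlan_ids=$(( 1 << 24 ))

echo "usable VLAN IDs: ${vlan_ids}"
echo "VXLAN VNIs:      ${vxlan_ids}"
```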
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					So before running a VM, we create a new VXLAN device and add it to a
 | 
				
			||||||
 | 
					bridge. This roughly looks as follows:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
```
# VXLAN network ID and the physical uplink device
netid=100
dev=eth0

vxlandev=vxlan${netid}
bridgedev=br${netid}

# Create the vxlan device; traffic without a known destination
# is sent to the IPv6 multicast group ff05::${netid} via ${dev}
ip -6 link add ${vxlandev} type vxlan \
    id ${netid} \
    dstport 4789 \
    group ff05::${netid} \
    dev ${dev} \
    ttl 5

ip link set ${vxlandev} up

# Create the bridge
ip link add ${bridgedev} type bridge
ip link set ${bridgedev} up

# Add the vxlan device into the bridge
ip link set ${vxlandev} master ${bridgedev} up
```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
As you can see, we use IPv6 multicast as the underlay for the VXLAN,
which is very practical in an IPv6-first data center.
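
The script above writes the network ID literally into the last group of the multicast address, which only yields a valid address while the ID has at most four digits. As a possible generalisation (an assumption on our side, not what the prototype does), the ID could be converted to hex and spread over two groups so that every 24 bit ID maps to its own group. The function name `vxlan_group` is hypothetical.

```shell
# Map a VXLAN network ID (0..16777215) to a site-local IPv6
# multicast group: upper 8 bits and lower 16 bits as hex groups.
# Note: this differs from the literal ff05::${netid} used above.
vxlan_group() {
    netid=$1
    printf 'ff05::%x:%x\n' \
        $(( (netid >> 16) & 0xff )) \
        $(( netid & 0xffff ))
}

vxlan_group 100        # prints ff05::0:64
vxlan_group 16777215   # prints ff05::ff:ffff
```

The result can be passed to the `group` parameter of `ip link add ... type vxlan` in place of `ff05::${netid}`.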
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## IP address management (IPAM)
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
Speaking of IPv6 first, all VMs in our new setup will again be IPv6
only, and IPv4 addresses will be mapped to them via NAT64. This is very
similar to what you see at AWS, except that AWS uses
[RFC1918](https://tools.ietf.org/html/rfc1918) private IPv4 space,
whereas we use [globally unique IPv6
addresses](https://tools.ietf.org/html/rfc3587).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
The advantage of using IPv6 here is that you will never have an
address collision and that the VM configuration stays clean: there is
no need to think about IPv4 firewall rules, you only need to configure
IPv6 settings.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
In the IPv6 world, we use router advertisements where the IPv4 world
uses DHCP: each VM configures its own address from the advertised
prefix. This has the advantage that no state is kept on the server.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
To enable our IPAM, we add an IPv6 address to our bridge and start
the radvd daemon:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
```
# ${ip} is the address of the bridge, including its prefix length
ip addr add ${ip} dev ${bridgedev}
# -C: config file, -n: stay in the foreground, -p: PID file
radvd -C ./radvd.conf -n -p ./radvdpid
```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					A sample radvd configuration we used for testing looks like this:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
```
interface br100
{
  AdvSendAdvert on;
  MinRtrAdvInterval 3;
  MaxRtrAdvInterval 5;
  AdvDefaultLifetime 3600;

  prefix 2a0a:e5c1:111:888::/64 {
  };

  RDNSS 2a0a:e5c0::3 2a0a:e5c0::4 { AdvRDNSSLifetime 6000; };
  DNSSL place7.ungleich.ch { AdvDNSSLLifetime 6000; };
};
```
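
Since the bridge name and the prefix change per network, the configuration could be generated instead of written by hand. The following is a minimal sketch under that assumption; the function name `render_radvd_conf` is hypothetical and the RDNSS and DNSSL lines are left out for brevity.

```shell
# Print a radvd configuration for one bridge and one /64 prefix.
# Usage: render_radvd_conf br100 2a0a:e5c1:111:888::/64 > radvd.conf
render_radvd_conf() {
    bridgedev=$1
    prefix=$2
    cat <<EOF
interface ${bridgedev}
{
  AdvSendAdvert on;
  MinRtrAdvInterval 3;
  MaxRtrAdvInterval 5;
  AdvDefaultLifetime 3600;

  prefix ${prefix} {
  };
};
EOF
}
```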
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					With this, we are ready to spawn a VM!
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Create a VM
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
The current setup at Data Center Light uses libvirtd for creating
VMs. This is problematic, because libvirtd is not very reliable:
sometimes it stops answering `virsh` commands or begins to use 100%
CPU, and it regularly needs to be killed and restarted. We have seen
this behaviour on CentOS 5, CentOS 6, Debian 8 and Devuan 9.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					So in our version, we skip libvirt and run qemu directly. It turns out
 | 
				
			||||||
 | 
					that this is actually not that hard and can be done using the
 | 
				
			||||||
 | 
					following script:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
```
# The first argument is the ID of the VM
vmid=$1; shift

qemu=/usr/bin/qemu-system-x86_64

# Switch to tcg on hosts without KVM support
accel=kvm
#accel=tcg

# Memory in MiB
memory=1024
cores=2
uuid=732e08c7-84f8-4d43-9571-263db4f80080

# Exported so that ifup.sh knows which bridge to use
export bridge=br100

$qemu -name uc${vmid} \
      -machine pc,accel=${accel} \
      -m ${memory} \
      -smp ${cores} \
      -uuid ${uuid} \
      -drive file=alpine-virt-3.11.2-x86_64.iso,media=cdrom \
      -netdev tap,id=netmain,script=./ifup.sh \
      -device virtio-net-pci,netdev=netmain,id=net0,mac=02:00:f0:a9:c4:4e
```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
This starts a VM with a hard-coded MAC address using KVM
acceleration. We give the VM 2 cores and 1024 MiB of memory and assign
it a UUID so that we can easily find it again later. For testing, we
have attached an
[Alpine Linux ISO](https://alpinelinux.org/).
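
The hard-coded MAC address is one of the values that would have to be generated per VM. One possible approach (an assumption on our side, not part of the prototype) is to derive it from the VM ID, keeping the `02` first octet that marks a locally administered unicast address. The function name `vmid_to_mac` is hypothetical.

```shell
# Derive a locally administered MAC address from a numeric VM ID.
# The ID fills the last four octets, so IDs up to 2^32 - 1 are unique.
vmid_to_mac() {
    vmid=$1
    printf '02:00:%02x:%02x:%02x:%02x\n' \
        $(( (vmid >> 24) & 0xff )) \
        $(( (vmid >> 16) & 0xff )) \
        $(( (vmid >> 8) & 0xff )) \
        $(( vmid & 0xff ))
}

vmid_to_mac 1     # prints 02:00:00:00:00:01
vmid_to_mac 258   # prints 02:00:00:00:01:02
```

The output could then replace the fixed value in the `mac=` parameter of the `-device` option.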
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
The interesting part, however, is the network. We create a virtio
based network card backed by a tap device. Once qemu has created the
tap device, it executes `ifup.sh` with the device name as the first
argument.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					The ifup.sh script looks as follows:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
```
#!/bin/sh
# qemu passes the name of the tap device as the first argument
dev=$1; shift

# bridge is set up from outside
ip link set dev "$dev" master "${bridge}"
ip link set dev "$dev" up
```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					It basically adds the tap device to the previously created bridge.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## That's all there is
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
Using only the steps above, we spawned a test VM on a test machine.
The VM is reachable at `2a0a:e5c1:111:888:0:f0ff:fea9:c44e`: if our
test machine is on, you should be able to reach it from anywhere in
the world.
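
The address is not arbitrary: it is the advertised prefix combined with an interface identifier derived from the MAC address (modified EUI-64), assuming the guest uses that scheme rather than privacy or stable-privacy addresses. A minimal sketch of the derivation, with the hypothetical function name `mac_to_slaac`:

```shell
# Build the EUI-64 based address for a MAC inside a /64 prefix.
# prefix: the first four groups, e.g. 2a0a:e5c1:111:888
# mac:    six colon separated octets, e.g. 02:00:f0:a9:c4:4e
mac_to_slaac() {
    prefix=$1
    mac=$2
    # Split the MAC into its six octets ($1 .. $6)
    old_ifs=$IFS; IFS=:
    set -- $mac
    IFS=$old_ifs
    # Flip the universal/local bit of the first octet
    first=$(( 0x$1 ^ 0x02 ))
    # Insert ff:fe between the third and the fourth octet
    printf '%s:%x:%x:%x:%x\n' "$prefix" \
        $(( (first << 8) | 0x$2 )) \
        $(( (0x$3 << 8) | 0xff )) \
        $(( 0xfe00 | 0x$4 )) \
        $(( (0x$5 << 8) | 0x$6 ))
}

mac_to_slaac 2a0a:e5c1:111:888 02:00:f0:a9:c4:4e
# prints 2a0a:e5c1:111:888:0:f0ff:fea9:c44e
```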
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
Obviously this is not a full OpenStack replacement. However, we wanted
to share the small steps that we take towards creating one. We also
really like running a virtual machine hosting service and wanted to
show you how much fun it can be.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					## Next step
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
A lot of things in the above example are hard coded and aren't
directly usable by customers. In the next step we will generalise some
of the above functions to get closer to providing a fully
usable OpenStack alternative.
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
If you are interested in this topic, you can join us on the [ungleich
chat](https://chat.ungleich.ch). The full development of our
alternative is open source.
 | 
				
			||||||
							
								
								
									
										1
									
								
								content/u/blog/uncloud-next
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							| 
						 | 
					@ -0,0 +1 @@
 | 
				
			||||||
 | 
					- how to secure the network
 | 
				
			||||||