From e324ef31a712d5defa683503aca134d731dea74f Mon Sep 17 00:00:00 2001
From: Nico Schottelius
Date: Fri, 1 Jan 2010 22:47:24 +0100
Subject: [PATCH] add blog article about linux virtual machines

Signed-off-by: Nico Schottelius
---
 blog/linux-virtual-machines-a-real-pain.mdwn | 238 +++++++++++++++++++
 tags/vm.mdwn                                 |   3 +
 2 files changed, 241 insertions(+)
 create mode 100644 blog/linux-virtual-machines-a-real-pain.mdwn
 create mode 100644 tags/vm.mdwn

diff --git a/blog/linux-virtual-machines-a-real-pain.mdwn b/blog/linux-virtual-machines-a-real-pain.mdwn
new file mode 100644
index 00000000..f3124d1b
--- /dev/null
+++ b/blog/linux-virtual-machines-a-real-pain.mdwn
@@ -0,0 +1,238 @@
+[[!meta title="Linux virtual machine software is a real pain"]]
+
+This report is about my experience with virtual machines today.
+
+## UML (user-mode-linux)
+
+It began this morning, when I tried to set up a new virtual
+machine with [User mode Linux](http://user-mode-linux.sourceforge.net/).
+I could easily reuse an existing installation using
+copy-on-write with the following command:
+
+    linux umid=vm4 uml_dir=/home/nico/vm/uml con1=pts ubda=/home/nico/vm/cow/vm4,/home/nico/vm/images/debian eth0=tuntap,,,192.168.4.1 mem=4096M
+
+After I issued
+
+    apt-get update && apt-get dist-upgrade
+
+in the virtual machine, it hung. It did not react to new ssh connections.
+I've seen this behaviour quite often with **user mode Linux**, when I have
+"a lot" of disk input/output. Ok, I wanted to use some kind of
+framework for my virtual machines anyway, so for the time being,
+let's forget about uml and try libvirt+kvm.
+
+## Libvirt
+
+The [libvirt](http://libvirt.org/) project looks quite promising
+from its documentation, especially in combination with
+[virt-manager](http://virt-manager.org/). Trying to create a
+new virtual machine with virt-manager is kind of strange, because
+it insists on having an installation medium. Though, locating the
+Debian live CD is not so difficult.
+But then came the big problem: When I tried to create a new disk image,
+virt-manager just hung for several minutes, without the host system doing
+anything. Some time before, I had massive problems using virt-manager and
+selecting a different pool for the images, which caused several problems
+when trying to start the VM.
+
+But well, let's give [virsh](http://www.libvirt.org/apps.html) a try,
+the command line utility to manage libvirt. Creating a new disk image
+with virsh is pretty easy:
+
+    vol-create-as default jr.img 8G
+
+A bit confusing is the fact that the **vol-create** command without
+the **-as** suffix expects an XML file as input. Having a look at the
+other create commands confirms this guess:
+
+    ikn:~% LANG=C LC_ALL=C virsh help | grep create
+    create          create a domain from an XML file
+    net-create      create a network from an XML file
+    nodedev-create  create a device defined by an XML file on the node
+    pool-create     create a pool from an XML file
+    pool-create-as  create a pool from a set of args
+    vol-create      create a vol from an XML file
+    vol-create-from create a vol, using another volume as input
+    vol-create-as   create a volume from a set of args
+    ikn:~% virsh --version
+    0.7.4
+
+Some commands do not support creation from command line arguments, but
+only from an XML file, which makes virsh useless for interactive
+and scripting use.
+
+This brings me to the new kid on the block: ganeti
+
+## Ganeti
+
+When I first experienced problems with libvirt, some people pointed
+me to [ganeti](http://code.google.com/p/ganeti/)
+(to tell the truth, it was one of the ganeti developers).
+Until today I had put this idea off, but after the problems with libvirt
+I decided to give ganeti-2.0.5-1 (Debian package) a try.
+First of all I tried to follow the
+[installation tutorial](http://ganeti-doc.googlecode.com/svn/ganeti-1.2/install.html)
+referenced on the homepage, which is heavily oriented towards
+[Xen](http://www.xen.org/) and [LVM](http://tldp.org/HOWTO/LVM-HOWTO/),
+neither of which I plan to use. Trying to get ganeti running, I ran into
+some interesting problems:
+
+    [11:26] tee:root# gnt-cluster init ganeti.schottelius.org
+    Failure: prerequisites not met for this operation:
+    This host's IP resolves to the private range (127.0.1.1). Please fix DNS or /etc/hosts.
+
+This is described in the ganeti manual and easily fixed by commenting out the
+relevant entry in ***/etc/hosts***:
+
+    [11:27] tee:root# grep tee.schottelius.org /etc/hosts
+    #127.0.1.1 tee.schottelius.org tee
+
+After that I was a bit confused by ganeti not finding its cluster name:
+
+    [11:27] tee:root# gnt-cluster init ganeti.schottelius.org
+    Failure: can't resolve hostname 'ganeti.schottelius.org'
+    [11:28] tee:root# ping ganeti.schottelius.org
+    PING ganeti.schottelius.org (77.109.138.195) 56(84) bytes of data.
+    64 bytes from ganeti.schottelius.org (77.109.138.195): icmp_seq=1 ttl=64 time=0.026 ms
+    ^C
+    --- ganeti.schottelius.org ping statistics ---
+    1 packets transmitted, 1 received, 0% packet loss, time 0ms
+    rtt min/avg/max/mdev = 0.026/0.026/0.026/0.000 ms
+
+Retrying twice "solved" the problem, which is a bit confusing for me
+as ganeti and ping both use the same resolver library.
+After that I met the "no-lvm-problem":
+
+    [11:38] tee:root# gnt-cluster init -b br0 ganeti.schottelius.org
+    Failure: prerequisites not met for this operation:
+    Error: volume group 'xenvg' missing
+    specify --no-lvm-storage if you are not using lvm
+
+Specifying the required parameter led me into a new problem:
+
+    [11:38] tee:root# gnt-cluster init -b br0 --no-lvm-storage ganeti.schottelius.org
+    Failure: prerequisites not met for this operation:
+    Invalid master netdev given (xen-br0): 'Device "xen-br0" does not exist.'
+
+This is interesting: **ganeti seems to ignore the bridge parameter -b**.
+So, to use ganeti, I **renamed the bridge from br0 to xen-br0** in
+***/etc/network/interfaces***:
+
+    auto xen-br0
+    iface xen-br0 inet manual
+        bridge_ports eth1
+
+And finally I was able to initialise the ganeti cluster:
+
+    [15:06] tee:root# gnt-cluster init -b br0 --no-lvm-storage ganeti.schottelius.org
+
+Then I tried to add the host to the cluster, which failed, and retrieving
+status information failed as well:
+
+    [15:06] tee:root# gnt-node add tee.schottelius.org
+    Node tee.schottelius.org already in the cluster (as tee.schottelius.org) - please retry with '--readd'
+    [15:07] tee:root# gnt-node list
+    Node                DTotal DFree MTotal MNode MFree Pinst Sinst
+    tee.schottelius.org      ?     ?      ?     ?     ?     0     0
+
+Trying to re-add it results in a failure without an error message
+and does not fix the problem:
+
+    [15:07] tee:root# gnt-node add --readd tee.schottelius.org
+    The authenticity of host 'tee.schottelius.org (77.109.138.222)' can't be established.
+    RSA key fingerprint is c7:d0:a8:32:ad:f0:9b:fa:1e:77:d5:1f:64:d8:9b:db.
+    Are you sure you want to continue connecting (yes/no)? yes
+    Thu Dec 31 15:08:23 2009 - INFO: Readding a node, the offline/drained flags were reset
+    Thu Dec 31 15:08:23 2009 - INFO: Node will be a master candidate
+    Failure: command execution error:
+    [15:08] tee:root#
+    [15:32] tee:root# gnt-node list
+    Node                DTotal DFree MTotal MNode MFree Pinst Sinst
+    tee.schottelius.org      ?     ?      ?     ?     ?     0     0
+    [15:34] tee:root#
+
+At that point I was referred to the
+[more recent documentation](http://ganeti-doc.googlecode.com/svn/ganeti-2.0/install.html)
+of ganeti and began from scratch:
+
+    [16:22] tee:vm# gnt-cluster destroy --yes-do-it
+    [16:23] tee:vm# gnt-cluster init --no-lvm-storage ganeti.schottelius.org
+    [16:26] tee:vm# gnt-node list
+    Node                DTotal DFree MTotal MNode MFree Pinst Sinst
+    tee.schottelius.org      ?     ?      ?     ?     ?     0     0
+
+After double checking that the needed daemons are running
+(/etc/init.d/ganeti restart), I got a good hint: One has to specify
+the hypervisor to use during initialisation:
+
+    [16:34] tee:vm# gnt-cluster destroy --yes-do-it
+    [16:35] tee:vm# gnt-cluster init --no-lvm-storage -t kvm ganeti.schottelius.org
+    [16:36] tee:vm# gnt-node list
+    Node                DTotal DFree MTotal MNode MFree Pinst Sinst
+    tee.schottelius.org      ?     ?  19.6G  2.7G 17.6G     0     0
+
+Now I tried to add a new virtual machine instance, which resulted in another
+error:
+
+    [16:51] tee:vm# gnt-instance add -t file -s 4G -o debootstrap -n tee.schottelius.org jr.nachtbrand.ch
+    Failure: prerequisites not met for this operation:
+    Hypervisor parameter validation failed on node tee.schottelius.org: Instance kernel '/boot/vmlinuz-2.6-kvmU' not found or not a file
+
+This seems to be some kind of ganeti logic to have the kernel outside the
+block device, which is similar to the user mode Linux approach.
+After linking one of the host kernels and its initrd, adding an instance succeeded:
+
+    [16:59] tee:/boot# ln -s vmlinuz-2.6.30-2-amd64 vmlinuz-2.6-kvmU
+    [16:59] tee:/boot# ln -s initrd.img-2.6.30-2-amd64 initrd-2.6-kvmU
+    [17:00] tee:vm# gnt-instance add -t file -s 4G -o debootstrap -n tee.schottelius.org jr.nachtbrand.ch
+    [17:01] tee:/boot# gnt-instance list
+    Instance         Hypervisor OS          Primary_node        Status  Memory
+    jr.nachtbrand.ch kvm        debootstrap tee.schottelius.org running   128M
+
+It is also correctly connected to the bridge, is seen as valid by **gnt-os**,
+and **gnt-cluster verify** looks good:
+
+    [17:14] tee:/boot# brctl show
+    bridge name     bridge id               STP enabled     interfaces
+    br0             8000.000000000000       no
+    xen-br0         8000.0015176a26f7       no              eth1
+                                                            tap4
+    [17:16] tee:/boot# gnt-os diagnose
+    OS: debootstrap [global status: valid]
+    Node: tee.schottelius.org, status: VALID (path: /usr/share/ganeti/os/debootstrap)
+    [17:17] tee:/boot# gnt-cluster verify
+    Thu Dec 31 17:17:24 2009 * Verifying global settings
+    Thu Dec 31 17:17:24 2009 * Gathering data (1 nodes)
+    Thu Dec 31 17:17:24 2009 * Verifying node tee.schottelius.org (master)
+    Thu Dec 31 17:17:24 2009 * Verifying instance jr.nachtbrand.ch
+    Thu Dec 31 17:17:24 2009 * Verifying orphan volumes
+    Thu Dec 31 17:17:24 2009 * Verifying remaining instances
+    Thu Dec 31 17:17:24 2009 * Verifying N+1 Memory redundancy
+    Thu Dec 31 17:17:24 2009 * Other Notes
+    Thu Dec 31 17:17:24 2009 - NOTICE: 1 non-redundant instance(s) found.
+    Thu Dec 31 17:17:24 2009 * Hooks Results
+
+As specified in the documentation, I tried to connect to the console:
+
+    [17:28] tee:/boot# gnt-instance console jr.nachtbrand.ch
+    [17:30] tee:/boot# gnt-instance console --show-cmd jr.nachtbrand.ch
+    ssh -q -oEscapeChar=none -oHashKnownHosts=no -oGlobalKnownHostsFile=/var/lib/ganeti/known_hosts -oUserKnownHostsFile=/dev/null -oHostKeyAlias=ganeti.schottelius.org -oBatchMode=yes -oStrictHostKeyChecking=yes -t root@tee.schottelius.org '/usr/bin/socat STDIO,echo=0,icanon=0 UNIX-CONNECT:/var/run/ganeti/kvm-hypervisor/ctrl/jr.nachtbrand.ch.serial'
+
+The problem is that the newly debootstrapped system
+*does not have a serial console set up*.
+
+As you can see, by the evening of this day I had a lot of new experiences,
+but *no reliably running virtualisation framework*. That brings me to the
+end of this report:
+
+ * User mode Linux does not work reliably under some I/O load.
+ * Virt-manager is absolutely not able to change the simplest parameters.
+ * Virsh is unusable if you don't want to edit XML files.
+ * Ganeti has a lot of unhandled problems and still relies very much on Xen + LVM.
+
+As my vacation ends next Monday, I will have a look at the commercial virtualisation
+frameworks. For the folks behind the FOSS projects named above: Guys, you have to
+improve a lot before one can call your software "good and clean software".
+
+
+[[!tag unix vm]]
diff --git a/tags/vm.mdwn b/tags/vm.mdwn
new file mode 100644
index 00000000..3ae62457
--- /dev/null
+++ b/tags/vm.mdwn
@@ -0,0 +1,3 @@
+Virtual machine related stuff
+
+[[!inline pages="tagged(vm)" archive="yes" show=0 quick=no]]