[[!meta title="KVM Virtual Machines managed with cdist and sexy @ local.ch"]]

## Introduction

This article describes the KVM setup of [local.ch](http://www.local.ch), which is
managed by [[sexy|software/sexy]] and configured by [[cdist|software/cdist]].

If you haven't done so yet, you may want to have a look at the
[[Sexy and cdist @ local.ch|sexy-and-cdist-at-local.ch]]
article before continuing with this one.

## KVM Host configuration

The KVM hosts are Dell R815 servers running CentOS 6.x. Why Dell? Because they
offered us a good price/value combination for the boxes. Why CentOS? Historical
reasons. The hosts received a minimal set of BIOS tuning to support VM performance:

* Enable the usual virtualisation flags (don't forget the IOMMU!)
* Change the power profile to **Maximum Performance**

Furthermore, as the CentOS kernel is pretty old (2.6.32-279) and
conservatively configured, the kernel needs the following
command line option to enable the IOMMU:

    amd_iommu=on

Not enabling this option degrades performance massively. In our case,
enabling it dropped the latency of the application by a factor of 10.
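
On CentOS 6 this is typically done by appending the option to the kernel line in the GRUB configuration. A minimal sketch, demonstrated on a sample file rather than the real `/boot/grub/grub.conf` (the sample content and the sed approach are assumptions, not taken from the original setup):

```shell
# Sketch: append amd_iommu=on to every kernel line of a grub.conf-style
# file. Demonstrated on a temporary copy; on a real CentOS 6 host the
# file is /boot/grub/grub.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
title CentOS (2.6.32-279.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-279.el6.x86_64 ro root=/dev/mapper/vg0-root
        initrd /initramfs-2.6.32-279.el6.x86_64.img
EOF
# Append the option to each "kernel" stanza line:
sed -i '/^[[:space:]]*kernel /s/$/ amd_iommu=on/' "$conf"
grep kernel "$conf"
```

A reboot is required for the new command line to take effect; `cat /proc/cmdline` shows whether the option is active.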

One big motivation of the KVM setup at local.ch is to make the
KVM hosts as independent as possible and sensibly fault tolerant. That said,
VMs are stored on local storage and hosts are always redundantly connected
to two switches using [LACP](https://en.wikipedia.org/wiki/Link_aggregation).

## KVM Host Network Configuration

[[!img kvm-setup-local.ch-overview.png alt="Overview of KVM setup at local.ch"]]

As can be seen in the picture above, every KVM host is connected to two
**10G Arista switches (7050T-52-R)** using LACP. Besides being capable
of running 10G, the Arista switches are actually pretty neat for the Unix geek,
because they are Linux based with an
[FPGA](https://en.wikipedia.org/wiki/Field-programmable_gate_array)
attached. Furthermore, you can easily
gain access to a shell by typing **enable** followed by **bash**.

The Arista switches are connected to each other with 2x 10G links, over which LACP+MLAG
is configured. This gives us the ability to connect every KVM host with LACP to two
**different** switches: they use MLAG to synchronise their LACP states.

On the KVM host, the network is configured as follows:

The dual-port 10G card (Intel Corporation 82599EB) is bonded into bond0.

    [root@kvm-hw-inx01 network-scripts]# cat /proc/net/bonding/bond0
    Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

    Bonding Mode: IEEE 802.3ad Dynamic link aggregation
    Transmit Hash Policy: layer2 (0)
    MII Status: up
    MII Polling Interval (ms): 0
    Up Delay (ms): 0
    Down Delay (ms): 0

    802.3ad info
    LACP rate: slow
    Aggregator selection policy (ad_select): stable
    Active Aggregator Info:
        Aggregator ID: 3
        Number of ports: 2
        Actor Key: 33
        Partner Key: 30
        Partner Mac Address: 02:1c:73:1b:f5:b2

    Slave Interface: eth4
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 68:05:ca:0b:5b:6a
    Aggregator ID: 3
    Slave queue ID: 0

    Slave Interface: eth5
    MII Status: up
    Speed: 10000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 68:05:ca:0b:5b:6b
    Aggregator ID: 3
    Slave queue ID: 0

The following configuration is used to create the bond0 device:

    [root@kvm-hw-inx01 network-scripts]# cat ifcfg-bond0
    DEVICE=bond0
    BOOTPROTO=none
    BONDING_OPTS="mode=802.3ad"
    ONBOOT=yes
    MTU=9000

    [root@kvm-hw-inx01 sysconfig]# cat network-scripts/ifcfg-eth4
    DEVICE="eth4"
    NM_CONTROLLED="yes"
    USERCTL=no
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none

    [root@kvm-hw-inx01 sysconfig]# cat network-scripts/ifcfg-eth5
    DEVICE="eth5"
    NM_CONTROLLED="yes"
    USERCTL=no
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none

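A quick sanity check is to verify that both slaves actually joined the same LACP aggregator, by comparing their Aggregator IDs in `/proc/net/bonding/bond0`. A sketch (run here against captured sample output, since the file only exists on a bonding host; the awk approach is my assumption, not part of the original setup):

```shell
# Sketch: verify that all slaves of a bond share one aggregator ID.
# On a real host, replace $bondinfo with /proc/net/bonding/bond0.
bondinfo=$(mktemp)
cat > "$bondinfo" <<'EOF'
Slave Interface: eth4
Aggregator ID: 3
Slave Interface: eth5
Aggregator ID: 3
EOF
# Collect the per-slave aggregator IDs (anchored at start of line, so
# the indented "Active Aggregator Info" block is not counted):
ids=$(awk -F': ' '/^Aggregator ID/ {print $2}' "$bondinfo" | sort -u)
if [ "$(echo "$ids" | wc -l)" -eq 1 ]; then
    echo "OK: all slaves in aggregator $ids"
else
    echo "WARNING: slaves split across aggregators: $ids"
fi
```

If the IDs differ, the slaves negotiated separate aggregators and only one of them carries traffic.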
The MTU of the 10G cards has been set to 9000, as the Aristas support
[Jumbo Frames](https://en.wikipedia.org/wiki/Jumbo_frame).

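One common way to verify end to end that jumbo frames actually pass (my suggestion, not part of the original setup) is to ping with the don't-fragment bit set and the largest ICMP payload that fits the MTU: the MTU minus the 20-byte IPv4 header minus the 8-byte ICMP header.

```shell
# Largest ICMP payload that fits into a single 9000-byte MTU frame:
mtu=9000
payload=$((mtu - 20 - 8))   # subtract IPv4 header (20) and ICMP header (8)
echo "ping -M do -s $payload -c 1 <peer>"
```

If a switch or NIC in the path is still at MTU 1500, the ping fails with a "message too long" error instead of silently fragmenting.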
Every VM is attached to two different networks:

* PZ: presentation (for general traffic) (10.18x.0.0/22 network)
* FZ: filerzone (for NFS and database traffic) (10.18x.64.0/22 network)

Both networks are separated using the VLAN tags 2 (pz) and 3 (fz), which results
in **bond0.2** and **bond0.3**:

    [root@kvm-hw-inx01 network-scripts]# ip l | grep bond
    6: eth4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP qlen 1000
    7: eth5: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0 state UP qlen 1000
    8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
    139: bond0.2@bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP
    140: bond0.3@bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP

To keep things simple, each of the two VLAN-tagged (bonded) interfaces is added to
its own bridge, to which the VMs are attached later on. The full configuration looks like this:

    [root@kvm-hw-inx01 network-scripts]# cat ifcfg-bond0.2
    DEVICE="bond0.2"
    ONBOOT=yes
    VLAN=yes
    BRIDGE=brpz

    [root@kvm-hw-inx01 network-scripts]# cat ifcfg-brpz
    DEVICE=brpz
    TYPE=Bridge
    ONBOOT=yes
    DELAY=0
    NM_CONTROLLED=no
    MTU=9000

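The same VLAN + bridge layering can also be brought up by hand with iproute2 and brctl, which is handy for one-off testing. A sketch (my equivalent of the ifcfg files above, not from the original setup), printed in dry-run style so the commands can be inspected before piping them to a root shell on a host that actually has bond0:

```shell
# Sketch: emit the iproute2/brctl commands that correspond to the
# ifcfg-bond0.<vid> and ifcfg-br<zone> files above.
mk_vlan_bridge() {    # args: parent-dev vlan-id bridge-name mtu
    parent=$1 vid=$2 bridge=$3 mtu=$4
    echo "ip link add link $parent name $parent.$vid type vlan id $vid"
    echo "ip link set $parent.$vid mtu $mtu up"
    echo "brctl addbr $bridge"
    echo "brctl addif $bridge $parent.$vid"
    echo "ip link set $bridge mtu $mtu up"
}

mk_vlan_bridge bond0 2 brpz 9000
mk_vlan_bridge bond0 3 brfz 9000
```

To actually apply the commands: `mk_vlan_bridge bond0 2 brpz 9000 | sh` (as root).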
This is what a bridge looks like in production (with about 70 lines stripped):

    [root@kvm-hw-inx01 network-scripts]# brctl show
    bridge name     bridge id               STP enabled     interfaces
    brfz            8000.024db29ca91f       no              bond0.3
                                                            tap13
                                                            tap73
                                                            [...]
    brpz            8000.02f6742800b2       no              bond0.2
                                                            tap0
                                                            tap1
                                                            [...]

Summarised, the network configuration of a KVM host looks like this:

    arista1          arista2
       |                |
       +--[eth4 + eth5]-+
               |
             bond0
             /   \
       bond0.2   bond0.3
          |         |
        brpz      brfz
          |         |
        tap1      tap2
           \       /
              VM

## VM configuration

The VM configuration can be found below **/opt/local.ch/sys/kvm**
on every KVM host. Every VM is stored below
**/opt/local.ch/sys/kvm/vm/&lt;vm name&gt;** and contains the following
files:

    [root@kvm-hw-inx03 jira-vm-inx01.intra.local.ch]# ls
    monitor  pid  start  start-on-boot  system-disk  vnc

* monitor: socket to the KVM monitor
* pid: the pid of the VM
* start: the script to start the VM (see below for an example)
* start-on-boot: if this file exists, the VM will be started on boot
* system-disk: the qcow2 image of the system disk
* vnc: socket to the screen of the VM

With the exception of monitor, pid and vnc, all files are generated by cdist.
One of the major concerns of this KVM setup is that all hosts have as few
dependencies as possible. That said, the start script of a VM looks like this:

    [root@kvm-hw-inx03 jira-vm-inx01.intra.local.ch]# cat start
    #!/bin/sh
    # Generated shell script - do not modify
    #

    /usr/libexec/qemu-kvm \
        -name jira-vm-inx01.intra.local.ch \
        -enable-kvm \
        -m 8192 \
        -drive file=/opt/local.ch/sys/kvm/vm/jira-vm-inx01.intra.local.ch/system-disk,if=virtio \
        -vnc unix:/opt/local.ch/sys/kvm/vm/jira-vm-inx01.intra.local.ch/vnc \
        -cpu host \
        -boot order=nc \
        -pidfile "/opt/local.ch/sys/kvm/vm/jira-vm-inx01.intra.local.ch/pid" \
        -monitor "unix:/opt/local.ch/sys/kvm/vm/jira-vm-inx01.intra.local.ch/monitor,server,nowait" \
        -net nic,macaddr=00:16:3e:02:00:ab,model=virtio,vlan=200 \
        -net tap,script=/opt/local.ch/sys/kvm/bin/ifup-pz,downscript=/opt/local.ch/sys/kvm/bin/ifdown,vlan=200 \
        -net nic,macaddr=00:16:3e:02:00:ac,model=virtio,vlan=300 \
        -net tap,script=/opt/local.ch/sys/kvm/bin/ifup-fz,downscript=/opt/local.ch/sys/kvm/bin/ifdown,vlan=300 \
        -smp 4

Most parameter values come from the output of sexy, which feeds the cdist type
that in turn assembles this start script. The above script may be useful for some of my readers,
as it includes a lot of the tuning we have done to KVM.

## Automatic startup of VMs

The virtual machines are brought up by an init script located at
**/etc/init.d/kvm-vms**. As every VM contains its own startup script
and is marked whether it should be started at boot, the init script
is pretty simple:

    basedir=/opt/local.ch/sys/kvm/vm

    broken_lock_file_for_centos=/var/lock/subsys/kvm-vms

    case "$1" in
        start)
            cd "$basedir"

            # Specific VM given
            if [ "$2" ]; then
                vm_list=$2
            else
                vm_list=$(ls)
            fi

            for vm in $vm_list; do
                vm_base_dir="$basedir/$vm"
                start_script="$vm_base_dir/start"

                # Skip start of machines which should not start
                if [ ! -f "$vm/start-on-boot" ]; then
                    continue
                fi

                echo "Starting VM $vm ..."
                logger -t kvm-vms "Starting VM $vm ..."
                screen -d -m -S "$vm" "$start_script"
            done

            touch "$broken_lock_file_for_centos"
        ;;

As you can see, every VM is started in its own
[screen](http://www.gnu.org/software/screen/) session. We decided to go for this approach,
as screen is sometimes buggy and hangs itself up. This way, we only lose one machine
on every screen death, not all of them at the same time. Furthermore, screen is usually
limited in the maximum number of windows it can serve.

When everything goes well, the process output for a virtual machine looks like this:

    root  64611  0.0  0.0  118840   852     ?       Ss   Mar11    0:00 SCREEN -d -m -S binarypool-vm-inx02.intra.local.ch /opt/local.ch/sys/kvm/vm/binarypool-vm-inx02.intra.local.ch/start
    root  64613  0.0  0.0  106092  1180     pts/22  Ss+  Mar11    0:00 /bin/sh /opt/local.ch/sys/kvm/vm/binarypool-vm-inx02.intra.local.ch/start
    root  64614  2.9  2.2  9106828 5819748  pts/22  Sl+  Mar11 5221:41 /usr/libexec/qemu-kvm -name binarypool-vm-inx02.intra.local.ch -enable-kvm -m 8192 -drive file=/opt/local.ch/sys/kvm/vm/binarypool-vm-inx02.intra.local.ch/system-disk,if=virtio -vnc unix:/opt/local.ch/sys/kvm/vm/binarypool-vm-inx02.intra.local.ch/vnc -cpu host -boot order=nc -pidfile /opt/local.ch/sys/kvm/vm/binarypool-vm-inx02.intra.local.ch/pid -monitor unix:/opt/local.ch/sys/kvm/vm/binarypool-vm-inx02.intra.local.ch/monitor,server,nowait -net nic,macaddr=00:16:3e:02:00:7f,model=virtio,vlan=200 -net tap,script=/opt/local.ch/sys/kvm/bin/ifup-pz,downscript=/opt/local.ch/sys/kvm/bin/ifdown,vlan=200 -net nic,macaddr=00:16:3e:02:00:80,model=virtio,vlan=300 -net tap,script=/opt/local.ch/sys/kvm/bin/ifup-fz,downscript=/opt/local.ch/sys/kvm/bin/ifdown,vlan=300 -smp 4

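For day-to-day interaction with a running VM, one can attach to its screen session or talk to its monitor socket. A sketch (the socat invocation is my assumption; the original scripts only create the socket):

```shell
# Sketch: interacting with a running VM (VM name illustrative).
vm=binarypool-vm-inx02.intra.local.ch

# Attach to the VM's screen session (detach again with C-a d):
screen -r "$vm"

# Or send a command to the QEMU monitor socket, assuming socat is installed:
echo "info status" | socat - "UNIX-CONNECT:/opt/local.ch/sys/kvm/vm/$vm/monitor"
```
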
## Common Tasks

The following sections show you how to do regular maintenance
tasks on the KVM infrastructure.

### Create a VM

VMs can easily be created using the script **vm/create-vm** from the sysadmin-logs repository
(local.ch internal), which looks like this:

    sexy host add --type vm $fqdn
    sexy host vm-host-set --vm-host $vmhost $fqdn
    sexy host disk-add --size $disksize $fqdn
    sexy host memory-set --memory $memory $fqdn
    sexy host cores-set --cores $cores $fqdn

    mac_pz=$(sexy mac generate)
    mac_fz=$(sexy mac generate)
    sexy host nic-add $fqdn -m $mac_pz -n pz
    sexy host nic-add $fqdn -m $mac_fz -n fz

    sexy net-ipv4 host-add "$net_pz" -m "$mac_pz" -f "$fqdn"
    sexy net-ipv4 host-add "$net_fz" -m "$mac_fz" -f "$fz_fqdn"

    echo "Updating git / github ..."
    cd ~/.sexy
    git add db
    git commit -m "Added host $fqdn"
    git pull
    git push

    # Apply changes: first network, so dhcp & dns are ok, then create VM
    cat << eof
    Todo for apply:
    sexy net-ipv4 apply --all
    sexy host apply --all

    Start VM on $vmhost: ssh $vmhost /opt/local.ch/sys/kvm/vm/$fqdn/start
    eof

### Delete a VM

Run the script **remove-host**, which essentially does the following:

* Remove various monitoring / backup configurations
* Detect whether it is a VM; if so:
    * Stop it
    * Remove it from the host
    * Add the mac address to the list of free mac addresses
* Delete the host from the networks
* Delete the host from the sexy database

### Move VM to another server

To move a VM to another host, the following steps are necessary:

* sexy host vm-host-set ... # point the VM to the new host
* stop the VM
* scp/rsync the VM directory from the old host to the new host
* sexy host apply --all # record the db change
* start the VM on the new host

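The steps above can be sketched as a small script. Hostnames and the VM name are illustrative, and stopping the VM via its pid file is my assumption; use whatever shutdown mechanism is appropriate for the guest:

```shell
# Sketch: cold-migrate a VM between KVM hosts (names illustrative;
# assumes root ssh access between the hosts).
vm=jira-vm-inx01.intra.local.ch
oldhost=kvm-hw-inx01
newhost=kvm-hw-inx03
vmdir=/opt/local.ch/sys/kvm/vm/$vm

sexy host vm-host-set --vm-host "$newhost" "$vm"      # point VM to new host
ssh "$oldhost" "kill \$(cat $vmdir/pid)"              # stop the VM (assumption)
ssh "$oldhost" "rsync -a $vmdir/ $newhost:$vmdir/"    # copy disk + scripts
sexy host apply --all                                 # record the db change
ssh "$newhost" "$vmdir/start"                         # start on the new host
```
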
[[!tag cdist localch net sexy unix]]