title: How I run my Ceph cluster, or how you can build a killer storage solution for almost free
---
pub_date: 2020-01-08
---
author: ungleich
---
twitter_handle: ungleich
---
_hidden: yes
---
_discoverable: no
---
abstract:
I wanted to store some data and this is what I came up with.
---
body:
## Low cost, high tech data storage with Ceph
First of all, why would you run a Ceph cluster? It's complex, a bit time-consuming, and it's easier to lose data than with ZFS or ext4.
My reasons:

- it's very easy to expand/shrink
- you can manage all your data/disks from one host (which can be a security risk, too)
- it's fun
- we have it in production and it scales well
## Unifying the physical
Step 1: find your local hardware dealer.
Second-hand sites can be a good source, but good deals are rare. My tactic on ricardo.ch: search in "Server & Zubehör" (servers & accessories), filter for used items at auction with a max price of 1.5k CHF, and sort by ending soonest: https://www.ricardo.ch/de/c/server-und-zubehoer-39328/?range_filters.price.max=1527&item_condition=used&offer_type=auction&sort=close_to_end
Nearby hazardous-waste (e-waste) handling companies can be a goldmine. Big companies cannot just throw used hardware out as regular waste, because electronics contain small amounts of lead (or other heavy metals). So big companies are sometimes happy to sell it cheap as used equipment, and e-waste companies are happy whenever they get more than the per-kilogram recycling price, which is very low.
Low-quality (Core 2 Duo era) PCs also suffice, but you won't be able to run erasure-coded pools, as they use a lot of processing power and RAM. Be careful with the RAM: if you run out of swap/RAM, your OSD process will be killed; I learnt that the hard way. Recovery also sometimes uses more RAM than usual, so keep some free as a safety margin.
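For a sense of the trade-off, here is a sketch of creating an erasure-coded pool next to a plain replicated one (the profile name, pool names, and PG counts are made-up examples):

```sh
# An EC pool splits every object into k data + m coding chunks,
# which costs noticeably more CPU and RAM per OSD than replication.
# This profile survives two host failures at only 1.5x raw overhead.
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 64 64 erasure ec42

# The replicated equivalent is far lighter on weak CPUs.
ceph osd pool create hotpool 64 64 replicated
```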
Put 10G NICs on your shopping list; for performance they are absolutely crucial. I started without them, and while it's certainly doable, it won't perform well. A little hack is to pick up gigabit NICs (some people give them away for free) and put them in an LACP bond. Note: LACP doesn't make a single connection any faster; the benefit only shows up with parallel connections.
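As a sketch, such a bond with plain iproute2 (the interface names and address are assumptions; on Alpine you would persist the equivalent in /etc/network/interfaces):

```sh
# Bond two free gigabit NICs into an 802.3ad (LACP) aggregate.
# The switch ports must be configured for LACP as well.
ip link add bond0 type bond mode 802.3ad miimon 100 xmit_hash_policy layer3+4
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up
ip addr add 10.0.0.11/24 dev bond0   # example address

# layer3+4 hashes on IP+port, so parallel connections spread across
# both links while any single connection stays at gigabit speed.
```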
Don't worry if your disks aren't equal in size or speed; Ceph will happily consume everything you feed it (except SMR disks*, or use those strictly for frozen data). One hack is to snag old/low-capacity disks for free. If you do everything right, you can surpass SSD speeds with crappy spinning rust. Worried about disks dying? Just run higher redundancy levels (keep 2 extra copies of your data).
My personal approach is to keep coldish data 2 times, hot data like VMs 3 times, and 1 extra copy of both on a non-Ceph filesystem.
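In Ceph terms that maps onto per-pool replica counts, roughly like this (pool names are hypothetical):

```sh
# size = total number of copies Ceph keeps of each object
ceph osd pool set coldpool size 2   # coldish data: 2 copies
ceph osd pool set hotpool size 3    # hot data such as VM images: 3 copies
```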
You can also group disks by performance/size. Ideally, the disks within a Ceph device class should be uniform and equally distributed between hosts.
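Grouping works through CRUSH device classes: tag the OSDs, create a rule that targets the class, and point a pool at it. A sketch with made-up OSD IDs and names:

```sh
# Tag two OSDs as SSDs (use "ceph osd crush rm-device-class" first if
# a class was auto-detected), then build a rule restricted to them.
ceph osd crush set-device-class ssd osd.4 osd.5
ceph osd crush rule create-replicated fast-rule default host ssd
ceph osd pool set hotpool crush_rule fast-rule
```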
Avoid hardware RAID; use cards that give the OS full control over the disks. If you must use hardware RAID, single-disk RAID0 arrays are the way.
## Install
You can check out my ugly install script that is meant to bootstrap a cluster on a VM. It was tested on an Alpine VM with an attached /dev/sdb data disk (don't use legacy IP (IPv4)):
```sh
apk add bash
wget http://llnu.ml/data/ceph-setup
bash ./ceph-setup $ip_address_of_the_machine $subnet_that_you_plan_to_use
```
## Operation
I've never prepared a disk manually yet, which I should definitely revisit, because Nico wrote amazing helper scripts which can be found in our repo: https://code.ungleich.ch/ungleich-public/ungleich-tools.git
Some scripts still need minor modifications because Alpine doesn't ship Ceph init scripts yet. For the time being I manage the processes by hand.
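Managing them by hand boils down to launching the daemons directly; a sketch (the monitor and OSD IDs are assumptions, look under /var/lib/ceph/mon and /var/lib/ceph/osd for yours):

```sh
# Start the monitor and one OSD without init scripts.
ceph-mon --cluster ceph -i "$(hostname -s)" --setuser ceph --setgroup ceph
ceph-osd --cluster ceph -i 0 --setuser ceph --setgroup ceph
```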
Alpine's vanilla kernel doesn't have RBD support compiled in at the moment, but on any kernel that lacks the module you can just use rbd-nbd to map block devices from your cluster.
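A sketch of mapping an image that way (pool and image names are assumptions):

```sh
# rbd-nbd goes through the generic nbd kernel module instead of rbd.
modprobe nbd
rbd create --size 10G mypool/myimage
rbd-nbd map mypool/myimage    # prints the device it attached, e.g. /dev/nbd0
mkfs.ext4 /dev/nbd0
rbd-nbd unmap /dev/nbd0       # detach when done
```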
* https://blog.widodh.nl/2017/02/do-not-use-smr-disks-with-ceph/
## Some useful commands
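A few everyday status and inspection commands (the pool name in the rbd line is an assumption):

```sh
ceph -s                  # overall cluster status at a glance
ceph health detail       # what exactly is unhappy, per check
ceph osd tree            # OSDs per host, with weights and device classes
ceph df                  # raw and per-pool space usage
ceph osd pool ls detail  # pool settings: size, crush rule, pg_num
rbd ls mypool            # list images in a pool
ceph -w                  # follow the cluster log live
```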