Migrate the FreeBSD raid monitoring article
Signed-off-by: Nico Schottelius <nico@ikn.schottelius.org>
This commit is contained in:
parent
d7f8fa91f7
commit
3ad268e41d
1 changed files with 65 additions and 0 deletions
65
docs/freebsd-raid-monitoring.mdwn
Normal file
[[!meta title="FreeBSD Raid Monitoring"]]

### Introduction
<p>You have a raid and you want to monitor it with FreeBSD. That may or
may not be a problem. I'll try to summarise all the information I have
gathered. If you know that something here is incorrect or outdated,
please contact me. In general, monitoring the state of a raid can be
problematic if the hardware does not expose the needed information, or
exposes it only via notification: it sends a message such as "raid
status changed" through the driver, which you can try to grep out of
syslog, but you cannot query the state actively.</p>

### Status of this document
This document was initially written on the 2nd of August 2007.
Its content was last updated on the 11th of February 2009, and it was
migrated to
[www.nico.schottelius.org](http://www.nico.schottelius.org)
on the 12th of May 2009.

## List of raid systems and how to monitor them
<br /><br />

### FreeBSD gmirror software raid
<br />As you might expect, monitoring this raid is pretty easy. We achieved that with the following two scripts: <br />
<pre>ddna044% cat /usr/local/scripts/fbsd_raid_monitor/cfs_gmirror.sh <br />#!/bin/sh<br />#==============================================================================<br /># Copyright (c) 2007, Netstream AG<br /># Author: Nico Schottelius <nico-freebsd-raid-monitoring <at> schottelius.org><br /># Created: 2007-04-23<br /># Description: Display state of all gmirror devices<br /># Created-By: /home/user/nico/firmen/netstream/sh/neues_skript.sh<br />#==============================================================================<br /><br />gmirror list | \<br /> awk -F: 'BEGIN { print "gmirror devices";<br /> print "---------------";<br /> }<br /> /^Geom name:/ {<br /> name=$2<br /> }<br /> /^State:/ {<br /> print name ":" $2<br /> }'<br /></pre>
And the one that is called by cron:<br />
<pre>ddna044% cat /usr/local/scripts/fbsd_raid_monitor/cfrib_gmirror.sh <br />#!/bin/sh<br />#==============================================================================<br /># Copyright (c) 2007, Netstream AG<br /># Author: Nico Schottelius <nico-freebsd-raid-monitoring <at> schottelius.org><br /># Created: 2007-04-23<br /># Description: Report broken devices.<br /># Created-By: /home/user/nico/firmen/netstream/sh/neues_skript.sh<br />#==============================================================================<br /><br />check=$(dirname $0)/cfs_gmirror.sh<br /><br /># Skip first two lines: header<br />"$check" | awk -F": " 'BEGIN { getline; getline } $2 !~ /COMPLETE/ { print $1 ":" $2 }'<br /><br /></pre>
### LSI / Symbios Megaraid (<i>amr</i> driver)
<br />There are two possibilities to monitor amr-based devices:<br />
<ul><li>with <b>megarc</b></li><li>with <b>amrstat</b></li></ul>
<br />The utility "amrstat" is available in ports as sysutils/amrstat and is <a title="The term &quot;FOSS&quot;" href="../../documentations/foss/the-term-foss">FOSS</a>. Calling it reveals all needed information:<br /><br />
<pre>ddna044# amrstat <br />Logical volume 0: optimal (136.73 GB, RAID0)<br />Logical volume 1: optimal (136.73 GB, RAID0)<br />Physical drive 1:1 online<br />Physical drive 1:2 online<br /></pre>
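Since amrstat prints one line per logical volume and per physical drive, a cron job only needs to report the lines that are not in the expected state. The following is a minimal sketch, assuming the output format shown above, where healthy logical volumes say `optimal` and healthy physical drives say `online`. The function name `amrstat_problems` is made up for this example, and other driver or firmware versions may word things differently.

```sh
#!/bin/sh
# amrstat_problems: read amrstat output on stdin and print every
# logical volume or physical drive line that is not in the healthy
# state seen in the sample output above.
# (Hypothetical helper, not part of amrstat itself.)
amrstat_problems() {
    awk '
        /^Logical volume/ && $0 !~ /: optimal/ { print }
        /^Physical drive/ && $0 !~ / online *$/ { print }
    '
}

# Intended use from cron, which mails any output of a job:
#   amrstat | amrstat_problems
```

When everything is healthy the function prints nothing, so cron stays silent.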
<br />The utility "<b>megarc</b>", a <b>closed source </b>binary provided by LSI, is available in ports (sysutils/megarc). I've found two easy-to-use scripts for this controller, written by Scott Mitchell, on <a href="http://lists.freebsd.org/pipermail/freebsd-questions/2006-June/125470.html">http://lists.freebsd.org/pipermail/freebsd-questions/2006-June/125470.html</a>:<br />
<pre>#!/bin/sh -f<br />#<br /># Check status of RAID volumes on amr(4) controllers using the LSI MegaRC<br /># utility. If any logical drive has a status other than OPTIMAL, or any<br /># physical disks has a status other that ONLINE, display the full status<br /># for the adapter. If more than one adapter exists, add additional unit<br /># numbers to $adapters.<br />#<br /># $Id$<br />#<br /><br />adapters="0"<br /><br />for adapter in $adapters; do<br /> status=`/usr/local/sbin/megarc -ldinfo -a${adapter} -Lall -nolog |\<br /> /usr/bin/sed '1,$s/^M//' |\<br /> /usr/bin/sed '1,/Information Of Logical Drive/d'` ||\<br /> echo "Failed to get RAID status for AMR adapter ${adapter}"<br /><br /> echo "${status}" |\<br /> /usr/bin/egrep '^ Logical Drive : .*: Status: .*$' |\<br /> /usr/bin/egrep -qv 'OPTIMAL$'<br /> drives=$?<br /><br /> echo "${status}" |\<br /> /usr/bin/egrep '^ [0-9]+' |\<br /> /usr/bin/egrep -qv 'ONLINE$'<br /> disks=$?<br /><br /> if [ ${drives} -ne 1 -o ${disks} -ne 1 ]; then<br /> echo ""<br /> echo "AMR RAID status (adapter ${adapter}):"<br /> echo "${status}"<br /> fi<br />done<br /></pre>
<p><b>Warning:</b> The above script may not work when copied and pasted, as reported by Per olof Ljungmark:</p>
<pre>I proceeded to test the scripts but the first one gives you an error due<br />to what Scott Mitchell wrote in his original mail:<br />"BTW, the '^M' in the amr-check-status script is a real Control-M<br />character, and there are embedded tabs in a couple of the egrep patterns,<br />in case those get lost in transit."<br /><br /><br />Don't know if ^M will show in a browser but the 16th. line should read:<br />/usr/bin/sed '1,$s/^M//' |\<br />otherwise you will get a sed error.<br /></pre>
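One way to sidestep the literal Control-M problem is to strip carriage returns with tr, whose `\r` escape survives copy and paste. This is a minimal sketch, assuming the only purpose of the first sed call in the script above is to remove carriage returns from the megarc output. The name `strip_cr` is made up here. Note that it removes every carriage return, whereas the sed expression removes only the first one on each line.

```sh
#!/bin/sh
# strip_cr: delete all carriage return characters from stdin.
# Meant as a paste-safe stand-in for: sed '1,$s/^M//'
# (Hypothetical helper, not part of Scott Mitchell's script.)
strip_cr() {
    tr -d '\r'
}

# Possible use inside the script above:
#   megarc -ldinfo -a0 -Lall -nolog | strip_cr | \
#       sed '1,/Information Of Logical Drive/d'
```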
<p>And the other one:</p>
<pre><br />#!/bin/sh -f<br />#<br /># Display status of RAID volumes on amr(4) controllers using the LSI MegaRC<br /># utility. If more than one adapter exists, add additional unit numbers to<br /># $adapters.<br />#<br /># $Id$<br />#<br /><br /># If there is a global system configuration file, suck it in.<br />#<br />if [ -r /etc/defaults/periodic.conf ]; then<br /> . /etc/defaults/periodic.conf<br /> source_periodic_confs<br />fi<br /><br />adapters="0"<br /><br />rc=0<br />case "${daily_amr_status_enable:-YES}" in<br /> [Nn][Oo])<br /> ;;<br /> *)<br /> for adapter in $adapters; do<br /> echo ""<br /> echo "AMR RAID status (adapter ${adapter}):"<br /> /usr/local/sbin/megarc -ldinfo -a${adapter} -Lall -nolog |\<br /> sed '1,/Information Of Logical Drive/d' || rc=$?<br /> done<br /> ;;<br />esac<br /><br />exit "$rc"<br /></pre>
<p>For more information on supported devices have a look at <a href="http://www.freebsd.org/cgi/man.cgi?query=amr&apropos=0&sektion=4&manpath=FreeBSD+6.2-RELEASE&format=html">amr(4)</a>.</p>

### mpt
<br />mpt-based devices can be monitored under Linux with the kernel module "mptctl" and the <a title="The term &quot;FOSS&quot;" href="../../documentations/foss/the-term-foss">FOSS</a> tool "<a href="http://www.drugphish.ch/~ratz/mpt-status/">mpt-status</a>". There currently seems to be no such support available under FreeBSD. For more information about mpt have a look at <a href="http://www.freebsd.org/cgi/man.cgi?query=mpt&apropos=0&sektion=4&manpath=FreeBSD+6.2-RELEASE&format=html">mpt(4)</a>.<br /><br />
### ciss
<br />
Known tools:<br />
<ul><li>camcontrol</li><li><a id="acu" name="acu">hpacucli</a></li></ul>
<br />This driver is used for most HP / Compaq controllers and is (as far as I know) found in almost all modern SAS/SATA systems provided by HP. As described in http://www.unixadmintalk.com/f41/monitoring-raid-arrays-51889/, you can monitor it via <b>camcontrol</b>:<br /><br />
<pre># camcontrol inquiry da0<br />pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device<br />pass0: 135.168MB/s transfers<br /></pre>
<p>(I have not tested this myself; I just found it on the net.) On <a href="http://lists.freebsd.org/pipermail/freebsd-proliant/2006-October/000169.html">http://lists.freebsd.org/pipermail/freebsd-proliant/2006-October/000169.html</a> I also found the relevant strings to look for:<br /></p>
<pre>During normal operation of the raid:<br /># camcontrol inquiry da0 -D<br />pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device<br /><br />After removing one of the raid member disks:<br /># camcontrol inquiry da0 -D<br />pass0: <COMPAQ RAID 1 VOLUME inte> Fixed Direct Access SCSI-0 device<br /><br />After re-inserting the raid member disk:<br /># camcontrol inquiry da0 -D<br />pass0: <COMPAQ RAID 1 VOLUME reco> Fixed Direct Access SCSI-0 device<br /><br />And about 45 minutes later:<br /># camcontrol inquiry da0 -D<br />pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device<br /></pre>
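The strings above suggest a simple check: anything other than `OK` at the end of the volume description deserves a report. The following is a rough sketch, assuming the inquiry line always has the form shown above (`<COMPAQ RAID 1 VOLUME state>`). The function name `ciss_volume_state` is invented for this example, and like the output above it has not been tested on real hardware.

```sh
#!/bin/sh
# ciss_volume_state: read "camcontrol inquiry" output on stdin and
# print the last word inside the <...> description, for example
# OK, inte or reco. Lines without such a description are ignored.
# (Hypothetical helper, not part of camcontrol.)
ciss_volume_state() {
    sed -n 's/^pass[0-9]*: <.* \([^ >]*\)> .*$/\1/p'
}

# Possible use from cron:
#   state=$(camcontrol inquiry da0 -D | ciss_volume_state)
#   [ "$state" = "OK" ] || echo "raid volume da0 is in state: $state"
```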
<p>You could also use hpacucli, which can be found at http://people.freebsd.org/~jcagle/. I have no experience with it. If you have, please send me a report or monitoring scripts so that I can include them here (the hint was sent by Jaimie Sirovich).<br /></p>

### 3ware raid: twa/twe
<p>Install and configure <b>sysutils/3dm</b>. This installs a daemon that provides a web interface and can also notify you via e-mail if something happens. This is perhaps the easiest way of monitoring a raid in FreeBSD. The other possibility to monitor 3ware raids is via <b>tw_cli</b>.</p>

### ataraid
<p>This is a software raid driver for many different cards; have a look at ataraid(4). Somebody in ##freebsd (irc.freenode.org) pasted the URL <a href="http://www.monkeybrains.net/~rudy/example/raid_status.html">http://www.monkeybrains.net/~rudy/example/raid_status.html</a>, which contains a script that monitors gmirror, 3ware (via tw_cli) and also ataraid (ar0) via <b>atacontrol</b>. For archiving, the script is mirrored below:</p>
<pre>#!/bin/sh<br /><br /># raid_status - check the state of the RAID. <br /><br /># This script works for various types of RAID devices. (Currently, 3Ware, gmirror, BSd 'ar0' raids)<br /># WARNING: Install the proper CLI program for your 3ware card, if you use 3ware.<br /><br /># Set up a cronjob like this:<br /># */16 * * * * /home/rudy/bin/raid_status CRON<br /><br />### Copyright (c) 2006, Rudy Rucker All rights reserved.<br />### Redistribution and use of script, with or without modification, is<br />### permitted provided that the following condition is met:<br />### Redistributions of source code must retain the above copyright<br />### notice, this list of conditions and the following disclaimer.<br />### THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND<br />### ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE<br />### IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE<br />### ARE DISCLAIMED.<br /><br /># ----------- Change Log ------------<br /># Mon Oct 11 15:20:37 PDT 2004 - rudy<br /># Original script.<br /># Tue Feb 7 01:28:07 PST 2006 - rudy<br /># Added 9500 and 9550 support<br /># Fri Jun 9 10:38:33 PDT 2006 - rudy<br /># works for 'ar' and 'tw' mirrored arrays<br /># Tue Sep 12 10:23:13 PDT 2006 - rudy<br /># Added gmirror and realized that not all 3ware's are the same...<br /><br />MODE=$1<br /><br />TWCLI="/usr/local/bin/tw_cli"<br />GMIRROR="/sbin/gmirror"<br />ATACONTROL="/sbin/atacontrol"<br /><br />AWK="/usr/bin/awk"<br />GREP="/usr/bin/grep"<br />MAIL="/usr/bin/mail"<br /><br />EMAIL="noc@example.com"<br /><br /># if this is not a 3ware card, check the atacontol<br />if [ -c /dev/twed0 ] && [ -x $TWCLI ]; then<br /> # 3ware card ... 8000 series<br /> STATUS=`$TWCLI info c0 u0 | $GREP "^Status" | $AWK {'print $2'}`;<br /> VALID='OK'<br /> ESTATUS_CMD="$TWCLI info c0 u0";<br /> # double check the 3ware output incase it returned nada...<br /> # Umm... this is the only raid card I have witness this bug<br /> if [ "X$STATUS" = "X" ]; then<br /> sleep 1;<br /> STATUS=`$TWCLI info c0 u0 | $GREP "^Status" | $AWK {'print $2'}`;<br /> fi<br />elif [ -c /dev/da0 ] && [ -x $TWCLI ]; then<br /> # Note, there are plenty of other device names that use da0... this script is<br /> # not for those... works with:<br /> # 3ware 9550SX, 9500S<br /> STATUS=`$TWCLI info c0 | $GREP "^u0" | $AWK '{print $3}'`;<br /> VALID='OK'<br /> ESTATUS_CMD="$TWCLI info c0 u0"<br />elif [ -c /dev/mirror/gm0 ] && [ -x $GMIRROR ]; then<br /> # gmirror /dev/mirror/gm0<br /> STATUS=`$GMIRROR status gm0 | $GREP "^mirror" | $AWK {'print $2'}`;<br /> VALID='COMPLETE'<br /> ESTATUS_CMD="$GMIRROR list";<br />elif [ -c /dev/ar0 ] && [ -x $ATACONTROL ]; then<br /> # Motherboard promise and others<br /> STATUS=`$ATACONTROL status ar0 | $GREP "status" | $AWK -F 'status: ' '{print $2}'`;<br /> VALID='READY'<br /> ESTATUS_CMD="/sbin/atacontrol status ar0"<br />else<br /> echo "Unknown Raid type.... ";<br /> if [ -x $TWCLI ]; then<br /> echo " + found $TWCLI";<br /> else<br /> echo " - can't exec $TWCLI";<br /> fi<br /> if [ -x $ATACONTROL ]; then<br /> echo " + found $ATACONTROL";<br /> else<br /> echo " - can't exec $ATACONTROL";<br /> fi<br /> if [ -x $GMIRROR ]; then<br /> echo " + found $GMIRROR";<br /> else<br /> echo " - can't exec $GMIRROR";<br /> fi<br /> exit;<br />fi<br /><br /># Okay, we checked the raid status and know what the return code should be.<br />if [ "$STATUS" = "$VALID" ]; then<br /> if [ "$MODE" = "CRON" ]; then<br /> exit;<br /> fi<br /> echo "OK condition"; <br /> $ESTATUS_CMD<br /> exit;<br />fi<br /><br /># ERROR! Either print to TTY or send an email, based on MODE (which is arg[1])<br />if [ "$MODE" = "CRON" ]; then<br /> $ESTATUS_CMD | $MAIL -s "[ERROR] Raid array on $HOST returned $STATUS" $EMAIL<br />else<br /> echo "ERROR condition"<br /> $ESTATUS_CMD<br />fi<br /><br /></pre>
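One detail to watch when running the mirrored script from cron: the mail subject uses `$HOST`, which a plain /bin/sh does not normally set and which cron's minimal environment is unlikely to provide, so the subject may end up without a host name. Below is a small sketch of a fallback, assuming it is acceptable to add one line near the top of the script. It is not part of Rudy's original.

```sh
#!/bin/sh
# Keep $HOST if the environment already provides it; otherwise fall
# back to the node name reported by uname(1).
# (Assumed addition, not part of the archived script.)
HOST="${HOST:-$(uname -n)}"
```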
### Adaptec: <i>aac</i>
<br />Jaimie Sirovich reported that you can monitor some Adaptec cards with <a href="http://www.freshports.org/sysutils/aaccli">aaccli</a>.<br />More information and examples are currently missing.<br /><br />

### Areca: <i>arcmsr</i>
<br />Areca controllers can be monitored either directly on the raid controller (8 and 16 port versions), which has its own NIC and RJ45 port, or via the closed source web server (the same one that runs on the controller), which can be downloaded from <a href="http://www.areca.com.tw/support/main.htm">http://www.areca.com.tw/support/main.htm</a>. Configuring it just means clicking around in the web interface.<br /><br />

### asr
<br />These controllers are reported to be monitorable via http://www.freshports.org/sysutils/asr-utils<br /><br />

[[!tag unix freebsd storage]]