[[!meta title="FreeBSD Raid Monitoring"]]
### Introduction
You have a raid and want to monitor it under FreeBSD. That may or
may not be a problem. I'll try to summarise all the information I have
collected. If you know that something here is incorrect or outdated,
please contact me. In general, monitoring the state of a raid can be
problematic if the hardware does not expose the needed information, or
exposes it only via notifications: it sends a message such as
"raid status changed" through the driver, which you can try to grep out
of syslog, but you cannot query the state actively.
### Status of this document
This document was initially written on the 2nd of August 2007.
It was migrated to
[www.nico.schottelius.org](http://www.nico.schottelius.org)
on the 12th of May 2009.
You can have a look into [[git|about/websites]], to see when it was
last updated.
## List of raid systems and how to monitor them
### FreeBSD gmirror software raid
As you might expect, monitoring this raid is pretty easy.
We achieved that with the following two scripts:
<pre>ddna044% cat /usr/local/scripts/fbsd_raid_monitor/cfs_gmirror.sh <br />#!/bin/sh<br />#==============================================================================<br /># Copyright (c) 2007, Netstream AG<br /># Author: Nico Schottelius &lt;nico-freebsd-raid-monitoring &lt;at&gt; schottelius.org&gt;<br /># Created: 2007-04-23<br /># Description: Display state of all gmirror devices<br /># Created-By: /home/user/nico/firmen/netstream/sh/neues_skript.sh<br />#==============================================================================<br /><br />gmirror list | \<br /> awk -F: 'BEGIN { print "gmirror devices";<br /> print "---------------";<br /> }<br /> /^Geom name:/ {<br /> name=$2<br /> }<br /> /^State:/ {<br /> print name ":" $2<br /> }'<br /></pre>
And the one that is called by cron:<br />
<pre>ddna044% cat /usr/local/scripts/fbsd_raid_monitor/cfrib_gmirror.sh <br />#!/bin/sh<br />#==============================================================================<br /># Copyright (c) 2007, Netstream AG<br /># Author: Nico Schottelius &lt;nico-freebsd-raid-monitoring &lt;at&gt; schottelius.org&gt;<br /># Created: 2007-04-23<br /># Description: Report broken devices.<br /># Created-By: /home/user/nico/firmen/netstream/sh/neues_skript.sh<br />#==============================================================================<br /><br />check=$(dirname $0)/cfs_gmirror.sh<br /><br /># Skip first two lines: header<br />"$check" | awk -F": " 'BEGIN { getline; getline } $2 !~ /COMPLETE/ { print $1 ":" $2 }'<br /><br /></pre>
### LSI / Symbios Megaraid (<i>amr</i> driver)
<br />There are two possibilities to monitor amr-based devices:<br />
<ul><li>with <b>megarc</b></li><li>with <b>amrstat</b></li></ul>
<br />The utility "amrstat" is available in ports as sysutils/amrstat and is <a title="The term &quot;FOSS&quot;" href="../../documentations/foss/the-term-foss">FOSS</a>. Calling it reveals all needed information:<br /><br />
<pre>ddna044# amrstat <br />Logical volume 0: optimal (136.73 GB, RAID0)<br />Logical volume 1: optimal (136.73 GB, RAID0)<br />Physical drive 1:1 online<br />Physical drive 1:2 online<br /></pre>
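
If you only want to hear about problems, the output above can be filtered. The following is a minimal sketch, not a tested production script: it assumes the line format shown above (state in the fourth whitespace-separated field) and treats anything other than `optimal` or `online` as a problem. The function name `amr_problems` is made up for this example.

```sh
#!/bin/sh
# Sketch: print only those amrstat lines that are not in a healthy state.
# Assumes the output format shown above; amr_problems is a hypothetical name.

amr_problems() {
    # Reads amrstat output on stdin and prints every logical volume
    # that is not "optimal" and every physical drive that is not "online".
    awk '
        /^Logical volume/ && $4 != "optimal" { print }
        /^Physical drive/ && $4 != "online"  { print }
    '
}
```

Called as `amrstat | amr_problems` from cron, this prints nothing while everything is healthy, so cron only sends mail when a volume or drive needs attention.
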
<br />The utility "<b>megarc</b>", a <b>closed source</b> binary provided by LSI, is available in ports (sysutils/megarc). I've found two easy-to-use scripts for this controller, written by Scott Mitchell, at <a href="http://lists.freebsd.org/pipermail/freebsd-questions/2006-June/125470.html">http://lists.freebsd.org/pipermail/freebsd-questions/2006-June/125470.html</a>:<br />
<pre>#!/bin/sh -f<br />#<br /># Check status of RAID volumes on amr(4) controllers using the LSI MegaRC<br /># utility. If any logical drive has a status other than OPTIMAL, or any<br /># physical disks has a status other that ONLINE, display the full status<br /># for the adapter. If more than one adapter exists, add additional unit<br /># numbers to $adapters.<br />#<br /># $Id$<br />#<br /><br />adapters="0"<br /><br />for adapter in $adapters; do<br /> status=`/usr/local/sbin/megarc -ldinfo -a${adapter} -Lall -nolog |\<br /> /usr/bin/sed '1,$s/^M//' |\<br /> /usr/bin/sed '1,/Information Of Logical Drive/d'` ||\<br /> echo "Failed to get RAID status for AMR adapter ${adapter}"<br /><br /> echo "${status}" |\<br /> /usr/bin/egrep '^ Logical Drive : .*: Status: .*$' |\<br /> /usr/bin/egrep -qv 'OPTIMAL$'<br /> drives=$?<br /><br /> echo "${status}" |\<br /> /usr/bin/egrep '^ [0-9]+' |\<br /> /usr/bin/egrep -qv 'ONLINE$'<br /> disks=$?<br /><br /> if [ ${drives} -ne 1 -o ${disks} -ne 1 ]; then<br /> echo ""<br /> echo "AMR RAID status (adapter ${adapter}):"<br /> echo "${status}"<br /> fi<br />done<br /></pre>
<p><b>Warning:</b> The above script may not work when doing copy and paste, as reported by Per olof Ljungmark:</p>
<pre>I proceeded to test the scripts but the first one gives you an error due<br />to what Scott Mitchell wrote in his original mail:<br />"BTW, the '^M' in the amr-check-status script is a real Control-M<br />character, and there are embedded tabs in a couple of the egrep patterns,<br />in case those get lost in transit."<br /><br /><br />Don't know if ^M will show in a browser but the 16th. line should read:<br />/usr/bin/sed '1,$s/^M//' |\<br />otherwise you will get a sed error.<br /></pre>
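
One way to sidestep the literal Control-M problem is to remove the carriage returns with `tr`, which understands the `\r` escape. This is a substitute for the first `sed` call, not what Scott Mitchell's script does, and it deletes every carriage return rather than only the first one per line. The function name `strip_cr` is made up for this sketch.

```sh
#!/bin/sh
# Sketch: strip carriage returns without embedding a real Control-M.
# strip_cr is a hypothetical helper, not part of the original script.

strip_cr() {
    # tr interprets the \r escape itself, so the script survives copy and paste.
    tr -d '\r'
}
```

In the script above, the line `/usr/bin/sed '1,$s/^M//' |\` could then be replaced by `strip_cr |\`, assuming the function is defined before the loop.
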
<p>And the other one:</p>
<pre><br />#!/bin/sh -f<br />#<br /># Display status of RAID volumes on amr(4) controllers using the LSI MegaRC<br /># utility. If more than one adapter exists, add additional unit numbers to<br /># $adapters.<br />#<br /># $Id$<br />#<br /><br /># If there is a global system configuration file, suck it in.<br />#<br />if [ -r /etc/defaults/periodic.conf ]; then<br /> . /etc/defaults/periodic.conf<br /> source_periodic_confs<br />fi<br /><br />adapters="0"<br /><br />rc=0<br />case "${daily_amr_status_enable:-YES}" in<br /> [Nn][Oo])<br /> ;;<br /> *)<br /> for adapter in $adapters; do<br /> echo ""<br /> echo "AMR RAID status (adapter ${adapter}):"<br /> /usr/local/sbin/megarc -ldinfo -a${adapter} -Lall -nolog |\<br /> sed '1,/Information Of Logical Drive/d' || rc=$?<br /> done<br /> ;;<br />esac<br /><br />exit "$rc"<br /></pre>
<p>For more information on supported devices have a look at <a href="http://www.freebsd.org/cgi/man.cgi?query=amr&amp;apropos=0&amp;sektion=4&amp;manpath=FreeBSD+6.2-RELEASE&amp;format=html">amr(4)</a>.</p>
### mpt
<br />mpt-based devices can be monitored under Linux with the kernel module "mptctl" and the <a title="The term &quot;FOSS&quot;" href="../../documentations/foss/the-term-foss">FOSS</a> tool "<a href="http://www.drugphish.ch/~ratz/mpt-status/">mpt-status</a>". There currently seems to be no equivalent support under FreeBSD. For more information about mpt, have a look at <a href="http://www.freebsd.org/cgi/man.cgi?query=mpt&amp;apropos=0&amp;sektion=4&amp;manpath=FreeBSD+6.2-RELEASE&amp;format=html">mpt(4)</a>.<br /><br />
### ciss
Known tools:
* camcontrol
* hpacucli
<br />This driver is used for most HP / Compaq controllers and is (afaik) found in almost all modern SAS/SATA systems provided by HP. As described in http://www.unixadmintalk.com/f41/monitoring-raid-arrays-51889/, you can monitor it via <b>camcontrol</b>:<br /><br />
<pre># camcontrol inquiry da0<br />pass0: &lt;COMPAQ RAID 1 VOLUME OK&gt; Fixed Direct Access SCSI-0 device<br />pass0: 135.168MB/s transfers<br /></pre>
<p>(This is untested by me, just found it on the net). On <a href="http://lists.freebsd.org/pipermail/freebsd-proliant/2006-October/000169.html">http://lists.freebsd.org/pipermail/freebsd-proliant/2006-October/000169.html</a> I also found the relevant strings to look for:<br /></p>
<pre>During normal operation of the raid:<br /># camcontrol inquiry da0 -D<br />pass0: &lt;COMPAQ RAID 1 VOLUME OK&gt; Fixed Direct Access SCSI-0 device<br /><br />After removing one of the raid member disks:<br /># camcontrol inquiry da0 -D<br />pass0: &lt;COMPAQ RAID 1 VOLUME inte&gt; Fixed Direct Access SCSI-0 device<br /><br />After re-inserting the raid member disk:<br /># camcontrol inquiry da0 -D<br />pass0: &lt;COMPAQ RAID 1 VOLUME reco&gt; Fixed Direct Access SCSI-0 device<br /><br />And about 45 minutes later:<br /># camcontrol inquiry da0 -D<br />pass0: &lt;COMPAQ RAID 1 VOLUME OK&gt; Fixed Direct Access SCSI-0 device<br /></pre>
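
Based on the strings above, a check could look roughly like the following. Like the camcontrol approach itself, this is untested on real hardware: it assumes the inquiry line looks exactly as quoted, and the function names `ciss_volume_state` and `ciss_check` are made up for this sketch.

```sh
#!/bin/sh
# Sketch: extract the volume state from "camcontrol inquiry" output.
# Assumes the "<COMPAQ RAID ... VOLUME state>" format quoted above;
# both function names are hypothetical.

ciss_volume_state() {
    # Reads camcontrol output on stdin and prints the word between
    # "VOLUME" and ">" (for example OK, inte or reco).
    sed -n 's/^pass[0-9]*: <COMPAQ RAID .* VOLUME  *\([^ >][^ >]*\) *>.*/\1/p'
}

ciss_check() {
    # Prints a message and returns 1 unless the state is exactly "OK".
    state=$(ciss_volume_state)
    if [ "$state" != "OK" ]; then
        echo "ciss volume state: ${state:-unknown}"
        return 1
    fi
}
```

It would be used as `camcontrol inquiry da0 -D | ciss_check`. An empty or unrecognised inquiry line is reported as `unknown` instead of being silently treated as healthy.
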
<p>You could also use <a id="acu" name="acu">hpacucli, which can be found at </a>http://people.freebsd.org/~jcagle/. I have no experience with it, so if you do, please send me a report or monitoring scripts and I will include them here (the hint was sent by Jaimie Sirovich).<br /></p>
### 3ware raid: twa/twe
<p>Install and configure <b>sysutils/3dm</b>. This installs a daemon that provides a web interface and can also notify you via e-mail if something happens. This is perhaps the easiest way of monitoring raid in FreeBSD. The other way to monitor 3ware raids is via <b>tw_cli</b>.</p>
### ataraid
This is a software raid driver for many different cards.
Have a look at ataraid(4).
Somebody in ##freebsd (irc.freenode.org) pasted the URL
<a href="http://www.monkeybrains.net/~rudy/example/raid_status.html">http://www.monkeybrains.net/~rudy/example/raid_status.html</a>, which contains a script that monitors gmirror, 3ware (via tw_cli) and also ataraid (ar0) via <b>atacontrol</b>.
For archiving, the script is mirrored below:
<pre>#!/bin/sh<br /><br /># raid_status - check the state of the RAID. <br /><br /># This script works for various types of RAID devices. (Currently, 3Ware, gmirror, BSd 'ar0' raids)<br /># WARNING: Install the proper CLI program for your 3ware card, if you use 3ware.<br /><br /># Set up a cronjob like this:<br /># */16 * * * * /home/rudy/bin/raid_status CRON<br /><br />### Copyright (c) 2006, Rudy Rucker All rights reserved.<br />### Redistribution and use of script, with or without modification, is<br />### permitted provided that the following condition is met:<br />### Redistributions of source code must retain the above copyright<br />### notice, this list of conditions and the following disclaimer.<br />### THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND<br />### ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE<br />### IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE<br />### ARE DISCLAIMED.<br /><br /># ----------- Change Log ------------<br /># Mon Oct 11 15:20:37 PDT 2004 - rudy<br /># Original script.<br /># Tue Feb 7 01:28:07 PST 2006 - rudy<br /># Added 9500 and 9550 support<br /># Fri Jun 9 10:38:33 PDT 2006 - rudy<br /># works for 'ar' and 'tw' mirrored arrays<br /># Tue Sep 12 10:23:13 PDT 2006 - rudy<br /># Added gmirror and realized that not all 3ware's are the same...<br /><br />MODE=$1<br /><br />TWCLI="/usr/local/bin/tw_cli"<br />GMIRROR="/sbin/gmirror"<br />ATACONTROL="/sbin/atacontrol"<br /><br />AWK="/usr/bin/awk"<br />GREP="/usr/bin/grep"<br />MAIL="/usr/bin/mail"<br /><br />EMAIL="noc@example.com"<br /><br /># if this is not a 3ware card, check the atacontol<br />if [ -c /dev/twed0 ] &amp;&amp; [ -x $TWCLI ]; then<br /> # 3ware card ... 8000 series<br /> STATUS=`$TWCLI info c0 u0 | $GREP "^Status" | $AWK {'print $2'}`;<br /> VALID='OK'<br /> ESTATUS_CMD="$TWCLI info c0 u0";<br /> # double check the 3ware output incase it returned nada...<br /> # Umm... this is the only raid card I have witness this bug<br /> if [ "X$STATUS" = "X" ]; then<br /> sleep 1;<br /> STATUS=`$TWCLI info c0 u0 | $GREP "^Status" | $AWK {'print $2'}`;<br /> fi<br />elif [ -c /dev/da0 ] &amp;&amp; [ -x $TWCLI ]; then<br /> # Note, there are plenty of other device names that use da0... this script is<br /> # not for those... works with:<br /> # 3ware 9550SX, 9500S<br /> STATUS=`$TWCLI info c0 | $GREP "^u0" | $AWK '{print $3}'`;<br /> VALID='OK'<br /> ESTATUS_CMD="$TWCLI info c0 u0"<br />elif [ -c /dev/mirror/gm0 ] &amp;&amp; [ -x $GMIRROR ]; then<br /> # gmirror /dev/mirror/gm0<br /> STATUS=`$GMIRROR status gm0 | $GREP "^mirror" | $AWK {'print $2'}`;<br /> VALID='COMPLETE'<br /> ESTATUS_CMD="$GMIRROR list";<br />elif [ -c /dev/ar0 ] &amp;&amp; [ -x $ATACONTROL ]; then<br /> # Motherboard promise and others<br /> STATUS=`$ATACONTROL status ar0 | $GREP "status" | $AWK -F 'status: ' '{print $2}'`;<br /> VALID='READY'<br /> ESTATUS_CMD="/sbin/atacontrol status ar0"<br />else<br /> echo "Unknown Raid type.... ";<br /> if [ -x $TWCLI ]; then<br /> echo " + found $TWCLI";<br /> else<br /> echo " - can't exec $TWCLI";<br /> fi<br /> if [ -x $ATACONTROL ]; then<br /> echo " + found $ATACONTROL";<br /> else<br /> echo " - can't exec $ATACONTROL";<br /> fi<br /> if [ -x $GMIRROR ]; then<br /> echo " + found $GMIRROR";<br /> else<br /> echo " - can't exec $GMIRROR";<br /> fi<br /> exit;<br />fi<br /><br /># Okay, we checked the raid status and know what the return code should be.<br />if [ "$STATUS" = "$VALID" ]; then<br /> if [ "$MODE" = "CRON" ]; then<br /> exit;<br /> fi<br /> echo "OK condition"; <br /> $ESTATUS_CMD<br /> exit;<br />fi<br /><br /># ERROR! Either print to TTY or send an email, based on MODE (which is arg[1])<br />if [ "$MODE" = "CRON" ]; then<br /> $ESTATUS_CMD | $MAIL -s "[ERROR] Raid array on $HOST returned $STATUS" $EMAIL<br />else<br /> echo "ERROR condition"<br /> $ESTATUS_CMD<br />fi<br /><br /></pre>
### Adaptec: aac
Jaimie Sirovich reported that you can monitor some Adaptec cards with
[aaccli](http://www.freshports.org/sysutils/aaccli).
More information and examples are currently missing.
### Areca: arcmsr
The Areca controller can be monitored either directly on the raid controller
(8 and 16 port versions), which has its own NIC and RJ45 port, or via the
***closed source*** webserver
(which is the same one that runs on the controller).
It can be downloaded from
[areca.com](http://www.areca.com.tw/support/main.htm).
Configuring it just means clicking around in the web interface.
### asr
Controllers using this driver are reported to be monitorable via
[asr-utils](http://www.freshports.org/sysutils/asr-utils)
(confirmation needed).
[[!tag unix freebsd storage]]