[[!meta title="FreeBSD Raid Monitoring"]]

### Introduction
You have a raid and you want to monitor it under FreeBSD. That may or may not be a problem. I'll try to summarise all the information I have gathered. If you know that something here is incorrect or outdated, please contact me.

In general, monitoring the state of a raid can be problematic if the hardware does not expose the needed information, or exposes it only through notifications: the driver sends a message such as "raid status changed", which you can try to grep out of syslog, but you cannot actively query the state.
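For hardware that only reports changes through the driver, filtering the system log may be the best available option. The following is a rough sketch only: the function name `raid_log_events` and the default pattern are assumptions, and the real message text depends on the driver, so check what yours actually logs.

```shell
#!/bin/sh
# Hypothetical helper: print the log lines from stdin that match a pattern.
# The default pattern is only a placeholder; each driver words its
# notifications differently.
raid_log_events() {
    grep -i -E -- "${1:-raid status changed}"
}
```

It could be used as `raid_log_events 'degraded|rebuild' < /var/log/messages`. Keep in mind that this only sees what has already been logged; it cannot ask the controller for its current state.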
### Status of this document

This document was initially written on the 2nd of August 2007. It was last updated on the 11th of February 2009, but migrated to [www.nico.schottelius.org](http://www.nico.schottelius.org) on the 12th of May 2009.

## List of raid systems and how to monitor them

### gmirror

    ddna044% cat /usr/local/scripts/fbsd_raid_monitor/cfs_gmirror.sh
    #!/bin/sh
    #==============================================================================
    # Copyright (c) 2007, Netstream AG
    # Author: Nico Schottelius <nico-freebsd-raid-monitoring <at> schottelius.org>
    # Created: 2007-04-23
    # Description: Display state of all gmirror devices
    # Created-By: /home/user/nico/firmen/netstream/sh/neues_skript.sh
    #==============================================================================

    gmirror list | \
       awk -F: 'BEGIN { print "gmirror devices";
                        print "---------------";
                      }
                /^Geom name:/ {
                   name=$2
                }
                /^State:/ {
                   print name ":" $2
                }'

And the one that is called by cron:

    ddna044% cat /usr/local/scripts/fbsd_raid_monitor/cfrib_gmirror.sh
    #!/bin/sh
    #==============================================================================
    # Copyright (c) 2007, Netstream AG
    # Author: Nico Schottelius <nico-freebsd-raid-monitoring <at> schottelius.org>
    # Created: 2007-04-23
    # Description: Report broken devices.
    # Created-By: /home/user/nico/firmen/netstream/sh/neues_skript.sh
    #==============================================================================

    # Quoted so that a script path containing spaces still works
    check="$(dirname "$0")/cfs_gmirror.sh"

    # Skip first two lines: header
    "$check" | awk -F": " 'BEGIN { getline; getline } $2 !~ /COMPLETE/ { print $1 ":" $2 }'

### LSI / Symbios Megaraid (amr driver)

    ddna044# amrstat
    Logical volume 0: optimal (136.73 GB, RAID0)
    Logical volume 1: optimal (136.73 GB, RAID0)
    Physical drive 1:1 online
    Physical drive 1:2 online

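The output above lends itself to a simple check. What follows is a minimal sketch rather than a tested production script: `amrstat_problems` is a hypothetical helper name, and it assumes the line format shown above, in which healthy logical volumes report `optimal` and healthy physical drives report `online`. It reads the status text on standard input, so it can be tried without the hardware.

```shell
#!/bin/sh
# Hypothetical helper: print every amrstat line that does not look healthy.
# Assumes the format shown above ("Logical volume N: optimal ...",
# "Physical drive C:T online"); other amrstat versions may differ.
amrstat_problems() {
    awk '
        /^Logical volume/ && $4 != "optimal" { print }
        /^Physical drive/ && $4 != "online"  { print }
    '
}
```

Run from cron as `amrstat | amrstat_problems`, it prints nothing while everything is fine, so cron only sends mail when a line deviates.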
    #!/bin/sh -f
    #
    # Check status of RAID volumes on amr(4) controllers using the LSI MegaRC
    # utility. If any logical drive has a status other than OPTIMAL, or any
    # physical disk has a status other than ONLINE, display the full status
    # for the adapter. If more than one adapter exists, add additional unit
    # numbers to $adapters.
    #
    # $Id$
    #

    adapters="0"

    for adapter in $adapters; do
        status=`/usr/local/sbin/megarc -ldinfo -a${adapter} -Lall -nolog |\
            /usr/bin/sed '1,$s/^M//' |\
            /usr/bin/sed '1,/Information Of Logical Drive/d'` ||\
            echo "Failed to get RAID status for AMR adapter ${adapter}"

        # egrep -qv exits 0 if any line does NOT match, 1 if all lines match
        echo "${status}" |\
            /usr/bin/egrep '^ Logical Drive : .*: Status: .*$' |\
            /usr/bin/egrep -qv 'OPTIMAL$'
        drives=$?

        echo "${status}" |\
            /usr/bin/egrep '^ [0-9]+' |\
            /usr/bin/egrep -qv 'ONLINE$'
        disks=$?

        if [ ${drives} -ne 1 -o ${disks} -ne 1 ]; then
            echo ""
            echo "AMR RAID status (adapter ${adapter}):"
            echo "${status}"
        fi
    done

Warning: The above script may not work after copy and paste, as reported by Per olof Ljungmark:

    I proceeded to test the scripts but the first one gives you an error due
    to what Scott Mitchell wrote in his original mail:

    "BTW, the '^M' in the amr-check-status script is a real Control-M
    character, and there are embedded tabs in a couple of the egrep patterns,
    in case those get lost in transit."

    Don't know if ^M will show in a browser but the 16th. line should read:

    /usr/bin/sed '1,$s/^M//' |\

    otherwise you will get a sed error.

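One way to sidestep the literal Control-M problem is to keep the character out of the script altogether. The sketch below is a suggestion and not part of Scott Mitchell's original script: `strip_cr` is a hypothetical helper that removes carriage returns with `tr`, whose `\r` escape survives copy and paste.

```shell
#!/bin/sh
# Hypothetical helper: delete every carriage return (Control-M) from stdin.
# tr(1) understands the \r escape, so no literal control character has to
# survive copy and paste. Unlike the sed expression, this removes all
# carriage returns on a line, not just the first one.
strip_cr() {
    tr -d '\r'
}
```

It could stand in for the first `sed` stage, for example `megarc -ldinfo -a0 -Lall -nolog | strip_cr | sed '1,/Information Of Logical Drive/d'`.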
And the other one:

    #!/bin/sh -f
    #
    # Display status of RAID volumes on amr(4) controllers using the LSI MegaRC
    # utility. If more than one adapter exists, add additional unit numbers to
    # $adapters.
    #
    # $Id$
    #

    # If there is a global system configuration file, suck it in.
    #
    if [ -r /etc/defaults/periodic.conf ]; then
        . /etc/defaults/periodic.conf
        source_periodic_confs
    fi

    adapters="0"
    rc=0

    case "${daily_amr_status_enable:-YES}" in
        [Nn][Oo])
            ;;
        *)
            for adapter in $adapters; do
                echo ""
                echo "AMR RAID status (adapter ${adapter}):"
                /usr/local/sbin/megarc -ldinfo -a${adapter} -Lall -nolog |\
                    sed '1,/Information Of Logical Drive/d' || rc=$?
            done
            ;;
    esac

    exit "$rc"

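The script above follows the usual periodic(8) convention: it runs unless `daily_amr_status_enable` is set to `NO`. As a sketch of that toggle in isolation (`amr_status_enabled` is a hypothetical function name, not something periodic(8) provides), the `case` test can be written like this:

```shell
#!/bin/sh
# Hypothetical helper mirroring the case statement in the script above:
# succeeds unless daily_amr_status_enable is set to "no" in any letter case.
# An unset or empty variable counts as enabled because of the :-YES default.
amr_status_enabled() {
    case "${daily_amr_status_enable:-YES}" in
        [Nn][Oo]) return 1 ;;
        *)        return 0 ;;
    esac
}
```

To switch the report off, a line such as `daily_amr_status_enable="NO"` in `/etc/periodic.conf` should be enough, since `source_periodic_confs` reads that file.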
For more information on supported devices have a look at amr(4).
### mpt

    # camcontrol inquiry da0
    pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device
    pass0: 135.168MB/s transfers

(I have not tested this myself; I just found it on the net.) At http://lists.freebsd.org/pipermail/freebsd-proliant/2006-October/000169.html I also found the relevant strings to look for:

During normal operation of the raid:

    # camcontrol inquiry da0 -D
    pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device

After removing one of the raid member disks:

    # camcontrol inquiry da0 -D
    pass0: <COMPAQ RAID 1 VOLUME inte> Fixed Direct Access SCSI-0 device

After re-inserting the raid member disk:

    # camcontrol inquiry da0 -D
    pass0: <COMPAQ RAID 1 VOLUME reco> Fixed Direct Access SCSI-0 device

And about 45 minutes later:

    # camcontrol inquiry da0 -D
    pass0: <COMPAQ RAID 1 VOLUME OK> Fixed Direct Access SCSI-0 device

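The states above can be told apart by the text in front of the closing `>`. Here is a minimal sketch that classifies one inquiry line. `raid_volume_state` is a hypothetical helper, the labels it prints are my own interpretation, and the strings `OK`, `inte` and `reco` are taken only from the sample output above, so other controllers or firmware versions may well report something different.

```shell
#!/bin/sh
# Hypothetical helper: classify one line of "camcontrol inquiry" output.
# Only the three strings seen in the samples above are recognised:
#   OK   - normal operation
#   inte - seen after a member disk was removed
#   reco - seen after the disk was re-inserted, while it resynchronised
# Anything else is reported as "unknown".
raid_volume_state() {
    case "$1" in
        *"VOLUME OK>"*)   echo "ok" ;;
        *"VOLUME inte>"*) echo "degraded" ;;
        *"VOLUME reco>"*) echo "rebuilding" ;;
        *)                echo "unknown" ;;
    esac
}
```

A cron job could then call `raid_volume_state "$(camcontrol inquiry da0 -D)"` and send mail whenever the result is not `ok`.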
You could also use hpacucli, which can be found at http://people.freebsd.org/~jcagle/. I have no experience with it, so if you do, please send me reports or monitoring scripts and I will include them here (the hint was sent by Jaimie Sirovich).

### 3ware

Install and configure sysutils/3dm. This installs a daemon that provides a web interface and can also notify you via e-mail if something happens. This is perhaps the easiest way of monitoring raid in FreeBSD. The other way to monitor 3ware raids is via tw_cli.

### ataraid

This is a software raid driver for many different cards; have a look at ataraid(4). Somebody in ##freebsd (irc.freenode.org) pasted the URL http://www.monkeybrains.net/~rudy/example/raid_status.html, which contains a script that monitors gmirror, 3ware (via tw_cli) and also ataraid (ar0) via atacontrol. For archival purposes, the script is mirrored below:

    #!/bin/sh
    # raid_status - check the state of the RAID.
    # This script works for various types of RAID devices. (Currently, 3Ware, gmirror, BSd 'ar0' raids)
    # WARNING: Install the proper CLI program for your 3ware card, if you use 3ware.
    # Set up a cronjob like this:
    # */16 * * * * /home/rudy/bin/raid_status CRON

    ### Copyright (c) 2006, Rudy Rucker All rights reserved.
    ### Redistribution and use of script, with or without modification, is
    ### permitted provided that the following condition is met:
    ### Redistributions of source code must retain the above copyright
    ### notice, this list of conditions and the following disclaimer.
    ### THIS SOFTWARE IS PROVIDED BY AUTHOR AND CONTRIBUTORS ``AS IS'' AND
    ### ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    ### IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
    ### ARE DISCLAIMED.

    # ----------- Change Log ------------
    # Mon Oct 11 15:20:37 PDT 2004 - rudy
    # Original script.
    # Tue Feb 7 01:28:07 PST 2006 - rudy
    # Added 9500 and 9550 support
    # Fri Jun 9 10:38:33 PDT 2006 - rudy
    # works for 'ar' and 'tw' mirrored arrays
    # Tue Sep 12 10:23:13 PDT 2006 - rudy
    # Added gmirror and realized that not all 3ware's are the same...

    MODE=$1
    TWCLI="/usr/local/bin/tw_cli"
    GMIRROR="/sbin/gmirror"
    ATACONTROL="/sbin/atacontrol"
    AWK="/usr/bin/awk"
    GREP="/usr/bin/grep"
    MAIL="/usr/bin/mail"
    EMAIL="noc@example.com"

    # if this is not a 3ware card, check the atacontol
    if [ -c /dev/twed0 ] && [ -x $TWCLI ]; then
        # 3ware card ... 8000 series
        STATUS=`$TWCLI info c0 u0 | $GREP "^Status" | $AWK {'print $2'}`;
        VALID='OK'
        ESTATUS_CMD="$TWCLI info c0 u0";
        # double check the 3ware output incase it returned nada...
        # Umm... this is the only raid card I have witness this bug
        if [ "X$STATUS" = "X" ]; then
            sleep 1;
            STATUS=`$TWCLI info c0 u0 | $GREP "^Status" | $AWK {'print $2'}`;
        fi
    elif [ -c /dev/da0 ] && [ -x $TWCLI ]; then
        # Note, there are plenty of other device names that use da0... this script is
        # not for those... works with:
        # 3ware 9550SX, 9500S
        STATUS=`$TWCLI info c0 | $GREP "^u0" | $AWK '{print $3}'`;
        VALID='OK'
        ESTATUS_CMD="$TWCLI info c0 u0"
    elif [ -c /dev/mirror/gm0 ] && [ -x $GMIRROR ]; then
        # gmirror /dev/mirror/gm0
        STATUS=`$GMIRROR status gm0 | $GREP "^mirror" | $AWK {'print $2'}`;
        VALID='COMPLETE'
        ESTATUS_CMD="$GMIRROR list";
    elif [ -c /dev/ar0 ] && [ -x $ATACONTROL ]; then
        # Motherboard promise and others
        STATUS=`$ATACONTROL status ar0 | $GREP "status" | $AWK -F 'status: ' '{print $2}'`;
        VALID='READY'
        ESTATUS_CMD="/sbin/atacontrol status ar0"
    else
        echo "Unknown Raid type.... ";
        if [ -x $TWCLI ]; then
            echo " + found $TWCLI";
        else
            echo " - can't exec $TWCLI";
        fi
        if [ -x $ATACONTROL ]; then
            echo " + found $ATACONTROL";
        else
            echo " - can't exec $ATACONTROL";
        fi
        if [ -x $GMIRROR ]; then
            echo " + found $GMIRROR";
        else
            echo " - can't exec $GMIRROR";
        fi
        exit;
    fi

    # Okay, we checked the raid status and know what the return code should be.
    if [ "$STATUS" = "$VALID" ]; then
        if [ "$MODE" = "CRON" ]; then
            exit;
        fi
        echo "OK condition";
        $ESTATUS_CMD
        exit;
    fi

    # ERROR! Either print to TTY or send an email, based on MODE (which is arg[1])
    if [ "$MODE" = "CRON" ]; then
        $ESTATUS_CMD | $MAIL -s "[ERROR] Raid array on $HOST returned $STATUS" $EMAIL
    else
        echo "ERROR condition"
        $ESTATUS_CMD
    fi

### Adaptec: aac