Something to monitor RAID/HDD health?

ViviTheMage

Lifer
Dec 12, 2002
36,189
87
91
madgenius.com
I have a few CentOS servers with RAID 1/5/10, and I was curious: does anyone know of a system that will alert you if an HDD goes bad, so you can replace it before it goes down completely?

Is there some sort of software that can check the HDDs every 120 minutes or so?
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Are they hardware or software RAID? If it's hardware RAID, you need to query the controller, since it won't pass individual drive errors down to the OS the way a plain disk would.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
For the individual drives you probably want smartmontools to watch the SMART values and run the firmware self-tests; if you're lucky, they'll catch something ahead of time.

For the arrays, check whether the CentOS package came with an mdadm cron job that runs 'mdadm --monitor --scan --oneshot' daily.
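
For the smartmontools side, here is a rough Python sketch of how a cron script might interpret the exit status of `smartctl -H`. As far as I know, smartctl returns a bitmask documented in its man page; the messages below are paraphrased from it, and the function names (`decode_smartctl_status`, `check_drive`) are made up for this example. Treat it as a starting point under those assumptions, not a replacement for smartd.

```python
import subprocess

# Exit-status bits as described in the smartctl man page (paraphrased).
SMARTCTL_BITS = {
    0: "command line did not parse",
    1: "device open failed",
    2: "a SMART or other ATA command failed",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "attributes were at or below threshold at some time in the past",
    6: "device error log contains errors",
    7: "device self-test log contains errors",
}


def decode_smartctl_status(code):
    """Return the list of problems encoded in a smartctl exit status."""
    return [msg for bit, msg in sorted(SMARTCTL_BITS.items()) if code & (1 << bit)]


def check_drive(device):
    """Run 'smartctl -H' on a device (hypothetical helper; needs root)."""
    result = subprocess.run(
        ["smartctl", "-H", device], capture_output=True, text=True
    )
    return decode_smartctl_status(result.returncode)
```

Something like `check_drive("/dev/sda")` would return an empty list for a drive that reports healthy, and a list of messages otherwise. Remember that drives behind a hardware RAID controller usually need a controller-specific device type passed to smartctl.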
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
I'm sure mdadm's there already if you're using software RAID; I just don't know if the CentOS packages come with a cron job for that like the Debian package does.
 

ViviTheMage

Lifer
Dec 12, 2002
36,189
87
91
madgenius.com
Hah, the datacenter set up 2 of my RAIDs, so I am unsure what software they used.

Am I stuck with what they gave me? Or should I be able to install mdadm and allow that to monitor for me?

CentOS can create cron jobs.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
If it's Linux software RAID then it's using md. Technically I guess they could've used just LVM, but that only does striping and mirroring, and no one really uses it for that.
 

ViviTheMage

Lifer
Dec 12, 2002
36,189
87
91
madgenius.com
good deal, I will check when I get home. Any idea what I could do to monitor it, via a cron job? Or even manually? To check the 'health' of the disks, I just want to be alerted if they are failing, or failed.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
But don't just drop that in manually without looking, the mdadm CentOS package may have included one like the Debian one does already.
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
I'd still run the mdadm command since the mdadm man page says:

-1, --oneshot
Check arrays only once. This will generate NewArray events and more significantly DegradedArray and SparesMissing events. Running
mdadm --monitor --scan -1
from a cron script will ensure regular notification of any degraded arrays.
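
If you also want to eyeball array state yourself, a minimal Python sketch that reads the `[n/m] [UU_]` status from /proc/mdstat could look like the following. The sample text is a made-up illustration of the usual layout, and `degraded_arrays` and `read_mdstat` are hypothetical names; this is only a sanity check next to `mdadm --monitor`, not a substitute for it.

```python
import re

# An array header line looks like: "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\S*)\s*:")
# The status line ends like: "[3/2] [_UU]" (an underscore marks a missing member)
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")

# Made-up sample for illustration; real output varies by kernel and RAID level.
SAMPLE = """Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid5 sdc1[2] sdb2[1] sda2[0](F)
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


def read_mdstat(path="/proc/mdstat"):
    """Read the live mdstat file (only exists on Linux with md loaded)."""
    with open(path) as handle:
        return handle.read()
```

On a real box you would call `degraded_arrays(read_mdstat())` and print or mail the result; cron mails any output to the job's owner if a mailer is installed.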
 

skyking

Lifer
Nov 21, 2001
22,764
5,925
146
Originally posted by: ViviTheMage
and it will email me, as a cron :)
The --test option leaves a running process to kill, but --oneshot worked perfectly. Thanks guys.
 

Gooberlx2

Lifer
May 4, 2001
15,381
6
91
Couldn't you daemonize it as well? Use --daemonise (instead of --oneshot) and throw it in boot.local, or make an init script.

Be sure your mdadm.conf is set up correctly, with partition and email address and all that.

Link for SLED/SLES, but should translate to other distros.
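
To double-check the email part of mdadm.conf, a small Python sketch along these lines might help. It assumes the usual `MAILADDR` keyword and the man page's rule that a word starting with `#` begins a comment; it ignores abbreviated keywords, and `find_mailaddr` is just a name picked for this example. The config path differs between distros, so none is hard-coded here.

```python
def find_mailaddr(conf_text):
    """Return the first MAILADDR value in mdadm.conf text, or None."""
    for raw in conf_text.splitlines():
        words = []
        for word in raw.split():
            # A word starting with '#' begins a comment to end of line.
            if word.startswith("#"):
                break
            words.append(word)
        # Keywords are case-insensitive; abbreviations are not handled here.
        if len(words) >= 2 and words[0].lower() == "mailaddr":
            return words[1]
    return None


# Made-up config text for illustration only.
SAMPLE_CONF = """# mdadm.conf
DEVICE partitions
MAILADDR root@localhost  # where alerts go
"""
```

If `find_mailaddr` returns `None` for your config, mdadm's monitor mode has no address from the config file to send alerts to.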
 

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Yea, a mailer is pretty much required on any Linux machine, even if it's just a workstation. I tried using one of the small relay-only things like ssmtp, but realized that if I lost connectivity to the next-hop SMTP relay, the mail was lost because ssmtp doesn't do any queueing. So I just installed postfix and went back to not worrying about it.