Slow disk access after RAID 5 rebuild

jtvang125

Diamond Member
Nov 10, 2004
5,399
51
91
So last week one of the drives got kicked out of the array out of the blue. I rescanned the array a few times to see if it would come back online, but somehow that started the rebuild process on its own, even though I have rebuilds set to manual only. It took 4 days to finish, and now I'm getting intermittently slow disk speeds.

I get stuttering and pauses when streaming video, and sometimes skipping around in a video crashes the player and I have to close it out. If I copy to or from the array while streaming, the video is pretty much unwatchable. Copy speeds slow to a crawl too, or the copy seems to pause for a second or two.

So last night I looked into this further. According to the health report for the drives, the one that got kicked out failed SMART with over 1800 bad/repaired sectors. The array status shows Normal though. So could all the bad sectors be causing this even though the rebuild finished successfully? I already have a replacement drive coming in today.
 

sinisterDei

Senior member
Jun 18, 2001
324
26
91
Absolutely, a drive that is still online in the array can cause performance problems for the whole array if it's experiencing issues. Hunting down 'problem' drives like the one you're describing can often be a very frustrating part of owning a RAID array. If you've got a mechanical drive with that many sectors marked bad in SMART, then absolutely ditch that drive, which it sounds like you're already doing.
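
For anyone who wants to check drives before the controller kicks one out, here is a rough Python sketch that scans the attribute table printed by `smartctl -A` for the sector-related counters. The sample text is made up for illustration (it is not the OP's actual report), and the choice of attributes 5, 197 and 198 as the ones to watch is my assumption, not something from this thread. Drives behind a hardware RAID card may not be visible to smartctl without controller-specific options.

```python
# Illustrative sample only: the kind of table `smartctl -A /dev/sdX` prints
# for an ATA disk. These values are invented, not taken from the OP's drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   055   055   140    Pre-fail  Always   FAILING_NOW 1800
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21500
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       12
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

# Assumption: these three attribute IDs are the sector-health counters to watch.
WATCH = {5, 197, 198}


def suspicious_attributes(text, watch=WATCH):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in watch and raw.isdigit() and int(raw) > 0:
            found[fields[1]] = int(raw)
    return found


print(suspicious_attributes(SAMPLE))
```

Any nonzero count here is worth a closer look, and a count that keeps growing between checks is a stronger warning sign than a small stable one.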

However, you have some other possibilities as well. You didn't tell us the type of RAID adapter you're using - hardware, software, etc - but if it's a hardware RAID array you might have a write cache. Many hardware RAID controllers will disable the write cache when the array goes into a degraded state, and it might not have been turned back on. For lots of RAID cards, the performance of RAID 5 or 6 without the write cache enabled can be *abysmal* - as slow as a single disk or slower sometimes - so if your array falls into that type you'll want to check on that. Please note, write cache can only be 100% *safely* enabled on a hardware RAID card if you've got a battery backup on the RAID card, since in-progress cached writes will be lost in the event of a power outage unless you've got the battery backup.
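
To put a rough number on "as slow as a single disk or slower": the usual textbook explanation is that a small random write on RAID 5 costs four disk I/Os (read old data, read old parity, write new data, write new parity) and six on RAID 6, and a write-back cache hides much of that by batching writes. The sketch below is only back-of-envelope arithmetic; the 4-disk array and 100 IOPS per disk are hypothetical figures, not the OP's hardware.

```python
# Back-end disk I/Os needed for one small random write (textbook figures).
WRITE_PENALTY = {"single disk": 1, "RAID 5": 4, "RAID 6": 6}


def random_write_iops(disks, iops_per_disk, penalty):
    """Approximate uncached small-write IOPS for an array."""
    return disks * iops_per_disk / penalty


# Hypothetical array: 4 disks at 100 random IOPS each.
for level in ("RAID 5", "RAID 6"):
    iops = random_write_iops(4, 100, WRITE_PENALTY[level])
    print(level, round(iops, 1))
```

Under those assumptions the uncached RAID 5 array manages about what one bare disk does, and RAID 6 comes in below that.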

Also, time for the standard RAID-is-not-a-form-of-backup spiel. Make sure you have an actual backup if the data is important. RAID protects against exactly *one* thing - the failure of a single disk, or two if you've got RAID 6. It does *not* ensure the continuing integrity of the data on the array; viruses can kill your data. Multiple drive failures can kill your data. Read failures during array rebuilds can kill your data. RAID helps with *uptime* and *capacity*, not actual data redundancy. So make sure you back up what needs backing up to separate storage!
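
On the "read failures during array rebuilds" point, a rough sketch of the arithmetic: a degraded RAID 5 has to read every surviving disk end to end, so the chance of hitting at least one unrecoverable read error grows with the amount of data read. Everything numeric below is an assumption for illustration: the 1-in-10^14-bits error rate is a common spec-sheet figure, errors are treated as independent, and the disk count and size are hypothetical. Real drives often do better than the spec, and some controllers can survive a single bad read, so treat this as a pessimistic estimate.

```python
import math


def p_read_error(surviving_disks, disk_tb, ure_per_bit=1e-14):
    """Probability of at least one unrecoverable read error while reading
    every surviving disk in full, assuming independent errors."""
    bits = surviving_disks * disk_tb * 1e12 * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-ure_per_bit))


# Hypothetical rebuild: 3 surviving 4 TB disks.
print(round(p_read_error(3, 4), 2))          # spec-sheet rate of 1e-14
print(round(p_read_error(3, 4, 1e-15), 2))   # a drive rated ten times better
```

With these made-up inputs the first case comes out at roughly 0.6, which is the kind of number behind the advice to keep a real backup and to consider RAID 6.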
 

jtvang125

Well, the suspected bad drive got kicked out again last night, so I popped the new one in. I initialized it and got it added to the array. The rebuild started, and the estimated time to completion fluctuated between 8 and 10 hrs. I checked it after coming back from the store and it was 18% done, but when I checked again before heading to bed I was no longer able to connect to the management software. I see no activity lights on the drives either. The array is up and I can access it, but I have no idea whether the rebuild actually finished, so I left it running overnight. I didn't want to shut it down and risk losing the array. I lost it once before when I was expanding it and a Windows update forced the server to restart. I guess I'll let it run for a few days just to make sure it fully completes.
 

sinisterDei

It's also possible the array rebuild errored out.

If it was me in the same situation, I'd be assembling an army of external drives and frantically copying all my data to them just in case the array didn't survive a reboot. This is also why I moved my personal data arrays to RAID6 instead of 5; every time I was down a drive on 5 I was super nervous, and 6 gives that much more buffer.
 

jtvang125

An update for anyone who cares. I left it running until yesterday, when we had a blackout for a few seconds. I turned the server back on and was able to launch the management software, which showed the rebuild had only gotten to 22%. The good news was that it continued the rebuild without any issues. I left it on overnight and it had completed when I checked again this morning. So yeah, if another drive had happened to fail I would have been SOL with this RAID 5 array.
 

sinisterDei

Well regardless, glad it all pulled through.

Still though, it was at 18% on 4/26 and at 22% yesterday, but then completed overnight. It definitely sounds like it errored out at some point, but obviously managed to recover itself. Gives me the heebie jeebies.
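
A quick sanity check on those numbers, using only figures from this thread (the controller's own 8-10 hr estimate for a full rebuild and the 22% reading after the blackout) and assuming the rebuild runs at a constant rate:

```python
def hours_remaining(percent_done, full_rebuild_hours):
    """Hours left if the rebuild proceeds at a constant rate."""
    return (100 - percent_done) / 100 * full_rebuild_hours


# Controller estimated 8-10 hours for the whole rebuild; it resumed at 22%.
for estimate in (8, 10):
    print(estimate, round(hours_remaining(22, estimate), 1))
```

That works out to roughly 6 to 8 hours for the remaining 78%, which fits "completed overnight" and suggests the rebuild was stalled for most of the time in between rather than just running slowly.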
 

Viper GTS

Lifer
Oct 13, 1999
38,107
433
136
Backups. Now. Push it to the cloud, something. If you don't have them you're going to lose that data one way or another.

Viper GTS
 

PliotronX

Diamond Member
Oct 17, 1999
8,883
107
106
If it was just some kind of interface refresh glitch, I think it's okay. I ran into something similar with an old PowerEdge where it looked like the rebuild was what I like to call "well hung", but it was really the OpenManage service (a very outdated version) needing a restart. It does make me a little nervous though; might you be able to schedule downtime for a consistency check? Agreed on the backups.