RAID5 Degraded - Segment Missing

ReggieDunlap

Senior member
Aug 25, 2009
Here's my situation:
Server with 2 x 147GB SAS HDDs in RAID1 - This one is fine
4 x 300GB SAS HDDs in RAID5 - This one is "Degraded"
Adaptec 5805 Raid Controller Card.
Adaptec Storage Manager installed.

Storage Manager shows only 5 of 6 drives on the two enclosures:
Encl 0 - Slot 0, Slot 1, Slot 2 drives all reporting Optimal
Encl 1 - Slot 4, Slot 5 drives reporting Optimal (Slot 3 not showing at all)

Logical Devices:
Raid1 "System" is Optimal
Raid5 "Raid5" is Degraded - "Segment 1" X Missing (Segments 0, 2 and 3 are "Present"; slots 2, 4 and 5 respectively)

I'd like to get opinions/thoughts/experience on what I can expect when I replace the failed HDD. BTW, the "Failed" drive shows a red LED on the drive carrier. From my reading and understanding, when I replace the failed drive the RAID5 should automatically begin to rebuild (I do not have a "Hot Swap" configured). My question is, since the physical disk is not showing on the controller (to set/confirm the disk status to "Failed" before replacing the disk), am I correct in thinking replacing the HDD will cause the Adaptec to begin the rebuild of the RAID?

The server is still running - the OS is WinSrv2008R2 with Hyper-V, hosting several VMs (one is the file server and one is the DC). I have rsync backups of the data from the file server.
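For reference, here's a rough sketch of how I might watch the controller from the command line once the new drive goes in, instead of clicking around in ASM (this assumes the arcconf CLI that ships with Adaptec Storage Manager, that the 5805 is controller 1, and the install path below is only a guess - none of it verified on this box):

```python
import subprocess
import time

# Assumptions: arcconf.exe is installed with Adaptec Storage Manager at this
# path, and the 5805 is controller 1. Verify both before trusting the output.
ARCCONF = r"C:\Program Files\Adaptec\Adaptec Storage Manager\arcconf.exe"
CONTROLLER = "1"

def arcconf(*args):
    """Run an arcconf subcommand and return its text output."""
    result = subprocess.run([ARCCONF, *args], capture_output=True, text=True)
    return result.stdout

# Logical devices: should show "System" as Optimal and "Raid5" as Degraded.
print(arcconf("getconfig", CONTROLLER, "ld"))

# Physical devices: the failed drive in Encl 1 / Slot 3 may not be listed at all.
print(arcconf("getconfig", CONTROLLER, "pd"))

# After swapping the drive, poll the controller's background task status;
# an automatic rebuild should show up here with a percentage complete.
for _ in range(12):            # check every 5 minutes for an hour
    print(arcconf("getstatus", CONTROLLER))
    time.sleep(300)
```

(The ASM GUI shows the same information; this is just easier to glance at remotely.)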
 

Cerb

Elite Member
Aug 26, 2000
"Hot spare" is the term you're looking for. If replacing it does not immediately begin a rebuild, the RAID management software should notice a new drive and have an option for you to make it rebuild.

And whenever you get around to refreshing it, avoid RAID 5 again. With those smallish drives, you should be fine on a rebuild now, but rebuild failures with TB and up drives are finally becoming a real issue.
 

kn51

Senior member
Aug 16, 2012
I will say I have several of these controllers in a production environment.

Yes, it will rebuild. As said above, with the 300GB drives you have (assuming 15K), it should rebuild somewhat quickly...probably a few hours depending on load. Perhaps you can tweak that within ASM to give priority to background tasks. I get confused by all the different array utilities and their nomenclature; it makes my head spin.
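Rough math on that - the rebuild rates below are nothing more than guesses, so treat this as a sanity check rather than a prediction:

```python
# Back-of-envelope rebuild time for one 300GB member. The sustained rebuild
# rates are assumed values, not measurements, and drop a lot when the
# controller is also serving host I/O.
capacity_gb = 300
rates_mb_per_s = {
    "idle array, high rebuild priority": 100,
    "moderate VM load": 40,
    "heavy load / low rebuild priority": 15,
}

for scenario, rate in rates_mb_per_s.items():
    hours = capacity_gb * 1000 / rate / 3600
    print(f"{scenario:35s} ~{hours:.1f} hours")
```

Hence "a few hours depending on load".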

Are you running a backplane with a single cable, or do you have breakout cables? Meaning, are the drives individually wired, or are they on sleds you can yank out and re-insert in the chassis?

Usually when a drive fails in a RAID setup such as this, I will pull it and shove it back in, forcing a rebuild. If it croaks around 15% into the rebuild, it is toast. I've found the Adaptec 5 series occasionally doesn't play nice with LSI backplane controllers and will drop a drive that is mechanically fine and soldiers on after a reset.

FWIW, they have their faults, but Adaptec is fairly robust in keeping the RAID alive.
 

ReggieDunlap

Senior member
Aug 25, 2009
Cerb said:
"Hot spare" is the term you're looking for. If replacing it does not immediately begin a rebuild, the RAID management software should notice a new drive and have an option for you to make it rebuild.

And whenever you get around to refreshing it, avoid RAID 5 again. With those smallish drives, you should be fine on a rebuild now, but rebuild failures with TB and up drives are finally becoming a real issue.

"Hot Spare" - thanks. That was what was in my head, but it kept coming out as "hot swap".
RE: your comment on RAID5. I will be rebuilding an oldish server with a Supermicro motherboard and an Areca 1110-4P PCI-X SATA II RAID controller. The plan is to put 4 x 2TB Seagate Constellation ES2 7200RPM drives in a RAID5. I guess you would advise against it? I've only got 4 bays in this machine, so I thought RAID5 would be the best choice.

kn51 said:
I will say I have several of these controllers in a production environment.

Yes, it will rebuild. As said above, with the 300GB drives you have (assuming 15K), it should rebuild somewhat quickly...probably a few hours depending on load. Perhaps you can tweak that within ASM to give priority to background tasks. I get confused by all the different array utilities and their nomenclature; it makes my head spin.

Are you running a backplane with a single cable, or do you have breakout cables? Meaning, are the drives individually wired, or are they on sleds you can yank out and re-insert in the chassis?

Usually when a drive fails in a RAID setup such as this, I will pull it and shove it back in, forcing a rebuild. If it croaks around 15% into the rebuild, it is toast. I've found the Adaptec 5 series occasionally doesn't play nice with LSI backplane controllers and will drop a drive that is mechanically fine and soldiers on after a reset.

FWIW, they have their faults, but Adaptec is fairly robust in keeping the RAID alive.

Yes, the drives are all Seagate Cheetah ST3300657SS 15K SAS drives.
I don't know the exact backplane controller (I didn't spec this server). It's on a Supermicro server motherboard. The drives are on sleds, so I'll just pull the bad one, install the new drive on the sled, and put it back in the server.
Good to hear they're pretty good at keeping a RAID alive. This all went down last Thursday, and being so close to the holidays, I was scrambling just to find a vendor who could get a drive here by today (the 23rd). YES, the server is out of warranty :)

I'll try to post how it all goes once I get the drive.
 

Cerb

Elite Member
Aug 26, 2000
ReggieDunlap said:
"Hot Spare" - thanks. That was what was in my head, but it kept coming out as "hot swap".
RE: your comment on RAID5. I will be rebuilding an oldish server with a Supermicro motherboard and an Areca 1110-4P PCI-X SATA II RAID controller. The plan is to put 4 x 2TB Seagate Constellation ES2 7200RPM drives in a RAID5. I guess you would advise against it? I've only got 4 bays in this machine, so I thought RAID5 would be the best choice.
Yes. Also, NetApp, Dell, HP, and others. I'm just one little messenger, unable to sleep due to a combination of weather/sinuses and cats playing in the wee hours of the morning.

For example, last year:
http://community.spiceworks.com/topic/251735-new-raid-level-recommendations-from-dell
https://community.spiceworks.com/topic/251810-dang-you-raid-5

A bit older, but a nice summary of what had been found at the time, leading up to what we're seeing now:
http://storagemojo.com/2007/02/26/netapp-weighs-in-on-disks/

The short version, without a giant Scott Alan Miller link wall, is to avoid parity RAID, except for well-tended ZFS setups, RAID 6 when write performance doesn't matter (but capacity does), or systems where you don't plan to try to recover the array after a failure (i.e., you just keep things going until you reconfigure, restore from backup onto new drives after hours, or bring up a cold or warm spare server), in which case 5 or 6 may be fine, depending on how many drives and how big the array is. But here are two of the links that would have been included, which should do a good job of summarizing things:
http://www.smbitjournal.com/2012/11/one-big-raid-10-a-new-standard-in-server-storage/
http://community.spiceworks.com/topic/356919-why-raid-5-sucks

P.S. While I'm at it, one more to chew on:
http://www.smbitjournal.com/2012/05/when-no-redundancy-is-more-reliable/

On forums that are more IT-centric than enthusiast-centric (Spiceworks being a prime example), reports of actual failures of whole RAID 5 arrays, and requests for the best DR options after them, are not uncommon now (usually on inherited arrays). Because recovery puts a much lower mechanical and electrical load on the drives (one disk at less than 100% busy time, versus all disks), RAID 1 or 10 is much more resilient, and the better way to go most of the time.
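To put crude numbers on it for the 4 x 2TB array you're planning: here's a spec-sheet estimate of the chance of hitting an unrecoverable read error (URE) during a rebuild, assuming the drives' rated URE figure and independent errors - real drives aren't that tidy, so treat it as illustrative only:

```python
import math

def rebuild_ure_risk(read_tb, ure_per_bit):
    """Chance of at least one unrecoverable read error while reading read_tb
    terabytes, assuming independent errors at the drive's rated URE rate."""
    bits = read_tb * 1e12 * 8
    return 1 - math.exp(-bits * ure_per_bit)

# 4 x 2TB RAID 5: a rebuild must read the 3 surviving drives end to end (~6 TB).
# 2-way mirror (RAID 1/10): a rebuild reads only the failed drive's partner (~2 TB).
for label, read_tb in [("RAID 5 rebuild, ~6 TB read ", 6.0),
                       ("RAID 10 rebuild, ~2 TB read", 2.0)]:
    for ure in (1e-14, 1e-15):   # typical desktop vs. nearline/enterprise rating
        print(f"{label}  URE rate {ure:g}/bit:  ~{rebuild_ure_risk(read_tb, ure):.0%}")
```

Spec-sheet URE rates are pessimistic in practice, but the asymmetry between reading every surviving disk and reading one mirror partner is the point. On 4 bays, that's also a trade of roughly 6 TB usable (RAID 5) against 4 TB (RAID 10).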
 

ReggieDunlap

Senior member
Aug 25, 2009
The drive finally arrived on the 27th. I was able to pick it up and install it yesterday, the 30th. The RAID began rebuilding automatically. I checked three times from home: 37%... 66%... then the last time, about 11pm, it was back to the same state:
Degraded... Segment Missing... NO drive showing at all in Slot 3.

I now suspect it may be the slot the drive is in? Not sure why a brand-new drive would get almost fully rebuilt only to fail. So I'm now thinking of putting the original drive back in the server, but in a different slot.

Thoughts?
 

Cerb

Elite Member
Aug 26, 2000
It could be that, or a controller issue. With more bays available, and assuming they are all connected to the controller, trying another shouldn't hurt...
 

Dr-Kiev

Member
Apr 3, 2013
www.angeldatarecovery.com
What is your main goal? If you need to extract data from the RAID5, it is better and safer to do it virtually with 3rd-party software like RAID Reconstructor, R-Studio, etc. Or use professional help.
 

ReggieDunlap

Senior member
Aug 25, 2009
Update: I arrived at the office this morning after the holidays.
SLOT 3: the new drive installed on the 30th of Dec shows NO red LED on the drive carrier.
Installed the original drive into SLOT 6; the RAID rebuild began immediately and completed in about 40-45 minutes.
The RAID state is now OPTIMAL, but the original drive still has an alert for S.M.A.R.T. errors (Yes) and warnings (4).
The new drive was installed into SLOT 7 and shows a READY state; the drive properties show no SMART errors but 1 hardware error (not sure if this is due to the failed RAID rebuild on that drive).

Now thinking: initialize the new drive in SLOT 7, then set the SLOT 6 drive's state to "Failed", and then have the rebuild begin onto the new drive in SLOT 7 again.

Once again, thoughts?

@Dr-Kiev: The main goal is to get the server RAID back to an Optimal state with no warnings on individual drives. The server is functional and is now running in an Optimal RAID state, but with some warnings on a drive.

UPDATED AGAIN :)
OK. The RAID is still Optimal but has a warning icon re: Bad Stripe. From reading, it seems this will be permanent until the logical drive is blown away and re-created. http://ask.adaptec.com/app/answers/detail/a_id/14947

The primary issue now is trying to verify/confirm whether the two drives are actually hardware-faulty, or whether the issues were due to the bad stripe, etc.
The new drive: I've tried Initializing, Clearing, and a Verify with Fix, but during each process the drive will just "disappear" from the Storage Manager view before finishing. If I physically pull the drive and re-install it, it shows up in the manager once again. When this was done the first time (after returning the original drive to SLOT 6), there were SMART errors showing in the drive properties. I tried the "Clear Disk" and it disappeared about 5 minutes later. After removing/reinstalling it, the drive no longer showed SMART errors. I tried a Verify with Fix and the drive disappeared again. I've done nothing else since.
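In case it helps pin down when the drive drops out, I may leave something like this running and match the timestamps against the Windows event log (same assumptions as the earlier script about the arcconf path and controller number; the "Slot 7" search string is a guess and would need to match however the getconfig output actually labels the bay):

```python
import subprocess
import time
from datetime import datetime

# Assumptions: arcconf path and controller number as before; "Slot 7" is
# assumed to appear literally in the physical-device listing - adjust the
# search string to whatever the real output shows for that bay.
ARCCONF = r"C:\Program Files\Adaptec\Adaptec Storage Manager\arcconf.exe"
CONTROLLER = "1"
SLOT_TEXT = "Slot 7"

def slot_present():
    out = subprocess.run([ARCCONF, "getconfig", CONTROLLER, "pd"],
                         capture_output=True, text=True)
    return SLOT_TEXT in out.stdout

was_present = slot_present()
while True:
    now_present = slot_present()
    if now_present != was_present:
        state = "reappeared" if now_present else "DISAPPEARED"
        print(f"{datetime.now():%Y-%m-%d %H:%M:%S}  SLOT 7 drive {state}")
        was_present = now_present
    time.sleep(60)   # check once a minute
```

Then I can see whether the drop lines up with anything else logged at the same time.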

I don't have another SAS-capable server I can use to run SeaTools from to check the drives that way, either.
 