More proof that RAID is NOT backup

Fullmetal Chocobo

Moderator, Distributed Computing
Moderator
May 13, 2003
13,704
7
81
That sucks. I had the same happen to me (two disks went out due to a PSU issue), and my RAID 5 array died. Also came to find out that nearly all of my DVD backups were toast as well. Out of about 40 DVDs I had stuff backed up on, I got about 6 GB of data back. The music I could handle losing, but losing years and years' worth of pictures really hurt...
 

Souka

Diamond Member
Sep 25, 2000
4,728
1
76
stupid server admins and/or company....


Raid5 is just so that a single-drive failure can allow you to keep your server up and running while you replace the drive...
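To make the single-drive point concrete, here is a minimal sketch of XOR parity, the idea behind RAID 5 recovery. It is a toy model: the function name `xor_blocks` and the 4-byte "drives" are made up for illustration, and a real controller stripes data and rotates parity across the disks.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data "drives" plus one parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Lose drive 1: XOR of the survivors and the parity gives its contents back.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])

# Lose drives 1 AND 2: one parity block, two unknowns. Nothing left in the
# array can tell you what either block was, so it is back to the backups.
```

With single parity there is one equation per stripe, so one missing block can be solved for and two cannot.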

Also, more often than not, servers such as pictured will indicate a drive is nearing failure...which allows the computer operator to get a spare ordered and replace the drive BEFORE it fails in the first place.


Also... should you get a double-drive failure.... replace one or both drives then RESTORE from TAPE backups....


My company's server room has more than 200 hard drives up and running 24/7... our computer operators replace drives all the time...a few per month or as often as a few per week... MOST of the time, the server monitoring consoles report possible failure nearing...at which point we pull a drive from our spare closet and get it replaced with the server vendor.

In 7 years at the company we've NEVER had a dual-drive failure..... but even if we do, we'll just restore from tape in a few hours or so...
 

Sunner

Elite Member
Oct 9, 1999
11,641
0
76
Heh, I know a guy who had a RAID5 of 5 drives.
3 of them failed within a week, luckily not before the array had time to rebuild between the failures.

Turns out all of them were from the same batch, from a factory that had a problem with one of its machines, which caused a couple of thousand defective drives to be delivered to Compaq (where he bought the server and hence the drives).
 

RaiderJ

Diamond Member
Apr 29, 2001
7,582
1
76
Hope my RAID 5 doesn't fail, I don't have anything backed up save for my critical data! It's just my media server, so if I lost it all it wouldn't be too bad.
 

HannibalX

Diamond Member
May 12, 2000
9,359
2
0
stupid server admins and/or company....

Did you read their page?

They had a drive die. They hot swapped the bad drive immediately. The new drive started to rebuild - it takes about 2 hours. While the new drive was rebuilding, a second drive died.

I am sure they have everything on tape backup. As the site said they are loading the server from scratch and will then reload their data from backup.
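For a rough feel of how unlucky a failure inside a 2-hour rebuild is, here is a back-of-the-envelope sketch. Everything except the 2-hour window is an assumption: the 3% annual failure rate and the 4 surviving drives are made-up inputs, and the model treats drive failures as independent with a constant rate.

```python
import math

HOURS_PER_YEAR = 24 * 365

def p_failure_during_rebuild(surviving_drives, rebuild_hours, annual_failure_rate):
    """Chance that at least one surviving drive fails before the rebuild ends.

    Assumes independent drives with a constant (exponential) failure rate.
    """
    # Convert the annual failure probability into a per-hour hazard rate.
    rate_per_hour = -math.log(1.0 - annual_failure_rate) / HOURS_PER_YEAR
    return 1.0 - math.exp(-rate_per_hour * surviving_drives * rebuild_hours)

# Hypothetical array: 4 surviving drives, 2-hour rebuild, 3% annual failure rate.
p = p_failure_during_rebuild(surviving_drives=4, rebuild_hours=2.0,
                             annual_failure_rate=0.03)
print(f"{p:.6%}")
```

The independence assumption is the weak spot. Drives from the same batch, on the same PSU or backplane, and under heavy rebuild load tend to fail together, so the real odds are worse than this kind of estimate suggests.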
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
Friend had his server's RAID 1 array die. Both disks died within 8 minutes of each other. Oops.

Tape is your friend, or backup to disk. It is rare that it happens but sucks when it does.

 

Zolty

Diamond Member
Feb 7, 2005
3,603
0
0
The odds of this happening are very low, but it does happen, and when it does it isn't pretty.
 

HannibalX

Diamond Member
May 12, 2000
9,359
2
0
Originally posted by: Genx87
Friend had his server's RAID 1 array die. Both disks died within 8 minutes of each other. Oops.

Tape is your friend, or backup to disk. It is rare that it happens but sucks when it does.

It's rarer in home environments, I think. As a sysadmin for a big company I see drives die in servers frequently. You figure something that runs 24/7 with no downtime is going to fail at some point, so you plan for it, you expect it.
 

soydios

Platinum Member
Mar 12, 2006
2,708
0
0
Raid5 with a hot spare, so that after a catastrophic failure with no warning the array has the best chance of rebuilding before another failure occurs

but, imho, the only true backup is an off-site backup
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
Even with a hot spare, IF another disk fails while the hot spare is rebuilding, it's bye-bye data on the logical drives. :Q

This is why patrol reading is important AND if disks report media errors they should be exchanged for new ones when possible.

RAID6 is interesting as it can survive a double failure. It uses double parity, but the overhead isn't too bad for new IOPs.

 

Roguestar

Diamond Member
Aug 29, 2006
6,045
0
0
Originally posted by: MS Dawn
RAID6 is interesting as it can survive a double failure. It uses double parity, but the overhead isn't too bad for new IOPs.

Mmm, I remember reading the wikipedia article on it and thinking "you'd have to have crazy bad luck for that to happen" but oh well...
 

SuperNaruto

Senior member
Aug 24, 2006
997
0
0
I had a 5 drive raid 5 mail server... the drives were fine.. the BACKPLANE failed..

Ever since then (2001).. I go against every large-capacity raid 5 storage design..

I'll do raid 5, but with no more than 3 drives.

I'd rather do RAID 1 on multiple drives, with each pair in its own RAID 1 set, than one massive raid 5 partition...

 

ForumMaster

Diamond Member
Feb 24, 2005
7,792
1
0
dang. wonder what are the chances of that. don't servers monitor S.M.A.R.T. data and report when the HDD is about to fail?

off topic though, i was in a medical center today and they had this ancient laptop running windows 98. anyway, it was booting, then pausing to say that S.M.A.R.T. reports the drive is about to fail. i asked the doctor how long it had been saying that...about 2 weeks! don't people ever read?
 

SparkyJJO

Lifer
May 16, 2002
13,357
7
81
SMART isn't all that SMART. It reported a drive healthy when it obviously had the click of death.

RAID is still better than just a single drive.
 

Oyeve

Lifer
Oct 18, 1999
22,044
875
126
Originally posted by: Sunner
Heh, I know a guy who had a RAID5 of 5 drives.
3 of them failed within a week, luckily not before the array had time to rebuild between the failures.

Turns out all of them were from the same batch, from a factory that had a problem with one of its machines, which caused a couple of thousand defective drives to be delivered to Compaq (where he bought the server and hence the drives).

That must be me! Or around the same time. Was going CRAZY! One drive dies in the Compaq server, hot spare kicks in, replace defective drive as hot spare, next day previous hot spare dies, new hot spare kicks in. Went on for over a week until I noticed the drives were from the same batch.
 

Dravic

Senior member
May 18, 2000
892
0
76
Originally posted by: Roguestar
Well, RAID-5 is backup, but it's not infallible.

sorry but this bugs me....

no it's not. it never was and never will be a backup

which part of a raid protects against deleted data?
which part of a raid protects against corrupted data?
which part of a raid protects against damage done to a box that was knocked over(hacked)?
which part of a raid protects against damage done by viruses?

raid is for redundancy ONLY. It's about mitigating the risks of downtime.
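A toy illustration of the difference, as a sketch only: `ToyMirror` is a made-up class standing in for a RAID 1 pair, and the "backup" is just a copy taken before anything went wrong. Neither models a real controller or backup tool.

```python
import copy

class ToyMirror:
    """Toy RAID 1: every write and delete is applied to all healthy disks."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, name, content):
        for disk in self.disks:
            if disk is not None:
                disk[name] = content

    def delete(self, name):
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail(self, index):
        self.disks[index] = None  # simulate a dead drive

    def read(self, name):
        for disk in self.disks:
            if disk is not None and name in disk:
                return disk[name]
        return None

mirror = ToyMirror()
mirror.write("pictures.tar", "years of pictures")
backup = copy.deepcopy(mirror.disks[0])  # stand-in for last night's backup

mirror.fail(0)
print(mirror.read("pictures.tar"))   # still readable: redundancy did its job

mirror.delete("pictures.tar")        # user error, virus, intruder...
print(mirror.read("pictures.tar"))   # gone from every copy in the array
print(backup.get("pictures.tar"))    # only the separate backup still has it
```

The mirror survives the dead drive, but it faithfully repeats the delete on every copy, which is exactly the case the questions above are asking about.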

If you were one of my techs and told me it's ok, we don't need a backup because it's raided, you would be in immediate training, and/or looking for a new job.

raid 0 IMHO is the bastard child of raiding. It's nice for performance issues, but it alone destroys the redundancy aspect of the raid to begin with.


phpbb screwed up (or their hosting company did)... it's that simple. If uptime was a concern for them they should have been using the backup service/option of the hosting company. A CVS repository should not be your best source of the data.

There is no reason for them not to have the drive backed up with yesterday's data, have the disk replaced and be back online in a matter of hours.

Here is how it should have gone down in a serious server shop.
1. Drive indicating failure - replaced (missed this one, but not surprising if it's a hosting company; they may not have someone checking the front of servers, or don't have these warning messages forwarded to their log servers for detection)
2. Drive fails - replace and start rebuilding - this they did
3. Second drive fails while rebuilding - if no more drives on site, have vendor ship replacement within 24 hrs; if mission critical, you should have a contract with a support vendor for something like a 4 hr turnaround.
4. replace second drive, build raid 5 array, restore array from last night's backup
5. profit.... errr