Man...I must have pissed off the hardware Gods...

vi edit

Elite Member
Super Moderator
Oct 28, 1999
62,387
8,154
126
Wednesday the 12th of this month I get a call from a store 1800 miles away saying that the server that runs the retail equipment wasn't booting. After an hour of troubleshooting on the phone, plus an hour and a half with Dell on the phone I decide that it's bad, and hop a plane down there. Book at 4:30, fly out at 6:30.

I get down there and end up finding out that the RAID-1 volume is gone and unrecognized by the RAID controller. Somehow, the RAID card went on the fritz nuking both drives in the array and toasting the volume. D'oh! So I get the thing rebuilt, go to apply the backups and find out that the database was writing bad backups the whole time. So...after 42 straight hours of work I manage to get the thing rebuilt, three weeks worth of DB work done in 16 hours, and everything running somewhat smooth.

Fast-forward to today. I've been on the road for a week on business and have an accountant here jockey tapes for me. So I come in and take a look at the server and see a big fault light flashing on it and one of the drives in the cage is blinking with a fault too.

Fsck me. I almost started crying. The volume degraded to critical but is in the process of rebuilding itself. I'm crossing my fingers and hoping that all is well. The rebuild should finish up shortly but then I have to do a consistency check on the data to get a final judgement on if I'm totally hosed or not.

It's days like this that make me regret hopping into this field :(

Edit - The server that had the RAID-1 volume die was an 8-month-old Dell PowerEdge 2600.
The one that is rebuilding right now is my Exchange box, which is a 1.5-year-old PowerEdge 2500.

 

beatle

Diamond Member
Apr 2, 2001
5,661
5
81
I hate those stupid 2600s and 2500s. We've had tons of problems with them @ work, from drives dying and taking the whole RAID-5 array with them to tape drives that have been replaced 2 and 3 times in less than a year. I feel your pain. :(
 

Kelvrick

Lifer
Feb 14, 2001
18,438
5
81
Bad backups? What kind of system do you have writing backups?

This one place I went to used tape backups and one guy kept on shoving the tapes in wrong and broke the drive. I guess he didn't wanna tell someone he broke it and just kept quiet. I think he was fired. Anyway, back on topic.

 

vi edit

Originally posted by: beatle
I hate those stupid 2600s and 2500s. We've had tons of problems with them @ work, from drives dying and taking the whole RAID-5 array with them to tape drives that have been replaced 2 and 3 times in less than a year. I feel your pain. :(

You have the Python 20/40 drives? Yeah, I've had two of those crap out on me. The 2600 I had came with a different tape drive.

As for the backups - the database itself writes backups nightly and dumps them to a backup folder on another machine. The thing appears to be writing bad files because when we tried to recover from them we weren't able to. :(

It hasn't been a fun two weeks.
 

Amorphus

Diamond Member
Mar 31, 2003
5,561
1
0
Originally posted by: vi_edit
Wednesday the 12th of this month, I get a call from a store 1,800 miles away saying that the server that runs the retail equipment wasn't booting. After an hour of troubleshooting on the phone, plus an hour and a half with Dell on the phone, I decide that it's bad, and hop a plane down there. Book at 4:30, fly out at 6:30.

I get down there and end up finding out that the RAID-1 volume is gone and unrecognized by the RAID controller. Somehow, the RAID card went on the fritz, nuking both drives in the array and toasting the volume. D'oh! So, I get the thing rebuilt, go to apply the backups, and find out that the database was writing bad backups the whole time. So, after 42 straight hours of work, I manage to get the thing rebuilt: three weeks' worth of DB work done in 16 hours, and everything running somewhat smooth.

Fast-forward to today. I've been on the road for a week on business, and I have an accountant here to jockeying [ note: pick one of preceding two ] tapes for me. So I come in and take a look at the server, and see a big "fault" light flashing on it, and one of the drives in the cage is blinking with a fault too.

Fuck me [ btw, no. ]. I almost started crying. The volume degraded to critical but is in the process of rebuilding itself. I'm crossing my fingers and hoping that all is well. The rebuild should finish up shortly but then I have to do a consistency check on the data to get a final judgement on if I'm totally hosed or not.

It's days like this that make me regret hopping into this field :(

Ugh, I'm bored.
 

beatle

Originally posted by: vi_edit
Originally posted by: beatle
I hate those stupid 2600s and 2500s. We've had tons of problems with them @ work, from drives dying and taking the whole RAID-5 array with them to tape drives that have been replaced 2 and 3 times in less than a year. I feel your pain. :(

You have the Python 20/40 drives? Yeah, I've had two of those crap out on me. The 2600 I had came with a different tape drive.

As for the backups - the database itself writes backups nightly and dumps them to a backup folder on another machine. The thing appears to be writing bad files because when we tried to recover from them we weren't able to. :(

It hasn't been a fun two weeks.

Yeah, those 4mm drives are the biggest pieces of crap. We had a server that had a similar problem with "bad backups," though with a DLT drive. We needed to restore and the tape could not be inventoried. :Q A month of work went down the drain...
 

T2T III

Lifer
Oct 9, 1999
12,899
1
0
Originally posted by: Amorphus
Originally posted by: vi_edit
Wednesday the 12th of this month, I get a call from a store 1,800 miles away saying that the server that runs the retail equipment wasn't booting. After an hour of troubleshooting on the phone, plus an hour and a half with Dell on the phone, I decide that it's bad, and hop a plane down there. Book at 4:30, fly out at 6:30.

I get down there and end up finding out that the RAID-1 volume is gone and unrecognized by the RAID controller. Somehow, the RAID card went on the fritz, nuking both drives in the array and toasting the volume. D'oh! So, I get the thing rebuilt, go to apply the backups, and find out that the database was writing bad backups the whole time. So, after 42 straight hours of work, I manage to get the thing rebuilt: three weeks' worth of DB work done in 16 hours, and everything running somewhat smooth.

Fast-forward to today. I've been on the road for a week on business, and I have an accountant here to jockeying [ note: pick one of preceding two ] tapes for me. So I come in and take a look at the server, and see a big "fault" light flashing on it, and one of the drives in the cage is blinking with a fault too.

Fuck me [ btw, no. ]. I almost started crying. The volume degraded to critical but is in the process of rebuilding itself. I'm crossing my fingers and hoping that all is well. The rebuild should finish up shortly but then I have to do a consistency check on the data to get a final judgement on if I'm totally hosed or not.

It's days like this that make me regret hopping into this field :(

Ugh, I'm bored.

No. You're anal-retentive when it comes to spelling and punctuation. ;)

 

vi edit

Thankfully the consistency checks came back okay on the RAID-5 volume that rebuilt itself today. I'm scared to go into the office tomorrow and see what it brings.
 

Chadder007

Diamond Member
Oct 10, 1999
7,560
0
0
Sounds like something that happened to me.....I had a RAID 1 die on me. BOTH Western Digital drives died at the same freaking time. One clicked like crazy and the other wouldn't spin up. The people who were supposed to do the backups didn't do any of them at all. @#*#@* Thankfully, I was able to get the data off the clicking drive after a few hours. Freezer trick helped I guess...