
Man...I must have pissed off the hardware Gods...

vi edit

Wednesday the 12th of this month I get a call from a store 1800 miles away saying that the server that runs the retail equipment wasn't booting. After an hour of troubleshooting on the phone, plus an hour and a half with Dell on the phone, I decide that it's bad and hop a plane down there. Book at 4:30, fly out at 6:30.

I get down there and end up finding out that the RAID-1 volume is gone and unrecognized by the RAID controller. Somehow, the RAID card went on the fritz, nuking both drives in the array and toasting the volume. D'oh! So I get the thing rebuilt, go to apply the backups, and find out that the database had been writing bad backups the whole time. So...after 42 straight hours of work I manage to get the thing rebuilt, three weeks' worth of DB work done in 16 hours, and everything running somewhat smoothly.

Fast-forward to today. I've been on the road for a week on business and have had an accountant here jockeying tapes for me. So I come in, take a look at the server, and see a big fault light flashing on it, and one of the drives in the cage is blinking with a fault too.

Fsck me. I almost started crying. The volume degraded to critical but is in the process of rebuilding itself. I'm crossing my fingers and hoping that all is well. The rebuild should finish up shortly, but then I have to do a consistency check on the data to get a final judgment on whether I'm totally hosed or not.

It's days like this that make me regret hopping into this field 🙁

Edit - The server that had the RAID-1 volume die was an 8-month-old Dell PowerEdge 2600.
The one rebuilding right now is my Exchange box, which is a 1.5-year-old PowerEdge 2500.

 
I hate those stupid 2600s and 2500s. We've had tons of problems with them @ work, from drives dying and taking the whole RAID-5 array with them, to tape drives that have been replaced two and three times in less than a year. I feel your pain. 🙁
 
Bad backups? What kind of system do you have writing backups?

This one place I went to used tape backups, and one guy kept shoving them in wrong and broke the drive. I guess he didn't wanna tell someone he broke it and just kept quiet. I think he was fired. Anyway, back on topic.

 
Originally posted by: beatle
I hate those stupid 2600s and 2500s. We've had tons of problems with them @ work, from drives dying and taking the whole RAID-5 array with them, to tape drives that have been replaced two and three times in less than a year. I feel your pain. 🙁

You have the Python 20/40 drives? Yeah, I've had two of those crap out on me. The 2600 I had came with a different tape drive.

As for the backups - the database itself writes backups nightly and dumps them to a backup folder on another machine. It appears to be writing bad files, because when we tried to recover from them, we weren't able to. 🙁
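The painful lesson in that setup is that a backup you've never tried to read back isn't really a backup. A minimal Python sketch of a nightly verification pass, assuming (hypothetically) that the dumps land as gzip-compressed files in a local folder - the path, size threshold, and compression format are all assumptions for illustration, not details from the actual setup:

```python
# Sketch of a nightly backup sanity check (hypothetical paths and formats).
# Idea: after each dump lands, actually read it end-to-end instead of
# trusting that the file exists.
import gzip
import hashlib
import os

BACKUP_DIR = "backups"  # hypothetical dump folder on the second machine
MIN_SIZE = 1024         # dumps smaller than this are almost certainly bad

def check_backup(path):
    """Return (ok, info): basic integrity checks on one gzip dump file."""
    if os.path.getsize(path) < MIN_SIZE:
        return False, "file suspiciously small"
    try:
        # Decompressing the whole stream catches truncated or corrupt
        # files; hash the decompressed bytes while we're at it so the
        # digest can be logged and compared against the source side.
        h = hashlib.md5()
        with gzip.open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return True, h.hexdigest()
    except OSError as e:  # gzip.BadGzipFile is a subclass of OSError
        return False, "unreadable: %s" % e

def check_all(directory=BACKUP_DIR):
    """Check every dump in the folder; return the list of bad ones."""
    bad = []
    for name in sorted(os.listdir(directory)):
        ok, info = check_backup(os.path.join(directory, name))
        if not ok:
            bad.append((name, info))
    return bad
```

Even this much would have flagged the bad dumps months before a dead array made them matter; a fuller version would do a trial restore into a scratch database rather than just a read-back.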

It hasn't been a fun two weeks.
 
Originally posted by: vi_edit
Wednesday the 12th of this month, I get a call from a store 1,800 miles away saying that the server that runs the retail equipment wasn't booting. After an hour of troubleshooting on the phone, plus an hour and a half with Dell on the phone, I decide that it's bad, and hop a plane down there. Book at 4:30, fly out at 6:30.

I get down there and end up finding out that the RAID-1 volume is gone and unrecognized by the RAID controller. Somehow, the RAID card went on the fritz, nuking both drives in the array and toasting the volume. D'oh! So, I get the thing rebuilt, go to apply the backups, and find out that the database was writing bad backups the whole time. So, after 42 straight hours of work, I manage to get the thing rebuilt: three weeks' worth of DB work done in 16 hours, and everything running somewhat smooth.

Fast-forward to today. I've been on the road for a week on business, and I have an accountant here to jockeying [ note: pick one of preceding two ] tapes for me. So I come in and take a look at the server, and see a big "fault" light flashing on it, and one of the drives in the cage is blinking with a fault too.

Fuck me [ btw, no. ]. I almost started crying. The volume degraded to critical but is in the process of rebuilding itself. I'm crossing my fingers and hoping that all is well. The rebuild should finish up shortly but then I have to do a consistency check on the data to get a final judgement on if I'm totally hosed or not.

It's days like this that make me regret hopping into this field 🙁

Ugh, I'm bored.
 
Originally posted by: vi_edit
Originally posted by: beatle
I hate those stupid 2600s and 2500s. We've had tons of problems with them @ work, from drives dying and taking the whole RAID-5 array with them, to tape drives that have been replaced two and three times in less than a year. I feel your pain. 🙁

You have the Python 20/40 drives? Yeah, I've had two of those crap out on me. The 2600 I had came with a different tape drive.

As for the backups - the database itself writes backups nightly and dumps them to a backup folder on another machine. It appears to be writing bad files, because when we tried to recover from them, we weren't able to. 🙁

It hasn't been a fun two weeks.

Yeah, those 4mm drives are the biggest pieces of crap. We had a server with a similar "bad backups" problem, though with a DLT drive. We needed to restore, and the tape could not be inventoried. :Q A month of work went down the drain...
 
Originally posted by: Amorphus
Originally posted by: vi_edit
Wednesday the 12th of this month, I get a call from a store 1,800 miles away saying that the server that runs the retail equipment wasn't booting. After an hour of troubleshooting on the phone, plus an hour and a half with Dell on the phone, I decide that it's bad, and hop a plane down there. Book at 4:30, fly out at 6:30.

I get down there and end up finding out that the RAID-1 volume is gone and unrecognized by the RAID controller. Somehow, the RAID card went on the fritz, nuking both drives in the array and toasting the volume. D'oh! So, I get the thing rebuilt, go to apply the backups, and find out that the database was writing bad backups the whole time. So, after 42 straight hours of work, I manage to get the thing rebuilt: three weeks' worth of DB work done in 16 hours, and everything running somewhat smooth.

Fast-forward to today. I've been on the road for a week on business, and I have an accountant here to jockeying [ note: pick one of preceding two ] tapes for me. So I come in and take a look at the server, and see a big "fault" light flashing on it, and one of the drives in the cage is blinking with a fault too.

Fuck me [ btw, no. ]. I almost started crying. The volume degraded to critical but is in the process of rebuilding itself. I'm crossing my fingers and hoping that all is well. The rebuild should finish up shortly but then I have to do a consistency check on the data to get a final judgement on if I'm totally hosed or not.

It's days like this that make me regret hopping into this field 🙁

Ugh, I'm bored.

No. You're anal-retentive when it comes to spelling and punctuation. 😉

 
Thankfully, the consistency checks came back okay on the RAID-5 volume that rebuilt itself today. I'm scared to go into the office tomorrow and see what it brings.
 
Sounds like something that happened to me... I had a RAID 1 die on me. BOTH Western Digital drives died at the same freaking time. One clicked like crazy and the other wouldn't spin up. The people who were supposed to do the backups didn't do any of them at all. @#*#@* I was able to get the data off the clicking drive after a few hours, though, thankfully. The freezer trick helped, I guess...
 