How do I reconfigure a RAID 5 array?

bjmanbj

Junior Member
Here is the scenario. We have a server in a small office that is partitioned into two logical drives, C: and D:, and we are basically running out of space. The server has a RAID 5 array with 3 hard drives that are all identical. They are of this type:

Seagate SCSI ST373207LC Revision D703 - 73 GB.

This is the RAID controller installed:

Primary Controller DELL PERC4/SC, 1 Internal Channel P4SCI

The server is a Dell PowerEdge 1800 that was installed a long time ago (2005). The people who installed it are no longer in business, and every other 'tech' place I call basically says to get a new server. That's not an option, because this one is working fine. All I really need is to increase the size of the RAID array. I think that because the server is so old, most tech places feel it's not really worth their trouble to help me. I have a lot of experience with Windows XP systems, but not a lot with servers, although I know a little. I should also mention that we have Paragon Partition Manager for Servers installed. It was used successfully in the past to reallocate free space between the C: and D: drives.

So, here is my plan.

1. I have purchased 2 hard drives of the exact same type the server has. The server has capacity for 6 hard drives, but now only 3 are being used.
2. Perform a full backup of the server.
3. Insert the 2 new drives into two empty bays.
4. Run Dell OpenManage Server Administrator.
5. Perform a ‘rescan’ of the server and hope it ‘sees’ the 2 new hard drives.
6. Perform a ‘reconfigure’ of the RAID virtual disk.

I am hoping that the above steps will result in a larger RAID array. I don't care how the extra free space is allocated as long as the server recognizes and reconfigures the drives. It would be preferable if all the free space went to the D: drive, but if not, then I can use the Paragon program to 'Reallocate' the free space.
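
Just to sanity-check the math on the plan (my own arithmetic, written out; the 73GB figure is the drives' rated size, so real usable numbers will come out a bit lower):

```python
# RAID 5 usable capacity is (number_of_drives - 1) x drive_size,
# since one drive's worth of space is consumed by parity.

DRIVE_GB = 73  # rated capacity of each ST373207LC (approximate)

def raid5_usable_gb(num_drives, drive_gb=DRIVE_GB):
    """Usable capacity of a RAID 5 array, in GB."""
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (num_drives - 1) * drive_gb

print(raid5_usable_gb(3))  # current 3-drive array: 146 GB usable
print(raid5_usable_gb(5))  # after adding 2 drives: 292 GB usable
```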

Are the steps that I outlined above feasible? Am I missing something critical? I have read something about hitting 'CTRL-M' during startup and doing something there, but I am not sure what that means. The Dell OpenManage program is much easier to use; I just hope it works.

Any advice would be much appreciated. 😕😕

Thanks,

BJmanbj
 
Am I missing something critical?

Yes you are. It's called progress.

You could replace all the drives with two Intel G2 160GB drives off eBay and run them in RAID 0 for about the same price as two of those 73GB Seagates you just bought. That setup would run circles around the system you have now, even with no TRIM support.

2 drive RAID 0 is no less safe than 5 drive RAID 5. In either case, you still need backup.

Alternatively, you could buy 4 modern 1TB RAID class drives, and put them in RAID 10. That would also be much faster and safer than your proposed setup, and not that much more expensive.

Another possibility is to buy a RAID card that won't kick a drive off the array if it tries to recover an error. That way, you could buy 4 1TB Samsung F3 drives for about $50 each, and run them in RAID 10.

It should be simple to clone from your present setup to new hardware, but why spend more money on crap when the same money can buy faster and safer?
 
Thanks for the information. It's not the money that I am worried about; it's just that I'm trying to keep this as simple as possible for myself. All the options you listed sound awesome, but I would really need someone who knows how to do them to actually come and help me. I haven't been able to find anyone who is willing to come. So, I am left with trying to do the best I can by myself.

My plan just involves inserting a couple of drives into the server bays--simple enough. Yours involves opening the server, changing the RAID controller, replacing the drives, reformatting the drives and setting up a whole new array. I am sure your solution is better, but it's also more complicated for me.

bjmanbj
 
FishAk said:
2 drive RAID 0 is no less safe than 5 drive RAID 5.

That statement is blatantly wrong: losing 1 drive in the RAID0 array loses you the whole array, while losing 1 in the RAID5 array doesn't.
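
To put rough numbers on it, here's a back-of-the-envelope comparison. The 5% annual failure rate is a made-up illustrative figure, and this ignores rebuild windows and the write hole entirely:

```python
# Chance of losing the whole array within a year, assuming independent
# drive failures. The AFR below is an illustrative guess, not a measured
# figure, and rebuild windows / the write hole are ignored.

AFR = 0.05  # assumed annual failure rate per drive

def p_raid0_loss(n, p=AFR):
    """RAID 0 is lost if ANY of its n drives fails."""
    return 1 - (1 - p) ** n

def p_raid5_loss(n, p=AFR):
    """RAID 5 (single parity) is lost only if 2 or more of n drives fail."""
    p_survive = (1 - p) ** n + n * p * (1 - p) ** (n - 1)
    return 1 - p_survive

print(f"2-drive RAID 0: {p_raid0_loss(2):.2%}")  # ~9.75%
print(f"5-drive RAID 5: {p_raid5_loss(5):.2%}")  # ~2.26%
```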
 
That statement is blatantly wrong

You are neglecting the write hole and parity corruption when you assume a RAID 5 array will be able to be rebuilt after a disk failure.

OP, if you aren't too concerned with the money, I would suggest a 300GB Intel 320 series SSD and a 1TB Samsung F3 drive. This will set you back about $450-$550 for the SSD and around $50 for the HDD.

I have never used a server before, so it's possible I don't understand what I'm talking about. There are plenty of extremely knowledgeable people on here, like Nothinman, who I trust will jump in to correct me if I am wrong.

This is the way I would proceed.

In addition to the disks, you will need disk-imaging software. I think the best are Acronis or Paragon, both of which, I think, support alignment. It's unlikely your disks are aligned, so it's important to correct this during the upgrade.

After receiving your components, and reviewing the SW instructions, but before you are ready to actually upgrade, install both disks into a Windows 7 computer and test them. Create a 350GB partition at the beginning of the HDD, and while you're at it, partition the remaining space how you like. W7 aligns partitions when it creates them, unlike XP, so you won't need to worry about alignment in the future.
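
If you want to verify alignment yourself, the test is just whether a partition's starting byte offset is an exact multiple of the alignment boundary. A quick sketch (the two offsets shown are the classic XP and W7 defaults):

```python
# A partition is aligned if its starting byte offset is an exact multiple
# of the alignment boundary. Windows 7 creates partitions on 1 MiB
# boundaries; XP's classic default start of LBA 63 is misaligned.

SECTOR = 512
MIB = 1024 * 1024

def is_aligned(start_offset_bytes, boundary=MIB):
    return start_offset_bytes % boundary == 0

print(is_aligned(63 * SECTOR))    # False - typical XP partition start
print(is_aligned(2048 * SECTOR))  # True  - typical Windows 7 start (1 MiB)
```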

Pick a time when your business will be shut down for a stretch, such as the upcoming Labor Day weekend, so the work doesn't interrupt it. This way you won't feel rushed.

Install the disks into the server, install the SW, and clone the RAID 5 volume to both the SSD and the first partition of the HDD. When you clone from the RAID volume to another disk, the SW should align the new partition on the new disk. Make sure this will happen by asking the SW company, and only buy imaging SW that will align properly. There are other, more complicated ways to align, but that's why you are buying SW, so make sure they make it easy.

Change the BIOS settings and boot to the SSD. If it works, congratulations.

Again change the BIOS and boot to the HDD. That should also work, but not nearly as fast as the SSD. Still, probably as fast or faster than your previous setup.

At this point you would have 3 functional methods of operating the server, all with identical information. Obviously, the info will diverge during use.

Use the remaining 600GB of the HDD to make incremental backups of the server so you can go back in time to recover an accidentally deleted file.

Sell the Seagate drives, and buy another two 1TB HDDs and two external enclosures (eSATA, if the server supports it). Encrypt the drives, and clone the internal HDD to each of the external units. Place one unit in another location, and keep one locked away on the premises. Connect and update the locally stored external drive weekly, and swap it with the off-site drive every quarter or so. If the off-site drive is in another country or on another continent, it will protect against fire, flood, war, meteor impact, tornadoes, and alien invasion.
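
If you'd rather script the weekly update than do it by hand, the logic is simple. A toy sketch (the paths are placeholders, and real imaging/backup software is still the right tool for the system volume, since this does nothing about open files or permissions):

```python
# Toy incremental copy: copy only files that are missing on the backup
# drive or newer on the source. Paths are placeholders. This does NOT
# handle open files, NTFS permissions, or verification - real backup
# software does.

import os
import shutil

SOURCE = r"D:\data"         # placeholder source folder
DEST = r"F:\backup\data"    # placeholder: folder on the external drive

for root, _dirs, files in os.walk(SOURCE):
    target_dir = os.path.join(DEST, os.path.relpath(root, SOURCE))
    os.makedirs(target_dir, exist_ok=True)
    for name in files:
        src = os.path.join(root, name)
        dst = os.path.join(target_dir, name)
        # copy if missing on the backup, or if the source copy is newer
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            shutil.copy2(src, dst)  # copy2 preserves timestamps
```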

It is vital that external drives are encrypted to stop snooping. I highly recommend TrueCrypt.
 
Would the OP even be able to use SATA drives? It looks like he has a controller that only has SCSI connections. I think the best and easiest way to deal with this situation is to buy a new server, P2V this one, and increase the drive size.
 
It looks to me like only the RAID controller is limited to SCSI. The server supports SATA.

The RAID controller can be discarded or sold along with the old drives.
 
FishAk said:
You are neglecting the write hole and parity corruption when you assume a RAID 5 array will be able to be rebuilt after a disk failure.

I'm not neglecting anything; people have been running RAID5 for years just fine. Sure, that problem exists and can bite you, but the chances are relatively low. So claiming that RAID5's reliability is equal to RAID0's is a blatant lie. If you're going with that level of hyperbole, you should also recommend that they ditch Windows and switch to FreeBSD or Solaris, because ZFS is a lot safer than NTFS.
 
So claiming that RAID5's reliability is equal to RAID0's is a blatant lie.

I think that's a little strong, considering the probability of losing one of two drives in RAID 0.

Sure that problem exists and can bite you, but the chances are relatively low.

But what do you think of the rest of my recommendation for a server upgrade? You know much more about that than I do.
 
The reason no company will help is that touching an old server like this is asking for trouble. If anything else breaks while you're working on it, good luck finding the right part in a reasonable amount of time, which, for a business, is usually a day or less.


If you're going to go ahead with this, you're missing a very important step. You will need a SATA controller and you will have to install the correct storage drivers before you switch your storage controller, or Windows won't boot.

FishAK may be confusing RAID 0 and RAID 1. Losing a disk in RAID 0 is guaranteed data loss. Two drives configured in RAID 1 should be just fine for this setup.
 
FishAK may be confusing RAID 0 and RAID 1.

No, I meant to say RAID 0.

Two drives configured in RAID 1 should be just fine for this setup.

Personally, I think RAID adds complication to the setup that isn't needed. RAID 1 redundancy is no excuse for not also having proper backups.

you're missing a very important step.

You bring up an important point. However, it looks to me like the OP's server already supports SATA, so the drivers may already be there. In any case, my recommendation was non-destructive. If the s@!t hits the fan when he tries to boot to the SSD, he can simply boot back to the original setup.
 
Personally, I'd take the 'if it ain't broke, don't fix it' approach.

I'd want to keep as much as possible in the server the same: just swap the drives out/add new ones. You need 80-pin SCA drives, which are still available, albeit at enterprise-level pricing. However, you may be able to get a deal, as these are coming up to end-of-line. That said, direct-equivalent 146 GB drives should be available for around $150 each, and 300 GB 15k RPM drives for around twice that.

With your RAID card you can upgrade the capacity 'online' by simply adding new drive(s), reconfiguring the RAID, and telling the card to reconstruct. Once the new drives are in play, they will appear as unpartitioned space, which you can simply format as drive E:.

Alternatively, you can swap the drives out for larger ones, one at a time, letting the RAID recover onto the new drives. (You will need to manually take one drive at a time 'offline' in the BIOS, swap it out, then manually start a 'rebuild' process in the BIOS; once the RAID has fully rebuilt, you can take the next drive offline and replace it.) Once all the replacements are in position, you can create a new RAID volume on the extra space.

I'd be very careful with installing new equipment like SSDs in the server, because there may be weird compatibility issues with such old hardware, or other OS-related issues.

RAID 5 is OK for 3-5 drive setups. Once you start going beyond 5, I'd think about an alternative RAID method. Also, you don't need to worry about the RAID5 'write hole': with a properly configured RAID card and a modern file system like NTFS, there are substantial mitigations in place to protect against data corruption.
 
Certain PERC4s allowed adding disks and others did not. I was able to add a disk (extend a LUN) on one of my Dell 2850s (PERC4) and not on another (also PERC4). It has been so long that I don't recall which hardware rev allowed it. I have had some success replacing all the disks in a RAID and then creating a new LUN out of the added space. This will not extend the D: drive, but it will give you an E: drive to work with.

Another trick, though use this one with extreme caution (i.e., take a full system backup first): buy a larger set of disks, say 3 x 146GB. Replace each disk one at a time to end up with a RAID 5 that is 3 x 146GB. Then enter the PERC utility and destroy the array. Then rebuild it as RAID 5 using those 3 disks, without initializing. Typically the machine will boot fine, the extra space will appear at the end of the disk, and the D: drive can be extended with a simple diskpart extend.
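
For reference, that final diskpart step is tiny. Here's a sketch of scripting it (the volume selection by drive letter is an assumption; run 'list volume' first and confirm on your own box):

```python
# Sketch: drive the final "extend D:" step through diskpart's script mode.
# Server 2003's diskpart can extend a basic data volume into adjacent
# unallocated space. Confirm the right volume with "list volume" first.

import os
import subprocess
import tempfile

commands = "\n".join([
    "select volume D",  # select the data volume by drive letter (assumed)
    "extend",           # grow it into the adjacent unallocated space
])

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(commands)
    script_path = f.name

try:
    # diskpart reads its commands from the /s script file
    subprocess.run(["diskpart", "/s", script_path], check=True)
finally:
    os.remove(script_path)
```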

EDIT:

Never use RAID 0. That is straight up silly.
 
2 drive RAID 0 is no less safe than 5 drive RAID 5.

I know this has already been mentioned, but that is an absolutely wrong statement. If you were one of my associates and you put a RAID0 array into a production box, I would fire you. Especially if those disks were consumer-level SSDs.

The solution the OP has proposed (adding 2 disks to a RAID5 array) will be fine. Increasing the disk capacity of each of the disks, as Mark R suggested, will also work. However, I am not a fan of the 'crash 3 times to get a new car' approach of failing disks out of the array one at a time and replacing them with higher-capacity ones. It will work as long as neither of the other 2 drives fails, but that is a lot of risk for a production system. I might reconsider if uptime were of utmost importance.
 
FishAk said:
Yes you are. It's called progress.

You could replace all the drives with two Intel G2 160GB drives off eBay and run them in RAID 0 for about the same price as two of those 73GB Seagates you just bought. That setup would run circles around the system you have now, even with no TRIM support.

2 drive RAID 0 is no less safe than 5 drive RAID 5. In either case, you still need backup.

Alternatively, you could buy 4 modern 1TB RAID class drives, and put them in RAID 10. That would also be much faster and safer than your proposed setup, and not that much more expensive.

Another possibility is to buy a RAID card that won't kick a drive off the array if it tries to recover an error. That way, you could buy 4 1TB Samsung F3 drives for about $50 each, and run them in RAID 10.

It should be simple to clone from your present setup to new hardware, but why spend more money on crap when the same money can buy faster and safer?

Very nice writeup and excellent points.

As far as "RAID5 isn't as bad as RAID0"... yeah, FishAk is exaggerating, but the gist of what he is saying is correct. The chances of rebuilding a RAID5 array which has 1+TB drives are near zero. And the write hole means losing data even without drive failure (which is something at least RAID0 doesn't do). I would call it a wash as to which one is crappier and wouldn't use either. Well, actually, that is not true: I think RAID5 is worse... At least RAID0 serves a purpose (speed) and can be used if you meticulously back it up.

There are schemes which are SIMILAR to RAID5 but fix its problems, for example RAIDz. Those are far more reliable. I personally use RAIDz2 (RAID6-like; no write hole; doesn't kick drives off for inactivity, so no need for TLER; doesn't abort reconstruction on read errors, but instead fixes what it can and leaves a single corrupt file when it cannot; and uses ZFS for the FS, so it includes all the advantages of that too).

If you were one of my associates and you put a RAID0 array into a production box, I would fire you.
He did say RAID0 with backup; that is a commonly used setup in businesses that need higher performance.
 
Thanks for all the input...I should have mentioned that the server runs Win 2003 Server Edition. So, the plan is now as follows:

- Insert 2 hot-swappable drives (for a total of 5), then try to reconfigure the RAID array 'online' using the Dell OpenManage utility.

What is the 'reconstruct' command? I see the option to do a 'reconfigure'. Is 'reconstruct' the same or is it something I have to do AFTER I reconfigure?

Thanks,

Bjman
 
bjmanbj said:
Thanks for all the input...I should have mentioned that the server runs Win 2003 Server Edition. So, the plan is now as follows:

- Insert 2 hot-swappable drives (for a total of 5), then try to reconfigure the RAID array 'online' using the Dell OpenManage utility.

What is the 'reconstruct' command? I see the option to do a 'reconfigure'. Is 'reconstruct' the same or is it something I have to do AFTER I reconfigure?

Thanks,

Bjman

I have this nagging feeling that your intent to use a live migration is not due to downtime avoidance, but due to not having proper backups of all the data. Please make sure you have multiple backups before touching the hardware.

Nothinman said:
I'm not neglecting anything; people have been running RAID5 for years just fine. Sure, that problem exists and can bite you, but the chances are relatively low. So claiming that RAID5's reliability is equal to RAID0's is a blatant lie. If you're going with that level of hyperbole, you should also recommend that they ditch Windows and switch to FreeBSD or Solaris, because ZFS is a lot safer than NTFS.

1. "The chances are low"? seriously? it has a major bug that causes silent data corruption. To me that is worse then RAID0... when RAID0 explodes you immediately notice and restore from a backup.
2. Suggesting ZFS is hyberbole now? ZFS is better than NTFS, there is no question about it. Anyone who cares about their data integrity should be using it, at least until BTRFS is ready for deployment or google decides to release their google FS to the public (unlikely, its their biggest advantage over yahoo, bing, etc. and is their most closely gaurded trade secret). Speaking of, the OP should get away from windows server 2003 and go with Solaris or FreeBSD to use ZFS because it is a lot than NTFS.
 
I'd rather restore a corrupt file with no system downtime than restore a whole system which is down as soon as the drive dies.
 
I'd rather restore a corrupt file with no system downtime than restore a whole system which is down as soon as the drive dies.

I think everyone would, but what is your point?

Corrupt files is not the same as corrupt parity.

RAID 0 can only have corrupt files.

RAID 5 can have corrupt files and/or parity.

Corrupt parity would cause down time as soon as one drive dies.
 
Very nice writeup and excellent points.

As far as "RAID5 isn't as bad as RAID0"... yeah, FishAk is exaggerating, but the gist of what he is saying is correct. The chances of rebuilding a RAID5 array which has 1+TB drives are near zero. And the write hole means losing data even without drive failure (which is something at least RAID0 doesn't do).

This isn't at all true. I have successfully built several multi-TB arrays, the most recent being an 8TB secondary backup LUN. The risk of write holes and the like is basically zero (on good hardware) with a properly battery-backed cache module. If the write fails to complete, the cache is flushed to disk during the next boot-up.

You can have write holes on RAID 0 and RAID 1 too, especially if you don't have a battery module during an incomplete shutdown. In RAID 0, a sector can be sitting in cache waiting on the erase cycle, for example. RAID 1 can have the same thing, depending on whether the controller keeps the disks in lock step (which many don't); you can then have issues where the disks no longer agree, and how does the controller know which one is correct? (It doesn't always.) During the RAID 1 erase cycle the write is queued, so if one disk finishes a millisecond earlier than the other, the controller may have started writing to disk 0 while disk 1 is still erasing a sector. A power failure at that moment would result in a failed CRC on one disk and a blank sector on the other. This doesn't matter with a battery-backed cache module, since the sector would be pushed again during the next power-up. At least RAID 1 would tell you something goofy happened; RAID 0 has no way to know.

RAID is to prevent downtime. It is not a backup or an absolute-integrity system. Also, only a cheap, crappy controller will barf up an entire array because of a single sector read error during a rebuild. Most log an alert and keep on rebuilding the rest of the disk.
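
For anyone who hasn't seen the write hole spelled out, here's a toy illustration of the failure mode being argued about: a data block gets written, the matching parity update is lost to a power cut, and a later rebuild silently produces garbage. A battery-backed cache exists precisely to close that gap.

```python
# Toy write-hole demo on a 3-"drive" RAID 5 stripe: two data blocks plus
# an XOR parity block. Power fails between the data write and the parity
# write; a later rebuild of a failed drive then produces silent garbage.

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

d0, d1 = b"AAAA", b"BBBB"
parity = xor(d0, d1)          # consistent stripe: parity == d0 ^ d1

d0 = b"CCCC"                  # new data lands on drive 0...
# ...and the power dies HERE, before the parity block is updated.

rebuilt_d1 = xor(d0, parity)  # drive 1 later fails; rebuild from d0 ^ parity
print(rebuilt_d1)             # b'@@@@' - not the b'BBBB' that was there
```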
 
This isn't at all true. I have successfully built several multi-TB arrays, the most recent being an 8TB secondary backup LUN. The risk of write holes and the like is basically zero (on good hardware) with a properly battery-backed cache module. If the write fails to complete, the cache is flushed to disk during the next boot-up.

RAID is to prevent downtime. It is not a backup or an absolute-integrity system. Also, only a cheap, crappy controller will barf up an entire array because of a single sector read error during a rebuild. Most log an alert and keep on rebuilding the rest of the disk.

Even without a battery-backed cache, the write hole is of limited relevance. Essentially all RAID controllers support write-cache flushing (if they have an unbacked write cache). Advanced file systems like NTFS, ext4, ZFS, etc. use this to their advantage by relying on write ordering to preserve the integrity of key structures. What this means is that if data is lost due to a "write hole" problem, it is unlikely to involve user-visible data, because the file system confirms that the parity is intact (by waiting for the RAID card to confirm the write as complete) before treating the data as safe.

If you're running any kind of RAID system, you should be running routine 'scrubs' of your array to make sure that the parity is valid and not corrupted. Most decent RAID controllers will do this automatically on a schedule.
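
To make 'scrub' concrete: for RAID 5 it boils down to re-reading each stripe, recomputing the XOR of the data blocks, and comparing against the stored parity. A toy sketch of the check:

```python
# A RAID 5 scrub, reduced to its essence: recompute the XOR of a stripe's
# data blocks and compare it with the stored parity block.

from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))

def stripe_is_clean(data_blocks, stored_parity):
    return xor_blocks(data_blocks) == stored_parity

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]
parity = xor_blocks(data)                  # b'\x15\x2a'

print(stripe_is_clean(data, parity))       # True  - parity checks out
print(stripe_is_clean(data, b"\x00\x00"))  # False - corruption detected
```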
 
If you were one of my associates and you put a RAID0 array into a production box, I would fire you.

When you fire competence, it becomes your competition.


It's quite silly to use RAID 5.


Because RAID 5 relies on parity calculations to rebuild its data, there is a risk that a single failed drive will bring down the entire volume. But beyond that, the only reason to use RAID 5 is for up-time. Unfortunately, array degradation after a drive failure, and during the very long rebuild process (assuming the rebuild will be successful), significantly reduces performance of the array. The array is busy reading, writing, and calculating, so few resources are left for the "up-time". Using the degraded array for the desired up-time only extends the rebuild time, increasing the window of opportunity for a second drive failure, and total loss, before completion.


For only slightly more cost in disk space, there are several other RAID configurations that suffer little of that degradation while rebuilding, or only a comparably short period of it. If it's up-time you're shooting for, RAID 5 should be the last choice.


Even RAID 0 can have better up-time than RAID 5. Sure, with a single disk failure the entire array is lost. But with another drive holding an image of the RAID 0 array attached and ready to go, a short reboot to that volume will have the system right back up. I'm not saying this is ideal or the most economical, but it's better than RAID 5 in terms of performance and up-time, both before and after a single drive failure.

Anyone confusing the durability of a RAID array with proper backup will certainly become a victim of data loss in due time.
 
FishAk said:
But what do you think of the rest of my recommendation for a server upgrade? You know much more about that than I do.

I was only able to skim it, but I wouldn't trust consumer-level SSDs in a server right now. SSDs just haven't been around long enough to be proven either way.

taltamir said:
1. "The chances are low"? seriously? it has a major bug that causes silent data corruption. To me that is worse then RAID0... when RAID0 explodes you immediately notice and restore from a backup.

Yes, low, as in near zero. I see what you're saying, but reality disagrees since thousands or millions of servers have been running just fine with RAID5 for the past few decades.

taltamir said:
2. Suggesting ZFS is hyperbole now?

In this context, yes. NTFS has issues, but in general it works just fine.

taltamir said:
Speaking of which, the OP should get away from Windows Server 2003 and go with Solaris or FreeBSD to use ZFS, because it is a lot safer than NTFS.

That's exactly the hyperbole I was talking about. No offense to the OP, but if he's asking this question at all he's probably not going to have an easy time setting up FreeBSD or Solaris. Not that I would wish either of those OSes on anyone...

FishAk said:
I think everyone would, but what is your point?

Corrupt files is not the same as corrupt parity.

RAID 0 can only have corrupt files.

RAID 5 can have corrupt files and/or parity.

Corrupt parity would cause down time as soon as one drive dies.

The point is that if one of the drives in your RAID0 array has issues, the entire array dies and you have to restore from backup. There is no chance of recovery; hell, I don't even know if low-level data recovery for SSDs exists like it does for platter hard disks.

FishAk said:
When you fire competence, it becomes your competition.

If you're seriously recommending RAID0 for your client's servers, then I welcome that competition.

FishAk said:
Even RAID 0 can have better up-time than RAID 5. Sure, with a single disk failure the entire array is lost. But with another drive holding an image of the RAID 0 array attached and ready to go, a short reboot to that volume will have the system right back up. I'm not saying this is ideal or the most economical, but it's better than RAID 5 in terms of performance and up-time, both before and after a single drive failure.

In what world is a little downtime better than no downtime? RAID5 read performance should be better than a 2-drive RAID0's because you've got at least 3 drives striped in the array. Writes are usually slower, but that depends on the implementation.

FishAk said:
Anyone confusing the durability of a RAID array with proper backup will certainly become a victim of data loss in due time.

That's probably the smartest thing you've said in this whole thread.
 
2 drive RAID 0 is no less safe than 5 drive RAID 5.

This is an incredibly stupid statement.

You are neglecting the write hole and parity corruption when you assume a RAID 5 array will be able to be rebuilt after a disk failure.

In my 10+ years of managing storage systems with anywhere from 2 to 200+ disks, I have never once encountered a situation where a RAID 5 array failed to rebuild due to parity corruption. If you're using battery-backed or flash-backed write cache, I don't see how such a situation is even possible.

Very nice writeup and excellent points.

As far as "RAID5 isn't as bad as RAID0"... yeah, FishAk is exaggerating, but the gist of what he is saying is correct. The chances of rebuilding a RAID5 array which has 1+TB drives are near zero.

I've got multiple DAS shelves packed with twelve 1TB drives that have successfully rebuilt their RAID 5 arrays on multiple occasions (during production, no less).

That being said, as the number of drives in an array increases, I would prefer to use RAID 6 or one of its nested derivatives.

But beyond that, the only reason to use RAID 5 is for up-time. Unfortunately, array degradation after a drive failure, and during the very long rebuild process (assuming the rebuild will be successful), significantly reduces performance of the array. The array is busy reading, writing, and calculating, so few resources are left for the "up-time". Using the degraded array for the desired up-time only extends the rebuild time, increasing the window of opportunity for a second drive failure, and total loss, before completion.

Every RAID controller I have ever used has allowed you to configure whether it prioritizes production I/O or the RAID rebuild. You would have to go out of your way to configure the RAID controller to behave as you describe.

Even RAID 0 can have better up-time than RAID 5.

With RAID 5, in the event of a disk failure, you might (for sufficiently infinitesimal values of might) lose the array due to parity corruption.

With RAID 0, in the event of a disk failure, you WILL lose the array. Period.

There is no way that RAID 0 will ever have higher uptime than RAID 5.

I have never used a server before, so it's possible I don't understand what I'm talking about.

Obviously not. Stop posting.
 
Originally Posted by taltamir
I think everyone would, but what is your point?

Corrupt files is not the same as corrupt parity.

RAID 0 can only have corrupt files.

RAID 5 can have corrupt files and/or parity.

Corrupt parity would cause down time as soon as one drive dies.
You mixed up the quote function there, Nothinman; FishAk said that, not me.
 