Do you have any resources or links where I can learn more about raid5 and parity miscalculations? I tried to google it, but the only relevant result seemed to be your post.
His description is very vague and unhelpful, so it's not surprising you've not had much luck.
There are several problems with RAID 5 (some of which good quality hardware and/or software can mitigate):
1. You only have 1 drive's worth of redundancy. This means that RAID 5 has the lowest level of protection of any redundant RAID level. RAID 0+1 (a mirror of stripes) and RAID 1+0 (a stripe of mirrors) have slightly more redundancy - they can always recover from 1 lost drive, and can sometimes recover from 2. The exact odds depend on the layout and the drive count: in a RAID 1+0, a second failure is only fatal if it hits the mirror partner of the first failed drive, so the odds actually improve as mirror pairs are added. Higher redundancy requires RAID 6 - which always permits recovery from 2 lost drives.
RAID 5: 0% chance of surviving a 2-drive failure.
RAID 0+1 (4 drives): ~33% chance of surviving a 2-drive failure - the second failure must land on the already-broken stripe.
RAID 1+0 (4 drives): ~66% chance of surviving a 2-drive failure - the second failure must miss the first drive's mirror partner.
RAID 6: 100% chance of surviving a 2-drive failure.
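Those odds are easy to sanity-check by enumerating every possible pair of failed drives. Here's a toy Python sketch for hypothetical 4-drive layouts (mirror pairs 0+1 and 2+3 for RAID 1+0, stripe sides {0,1} and {2,3} for RAID 0+1):

```python
# Enumerate all 2-drive failures and count how many the array survives.
from itertools import combinations

def two_drive_survival(n_drives, survives):
    """Fraction of distinct 2-drive failures the array survives."""
    pairs = list(combinations(range(n_drives), 2))
    return sum(survives(a, b) for a, b in pairs) / len(pairs)

# RAID 5: any second failure is fatal.
raid5  = two_drive_survival(4, lambda a, b: False)
# RAID 1+0: fatal only if both halves of one mirror pair fail.
raid10 = two_drive_survival(4, lambda a, b: a // 2 != b // 2)
# RAID 0+1: survives only if both failures hit the same (broken) stripe side.
raid01 = two_drive_survival(4, lambda a, b: a // 2 == b // 2)
# RAID 6: always survives two failures.
raid6  = two_drive_survival(4, lambda a, b: True)

print(raid5, raid01, raid10, raid6)  # 0.0 0.333... 0.666... 1.0
```

Re-run it with more drives and you can watch the 1+0 odds climb and the 0+1 odds creep towards 50%.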
2. RAID 5 (and also RAID 6) has relatively slow writes. Suppose you have a 5-drive RAID 5 and you write 1 sector. To calculate the new parity, the RAID controller must first read the equivalent sector from the 3 other data drives (or read the old data plus the old parity), recalculate the parity, and then save the new parity to the 5th drive. This is slow. Good quality hardware/software can cache the stripes, or cache writes, so that the system doesn't lag. Motherboard RAID is piss-poor at this, and Intel or JMicron integrated RAID 5 is almost unusably slow.
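For the curious, here's a toy sketch of both parity update strategies, using 4-byte "sectors" (real stripes are obviously much bigger). The read-modify-write trick is that XORing out the old data and XORing in the new data gives the same parity as recomputing the whole stripe:

```python
def parity(sectors):
    """Full recompute: XOR all data sectors in the stripe together."""
    out = bytearray(len(sectors[0]))
    for s in sectors:
        for i, b in enumerate(s):
            out[i] ^= b
    return bytes(out)

def rmw_parity(old_parity, old_data, new_data):
    """Read-modify-write: only the old data and old parity need reading."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # 4 data drives
p = parity(stripe)

stripe[1] = b"XXXX"                            # overwrite 1 "sector"
assert rmw_parity(p, b"BBBB", b"XXXX") == parity(stripe)
```

Either way, a 1-sector write turns into multiple reads plus two writes - which is exactly why uncached RAID 5 writes crawl.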
3. Because of the delay in saving the parity data (point 2 above), every time data gets saved there is a window when the parity on the drives is out-of-date and the new parity is still being calculated. If the system crashes or the power goes out during that window, the correct parity never gets written - and the fact that it never got written is never recorded or noticed. This is sometimes called the RAID 5/6 'write hole'. Sometime later, a drive may fail, at which point the stale parity gets used for recovery - corrupting the data.
Good quality RAID systems automatically 'scrub' the array - they read all the data and all the parity, and check that everything matches. If the parity doesn't match, it is recalculated. This way, after an unclean shutdown, a routine weekly scrub recalculates any stale parity, drastically reducing the risk of data corruption. High-end hardware RAID cards do this as standard. Linux software RAID can do it (but has to be configured to). Windows software RAID doesn't. Motherboard RAID - take a guess.
There are also techniques to work around the 'write hole'. Modern Linux (kernel 2.6.33 and up) correctly handles 'write barriers' in RAID - this means that when the OS writes critical system data, it waits for the parity to be calculated and written before updating the file system journal to reflect the fact. In that case, no critical data can be corrupted by the 'write hole'. The worst that can happen is that the 'update complete' entry is lost from the journal, which still leaves the file system stable.
4. RAID 5, and especially RAID 6, require a lot of mathematics for the parity calculations. A flaky CPU or hardware RAID processor can silently produce massive parity corruption, which may be undetectable without scrubbing - leaving the array at risk of corruption in the event of a drive failure. RAID 1/0+1/1+0 are much simpler, as no parity calculations are required.
People do talk about the CPU usage of software RAID 5 being an issue. It isn't. Modern CPUs can do the calculations far faster than the data can be pumped to the drives. E.g. a Core i7 uses only a fraction of a single core when writing to a 12-drive RAID 6 at 1 GB/sec.
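To give a feel for what that maths actually is: RAID 6's second parity block (Q) is computed over the Galois field GF(2^8). Here's a pure-Python sketch (real implementations use lookup tables and SIMD, which is why the CPU cost is so low) - it also shows why a scrub catches a corrupted byte, since both syndromes change:

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) with the polynomial 0x11d, as used by RAID 6."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return r

def pq_parity(data):
    """P = plain XOR; Q = sum of g^i * D_i over GF(2^8), generator g = 2."""
    p, q = bytearray(len(data[0])), bytearray(len(data[0]))
    g = 1
    for d in data:
        for j, byte in enumerate(d):
            p[j] ^= byte
            q[j] ^= gf_mul(g, byte)
        g = gf_mul(g, 2)
    return bytes(p), bytes(q)

data = [b"AAAA", b"BBBB", b"CCCC"]
p, q = pq_parity(data)

# A scrub recomputes P and Q and compares them to what's on disk. A single
# flipped byte (e.g. from a flaky CPU during the original write) shows up:
data[1] = b"B\x00BB"
p2, q2 = pq_parity(data)
assert p2 != p and q2 != q
```

Having two independent syndromes is what lets RAID 6 reconstruct any two lost drives - at the cost of doing this per byte written.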
5. RAID 5 is slow when running with a missing drive. If a program requests 1 sector from the missing drive, reconstructing it requires reading the matching sector from every other drive - so in a 5-drive RAID 5 with a missing drive, that 1 read request fans out to all 4 remaining drives. Under normal circumstances a read request goes to one of the 5 drives, so on average each request causes 0.2 read requests per drive. With a missing drive, this goes up to 0.4 read requests per surviving drive (4/5 of requests hit one drive directly; the remaining 1/5 hit all four). In effect, the array gets thrashed twice as hard when a drive is missing - and that's before you need to start hitting the drives to rebuild the array. This isn't much of an issue on a home box, but on a multi-user or database server it could be crippling.
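That arithmetic is easy to check with a few lines of Python (a toy model that assumes reads spread evenly across the array):

```python
def per_drive_load(n_drives, degraded=False):
    """Expected reads per (surviving) drive for one random sector read."""
    if not degraded:
        return 1 / n_drives                # one read lands on one drive
    survivors = n_drives - 1
    # prob (n-1)/n: target sector survives -> 1 read on that drive
    # prob 1/n: target is on the dead drive -> 1 read on EVERY survivor
    total_reads = (survivors / n_drives) * 1 + (1 / n_drives) * survivors
    return total_reads / survivors

print(per_drive_load(5))        # 0.2
print(per_drive_load(5, True))  # 0.4 - exactly twice the thrashing
```

Interestingly the factor works out to exactly 2x for any array size in this model - the fan-out reads exactly double the work on the survivors.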
That said, degraded-mode thrashing is still an issue with RAID 1+0 - whether the increase is smaller or larger than RAID 5's depends on the controller's algorithms. I'm not aware of a detailed analysis of how different products behave.
A RAID 6 with 2 missing drives is even slower - but that's the price you pay for parity.