Building ZFS NAS for Home Use: Mini-ITX, 6 SATA, low-power?

taltamir · Dec 10, 2010

RaiderJ said:
Hmm.. didn't realize de-dup wasn't ready for prime-time, or needed quite that much RAM. Kind of rules it out for a mini-ITX, 4 drive setup, even with 8GB of RAM. It wasn't a critical feature for me, since most of my stored data is movies, where de-dup wouldn't be helpful I'll wager.

deduplication means that if you store 2 copies of the same exact file, it will only store it once but reference it twice... aka, you cut down space by half...
it doesn't matter, at all, if you store movies, sound, or whatever. It is a question of duplication, do you or do you not store the same file multiple times. If you don't then dedup doesn't help, at all.

Well, except ZFS dedup is on the sector level not file level...but the same thing applies.

sub.mesa · Dec 10, 2010

RaiderJ: have you considered a Mini-ITX mobo with 4 SATA but with using a PCI-express HBA card to add 8 more drives? Would give you the ability to use AMD platform (cheap access to ECC), use cheap DDR3 ECC memory (DDR2 ECC much more expensive) and have more disks to expand.

There are LSI 1068E controllers with low-profile bracket, if your casing requires that. Also helps with cable clutter; one mini-SAS cable serves 4 SATA disks.

sub.mesa · Dec 10, 2010

ZFS de-dupe works both at block level (ZVOLs) and file level. On the file level, it can only de-dupe files that are exactly the same; so if a file differs by one byte you don't gain anything by file de-duplication.

However, block-level dedupe works on blocks (i'm guessing up to 128KiB in size); so if you store data on your ZVOL (for example using iSCSI) that shares data with other ZVOLs then you can gain more de-dupe, but works slower and consumes more memory than file level dedupe.

This feature is interesting for specific setups only. If you're interested in performance and stability you may not want de-dupe. Compression may work better, it's alot more stable, simpler and often more effective. It would be best to try to keep data that is good compressible separate from data that is hardly compressible, such as precompressed archives, movies, music, and so on. But alot of other stuff can yield decent benefits from compression. If you have 10TB and compression on your volume saves you 5% space total (some stuff 20% other stuff 0%) then that would be 500GB space saved already. So it could be worth it.

Compression can also be faster on stuff that is very well compressible, such as text files and other such data.

But if you care mostly about performance, the best you can do is save your money for more disks and more RAM; ZFS loves those two in combination. You scale performance basically by adding RAM and disks. Then in the future you add a third-generation SSD equipped with supercapacitor, and use it to power your pool as L2ARC and SLOG device, ready to fly!

To use SSD as SLOG device ("ZIL" or log disk) you would want two things:
- at least pool version 19, since that gives you ability to recover your pool if you lost your SSD entirely or partly (i.e. corrupted); before this version your pool is DEAD if you lose your ZIL device!
- a supercapacitor-equipped SSD (Intel G3 or Sandforce SF2000) to prevent corruption on power loss

So SLOG can be dangerous, but L2ARC feature is safe to use even with corrupting second-generation SSDs. Checksums will protect your data on ZFS! But the SLOG feature requires a really good and safe log device.

grimpr · Dec 10, 2010

Good topic on budget ZFS boxes.

@sub.mesa

Thanks for the tips, much appreciated.

Do we have any data on the reliability of ZFS on FreeBSD 8.x? is the performance good enough compared to Opensolaris?

sub.mesa · Dec 10, 2010

It appears to me that OpenSolaris has somewhat higher performance, particularly in low-memory situations. FreeBSD opted to disable read-ahead prefetching on systems with 4GiB of RAM or lower; which many people tend to run. This hurts their sequential read scores quite a bit. Though i have not seen any really good comparisons. Most are against FreeNAS or untuned FreeBSD 8.x; which does not yield optimal results.

The 4K testing thread on HardOCP forums has alot of benchmarks posted, created using my ZFSguru project which runs FreeBSD 8.1 at the moment. Those show very decent performance after memory tuning; though the scaling is not linear. It would be interesting to do such benchmarking under OpenSolaris too, or any other Solaris derivative. But i'm only familiar enough with FreeBSD to do 'qualified' benchmarks on.

I would not focus your choice of OS solely on subtle differences in performance, though. Other aspects, such as hardware compatibility, features, longevity, support and personal preference may be much more important. FreeBSD has an edge over *Solaris with regard to boot support, as it can boot from RAID-Z/1/2/3, even while degraded, while Solaris can only boot from single disk or mirror; meaning many people buy two separate HDDs just for the system disk; ugh! Money wasted, i think. But, this is just one consideration, and you should make up your own mind on what to run. I do recommend to try out different stuff; especially if it can be run inside Virtualbox or something similar so you can try it out easily without messing with hardware.

As for ZFS' future in FreeBSD, one thing is clear: ZFS is actively being worked on by the FreeBSD people, PJD in particular. ZFS v28 second patchset is coming 'soon' and i'm ready to start building experimental releases based on it. Would love to see how performance scales with all the performance enhancing patches that went into the newer versions. Newer ZFS versions don't only add the documented features; alot of under the hood changes as well! Uncertain what will happen after ZFS v28; there is no CDDL-released code after that version. But even if it stayed at that version, with other fixes like performance/stability, it still is the best filesystem out there usable for your NAS.

Oh you asked about reliability too, i forgot. ;-)
Well, avoid early FreeBSD 'experimental'-labeled ZFS releases, commonly used in older FreeNAS releases. ZFS v6 on FreeBSD was hardly stable at all. ZFS v13 in FreeBSD 7.3 and 8.0 was the first official 'stable' ZFS release in FreeBSD, but v14 added stability fixes and v15 as well. So v15 is probably the best ZFS version to run on FreeBSD platform right now. 8.2 should be V15; though i've not had confirmation yet, its still in BETA1. I'm using it already on 8.1 using a stable patchset. In FreeBSD source commits flow from -CURRENT (9.0) to -STABLE (8.x) to -RELEASE (8.2), so some patches are very well tested and stable, while others can be very experimental and dangerous, such as the ashift patch. Always use caution when rigging your own solution! And test thoroughly before committing real data to it.

Emulex · Dec 10, 2010

how does freebsd deal with:
1. AV drives = TLER=0s (no error correction, just reporting)
2. TLER=7s (wd RE3/RE4)
3. TLER > 8s (consumer drives - up to 90 seconds timeout plus reporting)
#4 - mix of all of the above.

I want to find a o/s that has advanced SATA drivers that can do its own bad sector remapping. even if that means i reserve 13% of the space for it.

any thoughts?

RaiderJ · Dec 11, 2010

taltamir said:
deduplication means that if you store 2 copies of the same exact file, it will only store it once but reference it twice... aka, you cut down space by half...
it doesn't matter, at all, if you store movies, sound, or whatever. It is a question of duplication, do you or do you not store the same file multiple times. If you don't then dedup doesn't help, at all.

Well, except ZFS dedup is on the sector level not file level...but the same thing applies.

I was thinking de-dup would work like compression - and also on-the-fly. Either way, neither one would really be efficient for my file server. The largest chunk of data is 720p/1080p movies, which wouldn't compress or de-dup.

RaiderJ · Dec 11, 2010

sub.mesa said:
RaiderJ: have you considered a Mini-ITX mobo with 4 SATA but with using a PCI-express HBA card to add 8 more drives? Would give you the ability to use AMD platform (cheap access to ECC), use cheap DDR3 ECC memory (DDR2 ECC much more expensive) and have more disks to expand.

There are LSI 1068E controllers with low-profile bracket, if your casing requires that. Also helps with cable clutter; one mini-SAS cable serves 4 SATA disks.

I'm not opposed to an add-on card, although it would have to be low profile for my case. I'm concerned on whether it would be practical, however, since my case only supports 4 data drives. If I were to have a larger case, I'd definitely look to storing more drives.

Would using an add-on card add any issues as far as upgradability? Are they essentially just adding SATA ports via PCI-e?

Are they expensive? Going with an AMD board and ECC memory might be nice, but I'd hate to have to spend more money just to get an extra couple of ports.

RaiderJ · Dec 11, 2010

sub.mesa said:
ZFS de-dupe works both at block level (ZVOLs) and file level. On the file level, it can only de-dupe files that are exactly the same; so if a file differs by one byte you don't gain anything by file de-duplication.

However, block-level dedupe works on blocks (i'm guessing up to 128KiB in size); so if you store data on your ZVOL (for example using iSCSI) that shares data with other ZVOLs then you can gain more de-dupe, but works slower and consumes more memory than file level dedupe.

This feature is interesting for specific setups only. If you're interested in performance and stability you may not want de-dupe. Compression may work better, it's alot more stable, simpler and often more effective. It would be best to try to keep data that is good compressible separate from data that is hardly compressible, such as precompressed archives, movies, music, and so on. But alot of other stuff can yield decent benefits from compression. If you have 10TB and compression on your volume saves you 5% space total (some stuff 20% other stuff 0%) then that would be 500GB space saved already. So it could be worth it.

Compression can also be faster on stuff that is very well compressible, such as text files and other such data.

But if you care mostly about performance, the best you can do is save your money for more disks and more RAM; ZFS loves those two in combination. You scale performance basically by adding RAM and disks. Then in the future you add a third-generation SSD equipped with supercapacitor, and use it to power your pool as L2ARC and SLOG device, ready to fly!

To use SSD as SLOG device ("ZIL" or log disk) you would want two things:
- at least pool version 19, since that gives you ability to recover your pool if you lost your SSD entirely or partly (i.e. corrupted); before this version your pool is DEAD if you lose your ZIL device!
- a supercapacitor-equipped SSD (Intel G3 or Sandforce SF2000) to prevent corruption on power loss

So SLOG can be dangerous, but L2ARC feature is safe to use even with corrupting second-generation SSDs. Checksums will protect your data on ZFS! But the SLOG feature requires a really good and safe log device.

That's one area I haven't quite decided how to go - setting up my OS drives. Currently I have a single 160GB, which I might re-use for ZFS at least to start.

Couple questions:

Would a RAID 1 setup be a minimum for ZFS? What's the risk of a single drive with the OS failing with respect to data loss?

For a file server, does having much, if any, L2ARC provide any benefit? I imagine most of my use will be sequential reads and writes.

Emulex · Dec 11, 2010

NTFS does compression per file. plus you disable compression after compressing the objects without decompressing the objects.

ZFS got that? it is sweet for pc's with more power than io. 5400rpm laptop drive gets a hella improve when compacting the right dirs and filetypes.

sub.mesa · Dec 11, 2010

Emulex said:
how does freebsd deal with:
1. AV drives = TLER=0s (no error correction, just reporting)

No hickups at all but likely a failed array when using light redundancy (RAID-Z, mirroring) due to Bit-Error-Rate of current high-capacity disks being so bad, that you really should avoid such disks. Basically you should consider a RAID5 to be a RAID0, a RAID6 to be a RAID5, and RAID7 (triple parity) to be RAID6. You need one 'disk failure' to compensate for Bit-Error-Rate alone.

2. TLER=7s (wd RE3/RE4)

Same as above, but less worse. Still means a RAID-Z with TLER disks can be dangerous; since losing one disk and rebuilding causes TLER to potentially kill your data by essentially restricting the last method of data recovery that you have left.

3. TLER > 8s (consumer drives - up to 90 seconds timeout plus reporting)

Best behavior for consumers; hickups can be expected but best protection against BER and dataloss in general. If you lost your redundancy and are rebuilding your pool, any UBER (Uncorrectable BER) will get the maximum time of 90 seconds to recover. So imagine your RAID-Z degraded and you are rebuilding and one of the remaining disks has a weak sector. Now you are so happy your disks are not TLER! The disk would not give up after 7 seconds but recover your data. Unless it really can't, ofcourse. But at least you give the disk the opportunity to TRY.

#4 - mix of all of the above.

A mix of the above.

Keep in mind, that corruption on ZFS due to BER may not be that bad if you have a backup or the data is not THAT important. Since you would know exactly which file(s) are affected so no more silent corruption! Also ZFS metadata is replicated, so you should never get a corrupt pool or filesystem damage from a bad sector, unlike other filesystems which are quite susceptible to damaged metadata.

I want to find a o/s that has advanced SATA drivers that can do its own bad sector remapping. even if that means i reserve 13% of the space for it.

Its own bad sector remapping? That sounds archaic. Older filesystems still do this; FAT and NTFS and UFS. But its obsolete since disks manage their own bad sectors now. A bad sector develops as follows:

BER -> read time -> UBER -> Current Pending Sector

Write to the affected sector, and the drive will swap it with an internal reserve sector, the Current Pending Sector in SMART data would disappear and you would see Reallocated Sector Count increased by one. That last means fixed bad sectors that your OS can no longer see. Current Pending Sector is the dangerous one here; these are active bad sectors your OS cannot read and can lead to major problems.

Microsoft tried to solve this issue somewhat in Drive Extender v2 (DE v2) called 'bit swap protection'. But with cancelling DEv2 there is no protection against BER on the Windows platform without resorting to third party solutions.

RaiderJ · Dec 31, 2010

Anyone have thoughts on a 6 SATA drive expansion card that would work well with Solaris/ZFS? Thinking about switching to this case: http://www.fractal-design.com/?view=product&category=2&prod=42

Would prefer to not have any extra hardware, but a cheap(?) expansion card that could be used to connect extra drives might make sense.

taltamir · Dec 31, 2010

No hickups at all but likely a failed array when using light redundancy (RAID-Z, mirroring) due to Bit-Error-Rate of current high-capacity disks being so bad, that you really should avoid such disks. Basically you should consider a RAID5 to be a RAID0, a RAID6 to be a RAID5, and RAID7 (triple parity) to be RAID6. You need one 'disk failure' to compensate for Bit-Error-Rate alone.

Don't you mean data-loss of a few specific files rather then the entire array failing?

Search

Building ZFS NAS for Home Use: Mini-ITX, 6 SATA, low-power?

taltamir

Lifer

sub.mesa

Senior member

sub.mesa

Senior member

grimpr

Golden Member

sub.mesa

Senior member

Emulex

Diamond Member

RaiderJ

Diamond Member

RaiderJ

Diamond Member

RaiderJ

Diamond Member

Emulex

Diamond Member

sub.mesa

Senior member

RaiderJ

Diamond Member

taltamir

Lifer

TRENDING THREADS