How much RAM do I REALLY need for a 72TB ZFS configuration?

ethebubbeth · Jan 16, 2014

ZFS gurus, My 16TB (of usable space) NAS is getting full so it's time to expand.

I currently have 2 RAIDZ pools each consisting of a 4x 3TB drive vdev in FreeNAS. The 8GB ram on my ITX e350 board is already insufficient for the 24TB worth of drives I'm running now.

I am moving up to an eventual goal of 2 zpools consisting of two RAIDZ2 6x 3TB drive vdevs each.

I'm familiar with the adage of 1GB ram per 1TB of drive in ZFS, but I know that isn't something set in stone. How linearly do RAM requirements scale with ZFS volume size? I'd be looking at 72TB worth of drives which would mean 72GB ram for storage + 1-2gb overhead for FreeNAS itself using the 1GB=1TB rule.

Going above 32gb means registered memory. I'd probably be looking at a 6300 series Opteron with a Supermicro motherboard at that point. Is that really necessary, or should I be able to get by with 32gb of memory with a cheaper E3-12xxv3 Xeon setup? If 32gb of RAM is still woefully inadequate, would 64gb (4x 16gb sticks) do, or would I need to shoot for 96gb?

Thanks in advance for any recommendations.

Batmeat · Jan 16, 2014

From what I remember scaling isn't directly linear but it's close. I would see how it does with 32 but expect to goto 64. I however, am not the smartest when it comes to a NAS setup.

blastingcap · Jan 16, 2014

1GB of RAM per TB is the rule of thumb, but multiply that by five if you are planning to use dedup.

http://www5.us.freebsd.org/doc/handbook/filesystems-zfs.html

In your case, assuming no dedup, I think you can get away with 32GB as long as you don't mind taking a performance hit.

Have you considered getting a cache drive? Like one or more SSDs? 960GB SSDs have really come down in price lately.

milee · Jan 17, 2014

By my calculations you'll have about 43.5TB of usable space, so 96GB just for storage purposes would be overkill.

If I were you I'd split this amount of storage into two systems on E3 v3 with 32GB ECC each.

Tristor · Jan 17, 2014

I've got a few questions to better help me answer your original question:

1) Why are you building multiple pools rather than expanding a single pool with multiple vdevs and making multiple filesystems within the pool (this is more the traditional ZFS topology)?

2) What features are you using?

3) Do you have SSDs for L2ARC/ZIL?

Point #1 is just a curiosity, since you're kind of violating tribal knowledge with your current build-out, but there are many reasons why you may want to do so.

#2 and #3 are what really determine your need for RAM. If you have SSDs for L2ARC and ZIL, then your RAM requirements drop somewhat, because the BULK of the RAM used by ZFS is for ARC, and having L2ARC doesn't replace the need for ARC, but significantly augments it. Using ZIL drives offsets some of the write performance that would be lost by having less RAM when using checksumming and compression (which you should) by moving the ZIL off of the pool itself.

The general rule of thumb, as you mentioned, is 1GB per 1TB, however I think it can be stretched a lot further, especially if this is not in a business/enterprise environment. In my home media server I'm currently following the 1GB per 1TB rule, however my planned replacement server (currently looking like it'll be needed in about 1.5 years from now) is going to be less than half that (180TB with 64GB of RAM). There's definitely a performance hit involved, but it's pretty minimal, especially if you have larger vdevs and use SSDs for caching.

In your case, I'd be more than comfortable with 32GB, although 64GB would be ideal. You don't need more than 64GB, and you can do 64GB without using ECC memory or server CPUs/Mobos on the AMD side (one of the reasons I use AMD boards as the basis for my media servers). Features like deduplication and encryption GREATLY increase the RAM requirements. If you're not using either of these things and you minimize the use of compression only to where it's strictly beneficial, you can get away with a lot less memory while still maintaining acceptable performance. Just don't try to clown car it.

ethebubbeth · Jan 17, 2014

Thank you all for the food for thought thus far.

I should probably have indicated that the main usage of the NAS is for media storage and consumption. It's also used for backups, projects, etc. I am definitely going to be using ECC memory for my build, since that is one of the final areas of vulnerability in my current setup.

My understanding was that the 1TB=1GB rule was for total raw drive capacity, not for zpool allocated space. Is that an incorrection assumption?

blastingcap said:
In your case, assuming no dedup, I think you can get away with 32GB as long as you don't mind taking a performance hit.

I will potentially be enabling compression but I have no current plans for dedupe since it does require such massive quantities of RAM. My usage model also would not benefit from the space savings of deduplication. Even compression is iffy since most of my usage is already-compressed media.

blastingcap said:
Have you considered getting a cache drive? Like one or more SSDs? 960GB SSDs have really come down in price lately.

I'd rather add more RAM to increase the ARC cache than add mirrored SSDs for L2ARC or ZIL at this time. I am definitely open to adding them down the line, however.

Tristor said:
I've got a few questions to better help me answer your original question:

1) Why are you building multiple pools rather than expanding a single pool with multiple vdevs and making multiple filesystems within the pool (this is more the traditional ZFS topology)?

2) What features are you using?

3) Do you have SSDs for L2ARC/ZIL?

1) Frankly it was something I hadn't considered, and you raise excellent points. I previously kept my 4 drive RAIDZ vdevs in separate pools because the chance of losing another drive in a vdev during a rebuild seemed plausible. I didn't want to lose 100% of my storage should that occur. Furthermore, I wanted to be sure that I could connect a pool to another system should my NAS hardware fail and I needed access to data.

In retrospect, the scenario is different in the new configuration. Most systems wont support 12 drives at a time without an HBA of some sort, so my portability case is moot vs 24 drives in a pool. If you feel that there is not significant risk in 4x RAIDZ2 vdevs of 6 drives each having a vdev fail and losing the pool all at once, I will probably go down that route. Striping across four vdevs instead of two would certainly increase performance.

2) I am not using a ton of ZFS/FreeNAS functionality at the moment. I currently have scheduled snapshots twice a day (retained for a week at a time). Scrubs every 35 days (FreeNAS default). I am not using compression or dedupe (compression is a potential for the new system). I have SFTP and CIFS servers running as well as Transmission as a torrent client. I am not using NFS because of poor performance in FreeNAS 8. I have not tried it again yet since moving to 9 but it is a possibility, especially since CIFS is only one-thread per user.

3) I do not currently have SSDs for L2ARC or ZIL. I might look into adding an SSD mirror for such purposes if I am unhappy with system performance. My understanding is that with ZFS v28, losing the ZIL no longer causes a loss of the pool. Is that correct?

Tristor · Jan 18, 2014

ethebubbeth said:
Thank you all for the food for thought thus far.

I should probably have indicated that the main usage of the NAS is for media storage and consumption. It's also used for backups, projects, etc. I am definitely going to be using ECC memory for my build, since that is one of the final areas of vulnerability in my current setup.

My understanding was that the 1TB=1GB rule was for total raw drive capacity, not for zpool allocated space. Is that an incorrection assumption?

I will potentially be enabling compression but I have no current plans for dedupe since it does require such massive quantities of RAM. My usage model also would not benefit from the space savings of deduplication. Even compression is iffy since most of my usage is already-compressed media.

I'd rather add more RAM to increase the ARC cache than add mirrored SSDs for L2ARC or ZIL at this time. I am definitely open to adding them down the line, however.

If you're doing parity RAID modes, checksumming is basically a requirement. Checksumming + Parity is what allows bad block recovery during scrubs. Also, you cannot use mirrored SSDs for L2ARC. L2ARC requires cache devices to be single device vdevs, however it can support multiple cache devices for L2ARC and caching is spread across them using a fairly efficient and intelligent algorithm. ZIL cache devices should be mirrored, but can be non-mirrored. If you lose your ZIL you're going to have a bad time.

1) Frankly it was something I hadn't considered, and you raise excellent points. I previously kept my 4 drive RAIDZ vdevs in separate pools because the chance of losing another drive in a vdev during a rebuild seemed plausible. I didn't want to lose 100% of my storage should that occur. Furthermore, I wanted to be sure that I could connect a pool to another system should my NAS hardware fail and I needed access to data.

In retrospect, the scenario is different in the new configuration. Most systems wont support 12 drives at a time without an HBA of some sort, so my portability case is moot vs 24 drives in a pool. If you feel that there is not significant risk in 4x RAIDZ2 vdevs of 6 drives each having a vdev fail and losing the pool all at once, I will probably go down that route. Striping across four vdevs instead of two would certainly increase performance.

ZFS doesn't allow restriping within its algorithms, so for this reason striping is not used across vdevs in a pool. Most of the complexity is at the vdev level, including redundancy. So in a pool with 2 raidz2 vdevs, each vdev is essentially an independent unit. The way pools expand is via intelligent writes. So you cannot gain strict performance by adding additional vdevs to a pool. The worst case scenario behaves like concatenation (e.g. 100% write bias to vdev1 till it reaches ~80% capacity, then 100% write bias to vdev2, etc.). Best case scenario behaves somewhat like layered RAID0 [RAID60/50/10] (e.g. 50% write bias to vdev1, 50% write bias to vdev2, with balanced capacity targets). You have no control over this layer of the algorithm, so it's difficult to predict exactly how it performs, but you can guarantee no worse performance than a single vdev by itself, with a possibility of better performance.

What this means is that ZFS's pooling algorithm encourages larger vdev spindle counts when targeting performance. So with a target of 16 drives in my media server I am using 8x3TB raidz2 vdevs with 2 vdevs in the pool. With a target of 45 disks in the new media server plan I will be using 9x4TB raidz3 vdevs with 5 vdevs in the pool. Since both of these configurations will more than meet my required write and read performance values.

Also, keep in mind spindle counts per vdev have strict targets for parity raid based vdevs. In mirrored configurations spindle counts are obviously inherently 2 per vdev with no real limitations to vdevs per pool.

From the ZFS Best Practices Guide you get the following disk count/redundancy recommendations.

Code:

RAIDZ Configuration Requirements and Recommendations
A RAIDZ configuration with N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised.
Start a single-parity RAIDZ (raidz) configuration at 3 disks (2+1)
Start a double-parity RAIDZ (raidz2) configuration at 6 disks (4+2)
Start a triple-parity RAIDZ (raidz3) configuration at 9 disks (6+3)
(N+P) with P = 1 (raidz), 2 (raidz2), or 3 (raidz3) and N equals 2, 4, or 6
The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups.

The other thing to understand is that resilver and scrub operations are performance limited/impacted by vdev spindle count/vdev size, not by pool size. When you have a pool with multiple vdevs and start a scrub operation on the pool, the scrub performances on the vdevs concurrently (AFAICT). When you require a resilver, this is done on an individual vdev only. Scrubs are per vdev however, so since random I/O speed/IOPS of the vdev + size of the vdev affects, smaller vdevs with faster individual devices are going to obviously scrub/resilver faster.

I personally don't like FreeNAS specifically because it tries to hide all of this away from you. The thing is that ZFS is so amazingly cool, but to really dig into the meat of it you need to use the command-line and you need to tune it yourself for your application. There's nice tuning profiles available in FreeBSD that you can use, and you can do custom tuning parameters as well for how it uses memory that can impact performance in several regards. There's also specific tuning changes that can be made to increase resilver/scrub performance as well. Getting into the underlying system means something like FreeNAS is extraneous, and actually in performance testing FreeNAS tends to use tuning profiles that are less performant than the FreeBSD defaults.

2) I am not using a ton of ZFS/FreeNAS functionality at the moment. I currently have scheduled snapshots twice a day (retained for a week at a time). Scrubs every 35 days (FreeNAS default). I am not using compression or dedupe (compression is a potential for the new system). I have SFTP and CIFS servers running as well as Transmission as a torrent client. I am not using NFS because of poor performance in FreeNAS 8. I have not tried it again yet since moving to 9 but it is a possibility, especially since CIFS is only one-thread per user.

Poor NFS performance is common, unfortunately. On FreeBSD all of the network connectivity to the filesystem is userland (CIFS/NFS/iSCSI) where in the native implementation on Solaris it's builtin (at least NFS/iSCSI is) to the filesystem. Unfortunately that's not the case in FreeBSD. I think FreeNAS NFS performance is worse than you might be able to get doing it manually though.

3) I do not currently have SSDs for L2ARC or ZIL. I might look into adding an SSD mirror for such purposes if I am unhappy with system performance. My understanding is that with ZFS v28, losing the ZIL no longer causes a loss of the pool. Is that correct?

I don't know if it causes a loss of the pool, but it DOES cause data loss, guaranteed if you lose the ZIL. I'm pretty sure it loses the pool though as well. Long story short, you don't want it to happen.

Search

How much RAM do I REALLY need for a 72TB ZFS configuration?

How much RAM is needed for 72TB of Drives in ZFS?

32GB

64GB

96GB

640K RAM ought to be enough for anybody

Your NAS is bad and you should feel bad

ethebubbeth

Golden Member

Batmeat

Senior member

blastingcap

Diamond Member

milee

Member

Tristor

Senior member

ethebubbeth

Golden Member

Tristor

Senior member

TRENDING THREADS