ZFS, RAIDZ2, & 4Kn/AF (ashift=12) alignment

destrekor

Lifer
Nov 18, 2005
28,799
359
126
So I've been trying to dig up more and more on the best combination of overall storage space and redundancy... but another thing I want to consider is future-proofing for growth/expansion. I figure that in the near future, large disks won't come in 512e, so when I build my NAS and set the ashift size, I want to do it right so I don't screw myself later.

With that said, I also don't want to screw myself by unnecessarily adding a whole bunch of padding that kills storage capacity. Not sure if overall performance gets impacted by having stripe sizes and block sizes not efficiently aligned, but I'd hope to be able to address that too.

Is it still the case that an 8 disk, 4K RAIDZ2 array is going to be far less efficient than a different configuration?
 

Essence_of_War

Platinum Member
Feb 21, 2013
2,650
4
81
Is it still the case that an 8 disk, 4K RAIDZ2 array is going to be far less efficient than a different configuration?
What exactly are you asking?

W/ ashift=12 AND writing LOTS of small files, it is certainly possible to balloon your disk usage. It's even easier if you're doing all of those small writes in zvols backed by raidz1/2/3.

The usual recommendation is to use ashift=12, except for SSDs: some of them lie about their sector size, so for those it's better to leave the default alone, since the ZoL devs maintain a database that sets ashift=12 or 13 for known drives. If you're also using zvols on raidz1/2/3, consider setting the zvol volblocksize (defaults to 8k) to whichever is greater of:

Code:
2^(4+ashift)
or
Code:
(#disks_in_vdev - raidz_level)*2^ashift
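To make the two formulas concrete, here's a small shell sketch plugging in an 8-disk RAIDZ2 vdev with ashift=12 (the OP's plan); the numbers are just that worked example, not a general recommendation:

```shell
#!/bin/sh
# Worked example of the two volblocksize formulas above,
# assuming an 8-disk RAIDZ2 vdev with ashift=12.
ashift=12
ndisks=8
raidz_level=2

a=$(( 1 << (4 + ashift) ))                        # 2^(4+ashift)
b=$(( (ndisks - raidz_level) * (1 << ashift) ))   # (disks - parity) * 2^ashift

# Take whichever is greater.
if [ "$a" -gt "$b" ]; then volblocksize=$a; else volblocksize=$b; fi
echo "suggested volblocksize: ${volblocksize} bytes"
```

For this layout the first term dominates (65536 vs 24576), so the suggestion works out to 64K.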

Edit:

Forgot to include this link to some benchmarking with lots of files, lots of directories, and several volblocksize.

https://forums.freebsd.org/threads/37365/#post-207220
 

destrekor

Lifer
Nov 18, 2005
28,799
359
126
What exactly are you asking?

W/ ashift=12 AND writing LOTS of small files, it is certainly possible to balloon your disk usage. It's even easier if you're doing all of those small writes in zvols backed by raidz1/2/3.

The usual recommendation is to use ashift=12, except for SSDs: some of them lie about their sector size, so for those it's better to leave the default alone, since the ZoL devs maintain a database that sets ashift=12 or 13 for known drives. If you're also using zvols on raidz1/2/3, consider setting the zvol volblocksize (defaults to 8k) to whichever is greater of:

Code:
2^(4+ashift)
or
Code:
(#disks_in_vdev - raidz_level)*2^ashift

Edit:

Forgot to include this link to some benchmarking with lots of files, lots of directories, and several volblocksize.

https://forums.freebsd.org/threads/37365/#post-207220

I guess part of what I am asking is whether, absent the use of zvols, this is even applicable.

I see that it used to be recommended to have a certain number of disks based on the RAIDZ level, so that for RAIDZ2 only 6 or 10 disks were recommended, at least when using 4K/Advanced Format disks. For 512n disks it doesn't appear to matter. This is based on the variable stripe size or something like that.

But I have also seen that even FreeNAS now says that recommendation isn't really applicable anymore, so long as you use the enabled-by-default LZ4 compression.

For my purposes, I won't be expecting many tiny files at all on my array. Most files will be video (BD rips) and photos (mostly RAW but some high-quality JPEG). I might end up putting my FLAC and MP3 library on it as well. The only small files will be whatever small files are found in my system for regular file-level backups (Time Machine on my Mac, and bvckup2 on Windows). Image-based system backups won't be a concern.

I'm still definitely learning ZFS, as much as I can without having it to play with just yet.


As for making the initial question simple: I am planning on an 8-disk (8x3TB HGST) RAIDZ2 array... is 4K not a good idea?
I figure using 512B sectors, while very functional right now and efficient regardless of array size, isn't a good idea in the long run if I want to avoid completely rebuilding an array from scratch whenever I expand or upgrade.
 

Essence_of_War

Platinum Member
Feb 21, 2013
2,650
4
81
I guess part of what I am asking is whether, absent the use of zvols, this is even applicable.

Yes. Ish. It is possible to use up more disk space than one might expect on a zpool w/ ashift=12 w/o using zvols. But possible != something you necessarily have to worry about.

For my purposes, I won't be expecting many tiny files at all on my array. Most files will be video (BD rips) and photos (mostly RAW but some high-quality JPEG). I might end up putting my FLAC and MP3 library on it as well. The only small files will be whatever small files are found in my system for regular file-level backups (Time Machine on my Mac, and bvckup2 on Windows). Image-based system backups won't be a concern.

With your intended use case, it doesn't sound like this is something you'll likely have to worry about.

As for making the initial question simple: I am planning on an 8-disk (8x3TB HGST) RAIDZ2 array... is 4K not a good idea?
I figure using 512B sectors, while very functional right now and efficient regardless of array size, isn't a good idea in the long run if I want to avoid completely rebuilding an array from scratch whenever I expand or upgrade.

Use ashift=12.

Consider not making one giant vdev. ZFS performance scales with the number of vdevs, not with the number of disks.

Additionally, consider using compression=lz4 and atime=off for either the pool or a top-level dataset, let everything inherit those, and not think about either ever again.
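Putting that advice together, pool creation might look something like the sketch below. This is only an illustration: the pool name "tank" and the da0-da7 device names are placeholders for your actual disks.

```shell
# Hypothetical example: 8-disk RAIDZ2 with 4K sectors forced, and lz4 plus
# atime=off set on the root dataset at creation so every child inherits them.
zpool create -o ashift=12 \
    -O compression=lz4 -O atime=off \
    tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7
```

The -O options set root-dataset properties at creation time, which is what lets you "not think about either ever again".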
 

destrekor

Lifer
Nov 18, 2005
28,799
359
126
Yes. Ish. It is possible to use up more disk space than one might expect on a zpool w/ ashift=12 w/o using zvols. But possible != something you necessarily have to worry about.

With your intended use case, it doesn't sound like this is something you'll likely have to worry about.

Use ashift=12.

Consider not making one giant vdev. ZFS performance scales with the number of vdevs, not with the number of disks.

Additionally, consider using compression=lz4 and atime=off for either the pool or a top-level dataset, let everything inherit those, and not think about either ever again.

Thanks!

I've entertained the idea of creating multiple vdevs, but at the same time this is a small home setup and I don't think I could possibly max out the performance, or at least any shortfall won't be a serious bother.
I'd like the double parity of RAIDZ2, but if I were to create two vdevs of 4 disks each with RAIDZ2, I'd lose 4 disks to parity, in which case I think RAID10 would be much better than RAID60. And I'd be too worried about resilvers if I settled on RAIDZ1 across the two vdevs to make a RAID50.

In the long run, much further down the road, I may consider adding an additional 8 disks to create a RAID60-style 8+8 layout. I'm still contemplating starting with just 6 disks, so that I'd sooner be able to afford an additional 6 to make a 6+6 pool. Not sure just yet. I just want to ensure a large available storage capacity with good redundancy.
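For what it's worth, that kind of growth is done by striping a second vdev into the existing pool. A sketch with placeholder pool/device names (the new vdev should match the pool's ashift):

```shell
# Hypothetical expansion: add a second 8-disk RAIDZ2 vdev to the pool,
# giving the RAID60-style 8+8 layout described above.
zpool add -o ashift=12 tank raidz2 da8 da9 da10 da11 da12 da13 da14 da15
```

Note that once added, a vdev can't be removed from a raidz pool, so the layout is worth settling before the first `zpool add`.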

Reading more into things, I might end up making a zvol, but I'm not sure I'll actually need an iSCSI target. I might, to integrate with the Windows Server I plan to use and play around with, but I'll have to determine whether that's needed after getting things set up and seeing what I can do with just CIFS as opposed to iSCSI.

Part of all this will be a little experimentation, as I definitely need to learn more about enterprise storage administration; that's why I may be tackling a more complicated ESXi/NAS box than I ought to need for simple home storage.