Question Medium-sized NVMe array setup suggestions?

_Rick_

I'm taking the next step with my storage setup and will buy eight M.2 SSDs in the near future - DRAM-less 2 TB drives, ideally MLC, but potentially even QLC. I'm trying to get the best deal, and I can wait a while.

The intended use case is my home NAS: hosting home directories for my other machines, and potentially freeing up the SSDs installed in those machines mostly for Steam libraries. I do expect the Steam client won't enjoy having to share the directory, so we'll see about that.
It will also likely host some game servers, ownCloud-style services and such - I'm hoping to run either single-node k8s or just plain Docker/Compose, and to avoid VMs.
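For the container side, roughly this kind of thing - a minimal sketch only, with Nextcloud as a stand-in for the own-cloud-style service and /srv/tank/nextcloud as a hypothetical subvolume on the array:

    # hypothetical example: one self-hosted service, data kept on the array
    docker run -d --name nextcloud \
      -p 8080:80 \
      -v /srv/tank/nextcloud:/var/www/html \
      --restart unless-stopped \
      nextcloud:latest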
The machine has 16 Zen 4c cores and 128 GB of RAM, but I have no plans to run 100-concurrent-user databases on the array - it will mostly hold media. The worst case will be stuff like Windows AppData.
The OS is Gentoo, and I've been running btrfs without major issues for a while (although I got annoyed by it refusing to mount a degraded array unless you explicitly ask for it, which broke booting off a single disk).
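For reference, this is the sort of thing I mean - a sketch, assuming the root filesystem lives on the btrfs array and /dev/nvme0n1p2 is one of its members:

    # btrfs refuses a degraded mount unless you ask for it explicitly
    mount -o degraded /dev/nvme0n1p2 /mnt

    # for a degraded *boot*, the option has to be passed as well,
    # e.g. on the kernel command line:
    #   rootflags=degraded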
I've heard good, bad and ugly things about ZFS - since I'm not chasing IOPS to any significant degree, I doubt it will matter much.
There will be a local spinning-rust backup, so if the worst happens I should be able to recover most of it. Bonus points if I can get snapshots of the volumes.
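On the snapshot point, this is roughly what I'm after in btrfs terms - a sketch, assuming a hypothetical subvolume /tank/home and the HDD backup mounted at /backup:

    # read-only snapshot of the home subvolume
    btrfs subvolume snapshot -r /tank/home /tank/.snapshots/home-2024-01-01

    # ship it to the spinning-rust backup
    btrfs send /tank/.snapshots/home-2024-01-01 | btrfs receive /backup/home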

Main considerations:
Don't eat my data please - and allow me something like RAID5/6 so I don't need to throw half the capacity away for RAID 1/10.
Don't eat the SSDs: the setup shouldn't significantly multiply write amplification. It's bad enough as it is on these cheap drives.
Make drive swaps easy: the disks will be mounted in an externally accessible bay, and I'm looking into somewhat-hot-swapping the NVMe drives if they break - if the setup then makes me faff around with more than two lines of shell, I'll be annoyed (see the sketch after this list for the kind of thing I mean).
Don't get slower than HDDs - regardless of what I am doing. I know that SSDs will already degrade horrendously once you eat through their cache - I need a storage setup where that condition remains the worst case.
Support TRIM - which probably rules out classic software RAID, since it gets much harder to track which blocks are actually in use once the filesystem has to pass that information down through the RAID layer.
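For the drive-swap point above, this is roughly the two lines of shell I'd hope to get away with in btrfs - a sketch, assuming a failed member and /dev/nvme8n1 as the replacement, with the array mounted at /tank:

    # replace a failed member (devid 3 here is hypothetical - check 'btrfs filesystem show')
    btrfs replace start 3 /dev/nvme8n1 /tank
    btrfs replace status /tank

    # and for the TRIM point: periodic trim of the mounted filesystem
    fstrim -v /tank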

My current default would be btrfs with "RAID 1", so that every file exists in exactly two copies, without any kind of parity overhead - and maybe I'll carve out a throw-away volume without replication for scratch data, if needed.
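In btrfs terms, that default would look something like this - a sketch, assuming the eight drives show up as /dev/nvme0n1 through /dev/nvme7n1 and the array is mounted at /tank:

    # RAID 1 for both data and metadata across all eight drives
    mkfs.btrfs -L tank -d raid1 -m raid1 /dev/nvme[0-7]n1
    mount -o noatime,compress=zstd /dev/nvme0n1 /tank

    # the throw-away scratch space without replication would have to be a
    # separate filesystem (or partition), since btrfs data profiles apply
    # to the whole filesystem, not per subvolume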
If you have any additional input on read-heavy, "low-cost" NVMe arrays, I'd appreciate it (the disks only cost about as much as it costs to get them wired up and installed...).

Why am I doing this? It's pretty cool, and I've had too many HDDs die on me lately. And disk spin-ups need to go back to the 2000s :D