Question Medium-sized NVMe array setup suggestions?

_Rick_

Diamond Member
Apr 20, 2012
I'm taking the next step with my storage setup, and will in the near future buy 8 M.2 SSDs - DRAM-less 2TB drives, ideally MLC, but potentially even QLC. Trying to get the best deal, and I can wait a while.

My intended use case is my home NAS: hosting home directories for my other machines - and potentially freeing up the SSDs installed in those machines mostly for Steam libraries. I do expect the Steam client won't enjoy having to share the directory, so we will see about that.
I will also likely host some game servers, ownCloud-style services and such - hoping to run either single-node k8s or just plain Docker/Compose, and avoiding VMs.
The machine has 16 Zen 4c cores and 128 GB of RAM, but I have no plans to run 100-concurrent-user databases on the array - it will mostly be media. Worst case will be stuff like Windows AppData.
The OS is Gentoo. I've been running btrfs without major issues for a while (although I got annoyed with it refusing to mount a degraded array unless you explicitly ask for it, which broke booting off of a single disk).
I've heard good, bad and ugly things about ZFS - since I am not chasing IOPS significantly, I doubt it will matter much.
Will have a local spinning rust backup, so if the worst happens, I should be able to recover most of it. Bonus points if I can get snapshots of the volumes.

Main considerations:
Don't eat my data, please - and allow me something like RAID 5/6 so I don't need to throw half the capacity away on RAID 1/10.
Don't eat the SSDs: the setup should not multiply write amplification significantly. It's bad enough as it is on these cheap drives.
Make drive swaps easy: the disks will be mounted in an externally accessible bay, and I am looking into somewhat-hot-swapping NVMe drives if they break - if the setup then makes me faff around with more than two lines of shell, I'll be annoyed.
Don't get slower than HDDs - regardless of what I am doing. I know that SSDs will already degrade horrendously once you eat through their cache - I need a storage setup where that condition remains the worst case.
Support TRIM - which probably rules out classic software RAID, since tracking which blocks are in use gets much harder when the FS has to pass that information through the RAID layer. Whatever stack I end up with, I'll sanity-check the discard plumbing as sketched right below.
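A minimal check, assuming the array ends up mounted at /mnt/nas (the path is just a placeholder):

# non-zero DISC-GRAN / DISC-MAX on every layer means discards make it down the stack
lsblk --discard
# one-off trim of the mounted filesystem (or schedule it periodically, e.g. via fstrim.timer)
fstrim -v /mnt/nas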

My current default would be btrfs with "RAID 1", so that every file is replicated exactly once, without any kind of parity overhead - and maybe I'll create a throw-away volume without replication for scratch data, if needed.
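Roughly what I have in mind - a minimal sketch with hypothetical device names and mount point; discard=async assumes a reasonably recent kernel:

mkfs.btrfs -L nas -d raid1 -m raid1 /dev/nvme{0..7}n1
mount -o noatime,discard=async /dev/nvme0n1 /mnt/nas
# the annoyance from above: a degraded array needs an explicit, one-shot mount option
mount -o degraded /dev/nvme0n1 /mnt/nas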
If you have any additional input on read-heavy, "low-cost" NVMe arrays, I'm all ears (the disks only cost about as much as it costs to get them wired up and installed...).

Why am I doing it? It's pretty cool, and I've had too many HDDs die on me lately. And disk spin-ups need to go back into the 2000s :D
 

sdifox

No Lifer
Sep 30, 2005
Umm, how are you going to hook up 16 NVMe drives? If you are sharing PCIe through bifurcation, you may as well go with SATA SSDs. Media content doesn't need high throughput.
 

_Rick_

Diamond Member
Apr 20, 2012
Umm, how are you going to hook up 16 NVMe drives? If you are sharing PCIe through bifurcation, you may as well go with SATA SSDs. Media content doesn't need high throughput.
I built a machine around that:
Lots of PCIe is fairly easy to get once you're off the gimped gamer platforms. I'm looking at PCIe 4.0 SSDs mostly, both for power/heat reasons with that enclosure, and to have a chance in hell of getting them wired up.
Going to bifurcate one x16 PCIe slot and two 8i MCIO connectors into eight 4-lane connections.
Should still be 10x faster than SATA, and SATA SSDs actually aren't that much cheaper these days.
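Once it's cabled up, whether all eight drives actually enumerate at x4 should be quick to check - a rough sketch, with a made-up PCI address:

# count the NVMe controllers the kernel can see
lspci -nn | grep -c 'Non-Volatile memory controller'
# negotiated link speed/width for one of them (the address is just an example)
lspci -vv -s 41:00.0 | grep LnkSta
# namespaces, models and serials as nvme-cli sees them
nvme list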
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
Have you considered buying pulled u.2/u.3 enterprise SSDs? Or do they have to be m.2?
 

sdifox

No Lifer
Sep 30, 2005
I built a machine around that:
Lots of PCIe is fairly easy to get once you're off the gimped gamer platforms. I'm looking at PCIe 4.0 SSDs mostly, both for power/heat reasons with that enclosure, and to have a chance in hell of getting them wired up.
Going to bifurcate one x16 PCIe slot and two 8i MCIO connectors into eight 4-lane connections.
Should still be 10x faster than SATA, and SATA SSDs actually aren't that much cheaper these days.
Wait, you said 8 NVMe drives totalling 16 TB, not 16 NVMe drives, right?

Get two of these and call it a day

I don't think hot-swapping NVMe is a good idea to start with.
 

_Rick_

Diamond Member
Apr 20, 2012
It's definitely not a good idea, but proper external access to my storage devices has become a priority for me as of late.
Taking down the machine and trying to figure out which device (by the missing serial number in sdparm) is the one I have to pick out got old pretty fast after the third or fourth time, and it's not going to get much better with card-style mounting options.
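At least the identification part is only a line or two these days - assuming nvme-cli and smartmontools are installed (the device name is a placeholder):

# node, serial and model for every NVMe namespace
nvme list
# or for a single suspect drive
smartctl -i /dev/nvme3n1 | grep -i serial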

It would also still leave most of the questions I actually have - about getting a neat storage setup out of it - just as unclear as before.

Since hotplug mostly works for U.2 and U.3, I do think I should eventually be able to expect success - and if I cannot get a stable setup after 14 days of messing around, I may just send the whole thing back to the retailers and opt for U.2 after all - maybe grab some used ones with a few PBs of write endurance left on them.
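The swap itself should stay within my two-lines-of-shell budget if the kernel cooperates - a sketch of what I expect it to look like, with placeholder PCI address, device name and mount point, and btrfs doing the rebuild:

# detach the dying drive from the kernel (PCI address taken from lspci)
echo 1 > /sys/bus/pci/devices/0000:41:00.0/remove
# after slotting in the replacement, re-enumerate the bus
echo 1 > /sys/bus/pci/rescan
# rebuild onto the new drive; 3 would be the devid of the now-missing device (see btrfs filesystem show)
btrfs replace start 3 /dev/nvme8n1 /mnt/nas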

M.2 as a consumer standard leaves me with much better options to expand cheaply in the future, and the availability of deals with warranty is a sure thing. The same cannot be said for used U.2 - so currently I prefer M.2 for that reason.
 

sdifox

No Lifer
Sep 30, 2005
I would just get a proper used server as opposed to trying to jerry-rig one.



 

_Rick_

Diamond Member
Apr 20, 2012
I would just get a proper used server as opposed to trying to jerry-rig one.
Don't have the space for a proper server :(
Even 2U/3U units would need proper mounting and fan replacement for the use case, and you get locked into all kinds of weirdness - not quite what I was going for with this build. Also, warranty is nice :D
 

BonzaiDuck

Lifer
Jun 30, 2004
Actually, I had been considering this to at least augment the drive pool on my media PC and file server. I have an 8-port Supermicro controller card in the top (first) PCIe 3.0 x16 slot -- using 8 lanes. I have another PCIe slot that will give me 8 more lanes. I could get either two 4 TB NVMe drives or two 8 TB NVMe drives, and use a dual-M.2 NVMe-to-PCIe card providing its own bifurcation. But the NVMe drives are costly.

Hard drives will eventually fail -- and fail sooner than either SATA or NVMe SSDs.
 

_Rick_

Diamond Member
Apr 20, 2012
"cheap" 4 TB drives aren't too bad either, the premium is minimal, I guess with 8TB drives the pain will be higher for another 2-3 years (depending on advances in manufacturing).
There are also some external 2xM.2 cages, but with just two devices the overall failure rate should be significantly lower, so probably not worth it.
The question remains for you though: what kind of file system can actually take advantage of the NVMe performance, and how does it need to be configured? With the lower failure rate of just two drives, I suppose you don't need to worry about RAID too much and could just recover from backup, so the simplest approach would be a striped volume, with right-sized stripes so you get the benefit of wear levelling across the devices and minimal write amplification per drive.
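On the Linux side, the stripe size is easy to pin down explicitly with LVM - a minimal sketch, assuming two hypothetical drives and a 128 KiB stripe (which you'd tune to the drives' flash page/block sizes):

pvcreate /dev/nvme0n1 /dev/nvme1n1
vgcreate fast /dev/nvme0n1 /dev/nvme1n1
# -i 2: stripe across both PVs, -I 128: 128 KiB stripe size
lvcreate -i 2 -I 128 -l 100%FREE -n scratch fast
mkfs.ext4 /dev/fast/scratch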

One constraint I noticed the other day is that I will have to look into encryption/decryption performance, since the LUKS benchmark is giving me a 3.5 GB/s peak per thread - I have to double-check whether it's using the right optimizations.
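What I plan to check, roughly: cryptsetup's built-in benchmark, plus the queueing knobs that are supposed to help on fast NVMe (device and mapping names are placeholders; the perf flags need LUKS2 and a reasonably recent cryptsetup):

# per-thread cipher throughput; should show AES-NI being used for aes-xts
cryptsetup benchmark --cipher aes-xts-plain64
# open with discards allowed and the dm-crypt workqueues bypassed
cryptsetup open --allow-discards --perf-no_read_workqueue --perf-no_write_workqueue /dev/nvme0n1p1 nas_crypt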

I assume that write caching also helps keep write amplification low, and CoW file systems make power failures or kernel panics easier to deal with.
 

BonzaiDuck

Lifer
Jun 30, 2004
Right away - because I've made note of this very question - you'd want plenty of PCIe lanes and extra PCIe x16/x8 capability. The thought of it has even made me forego having a dGPU in the first x16 slot. I was glad to obtain a workstation motherboard for one system. WS motherboards have a lot of extra PCIe lanes provided through either the CPU or the chipset.

That leaves the issue of the PCIe cards you can buy for two or more NVMe drives. The offerings in that market are less limited if your motherboard provides PCIe bifurcation, so that the lanes are independently managed in groups of 4. But there are a comfortable number of add-in cards which provide the switching themselves - I think they use an ASMedia chip to manage it.

So with NVMe speeds you might not want RAID 0, and given NVMe reliability you might not think RAID 1 necessary. I would probably consider putting two or more NVMe drives in a drive pool. That should provide some redundancy, so you don't need to worry about what happens if you break a RAID 0 or RAID 5 array. I've used StableBit DrivePool and StableBit Scanner for many years now.

You can get SSDs of high capacity -- 8 TB -- in two flavors: SATA and NVMe. You'd think there'd be a price difference between SATA and NVMe, but there isn't. You'll pay something above $600 for each unit. 4 TB units are probably in the $200 range.
 

_Rick_

Diamond Member
Apr 20, 2012
The main benefit of striping would be to extend the virtual size of the pSLC cache and to do some wear levelling across the SSDs. Actual read/write performance would still be bounded by encryption performance for the most part, but the kind of SSDs I am looking at tend to lose 50% of their performance within a minute of sustained transfer, and then completely collapse within 10 minutes. Sure, 10 minutes at ~3 GB/s is significant throughput, but sustaining that essentially indefinitely by striping across 4 devices would make many of the weaknesses of the cheap drives go away, and pay off on the concept of having a large number of them.
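That cache cliff is easy to measure per drive before committing to a layout - a fio sketch against a hypothetical raw device (which destroys its contents), logging bandwidth once per second for ten minutes:

fio --name=pslc-cliff --filename=/dev/nvme0n1 --rw=write --bs=1M \
    --ioengine=libaio --iodepth=32 --direct=1 \
    --time_based --runtime=600 \
    --log_avg_msec=1000 --write_bw_log=pslc-cliff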

I've got the PCIe all sorted out, with a Siena build ready to take the setup - so now the software stack questions are the more pertinent ones for me.
The follow-on question is whether to augment the flash pool with spinning rust in a multi-tier system via some FS magic, but I don't really believe in those. I'll keep the rust as backup - with snapshots, btrfs should make those easy. If only the reliability concerns of yesteryear weren't haunting me :D
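The snapshot-to-rust part, at least, should be boring - a minimal sketch with made-up subvolume and backup paths:

# read-only snapshot of the home subvolume
btrfs subvolume snapshot -r /mnt/nas/home /mnt/nas/.snapshots/home-2025-06-01
# ship it to the spinning-rust backup filesystem
btrfs send /mnt/nas/.snapshots/home-2025-06-01 | btrfs receive /mnt/rust/backups/
# later snapshots can go over incrementally: btrfs send -p <previous> <new> | btrfs receive ...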
 

BonzaiDuck

Lifer
Jun 30, 2004
I was the IT go-to guy in my finance office until I retired in 2000. Since then, I've been building computers, managing my money, corresponding in e-mail, analyzing my spreadsheets -- and maintaining a "Home" LAN, at one time with Windows Server 2012 Essentials and three or four users under the same roof. My family all died within two years of each other and I'm freaking alone. My cousin moved in to share the utilities and be a helpful housemate -- pulled down my tomato garden today and that chore is behind us.

What I've "learned" has mostly come from Anandtech forums and friends who are more "mainstreamers" than tech-veterans. So I hobble together what hardware I have, hardware I buy, hardware I "need".

I'm starting to get scared as I get older, because I see changes in the commercial digital world with the new OS (win 11), movement of some software makers to a subscription model, moving toward "Cloud" storage when I and my 77-year-old friends are STILL not "Cloud-comfortable".

I became very anxious today when I discovered that Macrium Reflect is not only bereft of its "Free" version, but they've moved to a $50/annum subscription model!

I'm going to break from my anxiety for a while since we're going out to have a Mexican dinner.

I've had RAID systems -- server and workstation. I've currently got a 12 TB drive-pool -- of 3.5" HDDs. I worry about what's going to happen when I can't take care of myself. But there's too much change going on with Windows and everything else. I think I'm going to order at least two blended margaritas.