Question Medium-sized NVMe array setup suggestions?

_Rick_

Diamond Member
Apr 20, 2012
3,967
71
91
I'm taking the next step with my storage setup, and will in the near future buy 8 M.2 SSDs - DRAM-less 2TB drives, ideally MLC, but potentially even QLC. Trying to get the best deal, and I can wait a while.

My intended use case is in my home NAS, to host home directories for my other machines - and potentially free up the SSDs installed in those machines mostly for Steam libraries. I expect the Steam client won't enjoy having to share the directory, so we will see about that.
It will also likely host some game servers, ownCloud-style services and such - I'm hoping to run either single-node k8s or just plain docker/compose, and avoid VMs.
The machine has 16 Zen 4c cores and 128 GB of RAM, but there are no plans to run databases with 100 concurrent users on the array - it will mostly hold media. Worst case will be stuff like Windows AppData.
OS is Gentoo. I've been running btrfs without major issues for a while (although I got annoyed with it refusing to mount when degraded unless you explicitly ask for it, which broke booting off of a single disk).
I've heard good, bad and ugly things about ZFS - since I am not chasing IOPS significantly, I doubt it will matter much.
Will have a local spinning rust backup, so if the worst happens, I should be able to recover most of it. Bonus points, if I can get snapshots of the volumes.

Main considerations:
Don't eat my data please - and allow me something like RAID5/6 so I don't need to throw half the capacity away for RAID 1/10.
Don't eat the SSDs: write amplification of SSDs should not be multiplied significantly. It's bad enough as it is, on these cheap drives.
Make drive swaps easy: The disks will be mounted in an externally accessible bay, and I am looking into somewhat-hot-swapping the NVMe drives when they break - if the setup then makes me faff around with more than two lines of shell, I'll be annoyed (roughly the ceiling of effort I'd accept is sketched after this list).
Don't get slower than HDDs - regardless of what I am doing. I know that SSDs will already degrade horrendously once you eat through their cache - I need a storage setup where that condition remains the worst case.
Support TRIM - which probably rules out classic SW-RAID, since tracking which blocks are actually in use gets much harder if the FS has to pass that information through the RAID layer.
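To make that swap criterion concrete, this is about as much shell as I'm willing to type for a replacement - a btrfs sketch with placeholder device names, assuming the pool ends up on btrfs:

# rebuild the failed disk's data onto the freshly inserted replacement
btrfs replace start /dev/nvme3n1 /dev/nvme8n1 /mnt/array
# check rebuild progress
btrfs replace status /mnt/array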

My current default would be BTRFS with "RAID 1", so that all files are replicated exactly once, without any kind of parity overhead - and maybe I'll create a throw-away volume without replication, for scratch data, if needed.
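For reference, that default would look roughly like this - a minimal sketch, with placeholder device names and mount options:

# metadata and data both raid1: every block group lives on exactly two devices
mkfs.btrfs -L nvme-pool -m raid1 -d raid1 /dev/nvme[0-7]n1
mount -o noatime,compress=zstd /dev/nvme0n1 /mnt/array
# sanity-check how capacity is allocated across the eight drives
btrfs filesystem usage /mnt/array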
If you have any additional input on read-heavy "low-cost" NVMe arrays, I'd appreciate it (the disks only cost about as much as it costs to get them wired and installed...).

Why am I doing it? It's pretty cool, and I've had too many HDDs die on me lately. And disk spin-ups need to go back into the 2000s :D
 

sdifox

No Lifer
Sep 30, 2005
99,587
17,635
126
Umm, how are you going to hook up 16 NVMe drives? If you are sharing PCIe through bifurcation, you may as well go with SATA SSDs. Media content doesn't need high throughput.
 

_Rick_

Diamond Member
Apr 20, 2012
3,967
71
91
Umm, how are you going to hook up 16 NVMe drives? If you are sharing PCIe through bifurcation, you may as well go with SATA SSDs. Media content doesn't need high throughput.
I built a machine around that:
Lots of PCIe is fairly easy to get once you're off the gimped gamer platforms. I'm looking at PCIe 4.0 SSDs mostly,
both for power/heat reasons with that enclosure, and to have a chance in hell of getting them wired up.
Going to bifurcate one x16 PCIe slot and two 8i MCIO connectors into eight 4-lane connections.
Should still be 10x faster than SATA, and SATA SSDs actually aren't that much cheaper these days.
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
8,167
3,101
146
Have you considered buying pulled U.2/U.3 enterprise SSDs? Or do they have to be M.2?
 

sdifox

No Lifer
Sep 30, 2005
99,587
17,635
126
I built a machine around that:
Lots of PCIe is fairly easy to get once you're off the gimped gamer platforms. I'm looking at PCIe 4.0 SSDs mostly,
both for power/heat reasons with that enclosure, and to have a chance in hell of getting them wired up.
Going to bifurcate one x16 PCIe slot and two 8i MCIO connectors into eight 4-lane connections.
Should still be 10x faster than SATA, and SATA SSDs actually aren't that much cheaper these days.
Wait, you said 8 NVMe drives totaling 16 TB, not 16 NVMe, right?

Get two of these and call it a day

I don't think hot-swapping NVMe is a good idea to start with.
 

_Rick_

Diamond Member
Apr 20, 2012
3,967
71
91
It's definitely not a good idea, but proper external access to my storage devices has become a priority for me as of late.
Taking down the machine and trying to figure out which device (by the missing serial number in sdparm) is the one I have to pull got old pretty fast after the third or fourth time, and it is not going to get much better with card-style mounting options.

It would still leave most of the questions I actually have - about getting a neat storage setup out of this - just as unclear as before.

Since hotplug mostly works for U.2 and U.3, I do think I should eventually be able to get this working - and if I cannot get a stable setup after 14 days of messing around, I may just send the whole thing back to the retailers and opt for U.2 after all - maybe grab some used ones with a few PBs of write endurance left on them.
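The somewhat-hot-swap procedure I'd be testing is the usual sysfs remove/rescan dance - a sketch, with a placeholder PCI address:

# detach the dying SSD from the PCIe bus before physically pulling it
echo 1 > /sys/bus/pci/devices/0000:41:00.0/remove
# after seating the replacement, rescan so the kernel picks it up
echo 1 > /sys/bus/pci/rescan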

M.2 as a consumer standard leaves me with much better options for low-cost expansion in the future, and the availability of deals with warranty is a sure thing. The same cannot be said for used U.2 - so currently I prefer M.2 for that reason.
 
  • Like
Reactions: Shmee

sdifox

No Lifer
Sep 30, 2005
99,587
17,635
126
I would just get a proper used server as opposed to trying to jerry rig one



 
Last edited:

_Rick_

Diamond Member
Apr 20, 2012
3,967
71
91
I would just get a proper used server as opposed to trying to jerry rig one
Don't have a space for a proper server :(
Even 2U/3U units would need proper mounting and fan replacement for this use case, and you get locked into all kinds of weirdness - not quite what I was going for with this build. Also, warranty is nice :D
 

BonzaiDuck

Lifer
Jun 30, 2004
16,454
1,941
126
Actually, I had been considering this to at least augment my drive-pool on a media PC and file server. I have an 8-port Supermicro controller card in the top (first) PCIE 3.0 x16 slot -- using 8 lanes. I have another PCIE slot that will give me 8 more lanes. I could get either two 4TB NVME drives or two 8TB NVME drives, and use a two-M.2 NVME-to-PCIE card providing its own bifurcation. But the NVME drives are costly.

Hard drives will eventually fail -- and fail sooner than either SATA or NVME SSDs.
 

_Rick_

Diamond Member
Apr 20, 2012
3,967
71
91
"cheap" 4 TB drives aren't too bad either, the premium is minimal, I guess with 8TB drives the pain will be higher for another 2-3 years (depending on advances in manufacturing).
There are also some external 2xM.2 cages, but with just two devices the overall failure rate should be significantly lower, so probably not worth it.
The question remains for you, though: what kind of file system can actually take advantage of the NVMe performance, and how does it need to be configured? With the lower failure rate of just two drives, I suppose you don't need to worry about RAID too much and could just recover from backup, so the simplest approach would be a striped volume, with right-sized stripes so you get the best wear-levelling across the devices and minimal write amplification per drive.
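On Linux, for reference, that would be something like the following - just a sketch with placeholder devices; the chunk size is the knob to experiment with:

# plain striping (RAID0) across two NVMe drives with an explicit chunk size
mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=128K /dev/nvme0n1 /dev/nvme1n1
mkfs.xfs /dev/md0
mount /dev/md0 /mnt/pool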

One constraint I noticed the other day is that I will have to look into encryption/decryption performance, since the LUKS benchmark is giving me a 3.5 GB/s peak per thread - I have to double-check whether it's using the right optimizations.
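For reference, this is the quick check I mean - it measures the in-kernel cipher throughput, not the disks:

# per-thread throughput of the cipher LUKS would typically use
cryptsetup benchmark --cipher aes-xts-plain64 --key-size 512
# AES-NI should show up as a CPU flag, otherwise the numbers collapse
grep -m1 -w aes /proc/cpuinfo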

I assume write caching is also quite useful for keeping write amplification low, and CoW file systems make it easier to survive power failures or kernel panics while doing that.
 

BonzaiDuck

Lifer
Jun 30, 2004
16,454
1,941
126
Right away, because I've made note of this very question, you'd want plenty of PCIE lanes and extra PCIE-x16/x8 capability. The thought of it has even made me forego having a dGPU in the first x16 slot. I was glad to obtain a Workstation motherboard for one system. WS motherboards have a lot of extra PCIE lanes provided through either the CPU or the chipset.

That leaves the issue of the PCIE cards you can buy for two or more NVME drives. The offerings in that market are less limited if your motherboard provides "PCIE bifurcation", so that the lanes are independently managed in groups of 4. But there are a comfortable number of add-in cards which provide the bifurcation themselves. I think they use an ASMedia chip that manages the switching.

So with NVME speeds, you might not want RAID0, and for the reliability of NVME, you might not think it necessary for RAID1. I would probably consider putting two or more NVMEs in a drive-pool. That should provide some redundancy so you don't need to worry about what happens if you break a RAID0 array or RAID5 array. I've used Stablebit DrivePool and Stablebit Scanner for many years now.

You can get SSDs of high capacity -- 8TB -- in two flavors: SATA and NVME. You'd think there'd be a difference in price between SATA and NVME, but there isn't. You'll pay something above $600 for each unit. 4TB units are probably in the $200 range.
 

_Rick_

Diamond Member
Apr 20, 2012
3,967
71
91
The main benefit of striping would be to extend the virtual size of the pSLC cache and to do some wear levelling across the SSDs. Actual read/write performance would still be bounded by encryption performance for the most part, but the kind of SSDs I am looking at tend to lose 50% of their performance in under a minute of sustained transfer, and then completely collapse in under 10 minutes. Sure, 10 minutes of ~3 GB/s is significant throughput, but sustaining that essentially indefinitely by striping across 4 devices would make many of the weaknesses of the cheap drives go away, and pay off on the concept of having a large number of them.
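That collapse is easy to reproduce per drive with a sustained sequential write, something like the run below - a sketch only, and destructive to anything on the target, so the device name is a placeholder:

# hammer the drive with 1 MiB sequential writes for 10 minutes, logging bandwidth once per second
fio --name=slc-exhaust --filename=/dev/nvme0n1 --rw=write --bs=1M \
    --ioengine=libaio --iodepth=32 --direct=1 --time_based --runtime=600 \
    --log_avg_msec=1000 --write_bw_log=slc-exhaust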

I've got the PCIe all sorted out with a Siena build ready to take the setup - so now all the software stack issues are more pertinent for me.
The follow-on question is whether to augment the flash pool with spinning rust via some FS magic in a multi-tier system, but I don't really believe in those. I'll keep the rust as backup - with snapshots, btrfs should make those easy. If only the reliability concerns of yesteryear weren't haunting me :D
 

BonzaiDuck

Lifer
Jun 30, 2004
16,454
1,941
126
I was the IT-go-to-guy in my finance office until I retired in 2000. Since then, I've been building computers, managing my money, corresponding in e-mail, analyzing my spreadsheets -- and maintaining a "Home" LAN and at one time with a Windows Server 2012 Essentials with three or four users under the same roof. My family all died within two years of each other and I'm freaking alone. My cousin moved in to share the utilities and be a helpful housemate -- pulled down my tomato garden today and that chore is behind us.

What I've "learned" has mostly come from Anandtech forums and friends who are more "mainstreamers" than tech-veterans. So I cobble together what hardware I have, hardware I buy, hardware I "need".

I'm starting to get scared as I get older, because I see changes in the commercial digital world with the new OS (win 11), movement of some software makers to a subscription model, moving toward "Cloud" storage when I and my 77-year-old friends are STILL not "Cloud-comfortable".

I became very anxious today when I discovered that Macrium Reflect is not only bereft of its "Free" version, but they've moved to a $50/annum subscription model!

I'm going to break from my anxiety for a while since we're going out to have a Mexican dinner.

I've had RAID systems -- server and workstation. I've currently got a 12 TB drive-pool -- of 3.5" HDDs. I worry about what's going to happen when I can't take care of myself. But there's too much change going on with Windows and everything else. I think I'm going to order at least two blended margaritas.
 
  • Love
Reactions: _Rick_

BonzaiDuck

Lifer
Jun 30, 2004
16,454
1,941
126
For PCIE NVME expansion cards that provide their own PCIE bifurcation, look at model numbers in the 7000s from HighPoint. One of them probably costs $1,900 and sockets 8 NVME sticks. There are some four-drive models for something close to $300. They all do RAID0 and RAID1, or can present the drives independently.

There's also a model by Sonnet -- the M.2 4x4 Silent PCIE (NVME) card.

It also does not require motherboard bifurcation.
 

_Rick_

Diamond Member
Apr 20, 2012
3,967
71
91
If the mobo has MCIO x8 ports, you can break those out into two NVMe slots with an adapter.
Yep, the board has two MCIO x8s, and I'll add four more via two breakout cards, so I can connect SATA breakouts to one and 4x breakouts to the other, to get all 32 PCIe lanes and 16 SATA ports I need. Siena is a pretty awesome bang-for-buck I/O-heavy platform; RAM, CPU and board cost about as much as one of those 7000-series HighPoints :D

The actual SSDs are getting into my range as well now (though I did spot ~8 TB class PCIe 3.0 U.2s in the Sandisk shop at around 450 euros; it didn't look like they wanted to sell to me, as the add-to-cart was disabled), with sub-100-euro 2TB M.2s with decent TBW ratings coming within reach (I almost grabbed a set of Biwin Black Opals at ~103). The M.2 enclosures are being pretty stubborn about returning to their previous prices. The goal is to keep the whole setup (breakout cards + cables, drives and enclosure) around 1700-1800€, and there's zero movement on the pricing for the connectivity...
Worst case, I'll still be sitting here with a server board on my desk until Black Friday.

It does feel like I'm the first guy in the whole wide world trying to build a hobby-scale all-flash array with an open source stack, as opposed to just buying some storage appliance, judging by how few blog posts I can find on the subject.
Between fstrim, dm-crypt and striped mirrors (or RAID-6 after all? but surely not on BTRFS?), getting sustainable multi-GB/s throughput for sequential access while staying on likely-to-fail hardware still doesn't look like a straightforward exercise.
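For the TRIM part specifically, my current plan is along these lines - a sketch with placeholder names, assuming LUKS2 on each drive, and with the usual caveat that passing discards through dm-crypt leaks some information about free space:

# let discards pass through the dm-crypt layer and store the flag in the LUKS2 header
cryptsetup open --allow-discards --persistent /dev/nvme0n1 crypt-nvme0
# rely on periodic batched trim rather than continuous discard
systemctl enable --now fstrim.timer
fstrim -v /mnt/array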
 

BonzaiDuck

Lifer
Jun 30, 2004
16,454
1,941
126
Yep, the board has two MCIO x8s, and I'll add four more via two breakout cards, so I can connect SATA breakouts to one and 4x breakouts to the other, to get all 32 PCIe lanes and 16 SATA ports I need. Siena is a pretty awesome bang-for-buck I/O-heavy platform; RAM, CPU and board cost about as much as one of those 7000-series HighPoints :D

The actual SSDs are getting into my range as well now (though I did spot ~8 TB class PCIe 3.0 U.2s in the Sandisk shop at around 450 euros; it didn't look like they wanted to sell to me, as the add-to-cart was disabled), with sub-100-euro 2TB M.2s with decent TBW ratings coming within reach (I almost grabbed a set of Biwin Black Opals at ~103). The M.2 enclosures are being pretty stubborn about returning to their previous prices. The goal is to keep the whole setup (breakout cards + cables, drives and enclosure) around 1700-1800€, and there's zero movement on the pricing for the connectivity...
Worst case, I'll still be sitting here with a server board on my desk until Black Friday.

It does feel like I'm the first guy in the whole wide world trying to build a hobby-scale all-flash array with an open source stack, as opposed to just buying some storage appliance, judging by how few blog posts I can find on the subject.
Between fstrim, dm-crypt and striped mirrors (or RAID-6 after all? but surely not on BTRFS?), getting sustainable multi-GB/s throughput for sequential access while staying on likely-to-fail hardware still doesn't look like a straightforward exercise.
First of all, _Rick_ . . . you have my attention with what you are doing. I'll explain why, despite my attention to it, I'm merely going to make note of the details as I consider further what I myself will do.

So -- please confirm -- is THIS the motherboard you referenced when you said "Siena build"?

I've made a quick Google AI search about a certain benefit of ditching four 3.5" Hitachi spinners in favor of an NVME array of some kind. Power savings are not so great or definitive when the drives are active: the NVMEs will consume anywhere from 2W up to about the 8W an active HDD uses. But the power saving is greatest at idle: going with NVME would reduce idle consumption by about (3.5W x 4) = 14W. I rather doubt that it would pay for itself with today's prices as we'd discussed -- between $200 and $600 per NVME depending on the choice of 4TB units versus 8TB. But it would have some impact on our monthly bill -- or annual bill, to get a larger number for comparison.

My personal needs seem to be no more than a shared storage space of about 12 TB. With my movie collection thus far, the total of my files is less than 3TB. Any significant growth in my total files depends on the addition of movie ISOs and DVR captures. These media files are not duplicated in my drive-pool. My serious documents -- everything from vehicle registrations and Medicare statements to tax filings and source documents, family photos and so forth -- are duplicated in my pool.

Assuming I were to spend $300 on the Sonnet or HighPoint controller, the money outlays for NVMEs would be the same. And since I can take this PCIE device and stick it in another ATX motherboard, it wouldn't be an expenditure linked to my current choice of NAS or server hardware. But in my case, I have not yet forsworn using old desktop or workstation hardware for the purpose. And equally for certain, I'm no longer using a version of Windows Server Essentials for my needs. What I "need" is shared, synched storage and software to play movies and music through my Sony Bravia and its ARC/eARC connection to my Sony 5.1 receiver. And I need to run the software I chose to build the Media PC -- rather than deploy MediaPortal or Plex on my network. For the moment, I use a $50 license for PowerDVD 24. We have enough streaming media access throughout the house that there is no need to feed my collected-movie playback to other TVs in the house, or even to "cast" material.

But I had always chosen my "cast-off" hardware for any type of home server duty.

What do you think? Should I give up using a Z170 board and i7-6700 processor and replace it immediately or soon with this Siena AMD system? I'm well aware, as I said, that I could take the Sonnet controller and NVME drives and socket them in a different box at will.

Normal people I know don't do these sorts of things as "Mainstreamers". We are a breed most likely found on the Anandtech, Overclock or Tom's forums. At the same time, if a person wanted to start this NVME drive pool with just two 8TB drives, it would cost over $1,500 for the PCIE card and those drives. 4x 4TB NVMEs would make the total outlay about $1,100. To equal the capacity of my four HDDs, I could start the project with 3x 4TB NVMEs, totaling about $900.

As I may have mentioned, two of those HDDs have been spinning for about 7 years.

EDIT OR AFTERTHOUGHT: Looking at the first price data I've seen on the other needed parts, I'd need to invest $1,000 in hardware before adding the NVME purchases.

To take reasonable advantage of such a hardware upgrade, I should go ahead and replace my NetGear Nighthawk router with the latest model, and then look at a way to get 2.5 GbE across my LAN. If only I had Mainstreamer aspirations...
 
Last edited:
Jul 27, 2020
26,987
18,582
146
What do you think? Should I give up using a Z170 board and i7-6700 processor and replace it immediately or soon with this Siena AMD system?
$500 for the cheapest Siena CPU, and the mobo is about $770. Then you need to decide whether you want enough RAM sticks to take advantage of all the memory channels, or just use fewer sticks to save costs and ignore the extra channels. You're looking at something close to $2000.

Only recommended if you want to get into the homelab scene. Installing these server CPUs also requires a bit more care and precision than desktop ones.