Best way to connect two Intel 905p drives (not RAID)

PaulStoffregen

Junior Member
Oct 1, 2018
9
0
6
Anyone have advice or ideas about the best way to connect 2 Intel Optane 905p drives, as individual drives (not RAID)?

Currently I have an ASUS Prime Z370-A motherboard, with a single GTX 1060 video card in the main PCIe x16 slot and 3 I/O cards in 3 of the 4 PCIe x1 slots. The other two PCIe x16 slots are empty.

The path of least resistance would be to get both Intel 905p's as PCIe x4 add-in cards and put them into the two open x16 slots. As near as I can tell from the motherboard manual, the two main x16 slots would then run as x8 each from the CPU. Apparently the third x16 slot is always only x4, supplied from the Z370 chipset.

But that has two drawbacks. First, I'd like to leave that last x16 (4-lane) slot open for a 10G Ethernet card in a couple of years. Second (maybe a non-issue), I've seen some benchmarks saying the chipset adds about 2 µs of latency. For NAND-based SSDs like the Samsung 970 that's almost nothing, but for Optane with ~10 µs latency it could matter... or maybe those benchmarks aren't realistic and the drive would perform the same on chipset lanes as on direct CPU lanes?

If the chipset lanes are OK, then I could get one of the 905p's in the U.2 form factor and use a U.2-to-M.2 cable to plug it into the main M.2 connector. But every indication I've seen is that both M.2 connectors use chipset lanes, and one of them shares 2 of its 4 lanes with some of the SATA ports (which I'm never planning to use), so whether chipset lanes add latency is the big question.

Another option might be the ASUS Hyper M.2 X16 card. The Prime Z370-A manual specifically says this card is supported, and there are options in the BIOS for it (which presumably turn on PCIe bifurcation, though the manual doesn't use such technical terms). The manual does say a maximum of 3 of the 4 drives can be used if the Hyper M.2 X16 is in the first x16 slot with the second x16 slot empty, and a maximum of 2 of 4 if it's installed in the second x16 slot.

The big question... would those U.2 drives work, connected through the U.2-to-M.2 cables (which Intel ships with the U.2 drive), then through the Hyper M.2 X16 card, and then through whatever lane switching the motherboard does to split the 16 CPU lanes into 8 lanes per slot? Maybe all that stuff connected in tandem is no big deal?
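(Side note: however the drives end up connected, I assume I can at least verify the negotiated link width and speed from Linux with lspci, something like the commands below, so I'd know whether a drive came up at x4 or got downgraded. The 01:00.0 address is just a placeholder.)

Code:
# list NVMe controllers to find the bus address (they show up as "Non-Volatile memory controller")
lspci | grep -i "non-volatile"
# then compare the advertised vs. negotiated PCIe link speed and width for that address
sudo lspci -vv -s 01:00.0 | grep -E "LnkCap|LnkSta"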

Or maybe the ultimate answer is to scrap the Prime Z370-A board (which was cheap compared to even one of those 905p drives) and the i7-8086K chip (not as cheap), and step up to an X299 motherboard? But getting a chip with 44 PCIe lanes adds about $1000 more. I'm also not so excited about those big chips due to their lower single-threaded performance. Sadly, quite a lot of my typical workload isn't very well multithreaded.

Another option might be ditching the GTX 1060 and just using the CPU's integrated graphics. I never run any games, so 3D doesn't really matter. I use the card mostly because prior attempts at 4K didn't work on anything but Nvidia cards; graphics drivers on Linux tend to be poor, and Nvidia's closed-source driver is the exception. I absolutely need a 4K desktop (and would upgrade to 5K or 8K in a heartbeat if large-format monitors existed).

The workload, by the way, is mostly software development and engineering work like PCB CAD. I run only Linux, with a few VirtualBox machines for 2 legacy Windows programs, but there's no native Windows partition at all. The main motivation for spending so much on storage is speeding up lengthy software compile times, where makefiles and scripts process many thousands of small files (mostly single-threaded, with lots of filesystem access) and then lots of code is compiled (which scales well to many cores). Two drives lets me keep the /tmp folder (and OS + software) on one drive and my home directory on the other. The compiler makes heavy use of temporary files, which is the main reason to use 2 drives.
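(In case it helps picture the setup: the two-drive split is just mount points, roughly like this minimal /etc/fstab sketch. The device names and filesystem are placeholders, not my actual configuration.)

Code:
# OS + /tmp on the first drive (placeholder device names)
/dev/nvme0n1p2  /      ext4  defaults,noatime  0  1
/dev/nvme0n1p3  /tmp   ext4  defaults,noatime  0  2
# home directory on the second drive
/dev/nvme1n1p1  /home  ext4  defaults,noatime  0  2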

But how to best connect them for optimum performance?
 

Billy Tallis

Senior member
Aug 4, 2015
293
146
116
I haven't done much systematic testing of software compilation, but I suspect you will not see any performance benefit from a second Optane drive unless you upgrade to far more CPU cores. Depending on what you're compiling, you might also need more RAM than that consumer platform supports in order to run a job count high enough to pose a significant storage workload. Single-threaded software definitely won't justify a second Optane drive: just running a storage benchmark that doesn't do anything productive with the data it moves around takes most of a Skylake-class CPU core to keep an Optane drive properly busy. When doing real work on the CPU, the Optane drive will spend most of its time waiting on the CPU, unless you're thrashing it as swap space, in which case more RAM would probably be a more cost-effective upgrade than more Optane.
 

nosirrahx

Senior member
Mar 24, 2018
304
75
101
You should be able to use M.2-to-PCIe risers to connect anything that's PCIe 3.0 x4. Finding a way and a place to mount 2 PCIe risers will be the hard part.
 

Campy

Senior member
Jun 25, 2010
785
171
116
What kind of capacity are you looking for? There is a 1.5TB 905p coming out soon, and a 750GB Optane M.2 drive.

Honestly, if you're spending thousands of dollars on IO it's probably time to start looking at HEDT platforms too.
 

PaulStoffregen

Junior Member
Oct 1, 2018
9
0
6
I only *need* about 1TB, but that'd be pretty tight. Currently my system has 2 aging SATA SSDs (everything else was upgraded recently; I believe those old drives are a Samsung 840 & 850), with about 60GB used on the root+/tmp drive and 760GB used on the home directory drive.
 

PaulStoffregen

Junior Member
Oct 1, 2018
9
0
6
Agreed on HEDT. Until earlier this year I had a Sandy Bridge-E system. I was holding out for Cascade Lake, or maybe the generation after, but my system died after 6 years of heavy use. Without another LGA2011 (not -3) motherboard there was no way to really troubleshoot, so I had to replace it in a hurry and without a big budget.

Happy to say the Z370 & i7-8086K turned out to be a moderate but still nice speed increase over the 6-year-old HEDT system. On a project I often recompile, the i7-3930K took ~18 seconds; the i7-8086K does it in ~12 seconds.

I'll probably go back to HEDT or a Xeon in a few years, and hopefully 1 or 2 Intel 905p drives will make the move with me. But with my hastily purchased i7-8086K working pretty well, Intel probably under a lot of pressure from Threadripper/Epyc to step up its HEDT game, and Optane DIMM-format persistent memory on the horizon, it hardly seems like a good time to pour another few thousand into Skylake-X.

Right now these SATA SSDs are near the end of their life (yes, I do have automated daily backups) and need to be replaced with something new. I've always used 2 drives, one for root+/tmp and another for /home. But maybe a single 1.5TB 905p will do? Or maybe the frugal thing to do is go with a couple of Samsung 970s for now, in hopes that Optane DIMMs will become available soon?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
It's not worth RAIDing an Optane setup, because it adds latency, quite a bit more than using the chipset lanes. It doesn't matter for sequential transfers, but for low-QD random reads/writes it's quite a bit slower, and low-QD random performance is why you're getting Optane in the first place. So don't RAID.

For consumer Optane DIMMs, IF Intel plans to bring them with Cascade Lake-X, that's fall of next year at the earliest.
 

nosirrahx

Senior member
Mar 24, 2018
304
75
101
IntelUser2000 said:
It's not worth RAIDing an Optane setup, because it adds latency, quite a bit more than using the chipset lanes. It doesn't matter for sequential transfers, but for low-QD random reads/writes it's quite a bit slower, and low-QD random performance is why you're getting Optane in the first place. So don't RAID.

For consumer Optane DIMMs, IF Intel plans to bring them with Cascade Lake-X, that's fall of next year at the earliest.

VROC for Optane actually isn't all that bad for 4K Q1T1. Even chipset RAID still puts 4K Q1T1 way ahead of anything NAND has to offer.

4x 900P, VROC RAID 0: [benchmark screenshot]

2x 800P, RAID 0: [benchmark screenshot]


That said, waiting for the larger 905P is the simplest solution.
 

PaulStoffregen

Junior Member
Oct 1, 2018
9
0
6
Ah, here's a better description of Hybrid Block Polling.

Linux 4.4 added support for polling requests in the block layer, a similar approach to what NAPI does for networking, which can improve performance for high-throughput devices (e.g. NVM). Continuously polling a device, however, can cause excessive CPU consumption and sometimes even worse throughput. This release [Linux 4.10] includes a new hybrid, adaptive type of polling. Instead of polling right after I/O submission, the kernel introduces an artificial delay, and then polls after that. For example, if the I/O is presumed to complete in 8 μsecs from now, the kernel sleeps for 4 μsecs, wakes up, and then does the polling. This still puts a sleep/wakeup cycle in the I/O path, but instead of the wakeup happening after the I/O has completed, it happens before. With this hybrid scheme, Linux can achieve big latency reductions while still using the same (or less) amount of CPU. Thanks to improved statistics gathering included in this release, the kernel can measure the completion time of requests and calculate how long it should sleep.

Hybrid block polling is disabled by default. A new sysfs file, /sys/block/<dev>/queue/io_poll_delay, has been added, which makes the polling behave as follows:
-1: never enter hybrid sleep, always poll (default)
0: use half of the mean completion time for this request type as the sleep delay (i.e. hybrid poll)
>0: disregard the mean value calculated by the kernel, and always use this specific value as the sleep delay
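So experimenting with hybrid polling on the 905p should just be a matter of writing to that sysfs file. A quick sketch (nvme0n1 is just an example device name):

Code:
# check whether block-layer polling is enabled and what the current delay policy is
cat /sys/block/nvme0n1/queue/io_poll
cat /sys/block/nvme0n1/queue/io_poll_delay
# switch to hybrid polling: sleep about half the mean completion time, then poll
echo 0 | sudo tee /sys/block/nvme0n1/queue/io_poll_delay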
 

PaulStoffregen

Junior Member
Oct 1, 2018
9
0
6
The 905p drive arrived and I got Ubuntu 18.04 installed. A quick fio benchmark that's supposed to be similar to CrystalDiskMark's Q=1 tests shows amazing performance.

Code:
Run status group 0 (all jobs):
   READ: bw=345MiB/s (361MB/s), 345MiB/s-345MiB/s (361MB/s-361MB/s), io=5000MiB (5243MB), run=14511-14511msec

Run status group 1 (all jobs):
  WRITE: bw=318MiB/s (334MB/s), 318MiB/s-318MiB/s (334MB/s-334MB/s), io=5000MiB (5243MB), run=15700-15700msec

This is using the stock Linux 4.15 kernel that comes with Ubuntu 18.04, with its default settings. Looking at the sysfs files, 4.15 seems to automatically turn on io_poll, but io_poll_delay is -1, so not using the hybrid mode.
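For reference, a CrystalDiskMark-style 4K Q1T1 random read in fio looks roughly like the command below. The file path is a placeholder and the parameters are my approximation of that kind of test, not necessarily the exact job used for the numbers above. Swapping --rw=randread for --rw=randwrite gives the write half.

Code:
# 4K random read at queue depth 1, single job, 5000MiB test file, direct (unbuffered) I/O
fio --name=randread-q1 --filename=/mnt/optane/fio-test --size=5000m \
    --direct=1 --ioengine=libaio --rw=randread --bs=4k --iodepth=1 --numjobs=1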


I still have one SATA SSD connected, a Crucial MX500 512GB, for comparison. Here are its results from the same benchmark.

Code:
Run status group 0 (all jobs):
   READ: bw=29.2MiB/s (30.7MB/s), 29.2MiB/s-29.2MiB/s (30.7MB/s-30.7MB/s), io=5000MiB (5243MB), run=170966-170966msec

Run status group 1 (all jobs):
  WRITE: bw=144MiB/s (151MB/s), 144MiB/s-144MiB/s (151MB/s-151MB/s), io=5000MiB (5243MB), run=34759-34759msec
 