Discussion: Optane Client product current and future


IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
With Carson Beach being a new controller (and available in BGA SSD according to that old roadmap) maybe that will be the controller with the low power to match the best NAND based NVMe SSDs?

You could be right. We can hope that M10 is Carson Beach? Totally speculative, though. Retailer price lists show a significant premium for M10, so I hope we get more than a few power saving features.

P.S. My guess is that Carson Beach will have four channels (slotting in between the two channels of Stony Beach/Brighton Beach and the seven channels of Mansion Beach). If true, then maybe a 120GB Carson Beach will do 1200 MB/s sequential write.

The Brighton Beach products have 2x the maximum write performance. That suggests they may have 4 channels, and that it may take 8 channels to reach 1300MB/s. And with an x4 interface, writes could get close to 2GB/s.

I haven't got an answer as to why the 16GB Stony Beach model has half the write performance of a single channel in a 900P device.

One more thing: I noticed Brighton Beach is NVMe 1.1... but I wonder if Carson Beach will have NVMe 1.2 and support for Host Memory Buffer? If so, will we see any improvement from using system RAM? Even a tiny one?

No point. Host Memory Buffer is a cost-saving feature. It still needs to go through the PCIe interface and thus has significantly higher latency than DRAM on a DRAM bus. There will be a negligible gain which will be unnoticeable compared to NVMe 3D XPoint.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
The Brighton Beach products have 2x the maximum write performance. That suggests they may have 4 channels, and that it may take 8 channels to reach 1300MB/s. And with an x4 interface, writes could get close to 2GB/s.

The 900p has seven channels and can do ~2100 MB/s sequential write (depending on the review; Tom's has it at 2264 MB/s for the 280GB 900p). So that would be ~300 MB/s sequential write per channel at both the 280GB and 480GB capacities.

So quad channel should be able to get at least 1200 MB/s (maybe more if they make improvements in the next gen?)

I haven't got an answer as to why the 16GB Stony Beach model has half the write performance of a single channel in a 900P device.

Each die has 150 MB/s Sequential write unless it is limited by the controller?

So 32GB (two dies) = 300 MB/s

58GB (four dies) = ~600 MB/s

118GB (eight dies) = ~600 MB/s (so limited by both the number of channels and the 300 MB/s speed per channel?)
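
Spelling that guess out (150 MB/s per die, 300 MB/s per channel, and two channels for these parts; all speculation on my part, not published specs):

Code:
# Back-of-the-envelope model: per-die write speed, capped by total channel
# bandwidth. Die/channel counts and per-unit speeds are guesses from this thread.
def est_seq_write(dies, channels, per_die=150, per_channel=300):
    return min(dies * per_die, channels * per_channel)

for capacity, dies in [("32GB", 2), ("58GB", 4), ("118GB", 8)]:
    print(capacity, est_seq_write(dies, channels=2), "MB/s")
# -> 32GB 300 MB/s, 58GB 600 MB/s, 118GB 600 MB/s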
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Intel needs to play the game, rather than creating a new metric for performance and hoping people will follow it.

It's true that storage testing methodologies are synthetic and, frankly, ancient. They are stuck back in 1995, when SiSoftware Sandra was the main benchmark tool for CPUs.

But they need to play the game. Capping the top 900P model to 2.5GB/s sequential reads and 2GB/s sequential writes was not a wise choice. Now Samsung will bring the Z-SSD and undercut them on pricing while matching the 900P in most consumer metrics. While the Z-SSD is based on NAND and will suffer considerably when doing simultaneous read/write operations, the number of times people will care about that is infinitesimally small. The Z-SSD nearly closes the gap on QD1 random reads, which is a metric Intel uses to promote Optane. On random writes the Z-SSD uses a DRAM buffer, and 99.99% of people will find it fine.

The 900P should have been maxing out the NVMe interface. We should have been seeing 3.5GB/s reads and 3GB/s+ writes. The 800P should have had double the sequential write, and been priced at $99/$169 maximum, preferably $79/$139. Sequential performance shows up in benchmarks, and most people will pass on the drive after seeing reviews. They are positioning themselves out of the market with artificial segmentation and a margin-oriented push. Honestly, Intel needs to tell investors they'll sacrifice 4-5% of margin so they can have much more competitive products.

3D XPoint products will start to shine in DIMMs, but even before that much more could have been done. They need to realize that, with all this money grabbing, in a few years they might find themselves in the same position they were in with NAND, when a dozen other competitors overtook them because they chose to coast on the X25-M.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Intel needs to play the game, rather than creating a new metric for performance and hoping people will follow it.

Could it be that that metric (QD1 4K read) is actually more useful to people with lower-end systems, with lower amounts of RAM?

(I'm thinking mainly about Optane Memory rather than the 800p when I ask this question, but I think it could also apply to the 58GB 800p in some scenarios)
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Looking at the Newegg listings for 7th-generation Core i3 systems (i.e., the processors that would use the H110 chipset), I see 14 out of the 18 systems coming with only 4GB of RAM:

https://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=100019096 8000 601286691 601272810 4814&IsNodeId=1&bop=And&ActiveSearchResult=True&Order=PRICE&PageSize=36

So for folks buying (or OEMs supplying) this level of computer (assuming we will get similar PCs with the H310 chipset that support Optane Memory), how many would be better off with a 16GB Optane module vs. another 4GB DDR4 stick?

P.S. 4GB of DDR4 starts at $43 at Newegg, while the 16GB Optane module is $36.38.
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
900P should have been maxing out the NVMe interface. We should have been seeing 3.5GB/s read and 3GB/s+ writes.

As I recall, when the 900P was first introduced (October 2017), NAND controllers capable of 3.5 GB/s reads and 3.0 GB/s writes (e.g., the Silicon Motion SM2262EN) were not yet in actual shipping products.

So could it be that the Mansion Beach Refresh SSD uses SM2262EN (or maybe even Phison E12) adapted to 3DXPoint?

Or are NAND controllers not really appropriate (or adaptable) for 3DXpoint SSDs?

EDIT: I found this article suggesting the 900P might use an adapted NAND based controller:

https://www.pcper.com/reviews/Stora...VMe-HHHL-SSD-Review-Lots-3D-XPoint/Internals-

(Photo of the 900P's PCB and controller from the review.)



This is Intel's 7-channel controller design that has been around since the SSD DC P3700/SSD 750 (though it's 14-channels in those parts - same PCB layout though). We suspect significant changes had to be made to bring latencies down so significantly. Also note: no DRAM. Who needs DRAM with a few hundred GB of stuff that is almost as fast!
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
So could it be that the Mansion Beach Refresh SSD uses SM2262EN (or maybe even Phison E12) adapted to 3DXPoint?

Or are NAND controllers not really appropriate (or adaptable) for 3DXpoint SSDs?

It's not appropriate. Like the article you linked says, the controller has to be really fast, otherwise the total speed will be slow. The Silicon Motion and Phison controllers are not the top-performing ones either.

NAND controllers also do TRIM, and have much more advanced wear levelling, in addition to using various buffering schemes to improve performance and endurance. There's also the indirection system.

So even if it's electrically and physically compatible, it needs a totally new controller.
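
To give a feel for what that indirection layer and wear levelling mean, here is a toy sketch (made-up names, nothing resembling a real FTL or Intel's controllers):

Code:
# Toy logical-to-physical mapping with naive wear levelling; purely
# illustrative, real FTLs are vastly more complicated.
class ToyMappingLayer:
    def __init__(self, num_blocks):
        self.l2p = {}                         # logical block -> physical block
        self.wear = [0] * num_blocks          # per-physical-block wear count
        self.free = list(range(num_blocks))   # physical blocks available for writes

    def write(self, lba):
        # Steer each new write to the least-worn free block.
        self.free.sort(key=lambda pb: self.wear[pb])
        pb = self.free.pop(0)
        if lba in self.l2p:                   # old copy gets recycled
            old = self.l2p[lba]
            self.wear[old] += 1
            self.free.append(old)
        self.l2p[lba] = pb

    def read(self, lba):
        return self.l2p.get(lba)              # where the data actually lives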


On another note, I am wondering what the "Next Gen" Optane Memory will be. Since that's a feature of the 300-series chipset, we'll get to know more. If they kept to the original roadmap, it suggests we'll see Carson Beach with an x4 interface going into next-gen Optane Memory. At maybe 2GB/s reads and 1.4GB/s writes, it would be nice. It might even pair nicely with really cheap SSDs like the 660p, which supposedly only costs $100 for 512GB.

Also, I added the active power states for the 800P. According to Newegg.com, the actual release date is March 21, 2018, so it was released to reviewers earlier.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
So the 58GB and 118GB versions have different power usage figures.

This is for the 118GB version.

PS0 Write: 3.75W
PS1 Write: 2.75W
PS2 Write: 2.0W

PS0 Read: 3.5W
PS1 Read: 2.75W
PS2 Read: 2.0W

58GB version.

PS0 Write: 3.25W
PS1 Write: 2.5W
PS2 Write: 2.0W

PS0 Read: 3.0W
PS1 Read: 2.5W
PS2 Read: 2.0W
 
  • Like
Reactions: cbn and Dayman1225

Billy Tallis

Senior member
Aug 4, 2015
293
146
116
Those power numbers for the 58GB roughly match what I've measured. I haven't measured PS1 and PS2 on the 118GB model yet.

When going from PS0 to PS1 or PS2, throughput decreases more than power usage, so PS1 and PS2 are less efficient and won't get you better battery life. They are only useful if your platform cannot provide enough power for PS0 or cannot handle the heat from PS0.
 
  • Like
Reactions: cbn and Dayman1225

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Billy,

What metric do you use for throughput? Because it could be sequential or random, at either high or low queue depths. The power states can save energy even if they drop power by less than they drop throughput. A computer at any given time can be bound by many different components. It may sometimes be GPU-bound, CPU-bound, or network-bound. In such cases the I/O may not need to run at full speed. For super fast drives like Optane, it's likely to hit bottlenecks elsewhere too, like the simple fact that most software is written for ancient platter HDDs.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Transferring a certain amount of data with slow and steady I/O uses more energy than a race to idle strategy.

It's much more complicated than that in reality. I already gave one reason: bottlenecks exist elsewhere in the system. There are others. Race to idle is an old concept.
 

Billy Tallis

Senior member
Aug 4, 2015
293
146
116
The race to idle is an old concept.

It's only an old concept in that it has become nearly universal. Almost any time you have a finite amount of work and the availability of an idle power state, a race to idle strategy will be superior, because in the real world it is much easier to save a lot of power when you're doing nothing than it is to save a lot of power when you're still trying to get something done. This applies to processors, SSDs, wireless networking, and occasionally even GPUs. Almost everything about personal computing is about doing bursts of work and then getting back to waiting on the user.

In the specific case of an Optane 800P SSD, bottlenecks elsewhere in the system don't matter. If you send the drive a sequential read request for 1GB of data that you need 3 seconds from now, the 800P 58GB will deliver the data in time whether it's in PS0, PS1 or PS2. If it's in PS2, it'll take 2.41 seconds to read that 1GB and consume 4.09 Joules of energy during that time, before sleeping for the remaining 0.59 seconds. If it's in PS0, it'll take 0.74 seconds to read that 1GB and consume 2.12 Joules during that time, before sleeping for the remaining 2.26 seconds.

If we take my initial measurements for idle power when the drive's sleeping (714mW), then the total energy usage over that 3 seconds will be 3.74 J when using PS0 and 4.51 J when using PS2. If we take the specified idle power that the 800P should get on a system with fully working PCIe power management (8mW), then it's 2.14 J for PS0 and 4.10 J for PS2. So even when the drive definitely isn't the bottleneck and even the slowest PS2 is more than fast enough, the race to idle uses less energy.
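
Or as a throwaway script, using the same inputs (the active times and energies are from my measurements above, plus the two idle-power figures; small differences from the numbers above are just rounding):

Code:
# Race-to-idle energy comparison for reading 1GB before a 3-second deadline.
def total_energy(active_s, active_joules, idle_w, deadline_s=3.0):
    # energy spent reading, plus energy spent sleeping for the rest of the window
    return active_joules + (deadline_s - active_s) * idle_w

ps0 = (0.74, 2.12)   # seconds busy, joules while busy (800P 58GB, PS0)
ps2 = (2.41, 4.09)   # seconds busy, joules while busy (800P 58GB, PS2)

for label, idle_w in [("measured idle 714mW", 0.714), ("spec idle 8mW", 0.008)]:
    print(label,
          "PS0: %.2f J" % total_energy(*ps0, idle_w),
          "PS2: %.2f J" % total_energy(*ps2, idle_w))
# measured idle -> PS0 ~3.73 J vs PS2 ~4.51 J
# spec idle     -> PS0 ~2.14 J vs PS2 ~4.09 J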
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
In the specific case of an Optane 800P SSD, bottlenecks elsewhere in the system don't matter.

Yes, it matters.

Bottlenecks in the rest of the system, for example the CPU, and in the vast majority of cases the mindset of the programmers (they are used to 40 years of HDDs and having to account for the lowest common denominator), mean the 800P's performance is never fully realized. When you are transferring files, or benchmarking one metric at a time, sure, what you say is correct.

Loading time tests show there's little gain once you get a decent SATA SSD. The Tech Report and Tom's Hardware reviews cover this. The 2.5GB/s read speeds, the insane QD1 speeds, the 10µs latency: none of it matters. Yes, there are cases where such numbers benefit real usage, but not all cases do.

Computing requirements are as complex as the many individuals who work with them and build them. That's why modern laptops have nearly a dozen different C-states, many different P-states, and clock steps.

That's why I say power consumption tests only matter if they translate into battery life. In the consumer world, watt numbers don't really matter. Notebookcheck does both power measurements and battery life tests, and there are cases where the power measurements don't reflect the battery life numbers, because it's impossible to account for every scenario. And in the case of a laptop there's no simple way to measure power usage accurately in a way that will reflect battery life figures, unless you open it up and get some very special tools and circuitry. Why would you do that when you can stick a battery in and let it do its thing?

Look at the 760p review. Tom's Hardware sticks it in a laptop, and voila! You get your power figure from the battery life numbers. And it's pretty damn good. Start doing power measurements and you run into various problems. Intel even does some platform-specific power optimizations that won't show up on a desktop testing platform with a power meter. Their marketing sheet for the 600p states that it achieves additional power savings with 6th Gen Core laptops.

And Intel knows this too. Starting with the Haswell generation, they began implementing such techniques. Some leaked presentations mention something like power-aware turbo being used, because they found there are scenarios where running flat out isn't the most power-saving approach. They have something similar for system memory too. Haswell was about platform-based power management that takes every component in the system into account.
 
Last edited:

Billy Tallis

Senior member
Aug 4, 2015
293
146
116
Yes, it matters.

All you're doing is pointing out that there are other parts of the system that use variable amounts of power. You haven't put forth any specific theory as to how running the SSD in a less efficient power state can enable those other components to save enough to make up the energy deficit from the SSD. When the SSD is in a slower and less efficient power state, which other parts of the system are going to end up being substantially more efficient? Can you at all quantify those savings, to compare against the cost of keeping the SSD in its 1+W active state longer when using the slower NVMe power states?
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
So the 58GB and 118GB versions have different power usage figures.

This is for the 118GB version.

PS0 Write: 3.75W
PS1 Write: 2.75W
PS2 Write: 2.0W

PS0 Read: 3.5W
PS1 Read: 2.75W
PS2 Read: 2.0W

58GB version.

PS0 Write: 3.25W
PS1 Write: 2.5W
PS2 Write: 2.0W

PS0 Read: 3.0W
PS1 Read: 2.5W
PS2 Read: 2.0W

Those power numbers for the 58GB roughly match what I've measured. I haven't measured PS1 and PS2 on the 118GB model yet.

When going from PS0 to PS1 or PS2, throughput decreases more than power usage, so PS1 and PS2 are less efficient and won't get you better battery life. They are only useful if your platform cannot provide enough power for PS0 or cannot handle the heat from PS0.

Hello Billy, do you remember how much throughput decreased? How far below ~370 MB/s (2.0/3.25 × ~600 MB/s) did the 58GB 800p drop in PS2 at QD1 and in sustained sequential writes?

P.S. I wonder how much endurance is affected by PS2 writes, at both QD1 and sustained sequential writes? My assumption is that endurance would increase, even though I know no official numbers are published on this. (Write speed, endurance, data retention... pick any two of the three, right?)
 
Last edited:

Billy Tallis

Senior member
Aug 4, 2015
293
146
116
The raw data (well, rounded):

Code:
Optane SSD 800P 58GB

Random Read
        QD1                QD2                QD4                QD8
PS0     587 MB/s 1.85 W    632 MB/s 1.90 W    968 MB/s 2.33 W   1380 MB/s 2.87 W
PS1     505 MB/s 1.79 W    587 MB/s 1.90 W    952 MB/s 2.41 W    958 MB/s 2.42 W
PS2     322 MB/s 1.55 W    425 MB/s 1.70 W    427 MB/s 1.70 W    427 MB/s 1.70 W

Random Write
        QD1                QD2                QD4                QD8
PS0     349 MB/s 2.27 W    533 MB/s 2.89 W    567 MB/s 3.00 W    576 MB/s 3.00 W
PS1     238 MB/s 1.97 W    300 MB/s 2.19 W    301 MB/s 2.20 W    301 MB/s 2.20 W
PS2     141 MB/s 1.62 W    162 MB/s 1.70 W    162 MB/s 1.70 W    162 MB/s 1.70 W

Sequential Read
        QD1                QD2                QD4        
PS0    1320 MB/s 2.78 W   1425 MB/s 2.92 W   1425 MB/s 2.92 W
PS1     911 MB/s 2.36 W    959 MB/s 2.42 W    959 MB/s 2.43 W
PS2     419 MB/s 1.69 W    427 MB/s 1.70 W    427 MB/s 1.70 W

Sequential Write
        QD1                QD2                QD4        
PS0     593 MB/s 3.06 W    621 MB/s 3.14 W    621 MB/s 3.15 W
PS1     294 MB/s 2.18 W    301 MB/s 2.20 W    301 MB/s 2.20 W
PS2     160 MB/s 1.70 W    162 MB/s 1.70 W    162 MB/s 1.71 W

Note that the numbers I posted earlier were based on the scores that are an average of the QD1, QD2 and QD4 values. There's something odd about the QD4 random read numbers for PS0 vs PS1; I'll re-run this next week and test PS1 and PS2 on the 118GB.
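
For anyone who wants the efficiency comparison made explicit, here's a quick script turning the QD1 columns above into MB/s per watt (the inputs are rounded, so treat the ratios as approximate):

Code:
# MB/s per watt at QD1, straight from the rounded table above.
qd1 = {
    "random read":      {"PS0": (587, 1.85), "PS1": (505, 1.79), "PS2": (322, 1.55)},
    "sequential write": {"PS0": (593, 3.06), "PS1": (294, 2.18), "PS2": (160, 1.70)},
}
for workload, states in qd1.items():
    for ps, (mbps, watts) in states.items():
        print(workload, ps, round(mbps / watts), "MB/s per W")
# Random read: PS0 ~317, PS1 ~282, PS2 ~208. Sequential write: PS0 ~194, PS1 ~135, PS2 ~94.
# PS0 is the most efficient state in both cases.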

I'm not sure if this throttling necessarily has any effect on endurance. The controller might not be changing the media access parameters at all, and could instead just be choosing to insert small amounts of idle time between operations—too small to enter a low-power idle state, but maybe long enough for us to measure. It might be possible to notice the consequences of something like this by looking at the latency distribution. The Quarch power module samples at 4µs intervals which might not be quite fast enough to discern the difference between a reduced duty cycle of operations and a slower but steadier media access (it's great for looking at individual hard drive seeks). My oscilloscope should be plenty fast to catch individual media accesses, but probably won't have sufficient ADC resolution to show anything interesting. I'll try both to see if I can capture the timing and power usage of individual IOs. But I'm traveling this week to the OCP Summit, so further testing will have to wait until I'm back home to my lab. (My expectation is that they probably haven't implemented multiple low-level media access sequences for this.)

PS: Is anyone else bothered by how this forum renders code blocks with a striped background that doesn't scroll with the content?
 
Last edited:
  • Like
Reactions: cbn

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
All you're doing is pointing out that there are other parts of the system that use variable amounts of power. You haven't put forth any specific theory as to how running the SSD in a less efficient power state can enable those other components to save enough to make up the energy deficit from the SSD.

I'm not sure what you don't get about this.

Let's compare the two, in a case where the application and the CPU present themselves as a bottleneck for Optane. Let's say the bottleneck is 400MB/s for I/O on the QD1 4K random read metric.

1. Optane running full bore PS0 @ 590MB/s and using 1.85W
2. Optane running PS1 @ 510MB/s and using 1.79W
3. Optane running PS2 @ 320MB/s and using 1.55W


Scenario 1: Optane is capable of getting 590MB/s, but bottleneck means you are really at 400MB/s of performance. 400MB/s divided by 1.85W = 216MB/s/W

Scenario 2: Optane is capable of getting 510MB/s, but bottleneck means you still get only 400MB/s of performance. 400MB/s divided by 1.79W = 223MB/s/W, which means it's more efficient

Finally, Scenario 3: Optane is capable of getting 320MB/s, and actually gets 320MB/s because it's under the bottleneck limit of 400MB/s. 320MB/s divided by 1.55W = 206MB/s/W.

In this case, you'd want the Optane device to run in the PS1 state because it achieves the best MB/s/W figure.
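
Here's the same arithmetic as a little script (the 400MB/s cap is a hypothetical host-side bottleneck, and this assumes the drive keeps drawing its full active power while the host is the limiter, which is the premise of the example):

Code:
# Effective MB/s per watt when the host caps QD1 random reads at 400MB/s.
bottleneck_mbps = 400                                                  # hypothetical host-side limit
states = {"PS0": (590, 1.85), "PS1": (510, 1.79), "PS2": (320, 1.55)}  # (MB/s, W)

for ps, (drive_mbps, watts) in states.items():
    effective = min(drive_mbps, bottleneck_mbps)   # can't go faster than the host asks for
    print(ps, round(effective / watts), "MB/s per W")
# -> PS0 216, PS1 223, PS2 206 (PS1 comes out ahead under these assumptions)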


I'm being extremely generous here and assuming the application is capable of handling that much throughput at QD1. A fast NAND NVMe SSD only achieves 50MB/s at QD1, meaning Optane is 10x the performance. I doubt it needs more than PS2 performance the vast majority of the time.

less efficient power state

The least efficient power state is actually the one doing more work than is needed. Bottlenecks prevent the system from running any faster.
 
Last edited:

Billy Tallis

Senior member
Aug 4, 2015
293
146
116
Scenario 1: Optane is capable of getting 590MB/s, but bottleneck means you are really at 400MB/s of performance. 400MB/s divided by 1.85W = 216MB/s/W
If the CPU isn't keeping the SSD 100% busy with read commands, then its power draw is not going to be 1.85W any more. The drive will be spending some of its time idle (approximately 32% idle in this example). The distribution of these idle times matters greatly; if any of them are long enough for APST to kick in, then a lot of power will be saved. If they're more evenly spaced short idle periods on the order of tens or hundreds of microseconds, then the drive will be dropping down to the ~1W active idle power between commands.
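
A rough sketch of what that does to the 400MB/s example (using the ~1W active idle figure; any deeper APST sleep would only widen the gap):

Code:
# If the host only wants 400MB/s, a PS0 drive is busy ~68% of the time and
# can sit at roughly 1W active idle in between commands.
demand_mbps = 400
ps0_mbps, ps0_w = 590, 1.85
active_idle_w = 1.0                                  # rough active-idle power, no deep APST

duty = demand_mbps / ps0_mbps                        # fraction of time spent busy (~0.68)
avg_w = duty * ps0_w + (1 - duty) * active_idle_w    # time-weighted average power
print("avg %.2f W ->" % avg_w, round(demand_mbps / avg_w), "MB/s per W")
# -> ~1.58 W average, ~254 MB/s per W, ahead of the ~223 MB/s/W PS1 figure above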

You're also presuming an infinite workload. This is reasonable when modeling GPU usage for cryptocurrency mining or video gaming with vsync off, but in almost any other client computing workload, the chunks of work (in this case, I/O) are bounded and you'll be hitting idle periods that are long enough for the device to drop down to a sleep state no matter what operational power state it was using.

Our ATSB tests can illustrate this. They don't hit the drive with continuous I/O. Instead, real-world I/O patterns are played back to the drive, with long idle times truncated to 25ms. The 58GB 800P averages between 861mW (Light) and 1078mW (Destroyer) on those tests, so clearly APST is kicking in some even for idle times of 25ms or less, and even considering just the time the drive spends in PS0 instead of PS3, it's averaging far less than the 1.85+W that results from keeping the drive 100% busy.