Discussion Optane Client product current and future


IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
It'd be nice if you could get everything, but that's just not the reality. It has to be competitive in pricing with other SSDs too.

Sequentials are important, not just for marketing, because some people definitely care about them. You probably only need about 100MB/s at Q1T1, and anything above that is worth about as much as super-high sequentials. Maybe even 50MB/s is enough, and the real advantage of an Optane drive is not so much latency as the stability in performance due to the write-in-place nature of the media.

Sequentials are important in file transfers, and while not everyone cares about them, they're also a real-world metric. A product as expensive as an Optane drive needs a clean-sweep win against something like the 970 Pro, and it doesn't get one because of sequentials.

H10 will be more important because it has the potential to drive volume, and without volume the Optane series will simply die.
 
Last edited:

nosirrahx

Senior member
Mar 24, 2018
304
75
101
It'd be nice if you could get everything, but that's just not the reality. It has to be competitive in pricing with other SSDs too.

Sequentials are important, not just for marketing, because some people definitely care about them. You probably only need about 100MB/s at Q1T1, and anything above that is worth about as much as super-high sequentials. Maybe even 50MB/s is enough, and the real advantage of an Optane drive is not so much latency as the stability in performance due to the write-in-place nature of the media.

Sequentials are important in file transfers, and while not everyone cares about them, they're also a real-world metric. A product as expensive as an Optane drive needs a clean-sweep win against something like the 970 Pro, and it doesn't get one because of sequentials.

H10 will be more important because it has the potential to drive volume, and without volume the Optane series will simply die.

If the technology is doing a good job keeping frequently used small files mirrored in the Optane part, there is no reason the H10 couldn't have blazing performance at the 4KQ1T1 level on top of great sequential speed.

I am now responsible for 5 systems that use an Optane + SATA SSD configuration and the performance is simply amazing. The 58GB 800P + 2TB SATA SSD combo is currently my go-to solution for speed, price and capacity. The 2TB 970 EVO is like $500. The 58GB 800P + 2TB 860 QVO is less than $400. Upgrade to the 118GB 800P and you are still cheaper than the 970 EVO.
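To put rough numbers on that, here is a quick sketch. Only the ~$500 and sub-$400 totals come from the prices above; the individual component prices in the code are placeholder estimates, not quotes.

```python
# Rough cost comparison of the two builds mentioned above.
# Only the ~$500 (970 EVO 2TB) and sub-$400 (Optane + QVO combo) totals are
# from the post; the individual component prices are placeholder estimates.

configs = {
    "2TB 970 EVO (NVMe only)": {
        "parts": {"970 EVO 2TB": 500},            # total cited in the post
        "capacity_gb": 2000,
    },
    "58GB 800P + 2TB 860 QVO": {
        "parts": {"800P 58GB": 110,               # placeholder split
                  "860 QVO 2TB": 280},            # placeholder split (combo < $400 per the post)
        "capacity_gb": 2000 + 58,
    },
}

for name, cfg in configs.items():
    total = sum(cfg["parts"].values())
    per_tb = total / cfg["capacity_gb"] * 1000
    print(f"{name}: ${total} total, ~${per_tb:.0f}/TB")
```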

I am a little worried about the small Optane cache size on these H10 drives. I have done a lot of experimenting, and if you want a bunch of leftover Optane cache you really need to use a 118GB 800P. The 58GB modules do pretty well on a more typical setup, but if you use a lot of software/games the 58GB buffer is nearly full at all times. I can't recommend the 16GB/32GB modules.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Yea, but you are not the typical user. Actually, quite far from it. :)

While SSDs have greater market share in laptops, in desktops HDDs are still dominant, because demand for storage is still increasing very rapidly.

…Optane part, there is no reason the H10 couldn't have blazing performance at the 4KQ1T1 level on top of great sequential speed.

About the Optane size: DRAM buffers for HDDs and SSDs are tiny, yet they play a tremendous role in performance.

Also, you don't see DRAM's amazing random numbers on SSDs.
 

nosirrahx

Senior member
Mar 24, 2018
304
75
101
Yea, but you are not the typical user. Actually, quite far from it. :)

Not typical, sure, but Intel is actively standing in the way of Optane's best use case. There really are no typical Optane users.

While SSDs have greater market share in laptops, in desktops HDDs are still dominant, because demand for storage is still increasing very rapidly.

The 8TB Micron SSD really is the only huge-capacity solid state option; I imagine that will change as QLC starts to become more popular.

I wonder, will we see a comeback of the 3.5-inch SSD? I doubt it, but I would imagine that 32TB is totally possible in that form factor.

About the Optane size: DRAM buffers for HDDs and SSDs are tiny, yet they play a tremendous role in performance.

Of course, I was just commenting on my personal experience with anything less than the 58GB Optane modules. 16GB and 32GB were not super impressive when under heavier usage.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
The 8TB Micron SSD really is the only huge-capacity solid state option; I imagine that will change as QLC starts to become more popular.

The limitation to adoption is due to price, and price per GB.

16GB and 32GB were not super impressive when under heavier usage.

The lower capacity modules also have much reduced sequential speeds. That will play a role. There's probably a limit to what can be done with a driver, but the H10 likely has tighter integration.

NAND SSDs make do with a DRAM buffer of maybe 1GB for the large ones. The H10 PCB has the DRAM buffer for the NAND controller plus Optane.
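For context on why roughly 1GB is enough, a back-of-the-envelope sketch, assuming the common scheme of a flat mapping table with one 4-byte entry per 4 KiB logical page (real controllers vary in the details):

```python
# Back-of-the-envelope FTL mapping-table size, which is most of what an SSD's
# DRAM buffer holds. Assumes a flat table with one 4-byte entry per 4 KiB
# logical page; real controllers differ in the details.

def ftl_dram_bytes(capacity_bytes, page_size=4096, entry_size=4):
    return capacity_bytes // page_size * entry_size

for tb in (1, 2, 4):
    capacity = tb * 10**12
    print(f"{tb} TB of NAND -> ~{ftl_dram_bytes(capacity) / 2**30:.2f} GiB of map DRAM")
# Roughly 0.9 GiB per TB, which is why ~1GB buffers are typical on large drives.
```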
 
Last edited:

biostud

Lifer
Feb 27, 2003
18,193
4,674
136
If the technology is doing a good job keeping frequently used small files mirrored in the Optane part, there is no reason the H10 couldn't have blazing performance at the 4KQ1T1 level on top of great sequential speed.

I am now responsible for 5 systems that use an Optane + SATA SSD configuration and the performance is simply amazing. The 58GB 800P + 2TB SATA SSD combo is currently my go-to solution for speed, price and capacity. The 2TB 970 EVO is like $500. The 58GB 800P + 2TB 860 QVO is less than $400. Upgrade to the 118GB 800P and you are still cheaper than the 970 EVO.

I am a little worried about the small Optane cache size on these H10 drives. I have done a lot of experimenting, and if you want a bunch of leftover Optane cache you really need to use a 118GB 800P. The 58GB modules do pretty well on a more typical setup, but if you use a lot of software/games the 58GB buffer is nearly full at all times. I can't recommend the 16GB/32GB modules.

I have the system in my sig; would you recommend adding a 58GB 800P to the system and using Primocache as caching software?
 

nosirrahx

Senior member
Mar 24, 2018
304
75
101
The lower capacity modules also have much reduced sequential speeds. That will play a role.

Of course, but I was commenting more on how frequently HDD + 16GB and even HDD + 32GB setups felt like just a hard drive when I was doing several things at once. I guess falling back to NAND speed won't be as painful.

NAND SSDs make do with a DRAM buffer of maybe 1GB for the large ones. The H10 PCB has the DRAM buffer for the NAND controller plus Optane.

I have mixed feelings about going this route. I guess I was hoping for a simplified design that wouldn't need DRAM at all so that costs could be reduced.


The enthusiast in me has been thinking about a cool 'roll your own' product that Intel could create loosely based on the H10.

Imagine a PCIe AIC that has a DDR4 port, SATA port and M.2 port supporting up to 22110. The intention being for the user to control how much NAND (or HDD), how much Optane cache and how much DRAM cache is on board.

This would allow Intel to unload much of the cost onto the user as they pick out the 3 parts that are right for their specific use case.
 
  • Like
Reactions: cbn

biostud

Lifer
Feb 27, 2003
18,193
4,674
136
Of course, but I was commenting more on how frequently HDD + 16GB and even HDD + 32GB setups felt like just a hard drive when I was doing several things at once. I guess falling back to NAND speed won't be as painful.



I have mixed feelings about going this route. I guess I was hoping for a simplified design that wouldn't need DRAM at all so that costs could be reduced.


The enthusiast in me has been thinking about a cool 'roll your own' product that Intel could create loosely based on the H10.

Imagine a PCIe AIC that has a DDR4 port, SATA port and M.2 port supporting up to 22110. The intention being for the user to control how much NAND (or HDD), how much Optane cache and how much DRAM cache is on board.

This would allow Intel to unload much of the cost onto the user as they pick out the 3 parts that are right for their specific use case.
That is basically what Primocache does.

Level 1 cache is system memory, level 2 is fast storage, and then there are the target drives you want to cache.
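For anyone who hasn't used it, here is a toy sketch of that tiering idea. It is purely illustrative; the class, sizes and eviction policy are made up for the example and have nothing to do with Primocache's actual internals.

```python
from collections import OrderedDict

class TieredCache:
    """Toy read path: L1 = system RAM, L2 = fast storage (e.g. Optane), backing = slow target drive."""

    def __init__(self, backing, l1_blocks, l2_blocks):
        self.backing = backing          # dict of block -> data, stands in for the HDD/QLC target
        self.l1 = OrderedDict()         # smallest, fastest tier (system memory)
        self.l2 = OrderedDict()         # larger fast tier (Optane / NVMe)
        self.l1_blocks = l1_blocks
        self.l2_blocks = l2_blocks

    def _put(self, tier, limit, key, value):
        tier[key] = value
        tier.move_to_end(key)
        if len(tier) > limit:
            tier.popitem(last=False)    # evict the least recently used block

    def read(self, block):
        for tier in (self.l1, self.l2):
            if block in tier:
                tier.move_to_end(block) # refresh LRU position on a hit
                return tier[block]
        data = self.backing[block]      # miss in both tiers: slow read from the target
        self._put(self.l2, self.l2_blocks, block, data)  # promote into the fast tier
        self._put(self.l1, self.l1_blocks, block, data)  # and into RAM
        return data

# Example: 4 blocks of RAM cache, 64 blocks of "Optane" cache over a 1000-block drive.
drive = {n: f"block-{n}" for n in range(1000)}
cache = TieredCache(drive, l1_blocks=4, l2_blocks=64)
cache.read(42)   # slow: comes from the target drive
cache.read(42)   # fast: now an L1 hit
```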
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Experts say that multibit cells on Optane are extremely difficult to do and the only way to scale is using shrinks and stacking layers. Also, even if multibit is doable, there's a big performance sacrifice, so it's only viable for SSDs.

There are problems with shrinks and stacking layers as well.

Shrinks: 3D XPoint is already at a bleeding-edge process, which is 20nm. DRAM has been doing it in very small steps to go from 20nm to 15nm (20nm, 19nm, 18nm, 16nm), and a single jump to 15nm is nearly impossible.
Multiple layers: Unlike vertical layers on NAND, the processes require many more steps on 3D XPoint. That will increase cost and ultimately how high it can go. Even 8 layers may be pushing it.

Making 3D XPoint viable requires volume. 25x might be what's required.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Making 3D XPoint viable requires volume. 25x might be what's required.

Intel Xe is my hope for that. (Well, that along with FPGA and NNP.)

And if Intel does particularly well with Xe (i.e., Intel 12th Gen graphics and beyond), there is a chance AMD will come on board too.

P.S. Interesting discussion of using AI and ray tracing together as it pertains to the new Nvidia cards:

https://www.datamation.com/commentary/nvidia-launches-a-workstation-ray-tracing-revolution.html

One of the fascinating, and I think underplayed, parts of this new card line is an artificial intelligence (AI) component which can be trained to up-convert images. They first render in low resolution and then the AI takes over and converts the image in real time to 4K or 8K. This conversion capability can be applied to most any low-resolution image. The AI learns how to interpolate the needed extra pixels and then reimages the picture or frame to create a far higher resolution result. Interestingly, it can do the same thing with frames in a movie to take a regular speed GoPro-like video file and convert it into the kind of high-speed video file that would typically require a $140K high-speed camera.

This powerful ability to, relatively inexpensively, create and modify images in photo-realistic ways will, I believe, fundamentally change the TV and movie industry. We should see increased interest in old shows and movies as they are updated to new digital standards and movies created from scratch which are both less expensive to create and more realistic to watch. Of course, it won’t fix issues with the scripts and editing (two areas that could also use some AI help), but the quality and amount of video content should increase substantially as a result of this.

I am assuming Intel Xe will have these two capabilities.
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
Some testing results on the Optane DIMMs as volatile memory or persistent storage using Cascade Lake Xeon:

https://www.nextplatform.com/2019/03/18/researchers-scrutinize-optane-memory-performance/

Volatile memory results:

[Image: Optane DC comparative performance as memory]


Persistent storage results:

[Image: Optane DC comparative performance as storage]


The bottom line is that for applications that require a larger memory footprint than can be supplied by DRAM, Optane DC has clear potential, with cost becoming a deciding factor. The situation is more complicated when using Optane DC as an in-memory storage device replacement, since results appear to be more sensitive to application type and the degree of software optimization that the customer is willing to pursue.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Intel Optane, Xeon and Xe to power DOE Aurora Supercomputer:

https://www.tomshardware.com/news/intel-exascale-aurora-supercomputer-xe-graphics,38851.html

Aurora also comes armed with "a future generation" of Intel's Optane DC Persistent Memory using 3D XPoint that can be addressed as either storage or memory. This marks the first known implementation of Intel's new memory in a supercomputer-class system, but it isn't clear if the reference to a future generation implies an upcoming version of the Optane variant beyond Intel's first-gen DIMMs.

[Slides: Intel Exascale Update press/analyst deck, pages 2 and 3]
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
Neither Intel nor the DOE would comment on what GPU form factor will make an appearance in the new supercomputer, but logically we could expect these to be Intel's discrete graphics cards.

Maybe instead of an Xe dGPU connected over PCIe x16, the Xe GPU will connect to the Xeon CPU over EMIB so the Xe GPU will have full-speed access to the Optane DIMMs.

Maybe something that looks like this (i.e., Xeon CPU chiplet(s) in the center, with Intel Xe GPU chiplets, each with HBM stacks, to the right and left):

[Image: AMD EHP chiplet layout, CPU chiplets in the center with GPU chiplets and stacked DRAM on either side]


On either side of the CPU clusters are a total of four GPU clusters, each consisting of two GPU chiplets on a respective active interposer. Upon each GPU chiplet is a 3D stack of DRAM (e.g., some future generation of JEDEC high-bandwidth memory (HBM) [12]). The DRAM is directly stacked on the GPU chiplets to maximize bandwidth (the GPUs are expected to provide the peak computational throughput) while minimizing memory-related data movement energy and total package footprint. CPU computations tend to be more latency sensitive, and so the central placement of the CPU cores reduces NUMA-like effects by keeping the CPU-to-DRAM distance relatively uniform.
 
Last edited:
  • Like
Reactions: VirtualLarry

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I want to keep the thread about client products, so I'll limit DIMM talk for now.

Maybe instead of an Xe dGPU connected over PCIe x16, the Xe GPU will connect to the Xeon CPU over EMIB so the Xe GPU will have full-speed access to the Optane DIMMs.

The Aurora system is announced to use the just announced CXL interconnect. That's actually the perfect fit for CPU to GPU.

Optane on EMIB is still unrealistic, because EMIB is meant for very high bandwidth connections like HBM.

The patent might make sense for even faster NV memories like ReRAM or MRAM, if it ever comes to fruition.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Update on Optane Memory M15: up to 64GB capacity, specs the same at up to 2.1GB/s read, 1GB/s write

IntelUser2000, M15 is available in up to 128GB, but if the 64GB M15 can do 1GB/s write then Intel must have moved on to smaller dies for increased parallelism (reason: the current Optane dies are only good for ~150 MB/s sequential write each).

P.S. If 64GB can do 1GB/s write, then what can 128GB do? 1GB/s (i.e., limited by the controller) or 2GB/s?
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
(reason: the current Optane dies are only good for ~150 MB/s sequential write each).

The 128GB doesn't do any better.

Optane SSDs can do at least 300MB/s. It's only the Optane Memory line that's limited.

It's the same as with DRAM. Bandwidth is a function of frequency and width. Clock it faster, or connect more data wires.
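As a quick worked example of frequency times width (the DDR4 numbers are standard; the ~300 MB/s per-channel Optane figure is the one that comes up later in this thread):

```python
# Bandwidth = transfer rate x bus width, as with DRAM.

def bandwidth_mb_s(transfers_per_second, bus_bytes):
    return transfers_per_second * bus_bytes / 1e6

# DDR4-3200 on a 64-bit channel: 3200 MT/s x 8 bytes = 25,600 MB/s per channel.
print(bandwidth_mb_s(3200e6, 8))

# Same idea for an Optane drive: total throughput scales with how many
# controller channels run in parallel (~300 MB/s each, per the figures in
# this thread), not with pushing a single die harder.
print(2 * 300, "MB/s for a 2-channel controller")
print(7 * 300, "MB/s for a 7-channel controller")
```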

IntelUser2000, M15 is available in up to 128GB

Hence, the update. :)

I don't think the 128GB makes sense anyway. For twice the price, you could get the 900P series, which is far faster. The 800P 118GB is probably selling at a few thousand units per year at best.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
You mean 300 MB/s per channel right?

Yes.

Optane M10 (and 800p) is also 300 MB/s per channel.

The 16 and 32GB versions are rated identically to the Optane Memory.

There's no TDP difference between the 16GB and the 32GB in the original version, while there's quite a bit of difference on the M10, leading me to believe the lower bandwidth might have been due to power as well, with 145MB/s per channel being a power-related limit.

If you look at this review: https://techreport.com/review/33338...8-gb-and-118-gb-solid-state-drives-reviewed/3

There's an indication that it's throttling at some points. If there's zero active power management, TDP levels have to be set conservatively, since it'll run at that speed all the time. If active power management exists, it can be made to run faster in all but the worst-case scenarios.

Further data based on the M.2 P4801X: https://ark.intel.com/content/www/us/en/ark/compare.html?productIds=149364,149365,149367,149366

Capped write speeds are likely related to TDP limits.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
The 16 and 32GB versions are rated identically to the Optane Memory.

Well yeah, 32GB is identical and the 16GB is very, very close (almost identical). The main difference is active power.

M10 16GB is 150 MB/s (2.0W active power) vs. Original Optane memory 16GB 145 MB/s (3.5W active power)
M10 32GB is 290 MB/s (2.5W active power) vs. Original Optane memory 32GB 290 MB/s (3.5W active power)



There's no TDP difference between the 16GB and the 32GB in the original version, while there's quite a bit of difference on the M10, leading me to believe the lower bandwidth might have been due to power as well, with 145MB/s per channel being a power-related limit.

If you look at this review: https://techreport.com/review/33338...8-gb-and-118-gb-solid-state-drives-reviewed/3

There's an indication that it's throttling at some points. If there's zero active power management, TDP levels have to be set conservatively, since it'll run at that speed all the time. If active power management exists, it can be made to run faster in all but the worst-case scenarios.

Further data based on the M.2 P4801X: https://ark.intel.com/content/www/us/en/ark/compare.html?productIds=149364,149365,149367,149366

Capped write speeds are likely related to TDP limits.

Looking at the 64GB M10, it has a 3.25W TDP (in contrast to the 2.5W and 2W TDPs of the 32GB M10 and 16GB M10 respectively), and it is also the fastest per die at 160 MB/s write (it has 4 dies and 640 MB/s total write). The 16GB M10 has 150 MB/s write and the 32GB M10 has 290 MB/s write (or 145 MB/s per die).

So the 16GB M10 and 32GB M10 are definitely limited in write per die compared to the 64GB M10, with the 64GB M10 able to go faster per die (at least for some period) thanks to its higher TDP.

But getting back to performance per die, it really does look to be not much greater than 150 MB/s, at least at the TDPs Intel is willing to supply.
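To make that per-die arithmetic explicit (die counts and rated speeds are the M10 figures discussed above; first-generation 3D XPoint dies are 16GB each, so 16/32/64GB means 1/2/4 dies):

```python
# Per-die sequential write, using the M10 figures quoted above
# (first-gen 3D XPoint dies are 16GB each, so 16/32/64GB = 1/2/4 dies).
m10 = {
    "16GB M10": {"dies": 1, "write_mb_s": 150, "tdp_w": 2.0},
    "32GB M10": {"dies": 2, "write_mb_s": 290, "tdp_w": 2.5},
    "64GB M10": {"dies": 4, "write_mb_s": 640, "tdp_w": 3.25},
}
for name, spec in m10.items():
    per_die = spec["write_mb_s"] / spec["dies"]
    print(f"{name}: {per_die:.0f} MB/s per die at {spec['tdp_w']} W")
# -> 150, 145 and 160 MB/s per die; only the higher-TDP 64GB part pushes a die
#    meaningfully past ~150 MB/s.
```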

Remember this data---> https://forums.anandtech.com/thread...rrent-and-future.2525180/page-3#post-39351786

Thinking about how poorly write scales at lower power, I can't imagine it doing that much better at higher power.

With that noted, I would think future lithography shrinks should reduce the power needed to write. (The following paper looks into some of the challenges and explains some of the benefits of scaling for phase-change memory: https://www.researchgate.net/public...ltimate_scaling_limits_of_phase-change_memory)

PCMs offer unique opportunities for low-power data storage, and appear to be among the few nonvolatile memory candidates whose properties (e.g. energy consumption) significantly improve with downscaling of the memory cells.

The switching time and energy consumption of the PCM cell are dependent on cell dimensions. Scaling PCM cells to nanometer dimensions decreases the thermal time constant as well as the heat capacity of the cell. Therefore, PCM devices improve with scaling in terms of energy consumption and switching speed. Nonetheless, the temperature rise, electro-thermal properties, and thermoelectric effects of nanometer-scale PCM cells have only been recently investigated.
 
Last edited:

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Thinking about how poorly write scales at lower power, I can't imagine it doing that much better at higher power.

That's funny, because the 900P/905P already does. The M15 will likely do 50% better at the same power.

Billy's numbers aren't representative, because the device is throttling to achieve lower power, while the manufacturer can do other things (like binning). It's like comparing desktop vs. laptop chips. You can't.

But getting back to performance per die, it really does look to be not much greater than 150 MB/s, at least at the TDPs Intel is willing to supply.

Remember how you were talking about using even smaller capacities (maybe 8GB?) to get more channels and thus more bandwidth? If it's power limited, then you won't get much higher bandwidth. Conversely, if it's not power limited, you get the 905P.

The most important thing about 1st gen vs. M10 is that the latter doubles the bandwidth at the same power, simply because it uses active power management. Hence you see the weird throttling on the M10/800P drives.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
900p 280GB has 21 Optane dies and write speed of 2000 MB/s.

Still 7 channels, thus ~300MB/s per channel. That's why the 900P/905P has die counts in multiples of 7. Channels matter, not dies. Dies per channel matter for NAND, because NAND dies are fundamentally slow.

By the way, I think Anandtech's tests could be better. Their burst QD1 numbers aren't seen anywhere else and are way too optimistic.
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
Still 7 channels, thus ~300MB/s per channel. Channels matter, not dies.

Channels are the limiting factor, but there also need to be enough Gen 1 dies to fill the 300 MB/s capacity of each channel.

The current 16GB and 32GB Optane memory don't do this because each Gen 1 die only provides ~150 MB/s write.

So in order to saturate the current dual-channel M10 controller (also used in the 800P), there need to be at least four Gen 1 dies. The only Optane Memory (or SSD) models using that many dies with that dual-channel controller are the 64GB M10 and the 58GB 800P. (The 118GB 800P also saturates it because it has eight dies, but it doesn't improve write any further due to the limitation of the controller, which is 2 channels x ~300 MB/s per channel.)
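A small model of that argument, using the ~300 MB/s per-channel and ~150 MB/s per-die figures from this thread: effective write speed is the lesser of what the channels can carry and what the attached dies can feed. This is a sketch of the reasoning, not Intel's actual controller behavior.

```python
# Effective sequential write as the lesser of channel capacity and aggregate
# die throughput, using the ~300 MB/s per-channel and ~150 MB/s per-die
# figures from this thread. A sketch of the argument, not controller internals.

def write_speed_mb_s(channels, dies, per_channel=300, per_die=150):
    return min(channels * per_channel, dies * per_die)

print(write_speed_mb_s(channels=2, dies=1))   # 16GB Optane Memory: 150, die-limited
print(write_speed_mb_s(channels=2, dies=2))   # 32GB: 300 (rated 290), near saturation
print(write_speed_mb_s(channels=2, dies=4))   # 64GB M10 / 58GB 800P: 600, channels saturated
print(write_speed_mb_s(channels=2, dies=8))   # 118GB 800P: still 600, controller-limited
print(write_speed_mb_s(channels=7, dies=21))  # 900P 280GB: 2100, close to the rated 2000
```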