Discussion Optane Client product current and future

cbn · Jun 5, 2018

IntelUser2000 said:
Do you have a certain usage scenario in mind, or are you looking for a general low end system?

A good Laptop for browsing that is lightweight with a big screen (using Intel Low Power display technology) and has long battery life.

Google has made high end Browser laptops for years, but I really do hope we see the Linux distros get involved as well. (Linux distro = more privacy than Google).

So with that noted, I do wonder what kind of power consumption there would be for various combinations of DDR4 vs. LPDDR4 vs. DDR Optane....at idle and with a bunch of (mostly) static webpages open? (My speculation is that the Optane persistent memory would be far better on power consumption than the DRAM for this type of work)

EDIT: It could even be some kind of wearable device (eg, augmented reality glasses for browsing also using Intel Low Power display technology). In fact, Mozilla has a new browser for this.

We believe that the future of the web will be heavily intertwined with virtual and augmented reality, and that future will live through browsers. That’s why we’re building Firefox Reality, a new kind of web browser that has been designed from the ground up to work on stand-alone virtual and augmented reality (or mixed reality) headsets.

cbn · Jun 7, 2018

Tom's hardware predicts Optane NVMe + 3D QLC NVME:

https://www.tomshardware.com/news/intel-optane-memory-qlc-cache,37223.html

The next shoe to drop could involve adding QLC-based SSDs to a bundle with next generation chipset components that allow Optane Memory to cache NVMe devices. The rumored Intel 660p QLC NVMe SSD looks like the leading candidate for this to bear fruit. The leaked deck shows capacities between 512GB and 2TB but QLC, will need a nudge to gain acceptance due to reliability and endurance concerns.

Sitting behind high-endurance Optane Memory (cache) will remove any immediate concerns regarding QLC's endurance. A bundle that gives users a free or very low cost 512GB boot drive will be difficult to pass over even for the most adamant Optane and QLC naysayers.

My biggest concern with 3D QLC (like it was with planar TLC) is data retention.

P.S. I am assuming the author is thinking about 16GB Optane paired with 512GB 3D QLC NVMe at the lowend (but I believe the 16GB Optane 4K QD1 write will be much slower* than 512GB 3D QLC NVMe. Sequential write will definitely be much slower).

*The same amount of Optane (16GB) as DDR interface would change this relationship to 16GB Optane 4K QD1 write >> 512GB 3D QLC NVMe 4K QD1 write. (I am concerned though how long till we see such a low end configuration). Could it be that one way to speed adoption of low end DDR interface Optane is to push more than four bits per cell into 3D NAND. This or more aggressive than usual lithography shrinks + (possibly) less planes per die? (And this is one reason why Intel is diverging from Micron for 3D NAND development?)

IntelUser2000 · Jun 8, 2018

cbn said:
My biggest concern with 3D QLC (like it was with planar TLC) is data retention.

Meh, QLC SSDs aren't trailblazers in sequential performance anyway. Plus, you get the random speed advantage. But yes, a 32GB makes more sense because 145MB/s is too low in any matter.

*The same amount of Optane (16GB) as DDR interface would change this relationship to 16GB Optane 4K QD1 write >> 512GB 3D QLC NVMe 4K QD1 write. (I am concerned though how long till we see such a low end configuration).

Yes, but who's going to do that? You buy this fancy new tech, which will be priced as such and you cache your lowly storage. No, you'll take that and aim it for being the only memory, like on a low-end system like $199 Chromebook/Stripped down Windows laptops, or you'll use that as persistent memory(and memory means system RAM).

Since no sane company will put such a big change in tech and aim it straight at the market that cares primarily about cost, only the latter will make sense for the next decade. Or two. This goes the same for potential 3D XPoint competitors.

The reason is you need it as a system memory for the insane QD1 random performance, or sequential bandwidth to really matter. Storage is not used for compute, and only helps on loading times, and even then when its not limited by other factors, like being on ancient code. This is why NVMe isn't that big of a deal over SATA.

Once its on a DIMM slot and its corresponding bus, then the system and OS can treat it as such and you can use it for real benefits. Like the hypothetical near-instant boot and load time future I talk about. NV memories are exciting because it opens the future for that. Forget about faster CPUs that are at best, 50% better than last generation, or 2x the cores. If you want real impact, having a computer that reduces all loading time to nearly zero is the real next gen. I think those NV memories represent the "Real 640K", where for most people they really won't need another computer until it breaks down.

Billy Tallis · Jun 8, 2018

cbn said:
The same amount of Optane (16GB) as DDR interface would change this relationship to 16GB Optane 4K QD1 write >> 512GB 3D QLC NVMe 4K QD1 write.

Not going to happen. We won't see Optane DIMMs smaller than 128GB unless Intel starts making much smaller 3D XPoint dies.

cbn · Jun 8, 2018

cbn said:
The same amount of Optane (16GB) as DDR interface would change this relationship to 16GB Optane 4K QD1 write >> 512GB 3D QLC NVMe 4K QD1 write.

Billy Tallis said:
Not going to happen. We won't see Optane DIMMs smaller than 128GB unless Intel starts making much smaller 3D XPoint dies.

I was actually thinking of the 16GB DDR interface Optane as a soldered (onto the motherboard) package in the same way we sometimes see DDR3/4 DRAM soldered on the motherboard as package rather than a SO-DIMM.

cbn · Jun 9, 2018

cbn said:
*The same amount of Optane (16GB) as DDR interface would change this relationship to 16GB Optane 4K QD1 write >> 512GB 3D QLC NVMe 4K QD1 write. (I am concerned though how long till we see such a low end configuration).

IntelUser2000 said:
Yes, but who's going to do that? You buy this fancy new tech, which will be priced as such and you cache your lowly storage. No, you'll take that and aim it for being the only memory, like on a low-end system like $199 Chromebook/Stripped down Windows laptops, or you'll use that as persistent memory(and memory means system RAM).

Since no sane company will put such a big change in tech and aim it straight at the market that cares primarily about cost, only the latter will make sense for the next decade. Or two. This goes the same for potential 3D XPoint competitors.

I was thinking of DDR interface Optane being used in addition to DRAM (see post #198 and #199). And if used in addition to DRAM it is not big change. All that would would happen is that the soldered on (possibly Intel basebopard strategy?) DDR Optane would be used for swap/OS (and cache for higher end Chromebooks and Linux notebooks* that also come wiith 2.5" HDD, eMMC or 3D QLC NVMe)

*now beginning to show up with the option for whole system automatic updates (a feature that I think will help Linux distros gain mainstream acceptance. For example, I know there were a lot of people on these forums recommending ChromeOS over Windows because it was far less likely to get a virus and was seen as maintenance free....so having the automatic updates in a community distro like like Linux Mint is very welcome because it opens up the possibility of an 1:1 alternative to ChromeOS.)

cbn · Jun 9, 2018

Something else I am wondering about soldered on (or DIMM) DDR interface Optane:

How well could it work with a DRAM-less 3D QLC NVMe SSD as host memory buffer?

cbn · Jun 13, 2018

Here is a PDF from 2009 on Phase change memory by Numonyx (acquired by Micron in 2010) :

http://www.pdl.cmu.edu/SDI/2009/slides/Numonyx.pdf

Pretty interesting that even as far back as 2009 they had the read latency only 2x as large as DRAM (is this an example of fast access deck design*), but I wonder how far that latency decrease could progress if Intel used cutting edge lithography?

*endurance was higher than 100,000 cycles according to other info in the deck though

https://www.pcper.com/reviews/Edito...y-Works/Selectors-Scaleability-and-Conclusion

We’re all used to die shrinks leading to lower endurance and other scaling problems for NAND flash memory, but the case is actually the opposite for PCM devices. Since the GST material is resistive and has mass that must be heated while also being thermally insulated from its surroundings, the smaller the better! As a matter of fact, one of the main things holding PCM technology back all of these years was that the required program currents were too high to be practical in a high-density device. Smaller cells translate to lower resistances across smaller areas, meaning less power needed to heat the smaller space, and therefore lower voltages and currents are needed to program a given cell. There is technically room to shrink further, but the materials science is extremely complex to even get perfectly segmented GST layers on a standard silicon wafer in the first place, so let’s be happy Intel and Micron have succeeded with a 20nm pitch for now.

cbn · Jun 13, 2018

Memory Scaling is Dead, Long Live Memory scaling (September 2014):

https://hps.ece.utexas.edu/yale75/qureshi_slides.pdf

So DRAM + PCM on NVDIMM-P or together in some arrangement (Eg, HBM (DRAM) stacks + Optane DIMMs.

However, I just to think Intel is aiming to completely DRAM*. (Maybe only SRAM + 3DXPoint (and other phase change memories) remain at some point?)

*With eDRAM perhaps being the last DRAM to eventually survive on Intel processors.

IntelUser2000 · Jun 13, 2018

Intel's future is also HBM. eDRAM was a stepping stone until their HBM implementations are ready.

(I just think maybe, you are posting too many times without anyone's response?)

cbn · Jun 13, 2018

IntelUser2000 said:
Intel's future is also HBM. eDRAM was a stepping stone until their HBM implementations are ready.

Thanks for pointing that out.

So (assuming PCM scaling works out) eventually processors with planar SRAM and on die HBM stacks using DRAM (for the bottom layers) and 3DXPoint (as the topmost layers) replaced with processors with 3D SRAM (processor cache) using on die HBM stacks comprised entirely of 3DXPoint?

nosirrahx · Jun 14, 2018

I did some Optane and VROC testing. The top pic is intentionally pre-patch OS and BIOS, the bottom has the OS and BIOS fully updated. The left is the 905P, the right is 4 way VROC 0 900P.

IntelUser2000 · Jun 14, 2018

That's a substantial drop in performance. In some ways, Optane is impacted most. So its probably very important that Cascade Lake server has some hardware Meltdown/Spectre mitigations.

CDM I think also underrates the impact of the Meltdown patch. The difference is greatest on random read/writes, where Optane is supposed to have the massive advantage.

Insert_Nickname · Jun 14, 2018

nosirrahx said:
I did some Optane and VROC testing.

Ouch. That should remove all doubts about the Spectre/Meltdown performance hit. Thanks for the demonstration.

thecoolnessrune · Jun 14, 2018

And this doesn't even include the "optional" speculative store bypass mitigation microcode that Intel should be releasing at the end of the month. Can't stop the pain train!

Brahmzy · Jun 14, 2018

Great tests - thanks. This mirrors my smelt down / 900p benchmarking exactly as well.
Pretty disappointing about the performance drops.

I’m not convinced Cascade and future chips are suddenly going to restore previous performance. I think people are misunderstanding what intel has to/is going to do there. I think to some degree, that level of performance is forever lost with intel chips. The way they may get it back is through slow architecture refreshes, process and die shrinks.
Cascade Lake isn’t going to suddenly restore all of that. Intel has had to “close the holes” that made that performance possible.

nosirrahx · Jun 14, 2018

Brahmzy said:
I’m not convinced Cascade and future chips are suddenly going to restore previous performance
Cascade Lake isn’t going to suddenly restore all of that. Intel has had to “close the holes” that made that performance possible.

Its going to help but you are right, this is a fundamental part of x86. What we need is a new platform entirely, one built with a lot more attention to security from the ground up.

What I see happening is a new platform that when released has the performance to emulate the current x86 at a level that makes migration to it painless.

The user would be able to run old code and new code on the same platform and the OS would simply emulate x86 when needed.

Insert_Nickname said:
Ouch. That should remove all doubts about the Spectre/Meltdown performance hit. Thanks for the demonstration.

Not only that but this clearly demonstrates that Optane makes very heavy use of speculative execution.

thecoolnessrune said:
And this doesn't even include the "optional" speculative store bypass mitigation microcode that Intel should be releasing at the end of the month.

We will likely see both microcode and OS updates all year, perhaps even a few next year as well. Now that attention is on this attack vector there will be a lot of researchers trying all kinds of creative ways of manipulating data to see what access they can gain, this is far from over.

thecoolnessrune · Jun 14, 2018

nosirrahx said:
Its going to help but you are right, this is a fundamental part of x86. What we need is a new platform entirely, one built with a lot more attention to security from the ground up.

What I see happening is a new platform that when released has the performance to emulate the current x86 at a level that makes migration to it painless.

The user would be able to run old code and new code on the same platform and the OS would simply emulate x86 when needed.

I can't see x86 holding on into eternity, but at the same time migration off the platform has been discussed for literally decades and I haven't seen a solid foot yet. PowerVM Lx86, Itanium, and overall large emulation sub-projects under projects like QEMU have certainly extended an olive branch, but nothing has come close to being a performant offering. And Intel openly threatening Qualcomm and Microsoft last year at the mere mention of running standard Windows programs on ARM (which would no doubt involve some sort of emulation), it's obvious that getting into the next architecture with backwards compatibility will require legal battles with behemoths, which will more than likely drag on for at least 5 years if not longer.

nosirrahx said:
Not only that but this clearly demonstrates that Optane makes very heavy use of speculative execution.

What I'm going to say is that the benchmarks you're running on Windows 10 leverage code that takes advantage of speculative execution that there is now more of a barrier against in regards to system calls. Other benchmarks, or software may not be as affected, depending on how close to root they are. For instance, VMWare's vSAN product promises to not be heavily affected, because its Storage Calls do not have to be made to the system. It's all going to depend on the software in use.

nosirrahx said:
We will likely see both microcode and OS updates all year, perhaps even a few next year as well. Now that attention is on this attack vector there will be a lot of researchers trying all kinds of creative ways of manipulating data to see what access they can gain, this is far from over.

I agree, we'll be dealing with this steadily for a long time to come, hopefully in less and less severe ways as the big targets are mitigated.

nosirrahx · Jun 14, 2018

thecoolnessrune said:
I can't see x86 holding on into eternity, but at the same time migration off the platform has been discussed for literally decades and I haven't seen a solid foot yet.

I don't see migration ever happening unless it involves bridging technology that lets people use all of their existing "suff" on a new platform.

As the new "stuff" created specifically for the new platform makes huge performance improvements adoption will pick up.

Intel likely is working on this in some capacity. The last thing they want is another competing technology, it is better to create your own replacement than to let someone else do that.

IntelUser2000 · Jun 14, 2018

thecoolnessrune said:
PowerVM Lx86, Itanium, and overall large emulation sub-projects under projects like QEMU have certainly extended an olive branch, but nothing has come close to being a performant offering.

It isn't just about performance. Emulation/code translation has compatibility issues. Granted looking at percentages it might look minor, but people care about the corner case scenarios. That's because there are lots of old code out there that require it working *exactly* and emulation doesn't provide that.

Its going to help but you are right, this is a fundamental part of x86. What we need is a new platform entirely, one built with a lot more attention to security from the ground up.

I don't think so. Eventually they'll get it. Remember, Meltdown exists in some ARM chips, but doesn't exist on AMD. Spectre impacts pretty much everyone.

PC impact seems negligible, but there are cases in server that can be significant: https://www.nextplatform.com/2018/01/30/reckoning-spectre-meltdown-performance-hit-hpc/

Here it shows in some benchmarks even the 950 Pro is impacted in a large way.
https://www.techspot.com/article/1556-meltdown-and-spectre-cpu-performance-windows/

cbn · Jun 16, 2018

Regarding the talk of Spectre and Meltdown......if Intel ends up closing the memory gap then the need for speculative execution could be eliminated.

Some related links for those interested:

http://personals.ac.upc.edu/mpajuelo/papers/MEDEA04.pdf

https://www.extremetech.com/computi...m-is-a-major-roadblock-to-exascale-and-beyond

In every case — and in a remarkably consistent fashion — latency improved by 20-30% in the same time that it took bandwidth to double. This problem is one we’ve been dealing with for decades — it’s been addressed via branch prediction, instruction sets, and ever-expanding caches.

IntelUser2000 · Jun 16, 2018

(Originally, I wanted to focus this thread on being about Optane on the client side. Lack of interest likely due to unfortunate pricing and relative newness of the technology seems to push the thread towards all things Optane. I wanted it similar to the old Atom thread I had years ago where all information was consolidated into one. I guess this works too.)

That being said, caches will always be faster than system memory. The reason is, like everything else in computing it can be summarized into "Laws of Physics". SRAM is very fast, not only because the technology is fast, but because the size is small. Large capacity means more distance for the electrons to travel meaning latency increases.

Not sure if its even remotely likely the gap can be mostly gone, nevermind eliminated. As always, if you can make a capacious system fast, the smaller capacity one can be made even faster.

cbn · Jun 16, 2018

IntelUser2000 said:
Not sure if its even remotely likely the gap can be mostly gone, nevermind eliminated.

According to the following the potential does exist:

https://news.stanford.edu/press/view/9468

Silicon chips can store data in billionths of a second, but phase-change memory could be 1,000 times faster, while using less energy and requiring less space.

Silicon Chips = RAM

https://arxiv.org/ftp/arxiv/papers/1602/1602.01885.pdf

Of course, reaching that full potential might take some time.

cbn · Jun 24, 2018

Looking at the teardown of Apple Macbook Retina 2017:

https://www.ifixit.com/Teardown/Retina+MacBook+2017+Teardown/92172

I wonder if we would eventually see the two NAND packages replaced with one high layer count (possibly lithography shrunk as well) 3D QLC NAND package and one Optane package? (This tied together into a single volume with Apple Fusion drive)

Same thing goes for AMD laptops if they get AMD StoreMI (ie, Enmotus FuzeDrive).

nosirrahx · Jun 24, 2018

cbn said:
I wonder if we would eventually see the two NAND packages replaced with one high layer count (possibly lithography shrunk as well) 3D QLC NAND package and one Optane package? (This tied together into a single volume with Apple Fusion drive)

Same thing goes for AMD laptops if they get AMD StoreMI (ie, Enmotus FuzeDrive).

I know that the Optane caching software is aware of file types and intentionally balances file level and block level caching to greatly improve performance while avoiding the caching of data that does not need accelerating.

Would Apple and AMD be able to create their own software that caches to Optane in the same way or does Optane have some sort of "secret sauce" that only Intel has access to?

In the case of Apple specifically the CPU, RAM, SSD and Optane could all be identical. Would Apple be able to engineer caching software on iOS that attains identical performance to the same hardware with Windows installed?

You are absolutely right about the single volume though, that is a very desirable feature for a lot of people. When I did IT a lot of people hated concepts like 'data drives' and multiple drive letters in general.

Discussion Optane Client product current and future

Lifer

Lifer

Elite Member

Senior member

Lifer

Lifer

Lifer

Lifer

Lifer

Elite Member

Lifer

Senior member

Elite Member

Diamond Member

Diamond Member

Senior member

Senior member

Diamond Member

Senior member

Elite Member

Lifer

Elite Member

Lifer

Lifer

Senior member