
Updated Knights Landing (KNL) Info.

ShintaiDK

Lifer
http://vr-zone.com/articles/intel-unveils-knights-landing/79686.html


It's still on track to be the first stacked-DRAM product we're going to see. dGPUs may follow sometime in 2016 or later.

The Silvermont cores have been extended with 512-bit AVX 3.2 support and 4 threads per core. No TSX support in the cores; otherwise they should be fully compatible with the instruction sets of the mainstream CPUs at the time.
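To put that vector width in perspective, here's a quick back-of-the-envelope sketch. It assumes one FMA-capable 512-bit pipe per core, which the slides don't confirm; the numbers are just arithmetic on the vector width.

```python
# Back-of-the-envelope peak throughput for a 512-bit vector unit.
# Assumption (not from the slides): one FMA-capable 512-bit pipe per
# core, counting an FMA as 2 FLOPs.

VECTOR_BITS = 512

def lanes(element_bits):
    """Number of SIMD lanes for a given element width."""
    return VECTOR_BITS // element_bits

def peak_flops_per_cycle(element_bits, fma=True):
    """FLOPs per core per cycle; an FMA counts as multiply + add."""
    return lanes(element_bits) * (2 if fma else 1)

print(lanes(64))                  # 8 double-precision lanes
print(peak_flops_per_cycle(64))   # 16 DP FLOPs/cycle with FMA
print(peak_flops_per_cycle(32))   # 32 SP FLOPs/cycle with FMA
```

Multiply by core count and clock to get a rough peak figure for the whole chip.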
 
Interesting that they have partnered with Micron- I expected Intel to fab the on-package memory themselves.

Looking like a very nice product; full binary compatibility means that the x86 ISA will actually have a point. The ball's in NVidia's court now: if they don't want to lose Tesla customers, they need to get stacked RAM out fast.
 
It's still on track to be the first stacked-DRAM product we're going to see. dGPUs may follow sometime in 2016 or later.

The Silvermont cores have been extended with 512-bit AVX 3.2 support and 4 threads per core. No TSX support in the cores; otherwise they should be fully compatible with the instruction sets of the mainstream CPUs at the time.

It clearly says on-package; this is the same thing as Iris Pro. Stacked RAM is on-die.
 
It clearly says on-package; this is the same thing as Iris Pro. Stacked RAM is on-die.

Not necessarily. You can have a stack that is on-package, next to the main CPU/GPU die. This is exactly what NVidia is doing with Pascal (and what AMD will probably do too).
 
It clearly says on-package; this is the same thing as Iris Pro. Stacked RAM is on-die.

Hynix HBM is portrayed the exact same way. Visual example:

[Image: HBM SoC diagram; green = package]

Not sure where you got the assumption that stacked DRAM has to be on-die.

As said by NTMBK, the NVidia Pascal prototype has on-package stacked DRAM as well:

[Image: NVidia Pascal prototype board]
 
The Intel one and the two above (HBM and NVIDIA) are on-interposer.
GlobalFoundries' TSV (through-silicon via) approach is 3D stacked on-die.

Edit: You can also see that the HBM memory chips are themselves 3D stacked (one layer upon the other).
 
It's interesting that the uArch is Silvermont/Airmont(?) at 14nm rather than Goldmont, which I expect to perform significantly better; maybe that uArch is for the 2017 product at 10nm.
 
The Intel one and the two above (HBM and NVIDIA) are on-interposer.
GlobalFoundries' TSV (through-silicon via) approach is 3D stacked on-die.

Edit: You can also see that the HBM memory chips are themselves 3D stacked (one layer upon the other).

You seem to have confused interposer (2.5D) and vertical stacking (3D) with 3D memory stacking. And 3D memory stacking isn't new.

[Image: 3D-stacked HBM memory]


For high-performance devices they will use 2.5D due to thermal issues.

And the reason why it's not coming anytime soon (size):
[Image: TSV roadmap]
 
It's interesting that the uArch is Silvermont/Airmont(?) at 14nm rather than Goldmont, which I expect to perform significantly better; maybe that uArch is for the 2017 product at 10nm.

Remember, it's a modified Silvermont: AVX 3.2, 4 threads per core, etc.
 
It's interesting that the uArch is Silvermont/Airmont(?) at 14nm rather than Goldmont, which I expect to perform significantly better; maybe that uArch is for the 2017 product at 10nm.
It doesn't matter: what matters are the wide vector units and the multiple threads. It also means they probably didn't keep much from Silvermont, since adding SMT to a processor requires changes all over the place. The interconnect is also surely vastly different, and the memory controllers have nothing in common.

I guess calling it a "Silvermont arch" core means little, except that it's a two-way superscalar core with OoOE and that it's low power.
 
Yes, my bad, I forgot to mention I was talking about 3D stacking.
What I was trying to point out is that Haswell with Crystal Well was the first commercial x86 CPU with on-package memory. KNL will also use the same technology.
 
They didn't mention latency in their otherwise extremely enthusiastic promotional deck, so I'm going to assume that things are as "bad" as GDDR5 or worse.

I'm guessing that's why HT is now at 4 threads/core, even though one wouldn't think there'd be enough resources to support all 4 threads well. The extra threads would hide the stalls, since each ns of downtime now means more lost potential FLOPS for a task.
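A toy model of why extra threads help: if each thread alternates compute with memory stalls, a round-robin core can overlap one thread's stall with another's compute. The cycle counts below are made up for illustration, not KNL numbers.

```python
# Toy round-robin SMT model: each thread alternates C compute cycles
# with S stall cycles. With T threads, the core has T*C cycles of work
# to fill each C+S window, so utilization saturates once T*C >= C+S.
# Cycle counts are illustrative only.

def core_utilization(threads, compute_cycles, stall_cycles):
    busy = threads * compute_cycles
    window = compute_cycles + stall_cycles
    return min(1.0, busy / window)

# 10 compute cycles per 30-cycle memory stall:
for t in (1, 2, 4):
    print(t, core_utilization(t, 10, 30))
# 1 thread: 0.25, 2 threads: 0.5, 4 threads: 1.0
```

In this (very rough) model, four threads are exactly enough to fully hide a stall three times as long as the compute phase.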
 
They didn't mention latency in their otherwise extremely enthusiastic promotional deck, so I'm going to assume that things are as "bad" as GDDR5 or worse.

I'm guessing that's why HT is now at 4 threads/core, even though one wouldn't think there'd be enough resources to support all 4 threads well. The extra threads would hide the stalls, since each ns of downtime now means more lost potential FLOPS for a task.
The 4 threads already exist on the current Xeon Phi, and their use is more to hide instruction latency than external (or on-package) memory latency. See this article for instance.

Also note that memory latency depends a lot on where your core is located due to the memory controllers and the cores being on a ring bus. Getting the most out of such an architecture surely is difficult 🙂
 
Interesting that they have partnered with Micron- I expected Intel to fab the on-package memory themselves.
Eh? Intel announced that they partnered with Micron for HMC years ago.
They didn't mention latency in their otherwise extremely enthusiastic promotional deck, so I'm going to assume that things are as "bad" as GDDR5 or worse.

I'm guessing that's why HT is now at 4 threads/core, even though one wouldn't think there'd be enough resources to support all 4 threads well. The extra threads would hide the stalls, since each ns of downtime now means more lost potential FLOPS for a task.
That's an odd conclusion to come to, given that the memory moves closer to the memory controllers.
AMD has an exclusivity agreement for the 4 Gb/8 Gb 2y-nm 4-Hi stack sizes. TSMC will have HBM and HMCC support in place by Q4 2014.
Do you have a source for that? I'd be interested in seeing it. Fits in pretty well with Nvidia's Pascal release date.
 
I have a lot of optimism that this is the right route for parallel computing in the future; it's just more widely applicable to the broader software market than a GPU architecture is.
 
That's an odd conclusion to come to, given that the memory moves closer to memory controllers.
The distance is just one part of the latency. There's also the latency of the addressing lines in the arrays inside the memory chips. GDDR5 is not great at that.

However, with so many threads, plus decent amounts of cache, RAM latency is likely not a high priority.
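That trade-off is easy to see with a simple two-level average-latency model. The latencies below are made-up round numbers, not GDDR5 or HMC specs; the point is only how strongly the cache hit rate dominates the average.

```python
# Simple two-level memory model:
#   avg = hit_rate * cache_latency + (1 - hit_rate) * dram_latency
# Latencies are illustrative round numbers, not real part specs.

def avg_latency(hit_rate, cache_ns, dram_ns):
    return hit_rate * cache_ns + (1 - hit_rate) * dram_ns

print(avg_latency(0.95, 5, 100))  # ~9.75 ns
print(avg_latency(0.80, 5, 100))  # ~24 ns
```

With a decent hit rate the DRAM latency term shrinks fast, which is why raw GDDR5-class latency may matter less than it first appears.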
 
So, according to the slides, this chip isn't restricted to PCI-e slots. Any chance we'll see it (or a successor) sharing QPI with "normal" Xeons?
 