
Updated Knights Landing (KNL) Info.

ShintaiDK

Lifer
http://vr-zone.com/articles/intel-unveils-knights-landing/79686.html


It's still on track to be the first stacked-DRAM product we're going to see. dGPUs may follow sometime in 2016 or later.

The Silvermont cores have been extended with 512-bit AVX 3.2 support and 4 threads per core. No TSX support in the cores; otherwise they should be fully compatible with the instruction sets of the mainstream CPUs at the time.
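To put that vector width in perspective, here's a quick back-of-the-envelope sketch. It assumes one FMA-capable 512-bit pipe per core, which the slides don't confirm; the numbers are just arithmetic on the vector width.

```python
# Back-of-the-envelope peak throughput for a 512-bit vector unit.
# Assumption (not from the slides): one FMA-capable 512-bit pipe per
# core, counting an FMA as 2 FLOPs.

VECTOR_BITS = 512

def lanes(element_bits):
    """Number of SIMD lanes for a given element width."""
    return VECTOR_BITS // element_bits

def peak_flops_per_cycle(element_bits, fma=True):
    """FLOPs per core per cycle; an FMA counts as multiply + add."""
    return lanes(element_bits) * (2 if fma else 1)

print(lanes(64))                  # 8 double-precision lanes
print(peak_flops_per_cycle(64))   # 16 DP FLOPs/cycle with FMA
print(peak_flops_per_cycle(32))   # 32 SP FLOPs/cycle with FMA
```

Multiply by core count and clock to get a rough peak figure for the whole chip.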
 
Interesting that they have partnered with Micron- I expected Intel to fab the on-package memory themselves.

Looking like a very nice product; full binary compatibility means that the x86 ISA will actually have a point. The ball's in NVidia's court now: if they don't want to lose Tesla customers, they need to get stacked RAM out fast.
 
It's still on track to be the first stacked-DRAM product we're going to see. dGPUs may follow sometime in 2016 or later.

The Silvermont cores have been extended with 512-bit AVX 3.2 support and 4 threads per core. No TSX support in the cores; otherwise they should be fully compatible with the instruction sets of the mainstream CPUs at the time.

It clearly says on-package; this is the same thing as Iris Pro. Stacked RAM is on-die.
 
It clearly says on-package; this is the same thing as Iris Pro. Stacked RAM is on-die.

Not necessarily. You can have a stack that is on-package, next to the main CPU/GPU die. This is exactly what NVidia is doing with Pascal (and what AMD will probably do too).
 
It clearly says on-package; this is the same thing as Iris Pro. Stacked RAM is on-die.

Hynix HBM is portrayed the exact same way. Visual example:

[Image: HBM SoC diagram; green = package]

Not sure where you got the assumption that stacked DRAM has to be on-die.

As said by NTMBK, the NVidia Pascal prototype has on-package stacked DRAM as well:

[Image: NVidia Pascal prototype board]
 
The Intel one and the two above (HBM and NVIDIA) are on-interposer.
GlobalFoundries' TSV (through-silicon via) approach is 3D stacked on-die.

Edit: You can also see that the HBM memory chips are themselves 3D stacked (one layer upon the other).
 
It's interesting that the uArch is Silvermont/Airmont(?) at 14nm rather than Goldmont, which I expect to perform significantly better; maybe that uArch is for the 2017 product at 10nm.
 
The Intel one and the two above (HBM and NVIDIA) are on-interposer.
GlobalFoundries' TSV (through-silicon via) approach is 3D stacked on-die.

Edit: You can also see that the HBM memory chips are themselves 3D stacked (one layer upon the other).

You seem to have confused interposer (2.5D) and vertical stacking (3D) with 3D memory stacking. And 3D memory stacking isn't new.

[Image: 3D-stacked HBM memory]


For high-performance devices they will use 2.5D due to thermal issues.

And the reason why it's not coming anytime soon (size):
[Image: TSV roadmap]
 
It's interesting that the uArch is Silvermont/Airmont(?) at 14nm rather than Goldmont, which I expect to perform significantly better; maybe that uArch is for the 2017 product at 10nm.

Remember, it's a modified Silvermont: AVX 3.2, 4 threads per core, etc.
 
It's interesting that the uArch is Silvermont/Airmont(?) at 14nm rather than Goldmont, which I expect to perform significantly better; maybe that uArch is for the 2017 product at 10nm.
It doesn't matter: what matters are the wide vector units and the multiple threads. It also means they probably didn't keep much from Silvermont, since adding SMT to a processor requires changes all over the place. The interconnect is also surely vastly different, and the memory controllers have nothing in common.

I guess calling it a "Silvermont arch" core means little, except that it's a two-way superscalar core with OoOE and that it's low power.
 
Yes, my bad, I forgot to mention I was talking about 3D stacking.
What I was trying to point out is that Haswell with Crystal Well was the first commercial x86 CPU with on-package memory. KNL will also use the same technology.
 
They didn't mention latency in their otherwise extremely enthusiastic promotional deck, so I'm going to assume that things are as "bad" as GDDR5 or worse.

I'm guessing that's why HT is now at 4 threads/core, even though one wouldn't think there'd be enough resources to support all 4 threads well. The extra threads would hide the stalls, since each ns of downtime now means more lost potential FLOPS for a task.
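A toy model of why extra threads help: if each thread alternates compute with memory stalls, a round-robin core can overlap one thread's stall with another's compute. The cycle counts below are made up for illustration, not KNL numbers.

```python
# Toy round-robin SMT model: each thread alternates C compute cycles
# with S stall cycles. With T threads, the core has T*C cycles of work
# to fill each C+S window, so utilization saturates once T*C >= C+S.
# Cycle counts are illustrative only.

def core_utilization(threads, compute_cycles, stall_cycles):
    busy = threads * compute_cycles
    window = compute_cycles + stall_cycles
    return min(1.0, busy / window)

# 10 compute cycles per 30-cycle memory stall:
for t in (1, 2, 4):
    print(t, core_utilization(t, 10, 30))
# 1 thread: 0.25, 2 threads: 0.5, 4 threads: 1.0
```

In this (very rough) model, four threads are exactly enough to fully hide a stall three times as long as the compute phase.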
 
They didn't mention latency in their otherwise extremely enthusiastic promotional deck, so I'm going to assume that things are as "bad" as GDDR5 or worse.

I'm guessing that's why HT is now at 4 threads/core, even though one wouldn't think there'd be enough resources to support all 4 threads well. The extra threads would hide the stalls, since each ns of downtime now means more lost potential FLOPS for a task.
The 4 threads already exist on the current Xeon Phi, and their use is more to hide instruction latency than external (or on-package) memory latency. See this article for instance.

Also note that memory latency depends a lot on where your core is located due to the memory controllers and the cores being on a ring bus. Getting the most out of such an architecture surely is difficult 🙂
 
Interesting that they have partnered with Micron- I expected Intel to fab the on-package memory themselves.
Eh? Intel announced that they partnered with Micron for HMC years ago.
They didn't mention latency in their otherwise extremely enthusiastic promotional deck, so I'm going to assume that things are as "bad" as GDDR5 or worse.

I'm guessing that's why HT is now at 4 threads/core, even though one wouldn't think there'd be enough resources to support all 4 threads well. The extra threads would hide the stalls, since each ns of downtime now means more lost potential FLOPS for a task.
That's an odd conclusion to come to, given that the memory moves closer to the memory controllers.
AMD has an exclusivity agreement for the 4 Gb/8 Gb 2y-nm 4-Hi stack sizes. TSMC will have HBM and HMCC support in place by Q4 2014.
Do you have a source for that? I'd be interested in seeing it. Fits in pretty well with Nvidia's Pascal release date.
 
I have a lot of optimism that this is the right route for parallel computing in the future; it's just more widely applicable to the broader software market than a GPU architecture is.
 
That's an odd conclusion to come to, given that the memory moves closer to memory controllers.
The distance is just one part of the latency. There's also the latency of the addressing lines in the arrays inside the memory chips. GDDR5 is not great at that.

However, with so many threads, plus decent amounts of cache, RAM latency is likely not a high priority.
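That trade-off is easy to see with a simple two-level average-latency model. The latencies below are made-up round numbers, not GDDR5 or HMC specs; the point is only how strongly the cache hit rate dominates the average.

```python
# Simple two-level memory model:
#   avg = hit_rate * cache_latency + (1 - hit_rate) * dram_latency
# Latencies are illustrative round numbers, not real part specs.

def avg_latency(hit_rate, cache_ns, dram_ns):
    return hit_rate * cache_ns + (1 - hit_rate) * dram_ns

print(avg_latency(0.95, 5, 100))  # ~9.75 ns
print(avg_latency(0.80, 5, 100))  # ~24 ns
```

With a decent hit rate the DRAM latency term shrinks fast, which is why raw GDDR5-class latency may matter less than it first appears.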
 
So, according to the slides, this chip isn't restricted to PCI-e slots. Any chance we'll see it (or a successor) sharing QPI with "normal" Xeons?
 