Intel Silently Releases Knights Mill

Dayman1225

Golden Member
Aug 14, 2017
1,152
974
146
Intel Knights Mill: Last Generation of Larrabee Has Launched (German)

Ark Page Here
[Eight attached images not reproduced]
 

BigDaveX

Senior member
Jun 12, 2014
440
216
116
Interesting how much the diagram of Knights Landing's execution core resembles a Bulldozer module. :D
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
Interesting how much the diagram of Knights Landing's execution core resembles a Bulldozer module. :D
Yeah, it really goes back to the point that it's about execution and not the technology. BD wasn't only AMD's key to increasing "core count" without having to replicate everything; I think there were hopes it would evolve to more compute units per module in future archs based around the design (not the evolutions we saw). And I think, as this is showing, that as they simplified and rearranged what a CPU was and how to utilize it, they could get to a point with future die shrinks and tech changes where CPU units and GPU units could become the same thing. All of a sudden you would have one upgradable part that could handle both and that, like GPUs, could continue to expand well past the growth of a CPU. That would have allowed AMD to challenge Intel in settings where they hadn't been able to compete.

But they missed the important first step: you have to be competitive with what is already out there. I think Fusion in that sense is dead for now, and it's more about sharing resources.

Maybe if Intel moves KNL and its future offerings away from being a device that has to be specifically coded for and only used in a single setting, to something a little more open, we could see this idea come back.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Interesting how much the diagram of Knights Landing's execution core resembles a Bulldozer module. :D

You are probably referring to the image that refers to the two cores as a tile with 1MB L2 in between?

That's actually the only thing it has in common with Bulldozer. It's just modified Atom cores with a shared L2 cache. Bulldozer splits the execution units inside a core.

Back to Xeon Phi: Benchmarks for Nvidia's Volta accelerators have been revealed, and contrary to expectations for the Tensor core, the gains are significantly less than 2x compared to Pascal. In fact the average is closer to the difference in memory bandwidth between the two. The explanation was that Tensor operations are a fraction of the whole code.

That reinforces my belief that if they hadn't had 10nm problems, and 10nm Knights Hill had been released late this year, it might not have been cancelled. Note that it could have been even earlier, since Knights Landing was itself delayed by half a year. Had it arrived in early 2017, they would have had a credible alternative to Volta.
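
The explanation given for the Volta numbers, that Tensor operations are only a fraction of the whole code, is essentially Amdahl's law. A minimal sketch of that arithmetic follows; the fractions and the 8x unit speedup are hypothetical illustration values, not measurements from any Volta benchmark, and `amdahl_speedup` is a name made up for this example.

```python
def amdahl_speedup(accelerated_fraction: float, unit_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if unit_speedup <= 0:
        raise ValueError("unit_speedup must be positive")
    # The untouched part keeps its original runtime share; the accelerated
    # part shrinks by the unit speedup.
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / unit_speedup)


# Hypothetical inputs: share of runtime that can use the faster unit,
# and an assumed 8x speedup for that share.
for fraction in (0.1, 0.3, 0.5):
    overall = amdahl_speedup(fraction, 8.0)
    print(f"{fraction:.0%} of runtime accelerated 8x -> {overall:.2f}x overall")
```

Under this model, if half the runtime or less can use the faster unit, the overall gain stays below 2x no matter how fast that unit is, because the untouched half bounds the result at 1 / (1 - 0.5) = 2.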
 

amd6502

Senior member
Apr 21, 2017
971
360
136
You are probably referring to the image that refers to the two cores as a tile with 1MB L2 in between?

That's actually the only thing it has in common with Bulldozer. It's just modified Atom cores with a shared L2 cache. Bulldozer splits the execution units inside a core.

I was thinking of the 4-wide (2 ALU + 2 AGU) integer core. The stark contrast is that one is CMT and the other is SMT4. I'd love to see BD evolve another generation to add some primitive multithreading to it; imagine a module that could handle 3 or 4 threads rather than 2.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
I was thinking of the 4-wide (2 ALU + 2 AGU) integer core. The stark contrast is that one is CMT and the other is SMT4. I'd love to see BD evolve another generation to add some primitive multithreading to it; imagine a module that could handle 3 or 4 threads rather than 2.

I see. They still have nothing in common though. Bulldozer has two sets of 2 ALUs, which are fed by a 4-wide issue decoder. The modified Silvermont core used here is just a regular 2-wide decode with 4-way SMT.