
Updated Knights Landing (KNL) Info.


Hans de Vries

Senior member
May 2, 2008
www.chip-architect.com
The HMC interface didn't work out, so they are using DDR4 channels for the time being:

http://www.eetimes.com/document.asp?doc_id=1322855&

Here is literal confirmation that Knights Landing doesn't use the HMC interface but DDR4 channels.
That's why Intel calls it MCDRAM rather than HMC memory.

Mike Black said:
"We plug(ed) off the open (HMC) interface of the consortium and we put on an interface optimized for Knights Landing"
https://www.youtube.com/watch?v=Jc6B0EZKUEU (at 4:15)

So the quote from the eetimes journalist about using DDR4 channels was correct:

eetimes said:
"Our HMC will be packaged with a very optimized interface, so they can be placed very close to the Xeon Phi using DDR4 channels," Mike Black, HMC technology strategist at Micron, told us. "And then all of that will be put into a common package that then drops into a single socket on the board."
http://www.eetimes.com/document.asp?doc_id=1322855&

HMC has a bandwidth of 160 GB/s to 240 GB/s per package. Knights Landing's 500 GB/s, although a lot, is only the equivalent of 2 or 3 HMCs with real HMC interfaces.
http://www.hybridmemorycube.org/files/SiteDownloads/HMC_Specification%201_0.pdf
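
For a rough sense of the numbers, here's a quick back-of-the-envelope sketch using only the figures above (the per-package range is from the HMC 1.0 spec, the 500 GB/s is Intel's quoted aggregate):

```python
# How many HMC-class packages would Knights Landing's quoted
# aggregate bandwidth correspond to?
knl_bw = 500                  # GB/s, the figure quoted for Knights Landing
hmc_low, hmc_high = 160, 240  # GB/s per package, HMC 1.0 spec range

print(knl_bw / hmc_high)      # ~2.1 packages at the high end
print(knl_bw / hmc_low)       # ~3.1 packages at the low end
```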
 

Ajay

Lifer
Jan 8, 2001
I would hope 14/16nm, but am kind of expecting 20nm.

Yeah, if current rumors are correct, Nvidia might sneak in one or two 20nm SKUs by the end of the year. That doesn't bode well for seeing 16FF SKUs in 2016.
 

ShintaiDK

Lifer
Apr 22, 2012
The question is whether nVidia and AMD will ever go below 20nm, due to cost.

And remember, 16FF, aka 20nm with FinFETs, is low-power optimized. So don't expect high-end GPUs on that.
 

ElFenix

Elite Member
Super Moderator
Mar 20, 2000
Yes, my bad, I forgot to mention I was talking about 3D stacking.
What I was trying to point out is that Haswell's Crystalwell was the first commercial x86 CPU with on-package memory. KNL will also use the same technology.

Crystalwell is just an L4 cache; in that sense, it's no different than an on-die cache. So, no, it's not the first commercial x86 CPU with on-package memory. If you meant to say it's the first commercial x86 CPU with a separate memory chip in-package, that's not correct either, because the Pentium Pro had that. And since it runs on separate clocks from the CPU, the Pentium II with cache on the daughtercard also counts.
 

xpea

Senior member
Feb 14, 2014
I would hope 14/16nm, but am kind of expecting 20nm.

Pascal is definitely TSMC 16FF (JHH talked about it after the Pascal announcement).

20nm will be used first for the Erista SoC (or Tegra M1; the M stands for Maxwell). No idea if we'll see GPUs on this node, as it's optimized for SoCs. 16FF is best suited for GPUs.
 

ShintaiDK

Lifer
Apr 22, 2012
xpea said:
Pascal is definitely TSMC 16FF (JHH talked about it after the Pascal announcement).

20nm will be used first for the Erista SoC (or Tegra M1; the M stands for Maxwell). No idea if we'll see GPUs on this node, as it's optimized for SoCs. 16FF is best suited for GPUs.

I would say TSMC's 20nm is best suited for high-performance dGPUs.

TSMC's 16FF is targeted at much lower power envelopes.
 

xpea

Senior member
Feb 14, 2014
Wait, this contradicts what ShintaiDK said a few posts ago.
Yep, and I don't agree with him.
If you look at Nvidia's roadmaps (SoC and GPU), they always planned to skip 20nm and go directly to 16FF. It's only because 16FF was delayed that they changed their mind and included stopgap solutions like Erista.
 

sefsefsefsef

Senior member
Jun 21, 2007
ShintaiDK said:
I would say TSMC's 20nm is best suited for high-performance dGPUs.

TSMC's 16FF is targeted at much lower power envelopes.

It might be true that it's not suited to high-clock CPU designs, but GPU workloads have the advantage of being embarrassingly parallel. You can reach the same performance target by halving the clock speed (to suit the low-power nature of TSMC's process) and doubling the functional units. Intel did this with its HD 5000 graphics (the ultrabook graphics without the eDRAM). This trick doesn't apply as cleanly to CPU performance targets, because those care more about single-threaded performance (in general), which is more closely tied to the manufacturing process (see Kaveri's single-thread performance and clock speed).
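
The usual back-of-the-envelope behind that trick, as a rough sketch: dynamic power scales roughly as P ∝ units · C · V² · f, and a halved clock usually permits a lower supply voltage. The 0.7x voltage factor below is purely an illustrative assumption, not a process figure:

```python
# Back-of-the-envelope dynamic power: P ~ units * C * V^2 * f.
# Throughput ~ units * f, so doubling units at half the clock keeps
# throughput constant while the lower clock permits a lower voltage.
def relative_power(units, voltage, freq):
    """Dynamic power relative to a 1-unit, nominal-V, nominal-f baseline."""
    return units * voltage**2 * freq

def relative_throughput(units, freq):
    return units * freq

baseline = relative_power(1.0, 1.0, 1.0)
wide_slow = relative_power(2.0, 0.7, 0.5)  # 0.7x voltage is an assumption

print(relative_throughput(2.0, 0.5))       # 1.0 -> same throughput
print(wide_slow / baseline)                # ~0.49 -> roughly half the power
```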
 

ShintaiDK

Lifer
Apr 22, 2012
sefsefsefsef said:
It might be true that it's not suited to high-clock CPU designs, but GPU workloads have the advantage of being embarrassingly parallel. You can reach the same performance target by halving the clock speed (to suit the low-power nature of TSMC's process) and doubling the functional units. Intel did this with its HD 5000 graphics (the ultrabook graphics without the eDRAM). This trick doesn't apply as cleanly to CPU performance targets, because those care more about single-threaded performance (in general), which is more closely tied to the manufacturing process (see Kaveri's single-thread performance and clock speed).

Doubling the chip size (or making it at least xx% bigger) is a really bad tradeoff for a 15% die size reduction, on top of the large FinFET performance penalty. Especially when 16FF doesn't seem to be around the corner and will be in high demand from other companies.
 

xpea

Senior member
Feb 14, 2014
Low power doesn't mean low speed.
Look at Intel's 22nm Tri-Gate/FinFET process: many iGPUs in Haswell run at 1.2GHz...
 

Homeles

Platinum Member
Dec 9, 2011
xpea said:
Yep, and I don't agree with him.
If you look at Nvidia's roadmaps (SoC and GPU), they always planned to skip 20nm and go directly to 16FF. It's only because 16FF was delayed that they changed their mind and included stopgap solutions like Erista.
I don't know exactly who's picking what, as there are a lot of mixed signals being given, but 16FF shouldn't be an issue for high performance GPUs. Even if it is a mobile-focused process, we're talking a full node density and performance increase, in addition to FinFETs. The pros would outweigh the cons.
 

Ajay

Lifer
Jan 8, 2001
ShintaiDK said:
I would say TSMC's 20nm is best suited for high-performance dGPUs.

TSMC's 16FF is targeted at much lower power envelopes.

Do we know any of the parametrics of 16FF? For sure, it won't have a $/xtor advantage compared to 20nm - but, at least with NV, they do need a large drop in power for their compute cards to keep pace with Intel. NV is getting much better at designing low-power GPUs - I can't see hitting 1 GHz or so being a limitation @ 16FF (I'm sure there will be ARM SoCs besting that by quite a bit).

So offhand, a large-die professional/HPC GPU @ 16nm seems likely for NV, even if they need to sit out a production year for prices to come down. This gives NV higher FLOPS/watt, especially with Pascal's HBM. They can drive down their costs by die harvesting for the consumer market - something Intel doesn't have going for it.

In any case, it will be interesting to see how things play out.
 

sefsefsefsef

Senior member
Jun 21, 2007
ShintaiDK said:
Doubling the chip size (or making it at least xx% bigger) is a really bad tradeoff for a 15% die size reduction, on top of the large FinFET performance penalty. Especially when 16FF doesn't seem to be around the corner and will be in high demand from other companies.

Well, the almighty Intel did it, so it must not have been that bad of an idea. Also, it's more like a 36% reduction (1-(16/20)^2 = .36), and you wouldn't necessarily have to halve and double. I just used those numbers as an example.
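
The ideal-shrink arithmetic, next to the ~15% figure TSMC itself quotes elsewhere in this thread (a sketch; no numbers beyond those two):

```python
# Ideal area reduction if "16nm" were a true linear shrink from 20nm:
ideal = 1 - (16 / 20) ** 2
print(ideal)        # 0.36 -> a 36% smaller die for the same design

# TSMC's own claim for its 16nm FinFET generation is far more modest:
tsmc_claim = 0.15   # ~15% die size reduction vs 20SoC
```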
 

jhu

Lifer
Oct 10, 1999
This thing will also come as a standalone CPU! Was waiting for Haswell-E, but I think I'll wait for this instead since my FX8350 is still doing well.
 

Lepton87

Platinum Member
Jul 28, 2009
Why do you think 16FF will clock so low? Usually with node shrinks, the low-power variant of the new node at least matches the transistor performance of the high-performance variant of the previous node. IDC said that when he was working at TI, they always at least matched the transistor performance of the previous high-performance node with the new full-node-shrink low-power node. So their 90nm high-performance node would have similar transistor performance to their 65nm low-power node, and he said that was the norm in the industry. 16FF won't be a full node shrink from 20nm, but 20nm LP should at least match 28nm HP, so even if TSMC never does a 16FF HP variant and only offers 16FF LP, we should be at least where we are now in terms of transistor performance, switching speed, drive current, etc.
At least that's what I gathered from IDC's posts; I might have misinterpreted something.
 

DrMrLordX

Lifer
Apr 27, 2000
jhu said:
This thing will also come as a standalone CPU! Was waiting for Haswell-E, but I think I'll wait for this instead since my FX8350 is still doing well.

I noticed this as well. What socket will the standalone CPU use? LGA2011? Can you put this on an MP board with other Xeons?
 

Hans de Vries

Senior member
May 2, 2008
www.chip-architect.com
Reading this forum generally causes electroshocks to my brain :\

--- edit: Let me remove the emotional content here, :) Hans -----


Cadence said:
16nm FinFET Processes
The 16FF and 16FF+ technologies are "ready for prime time," according to Sun (left). He noted that the 16FF yield has already caught up with the 20nm planar (20SoC) process node. As a second-generation FinFET technology, he said, 16FF+ can provide an additional 15% die size reduction compared to 20SoC.
Liu said that TSMC plans 15 16FF tapeouts this year, and that compared to 20SoC, 16FF can provide a 40% performance increase at the same power consumption. 16FF+ allows an additional 15% performance increase. Volume production for the 16nm FinFET nodes is expected in 2015. "We are confident that our customers can use this [16nm] technology to produce mobile devices superior to those produced by IDMs," he said.
Hou spoke in detail about TSMC's IP silicon validation for 16FF. He said the company has finished silicon validation for high-speed and high-density standard cell libraries, including more than 8,000 cells. The silicon report shows "very good SPICE to silicon chip correlation." As for memory, TSMC has taped out more than 250 SRAM instances and has finished silicon validation.
Pointing to a 128Mb compiled SRAM instance, Hou said that TSMC can reduce minimum Vcc (supply voltage) by more than 300mV. Peripheral logic can run as low as 0.3 volts, providing further reductions in chip power. TSMC has also completed silicon validation for 1.8V and 3.3V I/Os, analog IP, and eFuse metal.
"All of the silicon reports will be available in two to three weeks," Hou said. "The 16nm FinFET process is very mature and the ecosystem is ready for your design."
And what about 16FF+? Hou said that TSMC was able to improve power, performance, and area in this "second generation" FinFET technology for four reasons:

  • Learning from 20SoC production has allowed for better process control, and as a result, signoff corners have been tightened so as to reduce the need for over-design
  • Device enhancement
  • Middle end of line (MEOL) improvements
  • Back end of line (BEOL) improvements
Combine all these factors, Hou said, and a 16FF+ ring oscillator simulation will show a 20% to 23% speed improvement compared to 16FF. More specifically, standard cells show a 16% to 18% speed improvement, memory shows a 17%-19% speed improvement, eFUSE shows a 13% speed improvement, and I/O devices provide a 3% speed improvement. However, the 16FF+ technology significantly reduces I/O device leakage.
Analog IP, such as PLLs and SerDes, shows a 15% active power reduction. For DDR4 IP, Hou said, TSMC has seen a 20% standby power reduction. All of the design kits and collateral for 16FF+ will be ready by the end of April 2014. Foundation IP will be ready by the end of May, and the complete memory compiler will be available in July, although "for key instances we will support you in May and June," Hou said.

http://www.cadence.com/Community/bl...-ahead-for-16nm-finfet-plus-10nm-and-7nm.aspx
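
Compounding the headline figures from that writeup (a sketch using only the numbers quoted above):

```python
# 16FF: +40% performance vs 20SoC at the same power.
# 16FF+: roughly another +15% on top of 16FF.
ff16_vs_20soc = 1.40
ff16p_vs_ff16 = 1.15

print(ff16_vs_20soc * ff16p_vs_ff16)  # ~1.61x vs 20SoC, compounded
```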

Profanity is not allowed in the technical forums. Also, if you could keep the meta commentary to yourself we would appreciate it.
-ViRGE
 

ShintaiDK

Lifer
Apr 22, 2012
sefsefsefsef said:
Well, the almighty Intel did it, so it must not have been that bad of an idea. Also, it's more like a 36% reduction (1-(16/20)^2 = .36), and you wouldn't necessarily have to halve and double. I just used those numbers as an example.

TSMC's 16FF is not a real 16nm shrink; TSMC themselves only expect a ~15% die size reduction.
 

Lepton87

Platinum Member
Jul 28, 2009
Hans de Vries said:
Reading this forum generally causes electroshocks to my brain :\

It's knee-deep in [stuff], as far as the eye can see, pooped out relentlessly by some of the 24/7 posters.
Is it about my post?
 

pTmdfx

Member
Feb 23, 2014
sefsefsefsef said:
It might be true that it's not suited to high-clock CPU designs, but GPU workloads have the advantage of being embarrassingly parallel.
It is quite interesting to see somebody here relate the scale of chips that can be built on a node to whether or not the node is low-power... in its positioning. Hmm.
 

jdubs03

Golden Member
Oct 1, 2013
Knights Landing looks pretty nice, and the ability to socket it could become a competitive advantage.

However, look at AMD and their FirePro W9100: 2.1 DP TFLOPS on 28nm. That's very impressive. Imagine a 16FF/16FF+ successor to the W9100; the potential could be 4-5 DP TFLOPS.
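
A rough sketch of where a 4-5 DP TFLOPS figure could come from; the unit and clock scaling factors below are purely illustrative assumptions, not leaked specs:

```python
# Speculative scaling from the W9100's quoted 2.1 DP TFLOPS on 28nm.
w9100_dp_tflops = 2.1

unit_scaling = 2.0   # assumed: roughly double the shader count on 16FF/+
clock_scaling = 1.1  # assumed: a modest ~10% clock bump

print(w9100_dp_tflops * unit_scaling * clock_scaling)  # ~4.6 DP TFLOPS
```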