New Zen microarchitecture details

Page 119

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
I'm not sure DP is even relevant in this context today. It's a litho RET technique, though: it allows better optical focusing for small features and edges. Using it adds complexity, though, and puts limitations on the chip design, performance and variability (like misalignment between critical layers).

For 14nm LPP to reach its highest density and smallest SRAM size, you must use a lot of M1 (metal layer) in your IC design, and to use M1 on 14nm LPP you need Double Patterning.
As I tried to explain earlier, and as you said, the problems associated with Double Patterning (misalignment etc.) raise the total cost of the wafer and thus the price of the final product.

But for your design to use all the 14nm FF capabilities of the process (14nm LPP), you need to use M1 to get the highest density and highest performance. If you do not use any M1 layers, your end product (chip) will be cheaper (no Double Patterning), but it will be bigger (lower density) and will have lower performance (higher resistance) and higher power consumption (higher resistance).

So when they say they will use a "density-optimised version of 14nm FinFET", they are most probably going to use a lot of M1 layers and are talking about double patterning, not HDL like we had on 28nm.
 

SocketF

Senior member
Jun 2, 2006
236
0
71
So when they say they will use a "density-optimised version of 14nm FinFET", they are most probably going to use a lot of M1 layers and are talking about double patterning, not HDL like we had on 28nm.
Why would they be so stupid and not use their biggest competitive advantage?
 

KTE

Senior member
May 26, 2016
478
130
76
But for your design to use all the 14nm FF capabilities of the process (14nm LPP), you need to use M1 to get the highest density and highest performance. If you do not use any M1 layers, your end product (chip) will be cheaper (no Double Patterning), but it will be bigger (lower density) and will have lower performance (higher resistance) and higher power consumption (higher resistance).

M1 is a layer name. It's there in every chip, and it's the most congested layer. It's usually formed in the BEOL, before Cu CMP on the platens.

DP is a litho resolution enhancement technique.

You are greatly confusing the two (and many more elements) :)

Sent from HTC 10
(Opinions are own)
 
  • Like
Reactions: Arachnotronic

NostaSeronx

Diamond Member
Sep 18, 2011
3,687
1,222
136
So when they say they will use a "density-optimised version of 14nm FinFET", they are most probably going to use a lot of M1 layers and are talking about double patterning, not HDL like we had on 28nm.
AMD is using 9-track LPP, which is the same as for dGPU.

14LPP = Tri-Gate, gate-last, replacement metal gate, bulk FinFET with eSiGe PMOS and eSiP NMOS. Its standard cells are 10.5T performance-optimized LPP, 9T density-optimized LPP, and 9T cost-optimized LPE.
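To put a rough number on the 9T vs 10.5T difference: standard-cell height is commonly taken as the track count times the routing metal pitch. Here is a minimal sketch of that arithmetic. The 64 nm pitch is an assumption for illustration only (it is not a figure from this thread), and `cell_height_nm` is a made-up helper name. The relative saving does not depend on the pitch chosen.

```python
def cell_height_nm(tracks: float, metal_pitch_nm: float) -> float:
    """Standard-cell height approximated as track count x metal pitch."""
    return tracks * metal_pitch_nm


# ASSUMPTION: illustrative routing pitch, not a number quoted in this thread.
ASSUMED_PITCH_NM = 64.0

h_9t = cell_height_nm(9, ASSUMED_PITCH_NM)        # density-optimized library
h_105t = cell_height_nm(10.5, ASSUMED_PITCH_NM)   # performance-optimized library

# Fractional cell-height reduction going from 10.5T to 9T (pitch cancels out).
saving = 1 - h_9t / h_105t

print(f"9T: {h_9t:.0f} nm, 10.5T: {h_105t:.0f} nm, height saving: {saving:.1%}")
```

So on cell height alone, a 9T library is about one seventh shorter than a 10.5T one. Real density also depends on cell width, routing congestion and SRAM, which this sketch ignores.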
M1 is a layer name.

DP is a litho resolution enhancement technique.
He is talking about 1x BEOL layers, as the single pattern layers are called 1.1x BEOL layers. 14nm BEOL-stack has three 1x layers with double patterning no matter what.
 
  • Like
Reactions: KTE

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
I wonder if we won't be granted an "updated" version of Cinebench for 2017, just like the R15 version that opportunistically replaced the 11.5 version while reducing the AMD scores in the same motion...
I have told people to expect the next version to rely (as heavily as they can muster) on AVX-2. Benchmark warfare is an old thing.
 

Abwx

Lifer
Apr 2, 2011
10,996
3,595
136
I have told people to expect the next version to rely (as heavily as they can muster) on AVX-2. Benchmark warfare is an old thing.

Quite possible, but there are several tricks that can be used. For instance, CB R15 curiously reduced the size of the spheres relative to CB 11.5. We can see in the rendering that these spheres are the most compute-demanding part of the scene, or that they require ops that take more cycles. The result is that Intel's scores were boosted relative to AMD's, and that both Steamroller and Excavator show a lower gain (relative to PD) than in CB 11.5.

Also, CB R15 uses Embree, which is an Intel rendering kernel that uses their MKLs. I don't know if it's used in the bench, though.
 
Mar 10, 2006
11,715
2,012
126
But for your design to use all the 14nm FF capabilities of the process (14nm LPP), you need to use M1 to get the highest density and highest performance. If you do not use any M1 layers, your end product (chip) will be cheaper (no Double Patterning), but it will be bigger (lower density) and will have lower performance (higher resistance) and higher power consumption (higher resistance).

Perhaps a visual aid is in order...

[Image: oIlEDqW.png]


For higher performance (read: clockspeed) you use more layers and, more importantly, thicker lower level metal layers. For better density, you use more of the minimum pitch layers. And, for lower cost, you want to minimize layer count.
 
  • Like
Reactions: sirmo and KTE

Abwx

Lifer
Apr 2, 2011
10,996
3,595
136
For higher performance (read: clockspeed) you use more layers and,
It depends on the circuit. Whatever the number of layers, the transistors will switch at the same speed. More layers are necessary if the circuit is very big and hence needs more current as the frequency is increased, but as said, a smaller circuit can work at the same frequencies with fewer layers than a big one.

The added layers are there to limit voltage drops; that is, as AtenRa pointed out, the current delivery will have lower-resistance routes.
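The voltage-drop point can be sketched with the basic wire formula R = ρ·L / (W·T) and Ohm's law. This is only a rough illustration: the dimensions and current are hypothetical, the helper names are made up, and the resistivity is the bulk copper value, which understates the resistance of real narrow on-chip wires.

```python
# Bulk copper resistivity in ohm-metres. ASSUMPTION: real nanoscale
# interconnect has a higher effective resistivity than this.
COPPER_RESISTIVITY_OHM_M = 1.68e-8


def wire_resistance_ohm(length_m: float, width_m: float, thickness_m: float,
                        resistivity: float = COPPER_RESISTIVITY_OHM_M) -> float:
    """Resistance of a rectangular wire: R = rho * L / (W * T)."""
    return resistivity * length_m / (width_m * thickness_m)


def ir_drop_v(current_a: float, resistance_ohm: float) -> float:
    """Voltage lost along the wire for a given current (V = I * R)."""
    return current_a * resistance_ohm


# Hypothetical power route: 100 um long, 1 um wide, carrying 10 mA.
LENGTH_M, WIDTH_M, CURRENT_A = 100e-6, 1e-6, 10e-3

r_thin = wire_resistance_ohm(LENGTH_M, WIDTH_M, 0.5e-6)   # 0.5 um thick metal
r_thick = wire_resistance_ohm(LENGTH_M, WIDTH_M, 1.0e-6)  # 1.0 um thick metal

print(f"thin:  {r_thin:.2f} ohm, drop {ir_drop_v(CURRENT_A, r_thin) * 1e3:.1f} mV")
print(f"thick: {r_thick:.2f} ohm, drop {ir_drop_v(CURRENT_A, r_thick) * 1e3:.1f} mV")
```

Doubling the metal thickness (or adding a second parallel route of the same size) halves the resistance and therefore the drop, which is the sense in which extra or thicker layers help a large, current-hungry circuit.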
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
This only confirms they will use LPP and not LPE, and from what I can understand from GloFo themselves, the 14nm LPP process combines all libraries (HDL, HP etc.) into a single process.
That means there is no 14nm LPP HP or 14nm LPP HDL or 14nm LPP LP etc. This time, because of FinFETs, they designed a single process (LPP) that has the highest density (M1 double patterning), the highest performance (increased fin height) and lower power due to fully depleted FinFETs.

Edit: This statement also confirms ZEN will use M1 double patterning for highest density and highest performance. It will be more expensive, but it will have the highest density and performance that 14nm LPP can offer.

Citation needed
 

KTE

Senior member
May 26, 2016
478
130
76
He is talking about 1x BEOL layers, as the single pattern layers are called 1.1x BEOL layers. 14nm BEOL-stack has three 1x layers with double patterning no matter what.
I know these process details. But process is separate to an enhancement technique. DP is chosen for its benefits at the critical layers... Bedtime and work early. I'll discuss this later.

Naples benchmarks leaked:
http://wccftech.com/amd-zen-naples-soc-benchmarks/

Sent from HTC 10
(Opinions are own)
 

SpaceBeer

Senior member
Apr 2, 2016
307
100
116
One thing came to my mind: if I remember correctly, when AMD presented Zen ~3 weeks ago, it was mentioned they still don’t know what the official name of Zen-based CPUs will be. Which means they still don’t have the final design of packaging boxes, data sheets, manuals, etc. And probably no final approvals (CE, FCC, etc.). So I don’t know how it would be possible to launch a product in 4-5 months without all these things. Each of them takes a lot of time, there are a lot of parties involved, and products should be in stock a little before sales start. Therefore, I’m not sure we can expect Zen-based CPUs in Dec/Jan, though this is just my opinion.
 

del42sa

Member
May 28, 2013
26
11
81
I've heard that the SMT implementation on Zen would have a very large switching penalty. Meaning that it is not ideal to utilize a single thread on each physical core, and then switch to two threads per physical core (or vice versa). Any idea why that would be the case with Zen?

Where did this information come from?
 

Abwx

Lifer
Apr 2, 2011
10,996
3,595
136

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
It is just something I've heard, along with other things. No idea if it is accurate or not, I guess we have to wait and see.
One idea: power gating. With 32 cores on a chip, every bit of saved leakage might count, similar to Intel's AVX high path gating. I wonder if it would be an option to have a faster path in SMT-aware structures, which could be used in 1T mode instead of the 2T-capable structures. With SMT, the power management might clock the core lower anyway, as increased use of resources might eat up the power budget more quickly.

Maybe there is also some self-calibration going on with an SMT switch.

And then there is this logical core renaming patent. Maybe this technique is involved in this.

The simple flushing of pipelines shouldn't take that long.

Could it be from algorithmic switching from microcode?
µCode switching (if any) shouldn't cause such effects, except for power gating like I described above.
 

Abwx

Lifer
Apr 2, 2011
10,996
3,595
136
µCode switching (if any) shouldn't cause such effects, except for power gating like I described above.

They use clock gating, which means that registers, for instance, are active but consume only the leakage current while keeping the last logical states available.
 

DrMrLordX

Lifer
Apr 27, 2000
21,657
10,889
136
I would expect Zen+ to be 14nm LPP or IBM 14nm HP before 7nm, 7nm is a ways down the road.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
I don't believe AMD will use 7nm before 2019. ZEN+ should be on 14nm as well (I'm expecting it 1.5 - 2 years after ZEN launch).
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
They use clock gating, which means that registers, for instance, are active but consume only the leakage current while keeping the last logical states available.
I know. But clock gating happens at subnanosecond levels (even from cycle to cycle). That's out of the question, so the only slow gating left is power gating.
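For a sense of scale on "subnanosecond": one clock cycle lasts 1/f, so anything that can toggle from cycle to cycle is working in fractions of a nanosecond. A minimal sketch follows. The 3 GHz clock and the 1 µs "slow" event are assumptions for illustration only, not Zen figures, and the helper names are made up.

```python
def cycle_time_ns(freq_ghz: float) -> float:
    """Duration of one clock cycle in nanoseconds (1 / f, with f in GHz)."""
    return 1.0 / freq_ghz


def cycles_for(duration_ns: float, freq_ghz: float) -> float:
    """How many clock cycles elapse during a given duration."""
    return duration_ns * freq_ghz


# ASSUMPTION: illustrative core clock, not a Zen specification.
ASSUMED_FREQ_GHZ = 3.0
# ASSUMPTION: hypothetical duration of a slow transition (e.g. a wake-up).
HYPOTHETICAL_SLOW_EVENT_NS = 1000.0

print(f"one cycle at {ASSUMED_FREQ_GHZ} GHz: {cycle_time_ns(ASSUMED_FREQ_GHZ):.3f} ns")
print(f"a {HYPOTHETICAL_SLOW_EVENT_NS:.0f} ns event spans "
      f"{cycles_for(HYPOTHETICAL_SLOW_EVENT_NS, ASSUMED_FREQ_GHZ):.0f} cycles")
```

A mechanism that costs only a cycle or two cannot explain a "very large" switching penalty by itself, whereas something taking on the order of a microsecond costs thousands of cycles.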
 

AMDisTheBEST

Senior member
Dec 17, 2015
682
90
61
Where did you see the proof? Please share with us too.

Anyway, people should be cautious about taking controlled-room demos as some kind of actual performance review. Not long ago AMD showed a system with a Polaris card consuming 54W less than a GTX 950-equipped system. From various reviews it became apparent that RX 460 SKUs either consumed a bit more or about the same, and in some cases they did consume 20-25W less, but still nowhere close to what you would expect if the controlled demo were taken as a real-world reference. Also, if AMD actually has a product which is better than Broadwell clock for clock, then they probably were/are being quite conservative with their "40%" IPC estimate.

https://www.youtube.com/watch?v=xzZT2xH3zBk
here ya go
Zen will be the Athlon64 2.0 :D

Intel also pulled off a 40% IPC gain back in 2006 when they released Core 2 Duo to replace the hot Pentium 4. That 40% is the magic number, bud.

Polaris did bring a huge improvement in power consumption. AMD never said it was better than Pascal, but they did say it was hugely better, which is true.
 

Abwx

Lifer
Apr 2, 2011
10,996
3,595
136
I know. But clock gating happens at subnanosecond levels (even from cycle to cycle). That's out of the question, so the only slow gating left is power gating.

Not at all. You can stop clocking any digital circuitry for any length of time. The result is that (statistically) half of the transistors will be switched on while the other half will be switched off, and the circuit will consume nothing other than the residual leakage current, since power is mainly consumed when the circuit switches.

On the other hand, power gating will reduce the leakage by a factor of 2 at most, but it can't be applied over short periods, so it's useful only for caches and similar parts which are not systematically fully used. You couldn't power gate part of a pipeline, for instance, while you can clock gate it.
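The clock gating vs power gating argument can be sketched with the usual first-order CMOS model: dynamic power is roughly α·C·V²·f, and leakage is paid whenever the block is powered. All numbers below are hypothetical, the function names are made up, and the leakage reduction of 2 simply takes the "at most 2" claim above at face value (read as a factor of 2). It is not measured data.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    voltage_v: float, freq_hz: float) -> float:
    """First-order switching power: alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def block_power_w(dynamic_w: float, leakage_w: float,
                  clock_gated: bool = False, power_gated: bool = False,
                  leakage_reduction: float = 2.0) -> float:
    """Power of a block in one of three states: active, clock gated, power gated."""
    if power_gated:
        # No switching, and leakage cut by the assumed reduction factor.
        return leakage_w / leakage_reduction
    if clock_gated:
        # No switching, but the block stays powered and keeps its state.
        return leakage_w
    return dynamic_w + leakage_w


# Hypothetical block: 10% activity, 1 nF switched capacitance, 1.0 V, 3 GHz.
dyn = dynamic_power_w(0.1, 1e-9, 1.0, 3e9)
LEAKAGE_W = 0.05  # hypothetical leakage while powered

active = block_power_w(dyn, LEAKAGE_W)
clock_gated = block_power_w(dyn, LEAKAGE_W, clock_gated=True)
power_gated = block_power_w(dyn, LEAKAGE_W, power_gated=True)

print(f"active {active:.3f} W, clock gated {clock_gated:.3f} W, "
      f"power gated {power_gated:.3f} W")
```

With these made-up numbers, clock gating removes the switching term and leaves only leakage, and power gating then shaves the remaining leakage further. How much that second step is worth depends entirely on how leaky the process is.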
 

KTE

Senior member
May 26, 2016
478
130
76
I know. But clock gating happens at subnanosecond levels (even from cycle to cycle). That's out of the question, so the only slow gating left is power gating.
Power gating is already implemented quite heavily with XV.

Leakage current isn't an 'only' BTW. Adds up very fast. Depends on the base process characteristics how negligible it is.

Sent from HTC 10
(Opinions are own)
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,357
1,564
136
you couldn't power gate a part of a pipeline, for instance, while you can clock gate it.

The most fine-grained example I know of is that Intel power-gates the upper half of AVX machinery when AVX isn't used a lot.