New Zen microarchitecture details


The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
So if one is at, say, 100, the other will be at 140, and this is for one thread.



In an FP workload like CB, an EXV module scores 188 relative to a single core, so 73% of this performance means that a Zen core will be at 137 when loaded with two threads. That is less than what you're stating for a single thread a few lines above, which makes for a negative SMT efficiency...
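The objection can be checked with a few lines of Python. This is only a sketch of the arithmetic: the index values (140, 188) and the 73% factor are the ones quoted above, while the variable names are made up for illustration.

```python
# Relative Cinebench-style index values taken from the posts above
# (a single baseline core is defined as 100).
single_thread_zen = 140.0       # claimed Zen core index with one thread
exv_module_2t = 188.0           # Excavator module index with two threads loaded
zen_fraction_of_module = 0.73   # Zen core assumed to reach 73% of a module

# Reading the 73% figure as a two-thread number for one Zen core:
zen_core_2t = exv_module_2t * zen_fraction_of_module

# Implied SMT yield: two-thread throughput divided by one-thread throughput.
implied_smt_yield = zen_core_2t / single_thread_zen

print(f"Zen core, two threads: {zen_core_2t:.1f}")
print(f"Implied SMT yield: {implied_smt_yield:.2f}x")
```

A yield below 1.0x is what the post calls negative SMT efficiency; it only follows if the 73% figure is read as a two-thread number.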

Why do you think I included SMT in that estimation?

A single Excavator CU running at 3.4GHz will score 144 in Cinebench R15.

144 * 0.73 = 105.12
105.12 * 8 = 840.96

Then multiply the score produced by the native cores by 1.10 - 1.25 to get the performance estimate for 8C/16T Zen (925 - 1051 points @ 3.4GHz).

Obviously it is currently impossible to say what kind of yield AMD's SMT implementation will produce, so I capped it at roughly the same figure as Haswell / Broadwell & Skylake achieve.
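The estimate above is easy to reproduce. The following Python sketch uses only the numbers from the post (144 points per Excavator CU at 3.4GHz, a 0.73 factor, 8 cores, an SMT yield of 1.10 to 1.25); the function and parameter names are hypothetical and exist only for this illustration.

```python
def zen_cb15_estimate(cu_score=144.0, core_fraction=0.73, cores=8,
                      smt_yield_range=(1.10, 1.25)):
    """Return (per-core score, all-core score without SMT, (low, high) with SMT)."""
    per_core = cu_score * core_fraction          # one Zen core vs. one Excavator CU
    no_smt = per_core * cores                    # 8C/8T total
    with_smt = tuple(no_smt * y for y in smt_yield_range)  # 8C/16T range
    return per_core, no_smt, with_smt


per_core, no_smt, (low, high) = zen_cb15_estimate()
print(f"per core:      {per_core:.2f}")
print(f"8C/8T:         {no_smt:.2f}")
print(f"8C/16T range:  {low:.0f} - {high:.0f}")
```

Changing `smt_yield_range` shows how sensitive the 8C/16T figure is to the assumed SMT gain, which is the part of the estimate the post itself flags as unknown.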
 

AtenRa

Lifer
Feb 2, 2009
14,000
3,357
136
ok thanks

So that will put it at Core i7-4960X throughput in CB R15, but at 95W TDP for Zen versus 140W TDP for the Ivy Bridge part.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I wouldn't be so sure that in 8C/16T configuration Zen will be able to hit 3.4GHz with all cores loaded, at 95W. If we look where Intel is at, I would guess around 2500 - 2800MHz sustained clocks at 95W.

But really, your guess is just as good as mine.
 

majord

Senior member
Jul 26, 2015
433
523
136
I wouldn't be so sure that in 8C/16T configuration Zen will be able to hit 3.4GHz with all cores loaded, at 95W. If we look where Intel is at, I would guess around 2500 - 2800MHz sustained clocks at 95W.

But really, your guess is just as good as mine.

I don't think Intel's HEDT platform TDPs are the best point of reference, to be honest. Their base clock vs. TDP vs. core count just doesn't align well with what they are able to offer in lower-core-count mainstream platform SKUs. In other words, if they were to offer 8-core SKUs on the mainstream dual-channel platform, you'd be seeing 3GHz+ @ 95W.

Either the difference is because of yields, 'acceptance' of higher TDPs on that platform, a lack of competition, or maybe all of the above. But then again, the former may still play a big part in what actual clock speeds you see out the door with Zen.

Looking at it from a purely theoretical point of view (of course): when you consider how well Excavator clocks on 28nm next to 22nm FF, it's no surprise people have reasonable expectations @ 14nm
e.g:

top mobile SKUs:

HW 22nm FF
2c4t: 2.1GHz 15W
2c4t: 2.9GHz 37W

Ex 28nm
2c4t: 2.1GHz 15W
2c4t: 3.xGHz?? @ 35W

Actually, Stilt, I'm sure I read in your own testing that the Exv sample you were running was sustaining 2.8GHz @ 12.5W/CU (25W package).

That was in the boost state of a 15/25W cTDP, but if I'm not mistaken it would give you the same behavior in the non-boost state of a 25W / 42Wb configuration.

That's ~100W @ 2.8GHz for 8 CUs on 28nm right there.
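The ~100W figure is a straight linear scaling of the per-CU number. A minimal sketch in Python, assuming CU power simply adds up and ignoring uncore, memory controller and I/O power (an assumption of this sketch, not a claim from the post); the variable names are illustrative.

```python
cu_power_w = 12.5     # sustained power per Excavator CU at 2.8GHz, as quoted
measured_cus = 2      # the 25W package mentioned in the post
target_cus = 8        # hypothetical 8-CU part

# Linear scaling: total CU power is per-CU power times the CU count.
package_power_w = cu_power_w * measured_cus
scaled_power_w = cu_power_w * target_cus

print(f"{measured_cus} CUs: {package_power_w:.0f}W")
print(f"{target_cus} CUs: {scaled_power_w:.0f}W @ 2.8GHz")
```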

I still maintain that on a like process Zen will have lower clocks than Exv, watt for watt, and there are too many unknowns and variables to theorize any further, I think. But what one could take from it is:

If they can't pull off a 3GHz base, then based on their own IPC estimates there's going to be a hell of a lot of "why didn't you just shrink 8 Exv CUs?!?" questions floating around!
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
I don't think Intel's HEDT platform TDPs are the best point of reference, to be honest. Their base clock vs. TDP vs. core count just doesn't align well with what they are able to offer in lower-core-count mainstream platform SKUs. In other words, if they were to offer 8-core SKUs on the mainstream dual-channel platform, you'd be seeing 3GHz+ @ 95W.

I think you're right, or at least close (i7-6820EQ). Base clock for the top-bin 45W Skylake part is 2.8GHz. Going to your hypothetical version should cost less than twice the power because the memory channels remain the same, so there'd be more than 5W of margin left to try to get closer to 3GHz. Considerably more if there's no IGP.

But this would be for the very best bin of these parts, and with a so much bigger die, hitting those bin targets could be harder.

Either the difference is because of yields, 'acceptance' of higher TDPs on that platform, a lack of competition, or maybe all of the above.

Another aspect is that the L3 cache scales a little further in power because the cache/core goes up. This and the quad channel memory are side effects of using dies designed first and foremost for the server space.

Intel could spin a die that's strictly for HEDT but I get the feeling that the volume just doesn't justify it.

Looking at it from a purely theoretical point of view (of course): when you consider how well Excavator clocks on 28nm next to 22nm FF, it's no surprise people have reasonable expectations @ 14nm

That's like letting Netburst set the expectations for clock speed on Conroe. Okay, not that extreme, but you get my point. Zen is going to be a wider/higher IPC design, so the clock margins are going to be different vs. Excavator.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I don't think Intel's HEDT platform TDPs are the best point of reference, to be honest. Their base clock vs. TDP vs. core count just doesn't align well with what they are able to offer in lower-core-count mainstream platform SKUs. In other words, if they were to offer 8-core SKUs on the mainstream dual-channel platform, you'd be seeing 3GHz+ @ 95W.

Either the difference is because of yields, 'acceptance' of higher TDPs on that platform, a lack of competition, or maybe all of the above. But then again, the former may still play a big part in what actual clock speeds you see out the door with Zen.

Looking at it from a purely theoretical point of view (of course): when you consider how well Excavator clocks on 28nm next to 22nm FF, it's no surprise people have reasonable expectations @ 14nm
e.g:

top mobile SKUs:

HW 22nm FF
2c4t: 2.1GHz 15W
2c4t: 2.9GHz 37W

Ex 28nm
2c4t: 2.1GHz 15W
2c4t: 3.xGHz?? @ 35W

Actually, Stilt, I'm sure I read in your own testing that the Exv sample you were running was sustaining 2.8GHz @ 12.5W/CU (25W package).

That was in the boost state of a 15/25W cTDP, but if I'm not mistaken it would give you the same behavior in the non-boost state of a 25W / 42Wb configuration.

That's ~100W @ 2.8GHz for 8 CUs on 28nm right there.

I still maintain that on a like process Zen will have lower clocks than Exv, watt for watt, and there are too many unknowns and variables to theorize any further, I think. But what one could take from it is:

If they can't pull off a 3GHz base, then based on their own IPC estimates there's going to be a hell of a lot of "why didn't you just shrink 8 Exv CUs?!?" questions floating around!

At this point it is IMO completely down to the characteristics of the 14nm LPP manufacturing process. I expect the power efficient Fmax ceiling to be extremely low (no higher than on Carrizo on 28HPP, <= 2.6GHz) and the power efficiency to degrade extremely rapidly (even faster than on Carrizo, Fmax vs. VDD) beyond this point. Because of this I expect the base frequency to be quite low, while the various turbo frequencies should reach around 3.2 - 3.5GHz.

I expect to see various (four to six) boosted P-states with a huge frequency delta to the base state on Zen, similar to Carrizo. Bulldozer, Piledriver and Steamroller designs only had one or two boosted states with a rather small frequency delta (except the "power efficient" E-parts).

Even though it sounds like I'm heavily disliking the 14nm LPP process, that's really not the case. The process is no doubt murderously power efficient at low frequencies (e.g. on GPU ASICs), but I just have to question its ability to scale higher, as that is absolutely mandatory for Zen to succeed.
 

Abwx

Lifer
Apr 2, 2011
10,854
3,298
136
Why do you think I included SMT in that estimation?

A single Excavator CU running at 3.4GHz will score 144 in Cinebench R15.

144 * 0.73 = 105.12
105.12 * 8 = 840.96

Then multiply the score produced by the native cores by 1.10 - 1.25 to get the performance estimate for 8C/16T Zen (925 - 1051 points @ 3.4GHz).

Obviously it is currently impossible to say what kind of yield AMD's SMT implementation will produce, so I capped it at roughly the same figure as Haswell / Broadwell & Skylake achieve.

Well, since you did mention 73% of a module's performance, it would be logical to compare on a 2T basis. Other than that, EXV scores about 320 at 3.5GHz in CB R15 according to the Jagat review of the 845.

As for exact throughput, I think it's safe to assume that, with Zen's FPU being about the same as an EXV module's, the former will have throughput comparable to the latter when two threads are loading the core.

At this point it is IMO completely down to the characteristics of the 14nm LPP manufacturing process. I expect the power efficient Fmax ceiling to be extremely low (no higher than on Carrizo on 28HPP, <= 2.6GHz) and the power efficiency to degrade extremely rapidly (even faster than on Carrizo, Fmax vs. VDD) beyond this point. Because of this I expect the base frequency to be quite low, while the various turbo frequencies should reach around 3.2 - 3.5GHz.

I expect to see various (four to six) boosted P-states with a huge frequency delta to the base state on Zen, similar to Carrizo. Bulldozer, Piledriver and Steamroller designs only had one or two boosted states with a rather small frequency delta (except the "power efficient" E-parts).

Even though it sounds like I'm heavily disliking the 14nm LPP process, that's really not the case. The process is no doubt murderously power efficient at low frequencies (e.g. on GPU ASICs), but I just have to question its ability to scale higher, as that is absolutely mandatory for Zen to succeed.

Numbers have been published, so I don't understand why there's still a debate about GF's 14nm. At 2.4GHz it is extremely efficient, 20% better than Intel at this frequency; transistors are the same for everybody, and Intel's 10-15% higher voltage needs no further comment...

As for the frequency ceiling, it won't have any problem outmatching the 28nm process if we are to interpret GF's numbers, which are not even for the faster variant of their 14nm offerings.

I think you're right, or at least close (i7-6820EQ). Base clock for the top-bin 45W Skylake part is 2.8GHz. Going to your hypothetical version should cost less than twice the power because the memory channels remain the same, so there'd be more than 5W of margin left to try to get closer to 3GHz. Considerably more if there's no IGP.

.

950mV at 2.8GHz for those chips, compared with GF's 800mV at 2.4GHz on a CPU that is not as optimised for high frequencies. If anything, it looks like those who are basing their assumptions on brand names rather than on physics will get quite a ride for their money...

FTR this is the second-ranked transistor speed-wise; there's a higher-ranked one which should be 10% faster but at the expense of more leakage. Not that it's a problem, as the aforementioned is 6x less leaky than their 28nm HPP and still 3.5x less than their 28nm SLP.
 

majord

Senior member
Jul 26, 2015
433
523
136
That's like letting Netburst set the expectations for clock speed on Conroe. Okay, not that extreme, but you get my point. Zen is going to be a wider/higher IPC design, so the clock margins are going to be different vs. Excavator.

2nd-last paragraph: I agree to an extent.

But it's funny you mention Netburst vs. Conroe. An extreme example, as you say, but in reality it doesn't really make a good case for the point, as the outcome wasn't so extreme.

That was one of the stunning things about Conroe's launch. The clock speeds at ambient temperatures (where it matters) weren't much lower than Netburst's, both factory and overclocked, despite Conroe's pipeline being so much shorter and wider, and with an IPC some 80+% higher.

It really came down to Netburst's inherent frequency targets being just so far off reality for the available process's characteristics that they barely out-clocked Conroe anyway within the confines of platform power limitations (3.73GHz @ 115W vs. 2.93GHz @ 89W DC). So ~27% more frequency, for all the massive architectural, IPC-robbing compromises.
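The clock and power figures quoted in the parentheses can be turned into ratios directly. A short Python sketch, using only the four numbers from the post; the dictionary layout and names are just for illustration.

```python
# Clock and platform power figures as quoted in the post.
netburst = {"freq_ghz": 3.73, "power_w": 115.0}
conroe = {"freq_ghz": 2.93, "power_w": 89.0}

# How much more clock and power Netburst needed relative to Conroe.
freq_ratio = netburst["freq_ghz"] / conroe["freq_ghz"]
power_ratio = netburst["power_w"] / conroe["power_w"]

print(f"frequency: {100 * (freq_ratio - 1):.0f}% higher")
print(f"power:     {100 * (power_ratio - 1):.0f}% higher")
```

The two ratios are nearly equal, so in this comparison the extra clock came at roughly proportional power, before counting the IPC gap the post mentions.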

In stark contrast, Conroe hit the nail on the head, so to speak. Its frequency targets were perfectly suited to the process tech.

Construction cores somehow fell into a similar trap as well. They were designed around 32nm SOI, targeting >4GHz frequencies, but the focus has since progressively changed, and in a similar vein to my Netburst example above, the compromises aren't yielding much frequency delta over a higher IPC design anymore, at least not down in the ~3GHz region anyway.

So I guess the question is: will this far less extreme scenario play out similarly, and have an even less extreme outcome on clock speeds?

To be honest, I think the biggest potential for disappointment in reaching the clock speeds we're expecting will instead be the process itself. There is so little information about it (at least that I am aware of), and what variation, if any, is there to suit higher frequencies?

I guess we'll never know the reality since XV will never be on the same process, but it would be nice to have some data on the two process nodes' performance.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Well, since you did mention 73% of a module's performance, it would be logical to compare on a 2T basis. Other than that, EXV scores about 320 at 3.5GHz in CB R15 according to the Jagat review of the 845.

As for exact throughput, I think it's safe to assume that, with Zen's FPU being about the same as an EXV module's, the former will have throughput comparable to the latter when two threads are loading the core.



Numbers have been published, so I don't understand why there's still a debate about GF's 14nm. At 2.4GHz it is extremely efficient, 20% better than Intel at this frequency; transistors are the same for everybody, and Intel's 10-15% higher voltage needs no further comment...

As for the frequency ceiling, it won't have any problem outmatching the 28nm process if we are to interpret GF's numbers, which are not even for the faster variant of their 14nm offerings.



950mV at 2.8GHz for those chips, compared with GF's 800mV at 2.4GHz on a CPU that is not as optimised for high frequencies. If anything, it looks like those who are basing their assumptions on brand names rather than on physics will get quite a ride for their money...

FTR this is the second-ranked transistor speed-wise; there's a higher-ranked one which should be 10% faster but at the expense of more leakage. Not that it's a problem, as the aforementioned is 6x less leaky than their 28nm HPP and still 3.5x less than their 28nm SLP.

IIRC the 28nm processes from GlobalFoundries were also supposed to be superior to 32nm SHP SOI, on paper :sneaky:
Except in reality they are inferior in every way but the better power efficiency at low clocks (<3.5GHz). At higher frequencies 32nm SHP SOI not only provides better power efficiency, but also completely superior and linear scaling in terms of Fmax (absolute being ~45% higher).

Comparing a random specimen from an actual production run (Intel) against some theoretical marketing figures is completely futile, even more so when the figures apply to completely different designs. Unless you have some unreleased data about 14nm LPP, you are comparing a single ARM macro against a whole CPU core(s). Not to mention that the voltages on production parts will vary from specimen to specimen and also contain tolerances, for load-line for example. The figures for the ARM core on 14nm LPP are most likely the bare absolute minimum at which the macro is able to operate properly.
 

Yutani

Junior Member
Apr 14, 2016
2
0
66
Why do you think I included SMT in that estimation?

A single Excavator CU running at 3.4GHz will score 144 in Cinebench R15.

144 * 0.73 = 105.12
105.12 * 8 = 840.96

Then multiply the score produced by the native cores by 1.10 - 1.25 to get the performance estimate for 8C/16T Zen (925 - 1051 points @ 3.4GHz).

Obviously it is currently impossible to say what kind of yield AMD's SMT implementation will produce, so I capped it at roughly the same figure as Haswell / Broadwell & Skylake achieve.

Hi guys, I am new to this forum, this is my first post here.

I've made some calculations based on your numbers and CB R15 scores, and they tell me that a 4c/4t 3400 MHz Zen would be equal to a 3100 MHz Sandy Bridge i5 (e.g. i5-2400).

If I take the Athlon X4 845's CB R15 score and give it the 40% IPC boost, it's still below, but closer to, the Sandy i5's performance.

A few months ago I said that if Zen reaches Sandy's performance I would be happy with that. But is that all we can expect from AMD's Zen?
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
At this point it is IMO completely down to the characteristics of the 14nm LPP manufacturing process. I expect the power efficient Fmax ceiling to be extremely low (no higher than on Carrizo on 28HPP, <= 2.6GHz) and the power efficiency to degrade extremely rapidly (even faster than on Carrizo, Fmax vs. VDD) beyond this point. Because of this I expect the base frequency to be quite low, while the various turbo frequencies should reach around 3.2 - 3.5GHz.

I expect to see various (four to six) boosted P-states with a huge frequency delta to the base state on Zen, similar to Carrizo. Bulldozer, Piledriver and Steamroller designs only had one or two boosted states with a rather small frequency delta (except the "power efficient" E-parts).

Even though it sounds like I'm heavily disliking the 14nm LPP process, that's really not the case. The process is no doubt murderously power efficient at low frequencies (e.g. on GPU ASICs), but I just have to question its ability to scale higher, as that is absolutely mandatory for Zen to succeed.
From this description, it sounds like Zen will be a poor overclocking chip?
 

Abwx

Lifer
Apr 2, 2011
10,854
3,298
136
IIRC the 28nm processes from GlobalFoundries were also supposed to be superior to 32nm SHP SOI, on paper :sneaky:
Except in reality they are inferior in every way but the better power efficiency at low clocks (<3.5GHz). At higher frequencies 32nm SHP SOI not only provides better power efficiency, but also completely superior and linear scaling in terms of Fmax (absolute being ~45% higher).

32nm tops out at 5.3-5.4GHz the same way 28nm HPP tops out at 4.7GHz; the difference is close to the claimed 10%. Besides, compare a Godavari at 4GHz and a Richland at the same frequency: the latter is less efficient, and still, it benefits from better voltage binning than Kaveri.



Comparing a random specimen from an actual production run (Intel) against some theoretical marketing figures is completely futile, even more so when the figures apply to completely different designs. Unless you have some unreleased data about 14nm LPP, you are comparing a single ARM macro against a whole CPU core(s). Not to mention that the voltages on production parts will vary from specimen to specimen and also contain tolerances, for load-line for example. The figures for the ARM core on 14nm LPP are most likely the bare absolute minimum at which the macro is able to operate properly.


There's nothing random. If you had paid attention, the test vehicle was a 2C A57 chip; I don't know where you pulled the macro thing from, or whether there's such a test thingy that reaches 330mW power consumption.

The figures on those cores are extremely encouraging, as this frequency is reached on a design that has a shorter pipeline than Zen.

As for branding them marketing numbers, here they are, if there's any marketing that uses physically measured numbers displayed at a conference for engineers only.



GLOBALFOUNDRIES FINFETS VS 28NM

PPA RESULT: LVT

GF PROCESS                  28 SLP                       28 HPP                       14 LPP

SPEED CORNER                SS.0P90V.WORST 125C/-40C     SS.0P765V.WORST 125C/-40C    SS.0P72V.WORST 125C/-40C
FMAX (GHZ)                  0.97                         1.17                         2.41
RELATIVE SPEED              1                            1.2                          2.48

POWER CORNER                FF.1P10V.125C                FF.0P935V.125C               FF.0P88V.125C
TOTAL DYNAMIC POWER (mW)    158                          210                          310
RELATIVE DYN POWER          1                            1.3                          1.9
TOTAL LEAKAGE POWER (mW)    70                           119                          18.6

Seriously, you think that such numbers can be used to market GF in any mainstream AT article?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
From this description, it sounds like Zen will be a poor overclocking chip?

As said before, these are my own expectations. I base them on the process characteristics (low power), the process design targets (ASICs, mobile devices, network), on how hard it is to get < 20nm processes right (look at Intel 14nm) and on the various issues AMD has had with most of its CPU designs in the past. Also, if the low power process is the optimal choice for a relatively large and high power design such as Zen, why did Intel spend hundreds of millions on developing two separate process variants, one targeting power efficiency (low power) and the second one targeting high performance? Probably not just because they had nowhere else to put the incoming cash, I reckon.

I believe Zen from the design side is good enough to improve the position AMD is currently in; however, I also think that ultimately Zen's success will depend on how high it is able to clock. And in that the manufacturing process plays a huge role.

Nevertheless, at the moment all of the speculation is futile. We will get an idea of Zen's frequency capabilities soon enough.
 
Aug 11, 2008
10,451
642
126
As said before, these are my own expectations. I base them on the process characteristics (low power), the process design targets (ASICs, mobile devices, network), on how hard it is to get < 20nm processes right (look at Intel 14nm) and on the various issues AMD has had with most of its CPU designs in the past. Also, if the low power process is the optimal choice for a relatively large and high power design such as Zen, why did Intel spend hundreds of millions on developing two separate process variants, one targeting power efficiency (low power) and the second one targeting high performance? Probably not just because they had nowhere else to put the incoming cash, I reckon.

I believe Zen from the design side is good enough to improve the position AMD is currently in; however, I also think that ultimately Zen's success will depend on how high it is able to clock. And in that the manufacturing process plays a huge role.

Nevertheless, at the moment all of the speculation is futile. We will get an idea of Zen's frequency capabilities soon enough.

Thanks for all the information. Your posts bring a welcome dose of reality to the sometimes wild speculation that seems to abound in these forums.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
There's nothing random. If you had paid attention, the test vehicle was a 2C A57 chip; I don't know where you pulled the macro thing from, or whether there's such a test thingy that reaches 330mW power consumption.

The figures on those cores are extremely encouraging, as this frequency is reached on a design that has a shorter pipeline than Zen.

As for branding them marketing numbers, here they are, if there's any marketing that uses physically measured numbers displayed at a conference for engineers only.
IIRC in the same presentation a slide preceding these results mentioned an a9_neon macro or something like that. This also explains the small area of ~0.25mm^2 in 28nm and ~0.06mm^2 in 14nm (IIRC).

If that is of any help: I've seen some older GF slides showing a board, an oscilloscope with an overshooting square wave and a monitor showing some measurement tool listing, for example, 3GHz @ 5xx mW for a Cortex-A9 macro in 14XM. From the presentation context, they used 9T libs (vs. 12T for 28HPP) and got significant speed/power improvements even with the denser libs. I don't know if all that was just made up for marketing purposes or if there is some truth in it. But it's not much different from what TSMC has presented so far about their ARM macros.
 

Abwx

Lifer
Apr 2, 2011
10,854
3,298
136
IIRC in the same presentation a slide preceding these results mentioned an a9_neon macro or something like that. This also explains the small area of ~0.25mm^2 in 28nm and ~0.6mm^2 in 14nm (IIRC).

If that is of any help: I've seen some older GF slides showing a board, an oscilloscope with an overshooting square wave and a monitor showing some measurement tool listing, for example, 3GHz @ 5xx mW for a Cortex-A9 macro in 14XM. From the presentation context, they used 9T libs (vs. 12T for 28HPP) and got significant speed/power improvements even with the denser libs. I don't know if all that was just made up for marketing purposes or if there is some truth in it. But it's not much different from what TSMC has presented so far about their ARM macros.

It doesn't really matter what the test vehicle was, as they didn't release the 14nm numbers in isolation but relative to both their 28nm HPP and SLP. From the numbers above we can directly extract the reduction in parasitic capacitance and the improvement in conductance. Indeed, I just checked a recent slide where they summarized those numbers in a more accessible way; they claim, relative to 28 HPP:

1.521x speed at isopower.
0.442x power at isospeed.
0.495x density reduction.

Density aside, the other numbers are directly extracted from the ones they published at their engineers-only conference; they specify that 14nm LPP is well suited for high performance designs.
 

nenforcer

Golden Member
Aug 26, 2008
1,767
1
76
Hi guys, I am new to this forum, this is my first post here.

I've made some calculations based on your numbers and CB R15 scores, and they tell me that a 4c/4t 3400 MHz Zen would be equal to a 3100 MHz Sandy Bridge i5 (e.g. i5-2400).

If I take the Athlon X4 845's CB R15 score and give it the 40% IPC boost, it's still below, but closer to, the Sandy i5's performance.

A few months ago I said that if Zen reaches Sandy's performance I would be happy with that. But is that all we can expect from AMD's Zen?

If a new AMD Zen chip released in late 2016 / early 2017 can only hit the IPC speeds of an Intel chip released way back in 2011 I think that would be extremely disappointing. I think realistically they've got to hit Haswell 2013 / 2014 IPC to make it even competitive and then only with the usual price / core discounts relative to Intel.
 
Mar 10, 2006
11,715
2,012
126
If a new AMD Zen chip released in late 2016 / early 2017 can only hit the IPC speeds of an Intel chip released way back in 2011 I think that would be extremely disappointing. I think realistically they've got to hit Haswell 2013 / 2014 IPC to make it even competitive and then only with the usual price / core discounts relative to Intel.

AMD is much smaller and has far fewer resources than Intel does. It's these unrealistic expectations of a frail, beaten-down company in the face of extremely strong, well-funded, and successful competition that lead to disappointment.
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
I have a feeling we are going to have a long wait until the release. I'm guessing very late in 2016. First AMD has to show a big jump from Bulldozer/Piledriver.