New Zen microarchitecture details


KTE

Senior member
May 26, 2016
478
130
76
I too believe those charts refer to performance per watt increases.

I don't see any precedent for expecting gains that much higher.

 

inf64

Diamond Member
Mar 11, 2011
3,698
4,018
136
3.2GHz 8C/16T providing 2 - 2.2x MT performance of an FX-8350 would mean that the IPC has increased by 100 - 120%. Either you expect Zen to be > 73% faster than Excavator, or Excavator to be 42.85% faster than Piledriver. Either way, both figures differ significantly from AMD's own figures / expectations: PD to XV = 15.5% (average), XV to Zen <= 40%. That's with 25% SMT yield, which is pretty optimistic IMO.

With those specs Zen would have significantly higher IPC than Haswell / Broadwell / Skylake and most likely Kaby Lake.

I counted SMT for Zen and MT penalty for PD in the speed-up:

* while running on the same clock*

1.40(over XV) x 1.15 (XV over PD) x 1.25 (SMT speed-up) / 0.83 (penalty PD has when running 2 threads on a module) = 2.4x

*counting in the clock difference, Zen 3.2GHz base, PD 8350 4GHz base*
2.4 x 3.2 / 4 = 1.92 or pretty much 2x over 8350, just like the first slide that featured Orochi showed ;)

For ST code it is easier, assuming Zen can reach 3.7GHz in Turbo mode while the 8350 runs at 4.2GHz:

1.4 x 1.15 x 3.7 / 4.2 = 1.42 so between 40 and 45%, stock vs stock.

BTW I think they absolutely MUST hit these numbers at the very least, if not better, if they want to compete in 2017 and onward.
What is funny and interesting: applying these hypothetical ST/MT speed-ups to the AnandTech Bench results for the 8350, one arrives at 5960X performance (±10%).
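
The arithmetic in this post can be reproduced with a minimal Python sketch. The function and parameter names are hypothetical; the factors (1.40, 1.15, 1.25, 0.83) and the clocks are the assumptions stated in the post, not measurements.

```python
def zen_vs_pd_speedup(zen_over_xv=1.40, xv_over_pd=1.15, smt_yield=1.0,
                      pd_module_penalty=1.0, zen_ghz=4.0, pd_ghz=4.0):
    """Estimated Zen speed-up over Piledriver as a product of assumed factors.

    smt_yield and pd_module_penalty default to 1.0, i.e. the single-thread case.
    """
    same_clock = zen_over_xv * xv_over_pd * smt_yield / pd_module_penalty
    # Assumes performance scales linearly with clock frequency.
    return same_clock * zen_ghz / pd_ghz

# MT at the same clock: SMT yield for Zen, two-thread module penalty for PD.
mt_same_clock = zen_vs_pd_speedup(smt_yield=1.25, pd_module_penalty=0.83)
# MT, stock vs stock: Zen at 3.2GHz base, FX-8350 at 4.0GHz base.
mt_stock = zen_vs_pd_speedup(smt_yield=1.25, pd_module_penalty=0.83,
                             zen_ghz=3.2, pd_ghz=4.0)
# ST, stock vs stock: Zen at an assumed 3.7GHz turbo, FX-8350 at 4.2GHz.
st_stock = zen_vs_pd_speedup(zen_ghz=3.7, pd_ghz=4.2)

print(f"MT same clock: {mt_same_clock:.2f}x")
print(f"MT stock:      {mt_stock:.2f}x")
print(f"ST stock:      {st_stock:.2f}x")
```

Without rounding the intermediate result to 2.4, the stock MT figure comes out near 1.94x rather than 1.92x, which does not change the "pretty much 2x" conclusion.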
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
You've got me confused: does the first, "same clock" case measure ST or MT?

For MT we should compare a single Zen core vs. one module.
 

inf64

Diamond Member
Mar 11, 2011
3,698
4,018
136
You've got me confused: does the first, "same clock" case measure ST or MT?

For MT we should compare a single Zen core vs. one module.

The first case was for an 8C/16T 4GHz/4.2GHz Zen vs. a stock 8350 in MT code - obviously that sort of Zen is not going to happen, as 4GHz stock is a pipe dream. Then I just adjusted for a more realistic clock of 3.2GHz when running MT code.

The second case was a pure ST workload, and I assumed Zen can reach 3.7GHz in such a scenario vs. the 4.2GHz that the 8350 runs at.

edit:

To explain further, running at the same clock:
1M/2T PD scores 0.8x2=1.6 pts in MT code.
1C/2T Zen scores 1.6 x 1.25 = 2 pts in MT code.

4M/8T PD scores 6.4 pts
8C/16T Zen scores 16 pts.

Zen at the same clock is 2.5x faster. Now adjust for the difference in clock speed: 2.5 x 3.2 / 4 = 2, or 2x faster than the 8350 @ stock.

ST code is easier: 1.6 x 3.7 / 4.2 = 1.41, so Zen with a 3.7GHz ST Turbo is 41% faster than a stock 8350.

Hope it is clearer now.
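
The "points" model above can be written out as a short Python sketch. The constant names are hypothetical, and every value is the post's own assumption: 0.8 per thread when both threads of a PD module are busy, 1.6 for Zen single-thread relative to PD (roughly 1.40 x 1.15), and a 25% SMT yield.

```python
PD_THREAD_SCALING = 0.8  # per-thread score with both threads of a PD module loaded
ZEN_ST_OVER_PD = 1.6     # Zen single-thread score relative to PD, same clock
SMT_YIELD = 1.25         # assumed throughput gain from the second SMT thread

pd_module = 2 * PD_THREAD_SCALING      # 1M/2T Piledriver
zen_core = ZEN_ST_OVER_PD * SMT_YIELD  # 1C/2T Zen

pd_chip = 4 * pd_module  # 4M/8T FX-8350
zen_chip = 8 * zen_core  # 8C/16T Zen

same_clock_ratio = zen_chip / pd_chip
# Clock adjustment assumes linear scaling: 3.2GHz Zen vs. 4.0GHz FX-8350.
stock_mt_ratio = same_clock_ratio * 3.2 / 4.0
# Single-thread: assumed 3.7GHz Zen turbo vs. 4.2GHz FX-8350 turbo.
stock_st_ratio = ZEN_ST_OVER_PD * 3.7 / 4.2

print(f"PD chip: {pd_chip:.1f} pts, Zen chip: {zen_chip:.1f} pts")
print(f"same clock: {same_clock_ratio:.2f}x, stock MT: {stock_mt_ratio:.2f}x, "
      f"stock ST: {stock_st_ratio:.2f}x")
```

Note that the per-core comparison in this model is one Zen core (2 pts) against one PD module (1.6 pts); the 2.5x chip-level figure comes from Zen having eight cores against Piledriver's four modules.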
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Next step in those performance predictions will be workload based estimations. CB is done, what next? DX11/DX12 games, media processing, office... What are you guys interested in the most?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Next step in those performance predictions will be workload based estimations. CB is done, what next? DX11/DX12 games, media processing, office... What are you guys interested in the most?

Modern media codecs, HEVC / BGP & VP9.
 

dark zero

Platinum Member
Jun 2, 2015
2,655
138
106
Next step in those performance predictions will be workload based estimations. CB is done, what next? DX11/DX12 games, media processing, office... What are you guys interested in the most?
Virtual machines. I am getting tired of Intel's sluggish improvement.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
If Single ZEN Core + SMT on the new 14nm FF is only 25% faster in MT (same 4GHz) than a 2012 32nm SOI PileDriver Module then AMD is screwed.

That suggests that a single Zen core + HT is not that much faster (if faster at all) than a single Excavator module in MT loads.
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
If Single ZEN Core + SMT on the new 14nm FF is only 25% faster in MT (same 4GHz) than a 2012 32nm SOI PileDriver Module then AMD is screwed.

How would AMD be screwed if an 8 core Zen CPU is twice as fast in MT workloads as a 4M8T Piledriver CPU? With an 8350 at 640 in something like R15 while a 5960X is 1337. Getting essentially a tie with an 8C Intel HEDT chip would be a big win for Zen, IMO.
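
As a rough check of that comparison, here is a minimal Python sketch. The two Cinebench R15 scores are the ones quoted in this post, not re-measured, and the variable names are hypothetical.

```python
# Cinebench R15 multi-thread scores as quoted in the post.
FX_8350_R15 = 640
I7_5960X_R15 = 1337

# Hypothetical 8C/16T Zen at twice the FX-8350's MT score.
zen_2x = 2 * FX_8350_R15
relative_to_5960x = zen_2x / I7_5960X_R15

print(f"2x FX-8350 = {zen_2x} pts, {relative_to_5960x:.0%} of the 5960X score")
```

A 2x result lands at roughly 96% of the quoted 5960X score, which is the "essentially a tie" described above.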
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
How would AMD be screwed if an 8 core Zen CPU is twice as fast in MT workloads as a 4M8T Piledriver CPU? With an 8350 at 640 in something like R15 while a 5960X is 1337. Getting essentially a tie with an 8C Intel HEDT chip would be a big win for Zen, IMO.

Edit: Misunderstood, NVM.
 

MrTeal

Diamond Member
Dec 7, 2003
3,569
1,699
136
So you expect a 8C/16T Zen to have 4.0GHz base frequency, at 95W?

No, not likely. inf64's rather optimistic prediction was 2.5x an 8350 at the same clocks but 2x faster at a more realistic 3.2GHz. 2x would make 8C16T Zen about as fast as a 5960X in CB R15. 2.5x would make it faster than an i7-6900K, which is just pie in the sky.
 

inf64

Diamond Member
Mar 11, 2011
3,698
4,018
136
How would AMD be screwed if an 8 core Zen CPU is twice as fast in MT workloads as a 4M8T Piledriver CPU? With an 8350 at 640 in something like R15 while a 5960X is 1337. Getting essentially a tie with an 8C Intel HEDT chip would be a big win for Zen, IMO.


I agree. I think it was AMD's goal all along, to basically tie/match Haswell-E at approx. the same clock. It is in no way a fail, it is a MUST for AMD. If they manage to do this and they deliver on Zen+ like they stated (~1 year after Zen, traditional ~10% uarch. update) they would be in a solid spot against Cannonlake derivatives.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
How would AMD be screwed if an 8 core Zen CPU is twice as fast in MT workloads as a 4M8T Piledriver CPU? With an 8350 at 640 in something like R15 while a 5960X is 1337. Getting essentially a tie with an 8C Intel HEDT chip would be a big win for Zen, IMO.

They could have saved all the time, money and resources spent on making Zen by doubling the modules of a CMT-design Excavator+ at 14nm, and had the same MT performance.

So what you are saying is that they spent 4-5 years and billions in R&D to reach the same MT performance they could have had by simply increasing the number of CMT modules in the current design, which would be just perfect on a 14nm FF process.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I don't believe AMD used FX-8350 / FX-8370 as "Orochi" in the slide, for two reasons: its TDP is too high, and its performance is too high for Zeppelin to double.

In Cinebench R15 Excavator is just 4.87% (instead of the official 15.5% average figure) faster than Piledriver Orochi. In total the expected difference between Piledriver and Zen would be 46.82% (1.0487 x 1.40, using the 40% quoted by AMD). FX-8350 / FX-8370 score 645 points in Cinebench R15. Piledriver scores 96 at 4.0GHz in the Cinebench R15 ST test.

2 * 645 = 1290 // Zeppelin's 2x performance target score (for FX-8350 / FX-8370)
96 * 1.4682 = ~140.95 (Zen ST CB R15 score @ 4.0GHz w/ quoted and measured performance differences)
8 * 140.95 = ~1128 (Zeppelin's estimated performance w/o SMT @ 4.0GHz)
1128 * 1.25 = 1410 (Zeppelin's estimated 8C/16T performance w/ Intel SMT like yield @ 4.0GHz)
1290 / 1410 = ~0.9149 (Frequency normalization)
0.9149 * 4000 = ~3660MHz (Normalized frequency required for 1290 score)

~3660MHz on all cores of 8C/16T CPU at 95W? I think not. Especially when it is highly unlikely that the SMT implementation in Zen will reach similar yield as the latest SMT implementation from Intel. If (and when) the SMT yield goes down, obviously the CPU clocks must go up even further.

However, if the "Orochi" AMD compared Zen to was in fact an FX-8370E CPU, which happens to have both a matching TDP and lower clocks...

2 * 534 = 1068 // Zeppelin's 2x performance target score (for FX-8370E)
96 * 1.4682 = ~140.95 (Zen ST CB R15 score @ 4.0GHz w/ quoted and measured performance differences)
8 * 140.95 = ~1128 (Zeppelin's estimated performance w/o SMT @ 4.0GHz)
1128 * 1.25 = 1410 (Zeppelin's estimated 8C/16T performance w/ Intel SMT like yield @ 4.0GHz)
1068 / 1410 = ~0.7574 (Frequency normalization)
0.7574 * 4000 = ~3030MHz (Normalized frequency required for 1068 score)

3000MHz is something that is most likely doable when taking the known TDP, the expected process and design characteristics into account :sneaky: In fact I find it pretty likely :D

However none of this matters if the chart didn't represent CB R15 in the first place.
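
The frequency normalization used in both cases above can be sketched in Python as follows. The function name and parameters are hypothetical; the inputs (96 pts ST at 4.0GHz, 4.87% and 40% gains, 25% SMT yield, 645 and 534 pts baselines) are the figures given in this post, and the model assumes the score scales linearly with core count and clock.

```python
def required_all_core_mhz(target_score, pd_st_score=96.0, xv_over_pd=1.0487,
                          zen_over_xv=1.40, cores=8, smt_yield=1.25,
                          ref_mhz=4000.0):
    """All-core clock an 8C/16T Zen would need to reach target_score in CB R15."""
    zen_st = pd_st_score * xv_over_pd * zen_over_xv  # Zen ST score at ref_mhz
    zen_mt = cores * zen_st * smt_yield              # 8C/16T score at ref_mhz
    return target_score / zen_mt * ref_mhz           # linear clock scaling

for name, orochi_score in (("FX-8350 / FX-8370", 645), ("FX-8370E", 534)):
    target = 2 * orochi_score  # the "2x Orochi" target from the slide
    mhz = required_all_core_mhz(target)
    print(f"{name}: target {target} pts -> ~{mhz:.0f} MHz")
```

Because the sketch does not round the intermediate values, its results differ from the hand calculation by about 1 MHz; the conclusions (~3660MHz vs. ~3030MHz) are unchanged.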
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Modern media codecs, HEVC / BGP & VP9.
OK. I'll put them on my list as a codec-like type of software. Analyzing the individual ones wouldn't improve accuracy much, given the general prediction error. ;)

Virtual machines. I am getting tired of Intel's sluggish improvement.
That is difficult to analyze the way I intended. But as AMD had data centers in mind when creating Zen and its somewhat bigger brother K12, I'm sure that Zen will do well there. The DMC might also help here.
 
Mar 10, 2006
11,715
2,012
126
They could have saved all the time, money and resources spent on making Zen by doubling the modules of a CMT-design Excavator+ at 14nm, and had the same MT performance.

MT performance isn't the only requirement for PC processors. Single-threaded performance needed to go up.

So what you are saying is that they spent 4-5 years and billions in R&D to reach the same MT performance they could have had by simply increasing the number of CMT modules in the current design, which would be just perfect on a 14nm FF process.

Intel spent billions on R&D for Skylake and got a +12% IPC boost or so from HSW. IPC improvements are really, really hard, especially when you are dealing with a high frequency design.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
I don't believe I have said a single word about ST. I was only talking about MT, and that is what 99% of the users who are going to buy an 8-core, 16-thread CPU are interested in. Single-thread performance only matters to desktop gamers. And I have a feeling that by the start of 2017 it will not carry as much weight in desktop gamers' buying decisions as it did in the DX11 era.
 
Mar 10, 2006
11,715
2,012
126
I don't believe I have said a single word about ST. I was only talking about MT, and that is what 99% of the users who are going to buy an 8-core, 16-thread CPU are interested in. Single-thread performance only matters to desktop gamers. And I have a feeling that by the start of 2017 it will not carry as much weight in desktop gamers' buying decisions as it did in the DX11 era.

Zen is a multipurpose core aimed at a wide range of markets, some of which aren't all that conducive to more but weaker cores. Single threaded performance matters.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
They could have saved all the time, money and resources spent on making Zen by doubling the modules of a CMT-design Excavator+ at 14nm, and had the same MT performance.

So what you are saying is that they spent 4-5 years and billions in R&D to reach the same MT performance they could have had by simply increasing the number of CMT modules in the current design, which would be just perfect on a 14nm FF process.

How would porting any of the recent 15h designs to a 14nm process and increasing the module count address its fatal flaws, such as the obsolete single-threaded performance? Not by "the higher speeds made (im)possible by the 14nm LPP" I hope?
 
Mar 10, 2006
11,715
2,012
126
How would porting any of the recent 15h designs to a 14nm process and increasing the module count address its fatal flaws, such as the obsolete single-threaded performance? Not by "the higher speeds made (im)possible by the 14nm LPP" I hope?

14LPP probably isn't as conducive to ultra-high frequencies as the AMD custom-tailored 32nm SOI process was.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
How would porting any of the recent 15h designs to a 14nm process and increasing the module count address its fatal flaws, such as the obsolete single-threaded performance? Not by "the higher speeds made (im)possible by the 14nm LPP" I hope?

An Excavator+ CMT module on 14nm FF LPP would have close to 25% higher IPC vs. the Piledriver FX-8350 and more than 40% higher MT performance (higher IPC + less CMT penalty).

Now increase the clocks to 4.5GHz and they could have an 8-module CMT chip more than 2.5x faster than the current PD FX-8350, just by upgrading the Excavator core and porting it to 14nm LPP.

So if a single Zen core + SMT is only as fast as a PD module at 32nm SOI, then this will be a huge fail for MT performance. And MT performance is what both Intel and AMD are aiming for in the server segment.
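
Taking this post's figures at face value, the claimed speed-up can be sketched in Python. The function name is hypothetical, and the model assumes throughput scales perfectly linearly with module count and clock, which is a strong assumption; the 1.40 per-module gain and the 4.5GHz clock are the post's own hypotheticals.

```python
def hypothetical_mt_speedup(per_module_gain, modules, base_modules, ghz, base_ghz):
    """MT speed-up over a baseline chip, assuming linear scaling with
    module count and clock frequency (an idealisation)."""
    return per_module_gain * (modules / base_modules) * (ghz / base_ghz)

excavator_plus = hypothetical_mt_speedup(
    per_module_gain=1.40,        # claimed per-module MT gain over Piledriver
    modules=8, base_modules=4,   # 8 modules vs. the FX-8350's 4
    ghz=4.5, base_ghz=4.0,       # 4.5GHz vs. the FX-8350's 4.0GHz base
)
print(f"Hypothetical 8-module Excavator+ vs. FX-8350: {excavator_plus:.2f}x")
```

Under those assumptions the product is about 3.15x, consistent with the "more than 2.5x" claim, though it says nothing about whether such clocks or scaling are achievable on 14nm LPP.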
 
Mar 10, 2006
11,715
2,012
126
Now increase the clocks to 4.5GHz and they could have an 8-module CMT chip more than 2.5x faster than the current PD FX-8350, just by upgrading the Excavator core and porting it to 14nm LPP.

Much easier said than done, especially on a process like 14LPP.