
New Zen microarchitecture details


The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
14LPP probably isn't as conducive to ultra-high frequencies as the AMD custom-tailored 32nm SOI process was.
I don't think there will ever be a process that provides both scaling as linear and a frequency range as wide as the 32nm SHP SOI ended up providing. At least not on conventional technologies. The process was never the most efficient one, but it can still easily outperform even the custom GF28A bulk process at higher frequencies. Pretty telling that a 2 CU Steamroller part can easily consume more power at ~4.5GHz than a 4 CU Piledriver based CPU at < 4.2GHz. Also the frequency scaling is so far behind that of 32nm SHP SOI that the GF28A / GF28HPP thinks it's leading :biggrin:
 
Mar 10, 2006
11,719
2,003
126
I don't think there will ever be a process that provides both scaling as linear and a frequency range as wide as the 32nm SHP SOI ended up providing. At least not on conventional technologies. The process was never the most efficient one, but it can still easily outperform even the custom GF28A bulk process at higher frequencies. Pretty telling that a 2 CU Steamroller part can easily consume more power at ~4.5GHz than a 4 CU Piledriver based CPU at < 4.2GHz. Also the frequency scaling is so far behind that of 32nm SHP SOI that the GF28A / GF28HPP thinks it's leading :biggrin:
I believe that the 32nm SHP SOI was based on the work IBM did with its 32nm SOI process, a process designed for very high performance/high frequency (and power guzzling) POWER chips.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
An Excavator+ CMT module on 14nm FF LPP would have close to 25% higher IPC vs the Piledriver FX8350 and more than 40% higher MT performance (higher IPC + less CMT penalty).

Now increase the clocks to 4.5GHz and they could have an 8x CMT module CPU more than 2.5x faster than the current PD FX8350. Just by upgrading the Excavator core and porting it to 14nm LPP.

So if a single ZEN core + SMT is only as fast as a PD module at 32nm SOI then this will be a huge fail for MT performance. And MT performance is what both Intel and AMD are aiming for in the server segment.
Even if the "Excavator+" hit 5.5GHz on 14nm LPP, it would still be no match for even Intel's current offerings. Skylake is > 60% ahead of Excavator in FP workloads. And what shall AMD do with their new "Excavator+" design when the software developers start to implement more modern instructions? AVX2 has already been implemented for several years in high profile software and AVX512 is about to materialize. In AVX2 workloads Skylake can be > 100% ahead of Excavator.
 
Mar 10, 2006
11,719
2,003
126
Even if the "Excavator+" hit 5.5GHz on 14nm LPP, it would still be no match for even Intel's current offerings. Skylake is > 60% ahead of Excavator in FP workloads. And what shall AMD do with their new "Excavator+" design when the software developers start to implement more modern instructions? AVX2 has already been implemented for several years in high profile software and AVX512 is about to materialize. In AVX2 workloads Skylake can be > 100% ahead of Excavator.
Zen doesn't implement AVX-512 AFAIK.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,172
1,731
136
Even if the "Excavator+" hit 5.5GHz on 14nm LPP, it would still be no match for even Intel's current offerings. Skylake is > 60% ahead of Excavator in FP workloads. And what shall AMD do with their new "Excavator+" design when the software developers start to implement more modern instructions? AVX2 has already been implemented for several years in high profile software and AVX512 is about to materialize. In AVX2 workloads Skylake can be > 100% ahead of Excavator.
The reality is there is sweet FA support for 256bit ops in software, especially enterprise and "cloud" software. 256bit ops only matter for a small subset of applications, and thanks to Intel's deliberate market segmentation AVX isn't even used in a whole pile of consumer apps that could use it.

There's no point in making Intel Core mk.2, and AMD aren't. Going with 128bit datapaths is a good choice when designing a core to go into high core count servers and low power devices; wider data paths affect your core end to end and your cache fabric regardless of whether you use them or not (the vast majority don't).

Also AVX2 isn't the issue, 256bit ops are the issue for the Con cores, but then again who actually cares; there are like two main groups of applications that really do.

To paraphrase Linus Torvalds: pick a benchmark that actually gives you a good idea of real world performance for real world applications. His recommendation for a good benchmark is SPEC 403.gcc
 
Last edited:

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,004
153
106
Hitting higher frequencies is a matter of design these days ...

The SPARC M7 was able to hit 4.133GHz on ALL of its 32 cores(!) with TSMC's 20nm (not 16FF) process, which had a reputation for being relegated to lower power systems ...

The frequency ceiling peaked almost a decade ago so unless Zen was a dud of a design there's no reason it couldn't hit 4.5GHz with a stable overclock ...
 
Last edited:

SAAA

Senior member
May 14, 2014
484
83
91
Man I really hoped that AMD would port Piledriver to 22nm SOI at some point, just look at what frequencies the ultra wide Power8 can reach and imagine a speed focused arch on that node...
They could easily have 4.5GHz quad module or 4GHz octa (!) module Excavator parts by now, instead we are here waiting for Zen with no product in between.

And don't tell me about cost of implementation because they did 5 generations of APUs with cores, GPU and now even southbridge integrated, surely they could have spared a bit of money to design an iGPU-less quad module part?
Heck even a tri module Excavator with L3 and half the current iGPU could hold up pretty well against FX CPUs, plus it would be smaller (~250mm^2 on 28nm, less? on 22nm SOI) and you wouldn't need to buy a discrete GPU if you don't game.
 

ShintaiDK

Lifer
Apr 22, 2012
20,395
128
106
Man I really hoped that AMD would port Piledriver to 22nm SOI at some point, just look at what frequencies the ultra wide Power8 can reach and imagine a speed focused arch on that node...
They could easily have 4.5GHz quad module or 4GHz octa (!) module Excavator parts by now, instead we are here waiting for Zen with no product in between.
But that's a waste of money that will never come back. It won't make the chip sell in any meaningful volume. And it costs quite a bit.
 

SAAA

Senior member
May 14, 2014
484
83
91
But that's a waste of money that will never come back. It won't make the chip sell in any meaningful volume. And it costs quite a bit.
I'm not so sure about this cost thing, especially if they really did release quad module Steamroller and Excavator designs.
People who bought FXs till now would have bought them instead, and the work itself was mostly done: the modules were already ported to 28nm, the most difficult part was to double them up and interconnect them. L3 optional, if it really helped.

I suspect the limitation either came from above ("no more FX, we are focusing on something else!") or possibly AMD didn't have enough spare engineers/teams to work on another line, regardless of cost.

I vote for the last one myself, though in truth it could easily be a marketing thing...

Guess what, in the end if Zen works and delivers twice Piledriver performance, hits IPC claims etc., it would make a far better impression than it would against a possible quad/+ Excavator.
I mean everyone would be talking: hey AMD just did their Conroe thing and got a massive performance increase, far better power consumption etc etc.

Not that they will surpass Intel, yet if they really release just 6-8 cores they will also strike that psychological feeling of better (same way higher clocks sound better on any CPU, regardless of arch), the "moar" factor, and probably push the competition to release mainstream six-cores in the end.
Because if the 8 core Zen goes against the hex line, well, you have 6 cores at quad i7 price or possibly less. Now add the "new games will support more cores" talk you hear left and right and many people might jump onto a cheap Zen, even if it has lower IPC and clocks than Kaby Lake.
 

AtenRa

Lifer
Feb 2, 2009
13,553
2,533
126
A Steamroller at 22nm SOI back in 2014 would have sold like hotcakes both in desktop and especially in server. The problem was GloFo was going for 14nm FF with Samsung and stopped 22nm SOI development before 2013.

AMD lost more money and market share by not having 22nm SOI desktop/server products from 2014 to 2016 than they managed to save by not developing server dies since the Opteron 6300 series. Kaveri, Carrizo and new 6-8 module Steamroller server SKUs on 22nm SOI would be much more competitive and attractive than the current 32nm SOI and 28nm bulk products.
 

AtenRa

Lifer
Feb 2, 2009
13,553
2,533
126
And for those who believe the server market cares that much about ST performance, take a look at AT's Broadwell-E review.

http://www.anandtech.com/show/10337/the-intel-broadwell-e-review-core-i7-6950x-6900k-6850k-and-6800k-tested-up-to-10-cores/4

The Core i7 6950X is made for MT workloads, not ST. Those who need a 10 core, 20 thread CPU don't care whether ST is 10-15% faster than Haswell, but how much faster the 10 core Broadwell-E is in MT.

As for AVX, it's simply a joke at this point. If you want high FP performance, use a dGPU.
 

Abwx

Diamond Member
Apr 2, 2011
9,133
918
126
14LPP probably isn't as conducive to ultra-high frequencies as the AMD custom-tailored 32nm SOI process was.
14nm LPP LVT transistors switch at 2.4GHz and 0.8V voltage in a short pipeline CPU, they will switch at 3.7GHz at 1V and 4.5GHz at 1.1V.
 

ShintaiDK

Lifer
Apr 22, 2012
20,395
128
106
14nm LPP LVT transistors switch at 2.4GHz and 0.8V voltage in a short pipeline CPU, they will switch at 3.7GHz at 1V and 4.5GHz at 1.1V.
With what, 10 transistors?

From the look at GCN, 14LPP looks to be an epic disaster against the competition.
 

monstercameron

Diamond Member
Feb 12, 2013
3,829
1
0
And for those who believe the server market cares that much about ST performance, take a look at AT's Broadwell-E review.

http://www.anandtech.com/show/10337/the-intel-broadwell-e-review-core-i7-6950x-6900k-6850k-and-6800k-tested-up-to-10-cores/4

The Core i7 6950X is made for MT workloads, not ST. Those who need a 10 core, 20 thread CPU don't care whether ST is 10-15% faster than Haswell, but how much faster the 10 core Broadwell-E is in MT.

As for AVX, it's simply a joke at this point. If you want high FP performance, use a dGPU.
AVX and FP aren't synonymous.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
14nm LPP LVT transistors switch at 2.4GHz and 0.8V voltage in a short pipeline CPU, they will switch at 3.7GHz at 1V and 4.5GHz at 1.1V.
And 6.6GHz @ 1.4V, 6.95GHz @ 1.45V (based on your linear scaling). Amazing :awe:
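For reference, the two extrapolated figures above can be reproduced with a straight line through the end points quoted (0.8V -> 2.4GHz and 1.1V -> 4.5GHz), which works out to roughly 7GHz per volt. A minimal sketch in Python; the function name is made up for illustration, and the line is only an arithmetic check on the numbers in the thread, not a model of how 14nm LPP actually behaves:

```python
def linear_freq_ghz(voltage, p0=(0.8, 2.4), p1=(1.1, 4.5)):
    """Straight-line extrapolation of frequency (GHz) through two (V, GHz) points.

    The default points are the end points quoted in the thread; everything
    beyond 1.1V is pure extrapolation.
    """
    (v0, f0), (v1, f1) = p0, p1
    slope = (f1 - f0) / (v1 - v0)  # about 7 GHz per volt for the quoted points
    return f0 + slope * (voltage - v0)


for v in (1.4, 1.45):
    print(f"{v:.2f} V -> {linear_freq_ghz(v):.2f} GHz")
```

Run as-is, this should land on the 6.6GHz and 6.95GHz figures, so the extrapolation uses the 0.8V and 1.1V points rather than the 1.0V one.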
 

Abwx

Diamond Member
Apr 2, 2011
9,133
918
126
And 6.6GHz @ 1.4V, 6.95GHz @ 1.45V (based on your linear scaling). Amazing :awe:

You had to create a monumental straw man to try to make a point. I stated up to 1.1V, the rest is invention on your part..

What is amazing is your bad faith. Besides, that's not a linear but a square law scale, but I guess that semiconductors are not exactly your specialty...

It's actually superlinear.
I see no reason why it shouldn't sustain 1.1V within the usual square law of FETs.

Thing is that whoever claims that Zen will clock at only 3GHz is also saying that 14nm LPP can't go past 0.893V. I guess you realize to what extent speculation around here is often built on wishful thinking.
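A quick way to check the "superlinear" point against the three operating points quoted earlier in the thread (0.8V/2.4GHz, 1.0V/3.7GHz, 1.1V/4.5GHz) is to compare the GHz-per-volt slope of each segment and see where 3GHz would fall. The sketch below is Python; the helper names are invented for illustration, and piecewise-linear interpolation is an assumption made here, since the post does not say how the 0.893V figure was derived:

```python
# (volts, GHz) operating points as quoted in the thread
points = [(0.8, 2.4), (1.0, 3.7), (1.1, 4.5)]


def segment_slopes(pts):
    """GHz-per-volt slope between each consecutive pair of points."""
    return [(f1 - f0) / (v1 - v0) for (v0, f0), (v1, f1) in zip(pts, pts[1:])]


def voltage_for(freq, pts):
    """Voltage at which piecewise-linear interpolation reaches freq (GHz)."""
    for (v0, f0), (v1, f1) in zip(pts, pts[1:]):
        if f0 <= freq <= f1:
            return v0 + (freq - f0) * (v1 - v0) / (f1 - f0)
    raise ValueError("frequency outside the quoted range")


print([round(s, 2) for s in segment_slopes(points)])
print(round(voltage_for(3.0, points), 3))
```

The slope rises from about 6.5GHz/V on the first segment to about 8GHz/V on the second, which is what "superlinear" means here, and 3GHz comes out at roughly 0.89V, in the neighbourhood of the 0.893V figure.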
 
Last edited:

itsmydamnation

Platinum Member
Feb 6, 2011
2,172
1,731
136
With what, 10 transistors?

From the look at GCN, 14LPP looks to be an epic disaster against the competition.
Care to give us your expert analysis on how you came to that conclusion? Again, like always, don't go soft on us, give us all the gory details like how you factored in pipeline length, register file and cache design. Basically all the steps you took to come to your final conclusion.

Funny how Mongoose, Kryo and A9 are all manufactured on this abortion of a process........

I'll assume, like always, your lack of an answer is a retraction of your statement :D.

And for those who believe the server market cares that much about ST performance, take a look at AT's Broadwell-E review.
ST performance is still very important to server workloads. ST performance can have a big impact on application latency (especially if using something like SAP HEC). The issue is getting the balance between cores and per core performance right, and the CON cores didn't do this. I still saw heaps of Magny-Cours in DCs back in the day, never seen Bulldozer in one.
 
Last edited:
