New Zen microarchitecture details

Page 31

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Also, do I need to remind people in this thread that we will have i7 6800K-6950X and i7-7700K before Zen drops? It's funny how Zen launches in late 2016 so we should compare it to 5820K/6700K? I might as well make the argument straight up that if Zen volume production --> retail is only Q1 2017, we actually should be comparing it to the rumored Q2 SKL-E and Q3 2017 Cannonlake. What kind of a double standard is it that we use 'old' Intel CPUs/architectures but Zen launches way later?

ZEN = Q4 2016

Broadwell-E = Q2 2016
Kabylake = Q4 2016
SkyLake-E = Q3 2017 ??? (perhaps later)
Cannonlake Desktop = Q1-Q2 2018 ???

At launch time ZEN will compete against Kabylake in mainstream and Broadwell-E in HEDT.


Right now I can go out and buy an i7 6700K and enjoy it until December 1st. I bet Zen won't even be out in large volumes by then. It's like this forum assigns 0 value for opportunity cost of waiting but at the same time the minute Zen launches (and we were supposed to wait for it for 15+ months since August 2015 I7 6700K launched), we are supposed to ignore that Cannonlake and SKL-E won't be far away either?!

By that logic we can wait another 6-8 months after that for ZEN+, then wait more for Cannonlake, and wait indefinitely.

As I said, even with a ~ 10% IPC differential between 5820K and 6700K, 6700K still sells like hot cakes. I think too many people here underestimate that many consumers will still choose 4 fastest cores over 6-8 slower ones.

If the platform and CPU cost were the same, I bet all my money the majority of people here would choose the 6-8 core Broadwell-E over Kabylake 4 core + iGPU.

I would likely take i7-6800K BW-E over i7 6700K but I would take i7-6700K over i7 4930K. If Zen brings 6-8 cores with IVB or even Haswell IPC, even that isn't a slam dunk. Even ignoring i7-7700K (Kaby Lake), we will have $389 i7-6800K BW-E probably in June/July of this year.

That is why some of us are expecting ZEN to bring more cores at a lower price than Intel. For example, a 6-core/12T ZEN could have an MSRP of $250; it could be slower in single thread than a Kabylake Core i5 but way faster in MT and extremely close in gaming performance (mostly due to DX-12).

It's also interesting how people believe that AMD will by pure magic and a fraction of R&D make up 4 (!) major Intel architectures:

Nehalem
Sandy
Haswell
Skylake

There is no way that AMD can come up with an architecture that incorporates a boost in single threaded performance = to 4 of Intel's major architectures since 2008. It took Intel 7 years but AMD can short-cut it? Ya right...

IPC is not growing by the year like trees, AMD took a different road than Intel with Bulldozer. BD core within the module only has 2x Integer ALUs when Intel Haswell has 4x. AMD for example could make a higher IPC CPU than Skylake yesterday but it would only clock to 2GHz.
What we care about is performance, and in early 2017, with a lot of DX-12 games on the market, single core performance will not play that big of a role in games. Early 2017 will not be like the Bulldozer release vs SandyBridge in 2011.

You guys are literally setting up Zen to fail by having insanely unrealistic expectations. Even Intel's 6-core 3.6Ghz BW-E will have a 140W TDP and Intel's 14nm tech guaranteed wipes the floor with GloFo's FinFET. Yet, people only look at the architectural aspect and ignore that Intel's 14nm node is no way comparable to their competitors -- it's far superior.

The difference between 14nm LPP and Intel 14nm is not as large as you may think.

And here is the kicker -- even if by pure magic, AMD can pull out an 8-core Zen that matches i7-5960X, Intel will just add more cores and drop prices on SKL-E to make sure AMD is not relevant. Why? Because Intel has 60%+ gross margins and their BW-E workstation platform already has a 22/44HT CPU. Let's not kid ourselves -- Intel is the one who decides just how high AMD can price Zen. The minute Intel feels threatened, they'll move 6-core SKL-E go $299.

The rumor is that Intel has already lowered the 6-core Broadwell-E price to sub-$400. So if an 8-core ZEN is competitive against a 6-core Broadwell-E CPU, then we are not mad or hyping when we expect AMD to launch an 8-core ZEN at a lower price than that. That is why we are talking about 6-8 core ZEN CPUs in the $300-400 price range.

AMD's best bet is to win OEMs/mobile designs. As far as DIY market goes, I don't think it has a chance because Intel can drop prices/raise the number of cores at will.

I would gladly replace my Core i7 3770K with a 6-8 core ZEN if the performance is what I'm expecting (MT higher than KabyLake).
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,858
136
Yes, as long as the 8 core/16T Zen runs @ 4.0 Ghz base, and 4.2 Ghz turbo frequencies, which I can assure you will not be happening. I'd be surprised if the 16 thread Zen had a base frequency as high as 3.0 Ghz
I don't know what Erenhardt was estimating earlier, but for the purpose of this dirty guesstimate, please keep in mind Zen 8c/16t @ 3Ghz will likely have double the throughput of 4 construction modules (8t) at same frequency.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
IPC is measured relative to a single clock speed... Comparing the IPC of a higher-clocked Piledriver to that of a lower-clocked Excavator is meaningless, because IPC is compared at the same clock.

1) Let's start off with CPU 0 (FX9590). It has a performance of 100% @ 4.7-5Ghz.

2) Here comes in CPU 1 (Excavator), with 15% higher instructions per clock than CPU 0. That means CPU 1 needs to have clock speed of (4.7-5Ghz)/1.15 = 4.09-4.35Ghz to match the performance of CPU 0's 4.7-5Ghz clocks.

3) We then find out that CPU 1 has another 40% higher instructions per clock on top of CPU 0.

4) CPU 2 (Skylake), when compared with a CPU 0, clocked at 4-4.2Ghz, wins by 82%.

Now, we have these parameters:

CPU 1 @ 4.09-4.35Ghz = 100% base x 40% (1.4X) increased throughput = 140%.
CPU 2 @ 4-4.2Ghz = 182%

CPU 2 is faster by 182%/140% = 30%. That means CPU 1 even when clocked to 4.09-4.35Ghz will have 30% slower single threaded performance.
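For anyone who wants to check the numbers in the four steps above, here is a minimal Python sketch that just replays the post's own figures. The variable and function names are made up for illustration; the 15%, 40% and 82% inputs are the post's claims, not measurements. It also shows that "CPU 2 is 30% faster" corresponds to CPU 1 being roughly 23% slower, since the two percentages use different baselines.

```python
# Replays the arithmetic from the post; all inputs are the post's own claims.
FX9590_CLOCK_GHZ = (4.7, 5.0)    # CPU 0 clock range
EXCAVATOR_IPC_GAIN = 1.15        # CPU 1 IPC relative to CPU 0 (step 2)
EXTRA_IPC_GAIN = 1.40            # further 40% uplift applied in step 3
SKYLAKE_RELATIVE_PERF = 1.82     # CPU 2 relative to CPU 0 (step 4)


def clock_to_match(base_clock_ghz: float, ipc_gain: float) -> float:
    """Clock a higher-IPC chip needs to equal the baseline's performance."""
    return base_clock_ghz / ipc_gain


# Step 2: clocks at which CPU 1 equals CPU 0's 100% baseline.
matching_clocks = tuple(
    round(clock_to_match(clock, EXCAVATOR_IPC_GAIN), 2)
    for clock in FX9590_CLOCK_GHZ
)

# Step 3: apply the extra 40% on top of the 100% baseline.
cpu1_relative_perf = 1.00 * EXTRA_IPC_GAIN

# Final comparison: how far ahead CPU 2 is, and how far behind CPU 1 is.
skylake_lead = SKYLAKE_RELATIVE_PERF / cpu1_relative_perf
cpu1_deficit = 1.0 - 1.0 / skylake_lead

print(f"CPU 1 clocks to match CPU 0: {matching_clocks} GHz")
print(f"CPU 2 lead over CPU 1: {skylake_lead - 1:.0%}")
print(f"CPU 1 deficit vs CPU 2: {cpu1_deficit:.0%}")
```

The result is only as good as the inputs: change the 82% or 40% figure and the gap moves accordingly.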

So clearly, Zen has nearly 0% chance of beating Skylake per clock (i.e., we just proved that Zen will definitively have lower IPC than Skylake). And in my analysis, even with 4.09-4.35Ghz clocks, Zen would still not be that impressive against a 4-4.2Ghz Skylake. However, by the time Zen launches, Intel should add another 200+mhz to Kaby Lake. Oops.

I'm sorry but you seem to not grasp the situation properly if you think this is how it works. They are not simply trying to 'catch up'; they've moved back to a wider, high-IPC design. There will of course be some elements of evolutionary architectural improvement (deeper buffers, load/store improvements, branch prediction improvements, etc.), and given the big jump down to 14nm, this will no doubt be a significant part of the 40% uplift over Excavator. But the bulk will be from the physically wider execution (integer in particular), lower latency cache system, and (unconfirmed, I know) shorter pipeline.

Why are you playing with words? Catching up => when you are behind. The strategy of catching up is moving to a higher IPC architecture. So yes, they are in catch up mode. You just busted out a reply without addressing any of my key points. Intel has taken 4 distinct and major architectures over the span of 7 years. You are saying AMD will just come out with a single new architecture that "catches up" all of that IPC increase from Nehalem to Skylake, just like that in 1 shot? Don't you see it that 40% above Excavator is not enough to catch up to Skylake?

RS, gross profit margin is a function of product competitiveness. If AMD has an 8-core Zen that's competitive with an 8 core 6900K, then Intel would be downright stupid to price it for $999 if AMD is pricing roughly equivalent product for $499-$699.

Well Intel is launching 6900K first. That's my point entirely. It makes perfect sense for Intel to raise prices now and introduce a new flagship at $1500, while keeping 6900K at $999. This way if Zen is a threat, they can always lower the price OR choose to keep prices high because chances are 8-core Zen will not come close to a 6900K. Don't forget that for many of us OC vs. OC performance matters too. That means if 8-core Zen matches 6900K at stock but 6900K OC from 3.6->4.6Ghz and a stock Zen is a 4.0-4.2Ghz with minimal overclocking headroom, well that's a loss for Zen.

The real problem for AMD is that Intel has a lot of knobs it can dial in response to AMD product. Intel has 10 core, 15 core, and 24 core Broadwell-EP chips that it can configure any which way it likes & unlock the multiplier in response to what AMD puts out. By the time Zen arrives, we are probably talking more about SKL-EP rather than BDW-EP, anyway.

That's my point too. Before Bulldozer launched, we also started hearing the hype how AMD's $300-350 chip will smash Intel's flagship. I think many in this thread are in denial about how far behind Excavator is wrt to Skylake. Has AMD even caught up to Nehalem in IPC? The gap between Nehalem/Lynnfield and Skylake is massive.

76274.png


Here is the epitome of single threaded prowess.

76281.png


By the time Zen is out in volumes, BW-E will be roughly 1/2 way through its life-cycle based on the rumors that SKL-E is launching in the 1H of 2017.

How slow zen will be:
We start with fx8350 base of 6.85 CB11.5 MT multiply it by XV IPC increase 1.1 and announced Zen improvements of 1.4 and we have 10.5 CB11.5 MT score which is more than i7 6700k.
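The quoted estimate is just a chain of multipliers, sketched below in Python so the figure can be reproduced. The names are hypothetical; the 6.85 baseline and the 1.1x and 1.4x factors come straight from the quote. The sketch deliberately does nothing about core or thread counts, so it says nothing on whether the multipliers are being applied to the right thing.

```python
# Reproduces the quoted back-of-the-envelope estimate; inputs are the quote's.
FX8350_CB115_MT = 6.85   # Cinebench 11.5 multithreaded score used as the base
XV_IPC_GAIN = 1.10       # claimed Excavator (XV) IPC uplift
ZEN_IPC_GAIN = 1.40      # announced Zen uplift over Excavator


def chained_estimate(base_score: float, *gains: float) -> float:
    """Multiply a base score by each gain factor in turn."""
    estimate = base_score
    for gain in gains:
        estimate *= gain
    return estimate


estimate = chained_estimate(FX8350_CB115_MT, XV_IPC_GAIN, ZEN_IPC_GAIN)
print(f"Naive Zen CB11.5 MT estimate: {estimate:.2f}")
```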

And there lies the problem. You just applied 1.1x & 1.4x IPC increase to an 8-core processor. 6700K is a 4-core processor. That means the only way it would be impressive is if it were a 4-core SMT Zen with a 10.5 score in Cinebench 11.5.

Here is the scary part: i7 6700K OC is almost as fast as a 6-core IVB with 3.6Ghz base and 4Ghz boost.
cinebench-6700k.jpg


i7 6700K 4.6Ghz with DDR3 1600 still smacks the i7 4790K @ 4.9Ghz around:
https://www.youtube.com/watch?v=f5lfMogcrPU
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
If intel fans will have to resort to overclocking performance to claim victory against Zen, amd will be more than happy.

Maybe AMD will not be the go-to CPU producer that you blindly pick knowing you get the best there is, but their competitiveness will greatly increase compared to anything we saw in the last few years.

And that fear shows in the forums.
 

majord

Senior member
Jul 26, 2015
433
523
136
How slow zen will be:
We start with fx8350 base of 6.85 CB11.5 MT multiply it by XV IPC increase 1.1 and announced Zen improvements of 1.4 and we have 10.5 CB11.5 MT score which is more than i7 6700k.
?

No this is completely wrong way to look at it sorry.


So firstly, it's (logically) assumed the 40% IPC uplift over Excavator is for a single thread per core or module. So when you're comparing the two under Multithreaded workloads, that goes out the window..

As per my last post - when you're talking about total MT throughput, you instead just need to think of a single Bulldozer module as more like a single Zen CORE. Or even an SMT Intel core, as all have similar total throughput capability - give or take 10, 20% - whatever, it's in the ballpark.

In your example you're comparing 4 BD modules with 8 threads, to 8 Zen cores with 16 threads. So clock/clock, an 8 core zen would have in the vicinity of double the performance of an FX 8350, not 40+10%

Key is though, clockspeeds - You aint going to see a 4Ghz base clock.. So it ain't going to be double the performance!
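A rough Python sketch of the module-equals-core reasoning above, assuming throughput scales linearly with unit count and clock. The function name and the `core_vs_module` knob are hypothetical, and the 3.0 GHz Zen clock is only a placeholder echoing a guess made earlier in the thread, not a known spec.

```python
# Sketch of the "one Zen core ~ one Bulldozer module" estimate.
# Assumes linear scaling with unit count and clock speed.

def relative_mt_throughput(
    zen_cores: int,
    bd_modules: int,
    zen_clock_ghz: float,
    bd_clock_ghz: float,
    core_vs_module: float = 1.0,  # hypothetical: Zen core throughput / BD module throughput
) -> float:
    """MT throughput of the Zen part relative to the Bulldozer-family part."""
    zen_total = zen_cores * zen_clock_ghz * core_vs_module
    bd_total = bd_modules * bd_clock_ghz
    return zen_total / bd_total


# 8 Zen cores vs 4 modules (FX 8350) at the same clock: about double.
same_clock = relative_mt_throughput(8, 4, 4.0, 4.0)

# Same comparison with a placeholder 3.0 GHz Zen clock against 4.0 GHz.
lower_clock = relative_mt_throughput(8, 4, 3.0, 4.0)

print(f"Clock for clock: {same_clock:.2f}x")
print(f"At 3.0 vs 4.0 GHz: {lower_clock:.2f}x")
```

Moving `core_vs_module` between 0.8 and 1.2 covers the "give or take 10, 20%" range mentioned in the post.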
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
The difference between 14nm LPP and Intel 14nm is not as large as you may think.

There is no way to know this until there are physical products available, manufactured using this process.

If a purely low power, mobile, network & ASIC targeted process (one often compared to other low power planar processes by its developer) can match the high performance 14nm variant from Intel (used on Skylake), then Intel has *ucked up extremely badly. The difference between the two Intel 14nm nodes (P1272 & P1273) is what, 300-400MHz? IMO if the 14nm LPP from Samsung can match even the worse scaling (LP & SoC targeted) variant used on Broadwell, then Samsung has done an extremely good job.

Sure, Fmax is not defined by the process alone but it has a major impact on it. As seen on Richland vs. Kaveri, Kaveri vs. Carrizo and Skylake vs. Broadwell.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
No this is completely wrong way to look at it sorry.


So firstly, it's (logically) assumed the 40% IPC uplift over Excavator is for a single thread per core or module. So when you're comparing the two under Multithreaded workloads, that goes out the window..

As per my last post - when you're talking about total MT throughput, you instead just need to think of a single Bulldozer module as more like a single Zen CORE. Or even an SMT Intel core, as all have similar total throughput capability - give or take 10, 20% - whatever, it's in the ballpark.

In your example you're comparing 4 BD modules with 8 threads, to 8 Zen cores with 16 threads. So clock/clock, an 8 core zen would have in the vicinity of double the performance of an FX 8350, not 40+10%

Key is though, clockspeeds - You aint going to see a 4Ghz base clock.. So it ain't going to be double the performance!

So 8 thread Zen will be about the same as 8 thread Vishera? They can pack their stuff already.
 

majord

Senior member
Jul 26, 2015
433
523
136
1)



Why are you playing with words? Catching up => when you are behind. The strategy of catching up is moving to a higher IPC architecture. So yes, they are in catch up mode. You just busted out a reply without addressing any of my key points. Intel has taken 4 distinct and major architectures over the span of 7 years.

Absolutely not playing with words, I assure you. I am pointing out that IPC comparisons between completely different approaches to an architecture are not a measure of success. If performance/core wasn't still considered important (or should I say, AMD didn't wake up to the fact it is) , they probably would have successfully persisted with derivative of CON architecture.

When AMD released Bulldozer, with K8 levels of IPC (single threaded of course) , would you say they were just "catching up" with.. themselves from 2003? Like some sort of crazy time lords? Of course not

Did intel magically come with a 80% IPC increase going from P4 to Conroe? Of course not.

Zen is much more along the lines of the latter example, just not so extreme: a complete shift to a higher-IPC, lower-clocked design (at least on the same process, but as I've said, I'm sure it will be lower on 14nm too).

You are saying AMD will just come out with a single new architecture that "catches up" all of that IPC increase from Nehalem to Skylake, just like that in 1 shot? Don't you see it that 40% above Excavator is not enough to catch up to Skylake?

Actually if you read my post properly you'd see I clearly said based on the claims, it would NOT be at skylake levels. Or Haswell for that matter.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,858
136
So 8 thread Zen will be about the same as 8 thread Vishera? They can pack their stuff already.
Wait, how exactly did you expect a Zen core to completely outperform a construction module throughput wise, when most of the ST IPC uplift is offset by SMT having significantly lower scaling than CMT?

Did you really think the +40% increase was applied to throughput?
 

deasd

Senior member
Dec 31, 2013
516
746
136
Wait, how exactly did you expect a Zen core to completely outperform a construction module throughput wise, when most of the ST IPC uplift is offset by SMT having significantly lower scaling than CMT?

Did you really think the +40% increase was applied to throughput?

You should take Excavator vs Zen for reference but not Piledriver vs Zen......
I think 8 thread Zen might be only slightly above 8 thread Excavator.
 

Adored

Senior member
Mar 24, 2016
256
1
16
So much missing the point here. Look at DX12 CPU benchmarks, that's where gaming is going. Even faildozer is getting close to Skylake.

Hitman-PC-DirectX-12-CPU-Scaling.jpg
WHnwU.jpg


oJngh.jpg


That DX11 result right there is something like what Starcraft II has been like over the past few years. The single thread is on the way out guys, rejoice!
 

Abwx

Lifer
Apr 2, 2011
10,940
3,441
136
How slow zen will be:
We start with fx8350 base of 6.85 CB11.5 MT multiply it by XV IPC increase 1.1 and announced Zen improvements of 1.4 and we have 10.5 CB11.5 MT score which is more than i7 6700k.

Will Zen have more 'module penalty' than Vishera?

At 4GHz the FX scores 6.92, that's 0.4325/GHz/module, whereas EXV scores 0.525/GHz/module; that's 21% better throughput/Hz per module.

Now Zen uses one such FPU per core, so it should score like a whole EXV module, that is 4.2/GHz for an 8C, and this of course includes SMT since that's the total throughput of the FPU.
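The per-GHz-per-module figures above can be checked with a few lines of Python. This is only a sketch of the post's arithmetic: the names are illustrative, the 6.92 and 0.525 inputs are the post's numbers, and the 3.0 GHz clock used in the projection is a placeholder rather than a known Zen clock.

```python
# Checks the per-GHz-per-module arithmetic; scores are the post's figures.
FX_SCORE_AT_4GHZ = 6.92          # Cinebench score quoted for the FX at 4 GHz
FX_CLOCK_GHZ = 4.0
FX_MODULES = 4
EXV_PER_GHZ_PER_MODULE = 0.525   # Excavator figure quoted in the post
ZEN_CORES = 8

# FX throughput normalised per GHz and per module.
fx_per_ghz_per_module = FX_SCORE_AT_4GHZ / FX_CLOCK_GHZ / FX_MODULES

# Excavator's advantage per module per clock.
exv_gain = EXV_PER_GHZ_PER_MODULE / fx_per_ghz_per_module - 1.0

# If one Zen core scores like one whole Excavator module:
zen_8c_per_ghz = EXV_PER_GHZ_PER_MODULE * ZEN_CORES


def projected_score(clock_ghz: float) -> float:
    """Projected 8-core score at a given clock, assuming linear clock scaling."""
    return zen_8c_per_ghz * clock_ghz


print(f"FX per GHz per module: {fx_per_ghz_per_module:.4f}")
print(f"EXV gain per module: {exv_gain:.0%}")
print(f"8C Zen per GHz: {zen_8c_per_ghz:.1f}")
print(f"Projected at a placeholder 3.0 GHz: {projected_score(3.0):.1f}")
```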
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
Did you really think the +40% increase was applied to throughput?

Why not? The "ST IPC" we keep discussing around here is basically a meaningless metric of nothing except for crazy people that deliberately underwork their hardware.

Heyyy let's go out and buy an 8c/16t CPU and run software that spawns maybe 4 work-intensive threads and call it great! Yeah. Um okay.

Zen is primarily meant for the server market, where every core and every thread will be put to use. If AMD comes out and says "40% more IPC than XV" then how can they possibly NOT mean throughput? What other value would there be in such a comment?

"ST IPC" has had no real meaning since AMD went CMT and Intel went (back to) SMT. Take the "ST IPC" of a 4c/8t Skylake chip and try to generate the MT performance with it. I dare you. You can't do it, can you? Because the ST performance x8 does not equal the MT performance when the chip is fully-loaded. Same for any of AMD's CMT designs. It'll be the same for Zen, too. You can maybe guess at the MT performance based on SMT/CMT scaling but until you've run the software and done the benchmark, nothing is certain.

Comparing a SMT and CMT design is even worse, due to the difference in the way they scale with thread count.

The only rational way to compare the two designs is to look at throughput, ESPECIALLY for CPUs that will see heavy use in server/datacenter roles.

I've said it before, and I'll say it again: if AMD is NOT talking about throughput in their comparison of Zen to XV, then they are talking out of their collective posterior. Such a statement would be effectively meaningless as it would tell us nothing useful about Zen's actual performance, at least not without some additional information about its SMT scaling.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,858
136
I've said it before, and I'll say it again: if AMD is NOT talking about throughput in their comparison of Zen to XV, then they are talking out of their collective posterior.
40% average throughput increase would put Zen in Skylake territory. Don't go there, it's a house made of treacherous candy.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Guys, AMD talked about IPC, not throughput. Don't confuse one with the other.

As for throughput, I'm still trying to understand where each ZEN core will be vs Excavator (CMT).
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,764
3,131
136
Guys, AMD talked about IPC, not throughput. Don't confuse one with the other.

As for throughput, I'm still trying to understand where each ZEN core will be vs Excavator (CMT).

Assuming 2 threads it all comes down to workload,

per zen core per clock.
2 128bit add/iadd
2 128bit mul/imul
or
2 128bit FMA
256/128 a cycle load store.

per BD/PD module
2 128bit mul/add/fma
2 128bit iadd/imul(mmx/XOP?)
512/256bit a cycle load store.

So really it all comes down to reads and writes to the L1D. If your workload gets good register usage, or you get "lucky" and get a good amount of store-to-load forwarding (your store doesn't touch the L1D then), then you're likely to actually get better per-clock performance from a Zen core vs. a module, as the add and mul latencies are lower. If your workload has little FMA and good register usage, then it could be way higher per clock on Zen as you have twice the FP resources.

But if your workload has poor register usage and you're constantly reading and writing to L1D, then a CON core module will have more throughput (chances are your workload runs like crap regardless).

Reduce this to single thread and Zen just wins allround.

Against haswell and onwards at 128bits (and below) Zen could very well be ahead as it has more execution resources ( things like decode and retirement should be pretty similar) , but when it comes to 256bit ops a haswell core will have near twice the throughput per clock.
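To put rough numbers on the pipe counts listed above, here is a small Python sketch of peak single-precision FLOPs per clock. It is a ceiling calculation only: it ignores load/store limits, latencies and scheduling, which the post says are what really decide things. The function name is hypothetical, and the Haswell figure of two 256-bit FMA pipes is an assumption brought in from outside the post.

```python
# Peak single-precision FLOPs per clock from pipe count and vector width.
# A ceiling only; ignores load/store bandwidth, latency and scheduling.

def peak_sp_flops_per_clock(pipes: int, width_bits: int, fma: bool) -> int:
    """Each pipe handles width_bits/32 lanes; an FMA counts as 2 FLOPs per lane."""
    lanes = width_bits // 32
    flops_per_lane = 2 if fma else 1
    return pipes * lanes * flops_per_lane


# Zen core per the post: 2x 128-bit add plus 2x 128-bit mul, or 2x 128-bit FMA.
zen_add_mul = peak_sp_flops_per_clock(pipes=4, width_bits=128, fma=False)
zen_fma = peak_sp_flops_per_clock(pipes=2, width_bits=128, fma=True)

# BD/PD module per the post: 2x 128-bit mul/add/FMA pipes shared by the module.
bd_fma = peak_sp_flops_per_clock(pipes=2, width_bits=128, fma=True)
bd_no_fma = peak_sp_flops_per_clock(pipes=2, width_bits=128, fma=False)

# Assumption (not from the post): Haswell with 2x 256-bit FMA pipes.
haswell_fma = peak_sp_flops_per_clock(pipes=2, width_bits=256, fma=True)

print(f"Zen core, add+mul mix: {zen_add_mul}")
print(f"Zen core, FMA:         {zen_fma}")
print(f"BD module, FMA:        {bd_fma}")
print(f"BD module, no FMA:     {bd_no_fma}")
print(f"Haswell core, 256-bit FMA: {haswell_fma}")
```

Under these assumptions the non-FMA case gives Zen twice the module's peak, and 256-bit FMA code gives Haswell twice Zen's, which lines up with the "twice the FP resources" and "near twice the throughput" remarks.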

One thing I don't know: do modern x86 processors speculatively execute SSE and AVX instructions? It's a pretty big chunk of power if you get it wrong.
 

majord

Senior member
Jul 26, 2015
433
523
136
40% average throughput increase would put Zen in Skylake territory. Don't go there, it's a house made of treacherous candy.

Excavator is already close to skylake in throughput, 40% would put it well above.

I wouldn't be surprised to see a small regression from Excavator in some cases.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Excavator is already close to skylake in throughput, 40% would put it well above.

I wouldn't be surprised to see a small regression from Excavator in some cases.

I guess "close" is quite relative then :eek:
Skylake is 50-90% faster IPC wise, depending on workload.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
That's great news! Can you show me some benchmarks?

I can show you Kaveri vs Haswell


Kaveri has 2x Modules (CMT) 4x Threads vs Haswell dual Core (SMT) 4x Threads.

A8-7600 = 3100MHz base , 3800MHz turbo
Core i3 4330 = 3500MHz base (no turbo)

x264 (integer)

A8-7600 at 65W TDP has higher throughput than Core i3 4330 Haswell.

mmv1n4.jpg


FP

65W TDP A8-7600 is equal to Haswell Core i3 4330.

xx0f5.jpg
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
You feel that's a realistic and unbiased comparison? AMD CMT provides >85% yield while Intel SMT < 26.5%, without even mentioning the difference in power consumption or the required die area of these two technologies.

How about single threaded, floating point workloads instead of the cherry picked integer :sneaky:
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
Assuming 2 threads it all comes down to workload,

per zen core per clock.
2 128bit add/iadd
2 128bit mul/imul
or
2 128bit FMA
256/128 a cycle load store.

per BD/PD module
2 128bit mul/add/fma
2 128bit iadd/imul(mmx/XOP?)
512/256bit a cycle load store.

So really it all comes down to reads and writes to the L1D. If your workload gets good register usage, or you get "lucky" and get a good amount of store-to-load forwarding (your store doesn't touch the L1D then), then you're likely to actually get better per-clock performance from a Zen core vs. a module, as the add and mul latencies are lower. If your workload has little FMA and good register usage, then it could be way higher per clock on Zen as you have twice the FP resources.

But if your workload has poor register usage and you're constantly reading and writing to L1D, then a CON core module will have more throughput (chances are your workload runs like crap regardless).

Reduce this to single thread and Zen just wins allround.

Against haswell and onwards at 128bits (and below) Zen could very well be ahead as it has more execution resources ( things like decode and retirement should be pretty similar) , but when it comes to 256bit ops a haswell core will have near twice the throughput per clock.

One thing I don't know: do modern x86 processors speculatively execute SSE and AVX instructions? It's a pretty big chunk of power if you get it wrong.

Well throughput is very tricky because one design is CMT (high throughput) and the other one is SMT (high IPC).

If we take things 1 on 1, in order to get the same Throughput of one Excavator Module (CMT), ZEN with 40% higher IPC would need 60% of SMT scaling (keeping clocks the same at 3.5GHz for example).
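One way to make the 1-on-1 comparison concrete is the toy model below, written in Python. Everything in it is an assumption for illustration: throughput is measured in units of one Excavator thread, an Excavator module delivers 1 + c (c being its CMT scaling), and a Zen core delivers 1.4 x (1 + s) (s being its SMT scaling). Under that model, with perfect CMT scaling (c = 1.0), the second Zen thread has to add 0.6 Excavator-thread equivalents, which is one possible reading of the 60% figure; expressed relative to Zen's own single-thread performance that is about 43%.

```python
# Toy model: what SMT scaling does a Zen core need to match an Excavator module?
# Units: 1.0 = single-thread throughput of one Excavator core at the same clock.
ZEN_ST_GAIN = 1.40   # the claimed 40% IPC uplift over Excavator


def required_smt_scaling(cmt_scaling: float, st_gain: float = ZEN_ST_GAIN) -> float:
    """SMT uplift (relative to Zen's own ST) needed to equal one module."""
    module_throughput = 1.0 + cmt_scaling
    return module_throughput / st_gain - 1.0


def smt_contribution_in_xv_threads(cmt_scaling: float, st_gain: float = ZEN_ST_GAIN) -> float:
    """Throughput the second Zen thread must add, in Excavator-thread units."""
    return (1.0 + cmt_scaling) - st_gain


# c = 1.00 is perfect CMT scaling; c = 0.85 is a hypothetical lower value.
for cmt in (1.00, 0.85):
    print(
        f"CMT scaling {cmt:.0%}: second thread must add "
        f"{smt_contribution_in_xv_threads(cmt):.2f} XV threads, "
        f"i.e. SMT scaling of {required_smt_scaling(cmt):.0%}"
    )
```

The lower the real CMT scaling of Excavator, the less SMT scaling Zen needs to break even per core.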

So either each ZEN Core will have lower Throughput than Excavator Module or if they want to keep the same throughput AMD will have to raise clocks higher than Excavator.

Edit: Or AMD SMT implementation has higher scaling than Intel's HyperThreading.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
You feel that's a realistic and unbiased comparison? AMD CMT provides >85% yield while Intel SMT < 26.5%, without even mentioning the difference in power consumption or the required die area of these two technologies.

How about single threaded, floating point workloads instead of the cherry picked integer :sneaky:

When we talk about throughput, we mean the MAXIMUM performance we get from each core (in SMT) or each module (in CMT).

Single Thread is IPC performance.

So, yes this is a very valid and unbiased comparison.

ps, i gave both Integer and FP.
 

deasd

Senior member
Dec 31, 2013
516
746
136
You feel that's a realistic and unbiased comparison? AMD CMT provides >85% yield while Intel SMT < 26.5%, without even mentioning the difference in power consumption or the required die area of these two technologies.

I'm afraid your data is incorrect.

http://www.pcper.com/reviews/Proces...-Rounds-Out-Kaveri-Line/Results-Cinebench-R15

In CBR15, the A6-7400K, which is 1M2C, gains only about 70% additional performance (87 vs 149) when multithreaded. Not to mention that the 7400K has a higher turbo of up to 3.9Ghz in single thread, while there's almost none in MT. I think it has roughly 60-65% yield, which means an almost 35% penalty.
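The raw scaling from the two Cinebench R15 scores quoted above works out as follows in Python. This sketch covers only the unadjusted 87 vs 149 comparison. It does not attempt the turbo-clock adjustment the post alludes to, since the actual clocks during each run are not given here. Names are illustrative.

```python
# Raw CMT scaling from the quoted Cinebench R15 scores (no clock adjustment).
ST_SCORE = 87    # A6-7400K single-thread score quoted in the post
MT_SCORE = 149   # A6-7400K multi-thread score (1 module, 2 threads)

# Extra performance gained by running the second thread on the module.
raw_uplift = MT_SCORE / ST_SCORE - 1.0

print(f"Raw second-thread uplift: {raw_uplift:.1%}")
```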