New Zen microarchitecture details

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
As I said, even with a ~ 10% IPC differential between 5820K and 6700K, 6700K still sells like hot cakes. I think too many people here underestimate that many consumers will still choose 4 fastest cores over 6-8 slower ones.

Sure, the i7-5820K is only about $25 more expensive than the i7-6700K, but don't forget the total platform cost. You'll be spending about $100 more for a good X99 motherboard than for a Z170 motherboard of equal quality. Plus you've got to buy a quad-channel RAM kit instead of dual-channel. So you're actually paying about $150 more out-of-pocket for worse single-thread performance. Yes, that can be a tough sell, especially for gamers.

But if it's 95W TDP with dual-channel RAM, Zen will probably have platform costs closer to Intel's mainstream chipset. And AMD can set the price at a level that will be competitive with Intel's offerings. The problem with current AM3+ CPUs is that they are uncompetitive in every way: single-thread performance is way down in the dumps, multi-thread performance on 4M/8T still can't match Intel's modern mainstream i5 quad-cores, power consumption is way too high, and the platform is grossly outdated. Even if Zen's IPC is only on par with Sandy Bridge, it will fix these issues. Empirically, Sandy Bridge single-thread performance is good enough for quite a few users, including gamers; there are still a bunch of people with legacy i5-2500Ks, and even some who are using old surplus hex-core Nehalem Xeons with overclocks. Zen should also have competitive performance-per-watt due to the new architecture and 14nm FinFET process, and it will come on a modern platform with the bells and whistles people expect these days. That means that AMD will have a salable product; it's just a matter of finding the right price point. Offering quad-cores at the i3 price point, and octo-cores at the (mainstream) i7 price point, could do the trick.
 

AtenRa

Lifer
Feb 2, 2009
14,000
3,357
136
Anandtech has a Skylake i3 @ 3.65Ghz to compare. Why are they not equal in most benchmarks?

We are talking about throughput; the vast majority of the benchmarks in the AT review are either single-threaded or don't scale beyond 2-3 threads.

And we also have Cinebench, which is HEAVILY Intel optimized.

One more thing: AT used an ASRock motherboard on the A8-7600 that had serious performance issues at the time of the review.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Anandtech has a Skylake i3 @ 3.65Ghz to compare. Why are they not equal in most benchmarks?
Now I'm lost.
So is the 40% increase in Zen IPC for a core (2 threads) relative to 1 thread of XV?
Based on this, a 2-core/4-thread Zen CPU will have CB11.5 MT perf around 2.8 compared to 3.39 for the A8-7600. So, 2C/4T Zen will be 15% slower than 2-module XV in MT tasks? And they want to take server market share?

Looks like DX12 could make ZEN dead even before arrival.
hitman_furyx_all.png
 

majord

Senior member
Jul 26, 2015
433
523
136
You feel that's a realistic and unbiased comparison? AMD CMT provides >85% yield while Intel SMT < 26.5%, without even mentioning the difference in power consumption or the required die area of these two technologies.

How about single threaded, floating point workloads instead of the cherry picked integer :sneaky:

OK, look, this has gone off course, and perhaps I wasn't making myself clear.

This side discussion is about the throughput of an entire CMT module (specifically Piledriver, since it's the last uArch that shares the decoder) vs. Zen and/or Skylake. I'm quite well aware, first hand, that Skylake is some ~60% faster clock-for-clock, and was not suggesting otherwise :)

Only when a module or SMT core is fully utilized are you able to make such comparisons.

That's great news! Can you show me some benchmarks?

Funny you say that, I've spent quite some time benchmarking Piledriver, Steamroller, Excavator and now a 4MB Skylake @ 3GHz (ST, MT, 2M/2C), but I have about another week to go :\
 

majord

Senior member
Jul 26, 2015
433
523
136
It's not.
7400K has cut down L2 cache, 1MB per module.

Indeed, and so does Excavator. From testing, it does take the same sort of hit with CMT scaling.

Half the advantage Steamroller gained from the dedicated (duplicated) decoding stage is "lost" due to it, it seems.
 

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,686
136
So, 2C/4T Zen will be 15% slower than 2-module XV in MT tasks? And they want to take server market share?

Looks like DX12 could make ZEN dead even before arrival.
It doesn't matter if the new design's throughput is roughly equal (±10%), as long as it is more compact and allows for more cores in the same die area (process node aside).
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
It doesn't matter if the new design's throughput is roughly equal (±10%), as long as it is more compact and allows for more cores in the same die area (process node aside).

So what core configs are we expecting? So Zen is MOAR COREZ approach?
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,744
3,078
136
Well throughput is very tricky because one design is CMT (high throughput) and the other one is SMT (high IPC).

If we take things 1 on 1, in order to get the same Throughput of one Excavator Module (CMT), ZEN with 40% higher IPC would need 60% of SMT scaling (keeping clocks the same at 3.5GHz for example).

So either each ZEN Core will have lower Throughput than Excavator Module or if they want to keep the same throughput AMD will have to raise clocks higher than Excavator.

Edit: Or AMD SMT implementation has higher scaling than Intel's HyperThreading.
Don't try to use "increase in IPC" to predict situations where you have high ILP (throughput workloads). IPC in terms of performance is about serial code with dependencies, etc.

A Bulldozer module only had one L1i and, for BD and PD, "only" 4 decoders per module. Even with SR and the split decoders, the L1i is still shared, which limits instruction throughput.

Zen has 4 decoders, as well as possibly a better L1i (Bulldozer's L1i has aliasing issues) and a uop cache, so the front end is looking better as well.
 

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,686
136
So what core configs are we expecting? So Zen is MOAR COREZ approach?
No, Zen is more of an ST IPC approach, but as long as they accomplish that using fewer resources for an SMT core than they would for a CMT module, they get a slimmer design, which can translate into more cores per die (coupled with a better process node). [EDIT] This bit about the slimmer design is not taken from any source; it is just derived from a previous comparison on this forum between the HT and CMT implementations.

Personally I liked the idea behind CMT more, but either the implementation was flawed or the concept is simply not efficient enough to begin with.
 

Haserath

Senior member
Sep 12, 2010
793
1
81
8 core Zen should end up being a fairly small chip in the mid 1XXmm^2. If it doesn't have 256-bit datapaths for AVX, I wouldn't be surprised if it clocked at mid to high 3's.

HW-E has a wide AVX implementation and is made with more connectivity, and it still gets above 3Ghz at 140W.

I expect 8 core Zen@95W to be slightly more than double the performance of an 8350. Right around the Haswell E 8 core, minus AVX.

I tend to be optimistic.:D

If it does end up that way, they'd still need to price the SKUs around $150-500 (4-8 core), since Intel has already been offering that level of performance (or more, from Skylake).

I don't expect them to have trouble reaching clocks up to 4Ghz; I don't know why anyone expects them to. An Apple A9X reaches 2.25Ghz in a tablet! A TABLET!
 

MrTeal

Diamond Member
Dec 7, 2003
3,554
1,658
136
Sure, the i7-5820K is only about $25 more expensive than the i7-6700K, but don't forget the total platform cost. You'll be spending about $100 more for a good X99 motherboard than for a Z170 motherboard of equal quality. Plus you've got to buy a quad-channel RAM kit instead of dual-channel. So you're actually paying about $150 more out-of-pocket for worse single-thread performance. Yes, that can be a tough sell, especially for gamers.

But if it's 95W TDP with dual-channel RAM, Zen will probably have platform costs closer to Intel's mainstream chipset. And AMD can set the price at a level that will be competitive with Intel's offerings. The problem with current AM3+ CPUs is that they are uncompetitive in every way: single-thread performance is way down in the dumps, multi-thread performance on 4M/8T still can't match Intel's modern mainstream i5 quad-cores, power consumption is way too high, and the platform is grossly outdated. Even if Zen's IPC is only on par with Sandy Bridge, it will fix these issues. Empirically, Sandy Bridge single-thread performance is good enough for quite a few users, including gamers; there are still a bunch of people with legacy i5-2500Ks, and even some who are using old surplus hex-core Nehalem Xeons with overclocks. Zen should also have competitive performance-per-watt due to the new architecture and 14nm FinFET process, and it will come on a modern platform with the bells and whistles people expect these days. That means that AMD will have a salable product; it's just a matter of finding the right price point. Offering quad-cores at the i3 price point, and octo-cores at the (mainstream) i7 price point, could do the trick.

That's a bit of an exaggeration, don't you think? The Gigabyte GA-X99-SLI is available for $134AR, while you'll be paying over $100 for a similar quality SLI capable Z170 MB. The more comparable (at least in Gigabyte's lineup) Z170X-UD3 is the same price as the X99 board. Even the cheapest Z170 MB is $88, so that's less than a $50 difference. 16GB of DDR4-3000 is available at the same $70 price point in both 2x8GB and 4x4GB varieties. Even if you said the gamer might settle with a 2x4GB kit, you're looking at a $35 price difference on the memory.

Looking at equivalent lower-end systems, the delta including the processor would be closer to $50 than $150+, and still less than $100 even if you outfit the Z170 system with half the RAM of the X99 just because it's possible.
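
The arithmetic behind those deltas can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up here, the $25 CPU gap comes from the quoted post, and the $100 price for the SLI-capable Z170 board is an assumption, since the post only says "over $100".

```python
CPU_DELTA = 25            # i7-5820K over i7-6700K, from the quoted post
X99_SLI_BOARD = 134       # Gigabyte GA-X99-SLI, after rebate
Z170_SLI_BOARD = 100      # ASSUMPTION: "over $100" taken as exactly $100
Z170_CHEAPEST_BOARD = 88  # cheapest Z170 board mentioned
HALF_RAM_SAVING = 35      # 2x4GB kit instead of 16GB on the Z170 build


def platform_delta(x99_board, z170_board, ram_saving=0):
    """Extra out-of-pocket cost of the X99 build over the Z170 build."""
    return CPU_DELTA + (x99_board - z170_board) + ram_saving


# Similar-quality boards, same 16GB of RAM on both builds.
same_ram = platform_delta(X99_SLI_BOARD, Z170_SLI_BOARD)
# Same boards, but the Z170 build settles for half the RAM.
half_ram = platform_delta(X99_SLI_BOARD, Z170_SLI_BOARD, HALF_RAM_SAVING)
# Cheapest possible Z170 board, same RAM.
cheapest_board = platform_delta(X99_SLI_BOARD, Z170_CHEAPEST_BOARD)

print(f"similar boards, same RAM: ${same_ram}")
print(f"similar boards, half RAM: ${half_ram}")
print(f"cheapest Z170 board:      ${cheapest_board}")
```

Swapping in current street prices for the constants is the whole point; the structure of the sum is what matters, not these particular numbers.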
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
How slow zen will be:
We start with the FX-8350 base of 6.85 CB11.5 MT, multiply it by the XV IPC increase of 1.1 and the announced Zen improvement of 1.4, and we have a 10.5 CB11.5 MT score, which is more than the i7-6700K.

Will Zen have more 'module penalty' than Vishera?
As said, there are no modules. Also IPC usually is given per core with one thread. Giving IPC for SMT cores is unusual, otherwise SMT-heavy archs like the Niagara family would be IPC monsters. That's analyzed as throughput in such cases.

You need to calculate fx8350 base * XV_IPC_INCREASE * ZEN_IPC_INCREASE / CMT_EFFICIENCY * SMT_EFFICIENCY for MT scores.

CMT_EFFICIENCY is something like 0.7 to 0.9.
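
As a rough illustration of that formula, here is a minimal Python sketch. The function name `estimate_mt_score` and the SMT efficiency value are placeholders (no SMT figure is given in this thread); the 6.85 base score, the 1.1 and 1.4 factors, and the 0.7 to 0.9 CMT range are the numbers quoted here.

```python
def estimate_mt_score(base_score, xv_ipc_increase, zen_ipc_increase,
                      cmt_efficiency, smt_efficiency):
    """Scale a CMT-based multithreaded score to a hypothetical SMT design.

    Dividing by cmt_efficiency removes the module-sharing penalty already
    contained in the base score; multiplying by smt_efficiency applies the
    (assumed) penalty of the SMT design instead.
    """
    if cmt_efficiency <= 0:
        raise ValueError("cmt_efficiency must be positive")
    return (base_score * xv_ipc_increase * zen_ipc_increase
            / cmt_efficiency * smt_efficiency)


FX8350_CB115_MT = 6.85         # base score quoted in the thread
XV_IPC_INCREASE = 1.1          # quoted in the thread
ZEN_IPC_INCREASE = 1.4         # AMD's announced figure, as quoted
ASSUMED_SMT_EFFICIENCY = 0.65  # PLACEHOLDER, not taken from any source

# Setting both efficiency terms to 1.0 reproduces the naive estimate.
naive = estimate_mt_score(FX8350_CB115_MT, XV_IPC_INCREASE,
                          ZEN_IPC_INCREASE, 1.0, 1.0)
print(f"naive estimate: {naive:.2f}")

# Sweep the CMT efficiency range given above.
for cmt_eff in (0.7, 0.8, 0.9):
    score = estimate_mt_score(FX8350_CB115_MT, XV_IPC_INCREASE,
                              ZEN_IPC_INCREASE, cmt_eff,
                              ASSUMED_SMT_EFFICIENCY)
    print(f"CMT_EFFICIENCY={cmt_eff}: {score:.2f}")
```

The result is very sensitive to the two efficiency terms, which is why a single "IPC" number is a poor predictor for MT scores.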

Now I'm lost.
So is the 40% increase in Zen IPC for a core (2 threads) relative to 1 thread of XV?
Based on this, a 2-core/4-thread Zen CPU will have CB11.5 MT perf around 2.8 compared to 3.39 for the A8-7600. So, 2C/4T Zen will be 15% slower than 2-module XV in MT tasks? And they want to take server market share?

Looks like DX12 could make ZEN dead even before arrival.
A Zen core vs a module? So a 32C Zen would compare to a 32 module chip?
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Here is the epitome of single threaded prowess.

76281.png

Uh. Dolphin is multithreaded. Even the ancient build (some 3.0 revision) I use for playing Dragon Ball will happily eat three cores.

Then factor in Vulkan for the newer Dolphin builds...

And didn't we put Cinebench to rest, after Agner demonstrated that it's little more than an Intel marketing benchmark?
 

Doom2pro

Senior member
Apr 2, 2016
587
619
106
Uh. Dolphin is multithreaded. Even the ancient build (some 3.0 revision) I use for playing Dragon Ball will happily eat three cores.

Then factor in Vulkan for the newer Dolphin builds...

And didn't we put Cinebench to rest, after Agner demonstrated that it's little more than an Intel marketing benchmark?

I'm detecting some confirmation bias in this guy...
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
I'm detecting some confirmation bias in this guy...

I think Dolphin even uses four cores now, and that's without Vulkan; Dolphin 4.5 (or was it earlier?) added a dedicated DSP thread.

A bit of googling also shows that Dolphin has a few additional worker threads, which means even more core usage.

If you want power-virus-esque single-threaded performance benchmarking, you're better off disabling all but one core (disable SMT as well) and setting the CPU, RAM & motherboard (e.g., HyperTransport, Northbridge) to the same speeds for each platform & CPU.

Once that's done, do some super intensive task.

Maybe rendering some hyper-complex, un-instanced scene with loads of objects and particles in 3DS Max will do the trick? Or doing an audio mixdown in PreSonus Studio One, with loads of tracks with hyper-complex MIDI compositions & effect chains?


But picking multithreaded software that relies upon vectorization and the latest instruction sets, to demonstrate single-threaded performance, is rather... bad.

Stick to SSE2 instructions (practically every CPU supports that, post 2001), uniform clocks and intensive tasks. Besides, pretty much every game available uses SSE2, except for the really early games (Morrowind era) and badly programmed ones (e.g., Fallout New Vegas has lots of x87 code AND SSE2 code).
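
A software-only approximation of the "one core" part can be sketched in Python. To be clear about what this is and isn't: it only pins the process to a single core via `os.sched_setaffinity` (Linux; the call is missing on Windows and macOS), it does not disable cores or SMT and does not fix clocks, which still has to be done in the BIOS or OS. The function names and the toy integer workload are made up for illustration.

```python
import os
import time


def pin_to_single_core():
    """Restrict this process to one allowed core; return its id, or None."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # affinity API not available on this OS
    core = min(os.sched_getaffinity(0))  # pick a core we are allowed to use
    os.sched_setaffinity(0, {core})
    return core


def scalar_workload(iterations):
    """Serial integer loop where each step depends on the previous one."""
    x = 1
    for i in range(iterations):
        x = (x * 1103515245 + 12345 + i) % 2147483648
    return x


def best_time(iterations=200_000, repeats=5):
    """Best-of-N wall time in seconds, to reduce noise from other processes."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        scalar_workload(iterations)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__":
    print("pinned to core:", pin_to_single_core())
    print(f"best of 5: {best_time():.4f} s")
```

An interpreted loop like this mostly measures the interpreter, so treat it as a template for the pinning and timing, and swap in the real workload you care about.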



Edit: Also, didn't Haswell have some bizarre +30% performance improvement, experienced only with emulators? This was a fairly big thing over on the PCSX2 forums. Talk about a fringe case.
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Did a bit of reading due to AtenRa clarifying Nehalem's performance. Sandybridge is 10-15% faster than Nehalem at equal clocks, so I'll say it's 20% for a worst case Zen scenario.

A few things that need clarifying; Ivybridge is only 5% faster than Sandybridge @ same clocks, right? And Haswell is only 5% faster than ivybridge?

Quick google-fu states as such, but eh. I'll roll with it.



Piledriver @ 3.4Ghz == Phenom II @ 3.4Ghz
Phenom II @ 3.4Ghz == Nehalem @ 2.7Ghz (700mhz deficit)

Sandybridge = Nehalem + 20% performance
Sandybridge @ 3.4Ghz == Nehalem @ 4.08Ghz (3.4Ghz * 1.2)
Sandybridge == Nehalem + 680Mhz

Ivybridge @ 3.4Ghz = Sandybridge @ 3.4Ghz * 1.05
Ivybridge @ 3.4Ghz == Sandybridge @ 3.57Ghz
Ivybridge @ 3.4Ghz == Nehalem @ 4.28Ghz (SB @ 3.57Ghz * 1.2)
Ivybridge == Nehalem + 880mhz

Haswell @ 3.4Ghz = Ivybridge @ 3.4Ghz * 1.05
Haswell @ 3.4Ghz == Ivybridge @ 3.57Ghz
Haswell @ 3.4Ghz == Sandybridge @ 3.74Ghz
Haswell @ 3.4Ghz == Nehalem @ 4.48Ghz
Haswell == Nehalem + 1080 Mhz

Steamroller @ 3.4Ghz = Piledriver * 1.05 performance (3.57Ghz)
Excavator @ 3.4Ghz = Steamroller * 1.05 performance (3.74Ghz)
Excavator @ 3.4Ghz == Piledriver @ 3.7Ghz
Excavator @ 3.4Ghz == Nehalem @ 3.0Ghz (400Mhz deficit)

Zen @ 3.4Ghz = Excavator @ 3.4ghz * 1.4 (4.76Ghz)
Zen @ 3.4Ghz == Nehalem @ 4.36Ghz
Zen == Nehalem + 960mhz


So a worst case scenario for Zen, is that it's between Ivybridge and Haswell.

Considering how AMD's performance claims have held true (Steamroller 5% faster than Piledriver, Excavator 5% faster than Steamroller), it's not looking anywhere near as bad as folks have been saying.
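
For anyone who wants to play with the numbers, the chain above can be sketched in Python. This is only a sketch: the variable names are made up, every factor is one of the guesses from this post rather than a measurement, and it chains them as per-clock ratios (multiplying) instead of adding and subtracting MHz offsets, so the Zen figure comes out somewhat lower than the hand-worked one.

```python
BASE_CLOCK_GHZ = 3.4

# Per-clock performance relative to Nehalem (Nehalem = 1.0).
# Every factor below is an estimate from the post, not a measurement.
nehalem = 1.0
sandybridge = nehalem * 1.20     # pessimistic 20% uplift
ivybridge = sandybridge * 1.05
haswell = ivybridge * 1.05

piledriver = 2.7 / 3.4           # Piledriver @ 3.4 ~ Phenom II @ 3.4 ~ Nehalem @ 2.7
steamroller = piledriver * 1.05
excavator = steamroller * 1.05
zen = excavator * 1.40           # AMD's claimed 40% over Excavator


def nehalem_equivalent_ghz(per_clock_ratio, clock_ghz=BASE_CLOCK_GHZ):
    """Nehalem clock needed to match a chip running at clock_ghz."""
    return per_clock_ratio * clock_ghz


equivalents = {
    "Sandybridge": nehalem_equivalent_ghz(sandybridge),
    "Ivybridge": nehalem_equivalent_ghz(ivybridge),
    "Haswell": nehalem_equivalent_ghz(haswell),
    "Piledriver": nehalem_equivalent_ghz(piledriver),
    "Steamroller": nehalem_equivalent_ghz(steamroller),
    "Excavator": nehalem_equivalent_ghz(excavator),
    "Zen": nehalem_equivalent_ghz(zen),
}

# Print slowest to fastest so the relative ordering is easy to read.
for name, ghz in sorted(equivalents.items(), key=lambda item: item[1]):
    print(f"{name:<12} @ {BASE_CLOCK_GHZ}GHz ~ Nehalem @ {ghz:.2f}GHz")
```

Changing any single factor (say, the 20% Sandybridge guess or the 40% Zen claim) and re-running shows how much the final ordering depends on it.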
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Nope, Phenom II has higher IPC than Piledriver.

I believe Excavator almost reached Phenom II IPC (with fewer resources though)

Really? I know that Bulldozer is slower than Phenom II, but Piledriver achieved performance parity, whilst also using less power.

A Phenom II x4 965 BE performs pretty much the same as an 8320 in lesser threaded games (Skyrim, Fallout New Vegas) last I checked; sounds about right, seeing as how they both only have four FPUs.

Hell, the Anandtech bench shows the 8320 pulling slightly ahead.

http://anandtech.com/bench/product/102?vs=698
 

f2bnp

Member
May 25, 2015
156
93
101
Nope, Phenom II has higher IPC than Piledriver.

I believe Excavator almost reached Phenom II IPC (with fewer resources though)

Kinda doubt that, or at least I can't say for certain that it is so. Swapped my OCed Phenom II 1090T for an FX 8320 a few months ago and saw DOSBox performance (a decidedly single-threaded program) drop a little.
I think I was running my 1090T at 3.6GHz constantly. At 4GHz the 8320 was slightly faster.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
For those PD -> XV IPC increases we could use some results done at constant clock rates:
http://excavator.looncraz.net/
http://www.planet3dnow.de/cms/18564...cavator-leistungsvergleich-der-architekturen/
and for a more complex model of reality incl. equal TDP settings:
http://www.planet3dnow.de/cms/22697-erste-benchmarks-des-athlon-x4-845/

And the InstLatx64 site has a nice chart comparing different cores:
CfSZWQgUYAI3aW3.jpg:large


High res: http://instlatx64.atw.hu/ (3rd link "CPU comparison chart v1.0")

I think Idontcare asked for such a table. Of course, info on future cores is sparse, but since I didn't do a detailed analysis of the latencies given in the patch, this is the best overview you can get right now. The patches (incl. the corrective ones) still provide some interesting info not used here, but that needs a lot of additional research anyway.
 