[Techpowerup] AMD "Zen" CPU Prototypes Tested, "Meet all Expectations"


Where do you think this will land performance wise

  • Intel i7 Haswell-E 8 CORE

  • Intel i7 Skylake

  • Intel i5 Skylake

  • Just another Bulldozer attempt



AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Comparing one Excavator module (including CMT) to one Haswell core (including SMT), you are comparing throughput.

Comparing one Excavator core without CMT (single thread) to one Haswell core without HT (single thread), you are comparing single-thread performance, or what some will call IPC.
 

dark zero

Platinum Member
Jun 2, 2015
2,655
140
106
Only one positive from that scenario IMO and that is that Intel would be 'free' to implement something like IA64 across the board.
"Only when there is one, there can be none"
OK... If the competition dies, Intel could deliver an ET tech (Itanium) and BAM! The whole tech world dies brutally, or at least the domestic one (Intel seems to hint that the PC world will get ditched by them).

OK, it's apocalyptic, but not so much considering that Apple and Samsung can pick up the PC world's remnants and have strong enough software to keep that world alive for enough years before ARM finally catches up with x86.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Of course, if Zen ISN'T 40% faster, then it'll be a fiasco. It'll also be quite a problem if clockspeeds are too low. But if you want Haswell IPC, AMD has it right now. They just don't have Haswell levels of performance, and Bristol Ridge sure isn't going to get us there either.

This is a red herring. The problem was never raw performance per core/module/whatever, but the costs AMD has to pay in terms of transistor budget, die area and power consumption in order to reach these performance levels.
 

Shehriazad

Senior member
Nov 3, 2014
555
2
46
Something that is also often ignored or not taken into account....AMD wants to run this chip at 95W.


Wouldn't that mean that even with a more inefficient design, they would have a lot of room for performance?
That would just be a guess from me...but having such a relatively high tdp would be rather helpful, aye?
 

dark zero

Platinum Member
Jun 2, 2015
2,655
140
106
Something that is also often ignored or not taken into account....AMD wants to run this chip at 95W.


Wouldn't that mean that even with a more inefficient design, they would have a lot of room for performance?
That would just be a guess from me...but having such a relatively high tdp would be rather helpful, aye?
Actually their best chip will run up to 350 watts... then, lowering it (to the extreme), it goes down to 5 watts, to return to 95 watts.
 

Shehriazad

Senior member
Nov 3, 2014
555
2
46
Actually their best chip will run up to 350 watts... then, lowering it (to the extreme), it goes down to 5 watts, to return to 95 watts.

Wasn't that 300-something-watt chip the supposed monster APU? (Not to mention that was more of a rumor, since AMD just said "high".)

I was talking about the normal consumer Zen CPU. 95W on 14/16nm seems like a lot of room.
 
Aug 11, 2008
10,451
642
126
Broadwell E is 140 watts on 14nm, so 95 watts (for 8 cores) seems low to me actually. Even the K series quads are 80 plus watts.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
Broadwell E is 140 watts on 14nm, so 95 watts (for 8 cores) seems low to me actually. Even the K series quads are 80 plus watts.

95W TDP will give you higher yields = lower Chip cost = lower package/heat-sink cost = higher profit.

Also, OEMs will use 95W TDP SKUs for workstations and high end Gaming systems. Broader market penetration = higher volumes = higher profit.

edit.

One more thing, 95W TDP will get by with cheaper motherboard designs.

Edit 2.

And as I have said before, I strongly believe that AMD is targeting ZEN for the highest perf/watt possible and not the highest performance. So at default, an 8 Core 16 Thread 95W TDP ZEN may come in slower than the equivalent Broadwell-E, but it could have way higher Perf/Watt than the 140W TDP Intel SKU.
 
Aug 11, 2008
10,451
642
126
So now we have gone from projected Sandy Bridge IPC to Haswell and better performance per watt than intel in one generation. OK........... whatever you say.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
So now we have gone from projected Sandy Bridge IPC to Haswell and better performance per watt than intel in one generation. OK........... whatever you say.


This is not one generation, it is two generations for the manufacturing (32/28nm to 14nm FF) and 5 years of CPU architecture evolution with a radical new architecture design.
 

tential

Diamond Member
May 13, 2008
7,348
642
121
So from what I'm gathering from Atenra's posts, I'm definitely not buying Zen since it'll be slower than intel. But I'm hoping there will be a cheap Zen SKU to use for my server..... and I mean CHEAP, because I don't pay for weak performance.

So I'm waiting on my performance part from intel to be fast (where I'll spend money on it), and I'm waiting on my cheap, "efficient", price competitive Zen processor which needs to be price/performance competitive for my server.

Sounds like AMD still loses as I'll be spending more on my intel processor since AMD will STILL not have competition for intel's performance.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
So from what I'm gathering from Atenra's posts, I'm definitely not buying Zen since it'll be slower than intel.

95W TDP ZEN may be slower than 140W Broadwell-E , BUT what about OC ZEN to 140W TDP ???

Or

If ZEN at 95W TDP gives you 90% the performance of Broadwell-E at 140W TDP, what will you choose then ???

But I'm hoping there will be a cheap Zen SKU to use for my server..... and I mean CHEAP, because I don't pay for weak performance.

Let's see the performance first and then we'll talk about the price, but I have a feeling ZEN will not come cheap.
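
To put a rough number on the 90%-at-95W hypothetical above, here is a minimal sketch. It assumes TDP can stand in for actual power draw (which is only loosely true) and uses nothing but the hypothetical figures from this post; the function name is made up for illustration.

```python
def perf_per_watt_ratio(perf_a, watts_a, perf_b, watts_b):
    """Return how many times better chip A's perf/watt is than chip B's."""
    return (perf_a / watts_a) / (perf_b / watts_b)

# Hypothetical figures from the post, not measurements:
# ZEN at 95W delivering 90% of a 140W Broadwell-E's performance.
ratio = perf_per_watt_ratio(0.90, 95, 1.00, 140)
print(f"ZEN perf/watt advantage: {ratio:.2f}x")  # prints 1.33x
```

So under those assumed numbers the 95W part would be roughly a third more efficient while being 10% slower outright.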
 

Abwx

Lifer
Apr 2, 2011
11,997
4,954
136
95W TDP ZEN may be slower than 140W Broadwell-E , BUT what about OC ZEN to 140W TDP ???

Or

If ZEN at 95W TDP gives you 90% the performance of Broadwell-E at 140W TDP, what will you choose then ???

Let's see the performance first and then we'll talk about the price, but I have a feeling ZEN will not come cheap.

We know from the published numbers that 14nm LPP is much better than Intel's 14nm at a 2.4GHz frequency. I posted the difference a few months ago, and now, thanks to an SA member, we have the info that the GF process is as good at 3GHz as it is at 2.4GHz, and hence much better than Intel's at 3GHz.

So the power figures you posted are relevant.
 

TechGod123

Member
Oct 30, 2015
94
1
0
We know from the published numbers that 14nm LPP is much better than Intel's 14nm at a 2.4GHz frequency. I posted the difference a few months ago, and now, thanks to an SA member, we have the info that the GF process is as good at 3GHz as it is at 2.4GHz, and hence much better than Intel's at 3GHz.

So the power figures you posted are relevant.
Oh well that's good!
 

DrMrLordX

Lifer
Apr 27, 2000
23,183
13,270
136
Here's CB10 on 4c/8t Haswell running at 3.4Ghz.

That's in keeping with the comparison I made. Cut that down to 2c/4t Haswell and your score is only 12871 - less than a 2m/4t XV can score at the same clockspeed. XV wins.

He's comparing an AMD module to an Intel core... kind of a dubious comparison, as it ignores the massive single thread performance deficit.

Why is it a dubious comparison? How else would you compare them?

XV is not Bulldozer or Piledriver. You don't have any situations where you get better performance running a 2m SR or XV chip with just one core per module (or "compute unit" as per the UEFI setting). There is no more module penalty. You can't reasonably load up one thread per module and get a meaningful "single threaded" performance figure, since you're crippling the module. Those modules are designed to handle threads in pairs, and if they don't, they slow down big time, by 40-50% depending on the workload.

Intel provides the appearance of higher "single threaded" performance since the first four threads you load onto an i7 comprise ~70% of the performance that you can get out of the chip. It's also why people buy 4c i5 chips. Construction modules just don't work that way.

This is a red herring. The problem was never raw performance per core/module/whatever, but the costs AMD has to pay in terms of transistor budget, die area and power consumption in order to reach these performance levels.

Mock AMD's finances all you like, but they actually did launch 4m/8t Bulldozer and Piledriver, so it's been well-demonstrated that they CAN bring chips like that to market, provided the process is there to support it. GF fell down on its face, and the rest is history. If AMD could actually do the same things with SR and then XV that they did with Piledriver, they'd still be in the game, at least performance-wise. But they can't, and so they didn't, and so people have this silly notion in their heads that AMD's designs are inferior, which they really aren't. XV is quite something to perform so well without any L3 cache at all.

Comparing one Excavator module (including CMT) to one Haswell core (including SMT), you are comparing throughput.

Comparing one Excavator core without CMT (single thread) to one Haswell core without HT (single thread), you are comparing single-thread performance, or what some will call IPC.

Deriving IPC from a chip when you are only running one thread is complete bollocks, and it has been since Nehalem. Chips that have shared L3 will grant the benefit of the entire cache block to the execution of that one thread. Inefficient SMT designs will cause the chip to look unnaturally powerful running at half its thread capacity.

IPC literally means "instructions per clock", so if you have a module OR an SMT-capable core or whatever else, the maximum number of instructions per clock is achieved when handling two separate threads! Or more, if you've involved POWER8. Now you're up to eight threads.

So now we have gone from projected Sandy Bridge IPC to Haswell and better performance per watt than intel in one generation. OK........... whatever you say.

Nobody has ever said anything credible that indicated that the actual IPC - not just "single threaded" performance - of Zen would be at Sandy Bridge levels. If you wish to doubt AMD's statements regarding Zen's improvement over XV, then fine. But don't deny what XV is capable of doing right now which is well-established and documented.

If you run XV at half its thread capacity or lower, it's not all that impressive. If you load it all the way up, it beats Haswell in IPC. Too few modules and too low clockspeeds (and a complete focus on the mobile sector) make it a non-starter for the enthusiast. There are other ways to cripple Construction cores as well, such as hitting them with expensive pipeline stalls over and over again. But what it's capable of doing is really quite impressive. THAT'S why AMD's statements about Zen are so extraordinary. Nobody knows if they'll be able to pull it off, but AMD seems intent on leading us to believe that they can. I'm still waiting on the clockspeeds, though.

The other thing that's going to be confusing is how Zen will scale with thread count. SR and XV scale very well with thread count all the way up to the chip's full thread capacity, unlike Haswell/Broadwell/Skylake that typically gain only 30% more performance on the last half of the thread count (unless you have something like 3DPM that inflicts multitudinous pipeline stalls). Zen is going to be SMT-based, and if Keller was cribbing from Intel's designs, we may see behavior out of Zen that's far more like an Intel core than a Construction module.
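
A quick sketch of how the two scaling figures above relate, since "the first half of the threads gives ~70% of the total" and "the last half of the threads adds ~30%" are not the same statement. Assumptions: performance is normalized so that a fully loaded chip scores 1.0, and the helper names are hypothetical, not from any benchmark tool.

```python
def gain_from_second_half(fraction_at_half_load):
    """Relative gain from loading the second half of the threads, given the
    fraction of full-load performance already reached at half load."""
    return 1.0 / fraction_at_half_load - 1.0

def fraction_at_half_load(gain):
    """Inverse: fraction of full-load performance reached at half load, given
    the relative gain from loading the second half of the threads."""
    return 1.0 / (1.0 + gain)

# First four threads of an i7 giving ~70% of the chip's total performance:
print(f"{gain_from_second_half(0.70):.0%}")  # prints 43%

# Last half of the thread count adding ~30% more performance:
print(f"{fraction_at_half_load(0.30):.0%}")  # prints 77%
```

Either way, most of the chip's performance is reached at half its thread capacity, which is the point being made about SMT designs.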
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
Mock AMD's finances all you like, but they actually did launch 4m/8t Bulldozer and Piledriver, so it's been well-demonstrated that they CAN bring chips like that to market, provided the process is there to support it. GF fell down on its face, and the rest is history.

Again, you are putting up a red herring. Nobody said that AMD cannot reach certain levels of performance if they want to. The problem is that they reach these levels by throwing more hardware at the problem (higher transistor budget and bigger dies), which results in higher power consumption, which wrecks the TCO equation for both PC and server OEMs.

It's just not enough to reach the performance threshold, AMD should also do that within reasonable manufacturing costs and power consumption levels, and this is where they fail, badly.
 

DrMrLordX

Lifer
Apr 27, 2000
23,183
13,270
136
, AMD should also do that within reasonable manufacturing costs and power consumption levels, and this is where they fail, badly.

I will agree with you on that point, which is why we're stuck with 2m SR and XV chips. Well, one of the reasons. But you are forgetting that part of the transistor budget that made Piledriver and Bulldozer so bloated was the large and slow L3. The modules themselves were not particularly large, and if you look at the module sizes on Kaveri and Carrizo (HDL notwithstanding), they aren't that big.

AMD has major problems with cache density and cache performance. That's why they stopped making new chips with L3, period.
 

mrmt

Diamond Member
Aug 18, 2012
3,974
0
76
I will agree with you on that point, which is why we're stuck with 2m SR and XV chips. Well, one of the reasons. But you are forgetting that part of the transistor budget that made Piledriver and Bulldozer so bloated was the large and slow L3. The modules themselves were not particularly large, and if you look at the module sizes on Kaveri and Carrizo (HDL notwithstanding), they aren't that big.

That's just not true. The Trinity CPU part is 40-50% bigger than the Sandy Bridge 2C CPU part, and it also consumes more power for less performance. The bloated L3 and HT links just make things worse on the server, but they are by no means the main problem.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
mrmt said:
, AMD should also do that within reasonable manufacturing costs and power consumption levels, and this is where they fail, badly.

I will agree with you on that point, which is why we're stuck with 2m SR and XV chips. Well, one of the reasons. But you are forgetting that part of the transistor budget that made Piledriver and Bulldozer so bloated was the large and slow L3. The modules themselves were not particularly large, and if you look at the module sizes on Kaveri and Carrizo (HDL notwithstanding), they aren't that big.

AMD has major problems with cache density and cache performance. That's why they stopped making new chips with L3, period.

I think 2M XV for Carrizo was not the best idea in this day and age. One example would be the analysis found in this article.

When I started out this piece the goals I set out to reach was to either confirm or debunk on how useful homogeneous 8-core designs would be in the real world. The fact that Chrome and to a lesser extent Samsung's stock browser were able to consistently load up to 6-8 concurrent processes while loading a page suddenly gives a lot of credence to these 8-core designs that we would have otherwise not thought of being able to fully use their designed CPU configurations.
[...]
What we see in the use-case analysis is that the amount of use-cases where an application is visibly limited due to single-threaded performance seems to be very limited. In fact, in a large amount of the analyzed scenarios our test-device with Cortex A57 cores would rarely need to ramp up to their full frequency beyond short bursts (Thermal throttling was not a factor in any of the tests). On the other hand, scenarios where we'd find 3-4 high load threads seem not to be that particularly hard to find, and actually appear to be a pretty common occurrence. For mobile, the choice seems to be obvious due to the power curve implications. In scenarios where we're not talking about having loads so small that it becomes not worthwhile to spend the energy to bring a secondary core out of its idle state, one could generalize that if one is able to spread the load over multiple CPUs, it will always be preferable and more efficient to do so.
[...]
In the end what we should take away from this analysis is that Android devices can make much better use of multi-threading than initially expected. There's very solid evidence that not only are 4.4 big.LITTLE designs validated, but we also find practical benefits of using 8-core "little" designs over similar single-cluster 4-core SoCs. For the foreseeable future it seems that vendors who rely on ARM's CPU designs will be well served with a continued use of 4.4 b.L designs.

Now granted, that was using Android, but I wouldn't be surprised if Microsoft Spartan or IE11 scaled the same way.

Then, of course, I would also expect a 6C/6T to have better performance per watt than 4C/4T.

But I do agree about the L3 cache.
 

AtenRa

Lifer
Feb 2, 2009
14,003
3,362
136
IPC literally means "instructions per clock", so if you have a module OR an SMT-capable core or whatever else, the maximum number of instructions per clock is achieved when handling two separate threads! Or more, if you've involved POWER8. Now you're up to eight threads.

That is Core/Module Throughput, not what people mean by IPC (Single Thread Performance) here.