AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

inf64 · Oct 5, 2016

ShintaiDK said:
So in other words, the up to 40% with Zen for example can mean anything. Even 10%

Thanks for confirming it.

Hmm, I have trouble finding Zen slides that show an "up to" statement for the IPC increase. Can you provide some? I have found a bunch of them for SR and XV, but all I see for Zen is 40% more IPC in the slides (and it was confirmed at HC that this is relative to XV and only covers the pure ST improvement, with SMT coming on top of it).

Dresdenboy · Oct 5, 2016

ShintaiDK said:
So in other words, the up to 40% with Zen for example can mean anything. Even 10%

Thanks for confirming it.

Please provide your evidence for this statement. Otherwise you just disqualified your point.

.vodka · Oct 5, 2016

ShintaiDK said:
So in other words, the up to 40% with Zen for example can mean anything. Even 10%

Thanks for confirming it.

Come on, don't be so pessimistic. Zen will be better than XV, even considering it was developed on a shoestring budget because it's a saner architecture to begin with. I like to see Zen as the spiritual successor to K10/Stars. If all the proposed changes to the uarch relative to XV as seen in the slides amount to a 10% improvement then AMD can go hang themselves, they won't be missed (well, ATI would, hope they set them free before dying in this hypothetical scenario)

AMD has lied and lied, trying to polish the turd that BD was during its first years. If only BD had started out from what Bristol Ridge is.. I don't see a reason for them to lie about Zen, they're betting their future on it. Having said that... this 40% number had better be a real 40% improvement in perf/clock, not the up to 2.8x perf/w claim for Polaris that turned out to be true for RX 470.

Anyway, I'd like to see another meaningful leak in the coming weeks showing similar results to Blender Zen vs BW-E. What we've seen so far seems promising... but I'm not holding my breath.

Arachnotronic · Oct 5, 2016

.vodka said:
Come on, don't be so pessimistic. Zen will be better than XV, even considering it was developed on a shoestring budget because it's a saner architecture to begin with. I like to see Zen as the spiritual successor to K10/Stars. If all the proposed changes to the uarch relative to XV as seen in the slides amount to a 10% improvement then AMD can go hang themselves, they won't be missed (well, ATI would, hope they set them free before dying in this hypothetical scenario)

AMD has lied and lied, trying to polish the turd that BD was during its first years. If only BD had started out from what Bristol Ridge is.. I don't see a reason for them to lie about Zen, they're betting their future on it. Having said that... this 40% number had better be a real 40% improvement in perf/clock, not the up to 2.8x perf/w claim for Polaris that turned out to be true for RX 470.

Anyway, I'd like to see another meaningful leak in the coming weeks showing similar results to Blender Zen vs BW-E. What we've seen so far seems promising... but I'm not holding my breath.

40% IPC over XV would put Zen at somewhere in between a Sandy Bridge and an Ivy Bridge. I don't think it will be able to match Broadwell-E clock for clock in most workloads.

Glo. · Oct 5, 2016

Pardon me for asking, but did AMD ever state whether the IPC increase refers to instructions dispatched or executed per clock?

Arachnotronic · Oct 5, 2016

Glo. said:
Pardon me for asking, but did AMD ever state whether the IPC increase refers to instructions dispatched or executed per clock?

Executed.

superstition · Oct 5, 2016

.vodka said:
AMD has lied and lied, trying to polish the turd that BD was during its first years.

Yeah. Horrible performance for a 2011/2012 architecture stuck on 32nm.

Question is... If Intel hadn't been able to rob AMD of so much profit with its OEM, compiler, and benchmarking shenanigans... how could the 8-core Bulldozer have been improved during the years after Piledriver?

Excavator, for high performance, is not the answer. It was designed for low power consumption not high performance. The same goes for Bristol Ridge. Bristol and Excavator are fine for low-demand OEM systems, portables, and such but they're hardly what could have been with the construction design had it been advanced for higher performance higher-TDP purposes.

If AMD had had more money it could have added a uOP cache, for instance. It could have done more to improve the L2 and L3 caches. There were performance improvements to be had but Intel's tricks kept significant quantities of cash out of AMD's R&D arm.

.vodka said:
If only BD had started out from what Bristol Ridge is..

A cache-starved part that uses a high density library on a 28nm bulk process and which targets low TDP?

NicolasCC · Oct 5, 2016

Sweepr said:
Straight from Ashes of the Singularity's CPU benchmark:

#COMPARISON 1

- Summit Ridge 8C/16T ES - 2.8-3.2 GHz (2016/2017 Zen)
Average: 58.9 FPS
Normal batch: 65.8 FPS
Medium batch: 62.8 FPS
Heavy batch: 50.5 FPS

http://www.ashesofthesingularity.co...-details/bfba4b4a-4b1e-4ab3-8f2f-2375321ea68b

- Core i7-980 6C/12T - 3.33 GHz (2010 Westmere)
Average: 58.6 FPS
Normal batch: 65.2 FPS
Medium batch: 59.7 FPS
Heavy batch: 52.3 FPS

http://www.ashesofthesingularity.co...-details/45e74e9d-4b0c-4557-8225-a3b8664dd4a8

- Core i7-4770 4C/8T 3.4-3.9 GHz (2013 Haswell)
Average: 66.0 FPS
Normal batch: 74.5 FPS
Medium batch: 69.5 FPS
Heavy batch: 56.6 FPS

http://www.ashesofthesingularity.co...-details/4baad586-6271-42bd-84c5-6884cfb3341d

- Core i5-6600K 4C/4T - 3.5-3.9 GHz (2015 Skylake)
Average: 82.4 FPS
Normal batch: 87.6 FPS
Medium batch: 85.5 FPS
Heavy batch: 75.2 FPS

http://www.ashesofthesingularity.co...-details/8e1f0605-d5d9-49d1-afc0-e6fbf9e51262

- Core i7-6700K 4C/8T - 4.0 GHz (2015 Skylake) + OC
Average: 107.3 FPS
Normal batch: 125.7 FPS
Medium batch: 113.8 FPS
Heavy batch: 89.2 FPS

http://www.ashesofthesingularity.co...-details/6096dd00-7e11-4368-b588-97413922f5c7

- Core i7-5960X 8C/16T - 3.0-3.5 GHz (2014 Haswell-E)
Average: 109.8 FPS
Normal batch: 127.9 FPS
Medium batch: 119.7 FPS
Heavy batch: 89.7 FPS

http://www.ashesofthesingularity.co...-details/5d5ab818-bc9e-4340-8a33-9e243a097a1f

The Haswell score is 100% at stock and based on the latest version of the benchmark included in the search engine (1.24.20823.0). The benchmark likes cores/threads and scales beyond 4C/8T:

3.0 GHz Haswell-E beating 4.0 GHz Skylake-S.

#COMPARISON 2

- Summit Ridge 8C/16T ES - 2.8-3.2 GHz (2016/2017 Zen)
Average: 31.5 FPS
Normal batch: 36.5 FPS
Medium batch: 33.8 FPS
Heavy batch: 26.2 FPS

- Core i5-4670K 4C/4T 3.4-3.8 GHz (2013 Haswell)
Average: 52.6 FPS
Normal batch: 56.9 FPS
Medium batch: 54.4 FPS
Heavy batch: 47.5 FPS

https://forums.anandtech.com/threads/first-summit-ridge-zen-benchmarks.2482739/page-4#post-38414245

#UPDATE 13/08

AMD ZEN Engineering Sample AOS - Further Performance Analysis

http://www.guru3d.com/news-story/amd-zen-engineering-sample-aos-further-analysis.html

# Comparison 3

Intel Xeon E5-2603 v3 (Haswell) @ 1.60 GHz
- Single-Core Score: 1804

https://browser.primatelabs.com/v4/cpu/117877

AMD Zeppelin (Zen) @ 1.45 GHz
- Single-Core Score: 1141

https://browser.primatelabs.com/v4/cpu/105227

wow!! thanks a lot! such a detailed post!
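For anyone who wants to read the "COMPARISON 1" numbers quoted above at a glance, here is a minimal sketch that expresses each average FPS result as a ratio of the Summit Ridge engineering sample. The figures are copied from the quoted post; the dictionary labels and the helper name `relative_to` are just illustrative. It says nothing about clocks, memory, or GPU differences between the runs, so treat the ratios as a rough reading aid only.

```python
# Average FPS values copied from "COMPARISON 1" in the quoted post.
average_fps = {
    "Summit Ridge 8C/16T ES": 58.9,
    "Core i7-980 6C/12T": 58.6,
    "Core i7-4770 4C/8T": 66.0,
    "Core i5-6600K 4C/4T": 82.4,
    "Core i7-6700K 4C/8T + OC": 107.3,
    "Core i7-5960X 8C/16T": 109.8,
}


def relative_to(baseline, results):
    """Return every result as a ratio of the baseline entry (illustrative helper)."""
    base = results[baseline]
    return {name: fps / base for name, fps in results.items()}


relative = relative_to("Summit Ridge 8C/16T ES", average_fps)

# Print from slowest to fastest, with the ES as 1.00x.
for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name:26s} {ratio:4.2f}x")
```

With these inputs the i7-980 lands almost exactly on the ES, and the i7-5960X ends up a bit under twice as fast.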

majord · Oct 5, 2016

Arachnotronic said:
Sure, Intel's Atom efforts didn't work out in smartphones and tablets for a number of reasons.

Intel's mobile failure is an example of the latter. As for the "ARM gnome and friends," you do realize that the ARM companies that have been successful (Apple, Samsung, Qualcomm) are not operating on AMD-level R&D budgets.

Do you actually have the comparative numbers for these cores R&D budget? (Axx, mongoose, Atom, Zen etc)

Anyway, this IPC argument is a little ridiculous. If you want to believe AMD is lying over and over and over again about the IPC increase over XV (since they've repeated it many times now, and been very clear about what they mean), then whatever, I don't think there's any point debating it. Given the truckloads of data released on the core, there's no reason to doubt it, nor is there anything magical about it.

The clockspeed issue is far more interesting and open to debate, since there are still a lot of question marks, and it will have far more implications for performance than this petty crap about how accurate their IPC uplift claim will be.

inf64 · Oct 5, 2016

inf64 said:
Hmm, I have trouble finding Zen slides that show an "up to" statement for the IPC increase. Can you provide some? I have found a bunch of them for SR and XV, but all I see for Zen is 40% more IPC in the slides (and it was confirmed at HC that this is relative to XV and only covers the pure ST improvement, with SMT coming on top of it).

Dresdenboy said:
Please provide your evidence for this statement. Otherwise you just disqualified your point.

Bump for Shintai in case he missed it

NostaSeronx · Oct 5, 2016

I think x86-64 should follow the fringe big.Little ARM designs like the Helio X20/X25.

Cat cores for light tasks. Lowest Frequency. Power is the most important aspect.
Zen cores for medium tasks. Middle ground because of highest IPC.
Bulldozer cores for heavy tasks. Fastest Frequency. Vector potential is more important than scalar potential.

.vodka · Oct 5, 2016

Arachnotronic said:
40% IPC over XV would put Zen at somewhere in between a Sandy Bridge and an Ivy Bridge. I don't think it will be able to match Broadwell-E clock for clock in most workloads.

I know. Neither do I, that's why I'd like to see another Blender like leak to confirm if we're looking at best possible case performance (likely) or a baseline, give or take 5-10%. If the geekbench v4 numbers turn out to be true somehow then it's alright, Ivy/Haswell-like performance for a first gen core isn't bad at all, considering both companies are now fighting the laws of physics more than ever.

superstition said:
Yeah. Horrible performance for a 2011/2012 architecture stuck on 32nm.

Question is... If Intel hadn't been able to rob AMD of so much profit with its OEM, compiler, and benchmarking shenanigans... how could the 8-core Bulldozer have been improved during the years after Piledriver?

Excavator, for high performance, is not the answer. It was designed for low power consumption not high performance. The same goes for Bristol Ridge. Bristol and Excavator are fine for low-demand OEM systems, portables, and such but they're hardly what could have been with the construction design had it been advanced for higher performance higher-TDP purposes.

If AMD had had more money it could have added a uOP cache, for instance. It could have done more to improve the L2 and L3 caches. There were performance improvements to be had but Intel's tricks kept significant quantities of cash out of AMD's R&D arm.

A cache-starved part that uses a high density library on a 28nm bulk process and which targets low TDP?

Not willing to go down this path. The what ifs on AMD go back to the P4 days and Intel's dirty tactics.

Come on, you know what I meant with my post, no need to nitpick. An outlier of a game or two doesn't change the fact that most of the time BD and its derivatives just can't keep up. Piledriver/Vishera was a fine CPU back in its day when competing with Sandy (8350 matching 2500k as expected)... nowadays it's game over. The 9370 and 9590 are that high in the charts through balls to the wall overclocking and overvolting for an off the shelf consumer part.

I was thinking more along the lines of the modules themselves instead of the end products.. We can agree that XV in BR is a much better example of the BD paradigm than 1st generation BD was in the x1xx Zambezi part, in XV the low hanging fruit in the design has been picked and further refinement could do wonders, as you say. BR as it is, a cache starved part on an outdated 28nm process tuned for sheer density, is a much stronger product than Trinity/Richland were. Imagine what a four module XV v2 part with L3 and a better uncore could do as replacement for Vishera, designed for higher power on a newer, better node... sadly we won't get to see that. Another what if.

AMD knows very well why they didn't keep pursuing the BD lineage for high performance and power usage and instead decided to design a new core. Zen should be more competitive all around than BD ever was in any of its iterations, given the 40% figure over XV... if it can clock high enough.

dark zero · Oct 6, 2016

Arachnotronic said:
Sure, Intel's Atom efforts didn't work out in smartphones and tablets for a number of reasons.

I don't know who said this first, but I have seen it used by ShintaiDK and others: In R&D, you don't always get what you pay for, but you won't get what you don't pay for.

Intel's mobile failure is an example of the latter. As for the "ARM gnome and friends," you do realize that the ARM companies that have been successful (Apple, Samsung, Qualcomm) are not operating on AMD-level R&D budgets.

Mediatek wants to talk with you... with an even smaller R&D budget than AMD, they already destroyed every Intel effort to enter mobile... Atom was an easy target even for them.

And well... Mediatek is now like AMD in the ARM market, but with more success, which is hilarious.

AMD should have entered the Android market long before now.

cdimauro · Oct 6, 2016

lolfail9001 said:
Considering that generally process has minor impact on actual architecture performance (in particular per clock), it kinda discredits interview as a whole, if anything.

Dresdenboy said:
I think he referred to remaining power efficiency improvements, because a part of the process' improvement likely already got used up in uarch and cycle time.

The problem here is the words that he used, particularly "per clock". Maybe he made a mistake and so gave misleading information.

coercitiv said:
AMD mentions 40% for ST IPC, first of all because they would always communicate the higher number (40% for MT then communicate 50%+ for ST), second because if it were otherwise Zen would tie or beat Intel's HEDT in more traditional benchmarks such as Cinebench. So not only is there confusion in the way you used IPC in relation to the number of threads per core, you may have also chosen a rather unreliable path to interpret what little AMD claimed in regard to Zen vs. XV

So, you're stating that there are separate ST and MT definitions of IPC, right? I beg to differ.

There was nothing like that in the definition that someone reported before, but it's not an isolated case.

For example, here you can find the definition from a respectable source (they should know about measuring performance, right?), and pay attention to this: "IPC is an excellent metric for judging an overall potential for application performance tuning". So, the application as a whole. No ST or MT distinction, because IPC IS measured/extracted by running an application, and not a single part of its execution.

From another source (64-ia-32 manual-325462.pdf , "Intel® 64 and IA-32 Architectures Software Developer’s Manual"), at p.45:

"2.2.3.2 Execution Core
The execution core of the Intel Core microarchitecture is superscalar and can process instructions out of order to increase the overall rate of instructions executed per cycle (IPC)."

So, the whole core is considered, and NOT a part of it, or splitting the definition in ST/MT terms.

From another source (64-ia-32-architectures-optimization-manual.pdf, "Intel® 64 and IA-32 Architectures Optimization Reference Manual"), at p.20:

"2.3.3 The Out-of-Order Engine
The Out-of-Order engine provides improved performance over prior generations with excellent power characteristics. It detects dependency chains and sends them to execution out-of-order while maintaining the correct data flow. When a dependency chain is waiting for a resource, such as a second-level data cache line, it sends micro-ops from another chain to the execution core. This increases the overall rate of instructions executed per cycle (IPC)."

Pay attention to the "overall".
At p.583:

"Retiring denotes slots utilized by “good operations”. Ideally, you want to see all slots attributed here since it correlates with Instructions Per Cycle (IPC). Nevertheless, a high Retiring fraction does not necessary mean there is no room for speedup."

In fact, when the CPU retires instructions, it doesn't make a distinction between the threads: it retires whatever is the thread (one or two) from which they come.
At p.586:

"B.1.7 Retiring
This category reflects slots utilized by “good micro-ops” – issued micro-ops that get retired expeditiously without performance bottlenecks. Ideally, we would want to see all slots attributed to the Retiring category; that is Retiring of 100% of every slots correspond to hitting the maximal micro-ops retired per cycle of the given microarchitecture. For example, assuming one instruction is decoded into one microop, Retiring of 50% in one slot means an IPC of 2 was achieved in a four-wide machine. In other words, maximizing the Retiring category increases the IPC of your program."

Again, it talks about the whole program. Not ST and/or MT.
I think that's enough. So, there was no confusion when I talked about IPC, nor when I talked about Blender's results.
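The arithmetic behind the "Retiring of 50% ... means an IPC of 2 ... in a four-wide machine" example quoted above can be sketched in a few lines. This is only an illustration of that one sentence from the manual: the function name and the `uops_per_instruction` parameter are made up here, and the default of one micro-op per instruction is the same simplifying assumption the quoted text makes.

```python
def ipc_from_retiring(retiring_fraction, pipeline_width, uops_per_instruction=1.0):
    """Estimate IPC from the fraction of issue slots that retire useful micro-ops.

    retiring_fraction: share of slots in the Retiring category, between 0 and 1.
    pipeline_width: issue slots per cycle (4 in the quoted example).
    uops_per_instruction: assumed average decode ratio (1.0 in the quoted example).
    """
    if not 0.0 <= retiring_fraction <= 1.0:
        raise ValueError("retiring_fraction must be between 0 and 1")
    uops_retired_per_cycle = retiring_fraction * pipeline_width
    return uops_retired_per_cycle / uops_per_instruction


# The manual's example: 50% Retiring on a four-wide machine.
print(ipc_from_retiring(0.5, 4))
```

Note that the formula only looks at retired slots per cycle for the core; it does not care which hardware thread the micro-ops came from.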

SarahKerrigan said:
I was at HC28. Mr. Clark was quite clear that it was ST.

Questioner from Nvidia: You had 40% uplift on IPC. Did it include the dual-thread, or was that per thread?
Clark: That was just a one-thread number. We do have good throughput on SMT but we're not stating numbers on that right now.

For those who have access to HC presentation videos, the timecode is 1:27:46 on the session9 video.

I have no problem believing you, but see above: I respectfully disagree about such IPC definition.

@bjt2: I've no time now to reply to your post. However you can take a look at Intel's optimization manual, and you'll see the architectures' diagrams that you're looking for, as well as a lot of other useful information.

Nothingness · Oct 6, 2016

cdimauro said:
For example, here you can find the definition from a respectable source (they should know about measuring performance, right?), and pay attention to this: "IPC is an excellent metric for judging an overall potential for application performance tuning".

That's a funny statement to make. IPC in isolation is useless: a poorly optimized application might run many useless instructions, and that might artificially increase IPC. One place where IPC can be considered useful is for comparing two different CPUs running the same program (or when tuning a micro-architecture).
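A toy calculation makes the point about useless instructions concrete. All numbers below are hypothetical (two builds of the same program on the same made-up 3 GHz CPU), and the only relation used is runtime = instructions / (IPC x clock).

```python
def runtime_seconds(instructions, ipc, clock_hz):
    """Time to retire `instructions` at a given average IPC and clock."""
    return instructions / (ipc * clock_hz)


CLOCK_HZ = 3.0e9  # hypothetical clock, identical for both builds

# Hypothetical builds doing the same useful work.
lean_build = {"instructions": 6.0e9, "ipc": 1.5}
bloated_build = {"instructions": 12.0e9, "ipc": 2.0}  # twice the instructions, higher IPC

lean_time = runtime_seconds(lean_build["instructions"], lean_build["ipc"], CLOCK_HZ)
bloated_time = runtime_seconds(bloated_build["instructions"], bloated_build["ipc"], CLOCK_HZ)

print(f"lean:    IPC {lean_build['ipc']:.1f}, {lean_time:.2f} s")
print(f"bloated: IPC {bloated_build['ipc']:.1f}, {bloated_time:.2f} s")
```

The bloated build shows the higher IPC and still takes longer, which is why IPC is only meaningful when the instruction stream is held constant.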

Dresdenboy · Oct 6, 2016

cdimauro said:
I have no problem believing you, but see above: I respectfully disagree about such IPC definition.

With a clear distinction it should be possible to use IPC for 1T on a 2T core.

Where is the definition of IPC that excludes its application to parts of programs, or to different scenarios on an SMT machine? That would just cut into the metric's usability. In fact, I've read papers showing the actual IPC plotted over time for different applications.
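To illustrate the distinction, here is a minimal sketch of how per-thread and whole-core IPC can both be reported from the same counters on an SMT core. The counter readings are hypothetical, not measurements from any real CPU, and `smt_ipc` is just an illustrative helper.

```python
def smt_ipc(instructions_per_thread, core_cycles):
    """Return (list of per-thread IPC values, aggregate core IPC) for one window."""
    if core_cycles <= 0:
        raise ValueError("core_cycles must be positive")
    per_thread = [count / core_cycles for count in instructions_per_thread]
    return per_thread, sum(per_thread)


# Hypothetical retired-instruction counts over the same number of core cycles.
CORE_CYCLES = 1_000_000
one_thread, core_1t = smt_ipc([1_500_000], CORE_CYCLES)            # only one thread active
two_threads, core_2t = smt_ipc([1_000_000, 900_000], CORE_CYCLES)  # both threads active

print("1T run: per-thread", one_thread, "core", core_1t)
print("2T run: per-thread", two_threads, "core", core_2t)
print(f"SMT throughput gain: {core_2t / core_1t - 1.0:.1%}")
```

In this made-up case each thread is individually slower with SMT active while the core as a whole retires more per cycle, so it matters which of the two numbers a quoted "IPC" refers to.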

bjt2 · Oct 6, 2016

cdimauro said:
@bjt2: I've no time now to reply to your post. However you can take a look at Intel's optimization manual, and you'll see the architectures' diagrams that you're looking for, as well as a lot of other useful information.

Even if I read the diagrams carefully, there are two reasons that force us to wait for actual benchmarks:

1) We don't have the instruction-type layout for Zen. We don't know which pipeline can do what, e.g. how many IMULs? How many cycles, and with what limitations? So a simulation is impossible.
2) Even if we had those details, it's a difficult calculation, better done with a simulator, which we don't have.

We can only do a high-level analysis, using queueing theory to roughly estimate the outcome.
And we know that 4+4 specialized queues are faster than 4 shared queues, given the same latencies, but they obviously require much more logic...

The only reason for it to be slower would be some limitation on instruction combinations that limits the maximum IPC.

Moreover, 10 uops per cycle is not sustainable, only 6, and even that assumes a 100% cache hit rate, a uop cache hit rate high enough to avoid the 4-uop bottleneck of the decoder, and low enough dependencies...

So the Intel design is balanced enough for very well mixed instructions (8 peak uops/cycle processing, with max 6 uops/cycle dispatching), but could stall on particular instruction mixes, e.g. a complex FP instruction mix with low interlocking that leaves few free ports for integer instructions, like two threads of the 2.4 IPC SPEC FP bench.
Moreover, branch prediction is probably better on Intel, and this ensures a constant flow of 6 uops per cycle, whereas on Zen this could be more intermittent...

Zen is better equipped for peaks with awfully mixed instructions...
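As a very rough sketch of the "only 6 is sustainable" argument, the pipeline can be treated as a chain of stages where the narrowest one sets the sustained rate. The widths come from the numbers in this post (4 uops/cycle decoder, 6 uops/cycle dispatch, 10 uops/cycle peak execution); the assumption that the uop cache delivers 6 uops/cycle, the hit rates, and both function names are made up for illustration. Dependencies, cache misses, and port conflicts are ignored entirely, so this is an upper bound, not a simulation.

```python
def front_end_rate(uop_cache_hit_rate, uop_cache_width, decoder_width):
    """Average uops/cycle when a fraction of uops comes from the uop cache.

    Rates are combined through cycles-per-uop, because time spent on the slow
    decoder path weighs more than time spent on the fast uop cache path.
    """
    cycles_per_uop = (uop_cache_hit_rate / uop_cache_width
                      + (1.0 - uop_cache_hit_rate) / decoder_width)
    return 1.0 / cycles_per_uop


def sustained_uops_per_cycle(front_end, dispatch_width, execution_width):
    """The narrowest stage bounds the sustained rate."""
    return min(front_end, dispatch_width, execution_width)


DECODER_WIDTH = 4      # from the post
DISPATCH_WIDTH = 6     # from the post
EXECUTION_WIDTH = 10   # peak figure from the post
UOP_CACHE_WIDTH = 6    # assumption for this sketch

for hit_rate in (1.0, 0.5, 0.0):
    fe = front_end_rate(hit_rate, UOP_CACHE_WIDTH, DECODER_WIDTH)
    rate = sustained_uops_per_cycle(fe, DISPATCH_WIDTH, EXECUTION_WIDTH)
    print(f"uop cache hit rate {hit_rate:.0%}: at most {rate:.2f} uops/cycle")
```

Even in this idealized model the 10-wide execution peak never shows up in the sustained number; it is always capped by dispatch or by the front end.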

coercitiv · Oct 6, 2016

cdimauro said:
So, you're stating that there are separate ST and MT definitions of IPC, right? I beg to differ.

Ok, let's disagree.

cdimauro said:
"IPC is an excellent metric for judging an overall potential for application performance tuning".

If the quote above is a definition of sorts, then I'm an interstellar rocket.

cdimauro said:
I have no problem believing you, but see above: I respectfully disagree about such IPC definition.

Based on your post, would you agree that IPC is utterly useless in describing ST performance of a SMT capable CPU core?

ShintaiDK · Oct 6, 2016

inf64 said:
Bump for Shintai in case he missed it

Fair enough, it was an article rewrite that added the "up to" part.

But it would be nice to see AMD actually reach its performance claim for the first time in 10 years. That would mean the leaks are wrong, though.

lolfail9001 · Oct 6, 2016

ShintaiDK said:
Fair enough, it was an article rewrite that added the "up to" part.

But it would be nice to see AMD actually reach its performance claim for the first time in 10 years. That would mean the leaks are wrong, though.

If the initial GB4/GB3 benches had Zen working at 1 GHz (Geekbench is a mess in terms of frequency reporting), then it does reach AMD's performance claim. In fact, it also manages to perform damn close to Haswell-EP per clock in this case, validating the Blender bench. So, no, the leaks don't actually contradict AMD's claims; they lack the information to do that as of now.

So, as weird as it is, AMD got my attention.
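To show how much the clock assumption matters, here is a small sketch that normalises the Geekbench 4 single-core scores quoted earlier in the thread (Xeon E5-2603 v3: 1804 at 1.60 GHz; Zeppelin: 1141 with a reported 1.45 GHz) to points per GHz. It assumes those are the results meant here, and the 1.0 GHz case is the hypothesis from this post, not a confirmed clock.

```python
def score_per_ghz(score, clock_ghz):
    """Normalise a benchmark score by clock speed (assumes linear scaling)."""
    return score / clock_ghz


# Scores and reported clocks as quoted earlier in the thread.
xeon_per_ghz = score_per_ghz(1804, 1.60)

# Zen sample under two clock assumptions.
zen_scenarios = {
    "reported 1.45 GHz": score_per_ghz(1141, 1.45),
    "hypothetical 1.00 GHz": score_per_ghz(1141, 1.00),
}

print(f"Xeon E5-2603 v3: {xeon_per_ghz:.1f} points/GHz")
for label, per_ghz in zen_scenarios.items():
    ratio = per_ghz / xeon_per_ghz
    print(f"Zen ES ({label}): {per_ghz:.1f} points/GHz, {ratio:.2f}x the Xeon")
```

Depending on which clock is right, the same score reads as either roughly 70% of the Xeon per clock or about on par with it.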

hojnikb · Oct 6, 2016

NostaSeronx said:
I think x86-64 should follow the fringe big.Little ARM designs like the Helio X20/X25.

Cat cores for light tasks. Lowest Frequency. Power is the most important aspect.
Zen cores for medium tasks. Middle ground because of highest IPC.
Bulldozer cores for heavy tasks. Fastest Frequency. Vector potential is more important than scalar potential.

I don't think you can do mismatched cores with x86, because it would have been done by now.

Arachnotronic · Oct 6, 2016

lolfail9001 said:
If the initial GB4/GB3 benches had Zen working at 1 GHz (Geekbench is a mess in terms of frequency reporting), then it does reach AMD's performance claim. In fact, it also manages to perform damn close to Haswell-EP per clock in this case, validating the Blender bench. So, no, the leaks don't actually contradict AMD's claims; they lack the information to do that as of now.

So, as weird as it is, AMD got my attention.

The Stilt published an XV result @ 3.4GHz:
https://forums.anandtech.com/threads/first-summit-ridge-zen-benchmarks.2482739/page-51#post-38501596

1.4x the perf/MHz shown here (and assuming linear scaling from 3.4GHz to 3.5GHz) would give a score of ~3549 at 3.5GHz. My Broadwell @ 3.5GHz manages to get 3903 in single core.

This implies that Broadwell is ~10% ahead in perf/clock or, more succinctly, AMD has built its own version of Ivy Bridge.
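The estimate above can be written out as a short sketch. The 3549 and 3903 figures and the 1.4x, 3.4 GHz, and 3.5 GHz inputs are taken from this post; the XV baseline score itself is not quoted here, so the code only back-solves the value implied by those numbers rather than reading it from the linked result. Function names are illustrative.

```python
def project_score(baseline_score, per_clock_uplift, baseline_ghz, target_ghz):
    """Scale a score by a per-clock uplift and by clock, assuming linear scaling."""
    return baseline_score * per_clock_uplift * (target_ghz / baseline_ghz)


def lead(score_a, score_b):
    """Fractional lead of score_a over score_b at the same clock."""
    return score_a / score_b - 1.0


ZEN_PROJECTED_3_5_GHZ = 3549  # from the post
BROADWELL_3_5_GHZ = 3903      # from the post

# XV baseline at 3.4 GHz implied by the post's own numbers (derived, not quoted).
implied_xv_score = ZEN_PROJECTED_3_5_GHZ / (1.4 * 3.5 / 3.4)

print(f"Implied XV score at 3.4 GHz: {implied_xv_score:.0f}")
print(f"Projected Zen at 3.5 GHz: {project_score(implied_xv_score, 1.4, 3.4, 3.5):.0f}")
print(f"Broadwell lead per clock: {lead(BROADWELL_3_5_GHZ, ZEN_PROJECTED_3_5_GHZ):.1%}")
```

The last line comes out at roughly 10%, matching the figure in the post.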

That's still 3 core iterations behind Intel's current best in terms of perf/clock, but that's a much better position than what AMD was previously in.

Shivansps · Oct 6, 2016

That may not be enough for people with SB and Ivy to consider switching, and in 5 days it's going to be 5 years for my 2500K, so that's not good.

People with FX should upgrade, but they will have a hard time with SB and Ivy users.

Arachnotronic · Oct 6, 2016

Shivansps said:
That may not be enough for people with SB and Ivy to consider switching, and in 5 days it's going to be 5 years for my 2500K, so that's not good.

Well, if people weren't compelled by Skylake, they probably won't find Zen all that appealing.

LTC8K6 · Oct 6, 2016

Zen's extra cores should put its multi-thread performance well above those older Intel chips, though. That should be a reason to consider Zen, depending on price, when they upgrade.
