AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Det0x · Oct 3, 2016

Abwx said:
They will, once it's launched. Or did any manufacturer of anything ever send an ES to a site before the product launched? Tell us when this has already happened; I'm very interested to see whether any other firm ever bowed to such unrealistic and irrational demands...

Maybe not entirely the same thing, but didn't AMD show an ES Hammer @ 800MHz vs. a Willamette @ 1600MHz to AnandTech back in the day?
Pretty sure I remember reading an exclusive here where the Athlon 64 roflstomped the Pentium 4...

*edit*

Can't seem to find the original link, but here are some of the forum threads about it:

Early AMD Opteron/Hammer benchmarks @ http://arstechnica.com/civis/viewtopic.php?f=2&t=2059

arstechnica said:
German hardware magazine tecChannel got its hands on a Hammer system for about an hour, so they did what any good tech site would do: they started running benchmarks. The Hammer benchmarks were performed with prototype hardware (the CPU and mobo, obviously), and as such warrant caution. The benchmarks compare the Hammer and Athlon MP, both running at 800MHz, plus an MP running at 1.667GHz, and a P4 Willamette at 800MHz and 1.6GHz, with a 400MHz FSB. The basic results put the 800MHz Hammer on par with, or surpassing, the 1.6GHz P4 in their limited battery of tests, which included Quake III (with unspecified video config), 32-bit and 128-bit memory transfers, and a few other quick, one-off tests. I dunno 'bout you, but this world is getting messy! If anything, it's becoming increasingly clear even within the x86 world that MHz doesn't necessarily mean anything. 800MHz versus 1.6GHz, on 32-bit x86 code? Gotta love it.

These tests, however, are focused on 32-bit performance, so we shouldn't jump to conclusions about how this stacks up to performance gains for 64-bit x86 applications, should they see the light of day. Indeed, all this is a tad confusing considering AMD's posturing earlier this week, wherein they seemed to be cushioning us for less-than-stellar performance. In the 32-bit category, if these early benchmarks can be trusted, Hammer is kicking butt. But that still leaves two major questions: is the 64-bit performance also impressive? Perhaps even more importantly, can AMD make a financially beneficial transition from .18 to .13 micron for T-bred, and get Hammer to 1.6/.7GHz for release at the same time? To be sure, AMD has to crank these Opteron/Hammer babies out at much higher speed ratings to attract attention. I suspect therein lies the problem.

Let's hope history repeats itself.

bjt2 · Oct 3, 2016

cdimauro said:
Why?

I remember an old analysis of SPEC INT and FP mean IPC. SPEC FP compiled for x86 resulted in a mean IPC of 2.4, counting all instructions: FP, jump, compare, etc.
Take 2 threads per core due to SMT: we get a MEAN value of 4.8. Intel architectures have 4 shared ports for int+FP. This 4.8 also includes memory instructions, which go to separate pipelines. But 4.8 is a mean value. There is certainly some single benchmark with a higher IPC, and even if not, dependencies and constraints on the four pipelines can lead to stalls/bottlenecks.
Even on Zen this could happen. But you should admit that on a 4+4 pipeline scheme, this is less probable...

EDIT: this analysis was done in the good old days of SMT-less CPUs; branch prediction techniques have probably improved since then and might give up to 3 IPC... Obviously, in an SMT design this is unsustainable on Intel's architecture: it would mean 6 x86 instructions/cycle. But it is theoretically possible on AMD Zen... Obviously, 2.4+2.4 also does not add up to 4.8 in an SMT CPU, due to saturation effects from the sharing. But on AMD Zen this saturation effect should be lower thanks to the higher port count...
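A quick back-of-envelope check of the port-pressure argument in this post. The inputs (2.4 or ~3 IPC per thread, 2 SMT threads, 4 shared int+FP ports vs. a 4+4 split) are the post's own assumptions, and the sketch ignores both SMT interference and the separate memory pipelines, so it only bounds the demand.

```python
# Naive SMT port-pressure estimate; every input is the poster's assumption.
# Memory ops are lumped in with ALU work, so pressure on the ALU ports is overstated.


def naive_smt_demand(ipc_per_thread: float, threads: int = 2) -> float:
    """Instructions/cycle per core if the SMT threads did not slow each other down."""
    return ipc_per_thread * threads


def demand_per_port(demand: float, ports: int) -> float:
    """Average instructions each port would have to accept per cycle (a port can take at most 1.0)."""
    return demand / ports


for ipc in (2.4, 3.0):
    demand = naive_smt_demand(ipc)
    print(
        f"{ipc} IPC/thread -> {demand:.1f} instr/cycle per core: "
        f"{demand_per_port(demand, 4):.2f}/port on 4 shared ports, "
        f"{demand_per_port(demand, 8):.2f}/pipe on 4+4 pipes"
    )
```

Any per-port figure above 1.0 is work the ports cannot issue in a single cycle, which is the post's point about 4 shared ports; the 4+4 figure assumes a perfect int/FP mix, which real code rarely has.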

EDIT2: think of queueing theory: you never get full resource utilization, and as you approach the limit, latencies increase sharply...
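To make the queueing-theory point concrete, here is the textbook M/M/1 queue. Treating an execution port as an M/M/1 queue is a simplifying assumption, not a model of any real core; it only shows how mean latency behaves as utilization approaches 100%.

```python
# Textbook M/M/1 queue: utilization rho = arrival_rate / service_rate, and the
# mean time in system is W = service_time / (1 - rho).


def mm1_mean_latency(utilization: float, service_time: float = 1.0) -> float:
    """Mean time in system for an M/M/1 queue, in units of service_time."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("M/M/1 is only stable for 0 <= utilization < 1")
    return service_time / (1.0 - utilization)


for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilization {rho:.0%}: mean latency = {mm1_mean_latency(rho):.0f}x service time")
```

Mean latency scales as 1/(1 - utilization): 2x the service time at 50% load, 10x at 90%, 100x at 99%, so a shared resource cannot be driven to full utilization without latency blowing up.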

cdimauro · Oct 3, 2016

lolfail9001 said:
Because if you don't make this split, it leads to complete confusion, since suddenly "core performance" means either single-threaded performance (which won't equal IPC*clock in your terms) or single-core throughput, which is what your definition of IPC gives.

There's no confusion at all, precisely because I made no split: splitting IPC into ST and/or MT doesn't make sense at all, as I already stated.

Have you seen Intel, AMD, or other processor vendors talking about ST IPC and/or MT IPC? I haven't.

And what about the 40% improvement that AMD reported for Zen vs. XV? Is it ST or MT? Can you please ask AMD to split that absolute value according to YOUR definition?

IPC is what the (commonly accepted) definition says it is. That's it. And that's how I've used it.

Single core/thread vs. multi core/thread matters only if you focus on specific applications that show one kind of behavior or the other.

Specifically, Blender IS an MT application, so every core tries to execute as many instructions as possible, thanks to having two hardware threads feeding the backend.

Next, even "roughly" is unproven, even if simply messing with flags evidently does not affect it that much.

Well, I don't think AMD compiled Blender with Intel's ICC compiler, which is the only way to have two different code paths executed depending on the specific microarchitecture.

It's likely that GCC was used, with just one common code path executed for SSE2 code.

Of course, there's the possibility that Blender was compiled with specific optimizations for:

- Zen
- Broadwell
- both

but we don't know.

Anyway, this 1GHz hypothesis gives AMD fans hope, so we'll let it live until the first proper reviews. Zen certainly won't be anything groundbreaking even in this best case, though.

I see 1.44GHz, not 1GHz, on the posted images.

cdimauro · Oct 3, 2016

bjt2 said:
I remember an old analysis of SPEC INT and FP mean IPC. SPEC FP compiled for x86 resulted in a mean IPC of 2.4, counting all instructions: FP, jump, compare, etc.
Take 2 threads per core due to SMT: we get a MEAN value of 4.8. Intel architectures have 4 shared ports for int+FP. This 4.8 also includes memory instructions, which go to separate pipelines. But 4.8 is a mean value. There is certainly some single benchmark with a higher IPC, and even if not, dependencies and constraints on the four pipelines can lead to stalls/bottlenecks.

Not all ALU ports are shared with the FPU. Please take a look at the Haswell block diagram for this.

Even on Zen this could happen. But you should admit that on a 4+4 pipeline scheme, this is less probable...

Sure, it's an advantage. But here we are talking about the execution ports, not the unified scheduler. To be clearer, a unified scheduler isn't a bottleneck by definition: a shared port is what MIGHT be.

EDIT: this analysis was done in the good old days of SMT-less CPUs; branch prediction techniques have probably improved since then and might give up to 3 IPC... Obviously, in an SMT design this is unsustainable on Intel's architecture: it would mean 6 x86 instructions/cycle. But it is theoretically possible on AMD Zen... Obviously, 2.4+2.4 also does not add up to 4.8 in an SMT CPU, due to saturation effects from the sharing. But on AMD Zen this saturation effect should be lower thanks to the higher port count...

You've forgotten that an Intel core has different kinds of uops, that its decoders can decode up to 4 instructions into 6 uops, and that every uop can be split into 2 simpler uops before being sent to an execution port. The 4 decoders can also merge two different instructions into one uop, but with at most 2 merges per cycle.

So Intel's micro-op cache can deliver up to 6 uops/cycle, but each uop can carry:
- 1 uop;
- 1 uop -> 2 merged instructions;
- 1 uop -> 2 (simpler) uops.

And this might be one of the reasons why Intel's microarchitectures perform well even with 8 ports and partially shared ALUs/FPUs.
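A toy model of the point above: the number of uops delivered per cycle understates both the x86 instructions they represent and the operations they expand into. The 6-wide delivery figure comes from the post; the slot mix in the example is hypothetical, and the model ignores the per-cycle merge limit and every other real constraint.

```python
from dataclasses import dataclass

DELIVERY_WIDTH = 6  # uops/cycle from the micro-op cache, per the post


@dataclass
class SlotMix:
    """Hypothetical mix of uop slots delivered in one cycle."""

    plain: int   # 1 uop = 1 instruction = 1 executed operation
    merged: int  # 1 uop = 2 merged instructions = 1 executed operation
    split: int   # 1 uop = 1 instruction = 2 simpler executed uops

    def __post_init__(self) -> None:
        if self.plain + self.merged + self.split > DELIVERY_WIDTH:
            raise ValueError("more slots than the delivery width")

    def instructions(self) -> int:
        """x86 instructions represented by the delivered slots."""
        return self.plain + 2 * self.merged + self.split

    def executed_ops(self) -> int:
        """Operations sent to execution ports after splitting."""
        return self.plain + self.merged + 2 * self.split


mix = SlotMix(plain=2, merged=2, split=2)  # hypothetical cycle
print(mix.instructions(), mix.executed_ops())
```

In this made-up cycle, 6 delivered uops stand for 8 instructions and 8 executed operations, which is why raw port or uop counts alone don't settle the comparison.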

The Stilt · Oct 3, 2016

inf64 said:
How do you know it is a hoax? All Zeppelin entries are gone. If the score is legit, which I think it is, it matches the claim of 40% higher ST IPC over the XV core perfectly.

A couple of XV scores (3.8GHz ST Turbo, locked):
https://browser.primatelabs.com/geekbench3/7933740
https://browser.primatelabs.com/geekbench3/6953761

ST score of ~2500 pts at 3.8GHz.
At 1GHz it should score ~657 pts; 40% more is around 920 pts. The leaked Zeppelin scored 984 pts at an unknown clock. The likelihood of that unknown clock being 1GHz is 99% now.

Here is my FX-8800P GB4 result with fixed 3.4GHz speed, DRAM at 2133MHz.

http://browser.primatelabs.com/v4/cpu/241727

Abwx · Oct 3, 2016

cdimauro said:
I see 1.44GHz, not 1GHz, on the posted images.

I think this kind of comment is not only useless but an acknowledgment that you didn't even read the thread...

inf64 said:
...If you think they were done at 1.4GHz, then there is zero IPC gain vs. XV on a core that has roughly double the resources of its predecessor, which makes absolutely no sense.

Take your pick.

Abwx said:
Looking at the individual FP ST scores, the chip is about 2.7x slower than a Bristol Ridge XV at 4GHz. This implies that at 1.44GHz Zen would be as fast as a 1.48GHz Bristol Ridge XV, which does not make sense at all, so the only plausible explanation is that the chip is running in the vicinity of 1GHz...
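The same reasoning as a small calculation: a chip 2.7x slower than an XV at 4GHz matches an XV at 4/2.7 GHz, and dividing that by the clock Zen is assumed to run at gives its per-clock ratio vs. XV. The 4GHz and 2.7x figures are from the post; linear clock scaling is assumed.

```python
def xv_equivalent_clock(xv_clock_ghz: float, slowdown: float) -> float:
    """Clock at which an XV would match a chip `slowdown` times slower than an XV at xv_clock_ghz."""
    return xv_clock_ghz / slowdown


def per_clock_ratio(xv_equiv_ghz: float, zen_clock_ghz: float) -> float:
    """Zen performance per clock relative to XV, if Zen really ran at zen_clock_ghz."""
    return xv_equiv_ghz / zen_clock_ghz


equiv_ghz = xv_equivalent_clock(4.0, 2.7)  # ~1.48 GHz, as in the post
for zen_clock in (1.44, 1.0):
    print(f"Zen @ {zen_clock} GHz -> {per_clock_ratio(equiv_ghz, zen_clock):.2f}x XV per clock")
```

At 1.44GHz the implied per-clock gain over XV is only about 3%, while at 1GHz it is about 48%, which is why the post concludes the sample was running near 1GHz.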

cdimauro said:
You've forgotten that an Intel core has different kinds of uops, that its decoders can decode up to 4 instructions into 6 uops, and that every uop can be split into 2 simpler uops before being sent to an execution port.

And so does Zen...

cdimauro · Oct 3, 2016

Abwx said:
I think this kind of comment is not only useless but an acknowledgment that you didn't even read the thread...

What have I missed, my lord?

lolfail9001 · Oct 3, 2016

cdimauro said:
There's no confusion at all, precisely because I made no split: splitting IPC into ST and/or MT doesn't make sense at all, as I already stated.

Have you seen Intel, AMD, or other processor vendors talking about ST IPC and/or MT IPC? I haven't.

And what about the 40% improvement that AMD reported for Zen vs. XV? Is it ST or MT? Can you please ask AMD to split that absolute value according to YOUR definition?

AMD did, in fact, state that the 40% improvement is a SINGLE THREAD improvement. What now?

cdimauro said:
Well, I don't think AMD compiled Blender with Intel's ICC compiler, which is the only way to have two different code paths executed depending on the specific microarchitecture.

It's likely that GCC was used, with just one common code path executed for SSE2 code.

CPU dispatching support in GCC is like 4 years old at this point.

cdimauro said:
Of course, there's the possibility that Blender was compiled with specific optimizations for:

- Zen
- Broadwell
- both

but we don't know.

Sort of my point.

cdimauro said:
I see 1.44GHz, not 1GHz, on the posted images.

Geekbench reports that a 6.2GHz 6700K runs at 4GHz (check the leaderboards for evidence). Its detection is faulty, to say the least. Hence this hypothesis has a right to exist for now. We shall see whether it is supported in a few months.

The Stilt said:
Here is my FX-8800P GB4 result with fixed 3.4GHz speed, DRAM at 2133MHz.

http://browser.primatelabs.com/v4/cpu/241727

Well, it's 724 per GHz. 724 times 1.4 = 1013, pretty close to where the first GB4 leak places it. Well, RIP dreams: the 1GHz hypothesis is very likely real.

Abwx · Oct 3, 2016

cdimauro said:
What have I missed, my lord?

That people who are really interested in this CPU have already figured out, for obvious reasons, that the GB test wasn't done at 1.44GHz, and they have stated the evidence. So why point out that you read 1.44GHz, as if it were an argument that this is the actual frequency...?

The moment some people pointed out that this Zen at 1.44GHz seemed to perform like an identically clocked XV, the debate should have moved forward toward more clarity, not toward more confusion...

cdimauro · Oct 3, 2016

lolfail9001 said:
AMD did, in fact, state that the 40% improvement is a SINGLE THREAD improvement. What now?

Two different sources which do NOT report anything about "single thread" improvements:

http://www.anandtech.com/show/10391/amd-briefly-shows-off-zen-summit-ridge-silicon
http://wccftech.com/amd-zen-architecture-release-schedule-revealed-rolled-server-market/

CPU dispatching support in GCC is like 4 years old at this point.

It seems to work by feature: http://agner.org/optimize/blog/read.php?i=167

In short: if the CPUs support the same (subset of) instructions, the executed code path is the same for both CPUs, even if they have completely different microarchitectures.

Which is the case here.
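A minimal Python sketch of feature-based selection as described in the linked post: each code path lists the ISA extensions it needs, and the first path whose requirements the CPU meets is chosen, without looking at the microarchitecture. The feature sets and kernel names are hypothetical simplifications, and this is not how GCC's dispatcher is actually implemented.

```python
from __future__ import annotations

from typing import Callable


def kernel_sse2(data: list[float]) -> float:
    return sum(data)  # hypothetical stand-in for the baseline SSE2 code path


def kernel_avx2(data: list[float]) -> float:
    return sum(data)  # hypothetical stand-in for an AVX2-optimized code path


# Ordered from most to least demanding; the first path whose requirements are met wins.
CODE_PATHS: list[tuple[frozenset[str], Callable[[list[float]], float]]] = [
    (frozenset({"sse2", "avx", "avx2"}), kernel_avx2),
    (frozenset({"sse2"}), kernel_sse2),
]


def select_path(cpu_features: set[str]) -> Callable[[list[float]], float]:
    """Pick a code path purely from the CPU's reported ISA features."""
    for required, kernel in CODE_PATHS:
        if required <= cpu_features:
            return kernel
    raise RuntimeError("no compatible code path")


# Simplified, hypothetical feature sets for two different microarchitectures.
broadwell = {"sse2", "sse4.2", "avx", "avx2", "fma"}
zen = {"sse2", "sse4.2", "avx", "avx2", "fma", "sha"}

print(select_path(broadwell).__name__, select_path(zen).__name__)
```

Both CPUs land on the same AVX2 path despite their different microarchitectures, which is the behavior the post describes.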

Sort of my point.

Fine.

Geekbench reports that a 6.2GHz 6700K runs at 4GHz (check the leaderboards for evidence). Its detection is faulty, to say the least. Hence this hypothesis has a right to exist for now. We shall see whether it is supported in a few months.

OK, but what about Zen?

cdimauro · Oct 3, 2016

Abwx said:
That people who are really interested in this CPU have already figured out, for obvious reasons, that the GB test wasn't done at 1.44GHz, and they have stated the evidence. So why point out that you read 1.44GHz, as if it were an argument that this is the actual frequency...?

The moment some people pointed out that this Zen at 1.44GHz seemed to perform like an identically clocked XV, the debate should have moved forward toward more clarity, not toward more confusion...

OK, now I get it. Correct.

lolfail9001 · Oct 3, 2016

cdimauro said:
Two different sources which do NOT report anything about "single thread" improvements:

Because that confirmation is pretty recent (it was mentioned on the side at the recent Hot Chips). I remember AMD mentioning it directly as well, but here's a quote from some guy who doesn't seem to have a reason to make it up:
https://twitter.com/Daniel_Bowers/status/768270633125806081

cdimauro said:
OK, but what about Zen?

In regard to Zen, see my reply to The Stilt's benchmark. It's probably GB3 being that bad a benchmark, rather than the clocks being wrong.

Abwx · Oct 3, 2016

The Stilt said:
Here is my FX-8800P GB4 result with fixed 3.4GHz speed, DRAM at 2133MHz.

http://browser.primatelabs.com/v4/cpu/241727

Could you do the same with GB3? That's what was last submitted for the Zen platform...

Other than that, your ST score is 721 pts/GHz, while the GB4 Zen ST score was 1141 at an unknown frequency. If that was at 1GHz, as some people think, then 1141/721 = 1.58, which is in line with AMD's claims. That's more than 40%, likely partly due to some FP tests as well as AES and SHA, which should both show big gains on the INT side of this benchmark.
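The two competing clock hypotheses for the GB4 Zen leak can be checked against the Excavator baseline in a few lines. The 721 pts/GHz baseline and the 1141-point score are from the thread; linear clock scaling and the two candidate clocks (1.0 and 1.44GHz) are the assumptions being tested.

```python
XV_PTS_PER_GHZ = 721   # FX-8800P GB4 ST score per GHz (from the thread)
ZEN_ST_SCORE = 1141    # leaked GB4 Zen ST score, clock unknown
AMD_CLAIM = 1.40       # +40% single-thread IPC claim


def implied_uplift(score: float, clock_ghz: float, baseline_per_ghz: float) -> float:
    """Per-clock performance relative to the baseline, assuming linear clock scaling."""
    return (score / clock_ghz) / baseline_per_ghz


for clock in (1.0, 1.44):
    uplift = implied_uplift(ZEN_ST_SCORE, clock, XV_PTS_PER_GHZ)
    verdict = "meets" if uplift >= AMD_CLAIM else "falls short of"
    print(f"{clock:.2f} GHz: {uplift:.2f}x XV per clock ({verdict} the +40% claim)")
```

Under the 1GHz hypothesis the leak implies ~1.58x XV per clock; at the reported 1.44GHz it would be only ~1.10x, well short of AMD's +40% claim.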

Arachnotronic · Oct 3, 2016

The Stilt said:
Here is my FX-8800P GB4 result with fixed 3.4GHz speed, DRAM at 2133MHz.

http://browser.primatelabs.com/v4/cpu/241727

That's useful, thanks. It looks to me like the 1141 score for the 1.44GHz (?) Zen is pretty much in line with what AMD has said. Excavator was not a good core, barely better than a Goldmont Atom.

lolfail9001 · Oct 3, 2016

Yeah, Linux, yada yada, but here is the 6400 result on dual-channel DDR4-2133 for a Skylake reference. Since it's a stock 6400, single thread runs at 3.3GHz and multi-thread at 3.1GHz. That checks out to be ~15% faster per clock than our hypothetical "1GHz Zen".
http://browser.geekbench.com/v4/cpu/629952

Interesting bit: AES performance per clock is identical on Excavator and Skylake. ASICs are good, apparently.

The more interesting bit, however, is that the scaling on Zen samples still worries me to no end. It's just too freaking low for 32 cores, let alone 64.

cdimauro · Oct 3, 2016

lolfail9001 said:
Because that confirmation is pretty recent (it was mentioned on the side at the recent Hot Chips). I remember AMD mentioning it directly as well, but here's a quote from some guy who doesn't seem to have a reason to make it up:
https://twitter.com/Daniel_Bowers/status/768270633125806081

So, it's quite recent. Fine.

In regard to Zen, see my reply to The Stilt's benchmark. It's probably GB3 being that bad a benchmark, rather than the clocks being wrong.

I saw, thanks.

cdimauro · Oct 3, 2016

Regarding the IPC dispute, here are also some statements from AMD's Papermaster in a recent interview; I think he is no less authoritative than Daniel Bowers:

" It's actually pretty much rolling up the sleeves and hardcore micro architecture engineering, it's the nuts and bolts of how you take an existing instruction set architecture x86 and deliver more performance every clock cycle. In fact 40% more performance every clock cycle is the target we've set out and in fact achieved. We did that through three areas, performance of the engine itself, the throughput to that engine and of course thirdly, doing that at very high energy efficiency which matters in every workload in which you're running across your CPU.
[...]
So FinFET on top of all the micro architectural improvements it got 40% improvement per clock FinFET is an excellent technology highly scalable and delivers significant energy efficiency gains."

He doesn't seem to have talked about single-threaded...

lolfail9001 · Oct 3, 2016

cdimauro said:
Regarding the IPC dispute, here are also some statements from AMD's Papermaster in a recent interview; I think he is no less authoritative than Daniel Bowers:

" It's actually pretty much rolling up the sleeves and hardcore micro architecture engineering, it's the nuts and bolts of how you take an existing instruction set architecture x86 and deliver more performance every clock cycle. In fact 40% more performance every clock cycle is the target we've set out and in fact achieved. We did that through three areas, performance of the engine itself, the throughput to that engine and of course thirdly, doing that at very high energy efficiency which matters in every workload in which you're running across your CPU.
[...]
So FinFET on top of all the micro architectural improvements it got 40% improvement per clock FinFET is an excellent technology highly scalable and delivers significant energy efficiency gains."
He doesn't seem to have talked about single-threaded...

But that's the problem. Here you can only say "seem", so they don't exactly contradict what Bowers says was claimed.

Also, notice how Papermaster distinguishes between "performance" and "throughput". It's pretty much a direct claim that the "performance" you underline stands for "single thread performance", because "throughput" has to stand for something else, right?

cdimauro · Oct 3, 2016

What about the 40% improvement "on top of all the micro architectural improvements"?

lolfail9001 · Oct 3, 2016

cdimauro said:
What about the 40% improvement "on top of all the micro architectural improvements"?

Considering that the process generally has a minor impact on actual architectural performance (in particular per clock), it kinda discredits the interview as a whole, if anything.

coercitiv · Oct 3, 2016

cdimauro said:
And what about the 40% improvement that AMD reported for Zen vs. XV? Is it ST or MT? Can you please ask AMD to split that absolute value according to YOUR definition?

cdimauro said:
What about the 40% improvement "on top of all the micro architectural improvements"?

AMD mentions 40% for ST IPC: first, because they would always communicate the higher number (if it were 40% for MT, they would communicate 50%+ for ST); second, because if it were otherwise, Zen would tie or beat Intel's HEDT in more traditional benchmarks such as Cinebench. So not only is there confusion in the way you used IPC in relation to the number of threads per core, you may also have chosen a rather unreliable path to interpret what little AMD claimed regarding Zen vs. XV.

jpiniero · Oct 3, 2016

deasd said:
Also a Geekbench result. Take with a grain of salt.

http://semiaccurate.com/forums/showpost.php?p=273864&postcount=3883

WCCFTech, so I vote fake. That would be a pretty good result, assuming the turbo is working, though.

DrMrLordX · Oct 3, 2016

AtenRa said:
Or for a cheaper overclockable quad-core CPU priced between the Kaby Lake Core i3 and Core i5. At a price of $180, and IF it can OC to 4-4.5GHz, it will be a really nice alternative to the locked Kaby Lake Core i3/i5.

Maybe, if the IPC is somewhere in the same ballpark as Haswell or Broadwell. Ivy Bridge? Ehhhhhh.

deasd said:
Also a Geekbench result. Take with a grain of salt.

Yeah, I'm not sure what to make of that either. I can't bring myself to believe it when I am already so outwardly skeptical of early GB4 benchmark leaks. And of course, all the records of those leaks have now been erased, so make of that what you will.

SarahKerrigan · Oct 3, 2016

cdimauro said:
Regarding the IPC dispute, here are also some statements from AMD's Papermaster in a recent interview; I think he is no less authoritative than Daniel Bowers:

" It's actually pretty much rolling up the sleeves and hardcore micro architecture engineering, it's the nuts and bolts of how you take an existing instruction set architecture x86 and deliver more performance every clock cycle. In fact 40% more performance every clock cycle is the target we've set out and in fact achieved. We did that through three areas, performance of the engine itself, the throughput to that engine and of course thirdly, doing that at very high energy efficiency which matters in every workload in which you're running across your CPU.
[...]
So FinFET on top of all the micro architectural improvements it got 40% improvement per clock FinFET is an excellent technology highly scalable and delivers significant energy efficiency gains."
He doesn't seem to have talked about single-threaded...

I was at HC28. Mr. Clark was quite clear that it was ST.

Questioner from Nvidia: You had 40% uplift on IPC. Did it include the dual-thread, or was that per thread?
Clark: That was just a one-thread number. We do have good throughput on SMT but we're not stating numbers on that right now.

For those who have access to HC presentation videos, the timecode is 1:27:46 on the session9 video.

inf64 · Oct 3, 2016

SarahKerrigan said:
I was at HC28. Mr. Clark was quite clear that it was ST.

Questioner from Nvidia: You had 40% uplift on IPC. Did it include the dual-thread, or was that per thread?
Clark: That was just a one-thread number. We do have good throughput on SMT but we're not stating numbers on that right now.

For those who have access to HC presentation videos, the timecode is 1:27:46 on the session9 video.

Thanks for the quotes from the HC presentation. I reckon they targeted 2x throughput vs. a Piledriver core (at the same clock), and this is easily achievable via a ~60% IPC improvement over PD plus a ~25% SMT gain in MT code: 1.6 x 1.25 = 2. Note that I didn't apply the CMT penalty adjustment, since PD loses 15-20% in total throughput when 2 threads compete for resources in a module (this would mean that a Zen core would have even >2x throughput vs. a PD module)...
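The throughput estimate in this post as arithmetic. All factors are the poster's assumptions (~60% IPC gain over Piledriver, ~25% SMT gain, 15-20% CMT penalty); applying the penalty per PD core in a fully loaded module is one reading of the post's adjustment, not something AMD stated.

```python
IPC_GAIN_VS_PD = 1.60  # assumed Zen IPC gain over Piledriver (PD)
SMT_GAIN = 1.25        # assumed extra throughput from SMT in MT code

# Same-clock throughput of one Zen core relative to one standalone PD core.
zen_core_vs_pd_core = IPC_GAIN_VS_PD * SMT_GAIN  # 2.0x


def vs_loaded_pd_core(cmt_penalty: float) -> float:
    """Zen core throughput relative to a PD core whose module partner is also busy."""
    return zen_core_vs_pd_core / (1.0 - cmt_penalty)


for penalty in (0.15, 0.20):
    print(f"CMT penalty {penalty:.0%}: Zen core = {vs_loaded_pd_core(penalty):.2f}x a loaded PD core")
```

Without the CMT adjustment the estimate is exactly the 2x target; with it, a Zen core comes out at roughly 2.35-2.5x a Piledriver core running in a fully loaded module.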
