AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Abwx · Oct 2, 2016

Arachnotronic said:
I think Phynaz meant that AMD should do what you are saying -- send a Zen chip to a popular website and let them go to town showing how it curb stomps Broadwell-E

They will, once it's launched. Or did any manufacturer of anything ever send an ES to a site before the product was launched? Tell us when this has already happened; I'm very interested to see whether any other firm ever bowed to such unrealistic and irrational demands...

cdimauro said:
The question is quite simple: you cannot use all available ports all the time, even with such kind of favorable code.

Not sure that your statement isn't self-contradictory..

In Blender it is obvious that HW, for instance, doesn't execute 2 FP MUL or 1 FP MUL + 1 FP ADD for a single thread each cycle; otherwise there wouldn't be enough resources left to gain 50-60% when pushing a second thread onto the same core. The ratio suggests that the code has dependencies such that only 1.3 FP ops per thread per cycle are executed.

This explains both HW's huge SMT gain and Zen's inability to do much better than BDW despite a more adequate FPU.

If the first thread does 1.3 FP ops each cycle, then Zen could theoretically execute 2.6 FP ops/cycle when using SMT, that is 30% more than Haswell. But of course this would require that the distribution of ops in the code is such that a unit comprising 2 FP MUL + 2 FP ADD could provide 30% more ops/cycle than a unit comprising 2 FP MUL or 1 FP MUL + 1 FP ADD, which is of course unlikely. For instance, if the ops are mainly FP MULs with few FP ADDs, then the two cores will yield about the same throughput in both ST and SMT..
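The arithmetic in the paragraph above can be laid out as a rough sketch. The 1.3 FP ops per thread per cycle and the 50-60% SMT gain are the post's own estimates; the function names and the 55% midpoint are assumptions made only for illustration.

```python
# Back-of-envelope check of the FP throughput argument above.
# All inputs are estimates from the post, not measurements.

def smt_throughput(ops_per_thread: float, smt_gain: float) -> float:
    """FP ops/cycle for one core running two threads, given a fractional SMT gain."""
    return ops_per_thread * (1.0 + smt_gain)


def ideal_two_thread_throughput(ops_per_thread: float) -> float:
    """Upper bound if the second thread ran at the same rate as the first."""
    return 2.0 * ops_per_thread


OPS_PER_THREAD = 1.3      # estimate from the post
HASWELL_SMT_GAIN = 0.55   # assumed midpoint of the quoted 50-60%

haswell = smt_throughput(OPS_PER_THREAD, HASWELL_SMT_GAIN)
zen_ideal = ideal_two_thread_throughput(OPS_PER_THREAD)

print(f"Haswell with SMT:    {haswell:.2f} FP ops/cycle")
print(f"Zen ideal with SMT:  {zen_ideal:.2f} FP ops/cycle")
print(f"Zen ideal advantage: {zen_ideal / haswell - 1.0:.0%}")
```

With the assumed midpoint the ideal advantage comes out close to the 30% quoted, and it only holds if the second thread is never limited by the MUL/ADD mix.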

cdimauro · Oct 2, 2016

I think you've forgotten that even FP-intensive code has to execute scalar/integer instructions, as well as perform loads/stores from/to memory.

In the case of Zen, it was explained by another guy here that an FP instruction which accesses memory needs 2 uops, one that goes to the L/S unit, and another one to one FPU port (when it has the value coming from memory). So, it keeps busy two different ports.

And you have the limit of 6 uops dispatched per cycle by the Micro-op cache, even with 10 total ports available each cycle...

Abwx · Oct 2, 2016

cdimauro said:
In the case of Zen, it was explained by another guy here that an FP instruction which accesses memory needs 2 uops, one that goes to the L/S unit, and another one to one FPU port (when it has the value coming from memory). So, it keeps busy two different ports.

And you have the limit of 6 uops dispatched per cycle by the Micro-op cache, even with 10 total ports available each cycle...

You are assuming that every dispatched block of 6 uops requires a single cycle. That's not the case: the 6 uops dispatched per cycle will sometimes need more than one cycle to be completed by the execution units. If an extra cycle is required at some point, then there will be up to 12 uops in the scheduler entries (dispatched during the normal and the extra cycle) ready to be sent to the execution units in the next cycle. So each time an extra cycle is needed there will be more than 6 uops scheduled once the extra cycle is over, hence the 10 parallel ports.
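A toy model may help separate the two claims here. It is only a sketch under loud assumptions: the front end always dispatches exactly 6 uops, a stalled cycle issues nothing, any queued uop can go to any port, and the scheduler queue is unbounded. None of this reflects the real Zen scheduler; the names are hypothetical.

```python
# Toy model: a front end dispatching at most DISPATCH_WIDTH uops per cycle into
# a scheduler queue, and a back end issuing at most ISSUE_PORTS queued uops per
# cycle unless that cycle is stalled.

DISPATCH_WIDTH = 6   # uops dispatched per cycle (figure from the thread)
ISSUE_PORTS = 10     # execution ports (figure from the thread)


def simulate(stall_cycles: set, total_cycles: int) -> list:
    """Return the number of uops issued in each cycle."""
    queued = 0
    issued_per_cycle = []
    for cycle in range(total_cycles):
        queued += DISPATCH_WIDTH              # front end keeps dispatching
        if cycle in stall_cycles:
            issued = 0                        # assumed stall: nothing issues
        else:
            issued = min(queued, ISSUE_PORTS)
        queued -= issued
        issued_per_cycle.append(issued)
    return issued_per_cycle


issued = simulate(stall_cycles={2}, total_cycles=6)
print(issued)                      # per-cycle issue counts
print(sum(issued) / len(issued))   # average issue rate over the run
```

In this model the cycle after the stall issues more than 6 uops, which is the point about the 10 ports, while the average over the run cannot exceed the 6 uops/cycle dispatch limit, which is the point about not using all ports all the time.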

cdimauro · Oct 2, 2016

Of course: I haven't said anything different. In fact, let me quote part of my previous post:

"you cannot use all available ports all the time"

EDIT. BTW, here is one of the most heavily stressed pieces of code in Blender: https://developer.blender.org/diffu.../blender/render/intern/raytrace/rayobject.cpp

Which proves what I've stated before.

KTE · Oct 2, 2016

The Stilt said:
Someone (who is interested enough) recalculate those Zeppelin GB ST figures with assumption that the CPU worked at 1.0GHz, instead of 1.45GHz? 1.0GHz is a valid frequency state for the SKU used in that leak and the lowest (plausible, non PG) of the available states the CPU could have operated at.

If it still doesn't put it within ~20% of the IPC of Intel wells and lakes, then I'd say it is doing a DAR while running the benchmark. If not, AMD's lies have reached a completely new level.

My IVB 3667U at 1.1GHz scores the same as that Zen in ST.

Looking at past frequency power modes, if I was a betting man, that to me is pretty certainly a test of Zen at 1GHz...

No chance I can see IVB having a >30% per clock lead over Zen, as that test is showing. Not even by AMD's poor pre-launch standards since 2006.

That would place Zen with IPC between SNB and HSW here.

jpiniero said:
Well, there's one result of a 2961Y (Haswell, 1.1 GHz) which gets 1258 ST in GB4. A couple of results for the 847 (Sandy Bridge, 1.1 GHz), for which the median is roughly 1122 once you throw out the really lowball scores. Ignoring AES, it's pretty competitive with that Zen Server result. So it's plausible that the ZS is running at only 1 GHz.
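The per-clock comparison behind the quoted figures can be sketched as follows. The scores and clocks are the ones quoted in this thread (1258 and 1122 at 1.1 GHz, and the 1.45 GHz versus 1.0 GHz question for the Zen sample); the function names are hypothetical, and no Zen score is assumed.

```python
# Per-clock normalization of the quoted Geekbench single-thread scores.
# Nothing here is measured; the inputs are the figures from the quoted posts.

def score_per_ghz(score: float, clock_ghz: float) -> float:
    """Normalize a score by the clock it is assumed to have run at."""
    return score / clock_ghz


def rescale_factor(reported_clock_ghz: float, assumed_clock_ghz: float) -> float:
    """Factor by which the per-clock result changes if the chip actually ran at
    assumed_clock_ghz rather than reported_clock_ghz."""
    return reported_clock_ghz / assumed_clock_ghz


haswell_2961y = score_per_ghz(1258, 1.1)
sandy_847 = score_per_ghz(1122, 1.1)
zen_factor = rescale_factor(1.45, 1.0)

print(f"Haswell 2961Y:    {haswell_2961y:.0f} points/GHz")
print(f"Sandy Bridge 847: {sandy_847:.0f} points/GHz")
print(f"Zen per-clock result at 1.0 GHz vs 1.45 GHz: x{zen_factor:.2f}")
```

Whatever the Zen score is, assuming 1.0 GHz instead of 1.45 GHz raises its per-clock figure by 45%, which is why the clock assumption dominates the whole comparison.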

In these recent CPUs, turbo boost and power states can confuse everything so such benchmark analysis is generally incorrect. Example,

My IVB 17W is supposed to be at 2GHz stock, but turbos to 3.2/3.0GHz stock. It stays there in MT. That's 50% higher clocks in MT.

Then, if I turn off TB, it still sticks at 2.5GHz in ST/MT load.

Sent from HTC 10
(Opinions are own)

cdimauro · Oct 2, 2016

Nice to talk about the Turbo. AFAIK, it was disabled in the Blender test, and in this case we cannot even take this result as a "good" one for performance measurement.

jpiniero · Oct 2, 2016

The problem with the 1 GHz idea is that the MT score is so low that even if you gave it 40% more it would still be terrible given 64 cores. I still think it's the MCM.

KTE said:
In these recent CPUs, turbo boost and power states can confuse everything so such benchmark analysis is generally incorrect. Example,

I was thinking the same thing, hence why I chose chips without any turbo. I'm sure the craptops still throttle though.

The Stilt · Oct 2, 2016

KTE said:
That would place Zen with IPC between SNB and HSW here.

And that's exactly where the "40% IPC improvement over Excavator" should put Zen, given that the statement itself is accurate.

Arachnotronic · Oct 2, 2016

The Stilt said:
And that's exactly where the "40% IPC improvement over Excavator" should put Zen, given that the statement itself is accurate.

So I did some digging and apparently AMD promised a 20% IPC boost in Steamroller over Piledriver. Was this actually accurate? From what I can tell, this wasn't the case across the board at all. Not even close.

We are working under the assumption that AMD's 40% number is actually legit, but they have fluffed up the numbers before...

The Stilt · Oct 2, 2016

Arachnotronic said:
So I did some digging and apparently AMD promised a 20% IPC boost in Steamroller over Piledriver. Was this actually accurate? From what I can tell, this wasn't the case across the board at all. Not even close.

We are working under the assumption that AMD's 40% number is actually legit, but they have fluffed up the numbers before...

In the past AMD has also provided average figures for the IPC improvements. PD to SR was 10% on average and SR to XV was 5%.

Abwx · Oct 2, 2016

jpiniero said:
The problem with the 1 GHz idea is that the MT score is so low that even if you gave it 40% more it would still be terrible given 64 cores. I still think it's the MCM.

How do you know that it's 64C and not 32C/64T?

Because there's no mention of the number of threads in the submission, only 2 CPUs/64C. That could be 64C without SMT, but then I wouldn't see the purpose of testing a chip that is not fully functional, or of switching SMT off..

Scaling gets up to 40+ or so in some tests. Under the conditions mentioned, this would suggest about 25% SMT gain if the platform is 2 x 16C/32T = 32C/64T and the software scales at 100%; if that's not the case, the SMT gain should be increased by the inverse of the software scaling ratio.
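The SMT estimate above reduces to one formula, sketched here. The 40x scaling and the 32 physical cores are the post's figures; the function name and the 90% efficiency case are assumptions added only to show the correction for imperfect software scaling.

```python
# Implied SMT gain from an observed multi-thread scaling factor.

def implied_smt_gain(observed_scaling: float,
                     physical_cores: int,
                     software_efficiency: float = 1.0) -> float:
    """Fractional SMT gain implied by MT/ST scaling on a given core count.

    software_efficiency is the fraction of ideal per-core scaling the
    benchmark reaches (1.0 means it scales perfectly with core count).
    """
    return observed_scaling / (physical_cores * software_efficiency) - 1.0


perfect = implied_smt_gain(40, 32)          # software scales at 100%
imperfect = implied_smt_gain(40, 32, 0.9)   # assumed 90% scaling, illustration only

print(f"SMT gain at 100% software scaling: {perfect:.0%}")
print(f"SMT gain at 90% software scaling:  {imperfect:.1%}")
```

Dividing by the efficiency is the "inverse of the software scaling ratio" correction: the worse the benchmark scales across cores, the larger the SMT gain needed to explain the same 40x.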

Besides, this Zen platform apparently has much lower RAM bandwidth than the Bristol Ridge ES; assuming the latter uses 2 x 2400MHz, the Zen platform is running 1600MHz RAM on two channels.

http://browser.primatelabs.com/geekbench3/compare/6032158?baseline=8076878
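For the memory comparison above, theoretical peak bandwidth can be estimated as channels x transfer rate x bus width. This sketch assumes the quoted "MHz" figures are effective data rates in MT/s and that each channel is 64 bits (8 bytes) wide; both are assumptions, and measured Geekbench bandwidth will be lower than these peaks.

```python
# Theoretical peak DRAM bandwidth for the two configurations mentioned above.

def peak_bandwidth_gbs(channels: int,
                       mega_transfers_per_s: float,
                       bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s (1 GB = 1000 MB), assuming 8-byte channels."""
    return channels * mega_transfers_per_s * bus_bytes / 1000.0


bristol_ridge = peak_bandwidth_gbs(2, 2400)   # assumed dual-channel 2400
zen_platform = peak_bandwidth_gbs(2, 1600)    # inferred dual-channel 1600

print(f"Bristol Ridge ES: {bristol_ridge:.1f} GB/s peak")
print(f"Zen platform:     {zen_platform:.1f} GB/s peak")
print(f"Ratio:            {zen_platform / bristol_ridge:.2f}")
```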

Glo. · Oct 2, 2016

KTE said:
My IVB 3667U at 1.1GHz scores the same as that Zen in ST.

Looking at past frequency power modes, if I was a betting man, that to me is pretty certainly a test of Zen at 1GHz...

No chance I can see IVB having a >30% per clock lead over Zen, as that test is showing. Not even by AMD's poor pre-launch standards since 2006.

That would place Zen with IPC between SNB and HSW here.

In these recent CPUs, turbo boost and power states can confuse everything so such benchmark analysis is generally incorrect. Example,

My IVB 17W is supposed to be at 2GHz stock, but turbos to 3.2/3.0GHz stock. It stays there in MT. That's 50% higher clocks in MT.

Then, if I turn off TB, it still sticks at 2.5GHz in ST/MT load.

Sent from HTC 10
(Opinions are own)

Between Sandy Bridge and Haswell is... Ivy Bridge Architecture.

That is actually not bad at all. The last bit unknown would be the core clocks of the CPUs.

mikk · Oct 2, 2016

Glo. said:
Between Sandy Bridge and Haswell is... Ivy Bridge Architecture.

That is actually not bad at all. The last bit unknown would be the core clocks of the CPUs.

This would be extremely poor. Did you read properly? A much lower clocked Ivy Bridge with half the L3 Cache as fast as Zen @ ST. This is so poor that something can't be right.

Glo. · Oct 2, 2016

mikk said:
This would be extremely poor. Did you read properly? A much lower clocked Ivy Bridge with half the L3 Cache as fast as Zen @ ST. This is so poor that something can't be right.

I was talking about IPC. If it really is on the level of Ivy Bridge then it is pretty good. My mid-2012 2.3 GHz MacBook Pro had an Ivy Bridge CPU and it scored around 3000 pts in GB.

This is very important for any APU.

lolfail9001 · Oct 2, 2016

Abwx said:
How do you know that it's 64C and not 32C/64T?

Because there's no mention of the number of threads in the submission, only 2 CPUs/64C. That could be 64C without SMT, but then I wouldn't see the purpose of testing a chip that is not fully functional, or of switching SMT off..

Scaling gets up to 40+ or so in some tests. Under the conditions mentioned, this would suggest about 25% SMT gain if the platform is 2 x 16C/32T = 32C/64T and the software scales at 100%; if that's not the case, the SMT gain should be increased by the inverse of the software scaling ratio.

Besides, this Zen platform apparently has much lower RAM bandwidth than the Bristol Ridge ES; assuming the latter uses 2 x 2400MHz, the Zen platform is running 1600MHz RAM on two channels.

http://browser.primatelabs.com/geekbench3/compare/6032158?baseline=8076878

From the fact that it lists the caches with a 32 multiplier.

The Stilt · Oct 2, 2016

mikk said:
This would be extremely poor. Did you read properly? A much lower clocked Ivy Bridge with half the L3 Cache as fast as Zen @ ST. This is so poor that something can't be right.

Have you even looked at what kind of IPC Excavator has? Sandy / Ivy Bridge kind of IPC on average is exactly what you should get when you increase it by 40%.

inf64 · Oct 2, 2016

I think it is paramount for AMD to have an IPC halfway between IB and Haswell (~5% faster than IB and ~5% slower than Haswell).

If we look at AT generational IPC comparison we can see that this would put Zen at ~10-12% lower IPC than Skylake (in non AVX256/FMA code). They would also need to bump the base and turbo to around 3.2/3.3 and 3.7GHz in order to be competitive with 8C Broadwell SKUs. Not an impossible scenario but a lot of things have to come together for all that to happen.

mikk · Oct 2, 2016

Glo. said:
I was talking about IPC. If it really is on the level of Ivy Bridge then it is pretty good.

You have to read properly. You said between Sandy and Haswell, which is not the case when Ivy Bridge @ 1.1 GHz = Zen @ 1.44 GHz.

And that is an Ivy Bridge with 4 MB L3.

lolfail9001 · Oct 2, 2016

mikk said:
You have to read properly. You said between Sandy and Haswell, which is not the case when Ivy Bridge @ 1.1 GHz = Zen @ 1.44 GHz.

And that is an Ivy Bridge with 4 MB L3.

We are making a brave assumption that that sample could be running at 1GHz.

Arachnotronic · Oct 2, 2016

I really wish AMD would give this Summit Ridge chip to a reviewer and get a preview out there.

Sweepr · Oct 2, 2016

inf64 said:
If we look at AT generational IPC comparison we can see that this would put Zen at ~10-12% lower IPC than Skylake (in non AVX256/FMA code).

Except you picked the absolute worst case scenario for Skylake IPC gain out of all reviews. According to Hardware.fr it's 18.25% faster than IB per clock in applications and 21.15% in games. Based on PCLab it's 18.8%/22.7% faster in applications/games.

bjt2 · Oct 2, 2016

cdimauro said:
So, do you plan to see Zen running better with ST code, since with MT it doesn't seem to shine?

No, the opposite... Since it seems to do well in MT (Blender test), and since the SMT gain should be greater than Intel's due to many more ports, we can deduce that ST should be inferior to Intel's, because in MT, with probably superior SMT, AMD is about on par...

EDIT: I am talking about IPC only... If AMD, for instance, can clock higher in ST due to lower FO4, AMD can be on par with or even beat Intel even in ST...

inf64 · Oct 2, 2016

If you calculate the difference from the AT review you will find out that Skylake is 18% faster (1.112 x 1.033 x 1.027 ~= 1.18) than IB, thus confirming both the Hardware.fr and PCLab findings for the app IPC boost. Pairing Haswell with super fast DDR3 RAM would basically match the boost Skylake has from super fast DDR4 in games.
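The compounding in the post above can be checked with a short sketch. The three per-generation factors are the ones quoted from the AT review; treating them as the steps from IB up to Skylake, and placing Zen at about 5% above IB as suggested earlier in the thread, are assumptions taken from the discussion rather than measurements.

```python
# Compound the per-generation IPC factors quoted above and compare against a
# Zen placed "~5% faster than IB" as suggested earlier in the thread.
from math import prod

GENERATION_STEPS = [1.112, 1.033, 1.027]   # factors quoted from the AT review
ZEN_VS_IB = 1.05                           # assumption from the thread

skylake_vs_ib = prod(GENERATION_STEPS)
zen_deficit_vs_skylake = 1.0 - ZEN_VS_IB / skylake_vs_ib

print(f"Skylake vs IB:          x{skylake_vs_ib:.3f}")
print(f"Zen deficit vs Skylake: {zen_deficit_vs_skylake:.1%}")
```

The product is about 1.18, not the sum of the three percentages, and a Zen at 1.05x IB then lands roughly 11% below Skylake, inside the ~10-12% range given earlier.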

Sweepr · Oct 2, 2016

inf64 said:
Pairing Haswell with super fast DDR3 RAM would basically match the boost Skylake has from super fast DDR4 in games.

Skylake with the exact same DDR3 kit is still 16.8/16.5% faster than IB @ applications/games in Hardware.fr. Regarding PCLab, they paired the DDR3 systems with a very capable 2133 9-9-10-24 1N kit, and chose DDR4-2666 16-17-17-36 2N for Skylake (a far cry from 'super fast DDR4'), so I don't think there's any significant advantage here (if at all) - your 10-12% number is off. That is, if Zen can match Ivy Bridge IPC anyway.

cdimauro · Oct 2, 2016

bjt2 said:
No, the opposite... Since it seems to do well in MT (Blender test), and since the SMT gain should be greater than Intel's due to many more ports, we can deduce that ST should be inferior to Intel's, because in MT, with probably superior SMT, AMD is about on par...

EDIT: I am talking about IPC only... If AMD, for instance, can clock higher in ST due to lower FO4, AMD can be on par with or even beat Intel even in ST...

Having more ports doesn't mean that you can have better performance, and the Blender test clearly shows that (with Zen also having double the FP computing capability): 2% more is a negligible result.

P.S. Talking of IPC, as well.
