AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Abwx · Aug 21, 2016

The Stilt said:
Of course it is over Excavator. But the thing is, in Blender Excavator is marginally faster than Piledriver. Just did the same test on FX-8800P: 6 minutes 57 seconds at 3.4GHz, which would mean ~6 minutes 45 seconds at 3.5GHz ((3400/3500) * 417), i.e. < 1% higher IPC than on PD. Most likely the smaller L2 cache and AVX2 are at it again.

Marginally faster, you said?

Besides, look at the impact of SMT in Intel's CPUs for Blender...

http://www.bozzabench.com/Tests/Tes...-Athlon-X4-845-3ds-Max-Cinema-4D-Blender.aspx
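The clock normalization in The Stilt's quoted FX-8800P figure can be checked with a few lines of Python. This is a minimal sketch that assumes run time scales as 1/clock for a fixed render (it ignores memory and uncore effects); the helper names are illustrative only.

```python
def normalize_time(seconds: float, measured_mhz: float, target_mhz: float) -> float:
    """Estimate run time at target_mhz, assuming time scales as 1/clock."""
    return seconds * measured_mhz / target_mhz


def ipc_gain(time_new: float, time_ref: float) -> float:
    """IPC gain of 'new' over 'ref' for the same work at the same clock."""
    return time_ref / time_new - 1.0


def fmt(seconds: float) -> str:
    """Format seconds as 'M min S s', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


# FX-8800P (Excavator) Blender run from the post: 6 min 57 s = 417 s at 3.4 GHz.
xv_at_3500 = normalize_time(417, 3400, 3500)
print(f"{xv_at_3500:.1f} s = {fmt(xv_at_3500)}")  # 405.1 s = 6 min 45 s
```

To turn two clock-matched times into a "< 1% higher IPC" style figure, they would be passed to `ipc_gain`; the Piledriver time itself is not given in this excerpt, so no comparison is computed here.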

inf64 · Aug 21, 2016

The Stilt said:
Of course it is over Excavator. But the thing is, in Blender Excavator is marginally faster than Piledriver. Just did the same test on FX-8800P: 6 minutes 57 seconds at 3.4GHz, which would mean ~6 minutes 45 seconds at 3.5GHz ((3400/3500) * 417), i.e. < 1% higher IPC than on PD. Most likely the smaller L2 cache and AVX2 are at it again.

Well, hopefully in Blender the IPC jump is going to be big, since Zen has no cache limitations, unlike the EX core.

Have you tested quad ch. vs. dual ch. on Haswell-E? If you get that done, you should post it as an article, because I couldn't find ONE review online that covers Haswell-E (or later) cores and the effect of memory BW on a range of desktop workloads. Maybe my Google skills are not good.

KTE · Aug 21, 2016

Abwx said:
Marginally faster, you said?

Besides, look at the impact of SMT in Intel's CPUs for Blender...

http://www.bozzabench.com/Tests/Tes...-Athlon-X4-845-3ds-Max-Cinema-4D-Blender.aspx

Which means it's a low IPC workload to begin with, hence greatly amplifying the gain with SMT enabled.

Sounds like a best-case SMT scenario.

I'll run The Stilt's scene soon on a few CPUs... I do have a Broadwell-U laptop too.

Sent from HTC 10
(Opinions are own)

KTE · Aug 21, 2016

The Stilt said:
Of course it is over Excavator. But the thing is, in Blender Excavator is marginally faster than Piledriver. Just did the same test on FX-8800P: 6 minutes 57 seconds at 3.4GHz, which would mean ~6 minutes 45 seconds at 3.5GHz ((3400/3500) * 417), i.e. < 1% higher IPC than on PD. Most likely the smaller L2 cache and AVX2 are at it again.

Why are you testing ST BTW?

AMD's demo was MT.

The Stilt · Aug 21, 2016

inf64 said:
Have you tested quad ch. vs. dual ch. on Haswell-E? If you get that done, you should post it as an article, because I couldn't find ONE review online that covers Haswell-E (or later) cores and the effect of memory BW on a range of desktop workloads. Maybe my Google skills are not good.

Going to dual from quad did absolutely nothing to the results. In fact, in both cases (ST & MT) dual was marginally faster. 16GB CL12 dual-rank modules running at 2133MHz.

The Stilt · Aug 21, 2016

KTE said:
Why are you testing ST BTW?

AMD's demo was MT.

Probably because there is no other way to properly compare CMT and SMT CPUs with each other? And that is according to the statement AMD made: "40% higher IPC, over Excavator core".

If they meant modules, then they have lied about the core count in their 15h CPUs.

TheELF · Aug 21, 2016

inf64 said:
When you think about it, Zen has 2x the FP and L/S resources per core Vs PD and since it has 2x more threads that comes to around same amount of resources per thread (with supposedly lower instruction latencies). Also it has SMT which adds another 20-30% if done right. Should be just about 130% faster than PD.

Just because hyperthreading is a form of SMT does not mean that every form of SMT is hyperthreading.
Yes, every ZEN core has 2x the FP and L/S resources of an EX core, but it remains to be seen how AMD's SMT will work. If it just cuts the core into two equal threads, then one ZEN core will be the equivalent of an EX module, minus the module penalty and plus some other improvements.

The single-core speed (when running a single thread) will stay pretty much the same, while the ZEN core (module) running two threads will have a ~40% improvement.

(presto chango 40% IPC improvement per "core" (as long as you run multiple threads) )

The Stilt · Aug 21, 2016

One more observation regarding Blender. The SMT yield in Blender appears to be unusually high. In similar applications, such as Cinebench the yield is around 27% on Haswell-E. In Blender the yield is > 59%. Blender BMW benchmark (at default resolution, 20x20 tiles) was completed in 127.98 seconds with 18C/18T while with SMT enabled the time was reduced to 90.07 seconds.
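An "SMT yield" can be stated either as a throughput gain (t_off / t_on - 1) or as a time saving (1 - t_on / t_off), and the two are easy to mix up. A minimal sketch using only the two BMW times quoted in this post; these two times on their own give roughly a 42% throughput gain (about 30% less wall time), so by themselves they do not reproduce the > 59% figure. The function names are illustrative.

```python
def smt_throughput_gain(time_smt_off: float, time_smt_on: float) -> float:
    """Extra work per unit time with SMT enabled, for a fixed workload."""
    return time_smt_off / time_smt_on - 1.0


def time_reduction(time_smt_off: float, time_smt_on: float) -> float:
    """Fraction of wall-clock time saved by enabling SMT."""
    return 1.0 - time_smt_on / time_smt_off


# Blender BMW on 18-core Haswell-E, times from the post (seconds).
off, on = 127.98, 90.07  # 18C/18T vs. SMT enabled
print(f"throughput gain: {smt_throughput_gain(off, on):.1%}")  # throughput gain: 42.1%
print(f"time reduction:  {time_reduction(off, on):.1%}")       # time reduction:  29.6%
```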

dark zero · Aug 21, 2016

Maybe AMD's SMT is at Ivy Bridge levels... We need to wait some more time to see real scores from AMD.

blublub · Aug 21, 2016

dark zero said:
Maybe AMD's SMT is at Ivy Bridge levels... We need to wait some more time to see real scores from AMD.

Would that be bad? I mean, obviously it works pretty OK.

KTE · Aug 21, 2016

The Stilt said:
One more observation regarding Blender. The SMT yield in Blender appears to be unusually high. In similar applications, such as Cinebench the yield is around 27% on Haswell-E. In Blender the yield is > 59%. Blender BMW benchmark (at default resolution, 20x20 tiles) was completed in 127.98 seconds with 18C/18T while with SMT enabled the time was reduced to 90.07 seconds.

That's why I asked: why test ST when AMD's showing was MT?

How will you compare when AMD has claimed "IPC per core" but only tested in a highly SMT favoring MT benchmark?

ST testing accounts for CMT/SMT only when we have an idea how much IPC has changed from EXC, with and without SMT, per core in the first place, let alone how SMT performance scales across multiple cores.

We're still left stabbing in the dark.

A core can do well in ST, in comparison to the average, or well in MT in comparison to the average. Both throw off the scaling comparisons.

Even SMT scaling is completely unknown with >1Core at play (more resource utilization).

DrMrLordX · Aug 21, 2016

The Stilt said:
Probably because there is no other way to properly compare CMT and SMT CPUs with each other?

That seems kind of backwards. SMT and CMT are so radically different that you're seeing entirely different faces of each CPU running only one thread on them (shared L3 notwithstanding). The only way to compare them "fairly" is to run workloads that scale perfectly (or near-perfectly) with more cores and max out their respective thread counts.

As to why AMD might have shown a (can we not do tildes anymore? What is this?) approximate 130% gain in IPC from Vishera/Piledriver in their Summit Ridge benchmark, well, Blender does make use of AVX2 . . . and even though XV's implementation of AVX2 is quite poor, I have observed (through programs like y-cruncher) that XV is already leaps and bounds ahead of PD on a per-module basis in SIMD-heavy code.

FlanK3r · Aug 21, 2016

Excavator core is really the best AMD core ever, with stronger IPC than Phenom II (clock for clock). I did a review of the 845, and it can fight with the Athlon 880K (only games usually run better on the 880K). AnandTech has some proof of this too:
http://www.anandtech.com/show/10436/amd-carrizo-tested-generational-deep-dive-athlon-x4-845/2

VirtualLarry · Aug 21, 2016

~~~~~~

Dresdenboy · Aug 21, 2016

TheELF said:
Yes, every ZEN core has 2x the FP and L/S resources of an EX core, but it remains to be seen how AMD's SMT will work. If it just cuts the core into two equal threads, then one ZEN core will be the equivalent of an EX module, minus the module penalty and plus some other improvements.

The single-core speed (when running a single thread) will stay pretty much the same, while the ZEN core (module) running two threads will have a ~40% improvement.

(presto chango 40% IPC improvement per "core" (as long as you run multiple threads) )

A Zen core has about 1.3x the FP resources of a XV core with ST code. L/S resources are roughly the same, but likely more efficient.

AMD's SMT (a first try can be seen in the FPUs since BD) doesn't work via back-end resource partitioning. And your example with single-core vs. SMT IPC improvement backfires, as such good scaling wouldn't be possible with hardware resources that don't already increase ST performance significantly. If there are not enough FUs (say 2 ALUs + 2 AGUs), SMT scaling would be <10%, because SMT is not just about better utilization. There are also a lot of shared resources. This is also supported by their statement:

more execution resources benefit both modes

If some discussion in the heat of arguments somehow gets disconnected from known statements, it's time to look at them again.

What I see is an IPC improvement of a "Zen core" over an "Excavator core" on the original slide, plus a footnote:

3. Based on internal AMD estimates for “Zen” x86 CPU core compared to “Excavator” x86 CPU core.

Nothing else. And from what I have seen so far, 40% only with SMT makes no sense. If this is not the case, then I'm open to sound explanations of why reality differs from what I'm seeing.

The Stilt said:
One more observation regarding Blender. The SMT yield in Blender appears to be unusually high. In similar applications, such as Cinebench the yield is around 27% on Haswell-E. In Blender the yield is > 59%. Blender BMW benchmark (at default resolution, 20x20 tiles) was completed in 127.98 seconds with 18C/18T while with SMT enabled the time was reduced to 90.07 seconds.

This could only be properly analyzed at the code or assembly level, plus some IBS data. If CB's working sets are bigger than Blender's, the latter might just fit the data of 2 threads better into the caches. And if CB code extracts more ILP from one thread, there is less room to fill the gaps with a 2nd thread's FP ops.

DrMrLordX said:
As to why AMD might have shown a (can we not do tildes anymore? What is this?) approximate 130% gain in IPC from Vishera/Piledriver in their Summit Ridge benchmark, well, Blender does make use of AVX2 . . . and even though XV's implementation of AVX2 is quite poor, I have observed (through programs like y-cruncher) that XV is already leaps and bounds ahead of PD on a per-module basis in SIMD-heavy code.

A tilde is a legit symbol. And it saves space, as they actually should be used more often in speculative threads.

A tilde is also used to indicate "approximately equal to" (e.g. 1.902 ~= 2).

https://en.wikipedia.org/wiki/Tilde#As_a_relational_operator

TheELF · Aug 21, 2016

Dresdenboy said:
And your example with single-core vs. SMT IPC improvement backfires, as such good scaling wouldn't be possible with hardware resources that don't already increase ST performance significantly.

Yes, if SMT is adaptive, then ST performance will be significantly improved... for any thread actually able to use all 4 integer ALUs and all FPU units.
Outside of distributed computing do you know any thread that would be able to pull this off?
But I guess if they get some reviews with good CB ST scores, that's enough for them to keep selling CPUs, since more than enough people will, based on that, think that the CPU will run anything fast.
Just look at how many people think that FXs play games fast because they get good CB MT scores.

jihe · Aug 21, 2016

itsmydamnation said:
Everyone ignores that a server guy said it; on anything greater than 1P, AMD still ruled the roost until Nehalem.

But again I ask the doubters: where is the performance deficit coming from, which areas specifically? I would never try to guess IPC with the known information, but you guys have no problem doing so. But there is nothing in all the known details (we know quite a bit from LinkedIn, the die shot, compiler patches and what AMD released a few days ago) that says IPC can't be high.

More execution width
Lower latency cache
More associative, larger per-int-core L1I
More cache bandwidth
Larger PRFs
Greater issue width (u-op cache)

I can keep naming things that are all headed in the direction of improved IPC. So start naming things other than "derp derp AMD" that look like performance limiters..............

Yes, I remain cautiously optimistic about a competitive Zen. Sandy Bridge level of performance is all I ask for.

guskline · Aug 22, 2016

jihe, you have a realistic outlook.

DrMrLordX · Aug 22, 2016

Dresdenboy said:
A tilde is a legit symbol. And it saves space, as they actually should be used more often in speculative threads.

A bit off-topic, but at the time, attempting to use a tilde produced a dash instead (~~~~~ see it's still doing it) so . . . must be a bug/snafu in there somewhere. Anyway I think VirtualLarry caught on to it.

So back on topic. I think people are beginning to see that something better than Sandy Bridge is possible from Zen. Really, there need to be some more benchmarks. It would be nice if some strategically-leaked ES chips popped up for our edification, but then this isn't late 2005/early 2006 when Conroe ES chips were floating around . . .

superstition · Aug 22, 2016

jihe said:
Yes, I remain cautiously optimistic about a competitive Zen. Sandy Bridge level of performance is all I ask for.

It seems reasonable to assume that if Sandy were the mere target then Keller and company would have been able to meet that with a construction derivative instead of a new design. I think it's safe to assume, even before seeing benchmarks, that AMD planned to release a more competitive architecture than that.

Sweepr · Aug 23, 2016

Unpacking AMD's Zen Benchmark: Is Zen actually 2% Faster than Broadwell?

There is a reason for AMD only showing this benchmark - it's either a best case scenario, or they are pitching their expectations exactly where they want people to think. By using a custom workload on open source software, the result is very specific and cannot be extrapolated in any meaningful way. This is why a typical benchmark suite offers 10-20 tests with different workloads, and even enterprise standard workloads like SPEC come with over a dozen tests in play, to cater for single thread or multi-thread or large cache or memory or pixel pushing bottleneck that may occur. Single benchmarks on their own are very limited in scope as a result.

1) The Results Are Not Externally Verifiable At This Time, As Expected
2) No Memory or TDP Numbers Were Provided
3) Blender Is an Open Source Platform
4) Did It Actually Measure IPC? (The Philosophical Debate)
5) The Workload Is Custom
6) It Is Only One Benchmark
7) There's Plenty about the Microarchitecture and Chip We Don't Know Yet, e.g. Uncore
8) Clock Speeds Are Not Final, Efficiency Not Known
9) We Will Have to Wait to Test

www.anandtech.com/show/10585/unpacking-amds-zen-benchmark-is-zen-actually-2-faster-than-broadwell

Abwx · Aug 23, 2016

Sweepr said:
Unpacking AMD's Zen Benchmark: Is Zen actually 2% Faster than Broadwell?

www.anandtech.com/show/10585/unpacking-amds-zen-benchmark-is-zen-actually-2-faster-than-broadwell

"Nice" article..

I read this in this paper :

The test was to render a mockup of a Zen based desktop CPU, with an effective workload of 50 seconds for these chips.

Did he forget to reset the speed of the video..?
Because it looks to last 20s rather than 50s; the rendering seems to start at 1:27 in the link below (I put the cursor at 1:25).

AtenRa · Aug 23, 2016

I think the rendering video is at 4x speed.

And as I have said before, this demo was about MT throughput (IPC + SMT + multi-core scaling).

raghu78 · Aug 23, 2016

AMD have overhyped and underdelivered too many times in the past, so I am wary of any AMD benchmark in a controlled scenario. The final word on Zen will come at launch, when there are independent third-party reviews. We have to wait and see what clocks/TDP AMD hit for the 8c/16t Summit Ridge and the 32c/64t Naples. Right now I am cautiously optimistic and hoping that AMD do not repeat the mistakes of the past.

inf64 · Aug 23, 2016

raghu78 said:
AMD have overhyped and underdelivered too many times in the past, so I am wary of any AMD benchmark in a controlled scenario. The final word on Zen will come at launch, when there are independent third-party reviews. We have to wait and see what clocks/TDP AMD hit for the 8c/16t Summit Ridge and the 32c/64t Naples. Right now I am cautiously optimistic and hoping that AMD do not repeat the mistakes of the past.

From a high-level view, and based on what the Hot Chips slides revealed, Zen is somewhere in between SB and Skylake when it comes to core resources (closer to Skylake in integer and closer to SB/IB in FP). I bet the IPC will end up right in the middle of these two, and from AT we know the gap is around 25%. So if it were to end up in the middle of that range or a bit lower (around 10% faster than SB), it would be: 1) just a tiny bit faster than IB (~4%); 2) 7% slower than HSWL; 3) 10% slower than BDWL; 4) ~14% slower than Skylake (funnily enough, the int/FP schedulers in Zen are ~15-20% smaller than Skylake's and right at Haswell level).
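The Skylake percentage in this estimate depends on which chip is taken as the base. A minimal sketch, taking the post's SB = 1.00 baseline, the ~25% SB-to-Skylake gap, and its hypothetical ~1.10 Zen estimate (the IB/HSW/BDW ratios are not given in the post, so they are left out): with Zen as the base, Skylake comes out ~14% faster, which matches the post's ~14% figure, while with Skylake as the base, Zen is ~12% slower.

```python
SB = 1.00         # Sandy Bridge IPC baseline
SKYLAKE = 1.25    # ~25% above SB, the gap cited from AT in the post
ZEN_GUESS = 1.10  # hypothetical Zen estimate: ~10% above SB, as in the post


def pct_slower(a: float, b: float) -> float:
    """How much slower a is than b, as a fraction of b."""
    return 1.0 - a / b


def pct_faster(a: float, b: float) -> float:
    """How much faster a is than b, as a fraction of b."""
    return a / b - 1.0


print(f"Zen vs SB:           +{pct_faster(ZEN_GUESS, SB):.1%}")      # +10.0%
print(f"Zen slower than SKL: {pct_slower(ZEN_GUESS, SKYLAKE):.1%}")  # 12.0%
print(f"SKL faster than Zen: {pct_faster(SKYLAKE, ZEN_GUESS):.1%}")  # 13.6%
```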
