AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Status
Not open for further replies.

Abwx

Lifer
Apr 2, 2011
Of course it is over Excavator. But the thing is, in Blender Excavator is marginally faster than Piledriver. Just did the same test on FX-8800P. 6 minutes 57 seconds at 3.4GHz, which would mean ~ 6 minutes 45 seconds at 3.5GHz ((3400/3500) * 417), i.e < 1% higher IPC than on PD. Most likely the smaller L2 cache and AVX2 at it again.
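
The clock normalization above (417 s at 3.4 GHz scaled to 3.5 GHz) can be reproduced with a short Python sketch. It assumes render time scales exactly inversely with core clock, which ignores memory and uncore effects; the helper names are illustrative and not from any benchmarking tool.

```python
def normalize_time(seconds: float, measured_mhz: float, target_mhz: float) -> float:
    """Estimate render time at target_mhz, assuming time scales as 1/clock."""
    return seconds * measured_mhz / target_mhz


def fmt(seconds: float) -> str:
    """Format a duration in seconds as 'M min S s' (rounded to whole seconds)."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


measured = 6 * 60 + 57  # 417 s on the FX-8800P at 3.4 GHz, as quoted
scaled = normalize_time(measured, 3400, 3500)
print(f"{scaled:.1f} s ({fmt(scaled)})")  # 405.1 s (6 min 45 s)
```

At equal clocks, relative IPC on the same job is then simply the ratio of the two render times.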

Marginally faster, you said?

Besides, look at the impact of SMT on Intel's CPUs in Blender...

[Images: two Blender benchmark charts]


http://www.bozzabench.com/Tests/Tes...-Athlon-X4-845-3ds-Max-Cinema-4D-Blender.aspx
 

inf64

Diamond Member
Mar 11, 2011
Of course it is over Excavator. But the thing is, in Blender Excavator is marginally faster than Piledriver. Just did the same test on FX-8800P. 6 minutes 57 seconds at 3.4GHz, which would mean ~ 6 minutes 45 seconds at 3.5GHz ((3400/3500) * 417), i.e < 1% higher IPC than on PD. Most likely the smaller L2 cache and AVX2 at it again.
Well, hopefully the IPC jump in Blender is going to be big, since Zen has no cache limitations, unlike the EX core.

Have you tested quad-channel vs. dual-channel on Haswell-E? If you get that done you should post it as an article, because I couldn't find ONE review online that covers Haswell-E (or later) cores and the effect of memory BW on a range of desktop workloads. Maybe my Google skills are not good :(
 

KTE

Senior member
May 26, 2016
Of course it is over Excavator. But the thing is, in Blender Excavator is marginally faster than Piledriver. Just did the same test on FX-8800P. 6 minutes 57 seconds at 3.4GHz, which would mean ~ 6 minutes 45 seconds at 3.5GHz ((3400/3500) * 417), i.e < 1% higher IPC than on PD. Most likely the smaller L2 cache and AVX2 at it again.
Why are you testing ST BTW?

AMD's demo was MT.

 

The Stilt

Golden Member
Dec 5, 2015
Have you tested quad-channel vs. dual-channel on Haswell-E? If you get that done you should post it as an article, because I couldn't find ONE review online that covers Haswell-E (or later) cores and the effect of memory BW on a range of desktop workloads. Maybe my Google skills are not good :(

Going from quad to dual channel did absolutely nothing to the results. In fact, in both cases (ST & MT) dual was marginally faster. 16GB of CL12 dual-rank modules running at 2133MHz.
 

The Stilt

Golden Member
Dec 5, 2015
Why are you testing ST BTW?

AMD's demo was MT.


Probably because there is no other way to properly compare CMT and SMT CPUs with each other? And according to the statement AMD made: "40% higher IPC, over Excavator core".

If they meant modules, then they have lied about the core count in their 15h CPUs.
 

TheELF

Diamond Member
Dec 22, 2012
When you think about it, Zen has 2x the FP and L/S resources per core Vs PD and since it has 2x more threads that comes to around same amount of resources per thread (with supposedly lower instruction latencies). Also it has SMT which adds another 20-30% if done right. Should be just about 130% faster than PD.
Just because Hyper-Threading is a form of SMT does not mean that every form of SMT is Hyper-Threading.
Yes, every Zen core has 2x the FP and L/S resources of an EX core, but it remains to be seen how AMD's SMT will work. If it just cuts the core into two equal threads, then one Zen core will be the equivalent of an EX module, minus the module penalty and plus some other improvements.

The single core speed (when running a single thread) will stay pretty much the same while the ZEN core (module) running two threads will have an ~40% improvement.

(presto chango 40% IPC improvement per "core" (as long as you run multiple threads) )
 

The Stilt

Golden Member
Dec 5, 2015
One more observation regarding Blender. The SMT yield in Blender appears to be unusually high. In similar applications, such as Cinebench the yield is around 27% on Haswell-E. In Blender the yield is > 59%. Blender BMW benchmark (at default resolution, 20x20 tiles) was completed in 127.98 seconds with 18C/18T while with SMT enabled the time was reduced to 90.07 seconds.
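
A minimal sketch of how an SMT yield can be computed from two fixed-workload render times. It takes yield as the throughput gain (time without SMT divided by time with SMT, minus one); that definition is an assumption, since the post does not state one. With the two quoted times this gives about 42%, and expressed as a reduction in render time it gives about 30%, so neither reading reproduces the ">59%" figure from these two numbers alone.

```python
def smt_yield(time_smt_off: float, time_smt_on: float) -> float:
    """Throughput gain from SMT on a fixed workload: t_off / t_on - 1."""
    return time_smt_off / time_smt_on - 1.0


def time_reduction(time_smt_off: float, time_smt_on: float) -> float:
    """Fraction of render time saved by enabling SMT: 1 - t_on / t_off."""
    return 1.0 - time_smt_on / time_smt_off


# Blender BMW times quoted for Haswell-E: 18C/18T vs 18C/36T (SMT on).
off, on = 127.98, 90.07
print(f"throughput gain: {smt_yield(off, on):.1%}")       # 42.1%
print(f"time reduction:  {time_reduction(off, on):.1%}")  # 29.6%
```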
 

dark zero

Platinum Member
Jun 2, 2015
Maybe AMD SMT is on Ivy Bridge levels... We need to wait some more time to see real scores from AMD.
 

KTE

Senior member
May 26, 2016
One more observation regarding Blender. The SMT yield in Blender appears to be unusually high. In similar applications, such as Cinebench the yield is around 27% on Haswell-E. In Blender the yield is > 59%. Blender BMW benchmark (at default resolution, 20x20 tiles) was completed in 127.98 seconds with 18C/18T while with SMT enabled the time was reduced to 90.07 seconds.
That's why I asked: why test ST against AMD's MT showing?

How will you compare when AMD has claimed "IPC per core" but only tested in a highly SMT favoring MT benchmark?

ST testing can only account for CMT/SMT when we have an idea of how much per-core IPC has changed from EXC with and without SMT in the first place, let alone how SMT performance scales across multiple cores.

We're still left stabbing in the dark.

A core can do well in ST, in comparison to the average, or well in MT in comparison to the average. Both throw off the scaling comparisons.

Even SMT scaling is completely unknown with >1Core at play (more resource utilization).

 

DrMrLordX

Lifer
Apr 27, 2000
Probably because there is no other way to properly compare CMT and SMT CPUs with each other?

That seems kind of backwards. SMT and CMT are so radically different that you see entirely different faces of each CPU when running only one thread on them (shared L3 notwithstanding). The only way to compare them "fairly" is to run workloads that scale perfectly (or near-perfectly) with more cores and max out their respective thread counts.
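
Amdahl's law is one simple way to see why near-perfect scaling matters for such a throughput comparison: any serial fraction shrinks the advantage of the chip with more threads. This sketch treats every thread as equally strong, which SMT and CMT threads are not, so it only illustrates the scaling side of the argument; the 8- and 16-thread counts are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup over one thread when a fraction p parallelizes perfectly."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


# Compare a hypothetical 8-thread and 16-thread chip of equal per-thread speed.
for p in (1.0, 0.99, 0.9):
    s8 = amdahl_speedup(p, 8)
    s16 = amdahl_speedup(p, 16)
    print(f"p={p:.2f}: 8T {s8:5.2f}x, 16T {s16:5.2f}x, 16T/8T = {s16 / s8:.2f}")
```

Only at p = 1.0 does the 16-thread chip get its full 2x; at p = 0.9 the ratio drops to about 1.36.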

As to why AMD might have shown a (can we not do tildes anymore? What is this?) approximate 130% gain in IPC from Vishera/Piledriver in their Summit Ridge benchmark, well, Blender does make use of AVX2 . . . and even though XV's implementation of AVX2 is quite poor, I have observed (through programs like y-cruncher) that XV is already leaps and bounds ahead of PD on a per-module basis in SIMD-heavy code.
 

Dresdenboy

Golden Member
Jul 28, 2003
citavia.blog.de
Yes, every Zen core has 2x the FP and L/S resources of an EX core, but it remains to be seen how AMD's SMT will work. If it just cuts the core into two equal threads, then one Zen core will be the equivalent of an EX module, minus the module penalty and plus some other improvements.

The single core speed (when running a single thread) will stay pretty much the same while the ZEN core (module) running two threads will have an ~40% improvement.

(presto chango 40% IPC improvement per "core" (as long as you run multiple threads) )
A Zen core has about 1.3x the FP resources of a XV core with ST code. L/S resources are roughly the same, but likely more efficient.

AMD's SMT (a first try can be seen in the FPUs since BD) doesn't work via back-end resource partitioning. And your example of single-core vs. SMT IPC improvement backfires, as such good scaling wouldn't be possible with hardware resources that wouldn't already increase ST performance significantly. If there are not enough FUs (say 2 ALUs + 2 AGUs), SMT scaling would be <10%, because SMT is not just about better utilization: there are also a lot of shared resources. This is also supported by their statement:
more execution resources benefit both modes
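
A toy issue-slot model can illustrate the point that more execution resources help both ST and SMT. It assumes each thread independently has 0 to 4 ready ops per cycle with equal probability and that the core issues at most `width` ops per cycle; the distribution and widths are hypothetical, and the model ignores the shared-resource contention mentioned above, so its percentages say nothing about Zen itself. In this setup, widening the core from 2 to 4 ops per cycle raises ST throughput from 1.40 to 2.00 ops/cycle and the SMT gain from about 31% to about 60%.

```python
from itertools import product


def expected_issued(width: int, ready_ops: range, threads: int) -> float:
    """Mean ops issued per cycle when `threads` threads share a core `width` ops wide.

    Toy assumption: each thread independently takes each value in `ready_ops`
    with equal probability, and the core issues min(total ready ops, width).
    """
    outcomes = list(product(ready_ops, repeat=threads))
    return sum(min(sum(o), width) for o in outcomes) / len(outcomes)


ready = range(0, 5)  # hypothetical: 0-4 ready ops per thread per cycle
for width in (2, 4):
    st = expected_issued(width, ready, 1)
    smt = expected_issued(width, ready, 2)
    print(f"width {width}: ST {st:.2f} ops/cycle, SMT {smt:.2f}, SMT gain {smt / st - 1:.0%}")
```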


If some discussion in the heat of arguments somehow gets disconnected from known statements, it's time to look at them again.

What I see, is an IPC improvement of a "Zen core" over an "Excavator core" on the original slide. Plus a footnote
3. Based on internal AMD estimates for “Zen” x86 CPU core compared to “Excavator” x86 CPU core.
Nothing else. And with what I've seen so far, 40% only with SMT makes no sense. If this is not the case, then I'm open to sound explanations of why reality differs from what I'm seeing. ;)

One more observation regarding Blender. The SMT yield in Blender appears to be unusually high. In similar applications, such as Cinebench the yield is around 27% on Haswell-E. In Blender the yield is > 59%. Blender BMW benchmark (at default resolution, 20x20 tiles) was completed in 127.98 seconds with 18C/18T while with SMT enabled the time was reduced to 90.07 seconds.
This could only be properly analyzed at the code or assembly level, plus some IBS data. If CB's working sets are bigger than Blender's, the latter might simply fit the data of 2 threads into the caches better. And if CB code extracts more ILP from one thread, there is less room to fill the gaps with a second thread's FP ops.

As to why AMD might have shown a (can we not do tildes anymore? What is this?) approximate 130% gain in IPC from Vishera/Piledriver in their Summit Ridge benchmark, well, Blender does make use of AVX2 . . . and even though XV's implementation of AVX2 is quite poor, I have observed (through programs like y-cruncher) that XV is already leaps and bounds ahead of PD on a per-module basis in SIMD-heavy code.
A tilde is a legit symbol. And it saves space, as they actually should be used more often in speculative threads. :)
A tilde is also used to indicate "approximately equal to" (e.g. 1.902 ~= 2).
https://en.wikipedia.org/wiki/Tilde#As_a_relational_operator
 

TheELF

Diamond Member
Dec 22, 2012
And your example of single-core vs. SMT IPC improvement backfires, as such good scaling wouldn't be possible with hardware resources that wouldn't already increase ST performance significantly.
Yes, if SMT is adaptive then ST performance will be significantly improved... for any thread actually able to use all 4 integer ALUs and all FPU units.
Outside of distributed computing do you know any thread that would be able to pull this off?
But I guess if they get some reviews with good CB ST scores, that's enough for them to keep selling CPUs, since more than enough people will, based on that, think that the CPU will run anything fast.
Just look at how many people think that FXs play games fast because they get good CB MT scores.
 

jihe

Senior member
Nov 6, 2009
Everyone ignores that a server guy said it: on anything greater than 1P, AMD still ruled the roost until Nehalem.

But again I ask the doubters: where is the performance deficit coming from, which areas specifically? I would never try to guess IPC with the known information, but you guys have no problem doing so. But there is nothing in all the known detail (we know quite a bit from LinkedIn, the die shot, compiler patches and what AMD released a few days ago) that says IPC can't be high.

More execution width
Lower latency cache
More associative, larger per int core L1I
more cache bandwidth
larger PRFs
greater issue width (u-op cache)

I can keep naming things that all point toward improved IPC, so start naming things other than "derp derp AMD" that look like performance limiters...

Yes, I remain cautiously optimistic about a competitive Zen. Sandy Bridge level performance is all I ask for.
 

DrMrLordX

Lifer
Apr 27, 2000
A tilde is a legit symbol. And it saves space, as they actually should be used more often in speculative threads. :)

A bit off-topic, but at the time, attempting to use a tilde produced a dash instead (~~~~~ see it's still doing it) so . . . must be a bug/snafu in there somewhere. Anyway I think VirtualLarry caught on to it.

So back on topic. I think people are beginning to see that something better than Sandy Bridge is possible from Zen. Really, there need to be some more benchmarks. It would be nice if some strategically-leaked ES chips popped up for our edification, but then this isn't late 2005/early 2006 when Conroe ES chips were floating around . . .
 

superstition

Platinum Member
Feb 2, 2008
Yes, I remain cautiously optimistic about a competitive Zen. Sandy Bridge level performance is all I ask for.
It seems reasonable to assume that if Sandy were the mere target then Keller and company would have been able to meet that with a construction derivative instead of a new design. I think it's safe to assume, even before seeing benchmarks, that AMD planned to release a more competitive architecture than that.
 

Sweepr

Diamond Member
May 12, 2006
Unpacking AMD's Zen Benchmark: Is Zen actually 2% Faster than Broadwell?

There is a reason for AMD only showing this benchmark - it's either a best case scenario, or they are pitching their expectations exactly where they want people to think. By using a custom workload on open source software, the result is very specific and cannot be extrapolated in any meaningful way. This is why a typical benchmark suite offers 10-20 tests with different workloads, and even enterprise standard workloads like SPEC come with over a dozen tests in play, to cater for single thread or multi-thread or large cache or memory or pixel pushing bottleneck that may occur. Single benchmarks on their own are very limited in scope as a result.

1) The Results Are Not Externally Verifiable At This Time, As Expected
2) No Memory or TDP Numbers Were Provided
3) Blender Is an Open Source Platform
4) Did It Actually Measure IPC? (The Philosophical Debate)
5) The Workload Is Custom
6) It Is Only One Benchmark
7) There's Plenty about the Microarchitecture and Chip We Don't Know Yet, e.g. Uncore
8) Clock Speeds Are Not Final, Efficiency Not Known
9) We Will Have to Wait to Test


www.anandtech.com/show/10585/unpacking-amds-zen-benchmark-is-zen-actually-2-faster-than-broadwell
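
To make the single-benchmark point concrete: multi-test suites such as SPEC summarize per-test speed ratios with a geometric mean. The ratios below are invented for illustration only and are not measurements of any chip; they show how one favorable test can sit well above the suite-wide figure.

```python
from math import prod


def geomean(values) -> float:
    """Geometric mean of positive ratios, the usual way suites summarize results."""
    values = list(values)
    return prod(values) ** (1.0 / len(values))


# Hypothetical per-test speed ratios (chip A vs chip B); NOT real data.
ratios = {"render": 1.02, "compress": 0.88, "compile": 0.95, "encode": 0.91}

print(f"single test (render): {ratios['render']:.2f}")
print(f"suite geomean:        {geomean(ratios.values()):.2f}")
```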
 

Abwx

Lifer
Apr 2, 2011

"Nice" article..

I read this in the article:

The test was to render a mockup of a Zen based desktop CPU, with an effective workload of 50 seconds for these chips.

Did he forget to reset the speed of the video?
It looks like it lasts 20 s rather than 50 s; the rendering seems to start at 1:27 in the video linked below (I set the cursor at 1:25).

 

AtenRa

Lifer
Feb 2, 2009
I think the rendering video is at 4x speed.

And as I have said before, this demo was about MT throughput (IPC + SMT + multi-core scaling).
 

raghu78

Diamond Member
Aug 23, 2012
AMD has overhyped and underdelivered too many times in the past, so I am wary of any AMD benchmark in a controlled scenario. The final word on Zen will come at launch, when there are independent third-party reviews. We have to wait and see what clocks/TDP AMD hits for 8C/16T Summit Ridge and 32C/64T Naples. Right now I am cautiously optimistic and hoping that AMD does not repeat the mistakes of the past.
 

inf64

Diamond Member
Mar 11, 2011
AMD has overhyped and underdelivered too many times in the past, so I am wary of any AMD benchmark in a controlled scenario. The final word on Zen will come at launch, when there are independent third-party reviews. We have to wait and see what clocks/TDP AMD hits for 8C/16T Summit Ridge and 32C/64T Naples. Right now I am cautiously optimistic and hoping that AMD does not repeat the mistakes of the past.
From a high level view and based on what Hot Chips slides revealed, Zen is somewhere in between SB and Skylake when core resources are in question (closer to Skylake in integer and closer to SB/IB in FP). I bet the IPC will end up right in the middle of these two and from AT we know the gap is around 25%. So if it was to end up in the middle of that range or a bit lower (around 10% faster than SB) it would end up being: 1) just a tiny bit faster than IB (~4%) ; 2) 7% slower than HSWL; 3) 10% slower than BDWL ; 4) ~14% slower than Skylake (funny enough the int/fp schedulers in Zen are ~15-20% smaller than Skylake's and right at Haswell level).
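
Percentage gaps depend on which chip is taken as the baseline. Using only the post's own numbers (Sandy Bridge = 1.00, a ~25% gap to Skylake, Zen ~10% above Sandy Bridge), Zen lands about 12% below Skylake when measured against Skylake, while Skylake lands about 14% above Zen when measured against Zen, which is in line with the ~14% quoted. Ivy Bridge, Haswell and Broadwell are left out because their relative values are not given here.

```python
def pct_diff(a: float, b: float) -> float:
    """How much faster (+) or slower (-) a is than b, expressed relative to b."""
    return a / b - 1.0


sb, skl = 1.00, 1.25  # the post's ~25% Sandy Bridge -> Skylake IPC gap
zen = 1.10            # the post's scenario: ~10% faster than Sandy Bridge

print(f"Zen vs Sandy Bridge:                  {pct_diff(zen, sb):+.1%}")   # +10.0%
print(f"Zen vs Skylake (relative to Skylake): {pct_diff(zen, skl):+.1%}")  # -12.0%
print(f"Skylake vs Zen (relative to Zen):     {pct_diff(skl, zen):+.1%}")  # +13.6%
```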
 