AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Status
Not open for further replies.

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,685
136
Maybe cache thrashing causes a perf stall/drop (per core) on SMT machines.
To be honest, at this moment I am unable to make any sense of the PCGamer Prime numbers. SMT does not seem to yield any benefit. If we apply frequency scaling to the 7600K score towards either the 7700K @ 3.4GHz or the 7700K at stock, the scores are so close that the delta is within the margin of error.
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
Just guessing (watching taskmgr), they run it on all threads in parallel, so on an 8T processor it's really a 32MB benchmark; that would make both cache latency and memory latency important.
Not if memory is shared.

The algorithm has a lot of if statements, one every few steps, to see if the conditions are met. That's a lot of branching. Remember the Fritz Chess benchmark? It had poor results too (8c vs 4c comparison).
Yes, branching is certainly more of an issue than the use of mod.
 

Asterox

Golden Member
May 15, 2012
1,026
1,775
136
:D

https://warosu.org/g/thread/50289226

1442283302255.jpg
 

malitze

Junior Member
Feb 15, 2017
24
49
51
The algorithm has a lot of if statements, one every few steps, to see if the conditions are met. That's a lot of branching. Remember the Fritz Chess benchmark? It had poor results too (8c vs 4c comparison).

It is more a question of the pattern in which these branches are taken or not taken. I took this simple Python implementation of the sieve of Atkin algorithm and added a simple 2-bit predictor to roughly approximate how well one would work. With a limit of 100000, this is what I got:

Code:
All branches: 440832, taken: 40954 (0.09290160423925668 %), predicted correctly: 399873 (0.9070870535714286 %)

As I said that is a very rough approximation but I think more sophisticated predictors in modern CPUs should not have a hard time with this particular use case, unless I happened to overlook something crucial ;)
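
For readers who want to try the idea, here is a minimal sketch of a 2-bit saturating-counter predictor in Python. It is not the code used for the numbers above: the class name `TwoBitPredictor`, the `record` helper, the initial counter state, and the use of a plain sieve of Eratosthenes instead of the sieve of Atkin are all assumptions made for illustration.

```python
from collections import defaultdict


class TwoBitPredictor:
    """One 2-bit saturating counter per branch site (hypothetical helper).

    States 0-1 predict "not taken", states 2-3 predict "taken".
    """

    def __init__(self):
        # Assumption: every counter starts in state 1 (weakly not taken).
        self.counters = defaultdict(lambda: 1)
        self.total = 0
        self.taken = 0
        self.correct = 0

    def record(self, site, taken):
        """Log one branch outcome for `site` and return the outcome unchanged."""
        taken = bool(taken)
        state = self.counters[site]
        predicted_taken = state >= 2
        self.total += 1
        self.taken += int(taken)
        self.correct += int(predicted_taken == taken)
        # Saturating update: move towards 3 when taken, towards 0 otherwise.
        self.counters[site] = min(state + 1, 3) if taken else max(state - 1, 0)
        return taken


def count_primes(limit, bp):
    """Sieve of Eratosthenes with its single data-dependent branch instrumented."""
    is_prime = [True] * (limit + 1)
    count = 0
    for n in range(2, limit + 1):
        if bp.record("is_prime", is_prime[n]):
            count += 1
            for m in range(n * n, limit + 1, n):
                is_prime[m] = False
    return count


if __name__ == "__main__":
    bp = TwoBitPredictor()
    primes = count_primes(100000, bp)
    print(f"primes: {primes}")
    print(f"All branches: {bp.total}, taken: {bp.taken} "
          f"({100 * bp.taken / bp.total:.2f} %), predicted correctly: "
          f"{bp.correct} ({100 * bp.correct / bp.total:.2f} %)")
```

Only one branch site is instrumented here, so the counts will not match the figures above; the point is just to show how little is needed to model such a predictor in software.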
 

OrangeKhrush

Senior member
Feb 11, 2017
220
343
96
My SFF system is aging, the 4460 at 3.2/3.4 can barely hit 2000 on Passmark Singlethread. Add the option of better IPC over moarrrr cores and SFF fus ro dah to the competition.
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
It is more a question of the pattern in which these branches are taken or not taken. I took this simple Python implementation of the sieve of Atkin algorithm and added a simple 2-bit predictor to roughly approximate how well one would work. With a limit of 100000, this is what I got:

Code:
All branches: 440832, taken: 40954 (0.09290160423925668 %), predicted correctly: 399873 (0.9070870535714286 %)

As I said that is a very rough approximation but I think more sophisticated predictors in modern CPUs should not have a hard time with this particular use case, unless I happened to overlook something crucial ;)
Bernstein's optimized reference implementation shows a branch misprediction rate of just over 2% on an Ivy Bridge:
Code:
primegen-0.97$ perf stat -e instructions,branch-misses,branches,cpu-cycles ./primespeed 32000000
(...)
1973815 primes up to 32000000.
(...)
Performance counter stats for './primespeed 32000000':
       145,635,054      instructions              #    2.26  insns per cycle       
           309,039      branch-misses             #    2.16% of all branches       
        14,316,256      branches                                                   
        64,559,941      cpu-cycles                                                 
       0.028545228 seconds time elapsed
That is 2.1 MPKI (branch mispredictions per 1,000 instructions), which is indeed rather low (SPECint 2000 has about 5 MPKI on such a machine).
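
The derived figures can be recomputed from the raw perf counters with a few lines of Python. This is only a sketch: the function name `branch_stats` and the dictionary keys are made up for illustration, and the inputs are the counter values from the run above.

```python
def branch_stats(instructions, branches, branch_misses, cycles):
    """Derive IPC, branch miss rate (%) and MPKI from raw perf counters."""
    return {
        "ipc": instructions / cycles,
        "miss_rate_pct": 100.0 * branch_misses / branches,
        # MPKI = mispredictions per 1,000 retired instructions.
        "mpki": 1000.0 * branch_misses / instructions,
    }


# Counter values copied from the perf stat output above.
stats = branch_stats(
    instructions=145_635_054,
    branches=14_316_256,
    branch_misses=309_039,
    cycles=64_559_941,
)

print(f"IPC: {stats['ipc']:.2f}")
print(f"Branch miss rate: {stats['miss_rate_pct']:.2f} %")
print(f"MPKI: {stats['mpki']:.1f}")
```

Note that MPKI is normalised per instruction rather than per branch, which is why it comes out close to the percentage figure here only by coincidence of the branch density (roughly one branch per ten instructions).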
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
How did Sandy Bridge do in the PassMark Prime Number benchmark? No one seems to break these down - they just give an aggregate score for all PassMark benches. I'm curious if there was something in Haswell and up that gave Intel a big edge here, or if it's just a straight trend line for all the recent Core CPUs.
 

coercitiv

Diamond Member
Jan 24, 2014
6,151
11,685
136
Being a 4T machine having 2 fast DIMMs the analysis might become skewed a bit. I'll include your numbers, too. Maybe cache thrashing causes a perf stall/drop (per core) on SMT machines.
Are you ready for more fun?! I sure am!

Got home and ran the benchmark on my mobile Haswell 4c/8t @ fixed 3.4GHz (DDR3-1600 CL11, 6MB L3). The scores were:
  • 25-27 default
  • 29-31 when setting affinity for logical cores 0-2-4-6 only
 

Agent-47

Senior member
Jan 17, 2017
290
249
76
It is more a question of the pattern in which these branches are taken or not taken. I took this simple Python implementation of the sieve of Atkin algorithm and added a simple 2-bit predictor to roughly approximate how well one would work. With a limit of 100000, this is what I got:

Code:
All branches: 440832, taken: 40954 (0.09290160423925668 %), predicted correctly: 399873 (0.9070870535714286 %)

As I said that is a very rough approximation but I think more sophisticated predictors in modern CPUs should not have a hard time with this particular use case, unless I happened to overlook something crucial ;)

Not bad for a first post!

We already know Intel has good branch prediction, so you can get a conclusive answer if you run it on Zen and then compare to Intel.

Do you mind sharing the code? I would like to compare the numbers on an FX. If an FX has the same level of performance, we know for sure that the branch prediction requirements of this algorithm are not relevant to the poor scores.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,710
3,554
136
It is more a question of the pattern in which these branches are taken or not taken. I took this simple Python implementation of the sieve of Atkin algorithm and added a simple 2-bit predictor to roughly approximate how well one would work. With a limit of 100000, this is what I got:

Code:
All branches: 440832, taken: 40954 (0.09290160423925668 %), predicted correctly: 399873 (0.9070870535714286 %)

As I said that is a very rough approximation but I think more sophisticated predictors in modern CPUs should not have a hard time with this particular use case, unless I happened to overlook something crucial ;)
Your percentage figures have the decimal point shifted two places left.
 

malitze

Junior Member
Feb 15, 2017
24
49
51
Not bad for a first post!

We already know Intel has good branch prediction, so you can get a conclusive answer if you run it on Zen and then compare to Intel.

Do you mind sharing the code? I would like to compare the numbers on an FX. If an FX has the same level of performance, we know for sure that the branch prediction requirements of this algorithm are not relevant to the poor scores.

I wouldn't mind, but since it is just simulating a simple branch predictor in software, it won't make a difference if run on a different CPU. A much better way, one that would actually take the hardware it runs on into account, was shown by Nothingness a few posts ago, if you have access to some kind of Linux :)

I ran it on my i7-3520m for comparison:

Code:
primegen-0.97]$ perf stat -e instructions,branch-misses,branches,cpu-cycles ./primespeed 32000000
(...)
 
 Performance counter stats for './primespeed 32000000':
 
       145,055,943      instructions:u            #    2.85  insn per cycle                                           
           295,536      branch-misses:u           #    2.08% of all branches       
        14,199,131      branches:u                                                 
        50,950,712      cpu-cycles:u                                               
 
       0.025605543 seconds time elapsed
 

inf64

Diamond Member
Mar 11, 2011
3,685
3,957
136
You are reading too much into these prime benchmarks. It's just one workload and, as can be seen from other tests, it is not reflective of the general performance of the core. Every core has some weak(er) points, so who cares about a few tests? For desktop we have several things that matter: rendering, encoding, gaming, streaming (while gaming) and multitasking. In all of these scenarios Ryzen will be good. If it "sux" in something vs its main competition, it had better be a wprime benchmark, because nobody will care ;).
 

AtenRa

Lifer
Feb 2, 2009
14,000
3,357
136
You are reading too much into these prime benchmarks. It's just one workload and, as can be seen from other tests, it is not reflective of the general performance of the core. Every core has some weak(er) points, so who cares about a few tests? For desktop we have several things that matter: rendering, encoding, gaming, streaming (while gaming) and multitasking. In all of these scenarios Ryzen will be good. If it "sux" in something vs its main competition, it had better be a wprime benchmark, because nobody will care ;).

I'm sure some will make it the second most important benchmark over the next few weeks :rolleyes:
 

Nothingness

Platinum Member
Jul 3, 2013
2,371
713
136
You are reading too much into these prime benchmarks. It's just one workload and, as can be seen from other tests, it is not reflective of the general performance of the core. Every core has some weak(er) points, so who cares about a few tests? For desktop we have several things that matter: rendering, encoding, gaming, streaming (while gaming) and multitasking. In all of these scenarios Ryzen will be good. If it "sux" in something vs its main competition, it had better be a wprime benchmark, because nobody will care ;).
Come on, we have to waste our time while waiting for real benchmarks :D
 

.vodka

Golden Member
Dec 5, 2014
1,203
1,537
136
How did Sandy Bridge do in the PassMark Prime Number benchmark? No one seems to break these down - they just give an aggregate score for all PassMark benches. I'm curious if there was something in Haswell and up that gave Intel a big edge here, or if it's just a straight trend line for all the recent Core CPUs.

I originally did this vs the Ryzen baseline when it showed up, but decided to run some extra numbers.

CPU mark, 2500k @ 4.5GHz, fixed 10-11-10-30-1T timings

Memory mark, 2500k @ 4.5GHz, fixed 10-11-10-30-1T timings

AABslX.png



I might have made a mistake here and there since I feel like crap (stupid cold/flu-ish), I could do the same with the CPU clocked up to 5.1GHz if needed.

I realize I'm varying bandwidth and latency at the same time having left timings equal throughout the run, but the results mirror itsmydamnation's Ivy Bridge results. Sandy and Ivy are pretty much the same at a high level after all, so it was expected.

Extrapolating to Ryzen, its results at around 14ns could be compared to my DDR3-1333 15ns run. Look how much performance is left on the table by not using faster RAM in latency-sensitive workloads.

I know that if I get my hands on Ryzen, I'll get sticks capable of what my DDR3-1866 does (around 10ns or better; that could be DDR4-3000 CAS 15, a perfect 10ns, to start with) so as not to unnecessarily gimp the CPU.
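
The nanosecond figures quoted here follow from the usual first-word CAS latency arithmetic: CL cycles divided by the memory clock, which is half the DDR transfer rate. A small Python sketch, where the function name `cas_latency_ns` is an assumption and the CL10 values are taken from the fixed 10-11-10-30 timings mentioned above:

```python
def cas_latency_ns(cas_cycles, transfer_rate_mts):
    """CAS latency in nanoseconds for DDR memory.

    DDR transfers twice per clock, so the clock in MHz is transfer_rate / 2
    and one cycle lasts 2000 / transfer_rate nanoseconds.
    """
    return cas_cycles * 2000.0 / transfer_rate_mts


for label, cl, rate in [
    ("DDR3-1333 CL10", 10, 1333),
    ("DDR3-1866 CL10", 10, 1866),
    ("DDR4-3000 CL15", 15, 3000),
]:
    print(f"{label}: {cas_latency_ns(cl, rate):.1f} ns")
```

This only covers the CAS component; total memory latency as seen by the core also includes the memory controller and fabric, so it is a lower bound rather than a prediction of benchmark latency.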
 

cytg111

Lifer
Mar 17, 2008
23,049
12,717
136
Yea, great work. Damn, that scaling. Now to keep speed the same and only mess with latencies? :)
 

KTE

Senior member
May 26, 2016
478
130
76
XFR to me is a gimmick, except for mobile users, and maybe, just maybe, standard Joe Bloggs.

Clocks are limited as it is. Keeping FO4 constant is fine, but the problems with a high-clock 14nm design are the tiny wires, RC not scaling, major EM issues, DP complexity, huge fin variations affecting current, the end of Dennard scaling meaning high current densities and thus high localized heat with ~30C deltas, and Vt variability being a significant showstopper. Plenty of research papers have discussed these at length.

IBM's J. Warnock had a research paper discussing this based on their own 4-5.7GHz chips. And these are watercooled/chilled-water-cooled chips that still face these issues.

If XFR depends on average chip temps, that might work well, but if a single hotspot kills XFR, then it wouldn't be of much use except as a benchmark winner.

Voltages for Turbo and XFR clocks are also key here.

Sent from HTC 10
(Opinions are own)
 

CentroX

Senior member
Apr 3, 2016
351
152
116
Someone on NeoGAF said that GloFo's 14nm process is equivalent to Intel's 22nm. Are they really that far behind?
 

revanchrist

Junior Member
Aug 6, 2016
9
5
41