AMD Ryzen (Summit Ridge) Benchmarks Thread (use new thread)

Status
Not open for further replies.

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
I really dont get what you are trying to point out with these random bunch of numbers.
The progression does not suggest that the official Blender build is optimized enough to be considered a credible benchmark for comparing construction core CPUs with Ryzen.

The Stilt's builds call the official builds into question greatly. Ancient AMD processors should not beat much newer ones nor should Lynnfield beat or tie Sandy. In the case of Lynnfield the program would have to either benefit a whole lot from not having SMT enabled or it would not be using recent instructions.

The Stilt's "SIMD" build shows a minor performance increase over 2.78a on Lynnfield and his "AVX2" build shows a minor performance hit. All official builds scale from slowest and newest, 2.78a, to fastest and oldest, 2.75a.

On Piledriver, the exact same scaling occurs with the official builds. Yet both of The Stilt's builds are much, much faster.

Piledriver is being killed by Phenom and even ancient AMD processors in 2.78a. Turning off CMT doesn't affect the pattern at all. All it does is give PD a very minor boost on the ranking chart over its fully-enabled brethren.

This suggests that Piledriver isn't being granted access to modern instructions, in a manner reminiscent of the GenuineIntel scandal. Something is amiss.

Also, I recall reading The Stilt say his build(s) increased performance on Intel. I think it was something like 8% on Haswell in an older comment. I don't remember the specifics. But, it seems fairly clear that he didn't make either of these builds for the benefit of Piledriver in the first place (i.e. not specifically tailoring them to be optimal on PD and not Intel). It just turns out that my testing has exposed a big issue with Blender.
 

rvborgh

Member
Apr 16, 2014
195
94
101
The #s on the left aren't really random... they are samples per second per core per GHz... something that Jhu started and I just carried on with. They are sorted in order of performance for the various processors.

 

rvborgh

Member
Apr 16, 2014
195
94
101
You are arguing backwards from the conclusion that you want (Piledriver just cannot be worse than K10 since it's newer) and then saying the data is tainted, instead of arguing forwards from the data to what the conclusion should be given that data (Piledriver isn't great running 2.78a relative to other processors - truth).

The Stilt's builds were not what AMD ran during the Blender demo... so they are not useful for comparison purposes for this thread wrt Zen (although I really do highly respect the man - of all the people who were speculating about what that supposed Blenchmark "Zen" leak was a while back... he came the closest - although he probably did not know that until now).

 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
You are arguing backwards from the conclusion that you want (Piledriver just cannot be worse than K10 since its newer) then saying the data is tainted, instead of arguing forwards from the data to what the conclusion should be given that data (Piledriver isn't great running 2.78a relative to other processors - truth).
Your addition of an ad hominem to the argument you made in an earlier post doesn't make the data fit your conclusion.
Stilt's builds was not what was run by AMD during the Blender Demo... so its not useful for comparison purposes for this thread
An illogical argument.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,055
3,857
136
You realize there are things in Bulldozer, like filling up all the FP scheduler entries causing the integer cores to stall on the next FP op? Now think about what that means for single- vs dual-thread CMT FP workloads (Michael Clark described this as one of the big FP performance improvers during the Hot Chips presentation)...

There is nothing in this data that says the Blender code is bad,
but we know of lots of issues with Bulldozer that could have an impact.

Should I start listing them?

All that needs to happen is a decrease in FP ops for the same amount of work. That alone would give a big performance improvement if the issue above was causing a problem.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
There is nothing in this data that says blender code is bad,
But we know lots of issues with bulldozer that could have an impact.

Should i start listing them?
How about just figuring out why The Stilt's builds improve performance dramatically on recent CPUs from both Intel and AMD, but not on an old Intel CPU like Lynnfield?

JackCY said:
4690K @ 4.5GHz, 150 samples

00:48.34, 2.78 SIMD
01:24.73, 2.78a

That's 43% less time

50% time = 100% faster = +100% = 2x speed = 200% speed
57% time = should be 75.279% faster

But, nah... I guess AMD's ancient CPUs, like the 65nm chip running DDR2-800 are just much more efficient.
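
The arithmetic in JackCY's quoted numbers can be checked with a short sketch. The function names here are only illustrative and not from any tool used in the thread; the two timings are the ones quoted above. It shows why "43% less time" and "about 75% faster" describe the same result.

```python
def percent_less_time(new_s: float, old_s: float) -> float:
    """Percent reduction in render time when going from old_s to new_s."""
    return (1.0 - new_s / old_s) * 100.0


def percent_faster(new_s: float, old_s: float) -> float:
    """Percent increase in speed (work done per unit time)."""
    return (old_s / new_s - 1.0) * 100.0


# Timings quoted above for the 4690K @ 4.5GHz, 150 samples.
simd_s = 48.34   # 2.78 "SIMD" build: 00:48.34
stock_s = 84.73  # official 2.78a:    01:24.73

print(f"{percent_less_time(simd_s, stock_s):.1f}% less time")
print(f"{percent_faster(simd_s, stock_s):.1f}% faster")
```

The two percentages differ because one is measured against the old time and the other against the new time; halving the time is 50% less time but 100% faster.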
 

jpiniero

Lifer
Oct 1, 2010
16,527
7,032
136
How about just figure out why The Stilt's builds improve performance dramatically on recent CPUs by both Intel and AMD but not with an old Intel CPU like Lynnfield.

Lynnfield doesn't support AVX; Intel added it with Sandy Bridge.

How does it run on K10?
 

sirmo

Golden Member
Oct 10, 2011
1,014
391
136
Ancient AMD processors should not beat much newer ones
This is nothing new. This was my biggest issue with the Bulldozer arch... it was often worse than the previous architectures from AMD. Yes, some new features and instructions made it faster in rare scenarios, but overall Bulldozer was a regression in performance compared to K10.
You are basically re-discovering what we knew back in 2011 when BD came out.

Just die shrinking Thuban to 32nm would have yielded an all around better CPU back when BD came out.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
Lynnfield doesn't support AVX, Intel added it with Sandy Bridge.

How does it run on K10?
Lynnfield tied Sandy in the data posted, which suggests AVX isn't being used on Sandy, if at all, in the stock Blender builds.

Practically everything from AMD seems to beat Piledriver with stock Blender*

Lynnfield did not see much benefit from the SIMD build and saw a tiny loss from the AVX2 build.

Piledriver saw a huge performance increase from both the SIMD and AVX2 builds.

*the exceptions are:
19,815 25:14 min AMD E-350 Zacate 2C/2T, 1.6 GHz, no Turbo, DDR3-1066 SC
20,222 08:36 min AMD A8-4500M Trinity 2C/4T, 2.3 GHz, no Turbo, DDR3-1333 DC
21,428 00:50 min 2x AMD Opteron 6380 Abu Dhabi
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
6700K Skylake, stock, turbo on, static undervolt, 3200 16 GB 16-18-18-2T, Windows 10 AE

as usual, each score is best of two runs (except Cinebench single thread as it’s slow)

in order of speed, rounded to tenths:

“SIMD”, The Stilt

100: 0:23.9
150: 0:35.7
200: 0:47.6

“AVX2”, The Stilt

100: 0:24.9
150: 0:37.2
200: 0:49.7

2.75a

100: 0:34.8
150: 0:52.2
200: 1:09.4

2.76b

150: 0:52.7

2.78a

100: 0:36.0
150: 0:53.7
200: 1:11.7
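
To compare the builds at a glance, here is a rough sketch that converts the 150-sample times listed above to seconds and expresses each as a speedup over 2.78a. The helper name `to_seconds` and the dictionary layout are just for illustration; only the timings come from the post.

```python
def to_seconds(stamp: str) -> float:
    """Convert an 'm:ss.s' render time into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# 150-sample results from the 6700K run above.
times_150 = {
    "SIMD (The Stilt)": "0:35.7",
    "AVX2 (The Stilt)": "0:37.2",
    "2.75a": "0:52.2",
    "2.76b": "0:52.7",
    "2.78a": "0:53.7",
}

baseline = to_seconds(times_150["2.78a"])
for build, stamp in times_150.items():
    secs = to_seconds(stamp)
    # Speedup > 1.00 means the build finished sooner than official 2.78a.
    print(f"{build:18s} {secs:6.1f} s  {baseline / secs:.2f}x vs 2.78a")
```

On these numbers the "SIMD" build comes out at about 1.5 times the speed of 2.78a, while the three official builds sit within a few percent of each other.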

———————— other benches ————————

POV-Ray 3.7.0.msvc10

kernel: 0.02
user: 1030.84
total: 1030.86
elapsed: 132.15
CPU vs. elapsed ratio: 7.80
render averaged 1983.75 PPS (254.30 PPS CPU time)
262144 pixels, 8 threads

Cinebench R15

single: 183
multi: 929
ratio: 5.09x
OpenGL (Sapphire 1070 stock): 157.6 fps

Cinebench R11.5

single: 2.06
multi: 10.22
ratio: 4.96x
OpenGL (same GPU): 89.3 fps

CPU-Z 1.78.1.x64

single: 2109
multi: 9304

CPU-Z 1.77.0.x64

single: 2109
multi: 9202
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
I suppose The Stilt's builds are faster on Skylake because of the deficiencies of the Piledriver architecture.
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
It does suggest that the stock builds aren't using the AVX(2) compiler options.
AVX period, let alone AVX2.

That isn't very good for measuring performance of a 2012 Piledriver processor and then using that to compare with an unreleased one. Even the 8150 from 2011 has AVX, right?
 

.vodka

Golden Member
Dec 5, 2014
1,203
1,537
136
So, stock Blender builds are a terrible benchmark unless one wants to go back to, eh, mid-2011 or so. Let's be generous and say 2012.

It seems to be the case. My 2500K gets a 30% improvement in rendering times, along with higher temperatures & power consumption, from using The Stilt's AVX2 build. In this case, it's AVX being enabled for Sandy.

The official Blender builds available so far just don't seem to make use of any new instruction sets. Construction cores all around seem to suck at old SSE code, at least going by this Blender example, and get better once they are able to use newer instruction sets.

I'm happy that Ryzen is performing as it is on the old SSE code. With how sluggish software is to embrace newer instructions, it's just the better solution... building a better all-around machine. Netburst did well on software specifically optimized for it... that didn't change the fact that it was a hot, slow turd on the other 95% of tasks. It's not as bad with the Bulldozer family, but still.
 

itsmydamnation

Diamond Member
Feb 6, 2011
3,055
3,857
136
So, stock Blender builds are a terrible benchmark unless one wants to go back to, eh, mid-2011 or so. Let's be generous and say 2012.
No, not really; you are not understanding what has happened. By your logic, 99% of real-world apps aren't good benchmarks (think about that).

Blender is a renderer; it uses 4-packed 32-bit vectors (128 bits: x, y, z, a).
SSE is 128-bit, so that is fine.
Now, AVX doesn't have across-the-board support, thanks to Intel, but AVX really only adds 3-operand support and 256-bit op support, and because we are working on xyza, 256-bit won't give you much.
AVX2 adds the same as AVX but for integer, plus some memory/register/data manipulation operations.
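
A back-of-the-envelope sketch of the lane-width point above. This is an idealized count of packed operations, assuming 32-bit elements and perfect packing; it ignores latency, execution ports, and everything else about a real core. The function names are made up for illustration.

```python
import math


def lanes(register_bits: int, element_bits: int = 32) -> int:
    """How many elements fit in one register of the given width."""
    return register_bits // element_bits


def ops_needed(components: int, register_bits: int, element_bits: int = 32) -> int:
    """Idealized number of operations to touch `components` elements once."""
    return math.ceil(components / lanes(register_bits, element_bits))


# One xyza vector = 4 components of 32 bits each.
for label, bits in [("scalar 32-bit", 32), ("SSE 128-bit", 128), ("AVX 256-bit", 256)]:
    print(f"{label:14s} ops for one xyza vector: {ops_needed(4, bits)}")
```

In this simplified model a single xyza vector already fills a 128-bit register, so going to 256 bits saves nothing unless two vectors are processed together, whereas going from scalar to packed cuts four operations down to one.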

Now, the build that The Stilt has built has advanced vectorization WRITTEN BY AN INTEL CODE NINJA. They have likely found a way to vectorize code that was otherwise being executed as scalar FP operations. Now the question is: does that advanced vectorization require an operation only found in AVX/AVX2, or can it work on SSE? But either way, by packing those operations together and executing them as one operation, you remove pressure on several bottlenecks that exist in Bulldozer's design.

the obvious ones:
1. the FPU store bandwidth
2. filling the FPU front end

But there could be others:
1. less pressure on decode
2. less L1i thrashing/aliasing
3. less pressure on store pipeline
etc,etc

Again, Blender isn't the problem!
Go build a 2.75 build with AVX2 and watch it perform very close to, if not the same as, 2.75...


EDIT: CON cores are good at SSE and very good at INT SIMD in general, when they came out that was one of their biggest strengths.
 

rvborgh

Member
Apr 16, 2014
195
94
101
Hi Sirmo, I agree... a while back I did some simulated Cinebench runs for a hypothetical Phenom X8 @ 3.5 GHz.

https://www.youtube.com/watch?v=THhZTmkrxzQ

https://www.youtube.com/watch?v=4P9kiwUBL2A

K10 just seems to me to be better at running legacy software (i.e. more balanced: stronger per-core performance than BD, and it would have been more powerful than BD multithreaded). 8.04 is also faster than the i7-2600K (score ~6.83-6.92) for CB MT...

As far as the type of memory used... when I run CB on my older dual Opteron 8439SE machine (6 K10 cores @ 2.8 GHz per socket w/ DDR2 666 ECC memory)... the scores come out proportionately the same as on my quad Opteron 61xx machine (4 sockets w/ DDR3 1600). The type of memory doesn't seem to have all that much effect on the scoring.

I am glad that Ryzen seems to run the legacy stuff well...

 

sm625

Diamond Member
May 6, 2011
8,172
137
106
Can someone with a 6900K hook up a Kill A Watt and give us power consumption readings for Blender, Fritz, and Cinebench?

I'm more interested in relative power consumption vs absolute...
 

Abwx

Lifer
Apr 2, 2011
11,851
4,819
136
8.04 is also faster than i7-2600K (score 7.24) for CB MT....
.

The 2600K scores 6.51 in Cinebench 11.5 while the FX-8350 scores 6.92...

 

itsmydamnation

Diamond Member
Feb 6, 2011
3,055
3,857
136
Fixed... thanks.

btw, any processor gurus see any bottle necks in the Zen design vs. say K10, and BD (or any of the recent Intel designs)?
Short answer: no. Long answer: no... lol
The only question is about 256-bit throughput and how far behind Haswell and newer it is per clock.

The core itself looks like a very solid base with one or two interesting things thrown in (the FMA Bridge-style FPU, the stack engine/memfile and the load-store pipeline). What will be interesting is to see how Zen grows, and whether we will see an updated Zen in early/mid 2018 on GF 14nm HP before a "Zen+" in something like 2019.
 

rvborgh

Member
Apr 16, 2014
195
94
101
Should be interesting indeed. One wonders how close to the theoretical limit current day processors are at extracting ILP and if that is why Intel's gains per generation are slowing down. Wasn't AMD at one time working on hardware based speculative multi-threading as well? (although burning power speculating is probably against the spirit of Zen :) ).

 