
Post your Ryzen Blender Demo Scores! (AMD clarifies Blender Benchmark Confusion, Run @ 150 Samples)

Page 12 - AnandTech community

siriq

Junior Member
Oct 18, 2014
15
0
16
Note to siriq...

Your 4.8 overclock doesn't seem optimal after all. I guess I mistook your SIMD results for your stock Blender results or something because your Blender results are lower than my Piledriver results as is your Cinebench multi. In fact, I was able to get 770 as a new high with Cinebench multi at 4.8 (up from two runs at 767).

Using the formula with your SIMD result gets only this score: 37314. Mine was 45455 at 4.8.

With the stock build the result is only 20000. Something is off with your system. Must be throttling.

You said 2:05 (125 seconds) for stock Blender at 150 samples. That's 96000000/125/8/4.8 = 20000. By contrast, I got 23923.44.
I use this CPU in a motherboard that does not officially support the top FX series. Also, my RAM is running at a low clock, only 1700 MHz. Cinebench and rendering software prefer fast RAM settings.

http://www.gigabyte.com/products/product-page.aspx?pid=3417#ov
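
For anyone following the arithmetic, here is a minimal sketch of the score formula used in the quoted note: 96000000 divided by render time in seconds, core count and clock in GHz. The constant and the inputs are taken from the note as written; the function name is made up for illustration.

```python
# Sketch of the normalisation used in the quoted note:
# score = 96,000,000 / (render time in s * core count * clock in GHz).
# The constant comes from the thread; blender_score is a hypothetical name.

SCALE = 96_000_000


def blender_score(seconds: float, cores: int, ghz: float) -> float:
    """Per-core, per-GHz score; a higher value means more work per clock."""
    if seconds <= 0 or cores <= 0 or ghz <= 0:
        raise ValueError("seconds, cores and ghz must all be positive")
    return SCALE / seconds / cores / ghz


# 2:05 (125 s) on 8 cores at 4.8 GHz, the stock-build figure from the note.
print(round(blender_score(125, 8, 4.8)))  # → 20000
```

Because the score divides by both cores and clock, it says nothing about total throughput. It only compares how much each core gets done per GHz.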
 

nismotigerwvu

Golden Member
May 13, 2004
1,568
33
91
Rather interesting... it seems the improved Llano K10 core gives 99% of the performance of K10 (on this benchmark) per core at the same frequency, but without L3 cache. I really wonder how Llano would have done with L3 (39037 vs 39500).
That's always been one of my biggest tech "what-ifs" of the last few years. In general the L3 cache gave the Phenom II a ~10% advantage over a similarly clocked Athlon II. Llano was effectively neck and neck with a similarly clocked Phenom II, or again about 10% ahead of the Athlon II. Now that could entirely be chalked up to the move from a 4x512kB L2 to a 4x1MB arrangement, and it might have lessened the gains from an L3. Regardless, I still would love to have seen an 8-core variant with the iGPU cut and a decent-sized L3 slapped on. Considering how often the FX line failed to provide a meaningful boost in multi-threaded performance over Thuban, an additional pair of cores and some minor architectural tweaks might have been enough to negate any need for Bulldozer. Considering how nicely the GF 32nm SHP process matured, we might have seen these chips breaking 4 GHz too.
 
  • Like
Reactions: rvborgh

rvborgh

Member
Apr 16, 2014
190
80
101
I would have loved to have seen a 4-wide version of Thuban (Llano plus?), with L3 as well, and for AMD to have fixed the retire issue.

https://m.reddit.com/r/Amd/comments/475oya/what_should_have_been/?ref=readnext_9

 
  • Like
Reactions: nismotigerwvu

superstition

Platinum Member
Feb 2, 2008
2,219
216
101
The Stilt's testing suggested that L3 is overrated in importance on Piledriver because it's slow. Since Piledriver seems to be able to equal or beat Phenom in a lot of areas, once clocks are raised to compensate for its design choice to rely on clocks, I wonder how much L3 would matter for Phenom. Maybe it would matter more. However, if Piledriver's L3 had been faster then maybe it would have helped it more as well. Having two extra cores plus more L3 may have been enough, though, to make Bulldozer/Piledriver unnecessary. I don't know. This is all guesswork. It's clear, though, that AMD left Phenom behind for a reason.

I'm just guessing but I bet the biggest thing that held back Piledriver (besides poorly-optimized software including Windows 7) is the lack of Sandy's level of μop caching. That was the thing that the Anandtech review cited. But, given the slow RAM performance of Piledriver (although with a lot lower latency than Steamroller and Excavator according to The Stilt), having fast L3 would have also probably been helpful.
 

superstition

Platinum Member
Feb 2, 2008
2,219
216
101
siriq said:
I use this CPU in a motherboard that does not officially support the top FX series. Also, my RAM is running at a low clock, only 1700 MHz. Cinebench and rendering software prefer fast RAM settings.
I would back off the MHz and see what kind of score you can get. Are you using BCLK to overclock? Can you use just multiplier? Can you set the FSB to 2400? From what I recall of my testing, I don't think RAM speed, in MHz, is actually all that important for Cinebench R15 and Piledriver. I did, though, see a big boost from bumping up CPU NB voltage once when testing at 4.6. That's probably because I was using such aggressive RAM timings and the CPU NB just needed more power to handle it. I don't think the 2133 9-11-10-30-1T is necessarily that much of an improvement over something like 1800 or 1900 which is what I was originally running.

One thing I noticed is that I got better scores with my 8320E and my old OCZ 2 X 2 RAM sticks. The system only had 4 GB of RAM with either set because they wouldn't run together. One of my 16 GB (2 X 8) sets is double sided, so I don't know why rank interleaving would be such an advantage for those double-sided OCZ sets and not the Patriot. I also have a Crucial set. The best Cinebench scores were with either OCZ set back when I was on air (for the GHz speeds I could actually reach). Maybe Piledriver's IMC somehow produces better performance with less RAM. Maybe Cinebench runs faster with less RAM. I have no idea. I never ran those OCZ sets very fast (~1600 I think) but they were at 8-9-8-1T.
 

superstition

Platinum Member
Feb 2, 2008
2,219
216
101
superstition said:
Blender stock

51306.17 — Lynnfield 3.8 4/4
25806.45 — PD 5 GHz CMT off
25680.53 — PD 4.4 GHz CMT off
23944.45 — PD 4.4 GHz CMT on
superstition said:
Blender "SIMD"

53952 — Piledriver 4.4 GHz CMT off 1:41.1
53935 — i5 750 Lynnfield, 1600 RAM 1:57.1
53752 — Piledriver 5 GHz CMT off, 1:29.36
45683 — Piledriver 4.4 GHz CMT on, 0:59.7
Despite the lacking nature of the equation it certainly shows us one thing...
 

bjt2

Senior member
Sep 11, 2016
784
180
86
As I suggested in other posts, try to keep the NB multiplier to a non-prime number (12, 15 or 16) and keep the core multiplier under double that. Then raise BCLK. I would suggest NB 2400, core 4800 (best because the multipliers are exact multiples) and then raise BCLK... Try to reach 5200 and then try with BCLK=100, NB=2600 and core=5200 to see if there are differences with a prime multiplier. If the core doesn't reach 5200 in the first try, lower BCLK under 100 MHz in the second try with the 26x and 52x multipliers to match the clock of the first try...
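
A rough sketch of the two test configurations described above, where each clock is base clock times multiplier. It takes the 100 MHz base clock and the 26x/52x figures from the post at face value. The 24x/48x pair for the 2400/4800 case is an inference from those numbers and is not stated in the post, and the function names are hypothetical.

```python
# Sketch only: clock = base clock * multiplier.
# 100 MHz base and 26x/52x come from the post; 24x/48x is an assumption
# inferred from "NB 2400, core 4800". Function names are hypothetical.


def clocks(bclk_mhz, nb_mult, core_mult):
    """Return (NB clock, core clock) in MHz for a base clock and multipliers."""
    return bclk_mhz * nb_mult, bclk_mhz * core_mult


def bclk_for_core(target_core_mhz, core_mult):
    """Base clock needed to reach a target core clock at a fixed multiplier."""
    return target_core_mhz / core_mult


target = 5200  # core clock to aim for, in MHz

# First try: keep 24x/48x and raise the base clock until the core hits target.
bclk_a = bclk_for_core(target, 48)
nb_a, core_a = clocks(bclk_a, 24, 48)

# Second try: 26x/52x at whatever base clock gives the same core clock.
bclk_b = bclk_for_core(target, 52)
nb_b, core_b = clocks(bclk_b, 26, 52)

print(f"24x/48x: bclk {bclk_a:.2f} MHz, NB {nb_a:.0f} MHz, core {core_a:.0f} MHz")
print(f"26x/52x: bclk {bclk_b:.2f} MHz, NB {nb_b:.0f} MHz, core {core_b:.0f} MHz")
```

If the first try tops out below 5200, putting that lower clock in `target` gives a base clock under 100 MHz for the 26x/52x run, which is the matching step the post describes.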
 

siriq

Junior Member
Oct 18, 2014
15
0
16
I use 4 RAM modules, which has some negative impact on performance. My NB is running at 2544 MHz. My weakness is the RAM at 9-10-9-1T @ 1700 MHz. Tested it. Also, the BIOS is not well optimized. I got the beta from Gigabyte, which allows me to run this CPU. No throttling etc. Everything is set for max performance.

Still, compared to officially supported boards, this rig is doing fine, within the margin of error on performance. Quite happy with it. On this motherboard I can do 5+ GHz, but only with suboptimal settings. So I use the best combination of settings I can with the hardware I have.

I can render animations 24/7 with these settings at the best performance. Already tried it. The longest run was over a week. APM was off, of course.
 

batlin211

Junior Member
Jan 15, 2017
1
0
1
Intel Xeon ES 2670v3, 2.2 GHz / 2.5 GHz turbo. (The real non-ES version runs faster: 2.3 GHz / 3.1 GHz turbo.)
Matches the 36-second render time @ 150 samples with 12 cores / 24 threads.

Ryzen is looking good if it is getting 36 second results. Can't wait for the release!
 

jhu

Lifer
Oct 10, 1999
11,919
8
81
Code:
Debian 9, Blender 2.78a, MSM8974AB (HTC One M8)
custom compiled (gcc 6.1.1): 12 minutes 41 seconds

Debian 8, Blender 2.72 32-bit, MSM8992 (Snapdragon 808, Nextbit Robin)
repository binary: 9 minutes 54 seconds

Debian 8, Blender 2.78 64-bit, MSM8992 (Snapdragon 808, Nextbit Robin)
custom compiled (gcc 6.2.1): 5 minutes 43 seconds

Debian 9, Blender 2.78a, Core i5 6200U
official binary: 2 minutes 3 seconds
custom compiled (gcc 6.2.1): 2 minutes 1 second

Debian 9, Blender 2.78a, FX-8350
official binary: 1 minute 8 seconds
custom compiled (gcc 6.2.1): 1 minute 3 seconds

Windows 10, Blender 2.78a, Core i5 6200U
official binary: 3 minutes 26 seconds
Stilt AVX2 MSVC: 2 minutes 25 seconds
Stilt SIMD: 2 minutes 19 seconds
Stilt's builds from here.
Looks like the platform has a noticeable impact too.
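
To put a number on the platform gap, here is a small sketch that converts the Core i5 6200U times listed above into seconds and compares them with the Debian official binary. The dictionary labels and the function name are made up for illustration; the times are copied from the listing.

```python
import re

# Render times copied from the listing above (Core i5 6200U, Blender 2.78a).
TIMES = {
    "Debian 9, official binary": "2 minutes 3 seconds",
    "Debian 9, custom gcc 6.2.1": "2 minutes 1 second",
    "Windows 10, official binary": "3 minutes 26 seconds",
    "Windows 10, Stilt SIMD": "2 minutes 19 seconds",
}


def to_seconds(text):
    """Parse 'M minute(s) S second(s)' into a whole number of seconds."""
    match = re.fullmatch(r"(\d+) minutes? (\d+) seconds?", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


baseline = to_seconds(TIMES["Debian 9, official binary"])
for label, text in TIMES.items():
    secs = to_seconds(text)
    print(f"{label}: {secs} s, {secs / baseline:.2f}x the Debian official time")
```

On these numbers the official Windows build takes roughly 1.67x as long as the official Linux build on the same CPU (206 s vs 123 s).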
 

siriq

Junior Member
Oct 18, 2014
15
0
16
Check this out: AMD RyZEN Blender Benchmark
107 CPUs tested in Blender running 100 samples.
It is interesting to see the top 100 with this test. Ryzen SR7 - 3.4 GHz (no turbo): 25 seconds at 100 samples; it's in 10th place (the list is updated).
I just did a very quick test with 100 samples on an FX 8350 @ 4.8 with a lot of background programs running: 00:46:13

Just did one more with fewer programs running: 00:44:70

With the latest SIMD build, I got 00:38:29 :D

150 samples: 00:57:76 :D 8-9 sec faster :D
 

superstition

Platinum Member
Feb 2, 2008
2,219
216
101
jhu said:
Looks like the platform has a noticeable impact too.
Very useful information. It suggests, among other things, that Intel's compiler isn't needed to get good performance with Blender.
 
  • Like
Reactions: Drazick

beginner99

Diamond Member
Jun 2, 2009
4,742
1,145
136
jhu said:
Looks like the platform has a noticeable impact too.
Yeah. It can have a pretty huge one. At work I use an open-source library (written in C, but we use the Python API) a lot on Windows, but it's much faster on Linux, and the developers have not yet figured out why there is such a huge difference. It is in the 30% range.
 

jhu

Lifer
Oct 10, 1999
11,919
8
81
That's not what some have said so it's nice to see data that shows one doesn't have to necessarily rely on Intel's compiler.
We've had data in this forum for several years showing gcc about equivalent to icc. Intel's MKL, however, seems to be top notch and faster than the open source equivalents.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Based on my experience, GCC does just fine as long as legacy (<= SSE4.2) instructions are used. However, as soon as FMA or AVX/AVX2 is used, it (and MSVC) starts to fall behind. The difference can sometimes be pretty extreme: in a generic FMA-heavy N-body workload, I've seen ICC 2017 produce 7x faster code than GCC/MSVC regardless of the optimizations used. The auto-parallelization and vectorization in ICC are completely superior to GCC/MSVC. Where GCC needs to be "spoon fed", ICC provides truly automatic parallelization and vectorization.
 
  • Like
Reactions: Drazick

Nothingness

Platinum Member
Jul 3, 2013
2,160
406
126
Even though I have little doubt the icc vectorizer is better than the gcc one, a 7x speedup likely means you hit a performance bug in gcc, or an area where some specific icc optimization enables vectorization. The Tycho nbody package shows that gcc is competitive with icc when the proper AoS vs SoA layout is used. Given that icc broke libquantum in part due to AoS -> SoA tricks, it makes sense that this can be applied to other software like Tycho nbody.
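
For readers unfamiliar with the AoS vs SoA distinction mentioned above, here is a minimal sketch of the two layouts for a handful of bodies. It only illustrates how the data is arranged; Python does not auto-vectorize either form, and the field names and functions are invented for the example. In a compiled language the SoA form gives each field its own contiguous, unit-stride array, which is generally the easier case for an auto-vectorizer.

```python
from array import array

# Array of structures (AoS): one record per body, fields interleaved.
# Field names and values are invented for illustration.
bodies_aos = [
    {"x": 0.0, "y": 0.0, "z": 0.0, "m": 1.0},
    {"x": 1.0, "y": 2.0, "z": 2.0, "m": 3.0},
    {"x": 4.0, "y": 0.0, "z": 3.0, "m": 2.0},
]


def aos_to_soa(bodies):
    """Structure of arrays (SoA): one contiguous array of doubles per field."""
    return {f: array("d", (b[f] for b in bodies)) for f in ("x", "y", "z", "m")}


def centre_of_mass_x_aos(bodies):
    """Walk the records, pulling two fields out of each one."""
    total = sum(b["m"] for b in bodies)
    return sum(b["m"] * b["x"] for b in bodies) / total


def centre_of_mass_x_soa(soa):
    """Walk two flat arrays in lockstep; each array is read front to back."""
    total = sum(soa["m"])
    return sum(m * x for m, x in zip(soa["m"], soa["x"])) / total


bodies_soa = aos_to_soa(bodies_aos)
print(centre_of_mass_x_aos(bodies_aos), centre_of_mass_x_soa(bodies_soa))
```

Both functions compute the same value; only the memory layout they read from differs.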
 
