Post your Ryzen Blender Demo Scores! (AMD clarifies Blender Benchmark Confusion, Run @ 150 Samples)

siriq · Jan 13, 2017

superstition said:
Note to siriq...

Your 4.8 overclock doesn't seem optimal after all. I guess I mistook your SIMD results for your stock Blender results or something because your Blender results are lower than my Piledriver results as is your Cinebench multi. In fact, I was able to get 770 as a new high with Cinebench multi at 4.8 (up from two runs at 767).

Using the formula with your SIMD result gets only this score: 37314. Mine was 45455 at 4.8.

With the stock build the result is only 20000. Something is off with your system. Must be throttling.

You said 2:05 (125 seconds) for stock Blender at 150 samples. That's 96000000/125/8/4.8 = 20000. By contrast, I got 23923.44

I'm running this CPU in a motherboard that doesn't officially support the top FX series. Also, my RAM is running at a low speed, only 1700 MHz. Cinebench and rendering software prefer faster RAM settings.

http://www.gigabyte.com/products/product-page.aspx?pid=3417#ov
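For anyone who wants to check the numbers, the normalization quoted above (96000000 / seconds / cores / GHz) can be reproduced with a few lines of Python. This is only a sketch: the constant is taken as-is from the post (the thread does not say where it comes from), and the function names are made up for illustration.

```python
# Constant copied from the quoted post; its derivation is not given in the thread.
SCALE = 96_000_000


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a m:ss render time to seconds, e.g. 2:05 -> 125."""
    return minutes * 60 + seconds


def normalized_score(seconds: float, cores: int, ghz: float) -> float:
    """Render time normalized per core and per GHz, as in the quoted formula."""
    return SCALE / seconds / cores / ghz


if __name__ == "__main__":
    # 2:05 at 150 samples on 8 cores at 4.8 GHz, the case from the quoted post.
    print(round(normalized_score(to_seconds(2, 5), 8, 4.8), 2))
```

Because the score divides by both core count and clock, it only says how much work each core does per GHz; a chip can score lower here and still finish the render sooner.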

nismotigerwvu · Jan 13, 2017

rvborgh said:
Rather interesting... it seems that the improved Llano K10 core gives 99% of the performance of K10 per core (on this benchmark) at the same frequency, but without L3 cache. I really wonder how Llano would have done with L3 (39037 vs 39500).

That's always been one of my biggest tech "what-ifs" of the last few years. In general the L3 cache gave the Phenom II a ~10% advantage over a similarly clocked Athlon II. Llano was effectively neck and neck with a similarly clocked Phenom II, or again about 10% ahead of the Athlon II. Now that could entirely be chalked up to the move from a 4x512 kB L2 to a 4x1 MB arrangement, and it might have lessened the gains from an L3. Regardless, I still would love to have seen what an 8-core variant with the iGPU cut and a decent-sized L3 slapped on could have done. Considering how often the FX line failed to provide a meaningful boost in multi-threaded performance over Thuban, an additional pair of cores and some minor architectural tweaks might have been enough to negate any need for Bulldozer. Considering how nicely the GF 32nm SHP process matured, we might have seen these chips breaking 4 GHz too.

rvborgh · Jan 14, 2017

I would have loved to see a 4-wide version of Thuban (Llano plus?), with L3 as well, and for AMD to have fixed the retire issue.

https://m.reddit.com/r/Amd/comments/475oya/what_should_have_been/?ref=readnext_9

nismotigerwvu said:
That's always been one of my biggest tech "what-ifs" of the last few years. In general the L3 cache gave the Phenom II a ~10% advantage over a similarly clocked Athlon II. Llano was effectively neck and neck with a similarly clocked Phenom II, or again about 10% ahead of the Athlon II. Now that could entirely be chalked up to the move from a 4x512 kB L2 to a 4x1 MB arrangement, and it might have lessened the gains from an L3. Regardless, I still would love to have seen what an 8-core variant with the iGPU cut and a decent-sized L3 slapped on could have done. Considering how often the FX line failed to provide a meaningful boost in multi-threaded performance over Thuban, an additional pair of cores and some minor architectural tweaks might have been enough to negate any need for Bulldozer. Considering how nicely the GF 32nm SHP process matured, we might have seen these chips breaking 4 GHz too.

nismotigerwvu · Jan 14, 2017

You know, you can totally see how a 4-wide Stars+ could fairly easily evolve into what Zen turned out to be.

superstition · Jan 15, 2017

nismotigerwvu said:
That's always been one of my biggest tech "what-ifs" of the last few years. In general the L3 cache gave the Phenom II a ~10% advantage over a similarly clocked Athlon II. Llano was effectively neck and neck with a similarly clocked Phenom II, or again about 10% ahead of the Athlon II. Now that could entirely be chalked up to the move from a 4x512 kB L2 to a 4x1 MB arrangement, and it might have lessened the gains from an L3. Regardless, I still would love to have seen what an 8-core variant with the iGPU cut and a decent-sized L3 slapped on could have done. Considering how often the FX line failed to provide a meaningful boost in multi-threaded performance over Thuban, an additional pair of cores and some minor architectural tweaks might have been enough to negate any need for Bulldozer. Considering how nicely the GF 32nm SHP process matured, we might have seen these chips breaking 4 GHz too.

The Stilt's testing suggested that L3 is overrated in importance on Piledriver because it's slow. So, since Piledriver seems to be able to equal or beat Phenom in a lot of areas once clocks are raised to compensate for its design choice to rely on clocks, I wonder how much L3 would matter for Phenom. Maybe it would matter more. However, if Piledriver's L3 had been faster then maybe it would have helped it more as well. Having two extra cores plus more L3 may have been enough, though, to make Bulldozer/Piledriver unnecessary. I don't know. This is all guesswork. It's clear, though, that AMD left Phenom behind for a reason.

I'm just guessing but I bet the biggest thing that held back Piledriver (besides poorly-optimized software including Windows 7) is the lack of Sandy's level of μop caching. That was the thing that the Anandtech review cited. But, given the slow RAM performance of Piledriver (although with a lot lower latency than Steamroller and Excavator according to The Stilt), having fast L3 would have also probably been helpful.

superstition · Jan 15, 2017

siriq said:
I use this CPU in a motherboard, does not officialy supporting top FX series. Also, my ram is running on low mhz. Only 1700. Cinebench and render software's are preferring high ram settings.

I would back off the MHz and see what kind of score you can get. Are you using BCLK to overclock? Can you use just multiplier? Can you set the FSB to 2400? From what I recall of my testing, I don't think RAM speed, in MHz, is actually all that important for Cinebench R15 and Piledriver. I did, though, see a big boost from bumping up CPU NB voltage once when testing at 4.6. That's probably because I was using such aggressive RAM timings and the CPU NB just needed more power to handle it. I don't think the 2133 9-11-10-30-1T is necessarily that much of an improvement over something like 1800 or 1900 which is what I was originally running.

One thing I noticed is that I got better scores with my 8320E and my old OCZ 2 X 2 RAM sticks. The system only had 4 GB of RAM with either set because they wouldn't run together. One of my 16 GB (2 X 8) sets is double-sided, so I don't know why rank interleaving would be such an advantage for those double-sided OCZ sets and not the Patriot. I also have a Crucial set. The best Cinebench scores were with either OCZ set back when I was on air (for the GHz speeds I could actually reach). Maybe Piledriver's IMC somehow produces better performance with less RAM. Maybe Cinebench runs faster with less RAM. I have no idea. I never ran those OCZ sets very fast (~1600 I think) but they were at 8-9-8-1T.

superstition · Jan 15, 2017

superstition said:
Blender stock

51306.17 — Lynnfield 3.8 4/4
25806.45 — PD 5 GHz CMT off
25680.53 — PD 4.4 GHz CMT off
23944.45 — PD 4.4 GHz CMT on

superstition said:
Blender "SIMD"

53952 — Piledriver 4.4 GHz CMT off 1:41.1
53935 — i5 750 Lynnfield, 1600 RAM 1:57.1
53752 — Piledriver 5 GHz CMT off, 1:29.36
45683 — Piledriver 4.4 GHz CMT on, 0:59.7

Despite the lacking nature of the equation it certainly shows us one thing...

bjt2 · Jan 15, 2017

superstition said:
I would back off the MHz and see what kind of score you can get. Are you using BCLK to overclock? Can you use just multiplier? Can you set the FSB to 2400? From what I recall of my testing, I don't think RAM speed, in MHz, is actually all that important for Cinebench R15 and Piledriver. I did, though, see a big boost from bumping up CPU NB voltage once when testing at 4.6. That's probably because I was using such aggressive RAM timings and the CPU NB just needed more power to handle it. I don't think the 2133 9-11-10-30-1T is necessarily that much of an improvement over something like 1800 or 1900 which is what I was originally running.

One thing I noticed is that I got better scores with my 8320E and my old OCZ 2 X 2 RAM sticks. The system only had 4 GB of RAM with either set because they wouldn't run together. One of my 16 GB (2 X 8) sets is double-sided, so I don't know why rank interleaving would be such an advantage for those double-sided OCZ sets and not the Patriot. I also have a Crucial set. The best Cinebench scores were with either OCZ set back when I was on air (for the GHz speeds I could actually reach). Maybe Piledriver's IMC somehow produces better performance with less RAM. Maybe Cinebench runs faster with less RAM. I have no idea. I never ran those OCZ sets very fast (~1600 I think) but they were at 8-9-8-1T.

As I suggested in other posts, try to keep the NB multiplier at a non-prime number (12, 15 or 16) and keep the core multiplier under double that. Then raise BCLK. I would suggest NB 2400, core 4800 (best because the multipliers are an exact multiple) and then raise BCLK... Try to reach 5200, and then try with BCLK=100, NB=2600 and core=5200 to see if there are differences with a prime multiplier. If the core doesn't reach 5200 in the first try, lower BCLK under 100 MHz in the second try with the 26x and 52x multipliers to match the clock of the first try...
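The last step above is just arithmetic: pick the BCLK that makes the 52x core multiplier land on whatever clock the first try reached. A rough sketch follows, using the post's own convention of a 100 MHz BCLK with 26x/52x multipliers; the 5100 MHz figure is a made-up example of a first-try result, and the function names are hypothetical.

```python
def clocks(bclk_mhz: float, nb_mult: float, core_mult: float) -> tuple:
    """Return (NB clock, core clock) in MHz for a given BCLK and multipliers."""
    return bclk_mhz * nb_mult, bclk_mhz * core_mult


def bclk_to_match(target_core_mhz: float, core_mult: float) -> float:
    """BCLK needed so that core_mult lands exactly on target_core_mhz."""
    return target_core_mhz / core_mult


if __name__ == "__main__":
    # Hypothetical: the first try topped out at 5100 MHz instead of 5200.
    first_try_core = 5100.0
    bclk = bclk_to_match(first_try_core, 52)
    nb, core = clocks(bclk, 26, 52)
    print(f"BCLK {bclk:.2f} MHz -> NB {nb:.0f} MHz, core {core:.0f} MHz")
```

With both runs at the same core clock, any remaining score difference can be attributed to the multiplier choice rather than to frequency.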

siriq · Jan 15, 2017

superstition said:
I would back off the MHz and see what kind of score you can get. Are you using BCLK to overclock? Can you use just multiplier? Can you set the FSB to 2400? From what I recall of my testing, I don't think RAM speed, in MHz, is actually all that important for Cinebench R15 and Piledriver. I did, though, see a big boost from bumping up CPU NB voltage once when testing at 4.6. That's probably because I was using such aggressive RAM timings and the CPU NB just needed more power to handle it. I don't think the 2133 9-11-10-30-1T is necessarily that much of an improvement over something like 1800 or 1900 which is what I was originally running.

One thing I noticed is that I got better scores with my 8320E and my old OCZ 2 X 2 RAM sticks. The system only had 4 GB of RAM with either set because they wouldn't run together. One of my 16 GB (2 X 8) sets is double-sided, so I don't know why rank interleaving would be such an advantage for those double-sided OCZ sets and not the Patriot. I also have a Crucial set. The best Cinebench scores were with either OCZ set back when I was on air (for the GHz speeds I could actually reach). Maybe Piledriver's IMC somehow produces better performance with less RAM. Maybe Cinebench runs faster with less RAM. I have no idea. I never ran those OCZ sets very fast (~1600 I think) but they were at 8-9-8-1T.

I use 4 RAM modules, which has a negative impact on performance. My NB is running at 2544 MHz. My weakness is the RAM at 9-10-9-1T @ 1700 MHz. I tested it. Also, the BIOS is not well optimized. I got the beta from Gigabyte, which allows me to run this CPU. No throttling, etc. Everything is set for max performance.

Still, compared to officially supported boards, this rig is doing just fine, within the margin of error on performance. Quite happy with it. On this motherboard I can do 5+ GHz, but with suboptimal settings. So I use the best combination of settings I can with the hardware I have.

I can render animations 24/7 with these settings at the best performance. Already tried it. The longest was over a week. APM was off, of course.

batlin211 · Jan 15, 2017

Intel Xeon ES 2670v3, 2.2 GHz / 2.5 GHz turbo. (The real non-ES version runs faster: 2.3 GHz / 3.1 GHz turbo.)
Matches the 36-second render time @ 150 samples with 12 cores / 24 threads.

Ryzen is looking good if it is getting 36 second results. Can't wait for the release!

superstition · Jan 16, 2017

I think it has been said that Blender isn't scaling well for that many threads.

rvborgh · Jan 16, 2017

48 Blender threads scale very well on my overclocked 48 core quad Opteron 61xx rig.

superstition said:
I think it has been said that Blender isn't scaling well for that many threads.

wembley · Jan 16, 2017

Check this out: AMD RyZEN Blender Benchmark
107 CPUs tested in Blender running 100 samples.
It is interesting to see the top 100 in this test. Ryzen SR7 at 3.4 GHz (no turbo), 25 seconds at 100 samples, is in 10th place (the list is updated)

itsmydamnation · Jan 16, 2017

the stupid bot got the samples wrong....... needs Zen powered AI.......

bjt2 · Jan 16, 2017

wembley said:
Check this out: AMD RyZEN Blender Benchmark
107 CPUs tested in Blender running 100 samples.
It is interesting to see the top 100 in this test. Ryzen SR7 at 3.4 GHz (no turbo), 25 seconds at 100 samples, is in 10th place (the list is updated)

And many of the 10 are OCed CPUs...

jhu · Jan 17, 2017

Code:

Debian 9, Blender 2.78a, MSM8974AB (HTC One M8)
custom compiled (gcc 6.1.1): 12 minutes 41 seconds

Debian 8, Blender 2.72 32-bit, MSM8992 (Snapdragon 808, Nextbit Robin)
repository binary: 9 minutes 54 seconds

Debian 8, Blender 2.78 64-bit, MSM8992 (Snapdragon 808, Nextbit Robin)
custom compiled (gcc 6.2.1): 5 minutes 43 seconds

Debian 9, Blender 2.78a, Core i5 6200U
official binary: 2 minutes 3 seconds
custom compiled (gcc 6.2.1): 2 minutes 1 second

Debian 9, Blender 2.78a, FX-8350
official binary: 1 minute 8 seconds
custom compiled (gcc 6.2.1): 1 minute 3 seconds

Windows 10, Blender 2.78a, Core i5 6200U
official binary: 3 minutes 26 seconds
Stilt AVX2 MSVC: 2 minutes 25 seconds
Stilt SIMD: 2 minutes 19 seconds

Stilt's builds from here.
Looks like the platform has a noticeable impact too.
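To put a rough number on the platform and build differences, the timings listed above can be converted to seconds and compared. This is only a sketch using figures from the post; the helper names `parse_duration` and `speedup` are made up for illustration.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a string like '2 minutes 3 seconds' to total seconds."""
    minutes = re.search(r"(\d+)\s+minutes?", text)
    seconds = re.search(r"(\d+)\s+seconds?", text)
    total = 0
    if minutes:
        total += int(minutes.group(1)) * 60
    if seconds:
        total += int(seconds.group(1))
    return total


def speedup(baseline: str, candidate: str) -> float:
    """How many times faster the candidate run is than the baseline run."""
    return parse_duration(baseline) / parse_duration(candidate)


if __name__ == "__main__":
    # Core i5 6200U, Blender 2.78a, official binaries: Windows 10 vs Debian 9.
    print(round(speedup("3 minutes 26 seconds", "2 minutes 3 seconds"), 2))
    # Same CPU on Windows 10: official binary vs Stilt SIMD build.
    print(round(speedup("3 minutes 26 seconds", "2 minutes 19 seconds"), 2))
```

On these figures the Debian official binary comes out roughly 1.67x faster than the Windows official binary on the same i5 6200U, while the Stilt SIMD build is about 1.48x faster than the official Windows build.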

siriq · Jan 17, 2017

wembley said:
Check this out: AMD RyZEN Blender Benchmark
107 CPUs tested in Blender running 100 samples.
It is interesting to see the top 100 in this test. Ryzen SR7 at 3.4 GHz (no turbo), 25 seconds at 100 samples, is in 10th place (the list is updated)

I just did a very quick test with 100 samples on an FX 8350 @ 4.8 with a lot of background programs: 00:46:13

Just did one more with fewer programs: 00:44:70

With the latest SIMD build, I got 00:38:29

150 samples : 00:57:76

8-9 sec faster

superstition · Jan 19, 2017

jhu said:

Code:

Debian 8, Blender 2.72 32-bit, MSM8992 (Snapdragon 808, Nextbit Robin)
9 minutes 54 seconds

Debian 8, Blender 2.78 64-bit, MSM8992 (Snapdragon 808, Nextbit Robin)
custom compiled (gcc 6.2.1): 5 minutes 43 seconds

Debian 9, Blender 2.78a, Core i5 6200U
official binary: 2 minutes 3 seconds
custom compiled (gcc 6.2.1): 2 minutes 1 second

Windows 10, Blender 2.78a, Core i5 6200U
official binary: 3 minutes 26 seconds
Stilt AVX2 MSVC: 2 minutes 25 seconds
Stilt SIMD: 2 minutes 19 seconds

Stilt's builds from here.
Looks like the platform has a noticeable impact too.

Very useful information. It suggests, among other things, that Intel's compiler isn't needed to get good performance with Blender.

jhu · Jan 21, 2017

superstition said:
Very useful information. It suggests, among other things, that Intel's compiler isn't needed to get good performance with Blender.

Not at all. GCC produces decently fast code. Also added some ARM chips in my previous post.

superstition · Jan 21, 2017

jhu said:
GCC produces decently fast code.

That's not what some have said so it's nice to see data that shows one doesn't have to necessarily rely on Intel's compiler.

beginner99 · Jan 21, 2017

jhu said:
Looks like the platform has a noticeable impact too.

Yeah. It can have a pretty huge one. At work I use an open-source library (C, but we use the Python API) a lot on Windows, but it's much faster on Linux, and the developers have not yet figured out why there is such a huge difference. It is in the 30% range.

jhu · Jan 21, 2017

superstition said:
That's not what some have said so it's nice to see data that shows one doesn't have to necessarily rely on Intel's compiler.

We've had data in this forum for several years showing gcc about equivalent to icc. Intel's MKL, however, seems to be top notch and faster than the open source equivalents.

The Stilt · Jan 22, 2017

Based on my experience GCC does just fine as long as legacy (<= SSE4.2) instructions are used. However, as soon as FMA or AVX/2 is used it (and MSVC) starts to fall behind. The difference can sometimes be pretty extreme: in the FMA-heavy generic N-Body workload, I've seen ICC 2017 producing 7x faster code than GCC/MSVC regardless of the optimizations used. The auto parallelization / vectorization in ICC is completely superior to GCC/MSVC. Where GCC needs to be "spoon fed", ICC provides truly automatic parallelization and vectorization.

Nothingness · Jan 22, 2017

The Stilt said:
Based on my experience GCC does just fine as long as legacy (<= SSE4.2) instructions are used. However, as soon as FMA or AVX/2 is used it (and MSVC) starts to fall behind. The difference can sometimes be pretty extreme: in the FMA-heavy generic N-Body workload, I've seen ICC 2017 producing 7x faster code than GCC/MSVC regardless of the optimizations used. The auto parallelization / vectorization in ICC is completely superior to GCC/MSVC. Where GCC needs to be "spoon fed", ICC provides truly automatic parallelization and vectorization.

Even though I have little doubt the icc vectorizer is better than the gcc one, a 7x speedup likely means you hit a performance bug in gcc, or an area where some specific icc optimization enables vectorization. The Tycho nbody package shows that gcc is competitive with icc when the proper AoS vs SoA layout is used. Given that icc broke libquantum in part due to AoS -> SoA tricks, it makes sense this can be applied to other software like Tycho nbody.
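For readers unfamiliar with the AoS vs SoA distinction mentioned above, here is a small illustration of the two layouts. It is only a sketch of the data arrangement: it is not code from Tycho or libquantum, all names are hypothetical, and CPython will not vectorize either version. In a compiled language the SoA form is the one that hands the vectorizer long contiguous runs of the same field.

```python
from array import array

FIELDS = ("x", "y", "z", "mass")

# Array of Structures (AoS): one record per body, all fields grouped per body.
bodies_aos = [
    {"x": 0.0, "y": 0.0, "z": 0.0, "mass": 2.0},
    {"x": 1.0, "y": 0.0, "z": 0.0, "mass": 1.0},
    {"x": 0.0, "y": 3.0, "z": 0.0, "mass": 1.0},
]


def to_soa(bodies):
    """Structure of Arrays (SoA): one contiguous array of doubles per field."""
    return {f: array("d", (b[f] for b in bodies)) for f in FIELDS}


def center_of_mass_aos(bodies):
    """Mass-weighted mean position, walking record by record."""
    total = sum(b["mass"] for b in bodies)
    return tuple(sum(b[k] * b["mass"] for b in bodies) / total for k in "xyz")


def center_of_mass_soa(soa):
    """Same result, but each pass streams through whole per-field arrays."""
    total = sum(soa["mass"])
    return tuple(
        sum(c * m for c, m in zip(soa[k], soa["mass"])) / total for k in "xyz"
    )


if __name__ == "__main__":
    print(center_of_mass_aos(bodies_aos))
    print(center_of_mass_soa(to_soa(bodies_aos)))
```

Both functions compute the same thing; only the memory arrangement differs, which is why a compiler that can do the AoS -> SoA conversion on its own may look dramatically faster on code written in the AoS style.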

siriq · Feb 4, 2017

Just made a quick video with the result :
https://www.youtube.com/watch?v=-4qWKJf6je4
