Some Cinebench R15 FX-8xxx module/thread scaling results

Page 2

Phynaz

Lifer
Mar 13, 2006
All you have posted is a non sequitur. Made-up CPU inefficiencies still can't get you to greater scaling than Amdahl's law allows.

It's good you are going to stop. You should have stopped a while ago, before you had to resort to ad hominem attacks rather than posting proof.
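For reference, Amdahl's law gives the speedup on n cores for a workload whose parallelizable fraction is p as S(n) = 1 / ((1 - p) + p / n). The minimal Python sketch below (the function name and sample values are illustrative, not taken from anyone's data in this thread) checks that S(n) never exceeds n for any p between 0 and 1. Note that the law assumes every core runs at the same fixed speed no matter how many are active, so effects like Turbo clocks or shared caches sit outside the model.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: speedup on n cores when a fraction p of the work is parallel."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99, 1.0):
        for n in (2, 4, 8):
            s = amdahl_speedup(p, n)
            # Even a perfectly parallel workload (p = 1.0) only reaches s == n.
            print(f"p={p:<4} n={n}  speedup={s:.2f}x  (limit {n}x)")
```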
 

TheELF

Diamond Member
Dec 22, 2012
I guess he's arguing that single-core speed is actually faster than his own results would suggest (cache misses and stuff, you know), because, you know... Cinebench is totally crappy software that everyone has been using for how many years now to benchmark CPUs only because it's totally pro-Intel biased, and not because it's one of the few programs that actually uses all of the available resources of a CPU.

But then again, if you look at the Hyper-Threaded Intels, 1 core + 0 core = more than one core, so everything's possible.
 

jhu

Lifer
Oct 10, 1999
AtenRa said:
This may help as well, from my Bulldozer CMT scaling numbers.

Turbo off for all CPUs.

[image: nq7gib.jpg]

[image: go0mqkj]

Those scaling numbers appear off for FX. Going from 1 to 8 threads, I'm getting 680% scaling on Povray and 640% on Blender Cycles.

Your Phenom II scaling numbers show greater than 1:1 scaling for some reason. That seems off, since my Povray and Blender results (not shown) show near 1:1 scaling.
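To put jhu's figures in per-thread terms: a scaling figure of S% across n threads corresponds to a parallel efficiency of S / n, where 100% is perfect 1:1 scaling and anything above 100% would be the superlinear result being argued about. A minimal Python sketch (the helper name is hypothetical):

```python
def parallel_efficiency(scaling_percent: float, threads: int) -> float:
    """Per-thread efficiency in percent: 100 means perfect 1:1 scaling."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return scaling_percent / threads


# jhu's FX results going from 1 to 8 threads
for app, scaling in (("Povray", 680.0), ("Blender Cycles", 640.0)):
    eff = parallel_efficiency(scaling, 8)
    # Above 100% would mean better than 1:1 (superlinear) scaling.
    verdict = "superlinear" if eff > 100.0 else "linear or sublinear"
    print(f"{app}: {scaling:.0f}% over 8 threads -> {eff:.1f}% per thread ({verdict})")
```

Both figures land well under 100% per thread, i.e. below 1:1.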
 

AtenRa

Lifer
Feb 2, 2009
jhu said:
Those scaling numbers appear off for FX. Going from 1 to 8 threads, I'm getting 680% scaling on Povray and 640% on Blender Cycles.

Your Phenom II scaling numbers show greater than 1:1 scaling for some reason. That seems off, since my Povray and Blender results (not shown) show near 1:1 scaling.

You tested on Linux; I used Windows 7 64-bit.

Secondly, not every application is the same.

Also, CPU frequencies stayed constant at the base clock, because Turbo was disabled on all three processors.

POV-Ray 3.7 RC (Balcony Project at 1024x768, AA 0.3)

[image: 20uy3wn.jpg]

[image: 2rypv0o.jpg]

[image: fmt95i.jpg]

[image: 2j2i1l0.jpg]
 

jhu

Lifer
Oct 10, 1999
That still doesn't make any sense. The only way scaling could be > 1:1 would be if all the background tasks were also stuck on the same core during the 1-core test, thereby artifactually lowering the measured single-core performance.

Look at the Povray results again. The anomalous result is the 1-core result at 2279 pps. The 2-, 4-, and 6-core results all come in at about 2400 pps/core.

x264 results look about right, with slightly < 1:1 scaling.
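jhu's argument can be checked with the two numbers he quotes: 2279 pps for the 1-core run and roughly 2400 pps per core for the 2-, 4-, and 6-core runs. Taking the multicore per-core rate as the "true" single-core rate, the Python sketch below shows how a depressed baseline produces apparent >1:1 scaling. The 2400 figure is jhu's approximation, and reading the shortfall as time lost to background tasks is his hypothesis, not a measurement.

```python
ONE_CORE_PPS = 2279.0      # measured 1-core Povray result
PER_CORE_PPS_MC = 2400.0   # approximate per-core rate in the 2/4/6-core runs

# How much faster each core looks in the multicore runs than in the 1-core run.
per_core_ratio = PER_CORE_PPS_MC / ONE_CORE_PPS

# Hypothesis: the 1-core run lost this share of its time to background tasks.
background_share = 1.0 - ONE_CORE_PPS / PER_CORE_PPS_MC

for cores in (2, 4, 6):
    apparent = cores * PER_CORE_PPS_MC / ONE_CORE_PPS
    print(f"{cores} cores: apparent scaling {apparent:.2f}x (1:1 would be {cores}x)")

print(f"per-core ratio {per_core_ratio:.3f}, implied background share {background_share:.1%}")
```

Under that hypothesis, background work taking about 5% of the single core would account for the whole gap.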
 

Abwx

Lifer
Apr 2, 2011
In Cinebench 11.5, The Tech Report got 1.03 with Turbo, which works out to 0.927 at 3.6 GHz.

The single-core result for this setup should be about 0.89 if we compare its 8-core score of 5.79 with The Tech Report's 6.03. It looks like the single-thread score is not what it should be, hence the apparent scaling issue.
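Abwx's two steps can be reproduced as plain arithmetic. The first rescales the Turbo score to 3.6 GHz assuming the score scales linearly with clock; his 0.927 implies a Turbo clock of 4.0 GHz (1.03 × 3.6 / 4.0 = 0.927), which is inferred here rather than stated in the post. The second scales that single-core figure by the ratio of the two 8-core scores. A Python sketch with hypothetical names:

```python
def rescale_to_clock(score: float, from_ghz: float, to_ghz: float) -> float:
    """Rescale a score to another clock, assuming linear scaling with frequency."""
    return score * to_ghz / from_ghz


TURBO_GHZ = 4.0  # assumption: the Turbo clock implied by Abwx's 0.927 figure
tr_single_turbo = 1.03
tr_single_base = rescale_to_clock(tr_single_turbo, TURBO_GHZ, 3.6)

# Scale The Tech Report's single-core result by the ratio of the 8-core scores.
tr_multi, this_multi = 6.03, 5.79
expected_single = tr_single_base * this_multi / tr_multi

print(f"Tech Report single core at 3.6 GHz: {tr_single_base:.3f}")
print(f"expected single core for this setup: {expected_single:.2f}")
```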
 
Dec 30, 2004
FWIW, in R15 I can consistently get 99 on a single core when it runs on core 2, versus 96 on core 0. There may be other threads locked to core 0.
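Pinning a benchmark to one specific core, as done in this thread through Windows affinity settings, has a direct equivalent on Linux. A minimal sketch using Python's `os.sched_setaffinity`, which is Linux-only (on Windows the thread's posters use Task Manager's affinity dialog instead). The helper name `pin_to_cpu` is hypothetical, and the demo picks the highest CPU the process is already allowed to use rather than assuming a core 2 exists:

```python
import os


def pin_to_cpu(cpu: int) -> set[int]:
    """Restrict the calling process (pid 0 = self) to one logical CPU.

    Linux-only: os.sched_setaffinity is not available on Windows or macOS.
    Returns the affinity set actually in effect afterwards.
    """
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)
    # Use the highest-numbered allowed CPU, away from CPU 0 where OS threads may sit.
    target = max(allowed)
    print("pinned to", pin_to_cpu(target))
    # Restore the original set so the rest of the program is unaffected.
    os.sched_setaffinity(0, allowed)
```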
 

Jorge_Orwell

Banned
RBM schmuckley
Apr 10, 2015
Quote:
A few comments:
  • each test was repeated 3x to check score consistency
  • the Windows scheduler still sometimes doesn't prioritize cores properly (the priority should be 0,2,4,6, then 1,3,5,7). With a second monitor, I can watch while gaming: sometimes 0,2,4,6 are the loaded cores. I haven't paid attention to which games, but mainly UT3.
  • Cinebench R15 overrides core affinity settings, which affects benchmark performance. To get around this (if you'd like to repeat the test), have the 'affinity' window open with the settings you want, and click 'OK' after you start the bench run. Verify that the proper cores are being loaded in Task Manager or SpeedFan: 0,2,4,6 should be pegged at 100% and 1,3,5,7 almost completely inactive.
  • 16.4% [!!!] threading performance improvement when scheduling four threads on 0,2,4,6 instead of 0,1,2,3

[image: jmG0suT.png]


4 modules, 4 threads occupying cores 0,2,4,6
[image: pACa5Os.png]



2 modules, 4 threads occupying cores 0,1,2,3
[image: sfr56Xu.png]



4 modules, 4 threads occupying cores 0,1,2,3,4,5,6,7
[image: zDfwYB6.png]

What's your score without 12 Chrome windows open and running a server?
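The core-ordering point in the quoted comments can be sketched as a placement policy: fill one core per module first (0,2,4,6), then the second cores (1,3,5,7). The Python below assumes, as the quoted comments do, that logical CPUs 2k and 2k+1 are the two cores of module k; all names are hypothetical. It also builds the affinity bitmask for each placement, one bit per logical CPU, which is the form Windows affinity masks take (0x55 for 0,2,4,6 and 0x0F for 0,1,2,3).

```python
# Assumption (from the quoted comments): logical CPUs 2k and 2k+1 form module k.
CORES = 8
SPREAD_ORDER = [0, 2, 4, 6, 1, 3, 5, 7]  # one thread per module first
PACKED_ORDER = list(range(CORES))         # fill each module before the next


def module_of(cpu: int) -> int:
    return cpu // 2


def place_threads(n: int, order: list[int]) -> list[int]:
    """Assign n threads to CPUs following the given preference order."""
    if n > len(order):
        raise ValueError("more threads than CPUs")
    return order[:n]


def affinity_mask(cpus: list[int]) -> int:
    """Bitmask with bit c set for each logical CPU c."""
    mask = 0
    for c in cpus:
        mask |= 1 << c
    return mask


if __name__ == "__main__":
    for label, order in (("spread", SPREAD_ORDER), ("packed", PACKED_ORDER)):
        cpus = place_threads(4, order)
        modules = {module_of(c) for c in cpus}
        print(f"{label}: CPUs {cpus}, modules used {len(modules)}, "
              f"mask 0x{affinity_mask(cpus):02X}")
```

With four threads, the spread order touches all four modules while the packed order doubles up on two modules and leaves the other two idle; these are the two configurations compared in the 16.4% result quoted above.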
 

Dresdenboy

Golden Member
Jul 28, 2003
citavia.blog.de
Quote:
Because the results are flawed. It's impossible to have scaling greater than unity.

This is true for old static architectures with a constant clock frequency and no shared components except the memory bus.

With a shared cache and threads working on the same data, they could help each other as intelligent prefetchers (not by predicting, but by doing the real thing). That happens at multiple levels: a shared L3$, a shared L2$, a shared I$ and, in the case of HT, a shared D$.
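Dresdenboy's point can be illustrated with a toy model: two threads share one LRU cache, and the second thread walks the same data a few steps behind the first, so every line it needs was just fetched by its partner. The Python below is a hypothetical simulation (cache size, lag, and trace layout are arbitrary choices, not measurements of any real CPU); it compares that case with two threads working on disjoint data.

```python
from collections import OrderedDict


def count_misses(trace: list[int], capacity: int) -> int:
    """Replay cache-line accesses through an LRU cache and count misses."""
    cache: OrderedDict[int, None] = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


def two_thread_trace(n: int, lag: int, shared: bool) -> list[int]:
    """Interleave thread A (lines 0..n-1) with thread B trailing by `lag` steps.

    If shared, B reads the same lines as A; otherwise B reads lines n..2n-1.
    """
    if not 0 <= lag <= n:
        raise ValueError("lag must be between 0 and n")
    offset = 0 if shared else n
    trace = []
    for i in range(n):
        trace.append(i)  # thread A
        if i >= lag:
            trace.append(offset + i - lag)  # thread B
    for j in range(n - lag, n):  # B finishes its remaining lines
        trace.append(offset + j)
    return trace


if __name__ == "__main__":
    N, LAG, CAPACITY = 1000, 8, 64
    solo = count_misses(list(range(N)), CAPACITY)
    shared = count_misses(two_thread_trace(N, LAG, shared=True), CAPACITY)
    disjoint = count_misses(two_thread_trace(N, LAG, shared=False), CAPACITY)
    print(f"one thread alone:       {solo} misses")
    print(f"two threads, same data: {shared} misses")
    print(f"two threads, disjoint:  {disjoint} misses")
```

In the shared case the pair incurs no more misses than one thread alone, so each thread effectively pays half the miss cost; with disjoint data the misses double. The toy model shows the mechanism only and says nothing about how large the effect is in Cinebench or POV-Ray.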
 
Aug 11, 2008
Quote:
So you want me to prove a negative, while you supposedly have evidence that Amdahl got it wrong?

And you're the one posting rolling eyes.

Seriously, you should stop now.

I think it is basically a matter of semantics. I would agree that if only one process is active, you can't get more than linear scaling. However, in a CPU, especially one with shared resources like Vishera, other resources may come into play, so the scaling could effectively be more than 1:1, or less than 1:1 as well.
 

jhu

Lifer
Oct 10, 1999
Dresdenboy said:
This is true for old static architectures with a constant clock frequency and no shared components except the memory bus.

With a shared cache and threads working on the same data, they could help each other as intelligent prefetchers (not by predicting, but by doing the real thing). That happens at multiple levels: a shared L3$, a shared L2$, a shared I$ and, in the case of HT, a shared D$.

Possibly. However, as I and others have pointed out, the 1-core test most likely underestimates the true single-thread performance (look at the Phenom II results: the only strange one is the 1-core result), probably because all the OS threads are also on the same core.
 

jhu

Lifer
Oct 10, 1999
Quote:
I think it is basically a matter of semantics. I would agree that if only one process is active, you can't get more than linear scaling. However, in a CPU, especially one with shared resources like Vishera, other resources may come into play, so the scaling could effectively be more than 1:1, or less than 1:1 as well.

See the Phenom II results. They also show > 1:1 scaling. Even the Intel results (HT off) show this, but to a much smaller degree. The anomaly is the 1-core result; the multicore results show the expected per-core performance scaling.
 

Dresdenboy

Golden Member
Jul 28, 2003
citavia.blog.de
jhu said:
Possibly. However, as I and others have pointed out, the 1-core test most likely underestimates the true single-thread performance (look at the Phenom II results: the only strange one is the 1-core result), probably because all the OS threads are also on the same core.

This could also be the case. Other single-core/multicore scaling results might be interesting here.

A single-core run would still render the whole image and thus run longer, averaging out the effect of OS threads. But if the single thread isn't pinned to one CPU, it might constantly jump between cores, and the cache contents would have to be flushed (if not written through) or reloaded all the time.
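The migration concern can be sketched with another toy model: one thread makes several passes over a working set that fits in a core's private cache, and every core's cache starts cold. Pinned, the thread misses only on its first pass; if the scheduler moves it to a different core every pass, each pass starts cold until it has visited every core. The Python below is hypothetical (working-set size, pass count, and the every-pass migration schedule are arbitrary assumptions) and ignores shared cache levels such as an L3, which would soften the penalty in practice.

```python
from collections import defaultdict
from typing import Callable


def misses_for_schedule(passes: int, working_set: int,
                        core_for_pass: Callable[[int], int]) -> int:
    """Count misses when pass p runs on core core_for_pass(p).

    Each core has its own private cache, assumed large enough for the
    working set, with no evictions: only first touches on a core miss.
    """
    caches: defaultdict[int, set[int]] = defaultdict(set)
    misses = 0
    for p in range(passes):
        cache = caches[core_for_pass(p)]
        for line in range(working_set):
            if line not in cache:
                misses += 1
                cache.add(line)
    return misses


if __name__ == "__main__":
    W, PASSES, CORES = 512, 6, 8
    pinned = misses_for_schedule(PASSES, W, lambda p: 0)
    migrating = misses_for_schedule(PASSES, W, lambda p: p % CORES)
    print(f"pinned:    {pinned} misses")
    print(f"migrating: {migrating} misses")
```

jhu reports not seeing the anomaly without pinning, so this stays a possible mechanism rather than an explanation of the results in this thread.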
 

jhu

Lifer
Oct 10, 1999
Dresdenboy said:
This could also be the case. Other single-core/multicore scaling results might be interesting here.

A single-core run would still render the whole image and thus run longer, averaging out the effect of OS threads. But if the single thread isn't pinned to one CPU, it might constantly jump between cores, and the cache contents would have to be flushed (if not written through) or reloaded all the time.

My results don't show this anomalous 1-thread result (and I didn't pin the thread to one CPU), so it doesn't appear to be an issue.
 

jhu

Lifer
Oct 10, 1999
Here's what I get:

Povray 3.7, Windows 8.1, Phenom II 1090T


1 thread @ 3.6 GHz
Render Time:
Photon Time: 0 hours 0 minutes 2 seconds (2.968 seconds)
using 4 thread(s) with 2.983 CPU-seconds total
Radiosity Time: No radiosity
Trace Time: 0 hours 20 minutes 29 seconds (1229.596 seconds)
using 1 thread(s) with 1226.734 CPU-seconds total

6 threads @ 3.3 GHz
Render Time:
Photon Time: 0 hours 0 minutes 2 seconds (2.562 seconds)
using 9 thread(s) with 2.967 CPU-seconds total
Radiosity Time: No radiosity
Trace Time: 0 hours 3 minutes 50 seconds (230.688 seconds)
using 6 thread(s) with 1345.419 CPU-seconds total

1 thread: 59.2 pps/core/GHz
6 threads: 57.4 pps/core/GHz

As you can see, IPC goes down slightly as expected. It shouldn't go up. If it does, that means something else is going on.
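jhu's per-clock figures can be reproduced from the trace times. The render resolution isn't given in the post; the sketch below assumes 512×512 (262,144 pixels), which is the image size that yields his 59.2 and 57.4 figures. Dividing by GHz also assumes throughput scales linearly with clock, and the 1-thread run was at 3.6 GHz while the 6-thread run was at 3.3 GHz, so the comparison leans on that assumption. Python, with a hypothetical helper name:

```python
PIXELS = 512 * 512  # assumption: image size implied by the reported figures


def pps_per_core_per_ghz(trace_seconds: float, threads: int, ghz: float) -> float:
    """Pixels per second divided by thread count and clock (assumes linear clock scaling)."""
    return PIXELS / trace_seconds / threads / ghz


one = pps_per_core_per_ghz(1229.596, 1, 3.6)  # 1 thread @ 3.6 GHz
six = pps_per_core_per_ghz(230.688, 6, 3.3)   # 6 threads @ 3.3 GHz

print(f"1 thread:  {one:.1f} pps/core/GHz")
print(f"6 threads: {six:.1f} pps/core/GHz")
print(f"6-thread per-clock throughput is {six / one:.1%} of the 1-thread figure")
```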
 

AtenRa

Lifer
Feb 2, 2009

Now do the same without Turbo for the single thread. Also, I disabled cores through the BIOS, so when you see a single core or thread, it means only a single core/thread was usable in the system. The same goes for the Bulldozer modules/cores and the Intel cores/HT.