CMT vs SMT - Bulldozer vs Gulftown Scaling

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
I'm rather impressed with how well the $250 8150 held in there compared to a $1000 990X.
 

BlueBlazer

Senior member
Nov 25, 2008
555
0
76
The end results do matter (that's what benchmarks measure). The thing is, the scaling is very similar for both despite the throughput claims. Perhaps rather than CMT, AMD should be looking at SMT, which is less complex and takes up less die area. :)

The odd one out is CLOMP...
CLOMP is the C version of the Livermore OpenMP benchmark developed to measure OpenMP overheads and other performance impacts due to threading in order to influence future system designs. This particular test profile configuration is currently set to look at the OpenMP static schedule speed-up across all available CPU cores using the recommended test configuration.
Bulldozer seems to level off at 4 threads. Could that be a "core"/thread scheduling effect (the second "core"/thread of each module being used)? :hmm:

Also, there's a weird anomaly for the Gulftown between 6 and 8 threads on some of these tests. :p
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,677
2,560
136
I would think that SMT is more complicated to implement.

Why? Register renaming (which all modern x86 cpus have) takes you halfway there. If you can rename the flags register too, you basically don't have to do any changes to the execution units to support SMT.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Full 12 threads 990x, full 8 threads FX-8150
Test            Speedup % 990x full HT   Speedup % FX-8150 full Module
C-Ray                             5.18                           83.35
Smallpt                          38.16                           83.65
GraphicsMagick                   39.39                           44.68
GraphicsMagick                   46.88                           62.50
7-Zip                            30.82                           91.54
x264                             21.75                           88.44
NAS Parallel                     -4.38                           68.60
NAS Parallel                     49.06                           86.60
NAS Parallel                    -20.92                           59.29
NAS Parallel                      9.25                           67.91
NAS Parallel                      9.69                           65.28
CLOMP                            25.00                           -2.77
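
For anyone who wants to boil the table above down to a couple of numbers, here is a rough Python sketch. The figures are copied straight from the table; the `summarize` helper and its output keys are my own naming, and averaging percentages across unrelated benchmarks is a crude summary at best, so treat the result as a quick sanity check rather than a verdict.

```python
from statistics import mean, median

# (test, 990X speedup %, FX-8150 speedup %), copied from the table above.
# Repeated names are separate sub-tests whose labels are not given in the post.
ROWS = [
    ("C-Ray", 5.18, 83.35),
    ("Smallpt", 38.16, 83.65),
    ("GraphicsMagick", 39.39, 44.68),
    ("GraphicsMagick", 46.88, 62.50),
    ("7-Zip", 30.82, 91.54),
    ("x264", 21.75, 88.44),
    ("NAS Parallel", -4.38, 68.60),
    ("NAS Parallel", 49.06, 86.60),
    ("NAS Parallel", -20.92, 59.29),
    ("NAS Parallel", 9.25, 67.91),
    ("NAS Parallel", 9.69, 65.28),
    ("CLOMP", 25.00, -2.77),
]


def summarize(rows):
    """Return mean/median speedup per CPU plus the rows with a negative gain."""
    ht = [row[1] for row in rows]
    cmt = [row[2] for row in rows]
    return {
        "990X HT": {"mean": mean(ht), "median": median(ht)},
        "FX-8150 CMT": {"mean": mean(cmt), "median": median(cmt)},
        # Rows where the extra threads made either CPU slower.
        "regressions": [row for row in rows if row[1] < 0 or row[2] < 0],
    }


if __name__ == "__main__":
    summary = summarize(ROWS)
    for cpu in ("990X HT", "FX-8150 CMT"):
        stats = summary[cpu]
        print(f"{cpu}: mean {stats['mean']:.2f}%, median {stats['median']:.2f}%")
    for name, ht_gain, cmt_gain in summary["regressions"]:
        print(f"regression: {name} (HT {ht_gain}%, CMT {cmt_gain}%)")
```

The regressions list is the interesting part: it pulls out the two NAS Parallel sub-tests where HT hurt and the CLOMP run where the second core of each module did.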

 

BlueBlazer

Senior member
Nov 25, 2008
555
0
76
Full 12 threads 990x, full 8 threads FX-8150
For scaling, I'm looking at 8 threads on the Bulldozer versus 8 threads on the Gulftown, since we do not have 12 thread Bulldozer sample data (thus how Bulldozer scales beyond 8 threads is unknown, waiting for Interlagos on that one). Deltas are similar on a few of those tests (the ones without the Gulftown anomalies) :)
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
I was contrasting HT gains with CMT gains. I must say HT has come a long way since the P4 implementation, with some solid gains. CMT doesn't have as much room to improve, but it's an interesting approach to running more threads.

Is it possible to disable cores in a 2600K? Someone with access to comparable 2600K and 8150 systems could then start from 1 core without HT and 1 module with 1 thread, and go all the way up to 4 cores with HT and 4 modules with 2 threads per module.
 

HurleyBird

Platinum Member
Apr 22, 2003
2,814
1,550
136
For scaling, I'm looking at 8 threads on the Bulldozer versus 8 threads on the Gulftown, since we do not have 12 thread Bulldozer sample data (thus how Bulldozer scales beyond 8 threads is unknown, waiting for Interlagos on that one). Deltas are similar on a few of those tests (the ones without the Gulftown anomalies) :)

That's problematic. For eight threads we know that BD is fully loaded, but what does that mean for Gulftown? It might mean that the benchmark is running on four cores with two threads per core, or it could be running on all six cores with only two cores fully loaded via SMT -- or five with three fully loaded. Depending on how the threads are managed you could be looking at a huge difference in scaling.

A better comparison for CMT vs. SMT would be one Intel core vs. one BD module, or at least compare products that have the same number of total threads. It can be difficult to tell if the benchmarks are favouring loading up physical cores before logical ones, or the other way around. 6/12 vs 4/8 just compounds the issue. Too much random noise.
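
To make the ambiguity concrete, here is a small Python sketch that enumerates every way a given number of threads can be spread over cores (or modules) that each hold at most two threads. The function name `placements` and the dictionary keys are made up for illustration, and the sketch says nothing about which placement a real scheduler would pick.

```python
def placements(threads, cores):
    """List every split of `threads` over up to `cores` two-thread units.

    Each entry says how many cores are in use, how many of those carry
    two threads ("doubled"), and how many carry only one ("single").
    """
    result = []
    for used in range(1, cores + 1):
        doubled = threads - used  # threads left after one per used core
        if 0 <= doubled <= used:  # no core may take more than two threads
            result.append(
                {"cores_used": used, "doubled": doubled, "single": used - doubled}
            )
    return result


if __name__ == "__main__":
    print("8 threads on a 6-core/12-thread Gulftown:")
    for p in placements(8, 6):
        print(" ", p)
    print("8 threads on a 4-module FX-8150:")
    for p in placements(8, 4):
        print(" ", p)
```

For 8 threads on 6 cores this gives the three cases described above (4 cores all doubled, 5 cores with 3 doubled, 6 cores with 2 doubled), while the 4-module case has only one possible placement.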
 

rvborgh

Member
Apr 16, 2014
195
94
101
I wonder how a 12-core Opteron 8439SE/2439SE system would have done against Gulftown...
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
SMT is almost free to implement die-size-wise. The same can hardly be said about AMD's CMT.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
SMT is almost free to implement die-size-wise. The same can hardly be said about AMD's CMT.

CMT with Intel's beefy Haswell cores would be something of interest; CMT with otherwise weakened and lackluster cores, not so much.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
That's problematic. For eight threads we know that BD is fully loaded, but what does that mean for Gulftown? It might mean that the benchmark is running on four cores with two threads per core, or it could be running on all six cores with only two cores fully loaded via SMT -- or five with three fully loaded. Depending on how the threads are managed you could be looking at a huge difference in scaling.

That's a non-issue. Windows is HT-aware and will always load physical cores first and logical cores second, so with 8 threads it will load all 6 physical cores plus 2 logical cores. An older, unpatched version of Windows such as Windows XP may load 4 physical cores and 4 logical cores, but that doesn't happen on newer versions of Windows.
BTW, if you don't trust the Windows scheduler, you can always assign core affinity manually.
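
As a rough illustration of what manual affinity looks like, here is a Python sketch that builds a "physical cores first" CPU list and the matching bitmask. It assumes logical CPUs are numbered so that the two threads of a core sit next to each other (0 and 1 on core 0, 2 and 3 on core 1, and so on). That numbering is only an assumption and differs between operating systems and firmware, so check your own topology first. The helper names are made up, and the pinning call uses `os.sched_setaffinity`, which exists only on Linux; on Windows you would feed the mask to Task Manager or a similar tool instead.

```python
import os


def physical_first(cores, smt_ways=2):
    """Order logical CPU ids so every core gets one thread before any gets two.

    ASSUMPTION: logical id = core * smt_ways + sibling (siblings adjacent).
    """
    return [core * smt_ways + sib for sib in range(smt_ways) for core in range(cores)]


def affinity_mask(cpu_ids):
    """Build the bitmask with one bit set per logical CPU id."""
    mask = 0
    for cpu in cpu_ids:
        mask |= 1 << cpu
    return mask


def pin_current_process(cpu_ids):
    """Pin this process to cpu_ids where the platform supports it (Linux)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, set(cpu_ids))
        return True
    return False  # e.g. Windows or macOS: not available in the os module


if __name__ == "__main__":
    # 8 threads on a 6-core/12-thread CPU: 6 physical cores plus 2 siblings.
    cpus = physical_first(6)[:8]
    print("logical CPUs:", cpus)
    print("affinity mask:", hex(affinity_mask(cpus)))
```

Forcing the other case (4 cores with both threads loaded) is just a matter of passing the first 8 adjacent ids, `list(range(8))`, under the same numbering assumption, which makes it easy to benchmark both placements back to back.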