SMP: When they recommend you disable Hyper-Threading, they're not kidding!

Rubycon

Madame President
Aug 10, 2005
So my Dell Precision T7500s were waiting at the dock for me yesterday. :)
They are HEAVY boxes btw at 50 pounds. :eek:

In any case, I noticed that HT was turned off in the BIOS. I ran Cinebench with HT OFF and got 29540 (XP Pro x64). With HT ON, despite having sixteen bars rendering, it was clearly slower, and the score of 24503 proves it! I'm not sure what is going on, since HT certainly improves throughput on a single Xeon.
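
For reference, here is the HT penalty in those two XP x64 runs as a percentage. This is a minimal Python sketch that uses only the two scores from the post; `percent_change` is just an illustrative helper name. It works out to roughly a 17% drop with HT ON.

```python
# Cinebench scores from the XP Pro x64 runs above (dual W5580, T7500).
HT_OFF_SCORE = 29540
HT_ON_SCORE = 24503


def percent_change(baseline: float, new: float) -> float:
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    change = percent_change(HT_OFF_SCORE, HT_ON_SCORE)
    # Negative means turning HT on lowered the score.
    print(f"HT ON vs HT OFF under XP x64: {change:+.1f}%")
```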

Booting into Win7 x64, things change for the better: I broke 30k (barely, though) with HT ON. These are two Xeon W5580 CPUs running at 3.2GHz (3.3GHz with Turbo).

And out of curiosity: I have a spare W5580, and although it's supposed to need a dual-CPU board, it doesn't! It runs fine on an Asus P6T7 "supercomputer" board with a 25x multiplier. I did get it up to 4.5GHz but ran out of time to play with it.
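
For anyone wondering how the 25x multiplier relates to those clocks: on Nehalem the core clock is the base clock (BCLK) times the multiplier, and stock BCLK is 133.33 MHz. The sketch below assumes the 4.5GHz run kept the 25x multiplier, which the post implies but doesn't state outright; the helper names are made up for illustration.

```python
# Nehalem-era core clock = BCLK x multiplier.
STOCK_BCLK_MHZ = 133.33


def core_clock_mhz(bclk_mhz: float, multiplier: int) -> float:
    """Core clock produced by a given BCLK and multiplier."""
    return bclk_mhz * multiplier


def bclk_for_target_mhz(target_mhz: float, multiplier: int) -> float:
    """BCLK needed to reach target_mhz at a fixed multiplier."""
    return target_mhz / multiplier


if __name__ == "__main__":
    print(f"24x at stock BCLK: {core_clock_mhz(STOCK_BCLK_MHZ, 24):.0f} MHz")
    print(f"25x at stock BCLK: {core_clock_mhz(STOCK_BCLK_MHZ, 25):.0f} MHz")
    # Assumption: the 4.5GHz overclock used the 25x multiplier.
    print(f"BCLK for 4.5GHz at 25x: {bclk_for_target_mhz(4500, 25):.0f} MHz")
```

24x at stock BCLK gives the W5580's 3.2GHz, 25x gives the ~3.33GHz Turbo figure mentioned above, and 4.5GHz at 25x implies a BCLK of about 180 MHz.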

I have pics of the inside of the T7500 if anyone is curious. Each CPU cooler has four heat pipes and about 1/3 the finned area of a Megahalems! No wonder they hit the mid-80s °C with LinX! :eek:
 

PsiStar

Golden Member
Dec 21, 2005
pics are always good ... let's see those bad boys

Not all software scales well with additional threads. One number cruncher I use will drive all 16 "CPUs" of dual W5590s (not mine, sadly) to nearly 100% ... another competing program tapers off rapidly after 4 threads, but different versions of the same problem can be launched to solve 4 different problems concurrently in the same box.
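
One common way to reason about a program that tapers off after a few threads is Amdahl's law: if only a fraction p of the work runs in parallel, n threads give a speedup of 1 / ((1 - p) + p/n). The sketch below uses a hypothetical p = 0.8 (not measured from either program) and optimistically treats every logical CPU as a full core, which HT does not actually provide. Under those assumptions it shows why several independent problems can keep a 16-thread box busier than one instance with 16 threads.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup of one job run with `threads` threads."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def throughput_with_instances(parallel_fraction: float,
                              logical_cpus: int,
                              instances: int) -> float:
    """Aggregate work rate (in single-thread-job units) when `instances`
    independent jobs split the logical CPUs evenly.
    Idealized: ignores shared cache, memory bandwidth and HT effects."""
    threads_each = logical_cpus // instances
    return instances * amdahl_speedup(parallel_fraction, threads_each)


if __name__ == "__main__":
    P = 0.8  # hypothetical parallel fraction
    for n in (1, 2, 4, 8, 16):
        print(f"{n:2d} threads: {amdahl_speedup(P, n):.2f}x")
    print(f"1 instance x 16 threads: {throughput_with_instances(P, 16, 1):.2f}")
    print(f"4 instances x 4 threads: {throughput_with_instances(P, 16, 4):.2f}")
```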
 

lopri

Elite Member
Jul 27, 2002
I'm confused as to what you're trying to say here. If I'm reading correctly:

WinXP-64 | HT-OFF : 29540
WinXP-64 | HT-ON : 24503
Win7-64 | HT-ON : 30xxx

??
What score do you get with HT off in Win7?
 
Dec 30, 2004

Cinebench WOULD be faster with HT if the image were much larger. Cinebench is becoming dated: a lot of time is spent starting new threads, and the threads do not all get to do actual rendering the whole time. If you rendered a much larger image, the HT performance benefit would be realized.
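
The argument above amounts to a fixed-overhead model: each extra thread costs some startup time, while HT adds only a modest amount of rendering throughput, so the gain shows up only once the render is large enough to amortize the overhead. This is a toy Python sketch; the serial per-thread startup cost, the 20% HT throughput gain and all the work sizes are hypothetical numbers, not Cinebench measurements.

```python
def render_time(work: float, threads: int, throughput_per_thread: float,
                startup_per_thread: float) -> float:
    """Toy model: threads start one after another, then split the work evenly."""
    startup = threads * startup_per_thread
    rendering = work / (threads * throughput_per_thread)
    return startup + rendering


def ht_speedup(work: float) -> float:
    """Time with HT OFF divided by time with HT ON (>1 means HT helps)."""
    # Hypothetical: 8 cores; HT gives 16 threads whose combined throughput is
    # 20% higher than 8 threads (16 * 0.6 = 9.6 vs 8 * 1.0 = 8.0).
    ht_off = render_time(work, threads=8, throughput_per_thread=1.0,
                         startup_per_thread=1.0)
    ht_on = render_time(work, threads=16, throughput_per_thread=0.6,
                        startup_per_thread=1.0)
    return ht_off / ht_on


if __name__ == "__main__":
    for work in (100, 1_000, 10_000, 100_000):
        print(f"work={work:>7}: HT speedup {ht_speedup(work):.2f}x")
```

With these made-up numbers HT loses on the small render, breaks even at 384 work units, and approaches its full 20% gain as the work grows.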
 

Rubycon

Madame President
Aug 10, 2005
I got the new Cinebench 11.5 and am going to run it in a bit, then upload some pics later this evening or tomorrow. I also have another box with HD5970 scribbled on it. I don't know where those are going at this point. :eek:
 

Rubycon

Madame President
Aug 10, 2005
Quote:
Cinebench WOULD be faster if the image were much larger. Cinebench is becoming dated as a lot of time is spent starting a new thread, and the threads do not all get to work on actual rendering the whole time. If you rendered a much larger image, the performance benefit would be realized.

Looks like this has been addressed in 11.5.

HT ON

[Image: t7500-FX4800-HTON.gif]


HT OFF

[Image: t7500-FX4800-HTOFF.gif]


Linpack definitely prefers HT OFF. That's always been the case. Looks like it's a FLOP!
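
A likely reason Linpack doesn't gain from HT: its score is bounded by the cores' floating-point units, and HT adds logical CPUs without adding any. Below is a back-of-the-envelope Python sketch of theoretical peak double-precision GFLOPS, assuming 4 DP FLOPs per cycle per Nehalem core (one 128-bit SSE add plus one 128-bit SSE multiply) and the W5580's 3.2GHz base clock. It is a ceiling, not a prediction of the measured result.

```python
def peak_gflops(sockets: int, cores_per_socket: int, ghz: float,
                flops_per_cycle: int) -> float:
    """Theoretical peak = sockets x cores x clock x FLOPs per core per cycle."""
    return sockets * cores_per_socket * ghz * flops_per_cycle


if __name__ == "__main__":
    # Dual Xeon W5580: 2 sockets x 4 cores at 3.2GHz (Turbo ignored).
    peak = peak_gflops(sockets=2, cores_per_socket=4, ghz=3.2, flops_per_cycle=4)
    print(f"Peak with HT OFF (8 threads):  {peak:.1f} GFLOPS")
    # Same FP units either way: the second hardware thread shares them.
    print(f"Peak with HT ON (16 threads): {peak:.1f} GFLOPS")
```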

[Image: gflops.gif (Linpack results)]
 

PsiStar

Golden Member
Dec 21, 2005
Interesting results. Now, could the yet-to-be-released i7 980X, overclocked, be faster?
 
Dec 30, 2004

yeah I made that comment and 11.5 got released lol :)

I am very happy with this new release. It's much more reliable at producing a score than 10.5.
 

evolucion8

Platinum Member
Jun 17, 2005
It seems that not all software is able to take advantage of Hyper-Threading, regardless of the number of threads used. Understandably, such software may be hitting a bottleneck somewhere in the CPU, or switching among so many threads that share the same execution engine.

Windows 7 definitely shows better multithreading performance than Windows XP.
 
Dec 30, 2004

you gotta have more threads to take advantage of the extra cores + hyperthreading.
Few programs can do 8-threaded computation.
 

JFAMD

Senior member
May 16, 2009
Part of the performance hit can also come from the cache in some programs. If two threads share a single core and need the same data/instructions, you could see a benefit from HT. However, if there is cache contention and you spend all your time emptying out the cache and refilling it for the other thread, performance suffers.
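
The cache-contention point can be reproduced with a toy model: two hardware threads take turns accessing one shared LRU cache. If they touch the same data, nearly every access hits; if each working set fits on its own but the two together don't, they keep evicting each other's lines and the hit rate collapses. This is a simplified Python sketch: a fully associative LRU cache with a made-up capacity of 64 lines. Real CPU caches are set-associative, multi-level and prefetched, so treat the numbers as illustrative only.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative cache with least-recently-used eviction (toy model)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> None:
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.lines[address] = None
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def smt_hit_rate(capacity, thread_a, thread_b, passes=50):
    """Two threads alternate accesses on one core's shared cache."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for addr_a, addr_b in zip(thread_a, thread_b):
            cache.access(addr_a)
            cache.access(addr_b)
    return cache.hit_rate()


def single_thread_hit_rate(capacity, addresses, passes=50):
    """One thread alone on the same cache, for comparison."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for addr in addresses:
            cache.access(addr)
    return cache.hit_rate()


if __name__ == "__main__":
    CAPACITY = 64                    # hypothetical number of cache lines
    working_set = range(0, 48)       # fits in the cache on its own
    other_set = range(1000, 1048)    # a second, unrelated 48-line working set
    print(f"one thread alone:       {single_thread_hit_rate(CAPACITY, working_set):.2f}")
    print(f"two threads, same data: {smt_hit_rate(CAPACITY, working_set, working_set):.2f}")
    print(f"two threads, own data:  {smt_hit_rate(CAPACITY, working_set, other_set):.2f}")
```

Each 48-line working set fits comfortably when its thread runs alone, yet sharing the cache drives the hit rate to zero in this model, which is the "emptying out the cache and refilling it for the other thread" case.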

This is why we are not hot on the idea of SMT. Sometimes there are benefits, and sometimes performance is better with it turned off. How many people are running things slower because they have a "performance" feature turned on?

Cores will always scale if you have the threads to take advantage of them.
 

evolucion8

Platinum Member
Jun 17, 2005
Quote:
you gotta have more threads to take advantage of the extra cores + hyperthreading.
Few programs can do 8-threaded computation.

But Cinebench can scale up to 48 cores, so why wouldn't it take advantage of Hyper-Threading? What JFAMD posted makes a lot of sense.
 
Dec 30, 2004
evolucion8 said:
But Cinebench can scale up to 48 cores, so why it wouldn't take advantage of Hyper Threading? What JFAMD posted makes a lot of sense.

Oh, it does; I'm not saying it wouldn't. I'm just explaining why most programs don't see a benefit. It's the same reason most programs don't see a benefit from my quad core: they aren't multithreaded.
 
Dec 30, 2004
JFAMD said:
Part of the performance hit can also be the cache on some programs. If two threads are sharing a single core, if they need the same data/instructions, then you could see a benefit from HT. However, if you have cache contention and you are spending all of your time emptying out the cache and refilling it for the other thread, then performance suffers.

This is why we are not hot on the idea of SMT. There are times when there are benefits and there are times when performance is better by turning it off. How many people are running things slower because they have a "performance" feature turned on?

Cores will always scale if you have the threads to take advantage of them.

This is really only a problem for branch-heavy code, games for example. Branch prediction in math calculations (such as this program) is very accurate.
 

PsiStar

Golden Member
Dec 21, 2005
Rubycon said:
If highly OC MAYBE.

Of course the stock W5580 pair can beat the GT. There's no substitute for cubic inches in a drag race! ;)

Impressive results. I do use highly multithreaded number-crunching software that makes use of every cubic inch. My OC-ed i7 920 (4.34 GHz) gets a Cinebench score of 7.52. Not bad, but considering that I just ran a job that took 50 hours ... ugh!!! ... I am all over these Xeons.

I look forward to getting the i7 980X. However, given the spread in these numbers, I no longer expect it to quite hit the performance of the dual Xeons ... that doesn't mean I won't be all over that race.

Turbocharged 2.5 L engines are in 9-second cars these days, but we prefer curves. :biggrin: All of those cubes add weight. Still, nothing, but nothing, beats the sound of Detroit power, idling or cranked.
 

PsiStar

Golden Member
Dec 21, 2005
Partial thread hijack ...

I just noticed that my OC-ed i7 920 with a much higher clock still has a lower Cinebench score than Rubycon's HT OFF score ... both running 8 threads.
 

Rubycon

Madame President
Aug 10, 2005
PsiStar said:
Partial thread hi-jack ...

I just noticed that my OC-ed i7 920 with a much higher clock still has a lower Cinebench score than Rubycon's HT OFF score ... both running 8 threads.

Yes, because there are just 4 cores. The earlier version of CB ran both a single-threaded and a multithreaded render test and reported an MP speedup score. Like I said, no substitute for cubes. ;)
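
A crude way to see the "no substitute for cubes" point is to multiply physical cores by clock speed. Both systems run 8 threads, but the i7 920 gets its 8 from 4 HT cores, while the dual W5580 box with HT OFF has 8 real cores. This first-order Python sketch ignores IPC differences, memory bandwidth, Turbo and whatever HT adds, so it only shows the direction of the gap, not the Cinebench ratio.

```python
def core_ghz(physical_cores: int, clock_ghz: float) -> float:
    """Physical cores x clock: a rough upper bound on parallel throughput."""
    return physical_cores * clock_ghz


if __name__ == "__main__":
    i7_920_oc = core_ghz(physical_cores=4, clock_ghz=4.34)  # 8 threads via HT
    dual_w5580 = core_ghz(physical_cores=8, clock_ghz=3.2)  # 8 threads, HT OFF
    print(f"i7 920 @ 4.34GHz:  {i7_920_oc:.2f} core-GHz")
    print(f"2x W5580 @ 3.2GHz: {dual_w5580:.2f} core-GHz")
    print(f"ratio:             {dual_w5580 / i7_920_oc:.2f}")
```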

It will get interesting when eight-core CPUs debut. If these have HT, that will be 32 threads on a dual-socket board and sixteen threads on a desktop board! :eek:
 

sxr7171

Diamond Member
Jun 21, 2002
lopri said:
I'm confused as to what you're trying to say here. If I'm reading correctly:

WinXP-64 | HT-OFF : 29540
WinXP-64 | HT-ON : 24503
Win7-64 | HT-ON : 30xxx

??
What score do you get with HT off in Win7?

Ha Ha. Question of the day.
 

sxr7171

Diamond Member
Jun 21, 2002
5,079
40
91
JFAMD said:
Part of the performance hit can also be the cache on some programs. If two threads are sharing a single core, if they need the same data/instructions, then you could see a benefit from HT. However, if you have cache contention and you are spending all of your time emptying out the cache and refilling it for the other thread, then performance suffers.

This is why we are not hot on the idea of SMT. There are times when there are benefits and there are times when performance is better by turning it off. How many people are running things slower because they have a "performance" feature turned on?

Cores will always scale if you have the threads to take advantage of them.

I am wondering whether it makes sense to have HT for the average user who runs mostly single-threaded apps in a regular desktop capacity: browsing, office use, and so on.

I have more cores than I really know what to do with. The one program that maxes out my CPU is clearly single-threaded: with HT on, I see it pegging the CPU at 13%; with HT off, it pegs my CPU at 25%.
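
The 13% vs 25% readings follow from how Windows reports CPU usage: as a share of all logical CPUs. Assuming a single quad-core CPU (the post doesn't say which chip), one busy thread is 1/8 of 8 logical CPUs with HT on and 1/4 of 4 with HT off. The thread is still limited to one core either way, so the jump in the percentage doesn't mean it runs twice as fast. The same sockets x cores x threads-per-core product gives the 32 and 16 threads predicted earlier for eight-core chips with HT. The helper names here are illustrative.

```python
def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical CPUs the OS schedules on."""
    return sockets * cores_per_socket * threads_per_core


def one_thread_share(n_logical: int) -> float:
    """Overall CPU usage (%) shown when one thread keeps one logical CPU busy."""
    return 100.0 / n_logical


if __name__ == "__main__":
    # Assumption: a single quad-core CPU.
    ht_on = logical_cpus(sockets=1, cores_per_socket=4, threads_per_core=2)
    ht_off = logical_cpus(sockets=1, cores_per_socket=4, threads_per_core=1)
    print(f"HT ON:  {ht_on} logical CPUs, one thread = {one_thread_share(ht_on):.1f}%")
    print(f"HT OFF: {ht_off} logical CPUs, one thread = {one_thread_share(ht_off):.1f}%")
    # Eight-core parts with HT, as speculated earlier in the thread.
    print(f"dual socket: {logical_cpus(2, 8, 2)} threads, desktop: {logical_cpus(1, 8, 2)} threads")
```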

That having been said, I don't see it running anywhere close to twice as fast with HT off.

But I don't see any sense in keeping HT on. Does it ever help in a "domestic" usage situation?
 

sxr7171

Diamond Member
Jun 21, 2002
5,079
40
91
Rubycon said:
Yes because there's just 4 cores. The earlier version of CB ran both a single threaded and multithreaded render test with a MP speedup score. Like I said no sub for cubes. ;)

It will get interesting when eight core CPUs debut. If these have HT that will be 32 threads on a dual socket board and sixteen threads on a desktop board! :eek:

I understand that it will take years for most of our software to be optimized for 2 cores, let alone 4 cores. Certainly none of the programs I run seem to take advantage of the extra cores except for dBpoweramp. With dBpoweramp it was a beautiful thing, as it is with pretty much any encoding program.
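
Encoders scale so well largely because a batch of tracks is a set of independent jobs: each file can go to its own core with no coordination. Here is a generic Python sketch of that pattern using `concurrent.futures.ProcessPoolExecutor`. `fake_encode` is a hypothetical CPU-bound stand-in for a real encoder, and this is not a description of how dBpoweramp itself is implemented (the thread doesn't say).

```python
from __future__ import annotations

import hashlib
import os
from concurrent.futures import ProcessPoolExecutor


def fake_encode(track_name: str) -> str:
    """Hypothetical stand-in for a CPU-bound encoder: repeated SHA-256 hashing."""
    digest = track_name.encode()
    for _ in range(50_000):
        digest = hashlib.sha256(digest).digest()
    return f"{track_name} -> {digest.hex()[:12]}"


def encode_batch(tracks: list[str], workers: int | None = None) -> list[str]:
    """Encode independent tracks, one per worker process, results in input order."""
    if workers is None:
        workers = os.cpu_count() or 1
    workers = max(1, min(workers, len(tracks)))  # no point in more workers than tracks
    if workers == 1:
        return [fake_encode(t) for t in tracks]  # serial path, no process pool
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_encode, tracks))


if __name__ == "__main__":
    tracks = [f"track{n:02d}.wav" for n in range(1, 9)]
    for line in encode_batch(tracks):
        print(line)
```

`pool.map` returns results in input order, so the output is the same whether the batch runs on one core or eight; only the wall-clock time changes.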

However, I don't see it speeding up anything else (other than some games recently).

They say that optimizing software for 4 cores would then take another several years, and so on. They also say software optimization is likely to proceed several orders of magnitude more slowly than the rate at which our processor core counts double. Based on what I have noticed, I tend to believe that assessment.

I too am excited about 6 cores and then 8/16/32/64/128 as time progresses; my "techie" nature won't allow otherwise. However, on a practical level I'm not really excited by it, because I don't see what problem it is going to solve, at least for desktop users. I suppose that's why the cloud "solution" is being pushed so hard.

Anyway, maybe my conception of this technology's benefit is incorrect, at least for desktop users at home.
 

PsiStar

Golden Member
Dec 21, 2005
Rubycon said:
Yes because there's just 4 cores. The earlier version of CB ran both a single threaded and multithreaded render test with a MP speedup score. Like I said no sub for cubes. ;)

It will get interesting when eight core CPUs debut. If these have HT that will be 32 threads on a dual socket board and sixteen threads on a desktop board! :eek:

That option is still available under File on the menu.

This benchmark is new to me. I recently made a number of timed solution runs with a number cruncher, varying the number of threads. That particular program had a definite roll-off as the thread count increased. After 4 threads there was little benefit, BUT that program has the option of launching additional versions of the same problem. That is a different topic from what this thread is about ... though it does make me think about the effect of memory bandwidth.
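
If memory bandwidth is the limiter, a simple ceiling model reproduces that kind of roll-off: each thread needs a fixed amount of memory traffic, and once the threads together ask for more than the memory system can deliver, extra threads add nothing. Here is a Python sketch with hypothetical numbers (5 GB/s per thread, a 20 GB/s ceiling), not measurements of any real system. One consequence: in a purely bandwidth-bound model, extra instances of the problem on the same box would hit the same ceiling, so if launching them does raise throughput, something inside a single instance is the more likely bottleneck.

```python
def bandwidth_limited_speedup(threads: int, per_thread_gbps: float,
                              ceiling_gbps: float) -> float:
    """Speedup over one thread when total memory traffic is capped at ceiling_gbps."""
    demanded = threads * per_thread_gbps
    delivered = min(demanded, ceiling_gbps)
    return delivered / per_thread_gbps


if __name__ == "__main__":
    PER_THREAD_GBPS = 5.0   # hypothetical traffic each thread needs
    CEILING_GBPS = 20.0     # hypothetical bandwidth the platform can supply
    for n in range(1, 9):
        s = bandwidth_limited_speedup(n, PER_THREAD_GBPS, CEILING_GBPS)
        print(f"{n} threads: {s:.1f}x")
```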

When I run CB for the CPU, I notice what I will call sticky windows. But it doesn't seem absolutely consistent ... what the heck is it with that?? Some runs are like the Snake Game with the little windows always together. Others have a couple of stragglers. What the heck?

BTW, my OpenGL score is 45.55 on an XFX HD5770 (OC-ed) ... is that good? :)