8 Physical cores vs 4 Physical 8 Threads?

Page 2 - AnandTech community

IntelUser2000

Elite Member
Oct 14, 2003
I've clocked mine to 3.2 GHz with a 1600 MHz FSB; should it make that much difference at the same clock speed?

The QX9775 should still have a bit of an advantage due to the consumer-optimized platform and the CPU being the more advanced Penryn core.

If the coding is done right, then 8 will always beat 6+HT, assuming the same clock speed and processor design.

That's too close of a comparison. In theory, 8 cores are 33% faster than 6 cores, and real scaling is nowhere near 100% (because of Amdahl's law, plus other factors like the interconnect and how well programs are optimized).
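Amdahl's law makes the "nowhere near 100%" point concrete: if a fraction p of the work parallelizes perfectly, N cores give a speedup of 1 / ((1 - p) + p/N). The sketch below is illustrative only; it assumes a fixed parallel fraction, ignores interconnect, memory and HT effects, and the p values and function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over one core when `parallel_fraction` of the work scales perfectly."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def eight_vs_six_advantage(parallel_fraction: float) -> float:
    """Fractional advantage of 8 cores over 6 (1/3 when everything is parallel)."""
    return amdahl_speedup(parallel_fraction, 8) / amdahl_speedup(parallel_fraction, 6) - 1.0


if __name__ == "__main__":
    # Hypothetical parallel fractions, from perfectly parallel down to 75%.
    for p in (1.0, 0.95, 0.90, 0.75):
        print(f"p = {p:.2f}: 8 cores are {eight_vs_six_advantage(p):.1%} faster than 6 cores")
```

With p = 0.95 the theoretical 33% edge of 8 cores over 6 already drops to roughly 23%.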

There are also limited cases that favor Hyperthreading.
 
soccerballtux

Dec 30, 2004
The QX9775 should still have a bit of an advantage due to the consumer-optimized platform and the CPU being the more advanced Penryn core.



That's too close of a comparison. In theory, 8 cores are 33% faster than 6 cores, and real scaling is nowhere near 100% (because of Amdahl's law, plus other factors like the interconnect and how well programs are optimized).

There are also limited cases that favor Hyperthreading.

Wrong: if the coding is done correctly, 8 cores will always win.
 

Diogenes2

Platinum Member
Jul 26, 2001
Wrong: if the coding is done correctly, 8 cores will always win.
"If the coding is done correctly..."

That's a big IF, and not something you can count on.

I use Vegas Video, and HyperThreading usually speeds things up, depending on the codec.

I'll do some experimenting with disabling some cores, and switching Hyperthreading on and off, and see what happens.

Not sure how soon I can get back with the results; someone else might beat me to it.
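For that kind of experiment it helps to confirm what the OS actually sees before and after toggling HT. A minimal sketch, assuming a Linux box whose /proc/cpuinfo lists "physical id" and "core id" for every logical CPU (typical on x86; Windows needs a different tool): HT is on when there are more logical CPUs than distinct (physical id, core id) pairs. The function names are hypothetical.

```python
import os


def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Split /proc/cpuinfo text into one key/value dict per logical CPU block."""
    blocks, current = [], {}
    for line in text.splitlines():
        if not line.strip():  # blank line ends a processor block
            if current:
                blocks.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        blocks.append(current)
    return blocks


def count_cpus(text: str) -> tuple[int, int]:
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo text."""
    procs = [b for b in parse_cpuinfo(text) if "processor" in b]
    logical = len(procs)
    cores = {(b.get("physical id"), b.get("core id")) for b in procs}
    if any(None in core for core in cores):
        # Topology fields missing (some VMs/architectures): assume no SMT.
        return logical, logical
    return logical, len(cores)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        logical, physical = count_cpus(f.read())
    state = "on" if logical > physical else "off"
    print(f"{logical} logical CPUs on {physical} physical cores; HT appears {state}")
```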
 

Voo

Golden Member
Feb 27, 2009
Wrong: if the coding is done correctly, 8 cores will always win.
So you're saying that code inherently has to use every available unit, otherwise the programmer didn't know what they were doing? That's probably true for most consumer apps, but the generalization won't always hold.
 
soccerballtux

Dec 30, 2004
So you're saying that code inherently has to use every available unit, otherwise the programmer didn't know what they were doing? That's probably true for most consumer apps, but the generalization won't always hold.

Well, if the application lends itself to parallelization, yes. Games like Far Cry 2 are, I _believe_, examples of poorly optimized code: the framerate scales linearly with HT. This means not all of the functional units in the cores are being used in the calculations.

For example, if you write highly optimized assembly code, you can ensure that the whole processor is used every moment of every second. In this scenario, HT is not going to help you any, because all parts of the core are already busy. That's all I'm saying: HT fills unused parts of the core with calculations. If they are all in use already, you won't get much extra performance.
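The "HT fills unused parts of the core" argument can be written as a toy upper bound. Assume (hypothetically) a core that can issue `issue_width` operations per cycle and a thread that, running alone, keeps `slots_used_alone` of them busy on average; a second SMT thread can at best use whatever is left. This is a sketch only: it ignores cache and bandwidth contention, which is what can make HT a net loss in practice, and it does not model hiding cache-miss latency.

```python
def smt_max_speedup(issue_width: float, slots_used_alone: float) -> float:
    """Upper bound on the 2-thread SMT throughput gain for one core (toy model).

    The second thread may only use issue slots the first leaves empty, and two
    threads can never do more than double the single-thread rate.
    """
    if not 0 < slots_used_alone <= issue_width:
        raise ValueError("slots_used_alone must be in (0, issue_width]")
    combined = min(issue_width, 2 * slots_used_alone)
    return combined / slots_used_alone


if __name__ == "__main__":
    # Hypothetical 4-wide core; thread utilization from "fully busy" downward.
    for used in (4, 3, 2, 1):
        print(f"{used} of 4 slots used alone: at most {smt_max_speedup(4, used):.2f}x with HT")
```

A thread that already fills the hypothetical 4-wide core gets no headroom (1.00x), one that uses 3 slots can reach at most 1.33x, and anything using half the width or less is capped at the 2x that two threads allow.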

This is why HT on the Atom processor scales so well: it's an in-order architecture, so the main code path frequently gets held up waiting for data. Out-of-order architectures can go on and process other calculations for which they already have the data, but the Atom can't, so Hyperthreading takes over and fills those unused functional units with calculations from a second thread.
 

IntelUser2000

Elite Member
Oct 14, 2003
Well, if the application lends itself to parallelization, yes. Games like Far Cry 2 are, I _believe_, examples of poorly optimized code: the framerate scales linearly with HT. This means not all of the functional units in the cores are being used in the calculations.

That's BS. We don't even know what's "linear" with Hyperthreading. Is the max 50% gain? 30%? 5%? 1%?

No perfect code will ever exist, and there will always be an opportunity to use Hyperthreading to increase utilization of functional units. Benchmarks show that the programs that gain significantly from Hyperthreading are the ones that gain big from more cores.

Should the developers spend all of their time and resources on creating single-thread-optimized code over everything else (multithreading/gameplay/graphics/stability/time to market)?

SMT variants like Hyperthreading don't speed up code merely by filling in empty execution time. They can help with things like cache misses as well. Some even favor Hyperthreading over 33% more cores. 6 vs 4+HT? I would agree. 8 vs 6+HT? Way too close.
 

Makaveli

Diamond Member
Feb 8, 2002
This topic gets more interesting as the posts keep coming.

Would love to see a huge roundup of application benchmarks comparing cores with HT against more cores without HT.

Just Intel CPUs from the last two generations for now; I think that would make for an interesting article.
 

Voo

Golden Member
Feb 27, 2009
For example, if you write highly optimized assembly code, you can ensure that the whole processor is used every moment of every second. In this scenario, HT is not going to help you any, because all parts of the core are already busy. That's all I'm saying: HT fills unused parts of the core with calculations. If they are all in use already, you won't get much extra performance.
And thankfully there's not even one algorithm out there that doesn't need both int and fp units in exactly the right combination ;)
While you can easily craft programs that show huge gains from SMT, in the real world that effect is mitigated, because both kinds of programs (those that use all execution units and those that use only, say, the fp units) are rather atypical.
And still, real programs range from absolutely no performance benefit to increases of ~30%.
 
soccerballtux

Dec 30, 2004
That's BS. We don't even know what's "linear" with Hyperthreading. Is the max 50% gain? 30%? 5%? 1%?

No perfect code will ever exist, and there will always be an opportunity to use Hyperthreading to increase utilization of functional units. Benchmarks show that the programs that gain significantly from Hyperthreading are the ones that gain big from more cores.

Should the developers spend all of their time and resources on creating single-thread-optimized code over everything else (multithreading/gameplay/graphics/stability/time to market)?

SMT variants like Hyperthreading don't speed up code merely by filling in empty execution time. They can help with things like cache misses as well. Some even favor Hyperthreading over 33% more cores. 6 vs 4+HT? I would agree. 8 vs 6+HT? Way too close.

Nah, the best we've consistently seen on out-of-order architectures was around 20% gains, more usually 15% IIRC, so I'd still go for the 8 any day.

BTW, I think it would be pretty trivial to write code that scales 100% with HT. Just write code that purposely misses cache and makes your CPU mispredict branches through branch history register aliasing or something.
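The usual way to write deliberately cache-missing code is a pointer chase: store a random permutation that forms one big cycle (Sattolo's algorithm) and follow it, so every load depends on the previous one and the prefetcher can't guess the next address. A minimal Python sketch of that structure follows; the names and sizes are hypothetical. In CPython the interpreter overhead hides much of the miss latency, so an actual HT test would more likely use the same loop in C, with the array sized well past the last-level cache, run once alone and then as two copies pinned to sibling logical CPUs.

```python
import random
import time


def make_pointer_chain(n: int, seed: int = 0) -> list[int]:
    """Random single-cycle permutation via Sattolo's algorithm.

    next_index[i] is the slot to visit after slot i; starting anywhere, the
    chain visits all n slots before returning to the start.
    """
    rng = random.Random(seed)
    next_index = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i (never j == i) guarantees one big cycle
        next_index[i], next_index[j] = next_index[j], next_index[i]
    return next_index


def chase(next_index: list[int], steps: int) -> int:
    """Follow the chain for `steps` dependent loads; return the final slot."""
    i = 0
    for _ in range(steps):
        i = next_index[i]
    return i


if __name__ == "__main__":
    n = 1 << 20  # hypothetical size; a C version would size this past the LLC
    chain = make_pointer_chain(n)
    start = time.perf_counter()
    chase(chain, n)
    elapsed = time.perf_counter() - start
    print(f"{elapsed / n * 1e9:.1f} ns per dependent step (mostly interpreter overhead)")
```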
 

TuxDave

Lifer
Oct 8, 2002
For example, if you write highly optimized assembly code, you can ensure that the whole processor is used every moment of every second. In this scenario, HT is not going to help you any, because all parts of the core are already busy. That's all I'm saying: HT fills unused parts of the core with calculations. If they are all in use already, you won't get much extra performance.

You have some pretty high standards for coding. You'd have to be pretty lucky to get OOO to give you perfect allocation for the entirety of your code.
 

PlasmaBomb

Lifer
Nov 19, 2004
Nah, the best we've consistently seen on out-of-order architectures was around 20% gains, more usually 15% IIRC, so I'd still go for the 8 any day.

BTW, I think it would be pretty trivial to write code that scales 100% with HT. Just write code that purposely misses cache and makes your CPU mispredict branches through branch history register aliasing or something.

Sounds like a challenge for you then!

1) Write "perfect" code, test on various architectures
2) Write "trivial" code designed to miss cache, test on various architectures
3) Compare results
4) ???
5) Profit
 

Ben90

Platinum Member
Jun 14, 2009
You have some pretty high standards for coding. You'd have to be pretty lucky to get OOO to give you perfect allocation for the entirety of your code.

It's not so much about writing perfect code that gets no benefit from HTT at all; it's more about having good-enough code that performs better without the added overhead of HTT thrashing the cache. Even Linpack has sections of code that benefit from HTT; however, the majority of its code suffers with it on.

HTT is disabled by default on most server motherboards because performance usually suffers.
 
soccerballtux

Dec 30, 2004
Sounds like a challenge for you then!

1) Write "perfect" code, test on various architectures
2) Write "trivial" code designed to miss cache, test on various architectures
3) Compare results
4) ???
5) Profit

I've already done #2, and I'm not motivated to prove this to anyone because I've convinced myself, so...
 
soccerballtux

Dec 30, 2004
You have some pretty high standards for coding. You'd have to be pretty lucky to get OOO to give you perfect allocation for the entirety of your code.

But it doesn't have to be perfect. It only has to be good enough that HT adds less than 33%, and I win (by taking the 8 cores over the 6).
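The 8-cores-vs-6+HT argument in numbers: assuming a perfectly parallel workload and identical per-core speed (a simplification; Amdahl's law shrinks the 8-core edge further), 6 cores with HT only match 8 cores if HT adds 8/6 - 1, about 33%, per core. The function name below is hypothetical, and the 15% and 20% gains are the figures quoted earlier in the thread.

```python
def throughput(cores: int, ht_gain: float = 0.0) -> float:
    """Relative throughput in core-equivalents, assuming perfect scaling.

    ht_gain is the fractional per-core speedup from HT (0.15 means +15%).
    """
    return cores * (1.0 + ht_gain)


# HT gain at which 6 cores + HT exactly match 8 cores without HT.
breakeven = throughput(8) / throughput(6) - 1.0

if __name__ == "__main__":
    for gain in (0.15, 0.20, breakeven):
        print(f"6 cores + HT at +{gain:.0%}: {throughput(6, gain):.2f} "
              f"vs 8 cores: {throughput(8):.2f}")
```

At +20%, 6 cores with HT come to 7.2 core-equivalents against 8.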
 

IntelUser2000

Elite Member
Oct 14, 2003
HTT is disabled by default on most server motherboards because performance usually suffers.

There are a few articles saying Hyperthreading is mistakenly disabled on Nehalem because of how people saw it behave back in the Pentium 4 era, which isn't true anymore. A few apps that degrade still exist, but significantly fewer of them.
 

Lorne

Senior member
Feb 5, 2001
I see some people still don't know what HT really does on the hardware side.
Some assume that enabling HT on a CPU/core adds another 100% of speed for a second thread. This is false.
A second thread runs on the remaining pipelines, which may not even be 50% of the core's ability. There is also the issue of how the following threads trade off the available pipelines, which goes into the programming issue (another subject).
HT also requires more memory and bus resources/bandwidth per thread, as do all threads, whether from HT or from real cores.
Finally, you cannot compare Core 2 with the i series; those are two completely different architectures. It's like the Athlon 64 vs the P4: the built-in memory controller and HyperTransport-style interconnect (Intel calls its version QuickPath Interconnect) open up so much more capability on the i series over Core 2.
Core 2 shares the bus to its memory controller in the northbridge; that bus is loaded up with all the data going to and from every expansion device and chokes bandwidth as it loads up. The i series does not suffer from this as badly.
I'm not sure myself, but I assume a dual-core i series with HT should be at least equal to, if not better than, a Core 2 Quad clock for clock with same-speed memory.

In our lab we have disabled HT on most of the servers that do heavy rendering, because it just chokes the system to a stall and can issue data out of sequence in some cases. On the light-load (small-thread) systems we leave it on, as it lowers latency on requests.
 

HAL9000

Lifer
Oct 17, 2010
Well I'm glad I started this topic, all I was wondering was if I made the right choice about processors for video encoding, now here we are!
 

Dufus

Senior member
Sep 20, 2010
I've already done #2, and I'm not motivated to prove this to anyone because I've convinced myself, so...
Hi soccerballtux, how close to a 100% increase did you get with HT scaling? IMHO it would be interesting to see how your cache-missing code works, and it would be much appreciated if you would be kind enough to share the code as an example. :)
 

grimpr

Golden Member
Aug 21, 2007
That's a nice way to put it and I agree, but the trend in usability is toward graphics: just imagine animated *live* interfaces running with antialiased text, 3D graphics and DX11 post-process filters in Windows 7 apps. It's just amazing, but it requires a change in mindset, from a hardcore Start > All Programs luddite to a gamer-designer.
 

Voo

Golden Member
Feb 27, 2009
just imagine animated *live* interfaces running with antialiased text, 3D graphics and DX11 post-process filters in Windows 7 apps.
You just described my worst nightmares, thanks. Honestly, do you really want animated interfaces with 3D graphics in your standard software? The whole point of a good UI is that it's intuitive and not distracting.

And considering that HT does lots of things the programmer can't even influence, I don't see how it would make programmers lazy or not. Especially since the executables usually have to run on a vast range of x86 processors.