8 Physical cores vs 4 Physical 8 Threads?

IntelUser2000 · Nov 7, 2010

neckarb said:
I've clocked mine to 3.2 GHz with a 1600 MHz FSB; should it make that much difference at the same clock speed?

The QX9775 should still have a bit of an advantage because of the consumer-optimized platform and because the CPU is the more advanced Penryn core.

If the coding is done right, then 8 will always beat 6+HT, assuming the same clock speed and processor design.

That's too close a comparison. In theory, 8 cores are 33% faster than 6 cores, and the scaling is nowhere near 100% (Amdahl's law alone sees to that, and there are other factors too, like the interconnect and how well programs are optimized).

There are also limited cases that favor Hyperthreading.
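To put numbers on the 33% figure: under Amdahl's law, a program whose parallel fraction is p runs 1 / ((1 - p) + p/n) times faster on n cores than on one. The sketch below is an idealized model (identical cores, no interconnect or memory effects, which would only shrink the gap further), and the function names are made up for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal Amdahl's-law speedup over a single core."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def eight_vs_six(parallel_fraction: float) -> float:
    """How much faster 8 cores are than 6 (0.333... means 33.3% faster)."""
    return amdahl_speedup(parallel_fraction, 8) / amdahl_speedup(parallel_fraction, 6) - 1.0


if __name__ == "__main__":
    # Perfectly parallel code first, then progressively more serial code.
    for p in (1.0, 0.95, 0.90, 0.75):
        print(f"parallel fraction {p:.2f}: 8 cores are {eight_vs_six(p):.1%} faster than 6")
```

With p = 1.0 the full 33.3% shows up; at p = 0.90 the advantage of 8 cores over 6 shrinks to roughly 18%, and at p = 0.75 to about 9%.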

soccerballtux · Nov 7, 2010

IntelUser2000 said:
The QX9775 should still have a bit of an advantage because of the consumer-optimized platform and because the CPU is the more advanced Penryn core.

That's too close a comparison. In theory, 8 cores are 33% faster than 6 cores, and the scaling is nowhere near 100% (Amdahl's law alone sees to that, and there are other factors too, like the interconnect and how well programs are optimized).

There are also limited cases that favor Hyperthreading.

Wrong, if the coding is done correctly, 8 cores will always win.

HAL9000 · Nov 7, 2010

soccerballtux said:
Wrong, if the coding is done correctly, 8 cores will always win.

Looks like a battle has begun! Fight!

Diogenes2 · Nov 7, 2010

soccerballtux said:
Wrong, if the coding is done correctly, 8 cores will always win.

" If the coding is done correctly .. "

That's a big IF, and not something you can count on.

I use Vegas Video, and HyperThreading usually speeds things up, depending on the codec.

I'll do some experimenting with disabling some cores, and switching Hyperthreading on and off, and see what happens.

Not sure how soon I can get back with the results; someone else might beat me to it.

Voo · Nov 7, 2010

soccerballtux said:
Wrong, if the coding is done correctly, 8 cores will always win.

So you're saying that code inherently has to use every available unit, otherwise the programmer didn't know what they're doing? In most consumer apps that's probably true, but that generalization won't always hold.

soccerballtux · Nov 7, 2010

Voo said:
So you're saying that code inherently has to use every available unit, otherwise the programmer didn't know what they're doing? In most consumer apps that's probably true, but that generalization won't always hold.

Well, if the application lends itself to parallelization, yes. Games like Far Cry 2 are, I _believe_, examples of poorly optimized code: the framerate scales linearly with HT. This means not all the functional units in the cores are being used in the calculations.

For example, if you write highly optimized assembly code, you can ensure that all of the processor is used every moment of every second. In this scenario, HT is not going to help you at all, because all parts of the core are already busy. That's all I'm saying: HT fills unused parts of the core with calculations. If they are all in use already, you won't get much extra performance.

This is why HT on the Atom scales so well: it's an in-order architecture, so the main code path frequently gets held up waiting for data. Out-of-order architectures can go ahead and process other instructions whose data is already available, but the Atom can't, so Hyper-Threading steps in and fills those idle functional units with calculations from a second thread.
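A toy model of the stall-filling argument, under loudly simplifying assumptions: a single-issue core, with each thread independently stalled on any given cycle with probability s (waiting on a cache miss, say). One thread keeps the core busy a fraction 1 - s of the time; with a second thread the core idles only when both are stalled, so utilization rises to 1 - s^2 and the SMT gain works out to 1 + s. Real cores are wide and stalls are not independent, so read this as intuition rather than a prediction; the function names are hypothetical.

```python
import random


def smt_gain_analytic(stall_prob: float) -> float:
    """Two-thread / one-thread throughput on a toy single-issue core.

    Assumes each thread is independently stalled with probability stall_prob
    on every cycle, and the core issues from any thread that is ready.
    Undefined for stall_prob == 1 (a thread that never runs).
    """
    s = stall_prob
    one_thread_busy = 1.0 - s       # core idles whenever the thread stalls
    two_threads_busy = 1.0 - s * s  # core idles only if both threads stall
    return two_threads_busy / one_thread_busy


def smt_gain_simulated(stall_prob: float, cycles: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo check of the same toy model."""
    rng = random.Random(seed)
    busy_one = busy_two = 0
    for _ in range(cycles):
        a_ready = rng.random() >= stall_prob
        b_ready = rng.random() >= stall_prob
        busy_one += a_ready              # only thread A exists
        busy_two += a_ready or b_ready   # threads A and B share the core
    return busy_two / busy_one


if __name__ == "__main__":
    for s in (0.1, 0.3, 0.6):
        print(f"stall prob {s}: analytic {smt_gain_analytic(s):.2f}x, "
              f"simulated {smt_gain_simulated(s):.2f}x")
```

In this model a core that hides most stalls on its own (s = 0.1) gains only about 10% from a second thread, while one that stalls on 60% of cycles, closer to the in-order case, gains about 60%. As s approaches 1 the gain approaches 2x.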

IntelUser2000 · Nov 7, 2010

Well, if the application lends itself to parallelization, yes. Games like Far Cry 2 are, I _believe_, examples of poorly optimized code: the framerate scales linearly with HT. This means not all the functional units in the cores are being used in the calculations.

That's BS. We don't even know what's "linear" with Hyperthreading. Is the max gain 50%? 30%? 5%? 1%?

No perfect code will ever exist, and there will always be an opportunity to use Hyperthreading to increase utilization of functional units. Benchmarks show that the programs that gain significantly from Hyperthreading are the same ones that gain a lot from more cores.

Should the developers spend all of their time and resources on creating single-thread optimized code over everything else (multi-threading, gameplay, graphics, stability, time to market)?

SMT variants like Hyperthreading don't speed up code merely by filling in empty execution slots; they help with things like cache misses as well. Some even favor Hyperthreading over 33% more cores. 6 vs 4+HT? I would agree. 8 vs 6+HT? Way too close.

Makaveli · Nov 7, 2010

This topic gets more interesting as the posts keep coming.

Would love to see a huge roundup of application benchmarks comparing cores with HT against more cores without it.

Just Intel CPUs from the last two generations for now; I think it would make for an interesting article.

HendrixFan · Nov 7, 2010

Which CPUs are you comparing?

Voo · Nov 7, 2010

soccerballtux said:
For example, if you write highly optimized assembly code, you can ensure that all of the processor is used every moment of every second. In this scenario, HT is not going to help you at all, because all parts of the core are already busy. That's all I'm saying: HT fills unused parts of the core with calculations. If they are all in use already, you won't get much extra performance.

And thankfully there's not a single algorithm out there that doesn't need both int and fp units in exactly the right combination.

While you can easily craft programs that show huge gains from SMT, in the real world that factor is obviously mitigated, because both types of programs (those that use all execution units and those that use only, say, the fp units) are rather atypical.
And still, we see everything from programs with absolutely no performance benefit to those with ~30% increases.

soccerballtux · Nov 7, 2010

IntelUser2000 said:
That's BS. We don't even know what's "linear" with Hyperthreading. Is the max gain 50%? 30%? 5%? 1%?

No perfect code will ever exist, and there will always be an opportunity to use Hyperthreading to increase utilization of functional units. Benchmarks show that the programs that gain significantly from Hyperthreading are the same ones that gain a lot from more cores.

Should the developers spend all of their time and resources on creating single-thread optimized code over everything else (multi-threading, gameplay, graphics, stability, time to market)?

SMT variants like Hyperthreading don't speed up code merely by filling in empty execution slots; they help with things like cache misses as well. Some even favor Hyperthreading over 33% more cores. 6 vs 4+HT? I would agree. 8 vs 6+HT? Way too close.

Nah, the best we've consistently seen on out-of-order architectures is around 20% gains, more usually 15% IIRC, so I'd still go for the 8 any day.

BTW, I think it would be pretty trivial to write code that scales 100% with HT. Just write code that purposely misses the cache and makes the CPU mispredict branches through branch history register aliasing or something.
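The cache-missing half of that recipe is usually done with pointer chasing: lay out one random cycle through an array much larger than the last-level cache, so every load depends on the previous one and the prefetchers can't guess the next address. Below is a sketch of building such a chain with Sattolo's algorithm, which always yields a single cycle. It is an illustration, not the code the poster used, the names are hypothetical, and the branch-misprediction half is not covered. In CPython the interpreter overhead hides most of the memory latency, so a real measurement would run the same loop in a compiled language.

```python
import random


def make_pointer_chain(n: int, seed: int = 0) -> list[int]:
    """Return nxt such that i -> nxt[i] is one random cycle through all n slots.

    Sattolo's algorithm: like a Fisher-Yates shuffle, but j is drawn from
    [0, i) instead of [0, i], which guarantees a single n-cycle.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt: list[int], steps: int, start: int = 0) -> int:
    """Follow the chain. Each index comes from the previous load, so the
    loads are serialized and, in a compiled version, each one can miss."""
    i = start
    for _ in range(steps):
        i = nxt[i]
    return i


if __name__ == "__main__":
    n = 1 << 18  # arbitrary here; a C version should go well beyond the LLC
    chain = make_pointer_chain(n)
    print(chase(chain, n))  # one full lap returns to the start index, 0
```

Because the chain is a single cycle, every slot is visited exactly once per lap, so there is no small hot subset the cache could hold on to.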

TuxDave · Nov 8, 2010

soccerballtux said:
For example, if you write highly optimized assembly code, you can ensure that all of the processor is used every moment of every second. In this scenario, HT is not going to help you at all, because all parts of the core are already busy. That's all I'm saying: HT fills unused parts of the core with calculations. If they are all in use already, you won't get much extra performance.

You have some pretty high standards for coding. You'd have to be pretty lucky to get OOO to give you perfect allocation for the entirety of your code.

PlasmaBomb · Nov 8, 2010

soccerballtux said:
Nah, the best we've consistently seen on out-of-order architectures is around 20% gains, more usually 15% IIRC, so I'd still go for the 8 any day.

BTW, I think it would be pretty trivial to write code that scales 100% with HT. Just write code that purposely misses the cache and makes the CPU mispredict branches through branch history register aliasing or something.

Sounds like a challenge for you then!

1) Create a "perfect" code, test on various architectures
2) Create a "trivial" code designed to miss cache, test on various architectures
3) Compare results
4) ???
5) Profit

Ben90 · Nov 8, 2010

TuxDave said:
You have some pretty high standards for coding. You'd have to be pretty lucky to get OOO to give you perfect allocation for the entirety of your code.

It's not so much about writing perfect code that bypasses the benefits of HTT completely; it's more about having good-enough code that performs better without the added overhead of HTT thrashing the cache. Even Linpack has sections of code that benefit from HTT; however, the majority of its code suffers with it on.

HTT is disabled by default on most server motherboards because performance usually suffers.
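The cache-thrashing point can be illustrated with a deliberately simple model: a fully associative LRU cache shared by two hardware threads that each sweep their own working set. The sizes (a 64-line cache, 48 lines per thread) are arbitrary assumptions, the helper names are hypothetical, and real caches are set-associative with only approximate LRU, so this shows the mechanism rather than real-world numbers.

```python
from collections import OrderedDict


def lru_hit_rate(accesses: list, capacity: int) -> float:
    """Fraction of accesses that hit in a fully associative LRU cache."""
    cache: OrderedDict = OrderedDict()
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # now the most recently used line
        else:
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(accesses)


def cyclic_stream(thread_id: int, working_set: int, passes: int) -> list:
    """One thread sweeping its own working set of cache lines repeatedly."""
    return [(thread_id, k) for _ in range(passes) for k in range(working_set)]


def interleave(a: list, b: list) -> list:
    """Alternate accesses from two SMT threads sharing one cache."""
    out = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


if __name__ == "__main__":
    a = cyclic_stream(0, working_set=48, passes=10)
    b = cyclic_stream(1, working_set=48, passes=10)
    print("thread alone:", lru_hit_rate(a, capacity=64))
    print("both threads:", lru_hit_rate(interleave(a, b), capacity=64))
```

Alone, each thread misses only on its first pass, giving a 0.9 hit rate over ten passes. Interleaved, the two sweeps touch 96 distinct lines in a repeating order, and a cyclic sweep larger than an LRU cache evicts each line just before it is needed again, so the shared hit rate drops to 0.0.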

soccerballtux · Nov 8, 2010

PlasmaBomb said:
Sounds like a challenge for you then!

1) Create a "perfect" code, test on various architectures
2) Create a "trivial" code designed to miss cache, test on various architectures
3) Compare results
4) ???
5) Profit

Already done #2 before, and I'm not motivated to prove this to anyone because I've convinced myself, so...

soccerballtux · Nov 8, 2010

TuxDave said:
You have some pretty high standards for coding. You'd have to be pretty lucky to get OOO to give you perfect allocation for the entirety of your code.

But it doesn't have to be perfect. HT only has to deliver less than a 33% gain and I win (by taking the 8 cores over the 6).
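The break-even arithmetic behind that bet, as a sketch under stated assumptions: a perfectly parallel workload, equal clocks and per-core performance, and HT multiplying each core's throughput by (1 + g). Then 6 cores with HT beat 8 cores only when 6(1 + g) > 8, i.e. g > 1/3. If only part of the program is parallel and HT helps only that part, the break-even point is still 1/3, since the serial part is the same on both chips. The helper names are illustrative.

```python
def ht_chip_wins(ht_gain: float, small_cores: int = 6, big_cores: int = 8) -> bool:
    """True if the chip with fewer cores plus HT has more total throughput.

    Assumes a perfectly parallel workload, equal clocks and per-core speed,
    and that HT multiplies every core's throughput by (1 + ht_gain).
    """
    return small_cores * (1.0 + ht_gain) > big_cores


def break_even_gain(small_cores: int = 6, big_cores: int = 8) -> float:
    """HT gain at which the two chips tie: 8/6 - 1 = 1/3 for 6+HT vs 8."""
    return big_cores / small_cores - 1.0


if __name__ == "__main__":
    print(f"break-even HT gain, 6+HT vs 8: {break_even_gain():.1%}")
    for gain in (0.15, 0.20, 0.30, 0.40):
        print(f"HT gain {gain:.0%}: 6+HT wins? {ht_chip_wins(gain)}")
```

With the 15 to 20% gains mentioned earlier in the thread, `ht_chip_wins(0.20)` is False; the same arithmetic puts the bar at 50% for 4 cores + HT against 6 cores (`break_even_gain(4, 6)`).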

IntelUser2000 · Nov 8, 2010

Ben90 said:
HTT is disabled by default on most server motherboards because performance usually suffers.

There are a few articles saying Hyperthreading is mistakenly disabled on Nehalem because of how people saw it behave back in the Pentium 4 era, which isn't true anymore. A few apps still degrade, but significantly fewer of them.

Lorne · Nov 8, 2010

I see some people still don't know what HT really does on the hardware side...
Some are assuming that turning HT on for a CPU/core adds another 100% of speed for a second thread. This is false.
A second thread is initiated with the remaining pipelines, which may not even be 50% of the core's ability. There is also the issue that the next few threads trade off the available pipelines, which goes into the programming issue (another subject).
HT also requires more memory and bus resources/bandwidth per thread, as do all threads, whether HT or real cores.
Finally, you cannot compare C2 with the i series; those are two completely different architectures. It's like the Athlon 64 vs the P4: the built-in memory controller and a HyperTransport-style link (Intel's name for it eludes me at this time, sorry) open up so much more capability on the i series over C2.
C2 shares the bus to its memory controller in the Northbridge. That bus is loaded with all the data going to and from every expansion device and chokes the bandwidth as it loads up; the i series does not suffer from this as badly.
I'm not sure myself, but I assume a 2-core i series with HT should be at least equal to, if not better than, a C2Q clock for clock with same-speed memory.

In our lab we have disabled HT on most of the servers that do heavy rendering, because it just chokes the system to a stall and can issue data out of sequence in some cases. On the light-load (small-thread) systems we leave it on, as it lowers request latency.

betasub · Nov 8, 2010

^ And I thought tweakboy's posting style was unique! ty gl and gb

HAL9000 · Nov 9, 2010

Well, I'm glad I started this topic; all I was wondering was whether I made the right choice of processor for video encoding, and now here we are!

Dufus · Nov 10, 2010

soccerballtux said:
Already done #2 before, and I'm not motivated to prove this to anyone because I've convinced myself, so...

Hi soccerballtux, how close to a 100% increase did you get with HT scaling? IMHO it would be interesting to see how your cache-missing code works, and it would be much appreciated if you would be kind enough to post the code as an example.

grimpr · Nov 10, 2010

HT makes lazy programmers.

IntelUser2000 · Nov 11, 2010

grimpr said:
HT makes lazy programmers.

HT makes for better programs that focus on usability.

grimpr · Nov 11, 2010

That's a nice way to put it and I agree, but the usability trend is heading toward graphics: just imagine animated *live* interfaces running with antialiased text, 3D graphics and DX11 post-process filters in Windows 7 apps. It's just amazing, but it requires a change in mindset, from a hardcore Start > All Programs luddite to a gamer-designer.

Voo · Nov 11, 2010

grimpr said:
just imagine animated *live* interfaces running with antialiased text, 3D graphics and DX11 post-process filters in Windows 7 apps.

You just described my worst nightmares, thanks. Honestly, do you really want animated interfaces with 3D graphics in your standard software? The whole point of a good UI is that it's intuitive and not distracting.

And considering that HT does lots of things the programmer can't even influence, I don't see how it makes programmers lazy or not, especially since the executables usually have to run on a vast range of x86 processors.
