Hyper-threading

sm625 · Oct 5, 2011

AtenRa said:
Core i3 2100 (2C/4T) vs Pentium G850
http://www.anandtech.com/bench/Product/289?vs=404

Hyper-Threading plus the larger cache really helps dual-core CPUs today; quad cores don't gain as much.

Notice how in SC2 the i3 loses by about 5%, despite being clocked 5% higher AND having twice the L2 cache! If the extra cache is worth 5%, then we have an example of Sandy Bridge HT hindering performance by 15%!

I wish someone would run a set of tests on a 2100 with and without HT.
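A rough sketch of the arithmetic behind that 15% figure. The three 5% inputs are the post's own estimates, not measurements, and the function names are made up for illustration. Adding the percentages reproduces the post's number; treating each effect as a multiplier on performance gives a slightly smaller one.

```python
# Back-of-envelope estimate of the HT penalty implied by the post above.
# All three inputs are the post's rough figures, not measured data.
observed_ratio = 0.95  # i3 score relative to the Pentium (loses by about 5%)
clock_ratio = 1.05     # i3 clocked about 5% higher
cache_ratio = 1.05     # assumed benefit of the extra cache


def implied_penalty_additive(observed, clock, cache):
    """Sum the three percentages, which is what the post does."""
    return (1 - observed) + (clock - 1) + (cache - 1)


def implied_penalty_multiplicative(observed, clock, cache):
    """Treat clock and cache as multipliers on the expected score."""
    expected = clock * cache
    return 1 - observed / expected


additive = implied_penalty_additive(observed_ratio, clock_ratio, cache_ratio)
multiplicative = implied_penalty_multiplicative(observed_ratio, clock_ratio, cache_ratio)
print(f"additive: {additive:.1%}, multiplicative: {multiplicative:.1%}")
```

Either way the estimate is only as good as the assumption that clock and cache each contribute exactly 5%.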

TakeNoPrisoners · Oct 5, 2011

On dual cores it is helpful. On tri-cores it can be helpful. On quad cores it is much less helpful. This is just for games. It depends on the application.

Tsavo · Oct 5, 2011

sm625 said:
Notice how in SC2 the i3 loses by about 5%, despite being clocked 5% higher AND having twice the L2 cache! If the extra cache is worth 5%, then we have an example of Sandy Bridge HT hindering performance by 15%!

I wish someone would run a set of tests on a 2100 with and without HT.

...and you completely ignore all the other benches where the i3 eviscerates the Pentium. :hmm:

sm625 · Oct 5, 2011

I wouldn't call it evisceration. The point is, how do we know how much of that comes from L2 cache differences, clock differences, or HT? It seems to me it is mostly cache and clock.

The Ultimate · Oct 5, 2011

pcslookout said:
What does the extra cache help with?

It helps a lot in games. There's a point where increasing the cache size no longer gives tangible gains, but that depends largely on the architecture. On the Penryn-based Core 2 Duo Mobile, for example, going from 1MB to 3MB yields between 5% and 14% in CPU-limited gaming performance, but going from 3MB to 6MB yields only 4% to 7%. That's what I experienced upgrading my laptop's CPU from a Pentium T4300 to a Core 2 T8300 to a Core 2 T9500; the last step barely increased performance at all. Money wasted, LOL.

mrjoltcola · Oct 5, 2011

sm625 said:
I wouldn't call it evisceration. The point is, how do we know how much of that comes from L2 cache differences, clock differences, or HT? It seems to me it is mostly cache and clock.

Differences in a specific game may just be an optimization for a particular architecture that changed. Not sure, in this case, but it must be considered.

Blastman · Oct 6, 2011

sm625 said:
See

http://images.anandtech.com/graphs/graph4524/40757.png

In this case a 2.80 GHz Pentium G840 is quite a bit faster, clock for clock, than an i3-2100. One must assume it is HT causing this mini-meltdown.

There is something wrong with the DAO benches for the G850, G840 and G620 in that chart. No way should the G850 be close to the i3-2100 in DAO, which takes good advantage of multiple cores and of HT on the i3s. I remember when the results for that test first went up and I read that review, I was going to post a note in the comment section that those DAO benches were wrong, but never got around to it. Thanks for jogging my memory.

HT on the Clarkdale i3s boosts DAO performance by roughly 40-50%. If that is any indication, the SB G processors should be far behind the i3s.

Consider the Clarkdales

i3-540 (3.06) 95.7 57.1% faster
G6950 (2.8) 60.9

The i3 only has a 9.3% clock advantage yet is 57.1% faster.

The G6950 has a slightly slower uncore and other minor differences, but that likely accounts for only about a 5% advantage for the i3 at best. That still leaves the i3-540 over 40% faster clock for clock than the HT-less G6950.
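The Clarkdale comparison above can be checked with a few lines. The scores and clocks are the ones quoted in the post; the 5% uncore allowance is the post's guess, and the variable names are only illustrative. Note that dividing the ratios gives a somewhat lower residual than subtracting the percentages, so treat the exact HT figure as approximate.

```python
# Scores and clocks as quoted in the post above (DAO benchmark).
i3_540 = {"clock_ghz": 3.06, "score": 95.7}
g6950 = {"clock_ghz": 2.8, "score": 60.9}


def advantage(a, b):
    """Fractional advantage of a over b, e.g. 0.10 means 10% higher."""
    return a / b - 1


raw = advantage(i3_540["score"], g6950["score"])
clock = advantage(i3_540["clock_ghz"], g6950["clock_ghz"])

# Clock-for-clock advantage: divide out the clock ratio.
per_clock = (1 + raw) / (1 + clock) - 1

# Remove the post's assumed 5% for uncore and other minor differences.
uncore_allowance = 0.05
residual = (1 + per_clock) / (1 + uncore_allowance) - 1

print(f"raw {raw:.1%}, clock {clock:.1%}, "
      f"per-clock {per_clock:.1%}, after uncore {residual:.1%}")
```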

From the SB G850 DAO test

G620 (2.6) 107.3
G620T (2.2) 77.3

The G620 has an 18% clock advantage over the G620T (same processor), yet it's 39% faster!? No way, no how.
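The sanity check being applied here can be sketched as follows, assuming (as the post does) that the two chips differ only in clock speed, so the score should not scale faster than the clock. The function name `scaling_check` is made up for illustration.

```python
def scaling_check(fast_clock, slow_clock, fast_score, slow_score):
    """Compare clock gain with score gain for two otherwise identical chips.

    Returns (clock_gain, score_gain, suspicious), where suspicious is True
    when the score improves by more than the clock does.
    """
    clock_gain = fast_clock / slow_clock - 1
    score_gain = fast_score / slow_score - 1
    return clock_gain, score_gain, score_gain > clock_gain


# G620 (2.6 GHz, 107.3) vs G620T (2.2 GHz, 77.3), as quoted in the post.
clock_gain, score_gain, suspicious = scaling_check(2.6, 2.2, 107.3, 77.3)
print(f"clock +{clock_gain:.0%}, score +{score_gain:.0%}, suspicious: {suspicious}")
```

A flagged result does not prove the chart is wrong, since any other difference between the two parts would break the assumption, but it is a reasonable prompt to recheck the numbers.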

I'm thinking most likely someone posted the wrong numbers for most of the G processors in that DAO chart.

Perhaps a mod can inform Anand and get someone to check the game performance numbers in that review if they want to bother to correct them. Some of the other gaming numbers in that review could be off too -- the G840 is listed as faster than the i3-2100 in Starcraft 2 -- which could be wrong too, but since it's a small margin it could be just benchmark variances.

TheRyuu · Oct 6, 2011

Depends on your workload. If you're encoding video with say x264 you can expect gains in the 25% area with HT enabled (I'm talking about gain when turning on/off HT on the same CPU). Other heavily threaded applications should see a similar benefit.

Idontcare · Oct 6, 2011

drizek said:
I'd agree with you, except we passed the threshold for single core performance a while ago. Even with all its inefficiencies, a Bulldozer chip at 5GHz is really more than enough for just about anything you could want to do. At this point, and increasingly in the future, most of the most demanding applications are multithreaded and HT/many cores will become increasingly important.

HT owes its existence, and performance, to living in a bubble that is created by pipeline inefficiencies which are themselves created by coding inefficiencies.

Those programmed inefficiencies are forever hard-coded into legacy programs unless the source code is revisited and recompiled, an unlikely outcome.

But I would hope that as we move forward into the future that the programming efficiencies (compilers) are asymptotically moving us closer to the point of eliminating the opportunities for the very sort of pipeline inefficiencies that HT is able to take advantage of.

We already see that in programs like Prime95 and IBT/LinX.

So when you refer to a future world where programs are all the more threaded and optimized for multi-core environments, I would like to think those apps will show even less benefit from HT-enabled processors...unless they are actually being optimized to intentionally ah heck up the pipeline such that HT is able to live inside a larger bubble and "save the day".

CMT is the only path forward if one is attempting to maximize TLP while balancing the expense of maximizing ILP. Just my opinion.

toyota · Oct 6, 2011

sm625 said:
I wouldn't call it evisceration. The point is, how do we know how much of that comes from L2 cache differences, clock differences, or HT? It seems to me it is mostly cache and clock.

Anandtech mislabeled the G850, because it and the i3 have the SAME amount of L2.

mrjoltcola · Oct 6, 2011

Idontcare said:
HT owes its existence, and performance, to living in a bubble that is created by pipeline inefficiencies which are themselves created by coding inefficiencies.

Those programmed inefficiencies are forever hard-coded into legacy programs unless the source code is revisited and recompiled, an unlikely outcome.

There are plenty of examples of software that has been multi-threaded for decades. POSIX threads and Win32 are both common examples that came about in the early 90s.

It is an issue of optimization. You can't optimize for every case. A system optimized for a single application may not scale as you add other applications. And you can't benchmark your system for one model, and then claim the optimization (HT) doesn't work for other models. A lot of the HT detractors benchmark a single application, see that HT performs worse, and mistakenly apply that to a multi-app system.

Let's take a single program as an example. I have control of the source code and can modify the implementation as I see fit. Context switching is optional: I can implement it single-threaded or I can divide the work. The penalty is that a context switch costs something. So if my code isn't constrained by memory or cache stalls (take a simple tight loop), it certainly won't perform better when broken into two pieces unless those pieces can truly run in parallel on physical cores. HT's false logical cores don't help here; they hurt.

Now, take a second model, the one of multiple programs. The context switch isn't optional; the OS must timeshare between them. That same program may run alongside other programs _better_ with HT enabled because we are optimizing away that non-optional context switch somewhat with HT.

If I distribute one build of my software, I'm probably better off with a multi-threaded build, accepting the 3% penalty seen on single-core systems, knowing that on multi-core there will be a 100%-300% benefit. At best, my program can detect the number of cores and the existence of HT at startup and adjust accordingly, or take the route of many programs and let the end user change a config file (Oracle is one of the best examples: extremely tunable, and extremely threaded if you so desire to configure it that way).
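A minimal sketch of that startup decision in Python. `os.cpu_count()` reports logical CPUs only, so the number of hardware threads per core is treated here as a hypothetical user-supplied setting (for example, read from a config file); `pick_worker_count` and its parameters are illustrative names, not part of any real product.

```python
import os


def pick_worker_count(logical_cpus=None, threads_per_core=1, use_ht_siblings=True):
    """Choose how many worker threads to start.

    logical_cpus: number of logical CPUs; detected via os.cpu_count() if None.
    threads_per_core: hypothetical config value (2 on a typical HT system).
    use_ht_siblings: if False, start one worker per physical core only.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1  # cpu_count() may return None
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    physical_cores = max(1, logical_cpus // threads_per_core)
    return logical_cpus if use_ht_siblings else physical_cores


# Example: a 2C/4T chip such as the i3 discussed in this thread.
print(pick_worker_count(logical_cpus=4, threads_per_core=2, use_ht_siblings=True))
print(pick_worker_count(logical_cpus=4, threads_per_core=2, use_ht_siblings=False))
```

Leaving `use_ht_siblings` as a tunable mirrors the config-file approach: the user can benchmark both settings on their own workload instead of the program guessing.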

Too many artificial benchmarks that claim HT "suffers" focus on the wrong thing. They analyze performance in the single-program scenario and expect that to apply to the multi-program scenario, or at least the readers of the benchmark are the ones making that mistake. It won't ever work that way. You can't use a single application as a benchmark for a multi-app system, and vice versa. But this is true for any optimization, not just HT.

As it stands HT is really an optimization that requires cooperation of hardware and software to work best; it isn't an optimization that can be simply dismissed, as it has measurable performance boost in the general case, but the fact that we can't assume a core is a core will always be a problem, and amounts to technical dishonesty for the sake of marketing.

soccerballtux · Oct 6, 2011

Cogman said:
You probably won't see too much gains there. Hyperthreading works best when you have an application that does lots of integer and floating point operations while being threaded. This is really a pretty limited space (games, sometimes). Other than that, there aren't a lot of big gains from hyperthreading.

Hyperthreading is useful when the main thread gets stalled for data, which rarely happens during math-heavy computations because branch prediction for maths is easy.

It works out for games because there are lots of unpredictable branches (which direction is the user going to look? Is he going to fire, or not fire?). If you have a dual core without hyperthreading, the CPU would stall waiting for the data from the RAM. If you have a dual core with hyperthreading and the game is quad-threaded, then the other 2 threads can execute and get stuff done while the first 2 threads are waiting for data from system ram.
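A toy model of that argument, under strong assumptions that are mine rather than the post's: each thread alternates between a fixed number of compute cycles and a fixed number of stall cycles, the second hardware thread fills stalls perfectly, and the two threads never contend for caches or execution units. Real gains are smaller, so read the result as an upper bound.

```python
def core_utilization(compute_cycles, stall_cycles, threads_per_core):
    """Fraction of cycles the core's execution units are busy (toy model)."""
    busy_one_thread = compute_cycles / (compute_cycles + stall_cycles)
    # Extra hardware threads fill stall cycles, but a core cannot exceed 100%.
    return min(1.0, threads_per_core * busy_one_thread)


def ht_speedup(compute_cycles, stall_cycles):
    """Throughput of a core running two threads relative to one thread."""
    return (core_utilization(compute_cycles, stall_cycles, 2)
            / core_utilization(compute_cycles, stall_cycles, 1))


# No stalls (a tight math loop): nothing for the second thread to fill.
print(ht_speedup(100, 0))
# Stalled 20% of the time: the second thread can use those cycles.
print(ht_speedup(80, 20))
```

The model only captures the "fill the bubbles" effect; it says nothing about the cases where two threads sharing one core slow each other down.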

soccerballtux · Oct 6, 2011

The Ultimate said:
In the best-case scenario it can give you up to a 30% performance boost; usually it is around 7%-18%. Sometimes very intensive computing software leaves no bubbles in the execution engine, resulting in a negative impact on performance when Hyper-Threading is enabled. But that usually doesn't happen with common software, maybe with scientific stuff.

very true

soccerballtux · Oct 6, 2011

Idontcare said:
HT owes its existence, and performance, to living in a bubble that is created by pipeline inefficiencies which are themselves created by coding inefficiencies.

Those programmed inefficiencies are forever hard-coded into legacy programs unless the source code is revisited and recompiled, an unlikely outcome.

But I would hope that as we move forward into the future that the programming efficiencies (compilers) are asymptotically moving us closer to the point of eliminating the opportunities for the very sort of pipeline inefficiencies that HT is able to take advantage of.

We already see that in programs like Prime95 and IBT/LinX.

So when you refer to a future world where programs are all the more threaded and optimized for multi-core environments, I would like to think those apps will show even less benefit from HT-enabled processors...unless they are actually being optimized to intentionally ah heck up the pipeline such that HT is able to live inside a larger bubble and "save the day".

CMT is the only path forward if one is attempting to maximize TLP while balancing the expense of maximizing ILP. Just my opinion.

I thought this for a while.
Prime and IBT are all maths though: nothing to optimize, and accurate branch predictors for these are a dime a dozen. I had to simulate some of the early branch predictors pioneered by PhDs in the early 90s. All you have to do is XOR the current memory address with the branch destination address to get a near-unique identifier that you can start building a branch history on top of, and boom, there's 95% prediction accuracy on average, 80%+ worst case.
http://home.eng.iastate.edu/~zzhang/courses/cpre585-f03/reading/isca92-yeh-2l_branch.pdf
Interesting to note, compilers (GCC for example) are very hard to optimize for...I think it's the nature of the beast....

Maybe Microsoft has just gotten lazy and isn't optimizing their code, but seeing how hard it can be to write efficient code I am not so eager to cast the first stone anymore...
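The XOR-indexed predictor described in this post can be sketched roughly as below. The post doesn't say what history is kept per entry, so the 2-bit saturating counter, the table size, the addresses, and the class name are all assumptions for illustration; the two-level scheme in the linked paper is more elaborate. The synthetic trace is a single loop branch, so the resulting accuracy says nothing about the 95% figure on real code.

```python
class XorIndexedPredictor:
    """Toy branch predictor: XOR of branch and target address picks a counter."""

    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        # 2-bit saturating counters, 0..3; values >= 2 predict "taken".
        self.counters = [2] * (1 << table_bits)

    def _index(self, branch_addr, target_addr):
        return (branch_addr ^ target_addr) & self.mask

    def predict(self, branch_addr, target_addr):
        return self.counters[self._index(branch_addr, target_addr)] >= 2

    def update(self, branch_addr, target_addr, taken):
        i = self._index(branch_addr, target_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def accuracy(predictor, trace):
    """Run (branch_addr, target_addr, taken) records through the predictor."""
    correct = 0
    for branch_addr, target_addr, taken in trace:
        if predictor.predict(branch_addr, target_addr) == taken:
            correct += 1
        predictor.update(branch_addr, target_addr, taken)
    return correct / len(trace)


# Synthetic trace: a loop branch taken 9 times, then not taken once, repeated.
loop_trace = [(0x4000, 0x3F00, i % 10 != 9) for i in range(1000)]
print(f"accuracy on loop trace: {accuracy(XorIndexedPredictor(), loop_trace):.0%}")
```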

Revolution 11 · Oct 7, 2011

Well, a general rule of good programmers is that it is cheaper to throw hardware at a problem instead of hiring more programmers.

Programming is hard stuff and heavily optimizing code takes a lot of time and money that could be spent on more important projects. Optimized code is also harder to modify.

I am not justifying bad or sloppy programming or feature bloat, but this is just the truth. When Moore's Law stops being true and hardware improvements slow down enough, programming will improve again as the cost-benefit equation has changed.

Idontcare · Oct 7, 2011

Revolution 11 said:
Well, a general rule of good programmers is that it is cheaper to throw hardware at a problem instead of hiring more programmers.

Programming is hard stuff and heavily optimizing code takes a lot of time and money that could be spent on more important projects. Optimized code is also harder to modify.

I am not justifying bad or sloppy programming or feature bloat, but this is just the truth. When Moore's Law stops being true and hardware improvements slow down enough, programming will improve again as the cost-benefit equation has changed.

Therein lies another "devil is in the details" moment.

With very few exceptions, software products are not "overlooked" in favor of a competitor's product solely for the sake of "speed of the application".

Look no further than antivirus scanners: there are huge disparities in processing times across the spectrum of available products, and for the most part those products are not purchased with "scanning time" as the priority. Cost and effectiveness (perceived or real) tend to be the deciding factors.

Look at Photoshop. Whether or not CS6 has improved multithreading or improved speed over CS5 will not determine its market success. Feature set will.

People went ape over the "Content Aware" feature, where you crop stuff out of the foreground and the background is automagically recreated and filled in. Nobody really stepped up and said "yeah, but I'm not buying it because the multi-threaded scaling of that feature is lacking".

And this is the reality of the software industry. What doomed Vista wasn't the speed gains and losses in various capacities over XP. And what makes people choose Win7 vs. OS X does not really come down to matters of "well, I timed them both in a controlled environment and OS X is faster than Win7". It's still a feature-set (real or perceived) driven purchase.

So there is a very mild prioritization of multi-threading and improving application performance in these desktop and mobile markets. Yes people will gladly take speed improvements, but at the same time companies do not really appear to be at risk of losing significant sales if they fail to spend the extra money in developing the product so that it is faster.

Another example of this is winzip vs winrar vs 7zip. These all have widely differing threaded capabilities and speeds, and yet it is not those aspects of the programs that seem to determine which product people use.

I love 7zip for its ability to make very compact files, but the command line interface does not allow me to create automated timestamp filenames whereas winrar does, so I use winrar (paid for it even) despite the fact it is slower and less compacting than 7zip (which is free).

The opportunity cost to winrar for having not spent more money to develop faster products than 7zip is really small. No motivation to compete in metrics that actually do not heavily factor into the purchasing decisions of the customer.

The opposite is clearly true in the world of servers and HPC. In that arena, applications live and die by their performance, no question.

jacktesterson · Oct 7, 2011

Short answer: on quad-core Sandy Bridges, HT is not worth the extra cost. On dual-core SBs, it seems to help a lot!

Absolution75 · Oct 7, 2011

One thing to note that I've just recently noticed.

Hyper-threading is apparently really good for compiling code. I canceled my order of a 2500k due to the fact it doesn't have HT. It would actually be a downgrade or sidegrade from my i7 860. It's odd that an older 1156 CPU will kind of destroy a faster-clocked newer processor (i7 880 versus 2500k).

http://www.anandtech.com/show/4083/...core-i7-2600k-i5-2500k-core-i3-2100-tested/19

mrjoltcola · Oct 7, 2011

Absolution75 said:
Hyper threading is apparently really good for actually compiling code.

+1

Code compilation hyperthreads well due to the heavy stall rate: a lot of disk and memory must be accessed in a short amount of time, and the data being accessed is highly transient.

pcslookout · Oct 7, 2011

This may seem like a stupid question but who compiles code?

Absolution75 · Oct 7, 2011

pcslookout said:
This may seem like a stupid question but who compiles code?

Me? Developers? Anyone who works in the software industry?

It's a niche market, yeah, but it's still there.

Munky · Oct 7, 2011

Hyperthreading helps in well-threaded applications like video encoding, photo editing, 3d rendering and such, giving roughly a 25% performance boost versus a non-HT quad core. In games, the benefit is generally minimal. That said, if you plan on keeping the cpu for 2 years or longer, I'd spend the extra money for HT.

aigomorla · Oct 7, 2011

Pish..

the real reasons we all get HT on our cpu's is so when we look at task manager, we see a lot of pretty graph boxes!!!

:\

Seriously op, HT can be a double edged sword.
An HT thread is not as fast as a physical core, but it's a placeholder.

While work is being done on real cores, the HT cores queue up the next work and start on it until the physical cores free up.

Games using HT had issues though.
HT is slower than a physical core, and priority fetching wasn't optimized, so your virtual cores could do the work your physical cores should be doing.

But I'm fairly sure Intel has fixed this, and on top of that added Turbo, to help the virtual cores by overclocking.

But yeah... don't forget the most important reason is to see all those pretty graph boxes!!!

soccerballtux · Oct 7, 2011

you joke but that's definitely what's drawing me to bulldozer. I can't imagine having 12 of them like you do!!!

pcslookout · Oct 7, 2011

Absolution75 said:
Me? Developers? Anyone who works in the software industry?

It's a niche market, yeah, but it's still there.

OK, good thing I am not a developer or a programmer. Not that I don't appreciate it; I just never thought I would be good at such a thing. I prefer the computer hardware.
