Multicore Hyperthreading and the Task Manager

2March

Member
Sep 29, 2001
135
0
0
Greetings,

I recently upgraded to an i7 which, as you probably know, supports SMT, or hyperthreading. The feature can, as always, be switched off in the BIOS. Works sweet...

But lately I delved into some performance testing results and found that hyperthreading has a rather peculiar effect on the Windows Task Manager and the way it measures CPU load. Initially I had SMT enabled, but I noticed that some applications that only use one core now appeared to run on only half a core. Now, don't laugh, I know it is not literally running on half a core, but being a noob I had to investigate further, because some apps seemed to run slower in this setting than with SMT disabled.

FS9, known for only using a single core, runs way faster with hyperthreading disabled than with it enabled. So my initial conclusion was that SMT forces FS9 into a less favourable situation, which it does. With LFS, however, I noticed no difference in performance at all. It runs maxed out with no problem, and LFS is also strictly single-threaded. Then comes FSX: with SMT disabled I noticed it loaded the CPU up to a hundred percent. I was quite amazed, because I know FSX uses three cores, but four? The Performance tab shows a recognizable pattern: about 70% of the time at 20% load, and the rest between 40% and 100% load. With SMT enabled the same pattern occurs, but now maxing out at 50% CPU load. The per-core graph shows the first 4 cores under the same load as with SMT disabled, while the other 4 are almost idle. The thing is, however, FSX seems to perform almost equally in both cases. Core temperature is also comparable.

So something is funky with the performance monitor. It would seem to me that this only happens with hyperthreading enabled, but I'm still not sure what to think about that 100% CPU load under FSX. Either way, when the OS decides whether or not to use a core for a second thread, what does 100% full load mean? Sure, when temperatures are comparable, power consumption is obviously about the same, but the processor is running at a lower efficiency... or maybe not. Who knows?

I'm getting dizzy... How am I going to set affinities on those 8 logical cores? Should an app that uses only one core get one core assigned, or 2? FS9 and LFS obviously don't agree with each other. Or should I leave hyperthreading alone altogether?
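For anyone puzzling over the same affinity question, here is a small sketch of how the affinity bitmasks work out on an 8-thread i7. It assumes the common enumeration where logical CPUs 2n and 2n+1 are the SMT siblings of physical core n; your system may enumerate differently, so check before pinning anything.

```python
# Sketch: affinity bitmasks for a 4-core/8-thread i7, assuming logical CPUs
# 2n and 2n+1 are SMT siblings of physical core n (an assumption; verify
# the enumeration on your own machine).

def core_mask(physical_core, include_sibling=False):
    """Bitmask selecting one physical core's logical CPU(s)."""
    first = 1 << (2 * physical_core)    # primary logical CPU of that core
    if include_sibling:
        return first | (first << 1)     # both SMT siblings
    return first

# Pin a single-threaded app (FS9-style) to one logical CPU of core 0:
print(hex(core_mask(0)))                        # 0x1
# Or give it both siblings of core 0:
print(hex(core_mask(0, include_sibling=True)))  # 0x3
# One logical CPU per physical core, avoiding sibling contention entirely:
spread = 0
for c in range(4):
    spread |= core_mask(c)
print(hex(spread))                              # 0x55
```

The `0x55`-style mask is what you would feed to something like Windows' `start /affinity` switch, which takes a hex mask.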


 

Diogenes2

Platinum Member
Jul 26, 2001
2,151
0
0
I found that SMT reduced my Folding@Home performance a bit, as well as encoding with Vegas Video. Not much, but why lose anything at all?
I turned it off. I would be interested to know if anyone is aware of any apps that improve significantly with SMT enabled.
 
soccerballtux

Dec 30, 2004
12,553
2
76
Originally posted by: Diogenes2
I found that SMT reduced my Folding@Home performance a bit as well as encoding with Vegas Video .. Not much, but why lose anything at all?
I turned it off .. I would be interested to know if anyone is aware of any apps that improve significantly with SMT enabled ..

Anandtech's review covered this; I'm too lazy to provide a link right now, but you can find it yourself, I think.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Thread migration is real, it happens. A single thread will bounce around from core to core every X milliseconds (the exact value I do not know), but the rate is far higher than task manager's sampling rate, so it smooths out the fact that each core is 100% active for a few tens of milliseconds while the others are not, and then vice versa. What you see in task manager is that each core appears to be less than 100% utilized, while the aggregate sum of all the cores' individual utilization equals one full core being 100% utilized.

The question of why this thread migration impacts performance comes down to data transfer as well as power-state transition time. When a thread migrates from an active core to an inactive core, the inactive core takes a non-zero amount of time to clock up and sync the data resident in the previously active core.

The amount of data that needs to migrate largely dictates the magnitude of the performance degradation caused by thread migration.
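The smoothing effect described above is easy to reproduce numerically. This is my own toy simulation, not anything from the thread: one thread is 100% busy but hops to a randomly chosen core every 10 ms (an illustrative interval), while a task-manager-style monitor averages per-core activity over a 1-second window.

```python
# Sketch: why a migrating 100%-busy thread shows up as ~25% on each of 4
# cores in a 1 s averaging window. Tick and window lengths are illustrative.

import random

CORES = 4
TICK_MS = 10        # thread migrates every 10 ms (assumed, for illustration)
WINDOW_MS = 1000    # monitor's averaging window

busy_ms = [0] * CORES
current = 0
for t in range(0, WINDOW_MS, TICK_MS):
    busy_ms[current] += TICK_MS        # this core was 100% busy for the tick
    current = random.randrange(CORES)  # thread migrates somewhere else

utilization = [100 * b / WINDOW_MS for b in busy_ms]
print([round(u) for u in utilization])  # each core lands near 25%
print(round(sum(utilization)))          # aggregate is exactly one full core
```

The per-core numbers jitter around 25% from run to run, but the aggregate is always 100%: exactly the "one full core smeared across four" picture task manager paints.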

Here's a thread we had here on thread migration which does a decent job IMO in discussing this:
http://forums.anandtech.com/me...250963&highlight_key=y

I'll quote myself from that thread, as I thought the temperature tests and their correlation to task manager (a test proposed by Phynaz, which I thought was a great idea) really highlight that this is a real phenomenon and not some task manager shenanigans.

Originally posted by: Idontcare
Originally posted by: Phynaz
If you are looking in task manager, that's the problem. It basically lies about where the utilization is.

Run your favorite DC client on one core of a dual-core machine. Task manager will show two 50%-utilized cores. Now fire up your temperature monitoring tool. See that one hot core and one cool core? That DC thread is staying pinned on one core, or else both cores would be the same temp.

Out of curiosity I just ran this experiment.

In my case I am testing a quad, and I am using a single-threaded program which fully loads a single core. I disabled C1E and EIST in the BIOS so there is no funny business in the idle vs. load temps for the unloaded idle cores while one core is fully loaded.

I had coretemp 0.99.3 log the temps in 1s intervals. I imported the temperature data into Excel and then created a delta temperature plot over time.

The baseline temp for each core was taken as the idle temperature as reported by Coretemp. The graph indicates the increase in core temperature over the idle temperature during the single-threaded application fully loading and eventually stopping.

Here's the results for no affinity set in task manager:
http://i272.photobucket.com/al..._bucket/NoAffinity.jpg

We see all cores rise in temperature during the run, and the temperature per core actually spikes a lot (lots of jitter in the temperature data). I included a screen capture of task manager during the run as well; you can see it reports all 4 cores equally loaded at 25% utilization per core.

Here's the results when I set the affinity to Core#2 in task manager:
http://i272.photobucket.com/al...bucket/SetAffinity.jpg

We clearly see Core#2 rising in temperature markedly more so than the other three idle cores (confirmed idle by the included task manager screen capture). Core#2 is nearly 9°C warmer when fully loaded versus idle.

It is also interesting that the temperature data is now quite stable point-to-point, none of the jitter we saw in the first test where the thread was not locked. To me this is more evidence that the thread is bouncing core-to-core when affinity is not set versus when affinity is locked to a specific core.

In my opinion the data quite clearly prove that thread migration as reported by task manager is very real and is in fact occurring as confirmed by the temperature results for the given CPU cores.
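The Excel step in that experiment (baseline subtraction to get a delta-temperature plot) is simple enough to sketch. The numbers below are synthetic stand-ins, not the actual CoreTemp log from the test:

```python
# Sketch of the delta-temperature processing described above: subtract each
# core's idle baseline from the logged temps. Data here is made up to mimic
# the pinned-to-core-#2 case; it is not the original measurement.

idle = [38, 37, 39, 38]    # per-core idle baseline, degrees C
log = [                    # one row per 1 s CoreTemp sample
    [39, 37, 47, 38],
    [38, 38, 46, 39],
    [39, 37, 48, 38],
]

delta = [[t - b for t, b in zip(row, idle)] for row in log]
for row in delta:
    print(row)

# With affinity pinned to core #2, only its column rises well above zero:
peak = [max(col) for col in zip(*delta)]
print(peak)                # [1, 1, 9, 1]
```

A plot of `delta` over time is exactly the kind of graph linked above: three flat traces near zero and one trace climbing roughly 9 degrees on the pinned core.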

 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
Originally posted by: Diogenes2
I found that SMT reduced my Folding@Home performance a bit as well as encoding with Vegas Video .. Not much, but why lose anything at all?
I turned it off .. I would be interested to know if anyone is aware of any apps that improve significantly with SMT enabled ..

http://www.techreport.com/articles.x/15818/12

"The benchmark keeps eight threads active all of the time on the Core i7, which reduces per-thread performance."

There you go, it's not really slower.

I think what's happening is that operating systems are really bad at managing multi-threaded multi-core processors, and that is where you are losing performance. A single-threaded app running on the Core i7 may actually be stressing more threads than we think, because the OS is bad at managing threads and uses more resources than needed, slowing single-thread performance down.
 

Diogenes2

Platinum Member
Jul 26, 2001
2,151
0
0
Well, it is slower: my F@H interval for 1% is 11-12 minutes versus 9-10. Or is this processor so powerful it is somehow warping time? :D
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: Diogenes2
Well, it is: slower when my F@H interval for 1% is 11-12 minutes versus 9 - 10.. Or Is this processor so powerfull it is somehow warping time ?:D

Yes, lest ye forget: the faster things go, the slower the time domain through which they travel... relativity and all that. Be worried if that chip gets going anywhere near the speed of light; it'll go back in time and murder the 286, and then we'll all be stuck in a present-day paradox where our i7s are architecturally grandfather-less freaks o' nature... so to speak.
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
Originally posted by: Diogenes2
Well, it is: slower when my F@H interval for 1% is 11-12 minutes versus 9 - 10.. Or Is this processor so powerfull it is somehow warping time ?:D

I hope I am not going too far off-topic here, but it has been a day since the last post and I am curious about this.

Is this with one SMP client (which has four threads) and not much else running?

Could the threads be migrating so that each is not running on its own physical core, but instead all four threads are running on two physical (four virtual) cores and thus running slower?

I wonder how two SMP clients would do? I may have to wait a bit until I get an i7 :)
 

Diogenes2

Platinum Member
Jul 26, 2001
2,151
0
0
I will probably get around to trying two, and see what happens. I will try with SMT and without.. Might be a few days. Don't have a lot of time to play with it right now.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Originally posted by: GLeeM
Originally posted by: Diogenes2
Well, it is: slower when my F@H interval for 1% is 11-12 minutes versus 9 - 10.. Or Is this processor so powerfull it is somehow warping time ?:D

I hope I am not going to far off-topic here, but it has been a day since the last post and I am curious about this.

Is this with one SMP client (which has four threads) and not much else running?

Could the threads be migrating so that each is not running on a physical core, but possibly all four threads are running on two physical (four virtual) cores and thus running slower?

I wonder how two SMP clients would do? I may have to wait a bit until I get an i7 :)

I'm pretty sure the F@H SMP client can only exploit 4 cores. That's probably why hyperthreading is a detriment. It's probably confusing the software into using the hyperthread "cores" instead of the real ones at times.
 
soccerballtux

Dec 30, 2004
12,553
2
76
Originally posted by: Idontcare
Thread migration is real, it happens. A single-thread will bounce around from core to core every X milliseconds (exact value I do not know) but the rate is far higher than what task manager samples, so it smooths out the fact that each core is 100% active while the others are not for a few tens of milliseconds and then vice versa...what you see in task manager is that each core will appear to be less than 100% utilized and the aggregate sum of all the core's individual utilization appears to be equal to one full core being 100% utilized.

The question of why this thread migration impacts performance is one of data transfer as well as power-state transition time. When a thread migrates from an active core to an inactive core the inactive core takes a non-zero amount of time to clock-up and sync the data resident in the prior active core.

The amount of data needed to migrate will begin to dictate the magnitude of the performance degradation caused by thread migration.

Here's a thread we had here on thread migration which does a decent job IMO in discussing this:
http://forums.anandtech.com/me...250963&highlight_key=y

I'll quote myself from that thread as I thought the temperature tests and correlation to task manager (a test proposed by Phynaz, which I thought was a great idea) really highlight the fact this is a real phenomenon and not some task manager shenanigans.

Originally posted by: Idontcare
Originally posted by: Phynaz
If you are are looking in task manager, that's the problem. It basically lies about where the utilization is.

Run your favorite DC client on one core of dual core machine. Task manager will show two 50% utilized cores. Now fire up your temperature monitoring tool. See that one hot core and one cool core? That DC thread is staying pinned on one core, or else both cores would be the same temp.

Out of curiosity I just ran this experiment.

In my case I am testing a quad and I am using a single-threaded program which fully loads a single-core. I disabled C1E and EIST in the BIOS so there is no funny business in the idle temps vs. load temps for unloaded idle cores while one core is fully loaded.

I had coretemp 0.99.3 log the temps in 1s intervals. I imported the temperature data into Excel and then created a delta temperature plot over time.

The baseline temp for each core was taken as the idle temperature as reported by Coretemp. The graph indicates the increase in core temperature over the idle temperature during the single-threaded application fully loading and eventually stopping.

Here's the results for no affinity set in task manager:
http://i272.photobucket.com/al..._bucket/NoAffinity.jpg

We see all cores rise in temperature during the run and the temperature per core actually spikes a lot (lots of jitter in the temperature data). I included a screen capture of task manager during the run as well, you can see it report all 4 cores are equally loaded at 25% utilization per core.

Here's the results when I set the affinity to Core#2 in task manager:
http://i272.photobucket.com/al...bucket/SetAffinity.jpg

We clearly see Core#2 rising in temperature markedly more so than the other three idle cores (confirmed idle by the included task manager screen capture). Core#2 is nearly 9°C warmer when fully loaded versus idle.

It is also interesting that the temperature data is now quite stable point-to-point, none of the jitter we saw in the first test where the thread was not locked. To me this is more evidence that the thread is bouncing core-to-core when affinity is not set versus when affinity is locked to a specific core.

In my opinion the data quite clearly prove that thread migration as reported by task manager is very real and is in fact occurring as confirmed by the temperature results for the given CPU cores.

I've been told it does this to balance the thermal load across the chip. It's too late to run the maths, but I don't think an extra 5°C on one core provides enough extra thermal stress (above a constantly fluctuating temperature) to break the connections (to the PCB) under the silicon.

I seem to recall reading that the Linux kernel does much less of this thread bouncing.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
Originally posted by: SickBeast
I'm pretty sure that the F@H SMP client can only exploit 4 cores. That's probably why hyperthreading is of detriment. It's probably confusing the software and using the hyperthread "cores" instead of the real ones at times.

It supports more than 4 according to Tech Report review.
 

palladium

Senior member
Dec 24, 2007
539
2
81
Yeah, pretty sure F@H only uses 4 threads. HT allows you to run 2 of those SMP clients, but I usually run one and use the other 4 (logical) cores for gaming, movies, etc. I always assign F@H to CPUs 4-7 (2 physical + 2 virtual). In my case, if I turn off HT, my F@H would only have 2 logical cores to run on (since I need the other 2 cores), and I'm pretty sure (haven't done any tests yet) F@H SMP would run faster on 4 logical cores compared to 2, even though the actual physical core count is only 2 in both cases (Linpack would not, however).
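As a quick illustration of what "assign to CPUs 4-7" means in mask terms (assuming Windows numbers logical CPUs 0-7 on this chip), the hex affinity value works out like this:

```python
# Sketch: the affinity bitmask for "CPUs 4-7" on an 8-thread i7, the kind of
# value you'd pass to Windows' `start /affinity <hexmask>` or set in
# Task Manager. CPU numbering 0-7 is assumed.

mask = 0
for cpu in (4, 5, 6, 7):
    mask |= 1 << cpu    # set the bit for each logical CPU

print(hex(mask))        # 0xf0
```

So pinning F@H to CPUs 4-7 is the mask `0xF0`, leaving `0x0F` (CPUs 0-3) free for everything else.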

 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: soccerballtux
I've been told it does this to balance the thermal load out across the chip. It's too late to run the maths but I don't think an extra 5C on one core is going to provide enough extra thermal stress (above a constantly fluctuating temperature) to break the connections (to the PCB) under the silicon.

I seem to recall reading that the Linux kernel does much less of this thread bouncing.

I can see why that idea would get generated, but it can hardly be the truth. The ramifications of thermal gradients within a CPU are already accommodated during the development cycle.

And if there were any truth to it, then MS would have removed the affinity option from task manager, and the CPU guys would state that running threads with affinity settings voids the warranty.

At one point in time someone around here posited that thread migration was implemented intentionally to deal with performance issues on multi-socket hyperthreading systems. There you could conceivably have two threads running on one core and its HT hardware while a core in another socket sits idle. Since HT was less efficient than running on two cores in two sockets, not having thread migration would degrade performance in that environment even more than having it. It is the lesser of two evils.

But the thermal argument...not a chance there is any validity to it.
 

GLeeM

Elite Member
Apr 2, 2004
7,199
128
106
Originally posted by: IntelUser2000
I'm pretty sure that the F@H SMP client can only exploit 4 cores. That's probably why hyperthreading is of detriment. It's probably confusing the software and using the hyperthread "cores" instead of the real ones at times.

It supports more than 4 according to Tech Report review.

Yeah, Oops!

I was assuming Windows.

The Windows SMP beta always uses 4 threads: no more, no less.

I think Linux/OSX SMP client can use up to 8 (or maybe the number can be configured).
 
soccerballtux

Dec 30, 2004
12,553
2
76
Originally posted by: Idontcare
Originally posted by: soccerballtux
I've been told it does this to balance the thermal load out across the chip. It's too late to run the maths but I don't think an extra 5C on one core is going to provide enough extra thermal stress (above a constantly fluctuating temperature) to break the connections (to the PCB) under the silicon.

I seem to recall reading that the Linux kernel does much less of this thread bouncing.

I can see why that idea would get generated, but it can hardly be the truth. The ramifications of thermal gradients within a CPU are already accommodated during the development cycle.

And if there were any truth to it then MS would have removed the affinity option from task manager and the cpu guys would state that running threads with affinity settings breaks the warranty.

At one point in time someone around here posited that thread migration was implemented intentionally to deal with performance issues of threads on multi-socket hyperthreading systems in which you could conceivably have a situation where two threads are running on a core and its HT hardware while another core in another socket is sitting idle. Since HT was less efficient than running on two cores on two sockets there would be even more of a performance degradation in this environment by not having thread migration versus having it. It is the lesser of two evils.

But the thermal argument...not a chance there is any validity to it.

Well, it was the source (along with RoHS solder) of MS's RROD Xbox 360 problems... that's on a much larger scale, though; I agree.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: soccerballtux
Originally posted by: Idontcare
Originally posted by: soccerballtux
I've been told it does this to balance the thermal load out across the chip. It's too late to run the maths but I don't think an extra 5C on one core is going to provide enough extra thermal stress (above a constantly fluctuating temperature) to break the connections (to the PCB) under the silicon.

I seem to recall reading that the Linux kernel does much less of this thread bouncing.

I can see why that idea would get generated, but it can hardly be the truth. The ramifications of thermal gradients within a CPU are already accommodated during the development cycle.

And if there were any truth to it then MS would have removed the affinity option from task manager and the cpu guys would state that running threads with affinity settings breaks the warranty.

At one point in time someone around here posited that thread migration was implemented intentionally to deal with performance issues of threads on multi-socket hyperthreading systems in which you could conceivably have a situation where two threads are running on a core and its HT hardware while another core in another socket is sitting idle. Since HT was less efficient than running on two cores on two sockets there would be even more of a performance degradation in this environment by not having thread migration versus having it. It is the lesser of two evils.

But the thermal argument...not a chance there is any validity to it.

Well, it was the source (along with ROHS solder) of MS's RROD XBox360 problems...that's on a much larger scale though; I agree.

I am not arguing against the existence of thermal gradients inducing fail rates... I am arguing against the idea that thread migration exists in OSes in order to circumvent thermal-gradient issues.

Thermal gradients causing packaging failures is entirely a different subject. MS's issue is the same as Nvidia's. Everything that happens ex-fab (packaging, shipping, sales, distribution) is done to far less rigorous and exacting standards, and with commensurately lower budgets. When we are talking about packaging fails, which represent corner cases where engineering cut corners a little too much and in-field fail rates end up higher than budgeted, then you've got yourself a whole other issue.

The same craptastic engineering/management mentality that afflicted NV's and MS's package-fail products can cause similar CTE issues with CMOS ICs if the managers/engineers on those projects elected to abandon common sense as well... but that doesn't mean thread migration was implemented to add margin against thermal-gradient-induced in-field fail rates. Cause and effect are not linked at all. (Having been part of projects in the area of packaging, I can go on and on all day about this. :))
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: Idontcare
Originally posted by: Diogenes2
Well, it is: slower when my F@H interval for 1% is 11-12 minutes versus 9 - 10.. Or Is this processor so powerfull it is somehow warping time ?:D

Yes, lest ye forget the faster things go the slower the time domain thru which they travel...relativity and all that...be worried if that chip gets going anywhere near the speed of light, it'll go back in time and murder the 286 and then we'll all be stuck in this present day paradox where our i7's are architecturally grandfather-less freaks'o nature...so to speak.

technically, electricity does travel at the speed of light.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: taltamir
Originally posted by: Idontcare
Originally posted by: Diogenes2
Well, it is: slower when my F@H interval for 1% is 11-12 minutes versus 9 - 10.. Or Is this processor so powerfull it is somehow warping time ?:D

Yes, lest ye forget the faster things go the slower the time domain thru which they travel...relativity and all that...be worried if that chip gets going anywhere near the speed of light, it'll go back in time and murder the 286 and then we'll all be stuck in this present day paradox where our i7's are architecturally grandfather-less freaks'o nature...so to speak.

technically, electricity does travel at the speed of light.

Come again?

What exactly do you mean by electricity? (Are you discussing the transfer of energy by coupled EMFs, or direct-current embodiments of a traveling electron?)

And what exactly do you mean by the speed of light? In a vacuum, or in a medium other than a vacuum? You aren't trying to be fancy and invoke Cerenkov radiation by any chance, are you?

Mind you, my PhD is in quantum chemistry, heavy emphasis on the quantum, so I fancy discussions on these things, but I do not come to the debate lacking typical rudimentary concepts of relativity and the like. So... young Padawan... enlighten me on how electricity technically travels faster than the speed of light... for I am eager to learn of such new and wondrous things. ;)
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: SlyNine
Well put your glasses back on and reread your very own quote, he never said faster.

You are right, that is what he wrote. It still doesn't negate any of my questions, though... electricity does not travel at the speed of light unless you are restricting the initial conditions to discussing electrons traveling (i.e. the DC mode of electricity) at something equal to or greater than the speed of light in whatever medium the electrons are traveling.

In CPUs, where the medium of conduction is copper and silicon and the charge carriers are electrons and holes operating in DC mode, the mean velocity is substantially less than the speed of light.

SPEED OF "ELECTRICITY"

The Speed of Electrons in Copper
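To put a number on that last point, the mean drift velocity of conduction electrons in copper follows from v = I / (n·q·A). The current and wire cross-section below are illustrative values I chose, not from the linked articles:

```python
# Sketch: electron drift velocity in copper for 1 A through a 1 mm^2 wire,
# v = I / (n * q * A). Current and cross-section are assumed example values.

I = 1.0        # current, amperes (assumed)
n = 8.5e28     # free-electron density of copper, m^-3
q = 1.602e-19  # elementary charge, C
A = 1e-6       # cross-section, m^2 (1 mm^2, assumed)
c = 3.0e8      # speed of light in vacuum, m/s

v_drift = I / (n * q * A)
print(f"{v_drift:.2e} m/s")      # on the order of 1e-4 m/s
print(f"{v_drift / c:.1e} of c") # a vanishingly small fraction of c
```

The electrons themselves creep along at a fraction of a millimeter per second; it is the electromagnetic signal that propagates at a large fraction of c, which is the distinction the whole exchange above hinges on.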