Why is thread migration more energy efficient?

William Gaatjes · Dec 16, 2017

I was reading that thread migration is more energy efficient.
Is this true, and if so, how?

Any OS of the last decade runs lots of threads, more than a modern 8-core/16-thread CPU can execute at once.
When I think about it, I assume the scheduler of any kernel would just pick a core whose currently running (SMT) threads both cannot continue, either because they are waiting on a semaphore or because the scheduler preempts them for another thread as part of multitasking.

I get the impression that at any given time, on a device that is not asleep, there are so many threads to run that thread migration happens automatically.
Why would an OS migrate threads for energy efficiency, or do extra shuffling of threads for some reason I do not know?

TheELF · Dec 16, 2017

Just as with TDP, short bursts at high turbo speeds are easier to sustain than longer ones. My guess, and it is only a guess, is that running short bursts on one core is more power efficient, especially since every other core can be clocked down.

Also, almost all threads on a modern OS are idle and only get executed when they have work to do.

Even threads that are running and using up all of the core they run on get migrated at every time slice. That sometimes costs performance if it's a truly single-threaded workload, but it keeps the cores from wearing out needlessly.

William Gaatjes · Dec 16, 2017

It is possible that a short burst at a higher frequency consumes less energy, but I have trouble accepting that.
For the MOSFET transistors used in a CPU, a general formula for dynamic power can be used:
P = C * V² * f
To reach a higher frequency the voltage must also be increased. Because the voltage is squared, the power consumption increases quadratically with the voltage.
Let's say that some sequence of instructions runs for 1 second at 1 GHz on a given CPU core.
At 2 GHz the same sequence of instructions would be executed in 0.5 seconds on the same core.
At the same voltage, the power doubles but the run time halves, so the energy consumed is the same.
But in reality more voltage is needed to reach 2 GHz, so that is no longer the case.
After accounting for the increased voltage, the same work costs more energy at 2 GHz.
But in all honesty, I am ignoring all effects of cache and other supporting logic in the CPU here.
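
The P = C * V² * f reasoning above can be checked with a few lines of arithmetic. This is only a rough sketch: the capacitance and both voltages are made-up illustrative values, not measurements of any real CPU, and leakage, cache and uncore power are ignored just as in the post.

```python
def dynamic_energy(c_eff, voltage, freq_hz, cycles):
    """Energy in joules to execute `cycles` clock cycles, using P = C * V^2 * f."""
    power_w = c_eff * voltage ** 2 * freq_hz   # dynamic power in watts
    runtime_s = cycles / freq_hz               # time needed for the work
    return power_w * runtime_s                 # reduces to C * V^2 * cycles


C_EFF = 1e-9    # assumed effective switched capacitance in farads (illustrative)
CYCLES = 1e9    # the "1 second at 1 GHz" workload from the post

e_1ghz = dynamic_energy(C_EFF, 1.0, 1e9, CYCLES)           # assumed 1.0 V
e_2ghz_same_v = dynamic_energy(C_EFF, 1.0, 2e9, CYCLES)    # same voltage, half the time
e_2ghz_more_v = dynamic_energy(C_EFF, 1.2, 2e9, CYCLES)    # assumed 1.2 V to reach 2 GHz

print(e_1ghz, e_2ghz_same_v, e_2ghz_more_v)
```

With these assumed numbers the first two results are equal (the frequency cancels out), while the 1.2 V case costs about 1.44 times as much energy for the same work, since energy per cycle scales with V².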

When thinking of active threads, I am thinking more of when a system is doing heavy number crunching, simulation or gaming.
Then all cores are constantly in use. Which thread does what is not deterministic, so thread shuffling happens anyway because of scheduling and semaphore wake-ups.
What energy efficiency is gained by doing thread migration on top of all that?

When a core is idle, running at its lowest clock and having a relaxed time, I find it difficult to believe that it is wearing out. I mean, the clock is lowered and the voltage is lowered.
But when idle threads become active, I can imagine that when the CPU is boosting clocks and voltage and there are free cores available, migrating threads to the free cores is a good thing.
It would improve performance.
But I fail to see the energy efficiency.

TheELF · Dec 16, 2017

Doesn't a heated-up core lose efficiency? Isn't that why most overclocks stop at 5 GHz? So keeping a core cool is better for efficiency?

When all cores are fully loaded I guess it wouldn't make any difference, and I also don't know if migration still happens at full load.

Think of a system whose user only plays FSX and does nothing else: without wear leveling, the first core would work at 100% for hours on end while the rest of the cores would idle.

William Gaatjes · Dec 16, 2017

TheELF said:
Doesn't a heated-up core lose efficiency? Isn't that why most overclocks stop at 5 GHz? So keeping a core cool is better for efficiency?

When all cores are fully loaded I guess it wouldn't make any difference, and I also don't know if migration still happens at full load.

Think of a system whose user only plays FSX and does nothing else: without wear leveling, the first core would work at 100% for hours on end while the rest of the cores would idle.

I have no idea, to be honest.
I can think of several possible factors.
Electrical current needs some time to travel even in a 14 nm chip, but I doubt that is the cause.

I do not know if this has anything to do with it, but a property of MOSFETs is that Rds(on) increases when the temperature rises.
Rds(on) is the resistance measured between the drain and the source of a MOSFET, its outputs so to speak. The MOSFET has an input called the gate. In the conventional view that is taught, current flows from drain to source.
In other words, the on-resistance of a MOSFET has a positive temperature coefficient.
Perhaps it has to do with the threshold voltage shifting, or with the electrons not moving as easily through the channel because of thermal vibrations; I am not sure.
All this means less drive current for the next MOSFET stage, because the current the previous transistor conducts decreases as the temperature increases. Conversely, at a lower temperature the current increases.
More current means faster charging and discharging of the gate capacitance of the following MOSFETs, and thus raises the frequency at which the chain of MOSFETs can switch on and off.
This is, I think, a very simplified view.

beginner99 · Dec 17, 2017

William Gaatjes said:
or do extra shuffling of threads for some reason I do not know?

To distribute the work evenly across all cores, so that one core isn't at 100°C while another sits at the ambient temperature inside the case, say 30°C.

And a lower core temperature means lower voltage, which means lower power consumption.

William Gaatjes · Dec 17, 2017

beginner99 said:
To distribute the work evenly across all cores, so that one core isn't at 100°C while another sits at the ambient temperature inside the case, say 30°C.

And a lower core temperature means lower voltage, which means lower power consumption.

This I can understand in a situation where free cores are available, or where one core is running at max clock and the others at a much lower clock frequency.
But this is not needed during gaming, for example, right?

CatMerc · Dec 17, 2017

AFAIK, it's not done for energy efficiency reasons. It's just done to spread the heat across all cores evenly.

That said, higher temperature transistors are more leaky, so it MIGHT save some energy, but I dunno how much that saves vs the energy used in moving all the data in the caches.

wahdangun · Dec 17, 2017

You are missing the context. The reason the OS needs to shuffle threads is the race-to-idle paradigm: the CPU cranks the speed up to maximum, and since a cool core can reach the maximum more easily, OS makers decided that the best practice was to shuffle threads so that a cool core is always available. And since maximum performance can be reached, the core can go idle sooner, or be forced to idle when it reaches a certain temperature.

beginner99 · Dec 17, 2017

William Gaatjes said:
But this is not needed during gaming, for example, right?

Most games still have 1 main thread that has most of the load. So yes it also makes sense there. When it doesn't make sense is when running something like Prime95 or some similar benchmark/calculating tool that easily takes up 100% on all cores.

IntelUser2000 · Dec 17, 2017

Where did you read it? If you want to know the actual reason it would be good to see the explanation from the source why it is so.

The reason the OS needs to shuffle threads is the race-to-idle paradigm,

The idea of race to idle is a bit outdated now. CPUs take quite a bit of time to transition between different clock speeds, and even slower to switch between different C and P states.

That means that to really save power, it has to consider whether it's worth doing the transition versus just staying in the current state.

William Gaatjes · Dec 20, 2017

IntelUser2000 said:
Where did you read it? If you want to know the actual reason it would be good to see the explanation from the source why it is so.

The idea of race to idle is a bit outdated now. CPUs take quite a bit of time to transition between different clock speeds, and even slower to switch between different C and P states.

That means that to really save power, it has to consider whether it's worth doing the transition versus just staying in the current state.

I was searching for what the benefits of thread migration are in general, because I am interested.
I did not find much, except that a Google search came up with results on thread migration and energy efficiency.

I have not read the complete papers, which is why I came here to ask the question.
https://www.computer.org/csdl/trans/tc/2014/02/ttc2014020349-abs.html
After removing the Google search text, this is the PDF:
http://cs.txstate.edu/~aq10/papers/alvarado_techcon14.pdf

William Gaatjes · Dec 20, 2017

wahdangun said:
You are missing the context. The reason the OS needs to shuffle threads is the race-to-idle paradigm: the CPU cranks the speed up to maximum, and since a cool core can reach the maximum more easily, OS makers decided that the best practice was to shuffle threads so that a cool core is always available. And since maximum performance can be reached, the core can go idle sooner, or be forced to idle when it reaches a certain temperature.

This I understand, but normally when an OS is running with programs, all kinds of services and interrupts being serviced in the background, all cores are in use to some extent.
I am forgetting how incredibly fast modern x86 cores are.
When looking at the Task Manager resource monitor, I see that of the 4 (Piledriver) cores that I have, cores 3 and 4 are parked most of the time, even at this moment while typing.

beginner99 said:
Most games still have 1 main thread that has most of the load. So yes it also makes sense there. When it doesn't make sense is when running something like Prime95 or some similar benchmark/calculating tool that easily takes up 100% on all cores.

I see. Isn't Windows 10 Game Mode about preventing thread migration and core parking during more modern games that use more threads?
What are the benefits of Windows 10 Game Mode besides (I think) making the virus scanner less active?

wahdangun · Dec 20, 2017

IntelUser2000 said:
Where did you read it? If you want to know the actual reason it would be good to see the explanation from the source why it is so.

The idea of race to idle is a bit outdated now. CPUs take quite a bit of time to transition between different clock speeds, and even slower to switch between different C and P states.

That means that to really save power, it has to consider whether it's worth doing the transition versus just staying in the current state.

If race to idle is outdated, then why is Intel now trying to improve their burst state? I know that migrating a thread between cores is costly, but I think today's CPUs, with faster and larger caches, can hide the impact of it.

William Gaatjes said:
This I understand, but normally when an OS is running with programs, all kinds of services and interrupts being serviced in the background, all cores are in use to some extent.
I am forgetting how incredibly fast modern x86 cores are.
When looking at the Task Manager resource monitor, I see that of the 4 (Piledriver) cores that I have, cores 3 and 4 are parked most of the time, even at this moment while typing.

You must remember that core parking and thread migration differ between CPU manufacturers and architectures; they can even be harmful (as with AMD's construction cores).

William Gaatjes · Dec 20, 2017

wahdangun said:
You must remember that core parking and thread migration differ between CPU manufacturers and architectures; they can even be harmful (as with AMD's construction cores).

Interesting, do you have any detailed information about that?

edit:
It seems thread migration from module to module was/is also an issue for the Bulldozer architecture.
And since the Richland APU that I have also consists of 2 modules, that seems to apply to it as well.
With respect to core parking, I have not read about that.
I am, however, reading about it in the technical Ryzen thread.

IntelUser2000 · Dec 21, 2017

wahdangun said:
If race to idle is outdated, then why is Intel now trying to improve their burst state?

What you are saying is true, but it is more complicated than that. If you are talking about energy efficiency in terms of battery life, that is quite different from peak power and TDP ratings. The HUGI (hurry up and get idle) concept is still valid, but it needs a revision; call it HUGI 2.0.

If everything were truly instantaneous, HUGI would always win: run at the highest performance so as to drop down to idle as fast as possible. But that's not the case. State transitions take quite a bit of time. It may use more power to switch rapidly than to stay at a certain frequency.
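
The trade-off described here can be put into a toy energy budget. The sketch below is not a model of any real CPU: every power figure, the transition time and the transition energy are invented purely to show how transition overhead can flip the comparison between sprinting to idle and staying in one lower state.

```python
def race_to_idle_energy(busy_s, period_s, p_high_w, p_idle_w,
                        t_switch_s=0.0, e_switch_j=0.0):
    """Energy (J) for one period: sprint at high power, then idle.

    Two state switches (up and down) each take t_switch_s, during which no
    useful work is done, and each cost e_switch_j.
    """
    idle_s = period_s - busy_s - 2 * t_switch_s
    if idle_s < 0:
        raise ValueError("work plus transitions do not fit in the period")
    return p_high_w * busy_s + p_idle_w * idle_s + 2 * e_switch_j


def steady_energy(period_s, p_low_w):
    """Energy (J) for one period when staying in one lower-power state."""
    return p_low_w * period_s


# All numbers below are invented for illustration only.
PERIOD = 1e-3    # 1 ms of wall-clock time
BUSY = 0.25e-3   # the sprint finishes the work in 0.25 ms

steady = steady_energy(PERIOD, p_low_w=4.0)
ideal_race = race_to_idle_energy(BUSY, PERIOD, p_high_w=12.0, p_idle_w=0.5)
costly_race = race_to_idle_energy(BUSY, PERIOD, p_high_w=12.0, p_idle_w=0.5,
                                  t_switch_s=100e-6, e_switch_j=0.5e-3)

print(f"steady {steady * 1e3:.3f} mJ, ideal race {ideal_race * 1e3:.3f} mJ, "
      f"race with switch cost {costly_race * 1e3:.3f} mJ")
```

With instantaneous, free transitions the sprint comes out ahead of the steady state under these assumptions; once each switch costs time and energy, the same sprint ends up more expensive than simply staying put.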

wahdangun · Dec 22, 2017

William Gaatjes said:
Interesting, do you have any detailed information about that?

edit:
It seems thread migration from module to module was/is also an issue for the Bulldozer architecture.
And since the Richland APU that I have also consists of 2 modules, that seems to apply to it as well.
With respect to core parking, I have not read about that.
I am, however, reading about it in the technical Ryzen thread.

The information was right there at launch. It's the same with Zen: the cost of thread migration between CCXs can be quite high and sometimes hinders its performance, which is why the OS needs a driver to extract its maximum performance.

wahdangun · Dec 22, 2017

IntelUser2000 said:
What you are saying is true, but it is more complicated than that. If you are talking about energy efficiency in terms of battery life, that is quite different from peak power and TDP ratings. The HUGI (hurry up and get idle) concept is still valid, but it needs a revision; call it HUGI 2.0.

If everything were truly instantaneous, HUGI would always win: run at the highest performance so as to drop down to idle as fast as possible. But that's not the case. State transitions take quite a bit of time. It may use more power to switch rapidly than to stay at a certain frequency.

Yeah, just like I said: to function perfectly it needs fast cache, and a lot of it, to hide or minimize the cost. That's why CPUs nowadays are designed like that.

Hail The Brain Slug · Dec 24, 2017

I am kind of curious about the situation where you have an enthusiast configured setup to run all cores at maximum clockspeeds with adequate cooling, meaning there appears to be no benefit to using a cool core.

If there is a penalty to thread migration, is it apparent in this situation? Would performance increase measurably, albeit a slight amount, if migration were disabled?

Why is there no method by which to disable or halt migration in an os like Windows 10?

William Gaatjes · Dec 24, 2017

XabanakFanatik said:
I am kind of curious about the situation where you have an enthusiast configured setup to run all cores at maximum clockspeeds with adequate cooling, meaning there appears to be no benefit to using a cool core.

If there is a penalty to thread migration, is it apparent in this situation? Would performance increase measurably, albeit a slight amount, if migration were disabled?

Why is there no method by which to disable or halt migration in an os like Windows 10?

That is what I am trying to find out as well.
What does Game Mode actually do in Windows 10 1709?
Is it just suspending some services, or is it also changing the way the CPU cores are used and threads are handled?

I read on forums that parking cores seems to cost performance because of the long time a core needs to wake up from sleep state C6.
I wonder if Windows 10 1709 changes the sleep states of the cores when needed.
Is there some learning algorithm that tracks how often cores are in use to determine the proper sleep states?
Is the OS aware that a game is being played?
These are all questions that I am interested in, but it is difficult to find anything about them.

moinmoin · Dec 24, 2017

XabanakFanatik said:
Why is there no method by which to disable or halt migration in an os like Windows 10?

Being curious about this myself, I have watched this thread since its creation, but there have been no good answers as to why thread migration is ever a good thing.

To me it seems to be purely a Windows-based myth that built up around Microsoft's inability to move the scheduler logic out of ancient kernel core logic into more easily configurable modules. Windows systems force threads to migrate at regular intervals; the knowledge of the CPU topology (which would help avoid the negative impact of migrations) lies at another, separate system level; and consumer builds of Windows migrate more often than server builds. Microsoft's official communication is that it's consumer software that needs to set affinity itself to respect different CPU topologies and avoid negative impacts, absolving Microsoft of handling that at the OS level.

Under Linux the process scheduler is a flexible, configurable module and has for ages followed the concept of soft affinity by default, that is, threads are kept in place until a lack of resources forces some of them to move. (Looking at pre-millennium documentation, both NT and Linux already supported both soft/natural and hard affinity back then, but the NT line appears to have lost its interfacing with CPU topology details along the way, or rather the complexity of CPU topologies outgrew what the scheduler was prepared for.)
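
For anyone who wants to experiment with hard affinity, here is a minimal sketch that pins the calling process to a single CPU so the scheduler cannot migrate it. It uses Python's `os.sched_setaffinity`, which is only available on some Unix platforms such as Linux; the helper name `pin_current_process` is made up for this example. On Windows, Task Manager's "Set affinity" option or the Win32 `SetThreadAffinityMask` call serve a similar purpose.

```python
import os


def pin_current_process(cpu=None):
    """Restrict the calling process to one CPU (hard affinity).

    Returns the new affinity set, or None where the call is unavailable.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. Windows or macOS builds of Python
    allowed = os.sched_getaffinity(0)     # CPUs we may currently run on
    target = min(allowed) if cpu is None else cpu
    os.sched_setaffinity(0, {target})     # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print(pin_current_process())
```

Timing a single-threaded workload with and without such pinning is one way to measure whether migration costs anything noticeable on a given machine.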

TheELF · Dec 24, 2017

Thread migration is not only about "cool cores" and power efficiency; that's just this thread's topic.
Migration also allows the scheduler to group different threads together at each cycle, so that it gets the highest instruction throughput out of each cycle.
For a game, the scheduler would try to place the main thread on the fastest core, alone if possible; if that is not possible, it would try to find thread(s) that interfere the least with its execution.

wahdangun · Dec 24, 2017

XabanakFanatik said:
I am kind of curious about the situation where you have an enthusiast configured setup to run all cores at maximum clockspeeds with adequate cooling, meaning there appears to be no benefit to using a cool core.

If there is a penalty to thread migration, is it apparent in this situation? Would performance increase measurably, albeit a slight amount, if migration were disabled?

Why is there no method by which to disable or halt migration in an os like Windows 10?

I guess if the CPU has a high frequency ceiling then single-thread performance is better or the same, but if the CPU can't sustain it then performance will be lower.

Hail The Brain Slug · Dec 24, 2017

TheELF said:
Thread migration is not only about "cool cores" and power efficiency; that's just this thread's topic.
Migration also allows the scheduler to group different threads together at each cycle, so that it gets the highest instruction throughput out of each cycle.
For a game, the scheduler would try to place the main thread on the fastest core, alone if possible; if that is not possible, it would try to find thread(s) that interfere the least with its execution.

Sure, I see your point. I still don't see why, when running a heavy single-threaded load on a system with all cores at the same frequency and no other meaningful load besides Windows idle, the load constantly jumps round-robin between the cores.

I fail to see any reason for this behavior in this scenario. It seems like constantly moving the workload would degrade performance.

William Gaatjes · Dec 24, 2017

XabanakFanatik said:
Sure, I see your point. I still don't see why, when running a heavy single-threaded load on a system with all cores at the same frequency and no other meaningful load besides Windows idle, the load constantly jumps round-robin between the cores.

I fail to see any reason for this behavior in this scenario. It seems like constantly moving the workload would degrade performance.

It may have something to do with this.

I found this interesting webpage:

https://www.gamersnexus.net/news-pc/2870-ryzen-power-plan-update-min-frequency-90-pct

Here they mention the difference between Windows 7 and Windows 10.
I read about this before, but collecting it all in one thread is handy for fast lookup.

“Win7 keeps all physical cores awake, and parks SMT cores. Win10 keeps one physical and one logical core awake (Core0+1), then parks the rest as often as possible. This change alone is what’s responsible for the cases where Win7 was faster than Win10 gaming performance, not the scheduler as the community thought.”

At first sight this is strange behavior for Windows 10 in comparison, except from an energy efficiency perspective.
I can imagine that if a thread is running on a logical core, Windows 10 would migrate it to a physical core when utilization of the cores jumps to max.
I can imagine that the kernel has performance counters to track utilization; the Task Manager shows it, so it makes sense that the kernel uses it as well.
So, for energy efficiency: keep threads on one physical core and its (SMT) logical core, keep all other physical cores and their accompanying logical cores parked, and move a thread from the used logical core (because a physical core and its logical SMT core share the same hardware) to a free but parked physical core when core utilization is getting maxed out. That is thread migration.
Just an idea; I do not know if this is really the case.
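
To make the idea above concrete, here is a toy placement policy. It is purely an illustration of the guess in this post, not how the Windows scheduler is known to work; the `LogicalCore` type, the `choose_core` function and the 0.9 utilization threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogicalCore:
    core_id: int          # logical CPU number
    physical_id: int      # physical core this logical CPU belongs to
    busy: bool = False
    parked: bool = True


def choose_core(cores, current, utilization, threshold=0.9):
    """Return the logical core a thread should run on next.

    Stay put while utilization is low. Once it passes the threshold and the
    SMT sibling is also busy, move to a physical core with no busy logical
    cores, unparking it.
    """
    if utilization < threshold:
        return current
    sibling_busy = any(c.busy for c in cores
                       if c.physical_id == current.physical_id and c is not current)
    if not sibling_busy:
        return current
    busy_physical = {c.physical_id for c in cores if c.busy}
    for candidate in cores:
        if candidate.physical_id not in busy_physical:
            candidate.parked = False   # wake the parked core before migrating
            return candidate
    return current                     # nothing free, so no migration


# Two physical cores with two logical cores each; only physical core 0 is awake.
cores = [
    LogicalCore(0, 0, busy=True, parked=False),
    LogicalCore(1, 0, busy=True, parked=False),
    LogicalCore(2, 1),
    LogicalCore(3, 1),
]
print(choose_core(cores, cores[1], utilization=0.30).core_id)
print(choose_core(cores, cores[1], utilization=0.95).core_id)
```

At low utilization the thread stays on logical core 1; at high utilization, with its sibling busy, it is moved to logical core 2 on the previously parked physical core.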
