Discussion Intel current and future Lakes & Rapids thread

Page 486

Thala

Golden Member
Nov 12, 2014
1,355
653
136
This "magical" scheduler talk needs to stop. The scheduler does not do "heavy lifting", it has only one job, which is to minimize performance loss. In the case of benchmarks such as CB, GB, Passmark - the scheduler's job is extremely simple: puts the ST test on a big core, puts the MT test on all cores. It doesn't get more basic than that.

Right. I wonder how often this has to be repeated. In your examples above (e.g. synthetic benchmarks), even a non-hybrid-aware scheduler would find the optimal schedule, because the solution is so trivial, as you describe perfectly. Even sophisticated performance counters will not give any insight that would change the schedule in these cases, because the schedule is already optimal.
The more interesting cases for a hybrid scheduler are the non-trivial cases, where the system is only partially loaded with complex dependencies between threads.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
And even if we ignore the "performance" domain completely, there are plenty of other things for the scheduler to consider, like power policies, CPU parking decisions and so on. And all of them feed back into performance as well.
What if you have a limited power budget, 1B+4S cores and 4 threads of workload: what will give the best MT performance if one big core is half as efficient, but 25% faster, than one small core?
What if there are 4B+8S and 5? 8? threads of workload, still with limited power?
What if the load changes dynamically and transitions from what is optimal for running on big cores into losing potential MT performance due to the power ceiling?
Even obvious stuff matters: scheduling decisions need to run on an actual CPU, burn cycles and evict good guys from CPU caches. Too complex and you start losing performance.

"Hardware" cannot solve any of those, other than helping with clock transitions, waking cores up faster, or using clever tricks like Speed Shift that take decisions away from software. But hardware can give hints about the state of the hybrid system's load and power characteristics, which can then guide the scheduler and the power subsystem of the OS to make hopefully better decisions. Or move your critical rendering thread to an Atom core due to a bug and cut your FPS in half :)
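
To make the 1B+4S question above concrete, here is a minimal sketch that enumerates core combinations under a power cap. Everything numeric beyond "25% faster" and "half as efficient" is an assumption: a small core is normalized to 1 unit of throughput at 1 unit of power, "half as efficient" is read as half the performance per watt (so the big core costs 2.5 units), and there is no frequency scaling at all, which is where much of the real difficulty lies. The function name `best_placement` is made up for illustration.

```python
from itertools import product

# Assumed normalization: one small core = 1.0 throughput at 1.0 power.
SMALL_PERF, SMALL_POWER = 1.0, 1.0
# From the post: the big core is 25% faster but half as efficient (perf/W).
BIG_PERF = 1.25 * SMALL_PERF
BIG_POWER = BIG_PERF / (0.5 * SMALL_PERF / SMALL_POWER)  # works out to 2.5


def best_placement(n_big, n_small, n_threads, power_budget):
    """Brute-force the core mix with the highest throughput under the cap.

    One thread per core, fixed clocks. Ties are broken in favour of the
    lower-power option.
    """
    best = None
    for big, small in product(range(n_big + 1), range(n_small + 1)):
        if big + small > n_threads:
            continue  # more cores than runnable threads
        power = big * BIG_POWER + small * SMALL_POWER
        if power > power_budget:
            continue  # would exceed the power ceiling
        perf = big * BIG_PERF + small * SMALL_PERF
        candidate = (perf, -power, big, small)
        if best is None or candidate > best:
            best = candidate
    perf, neg_power, big, small = best
    return {"big": big, "small": small, "throughput": perf, "power": -neg_power}


if __name__ == "__main__":
    for budget in (3.0, 5.0, 6.0):
        print(budget, best_placement(1, 4, 4, budget))
```

Under these assumptions, a budget of 5 units favours four small cores (throughput 4.0), because big plus three small would need 5.5 units. Raising the budget to 6 flips the answer to big plus three small (throughput 4.25). Even in this toy model, the right answer depends on the power ceiling.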
 

diediealldie

Member
May 9, 2020
77
68
61
And even if we ignore the "performance" domain completely, there are plenty of other things for the scheduler to consider, like power policies, CPU parking decisions and so on. And all of them feed back into performance as well.
What if you have a limited power budget, 1B+4S cores and 4 threads of workload: what will give the best MT performance if one big core is half as efficient, but 25% faster, than one small core?
What if there are 4B+8S and 5? 8? threads of workload, still with limited power?
What if the load changes dynamically and transitions from what is optimal for running on big cores into losing potential MT performance due to the power ceiling?
Even obvious stuff matters: scheduling decisions need to run on an actual CPU, burn cycles and evict good guys from CPU caches. Too complex and you start losing performance.

"Hardware" cannot solve any of those, other than helping with clock transitions, waking cores up faster, or using clever tricks like Speed Shift that take decisions away from software. But hardware can give hints about the state of the hybrid system's load and power characteristics, which can then guide the scheduler and the power subsystem of the OS to make hopefully better decisions. Or move your critical rendering thread to an Atom core due to a bug and cut your FPS in half :)

At least everyone knows an answer exists, in the ARM ecosystem, and that's something MS and laptop vendors really, really want to learn from. Since they know that the problem is solvable, I think it'll work anyway, with some 'friction'. Of course, there'll be x86 domain-specific this and that, but don't we all? Without big-little success (whatever they call it), Intel will be facing a gloomy 2022~2023 thanks to Zen 4 + 5nm TSMC manufacturing.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
At least everyone knows an answer exists, in the ARM ecosystem, and that's something MS and laptop vendors really, really want to learn from.

Seriously? If anything, ARM vendors are very well known for benchmark cheating and ridiculous heuristics to try to beat the system. Need a recent example of BS?


Apple's solution is 100% focused on user experience, not on winning all benchmarks. Will their rumoured workstation-class ARM CPUs even have hybrid cores? That remains to be seen, frankly. I expect them to have them not for optimization/power, but because the same OS base throws some background threads on those cores and forgets about them.

Hybrid is not solved and is in fact hard to solve, because it involves scheduling, a class of computing problems known to be very hard and impossible to solve optimally without knowing task specifics ahead of time.
 
  • Like
Reactions: Tlh97 and lobz

firewolfsm

Golden Member
Oct 16, 2005
1,848
29
91
Whatever the solution is, it will certainly involve cycling threads between CPUs, such that any threads which are holding back dependents will move to big cores. That will incur a penalty, but the efficiency gain with small cores, and the resulting TDP headroom afforded to the big cores, is likely worth it.
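
As a rough illustration of "threads which are holding back dependents move to big cores", here is a toy ranking sketch. It assumes the scheduler somehow knows which threads are waiting on which (a real OS only has partial hints about this), and the names `waiters`, `count_dependents` and `assign_cores` are invented for the example. It also ignores the migration penalty mentioned above.

```python
def count_dependents(waiters, thread):
    """Count every thread that is directly or transitively waiting on `thread`.

    `waiters` maps a thread name to the threads blocked on it (assumed known).
    """
    seen = set()
    stack = list(waiters.get(thread, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(waiters.get(current, ()))
    return len(seen)


def assign_cores(threads, waiters, n_big):
    """Give the big cores to the threads that block the most other threads."""
    blocked = {t: count_dependents(waiters, t) for t in threads}
    # Most-blocking first; the name is only a deterministic tie-breaker.
    ranked = sorted(threads, key=lambda t: (-blocked[t], t))
    # Only promote threads that actually hold something back.
    big = {t for t in ranked[:n_big] if blocked[t] > 0}
    return {t: ("big" if t in big else "small") for t in threads}


if __name__ == "__main__":
    waiters = {"a": ["b", "c"], "b": ["d"]}
    print(assign_cores(["a", "b", "c", "d"], waiters, n_big=1))
```

Re-running `assign_cores` as the wait graph changes is what would produce the "cycling" between core types. Whether that pays off depends on the migration cost, which this sketch does not model.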
 
  • Like
Reactions: Tlh97 and Hulk

eek2121

Platinum Member
Aug 2, 2005
2,904
3,903
136
Seriously? If anything, ARM vendors are very well known for benchmark cheating and ridiculous heuristics to try to beat the system. Need a recent example of BS?


Apple's solution is 100% focused on user experience, not on winning all benchmarks. Will their rumoured workstation-class ARM CPUs even have hybrid cores? That remains to be seen, frankly. I expect them to have them not for optimization/power, but because the same OS base throws some background threads on those cores and forgets about them.

Hybrid is not solved and is in fact hard to solve, because it involves scheduling, a class of computing problems known to be very hard and impossible to solve optimally without knowing task specifics ahead of time.

I imagine that adding one "small" core and making the OS (Windows) aware of it, combined with giving Microsoft's scheduler the intelligence to dump all the low-priority background threads on that small core, would help the performance of the big cores, since there would no longer be a bunch of background threads periodically waking up and spiking CPU usage. Similarly, we need something in the GPU space, which is why I'm excited to see AMD finally integrating a GPU onto their chips.

On my laptop, you know what the biggest culprit of slowdowns/spikes is? Windows Defender. Shove that sucker on a CPU core with the update service and everything else, have it use the onboard graphics to help scan the machine, and suddenly all the performance jankiness will go away.

Windows has to be smart about scheduling, however. Should a browser be on a big core or a small core? A game? Microsoft Teams? I argue that a browser or Microsoft Teams should not be on a big core. Neither is particularly taxing on a CPU (unless you have a game running in a browser à la WebGL, or you are running some heavy-lifting JavaScript).

Hopefully Microsoft will give us more tools to manage these workloads if they have no desire to be "smart" about them.
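
The "dump background threads on the small core" idea from this post can be sketched as a toy placement policy. This is not how the Windows scheduler works. The `Task` fields, the priority scale, the cutoff and the core numbering are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int      # hypothetical scale: higher means more important
    background: bool   # assumed to be known to the OS


def place_tasks(tasks, big_cores, small_core, low_priority_cutoff=3):
    """Pin low-priority background tasks to the small core, rest round-robin."""
    placement = {core: [] for core in (*big_cores, small_core)}
    next_big = 0
    for task in tasks:
        if task.background and task.priority <= low_priority_cutoff:
            # Background housekeeping never touches a big core.
            placement[small_core].append(task.name)
        else:
            core = big_cores[next_big % len(big_cores)]
            placement[core].append(task.name)
            next_big += 1
    return placement


if __name__ == "__main__":
    tasks = [
        Task("game", 9, False),
        Task("defender_scan", 1, True),
        Task("update_service", 2, True),
        Task("browser", 6, False),
    ]
    print(place_tasks(tasks, big_cores=[0, 1], small_core=2))
```

The hard part is deciding what counts as "background": a static flag like this cannot tell an idle browser tab from a latency-sensitive one.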
 

NTMBK

Lifer
Nov 14, 2011
10,208
4,940
136
I imagine that adding one "small" core and making the OS (Windows) aware of it, combined with giving Microsoft's scheduler the intelligence to dump all the low-priority background threads on that small core, would help the performance of the big cores, since there would no longer be a bunch of background threads periodically waking up and spiking CPU usage. Similarly, we need something in the GPU space, which is why I'm excited to see AMD finally integrating a GPU onto their chips.

On my laptop, you know what the biggest culprit of slowdowns/spikes is? Windows Defender. Shove that sucker on a CPU core with the update service and everything else, have it use the onboard graphics to help scan the machine, and suddenly all the performance jankiness will go away.

Windows has to be smart about scheduling, however. Should a browser be on a big core or a small core? A game? Microsoft Teams? I argue that a browser or Microsoft Teams should not be on a big core. Neither is particularly taxing on a CPU (unless you have a game running in a browser à la WebGL, or you are running some heavy-lifting JavaScript).

Hopefully Microsoft will give us more tools to manage these workloads if they have no desire to be "smart" about them.

A Web browser is one of the most latency sensitive workloads out there. You definitely want it on a big core.
 
  • Like
Reactions: Tlh97 and scineram

Asterox

Golden Member
May 15, 2012
1,026
1,775
136
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

 

Dayman1225

Golden Member
Aug 14, 2017
1,152
973
146
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

This was rumoured before. I'm not surprised that Intel is finally aligning their node naming with the industry, and it's only 10nm ESF that is now being called 7; the rest aren't.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
This was rumoured before. I'm not surprised that Intel is finally aligning their node naming with the industry, and it's only 10nm ESF that is now being called 7; the rest aren't.

I'd argue they're skipping far more than they should with "4nm" and especially 3nm. Their 3nm will be a full node behind TSMC's in area.
 

mikk

Diamond Member
May 15, 2012
4,111
2,104
136
10 to 15% performance per watt improvement for 10ESF over 10nm SuperFin is actually a big improvement. How big was the perf/w improvement from 10+ to 10SF, wasn't it similar? The new node naming is more in line with TSMC now.
 

JasonLD

Senior member
Aug 22, 2017
485
445
136
I'd argue they're skipping far more than they should with "4nm" and especially 3nm. Their 3nm will be a full node behind TSMC's in area.

Intel 7nm (4nm now?) should be somewhere between TSMC 5nm and 3nm (density-wise, performance unknown), and 3nm is touted to bring an 18% performance improvement along with higher density. So while TSMC 3nm will probably remain superior, Intel 3nm(?) isn't going to be a full node behind.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Intel 7nm (4nm now?) should be somewhere between TSMC 5nm and 3nm (density-wise, performance unknown), and 3nm is touted to bring an 18% performance improvement along with higher density. So while TSMC 3nm will probably remain superior, Intel 3nm(?) isn't going to be a full node behind.

I think the other way around is more likely: comparable performance (HP logic only), a node behind in density.
 

Det0x

Golden Member
Sep 11, 2014
1,027
2,953
136
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

Just to make it easy for those who don't want to open the link:
1627336085611.png

1627332506508.png
 

mikk

Diamond Member
May 15, 2012
4,111
2,104
136
RibbonFET and PowerVia sound like a breakthrough; it would be insane if they're really ready in 2024.
 
  • Like
Reactions: clemsyn