
Discussion Intel current and future Lakes & Rapids thread

Page 486

JoeRambo

Golden Member
Jun 13, 2013
1,203
1,102
136
And even if we ignore the "performance" domain completely, there are plenty of other things for the scheduler to consider - like power policies, CPU parking decisions and so on. And all of them feed back into performance as well.
What if you have a limited power budget, 1B+4S cores and 4 threads of workload - what placement gives the best MT performance if power is limited and 1 Big core is half as efficient, but 25% faster, than 1 Small?
What if there are 4B+8S cores and 5-8 threads of workload, still under a limited power budget?
What if the load changes dynamically, so that what was optimal on the big cores starts losing potential MT performance due to the power ceiling?
Even the obvious stuff: scheduling decisions themselves have to run on an actual CPU, burning cycles and evicting useful data from CPU caches. Make it too complex and you start losing performance.

"Hardware" cannot solve any of those, other than helping cores transition between clocks or wake up faster, or using clever tricks like Speed Shift that take decisions away from software. But it can give hints from hardware about the state of the hybrid system's load and power characteristics, which can then guide the scheduler and power subsystem of the OS to make hopefully better decisions. Or move your critical rendering thread to an Atom core due to a bug and cut your FPS in half :)
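As a toy sketch of the 1B+4S question above - the numbers are the illustrative ones from the post (big core 25% faster, half as efficient as a small core), plus an assumed linear throttle under the power cap, which real silicon does not obey:

```python
# Toy model of MT throughput under a power cap, for the 1B+4S question.
# Units are arbitrary: a small core does 1.0 work/s at 1.0 W (1.0 work/J);
# the big core is 25% faster but half as efficient -> 0.5 work/J -> 2.5 W.

def throughput(n_big, n_small, power_cap):
    """MT throughput with n_big + n_small active cores, assuming a crude
    linear frequency throttle whenever total power exceeds the cap."""
    BIG_PERF, BIG_POWER = 1.25, 2.5
    SMALL_PERF, SMALL_POWER = 1.0, 1.0
    perf = n_big * BIG_PERF + n_small * SMALL_PERF
    power = n_big * BIG_POWER + n_small * SMALL_POWER
    if power <= power_cap:
        return perf
    return perf * power_cap / power  # throttle everything proportionally

# 4 threads, 4 W budget:
print(throughput(0, 4, 4.0))  # 4.0   -- four small cores fit the budget exactly
print(throughput(1, 3, 4.0))  # ~3.09 -- waking the big core wastes the budget
```

Under these assumptions the all-small placement wins; change the efficiency ratio and the answer flips, which is exactly why the scheduler can't hardcode it.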
 

diediealldie

Member
May 9, 2020
27
35
51
And even if we ignore the "performance" domain completely, there are plenty of other things for the scheduler to consider - like power policies, CPU parking decisions and so on. And all of them feed back into performance as well.
What if you have a limited power budget, 1B+4S cores and 4 threads of workload - what placement gives the best MT performance if power is limited and 1 Big core is half as efficient, but 25% faster, than 1 Small?
What if there are 4B+8S cores and 5-8 threads of workload, still under a limited power budget?
What if the load changes dynamically, so that what was optimal on the big cores starts losing potential MT performance due to the power ceiling?
Even the obvious stuff: scheduling decisions themselves have to run on an actual CPU, burning cycles and evicting useful data from CPU caches. Make it too complex and you start losing performance.

"Hardware" cannot solve any of those, other than helping cores transition between clocks or wake up faster, or using clever tricks like Speed Shift that take decisions away from software. But it can give hints from hardware about the state of the hybrid system's load and power characteristics, which can then guide the scheduler and power subsystem of the OS to make hopefully better decisions. Or move your critical rendering thread to an Atom core due to a bug and cut your FPS in half :)
At least everyone knows an answer exists, in the ARM ecosystem, and that's something MS and laptop vendors really, really want to learn from. Since they know the problem is solvable, I think it'll work out anyway, with some 'frictions'. Of course, there'll be x86 domain-specific this and that, but don't we all have those? Without big/little success (whatever they end up calling it), Intel will be in for a gloomy 2022-2023 thanks to Zen 4 on TSMC's 5nm.
 

JoeRambo

Golden Member
Jun 13, 2013
1,203
1,102
136
At least everyone knows an answer exists, in the ARM ecosystem, and that's something MS and laptop vendors really, really want to learn from.
Seriously? If anything, ARM vendors are well known for benchmark cheating and ridiculous heuristics to try to game the system. Need a recent example of that BS?


Apple's solution is 100% focused on user experience, not on winning every benchmark. Will their rumoured workstation-class ARM CPUs even have hybrid cores? Frankly, that remains to be seen. I expect them to have them not for optimization/power, but because the same OS base can throw some background threads onto those cores and forget about them.

Hybrid is not solved, and is in fact hard to solve, because it involves scheduling - a class of computing problems known to be very hard, and impossible to solve optimally without knowing task specifics ahead of time.
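A minimal illustration of that hardness: even with two identical cores, an online scheduler that places each task greedily (without knowing the remaining task lengths) loses to an offline oracle that sees everything up front. The task lengths here are made up for the example:

```python
# Greedy online list scheduling vs. an offline optimum, on two identical cores.
# Shows why scheduling is hard without knowing task specifics ahead of time.
from itertools import product

def greedy_makespan(tasks):
    """Assign each task, in arrival order, to the core that frees up first."""
    loads = [0, 0]
    for t in tasks:
        i = loads.index(min(loads))
        loads[i] += t
    return max(loads)

def optimal_makespan(tasks):
    """Brute-force the best assignment (only feasible because the oracle
    sees all task lengths up front)."""
    best = float("inf")
    for assign in product([0, 1], repeat=len(tasks)):
        loads = [0, 0]
        for t, i in zip(tasks, assign):
            loads[i] += t
        best = min(best, max(loads))
    return best

tasks = [1, 1, 2]               # arrival order matters to the greedy scheduler
print(greedy_makespan(tasks))   # 3 -- the two short tasks block both cores
print(optimal_makespan(tasks))  # 2 -- pair the short tasks on one core
```

Add heterogeneous cores, power caps and migration costs on top of this and the gap between "what the OS can decide online" and "what was actually optimal" only grows.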
 
  • Like
Reactions: Tlh97 and lobz

firewolfsm

Golden Member
Oct 16, 2005
1,846
23
81
Whatever the solution is, it will certainly involve cycling threads between cores, so that any threads holding back dependents get moved to big cores. That incurs a migration penalty, but the efficiency gain from the small cores, and the resulting TDP headroom afforded to the big cores, is likely worth it.
 
  • Like
Reactions: Tlh97 and Hulk

eek2121

Golden Member
Aug 2, 2005
1,212
1,267
136
Seriously? If anything, ARM vendors are well known for benchmark cheating and ridiculous heuristics to try to game the system. Need a recent example of that BS?


Apple's solution is 100% focused on user experience, not on winning every benchmark. Will their rumoured workstation-class ARM CPUs even have hybrid cores? Frankly, that remains to be seen. I expect them to have them not for optimization/power, but because the same OS base can throw some background threads onto those cores and forget about them.

Hybrid is not solved, and is in fact hard to solve, because it involves scheduling - a class of computing problems known to be very hard, and impossible to solve optimally without knowing task specifics ahead of time.
I imagine that adding one "small" core, making the OS (Windows) aware of it, and giving Microsoft's scheduler the intelligence to dump all the low-priority background threads onto that small core would help the performance of the big cores, by not having a bunch of background threads periodically waking up and spiking CPU usage. Similarly, we need something like this in the GPU space, which is why I'm excited to see AMD finally integrating a GPU onto their chips.

On my laptop, you know what the biggest culprit of slowdowns/spikes is? Windows Defender. Shove that sucker on a CPU core with the update service and everything else, have it use the onboard graphics to help scan the machine, and suddenly all the performance jankiness will go away.

Windows has to be smart about scheduling, however. Should a browser be on a big core or a small core? A game? Microsoft Teams? I'd argue that a browser or Microsoft Teams should not be on a big core. Neither is particularly taxing for a CPU (unless you have a game running in the browser à la WebGL, or JavaScript doing some heavy lifting).

Hopefully Microsoft will give us more tools to manage these workloads if they have no desire to be "smart" about them.
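In the meantime, such tools partly exist: you can pin a process to chosen cores yourself. A minimal sketch using the Linux affinity syscall via Python's standard library (on Windows the equivalent would be SetProcessAffinityMask, e.g. via psutil's cpu_affinity; Windows has no sched_setaffinity):

```python
import os

def pin_to_cores(pid, cores):
    """Restrict a process to a set of logical CPUs and return the new mask.
    pid 0 means the calling process. Linux-only (os.sched_setaffinity)."""
    os.sched_setaffinity(pid, cores)
    return os.sched_getaffinity(pid)

# Confine ourselves to CPU 0, as a stand-in for herding a background
# service (AV scanner, update service, ...) onto a "small" core.
print(pin_to_cores(0, {0}))  # {0}
```

This is exactly the kind of static policy a hybrid-aware scheduler would apply automatically: background services never touch the big cores unless nothing else needs them.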
 

NTMBK

Diamond Member
Nov 14, 2011
9,353
2,807
136
I imagine that adding one "small" core, making the OS (Windows) aware of it, and giving Microsoft's scheduler the intelligence to dump all the low-priority background threads onto that small core would help the performance of the big cores, by not having a bunch of background threads periodically waking up and spiking CPU usage. Similarly, we need something like this in the GPU space, which is why I'm excited to see AMD finally integrating a GPU onto their chips.

On my laptop, you know what the biggest culprit of slowdowns/spikes is? Windows Defender. Shove that sucker on a CPU core with the update service and everything else, have it use the onboard graphics to help scan the machine, and suddenly all the performance jankiness will go away.

Windows has to be smart about scheduling, however. Should a browser be on a big core or a small core? A game? Microsoft Teams? I'd argue that a browser or Microsoft Teams should not be on a big core. Neither is particularly taxing for a CPU (unless you have a game running in the browser à la WebGL, or JavaScript doing some heavy lifting).

Hopefully Microsoft will give us more tools to manage these workloads if they have no desire to be "smart" about them.
A Web browser is one of the most latency sensitive workloads out there. You definitely want it on a big core.
 
  • Like
Reactions: Tlh97 and scineram

Asterox

Senior member
May 15, 2012
597
915
136
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

 

Dayman1225

Golden Member
Aug 14, 2017
1,035
670
146
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

This was rumoured before; I'm not surprised that Intel is finally aligning their node naming with the industry's. And it's only 10nm ESF that's now being called Intel 7 - the rest aren't.
 

Exist50

Senior member
Aug 18, 2016
317
339
136
This was rumoured before; I'm not surprised that Intel is finally aligning their node naming with the industry's. And it's only 10nm ESF that's now being called Intel 7 - the rest aren't.
I'd argue they're skipping far more than they should with "4nm", and especially with 3nm. Their 3nm will be a full node behind TSMC's in area.
 

mikk

Diamond Member
May 15, 2012
3,211
1,029
136
10 to 15% performance per watt improvement for 10ESF over 10nm SuperFin is actually a big improvement. How big was the perf/w improvement from 10+ to 10SF, wasn't it similar? The new node naming is more in line with TSMC now.
 

JasonLD

Senior member
Aug 22, 2017
351
316
136
I'd argue they're skipping far more than they should with "4nm", and especially with 3nm. Their 3nm will be a full node behind TSMC's in area.
Intel 7nm (4nm now?) should be somewhere between TSMC 5nm and 3nm (density-wise; performance unknown), and their 3nm is touted to bring an 18% performance improvement along with higher density. So while TSMC 3nm will probably remain superior, Intel 3nm(?) isn't going to be a full node behind.
 

Exist50

Senior member
Aug 18, 2016
317
339
136
Intel 7nm (4nm now?) should be somewhere between TSMC 5nm and 3nm (density-wise; performance unknown), and their 3nm is touted to bring an 18% performance improvement along with higher density. So while TSMC 3nm will probably remain superior, Intel 3nm(?) isn't going to be a full node behind.
I think the other way around is more likely. Comparable performance (HP logic only), node behind in density.
 

Det0x

Senior member
Sep 11, 2014
531
701
136
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

Just to make it easy for those who don't want to open the link:
[attached screenshots: Intel process roadmap slides]
 
Last edited:

mikk

Diamond Member
May 15, 2012
3,211
1,029
136
RibbonFET and PowerVia sounds like it's a breakthrough, it would be insane if it's really ready in 2024.
 
  • Like
Reactions: clemsyn

moinmoin

Platinum Member
Jun 1, 2017
2,671
3,475
136
I for one am sad that this means we won't see a 10nm SMESF (super mega enhanced super fin) anymore.
This was rumoured before; I'm not surprised that Intel is finally aligning their node naming with the industry's. And it's only 10nm ESF that's now being called Intel 7 - the rest aren't.
Is the industry moving to Ångström at 2nm? It's oddly inconsistent to remove the "nm" for 7, 4 and 3 but add it back for 2nm, I mean 20A.
 

Det0x

Senior member
Sep 11, 2014
531
701
136
Intel is claiming a clear path to process performance leadership by 2025, but I don't see how that can be realistic.
[attached screenshot: Intel process roadmap]

I mean, they are behind both TSMC and Samsung in the race to GAA - and that's the best case, assuming they can stick to their own roadmap this time.
[attached screenshot: GAA roadmap comparison]
They are also two years behind TSMC in die-to-die stacking (the hybrid bonding used for Zen 3 V-Cache).

Density numbers (MTr/mm²):
[attached screenshot: density comparison]
IBM 2nm = Intel 20A -> (speculation)

TSMC 3nm = 292
Intel 20A = 333
TSMC "2nm" = ??? (released before Intel 20A)
Screenshots taken from: Intel's Process Roadmap to 2025: with 4nm, 3nm, 20A and 18A?!
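For scale, a quick ratio on the two density figures quoted above (taking the speculative Intel 20A number at face value; a full node shrink is usually reckoned at roughly 1.8x or more in density):

```python
# Quoted densities in MTr/mm^2; the Intel 20A figure is speculation.
tsmc_n3 = 292
intel_20a = 333

ratio = intel_20a / tsmc_n3
print(f"Intel 20A vs TSMC 3nm: {ratio:.2f}x density")  # 1.14x
```

A ~1.14x lead over TSMC 3nm would be well short of a node's worth, so whether 20A "wins" depends entirely on where TSMC's 2nm lands.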
 
Last edited:
