
Discussion Intel current and future Lakes & Rapids thread

Page 486

JoeRambo

Golden Member
Jun 13, 2013
1,203
1,102
136
And even if we ignore the "performance" domain completely, there are plenty of other things for the scheduler to consider - like power policies, CPU parking decisions and so on. And all of them feed back into performance as well.
What if you have a limited power budget, 1B+4S cores and 4 threads of workload - what placement gives the best MT performance if power is limited and 1 Big core is half as efficient, but 25% faster, than 1 Small?
What if there are 4B+8S cores and 5-8 threads of workload, still under a limited power budget?
What if the load changes dynamically, so that what was optimal on the big cores starts losing potential MT performance due to the power ceiling?
Even the obvious stuff: scheduling decisions themselves have to run on an actual CPU, burning cycles and evicting useful data from CPU caches. Make it too complex and you start losing performance.

"Hardware" cannot solve any of those, other than helping cores transition between clocks or wake up faster, or using clever tricks like Speed Shift that take decisions away from software. But it can give hints from hardware about the state of the hybrid system's load and power characteristics, which can then guide the scheduler and power subsystem of the OS to make hopefully better decisions. Or move your critical rendering thread to an Atom core due to a bug and cut your FPS in half :)
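As a toy sketch of the 1B+4S question above - the numbers are the illustrative ones from the post (big core 25% faster, half as efficient as a small core), plus an assumed linear throttle under the power cap, which real silicon does not obey:

```python
# Toy model of MT throughput under a power cap, for the 1B+4S question.
# Units are arbitrary: a small core does 1.0 work/s at 1.0 W (1.0 work/J);
# the big core is 25% faster but half as efficient -> 0.5 work/J -> 2.5 W.

def throughput(n_big, n_small, power_cap):
    """MT throughput with n_big + n_small active cores, assuming a crude
    linear frequency throttle whenever total power exceeds the cap."""
    BIG_PERF, BIG_POWER = 1.25, 2.5
    SMALL_PERF, SMALL_POWER = 1.0, 1.0
    perf = n_big * BIG_PERF + n_small * SMALL_PERF
    power = n_big * BIG_POWER + n_small * SMALL_POWER
    if power <= power_cap:
        return perf
    return perf * power_cap / power  # throttle everything proportionally

# 4 threads, 4 W budget:
print(throughput(0, 4, 4.0))  # 4.0   -- four small cores fit the budget exactly
print(throughput(1, 3, 4.0))  # ~3.09 -- waking the big core wastes the budget
```

Under these assumptions the all-small placement wins; change the efficiency ratio and the answer flips, which is exactly why the scheduler can't hardcode it.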
 

diediealldie

Member
May 9, 2020
27
35
51
And even if we ignore the "performance" domain completely, there are plenty of other things for the scheduler to consider - like power policies, CPU parking decisions and so on. And all of them feed back into performance as well.
What if you have a limited power budget, 1B+4S cores and 4 threads of workload - what placement gives the best MT performance if power is limited and 1 Big core is half as efficient, but 25% faster, than 1 Small?
What if there are 4B+8S cores and 5-8 threads of workload, still under a limited power budget?
What if the load changes dynamically, so that what was optimal on the big cores starts losing potential MT performance due to the power ceiling?
Even the obvious stuff: scheduling decisions themselves have to run on an actual CPU, burning cycles and evicting useful data from CPU caches. Make it too complex and you start losing performance.

"Hardware" cannot solve any of those, other than helping cores transition between clocks or wake up faster, or using clever tricks like Speed Shift that take decisions away from software. But it can give hints from hardware about the state of the hybrid system's load and power characteristics, which can then guide the scheduler and power subsystem of the OS to make hopefully better decisions. Or move your critical rendering thread to an Atom core due to a bug and cut your FPS in half :)
At least everyone knows an answer exists, in the ARM ecosystem, and that's something MS and laptop vendors really, really want to learn from. Since they know the problem is solvable, I think it'll work out anyway, with some 'frictions'. Of course, there'll be x86 domain-specific this and that, but don't we all have those? Without big/little success (whatever they end up calling it), Intel will be in for a gloomy 2022-2023 thanks to Zen 4 on TSMC's 5nm.
 

JoeRambo

Golden Member
Jun 13, 2013
1,203
1,102
136
At least everyone knows an answer exists, in the ARM ecosystem, and that's something MS and laptop vendors really, really want to learn from.
Seriously? If anything, ARM vendors are well known for benchmark cheating and ridiculous heuristics to try to game the system. Need a recent example of that BS?


Apple's solution is 100% focused on user experience, not on winning every benchmark. Will their rumoured workstation-class ARM CPUs even have hybrid cores? Frankly, that remains to be seen. I expect them to have them not for optimization/power, but because the same OS base can throw some background threads onto those cores and forget about them.

Hybrid is not solved, and is in fact hard to solve, because it involves scheduling - a class of computing problems known to be very hard, and impossible to solve optimally without knowing task specifics ahead of time.
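A minimal illustration of that hardness: even with two identical cores, an online scheduler that places each task greedily (without knowing the remaining task lengths) loses to an offline oracle that sees everything up front. The task lengths here are made up for the example:

```python
# Greedy online list scheduling vs. an offline optimum, on two identical cores.
# Shows why scheduling is hard without knowing task specifics ahead of time.
from itertools import product

def greedy_makespan(tasks):
    """Assign each task, in arrival order, to the core that frees up first."""
    loads = [0, 0]
    for t in tasks:
        i = loads.index(min(loads))
        loads[i] += t
    return max(loads)

def optimal_makespan(tasks):
    """Brute-force the best assignment (only feasible because the oracle
    sees all task lengths up front)."""
    best = float("inf")
    for assign in product([0, 1], repeat=len(tasks)):
        loads = [0, 0]
        for t, i in zip(tasks, assign):
            loads[i] += t
        best = min(best, max(loads))
    return best

tasks = [1, 1, 2]               # arrival order matters to the greedy scheduler
print(greedy_makespan(tasks))   # 3 -- the two short tasks block both cores
print(optimal_makespan(tasks))  # 2 -- pair the short tasks on one core
```

Add heterogeneous cores, power caps and migration costs on top of this and the gap between "what the OS can decide online" and "what was actually optimal" only grows.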
 
  • Like
Reactions: Tlh97 and lobz

firewolfsm

Golden Member
Oct 16, 2005
1,846
23
81
Whatever the solution is, it will certainly involve cycling threads between cores, so that any threads holding back dependents get moved to big cores. That incurs a migration penalty, but the efficiency gain from the small cores, and the resulting TDP headroom afforded to the big cores, is likely worth it.
 
  • Like
Reactions: Tlh97 and Hulk

eek2121

Golden Member
Aug 2, 2005
1,212
1,267
136
Seriously? If anything, ARM vendors are well known for benchmark cheating and ridiculous heuristics to try to game the system. Need a recent example of that BS?


Apple's solution is 100% focused on user experience, not on winning every benchmark. Will their rumoured workstation-class ARM CPUs even have hybrid cores? Frankly, that remains to be seen. I expect them to have them not for optimization/power, but because the same OS base can throw some background threads onto those cores and forget about them.

Hybrid is not solved, and is in fact hard to solve, because it involves scheduling - a class of computing problems known to be very hard, and impossible to solve optimally without knowing task specifics ahead of time.
I imagine that adding one "small" core, making the OS (Windows) aware of it, and giving Microsoft's scheduler the intelligence to dump all the low-priority background threads onto that small core would help the performance of the big cores, by not having a bunch of background threads periodically waking up and spiking CPU usage. Similarly, we need something like this in the GPU space, which is why I'm excited to see AMD finally integrating a GPU onto their chips.

On my laptop, you know what the biggest culprit of slowdowns/spikes is? Windows Defender. Shove that sucker on a CPU core with the update service and everything else, have it use the onboard graphics to help scan the machine, and suddenly all the performance jankiness will go away.

Windows has to be smart about scheduling, however. Should a browser be on a big core or a small core? A game? Microsoft Teams? I'd argue that a browser or Microsoft Teams should not be on a big core. Neither is particularly taxing for a CPU (unless you have a game running in the browser à la WebGL, or JavaScript doing some heavy lifting).

Hopefully Microsoft will give us more tools to manage these workloads if they have no desire to be "smart" about them.
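In the meantime, such tools partly exist: you can pin a process to chosen cores yourself. A minimal sketch using the Linux affinity syscall via Python's standard library (on Windows the equivalent would be SetProcessAffinityMask, e.g. via psutil's cpu_affinity; Windows has no sched_setaffinity):

```python
import os

def pin_to_cores(pid, cores):
    """Restrict a process to a set of logical CPUs and return the new mask.
    pid 0 means the calling process. Linux-only (os.sched_setaffinity)."""
    os.sched_setaffinity(pid, cores)
    return os.sched_getaffinity(pid)

# Confine ourselves to CPU 0, as a stand-in for herding a background
# service (AV scanner, update service, ...) onto a "small" core.
print(pin_to_cores(0, {0}))  # {0}
```

This is exactly the kind of static policy a hybrid-aware scheduler would apply automatically: background services never touch the big cores unless nothing else needs them.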
 

NTMBK

Diamond Member
Nov 14, 2011
9,353
2,807
136
I imagine that adding one "small" core, making the OS (Windows) aware of it, and giving Microsoft's scheduler the intelligence to dump all the low-priority background threads onto that small core would help the performance of the big cores, by not having a bunch of background threads periodically waking up and spiking CPU usage. Similarly, we need something like this in the GPU space, which is why I'm excited to see AMD finally integrating a GPU onto their chips.

On my laptop, you know what the biggest culprit of slowdowns/spikes is? Windows Defender. Shove that sucker on a CPU core with the update service and everything else, have it use the onboard graphics to help scan the machine, and suddenly all the performance jankiness will go away.

Windows has to be smart about scheduling, however. Should a browser be on a big core or a small core? A game? Microsoft Teams? I'd argue that a browser or Microsoft Teams should not be on a big core. Neither is particularly taxing for a CPU (unless you have a game running in the browser à la WebGL, or JavaScript doing some heavy lifting).

Hopefully Microsoft will give us more tools to manage these workloads if they have no desire to be "smart" about them.
A Web browser is one of the most latency sensitive workloads out there. You definitely want it on a big core.
 
  • Like
Reactions: Tlh97 and scineram

Asterox

Senior member
May 15, 2012
597
915
136
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

 

Dayman1225

Golden Member
Aug 14, 2017
1,035
670
146
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

This was rumoured before; I'm not surprised that Intel is finally aligning their node naming with the industry's. And it's only 10nm ESF that's now being called Intel 7 - the rest aren't.
 

Exist50

Senior member
Aug 18, 2016
317
339
136
This was rumoured before; I'm not surprised that Intel is finally aligning their node naming with the industry's. And it's only 10nm ESF that's now being called Intel 7 - the rest aren't.
I'd argue they're skipping far more than they should with "4nm", and especially with 3nm. Their 3nm will be a full node behind TSMC's in area.
 

mikk

Diamond Member
May 15, 2012
3,211
1,029
136
10 to 15% performance per watt improvement for 10ESF over 10nm SuperFin is actually a big improvement. How big was the perf/w improvement from 10+ to 10SF, wasn't it similar? The new node naming is more in line with TSMC now.
 

JasonLD

Senior member
Aug 22, 2017
351
316
136
I'd argue they're skipping far more than they should with "4nm", and especially with 3nm. Their 3nm will be a full node behind TSMC's in area.
Intel 7nm (4nm now?) should be somewhere between TSMC 5nm and 3nm (density-wise; performance unknown), and their 3nm is touted to bring an 18% performance improvement along with higher density. So while TSMC 3nm will probably remain superior, Intel 3nm(?) isn't going to be a full node behind.
 

Exist50

Senior member
Aug 18, 2016
317
339
136
Intel 7nm (4nm now?) should be somewhere between TSMC 5nm and 3nm (density-wise; performance unknown), and their 3nm is touted to bring an 18% performance improvement along with higher density. So while TSMC 3nm will probably remain superior, Intel 3nm(?) isn't going to be a full node behind.
I think the other way around is more likely. Comparable performance (HP logic only), node behind in density.
 

Det0x

Senior member
Sep 11, 2014
531
701
136
Who dares wins, great idea let’s rename 10nm to Intel 7........................ :grinning:

Just to make it easy for those who don't want to open the link:
[attached screenshots: Intel process roadmap slides]
 
Last edited:

mikk

Diamond Member
May 15, 2012
3,211
1,029
136
RibbonFET and PowerVia sounds like it's a breakthrough, it would be insane if it's really ready in 2024.
 
  • Like
Reactions: clemsyn

moinmoin

Platinum Member
Jun 1, 2017
2,671
3,475
136
I for one am sad that this means we won't see a 10nm SMESF (super mega enhanced super fin) anymore.
This was rumoured before; I'm not surprised that Intel is finally aligning their node naming with the industry's. And it's only 10nm ESF that's now being called Intel 7 - the rest aren't.
Is the industry moving to Ångström at 2nm? It's oddly inconsistent to remove the "nm" for 7, 4 and 3 but add it back for 2nm, I mean 20A.
 

Det0x

Senior member
Sep 11, 2014
531
701
136
Intel is claiming a clear path to process performance leadership by 2025, but I don't see how that can be realistic.
[attached screenshot: Intel process roadmap]

I mean, they are behind both TSMC and Samsung in the race to GAA - and that's the best case, assuming they can stick to their own roadmap this time.
[attached screenshot: GAA roadmap comparison]
They are also two years behind TSMC in die-to-die stacking (the hybrid bonding used for Zen 3 V-Cache).

Density numbers (MTr/mm²):
[attached screenshot: density comparison]
IBM 2nm = Intel 20A -> (speculation)

TSMC 3nm = 292
Intel 20A = 333
TSMC "2nm" = ??? (released before Intel 20A)
Screenshots taken from: Intel's Process Roadmap to 2025: with 4nm, 3nm, 20A and 18A?!
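For scale, a quick ratio on the two density figures quoted above (taking the speculative Intel 20A number at face value; a full node shrink is usually reckoned at roughly 1.8x or more in density):

```python
# Quoted densities in MTr/mm^2; the Intel 20A figure is speculation.
tsmc_n3 = 292
intel_20a = 333

ratio = intel_20a / tsmc_n3
print(f"Intel 20A vs TSMC 3nm: {ratio:.2f}x density")  # 1.14x
```

A ~1.14x lead over TSMC 3nm would be well short of a node's worth, so whether 20A "wins" depends entirely on where TSMC's 2nm lands.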
 
Last edited:
