Intel Nova Lake in H2-2026: Discussion Threads


LightningZ71

Platinum Member
Mar 10, 2017
2,579
3,270
136
And remember, the E cores only need to be orthogonally complete; they do NOT have to be performant on AVX-512 code. Double-pumped 256-bit data paths like AMD's mobile cores? Yes, please. Quad-pumped 128-bit paths? Why not. Extra xtors for higher clocks? Don't think so.
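For anyone wondering what "double pumping" buys you functionally, here's a toy Python sketch (not Intel's or AMD's actual implementation, just an illustration): a 512-bit vector add run as two passes through an assumed 256-bit-wide datapath gives the same result as a full-width unit, only with half the per-op throughput.

```python
# Toy illustration: a 512-bit vector add executed on an assumed 256-bit-wide
# datapath by "double pumping" -- two passes of four 64-bit lanes each.
LANE_BITS = 64
VEC_BITS = 512
DATAPATH_BITS = 256                           # assumed physical ALU width
LANES = VEC_BITS // LANE_BITS                 # 8 lanes of 64 bits
LANES_PER_PASS = DATAPATH_BITS // LANE_BITS   # 4 lanes per pass

def vadd_double_pumped(a, b):
    """Add two 512-bit vectors (lists of 64-bit lanes) in 256-bit chunks."""
    out = [0] * LANES
    passes = 0
    for start in range(0, LANES, LANES_PER_PASS):
        for i in range(start, start + LANES_PER_PASS):
            out[i] = (a[i] + b[i]) & (2**LANE_BITS - 1)   # wrap like hardware would
        passes += 1
    return out, passes

a = list(range(8))
b = [10 * x for x in a]
result, passes = vadd_double_pumped(a, b)
print(result, f"({passes} passes through the {DATAPATH_BITS}-bit datapath)")
```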
 

Jan Olšan

Senior member
Jan 12, 2017
581
1,141
136
APX doesn't quite sound like something with a high risk of being an errata minefield. With TSX you can absolutely see how, but APX?
 

DavidC1

Golden Member
Dec 29, 2023
1,939
3,077
96
And remember, the E cores only need to be orthogonally complete; they do NOT have to be performant on AVX-512 code. Double-pumped 256-bit data paths like AMD's mobile cores? Yes, please. Quad-pumped 128-bit paths? Why not. Extra xtors for higher clocks? Don't think so.
There's only about a 12% difference from enabling the 512-bit datapath. Most of the AVX-512 gains come from the instruction set itself.
Off: 1x
256-bit mode: 1.3x
512-bit mode: 1.45x (12% over 256-bit mode)

There are better ways of using transistors than spending power and die area on 512-bit vector units, which are a big increase. If they improved the uarch further instead, it would bring gains everywhere, including in AVX-512 workloads. A hypothetical future core that runs in 256-bit mode but gains an extra 5% from further uarch improvements would shrink the gap versus 512-bit mode to a mere 6%, while being faster everywhere else, lower power, and smaller.

For AMD, 256-bit-mode Zen 6 will likely be equal to 512-bit-mode Zen 5. Yes, in corner-case scenarios 512-bit mode will be better, but you are bandwidth-limited in most cases, and you end up better off in power and area.
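The ratios do check out; a quick Python sanity check of the arithmetic (the 1.3x/1.45x scaling figures and the hypothetical +5% uarch gain are the ones quoted in the post, not measurements):

```python
# Sanity check of the scaling arithmetic above.
off, mode_256, mode_512 = 1.00, 1.30, 1.45

gain_512_over_256 = mode_512 / mode_256 - 1        # ~11.5%, i.e. the "12%" figure
improved_256 = mode_256 * 1.05                     # hypothetical future 256-bit core
remaining_gap = mode_512 / improved_256 - 1        # ~6.2%, i.e. the "mere 6%"

print(f"512-bit over 256-bit mode: {gain_512_over_256:.1%}")
print(f"512-bit over improved 256-bit core: {remaining_gap:.1%}")
```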
 

gdansk

Diamond Member
Feb 8, 2011
4,628
7,812
136
What happens on NVL-SK when, e.g., a game wants to use a 9th thread for, say, physics calculations?
How do they share state? It must all sync through L3, but how does it keep memory in sync between chips? I can't look to Arrow Lake S for an example, since it has no 9th P core; the thread will just go to an E core on the same chip. Is there a way of managing this on the IO tile or SoC tile? And if so, is it already doing that for ARL-S?
 

Covfefe

Member
Jul 23, 2025
59
96
51
What happens on NVL-SK when, e.g., a game wants to use a 9th thread for, say, physics calculations?
How do they share state? It must all sync through L3, but how does it keep memory in sync between chips? I can't look to Arrow Lake S for an example, since it has no 9th P core; the thread will just go to an E core on the same chip. Is there a way of managing this on the IO tile or SoC tile? And if so, is it already doing that for ARL-S?
The default behavior is to always favor the strongest cores first.

Ryzen CPUs with two CCDs have a similar problem. Is it better to put the extra thread on the second CCD or use SMT? From what I've seen, scheduling a game across two CCXs isn't really an issue. A 9900X performs roughly the same as a 9700X. If cross-CCX latency were a problem, there would be a performance hit.

I don't see any reason why Nova Lake CPUs with two compute tiles should be any different.
 

Hitman928

Diamond Member
Apr 15, 2012
6,720
12,427
136
The default behavior is to always favor the strongest cores first.

Ryzen CPUs with two CCDs have a similar problem. Is it better to put the extra thread on the second CCD or use SMT? From what I've seen, scheduling a game across two CCXs isn't really an issue. A 9900X performs roughly the same as a 9700X. If cross-CCX latency were a problem, there would be a performance hit.

I don't see any reason why Nova Lake CPUs with two compute tiles should be any different.

Threads jumping CCXs can be a major issue in gaming, but it depends on which thread is jumping. Some game threads are rather latency-insensitive and don't need much performance; those can migrate just fine. However, if the primary threads (draw calls, game logic) get moved, you get big performance hits, which results in things like the chart below. In general, AMD/MS/game devs have done a much better job in recent years of making sure this doesn't happen.

[attached image: 1761830488068.png]
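The blunt manual workaround, when the scheduler does get this wrong, is to pin the game to one CCX so the primary threads can't migrate. A minimal Python sketch, assuming psutil is installed and that logical CPUs 0-15 happen to be the first CCX on the machine in question (verify your own topology before copying the mask):

```python
# Keep a game (by PID) on one CCX so its primary threads can't bounce across
# L3 domains. CPU range 0-15 is an assumption for this particular machine.
import os
import psutil

def pin_to_first_ccx(pid, first_ccx_cpus=tuple(range(16))):
    p = psutil.Process(pid)
    p.cpu_affinity(list(first_ccx_cpus))   # restrict the whole process to these CPUs
    return p.cpu_affinity()

if __name__ == "__main__":
    # Example: pin the current process (replace os.getpid() with the game's PID).
    print(pin_to_first_ccx(os.getpid()))
```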
 

Covfefe

Member
Jul 23, 2025
59
96
51
Well, that's the origin of my concern, because in some games it's still pretty useful to (effectively) park the second CCD entirely.
I guess I don't consider the performance hit to be significant. It's under 5% on average. Serious gamers will go for a 3D-cache CPU anyway (or the rumored Nova Lake big-cache SKU).

Regardless, I don't think Intel has a choice. Unless they want to mark one compute tile's E-cores as higher priority than the other compute tile's P-cores, Windows is going to schedule on all 16 P-cores first.
 

naukkis

Golden Member
Jun 5, 2002
1,030
854
136
Regardless, I don't think Intel has a choice. Unless they want to mark one compute tile's E-cores as higher priority than the other compute tile's P-cores, Windows is going to schedule on all 16 P-cores first.

If the scheduler is LLC-aware, it won't split a program's threads across LLC domains. But Nova Lake's CCDs sit on a silicon interposer, unlike AMD's multi-chip packages. With that kind of hardware there's also the possibility that the two-CCD version bridges its L3 rings together through the interposer, bringing a unified 288MB L3 cache to the table and thus making the two-CCD version actually useful.
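As a concrete picture of what "LLC-aware" means: on Linux, the CPUs that share a last-level cache are exposed through the standard sysfs cache-topology files, so a scheduler (or a user pinning threads by hand) can keep a program inside one L3 domain. A rough, Linux-only Python sketch; the grouping logic is illustrative, not how any kernel actually does it:

```python
# List which logical CPUs share an L3 ("LLC domain") by reading the standard Linux
# sysfs cache topology. index3 is the L3 on typical x86 parts; adjust if needed.
import glob
import re
from collections import defaultdict

def llc_domains():
    domains = defaultdict(list)
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*/cache/index3/shared_cpu_list"):
        cpu = int(re.search(r"/cpu(\d+)/", path).group(1))
        with open(path) as f:
            domains[f.read().strip()].append(cpu)   # key: the shared-CPU list string
    return {key: sorted(cpus) for key, cpus in domains.items()}

if __name__ == "__main__":
    for shared, cpus in llc_domains().items():
        print(f"L3 domain {shared}: CPUs {cpus}")
```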
 

511

Diamond Member
Jul 12, 2024
4,762
4,324
106
What happens on NVL-SK when, e.g., a game wants to use a 9th thread for, say, physics calculations?
How do they share state? It must all sync through L3, but how does it keep memory in sync between chips? I can't look to Arrow Lake S for an example, since it has no 9th P core; the thread will just go to an E core on the same chip. Is there a way of managing this on the IO tile or SoC tile? And if so, is it already doing that for ARL-S?
I think the game threads would be better off on the 16 E-cores, with the main thread on a P-core, on NVL.

@Hulk, how about a Cinebench table for Nova Lake?
 

511

Diamond Member
Jul 12, 2024
4,762
4,324
106
The default behavior is to always favor the strongest cores first.

Ryzen CPUs with two CCDs have a similar problem. Is it better to put the extra thread on the second CCD or use SMT? From what I've seen, scheduling a game across two CCXs isn't really an issue. A 9900X performs roughly the same as a 9700X. If cross-CCX latency were a problem, there would be a performance hit.

I don't see any reason why Nova Lake CPUs with two compute tiles should be any different.
Intel has Thread Director for precisely this stuff. It knows how to deal with this scenario and can give hints to the scheduler, unless the OS is stupid enough to ignore them.
 

Covfefe

Member
Jul 23, 2025
59
96
51
Intel has Thread Director for precisely this stuff. It knows how to deal with this scenario and can give hints to the scheduler, unless the OS is stupid enough to ignore them.
Thread Director is for moving threads between efficient cores and fast cores. I've seen nothing that indicates it can be used for keeping threads in the same L3 domain as other threads.
 

511

Diamond Member
Jul 12, 2024
4,762
4,324
106
Thread Director is for moving threads between efficient cores and fast cores. I've seen nothing that indicates it can be used for keeping threads in the same L3 domain as other threads.
It's a glorified scheduler hinter; of course it can do it. Its entire purpose is thread scheduling between P/E/LP-E cores, so why can't it be made aware of L3 domains for threads? P/E cores and LP-E cores are in different cache domains as well.
 

Covfefe

Member
Jul 23, 2025
59
96
51
It's a glorified scheduler hinter; of course it can do it. Its entire purpose is thread scheduling between P/E/LP-E cores, so why can't it be made aware of L3 domains for threads? P/E cores and LP-E cores are in different cache domains as well.
P/E and LP-E cores have different cache domains, but that isn't used as a determining factor when deciding where to send a thread.

What you're suggesting is for each core to track the origin of all the data that enters its caches and how long ago it was retrieved. Every cache line would need to store that telemetry, and the internal fabric would need to transmit it.

That's a crazy amount of added complexity.
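Just to put a rough number on that, a back-of-envelope calculation in Python. All figures are assumptions for illustration: 64-byte lines, 144 MB of L3 per compute tile (half of the 288 MB figure floated earlier in the thread), and a made-up 8-bit origin tag plus 16-bit age counter per line. It only counts the raw tag SRAM, not the fabric traffic:

```python
# Back-of-envelope for the "crazy amount of added complexity": raw SRAM cost alone
# of tagging every L3 line with where its data came from and when.
LINE_BYTES = 64
L3_BYTES = 144 * 1024 * 1024
TAG_BITS_PER_LINE = 8 + 16               # origin ID + age counter (made-up sizes)

lines = L3_BYTES // LINE_BYTES
extra_bits = lines * TAG_BITS_PER_LINE
extra_mib = extra_bits / 8 / 2**20

print(f"{lines:,} cache lines -> {extra_mib:.1f} MiB of extra tag SRAM "
      f"({TAG_BITS_PER_LINE / (LINE_BYTES * 8):.1%} overhead), before any fabric changes")
```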
 

511

Diamond Member
Jul 12, 2024
4,762
4,324
106
P/E and LP-E cores have different cache domains, but that isn't used as a determining factor when deciding where to send a thread.

What you're suggesting is for each core to track the origin of all the data that enters its caches and how long ago it was retrieved. Every cache line would need to store that telemetry, and the internal fabric would need to transmit it.

That's a crazy amount of added complexity.
Or they'll have some other solution for this, or they'll simply copy from AMD's playbook and park the second compute tile.
 

MS_AT

Senior member
Jul 15, 2024
890
1,786
96
Intel has thread director for this stuff precisely it knows how to deal with scenario and can give the hints to scheduler unless the OS is stupid to ignore it.
You speak with a confidence that suggests you have investigated the matter and, if we asked, could present proof to support your claim that Thread Director works as intended and gives meaningful hints to the OS.

Don't worry, I won't ask ;)

Plus, it seems TD is not strictly necessary: https://www.phoronix.com/review/cache-aware-scheduling-amd-turin
 

511

Diamond Member
Jul 12, 2024
4,762
4,324
106
Done by Intel Linux engineers, AMD CPU SW is a meme
MS_AT

Senior member
Jul 15, 2024
890
1,786
96
Done by Intel Linux engineers, AMD CPU SW is a meme
I don't deny it, but Intel doesn't seem to want to reimagine itself as a software company, which arguably was one of its few strengths in the last few years. Quite the opposite, judging by the number of devs they let go.
 

511

Diamond Member
Jul 12, 2024
4,762
4,324
106
Yeah, the layoffs affected software the most; IFS was left largely untouched. Especially odd when he said he would kill bloat, but he killed many SW engineers instead.
 