Discussion: Intel Nova Lake in H2-2026 (AnandTech Forums, Discussion Threads, page 46)

LightningZ71

Platinum Member
Mar 10, 2017
And remember, the E cores only need to be orthogonally complete; they do NOT have to be performant on AVX-512 code. Double-pumped 256-bit data paths like AMD's mobile cores? Yes please. Quad-pumped 128-bit paths? Why not. Extra transistors for higher clocks? Don't think so.
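To make "double pumped" and "quad pumped" concrete, here is a toy Python model. It is an illustration under simple assumptions (16 lanes of 32-bit floats, one chunk per pass), not a description of how any real core splits its uops. A 512-bit add is executed in chunks the width of the datapath: the result is identical every time, and a narrower datapath just needs more passes, which is the "complete but not performant" trade-off described above.

```python
# Toy model: a 512-bit vector holds 16 lanes of 32-bit floats.
LANES_512 = 512 // 32  # 16 lanes


def pumped_add(a, b, datapath_bits):
    """Add two 512-bit vectors on a datapath of the given width.

    Returns (result, passes). The result is identical for every width;
    only the number of passes through the execution unit changes.
    """
    if len(a) != LANES_512 or len(b) != LANES_512:
        raise ValueError("expected 16 x 32-bit lanes")
    lanes_per_pass = datapath_bits // 32
    result, passes = [], 0
    for start in range(0, LANES_512, lanes_per_pass):
        stop = start + lanes_per_pass
        # One pass handles only as many lanes as the datapath is wide.
        result.extend(x + y for x, y in zip(a[start:stop], b[start:stop]))
        passes += 1
    return result, passes


a = [float(i) for i in range(LANES_512)]
b = [1.0] * LANES_512
for width in (512, 256, 128):
    result, passes = pumped_add(a, b, width)
    print(f"{width}-bit datapath: {passes} pass(es), result matches: {result == [x + 1.0 for x in a]}")
```

Real hardware also changes latency, scheduling, and register-file pressure, none of which this sketch models.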
 

Jan Olšan

Senior member
Jan 12, 2017
APX doesn't really sound like something with a high risk of being an errata minefield. With TSX you can absolutely see how, but APX?
 

DavidC1

Golden Member
Dec 29, 2023
LightningZ71 said:
And remember, the E cores only need to be orthogonally complete; they do NOT have to be performant on AVX-512 code. Double-pumped 256-bit data paths like AMD's mobile cores? Yes please. Quad-pumped 128-bit paths? Why not. Extra transistors for higher clocks? Don't think so.

There's only a 12% difference from enabling the 512-bit datapath. Most of the AVX-512 gains come from the instruction set itself.
AVX-512 off: 1x
256-bit mode: 1.3x
512-bit mode: 1.45x (12% over 256-bit mode)

There are better ways of using transistors than spending power and die area on 512-bit vector units, which are a big increase. For example, if they improved the uarch further, it would bring gains everywhere, including on AVX-512 workloads. A hypothetical future core that runs in 256-bit mode but gains an extra 5% from further uarch improvements would shrink the gap to 512-bit mode to a mere 6%, while being faster everywhere else, using less power, and having a smaller core.
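The percentages above can be checked with a few lines of Python. This sketch takes the 1.3x and 1.45x figures at face value, and the 5% uplift is the post's own hypothetical, not a measured result.

```python
def gain(faster, slower):
    """Fractional speedup of `faster` over `slower` (0.10 == 10%)."""
    return faster / slower - 1


# Speedups quoted in the post, normalized to AVX-512 off = 1.0x.
MODE_256 = 1.30
MODE_512 = 1.45

gap_now = gain(MODE_512, MODE_256)       # ~0.115, rounded to "12%" in the post

# The post's hypothetical: a 256-bit core with a further 5% uarch uplift.
future_256 = MODE_256 * 1.05             # 1.365x
gap_future = gain(MODE_512, future_256)  # ~0.062, the "mere 6%"

print(f"512-bit over 256-bit today:    {gap_now:.1%}")
print(f"512-bit over improved 256-bit: {gap_future:.1%}")
```

Strictly, 1.45 / 1.30 is about 11.5%, so "12%" is a rounded figure; the conclusion is unchanged.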

For AMD, a Zen 6 running in 256-bit mode will likely match Zen 5 in 512-bit mode. Yes, in corner-case scenarios the 512-bit datapath will be better, but in most cases you are bandwidth-limited, and you end up ahead on power and area.
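The "bandwidth limited" argument is essentially the roofline model: attainable throughput is the lower of the compute peak and memory bandwidth multiplied by arithmetic intensity (FLOPs per byte moved from memory). A minimal Python sketch follows; every number in it is made up for illustration and is not a Zen 5 or Zen 6 specification.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical numbers for illustration only.
BANDWIDTH_GBS = 90.0   # assumed sustained DRAM bandwidth
PEAK_256 = 1000.0      # assumed peak with a 256-bit datapath
PEAK_512 = 2000.0      # assumed peak with a 512-bit datapath (2x width)

kernels = [
    ("streaming kernel", 0.25),         # little data reuse
    ("high-reuse kernel", 50.0),        # data mostly reused from cache
]
for name, intensity in kernels:
    p256 = attainable_gflops(PEAK_256, BANDWIDTH_GBS, intensity)
    p512 = attainable_gflops(PEAK_512, BANDWIDTH_GBS, intensity)
    print(f"{name}: 256-bit {p256:.1f} GFLOP/s, 512-bit {p512:.1f} GFLOP/s, gain {p512 / p256:.2f}x")
```

With these assumed numbers, doubling the datapath doubles throughput only for the high-reuse kernel; the streaming kernel stays pinned at bandwidth times intensity (22.5 GFLOP/s here) regardless of vector width.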
 

gdansk

Diamond Member
Feb 8, 2011
What happens on NVL-SK when, e.g., a game wants to use a 9th thread for, say, physics calculations?
How do they share state across tiles? It must all sync through L3, but how is memory kept in sync between chips? I can't look to Arrow Lake-S for an example: since it has no 9th P-core, that thread goes to an E-core on the same chip. Is there a way of managing this on the IO tile or SoC tile? And if so, is it already doing that for ARL-S?
 

Covfefe

Member
Jul 23, 2025
gdansk said:
What happens on NVL-SK when, e.g., a game wants to use a 9th thread for, say, physics calculations?
How do they share state across tiles? It must all sync through L3, but how is memory kept in sync between chips? I can't look to Arrow Lake-S for an example: since it has no 9th P-core, that thread goes to an E-core on the same chip. Is there a way of managing this on the IO tile or SoC tile? And if so, is it already doing that for ARL-S?
The default behavior is to always favor the strongest cores first.

Ryzen CPUs with two CCDs have a similar problem. Is it better to put the extra thread on the second CCD or use SMT? From what I've seen, scheduling a game across two CCXs isn't really an issue. A 9900X performs roughly the same as a 9700X. If cross-CCX latency were a problem, there would be a performance hit.

I don't see any reason why Nova Lake CPUs with two compute tiles should be any different.
 

Hitman928

Diamond Member
Apr 15, 2012
Covfefe said:
The default behavior is to always favor the strongest cores first.

Ryzen CPUs with two CCDs have a similar problem. Is it better to put the extra thread on the second CCD or use SMT? From what I've seen, scheduling a game across two CCXs isn't really an issue. A 9900X performs roughly the same as a 9700X. If cross-CCX latency were a problem, there would be a performance hit.

I don't see any reason why Nova Lake CPUs with two compute tiles should be any different.

Threads jumping between CCXs can be a major issue in gaming, but it depends on which thread is jumping. Some game threads are fairly latency-insensitive and don't need much performance; those can migrate just fine. However, if the primary threads (draw calls, game logic) get moved, you get big performance hits, which result in things like the image below. In general, AMD, Microsoft, and game devs have done a much better job in recent years of making sure this doesn't happen.

[Attached image: 1761830488068.png]
 

Covfefe

Member
Jul 23, 2025
Well, that's the origin of my concern. In some games, it's still pretty useful to effectively park the second CCD entirely.
I guess I don't consider the performance hit significant. It's under 5% on average. Serious gamers will go for a 3D cache CPU anyway (or the rumored Nova Lake big-cache SKU).

Regardless, I don't think Intel has a choice. Unless they mark one compute tile's E-cores as higher priority than the other compute tile's P-cores, Windows is going to schedule on all 16 P-cores first.
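A toy model of the "strongest cores first" policy discussed above, in Python. This is not how the Windows scheduler is implemented; the two tiles with 8 P-cores each follow the "16 P-cores" mentioned in the post, while the 16 E-cores per tile are a hypothetical figure chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    tile: int   # compute tile the core lives on
    kind: str   # "P" or "E"
    index: int  # position within that tile and core type


def build_topology(tiles=2, p_per_tile=8, e_per_tile=16):
    """Hypothetical layout: each tile has p_per_tile P-cores and e_per_tile E-cores."""
    cores = []
    for t in range(tiles):
        cores += [Core(t, "P", i) for i in range(p_per_tile)]
        cores += [Core(t, "E", i) for i in range(e_per_tile)]
    return cores


def strongest_first(cores, n_threads):
    """Toy policy: fill every P-core, on any tile, before touching an E-core."""
    rank = {"P": 0, "E": 1}
    ordered = sorted(cores, key=lambda c: (rank[c.kind], c.tile, c.index))
    return ordered[:n_threads]


placement = strongest_first(build_topology(), 9)
ninth = placement[8]
print(f"9th thread -> tile {ninth.tile}, {ninth.kind}-core {ninth.index}")
```

Under this policy the 9th thread crosses to the second tile's first P-core, which is exactly the cross-tile case gdansk asked about.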
 

naukkis

Golden Member
Jun 5, 2002
Covfefe said:
Regardless, I don't think Intel has a choice. Unless they mark one compute tile's E-cores as higher priority than the other compute tile's P-cores, Windows is going to schedule on all 16 P-cores first.

If the scheduler is LLC-aware, it won't split a program's threads across LLC domains. But Nova Lake's CCDs (compute tiles) sit on a silicon interposer, unlike AMD's multi-chip designs. With that kind of hardware, there's also a possibility that the two-CCD version bridges the two L3 rings together through the interposer, bringing a unified 288MB L3 cache to the table, which would make the two-CCD version actually useful.
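For contrast, here is a toy LLC-aware policy in Python: it keeps a program's threads on one compute tile (one L3 domain) until that tile is full, and only then spills to the other tile. It is a sketch, not any real OS scheduler, and the 8 P-core plus 16 E-core tile layout is a hypothetical assumption.

```python
from typing import NamedTuple


class Core(NamedTuple):
    tile: int   # compute tile == LLC (L3) domain in this toy model
    kind: str   # "P" or "E"
    index: int


def build_topology(tiles=2, p_per_tile=8, e_per_tile=16):
    """Hypothetical layout: each tile has p_per_tile P-cores and e_per_tile E-cores."""
    return [Core(t, k, i)
            for t in range(tiles)
            for k, n in (("P", p_per_tile), ("E", e_per_tile))
            for i in range(n)]


def llc_aware(cores, n_threads, home_tile=0):
    """Toy policy: fill the home tile (P-cores, then E-cores) before spilling
    to other tiles, so a program's threads share one L3 where possible."""
    rank = {"P": 0, "E": 1}
    ordered = sorted(cores, key=lambda c: (c.tile != home_tile, c.tile, rank[c.kind], c.index))
    return ordered[:n_threads]


placement = llc_aware(build_topology(), 9)
print(placement[8])  # 9th thread stays on tile 0, on an E-core
```

The trade-off is visible in the output: the 9th thread gets a weaker core but keeps its shared data in the same L3. Which choice wins depends on how much the threads actually communicate, and a bridged, unified L3 as speculated above would make the distinction largely moot.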