Discussion Intel Nova Lake in H2-2026: Discussion Threads

LightningZ71 · Oct 26, 2025

And remember, the E cores only need to be orthogonally complete, they do NOT have to be performant on AVX512 code. Double pumped 256 data paths like AMD's mobile cores? Yes please. Quad pumped 128bit paths? Why not. Extra XTORS for higher clocks? Don't think so.

Jan Olšan · Oct 27, 2025

APX doesn't quite sound like something with a high risk of being an errata minefield. TSX you can absolutely see how, but APX?

adroc_thurston · Oct 27, 2025

Jan Olšan said:
APX doesn't quite sound like something with a high risk of being an errata minefield

This is Intel we're talking about, they shipped broken ECC on *Xeons*.

DavidC1 · Oct 27, 2025

LightningZ71 said:
And remember, the E cores only need to be orthogonally complete, they do NOT have to be performant on AVX512 code. Double pumped 256 data paths like AMD's mobile cores? Yes please. Quad pumped 128bit paths? Why not. Extra XTORS for higher clocks? Don't think so.

There's only a 12% difference from enabling the 512-bit datapath. Most of the AVX512 gains come from the instruction set.

AVX-512 Performance With 256-bit vs. 512-bit Data Path For AMD EPYC 9005 CPUs Review - Phoronix

www.phoronix.com

Off: 1x
256 mode: 1.3x
512 mode: 1.45x (12% over 256 mode)

There are better ways of using transistors than spending power and die area on 512-bit vector units, which are a big increase. If they improved the uarch further instead, it would bring gains everywhere, including on AVX512 workloads. A hypothetical future core that runs in 256 mode but gains an extra 5% from further uarch improvements would cut the gap versus 512 mode to a mere 6%, while being faster everywhere else, lower power, and smaller.

For AMD, 256 mode Zen 6 will likely be equal to 512 mode Zen 5. Yes, in corner-case scenarios it'll be better, but you are bandwidth-limited in most cases, and you end up better off in power and area.
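For anyone who wants to check the percentages above, here is a quick sketch of the arithmetic. The three speedup figures are the ones quoted from the Phoronix numbers in this post; the 5% uarch uplift is the hypothetical from the post, not a measured value, and the function name is just for illustration.

```python
# Relative throughput figures quoted above (AVX-512 off = 1.0).
MODE_256 = 1.30
MODE_512 = 1.45

# Hypothetical: a future 256-bit core with an assumed 5% general uarch uplift.
UARCH_UPLIFT = 1.05


def gap_percent(faster: float, slower: float) -> float:
    """Return how much faster `faster` is than `slower`, in percent."""
    return (faster / slower - 1.0) * 100.0


future_256 = MODE_256 * UARCH_UPLIFT

print(f"512 vs 256 today:        {gap_percent(MODE_512, MODE_256):.1f}%")
print(f"512 vs improved 256:     {gap_percent(MODE_512, future_256):.1f}%")
```

This gives roughly 11.5% for today's gap and roughly 6.2% for the hypothetical core, which matches the rounded 12% and 6% figures in the post.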

gdansk · Oct 29, 2025

What happens on NVL-SK when e.g. a game wants to use a 9th thread for say physics calculations?
How do they share state? It must all sync through L3, but how does it keep memory in sync between chips? I can't look to Arrow Lake S for an example: since it has no 9th P core, the thread goes to an E core on the same chip. Is there a way of managing this on the IO tile or SoC tile? And if so, is it already doing that for ARL-S?

Covfefe · Oct 30, 2025

gdansk said:
What happens on NVL-SK when e.g. a game wants to use a 9th thread for say physics calculations?
How do they share state? It must all sync through L3, but how does it keep memory in sync between chips? I can't look to Arrow Lake S for an example: since it has no 9th P core, the thread goes to an E core on the same chip. Is there a way of managing this on the IO tile or SoC tile? And if so, is it already doing that for ARL-S?

The default behavior is to always favor the strongest cores first.

Ryzen CPUs with two CCDs have a similar problem. Is it better to put the extra thread on the second CCD or use SMT? From what I've seen, scheduling a game across two CCXs isn't really an issue. A 9900X performs roughly the same as a 9700X. If cross-CCX latency were a problem, there would be a performance hit.

I don't see any reason why Nova Lake CPUs with two compute tiles should be any different.

Hitman928 · Oct 30, 2025

Covfefe said:
The default behavior is to always favor the strongest cores first.

Ryzen CPUs with two CCDs have a similar problem. Is it better to put the extra thread on the second CCD or use SMT? From what I've seen, scheduling a game across two CCXs isn't really an issue. A 9900X performs roughly the same as a 9700X. If cross-CCX latency were a problem, there would be a performance hit.

I don't see any reason why Nova Lake CPUs with two compute tiles should be any different.

Threads jumping CCXs can be a major issue in gaming but it depends on which thread is jumping. Some threads for games are rather latency insensitive and don't need much performance, those can migrate just fine. However, if the primary threads (draw calls, game logic) get moved, you get big performance hits which result in things like the below. In general, AMD/MS/Game Devs have done a much better job in recent years of making sure this doesn't happen.

gdansk · Oct 30, 2025

Covfefe said:
From what I've seen, scheduling a game across two CCXs isn't really an issue.

Well, that's the origin of my concern. Because in some games, it's still pretty useful to totally park the second CCD (effectively).

Covfefe · Oct 30, 2025

gdansk said:
Well, that's the origin of my concern. Because in some games, it's still pretty useful to totally park the second CCD (effectively).

I guess I don't consider the performance hit to be significant. It's under 5% on average. Serious gamers will go for a 3D cache CPU anyways (or the rumored nova lake big cache SKU).

Regardless, I don't think Intel has a choice. Unless they want to mark one compute tile's e-cores as higher priority than the other compute tile's p-cores, Windows is going to schedule on all 16 p-cores first.
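To make the "where does the 9th thread land" question concrete, here is a toy model of two placement policies. Everything in it is an assumption for illustration: the 2 tiles x (8 P + 16 E) layout is the rumored configuration discussed in this thread, and `strongest_first` / `tile_first` are made-up names, not how Windows or any real scheduler is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    tile: int   # which compute tile the core sits on
    kind: str   # "P" or "E"
    index: int  # position within its tile and kind


def build_topology(tiles=2, p_per_tile=8, e_per_tile=16):
    """Build a hypothetical dual-tile part (core counts are assumptions)."""
    return [
        Core(tile, kind, i)
        for tile in range(tiles)
        for kind, count in (("P", p_per_tile), ("E", e_per_tile))
        for i in range(count)
    ]


def strongest_first(cores, n_threads):
    """Fill every P core on every tile before touching any E core."""
    ranked = sorted(cores, key=lambda c: (c.kind != "P", c.tile, c.index))
    return ranked[:n_threads]


def tile_first(cores, n_threads):
    """Fill tile 0 completely (P cores, then E cores) before using tile 1."""
    ranked = sorted(cores, key=lambda c: (c.tile, c.kind != "P", c.index))
    return ranked[:n_threads]


cores = build_topology()
ninth_strongest = strongest_first(cores, 9)[8]
ninth_tile = tile_first(cores, 9)[8]

print("strongest-first, 9th thread:", ninth_strongest)
print("tile-first, 9th thread:     ", ninth_tile)
```

Under strongest-first the 9th thread crosses to a P core on the second tile; under tile-first it stays local on an E core. Which one is actually faster depends on the cross-tile latency nobody outside Intel has measured yet.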

naukkis · Oct 30, 2025

Covfefe said:
Regardless, I don't think Intel has a choice. Unless they want to mark one compute tile's e-cores as higher priority than the other compute tile's p-cores, Windows is going to schedule on all 16 p-cores first.

If the scheduler is LLC-aware, it won't split a program's threads across LLC domains. But Nova Lake CCDs sit on a silicon interposer, unlike AMD's multi-chip parts. With that kind of hardware there's also a possibility that the two-CCD version bridges its L3 rings together through the interposer, bringing a unified 288MB L3 cache to the table and thus making the two-CCD version actually useful.
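For reference, this is roughly how software can see LLC domains on Linux today. It is a minimal sketch, assuming the usual sysfs layout where `cache/index3/shared_cpu_list` describes the L3 (index3 is conventionally L3, but that is not guaranteed on every system). The helper names are hypothetical. A bridged, unified L3 would simply show up as one domain instead of two.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> frozenset[int]:
    """Parse a sysfs CPU list such as '0-7,16-23' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return frozenset(cpus)


def group_by_llc(shared_lists: dict[int, str]) -> list[list[int]]:
    """Collapse per-CPU shared_cpu_list strings into distinct LLC domains."""
    domains = {parse_cpu_list(text) for text in shared_lists.values()}
    return sorted(sorted(domain) for domain in domains)


def read_linux_llc_lists(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Read shared_cpu_list for cache index3 of each CPU (empty if absent)."""
    result = {}
    for path in Path(root).glob("cpu[0-9]*/cache/index3/shared_cpu_list"):
        cpu_id = int(path.parts[-4][3:])  # directory name is 'cpuN'
        result[cpu_id] = path.read_text()
    return result


# Made-up example: four CPUs split across two L3 domains.
example = {0: "0-1", 1: "0-1", 2: "2-3", 3: "2-3"}
print(group_by_llc(example))
```

On a real Linux box, `group_by_llc(read_linux_llc_lists())` should return one list per L3 domain, which is the information an LLC-aware scheduler works from.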

511 · Nov 1, 2025

adroc_thurston said:
This is Intel we're talking about, they shipped broken ECC on *Xeons*.

They are also the ones who basically wrote the software for everything from AVX to APX to AMX. As for shipping broken stuff, every vendor has broken stuff ...

511 · Nov 1, 2025

gdansk said:
What happens on NVL-SK when e.g. a game wants to use a 9th thread for say physics calculations?
How do they share state? It must all sync through L3, but how does it keep memory in sync between chips? I can't look to Arrow Lake S for an example: since it has no 9th P core, the thread goes to an E core on the same chip. Is there a way of managing this on the IO tile or SoC tile? And if so, is it already doing that for ARL-S?

I think on NVL the game's other threads would be better off on the 16 E cores, with the main thread on a P core.

@Hulk how about a cinebench table for Nova Lake?

511 · Nov 1, 2025

Covfefe said:
The default behavior is to always favor the strongest cores first.

Ryzen CPUs with two CCDs have a similar problem. Is it better to put the extra thread on the second CCD or use SMT? From what I've seen, scheduling a game across two CCXs isn't really an issue. A 9900X performs roughly the same as a 9700X. If cross-CCX latency were a problem, there would be a performance hit.

I don't see any reason why Nova Lake CPUs with two compute tiles should be any different.

Intel has Thread Director for precisely this stuff. It knows how to deal with this scenario and can give hints to the scheduler, unless the OS is stupid enough to ignore them.

adroc_thurston · Nov 1, 2025

511 said:
every vendor has broken stuff ...

Lmao cope, no one besides Intel shipped broken DDR4 ECC just like that.

511 · Nov 1, 2025

adroc_thurston said:
Lmao cope, no one besides Intel shipped broken DDR4 ECC just like that.

Every vendor has shipped broken stuff, look at Blackwell's missing ROPs. Intel had a worse bug than this, remember FDIV?

adroc_thurston · Nov 1, 2025

511 said:
Every vendor has shipped broken stuff, look at Blackwell's missing ROPs. Intel had a worse bug than this, remember FDIV?

Binning screwups from NV != Intel shipping broken ECC on Purley Xeons.

511 · Nov 1, 2025

Right... Broken hw is broken hw, be it from binning or some other thing.

Covfefe · Nov 1, 2025

511 said:
Intel has Thread Director for precisely this stuff. It knows how to deal with this scenario and can give hints to the scheduler, unless the OS is stupid enough to ignore them.

Thread Director is for moving threads between efficient cores and fast cores. I've seen nothing that indicates it can be used for keeping threads in the same L3 domain as other threads.

511 · Nov 1, 2025

Covfefe said:
Thread Director is for moving threads between efficient cores and fast cores. I've seen nothing that indicates it can be used for keeping threads in the same L3 domain as other threads.

It's a glorified scheduler hinter, so of course it can do it. Its entire purpose is thread scheduling between P/E/LP-E, so why can't it be made aware of L3 domains as well? P/E cores and LP-E have different cache domains too.

Covfefe · Nov 1, 2025

511 said:
It's a glorified scheduler hinter, so of course it can do it. Its entire purpose is thread scheduling between P/E/LP-E, so why can't it be made aware of L3 domains as well? P/E cores and LP-E have different cache domains too.

P/E and LP-E have different cache domains, but that isn't used as a determining factor when deciding where to send the thread.

What you're suggesting is for each core to track the origin of all the data that enters its caches and how long ago it was retrieved. Every cache line would need to store that telemetry and the internal fabric would need to transmit it.

That's a crazy amount of added complexity.

511 · Nov 1, 2025

Covfefe said:
P/E and LP-E have different cache domains, but that isn't used as a determining factor when deciding where to send the thread.

What you're suggesting is for each core to track the origin of all the data that enters its caches and how long ago it was retrieved. Every cache line would need to store that telemetry and the internal fabric would need to transmit it.

That's a crazy amount of added complexity.

Or they would have some other solution for this, or they would simply copy from AMD's playbook and park the 2nd CPU tile.
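As a rough user-space stand-in for "parking the 2nd tile", a process can be confined to one tile's CPUs with an affinity mask. To be clear, this is not how AMD's core parking works (that is handled by the driver and OS power management); it is a swapped-in technique that gets a similar effect for a single process. The CPU numbering below is made up, and `cpus_for_tile` / `confine_to_tile` are hypothetical helper names. `os.sched_setaffinity` is a real call but only exists on Linux and some other Unix systems.

```python
import os


def cpus_for_tile(domains: list[list[int]], keep: int = 0) -> set[int]:
    """Return the CPU ids belonging to the tile (LLC domain) we want to keep."""
    return set(domains[keep])


def confine_to_tile(domains: list[list[int]], keep: int = 0, pid: int = 0) -> set[int]:
    """Restrict `pid` (0 = this process) to one tile, where the OS allows it."""
    allowed = cpus_for_tile(domains, keep)
    if hasattr(os, "sched_setaffinity"):  # not available on Windows or macOS
        os.sched_setaffinity(pid, allowed)
    return allowed


# Assumed layout: logical CPUs 0-23 on tile 0, 24-47 on tile 1.
hypothetical_domains = [list(range(0, 24)), list(range(24, 48))]
print(sorted(cpus_for_tile(hypothetical_domains, keep=0))[:4], "...")
```

Calling `confine_to_tile(hypothetical_domains)` on a machine that really has those CPU ids would keep every thread of the process inside one L3 domain, at the cost of leaving the other tile idle for that process.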

MS_AT · Nov 1, 2025

511 said:
Intel has Thread Director for precisely this stuff. It knows how to deal with this scenario and can give hints to the scheduler, unless the OS is stupid enough to ignore them.

You speak with a confidence that suggests you have done investigations to verify the matter and, if we asked, could present proof to support your claim that Thread Director works as intended and gives meaningful hints to the OS.

Don't worry, I won't ask

Plus it seems TD is not strictly necessary https://www.phoronix.com/review/cache-aware-scheduling-amd-turin

511 · Nov 1, 2025

MS_AT said:
You speak with a confidence that suggests you have done investigations to verify the matter and, if we asked, could present proof to support your claim that Thread Director works as intended and gives meaningful hints to the OS.

Don't worry, I won't ask

Plus it seems TD is not strictly necessary https://www.phoronix.com/review/cache-aware-scheduling-amd-turin

Done by Intel Linux engineers, AMD CPU SW is a meme

MS_AT · Nov 1, 2025

511 said:
Done by Intel Linux engineers, AMD CPU SW is a meme

I don't deny it, but Intel doesn't seem to want to reimagine itself as a software company, even though software was arguably one of its few strengths in the last few years. Quite the opposite, seeing the number of devs they let go.

511 · Nov 1, 2025

Yeah, the layoffs affected software the most. IFS was left largely untouched, which is notable given he said he would kill bloat, yet he cut many SW engineers.
