Discussion: Intel Nova Lake in H2-2026


dullard

Elite Member
May 21, 2001
Wow, this is an aggressive path for Intel, releasing basically 4 new core architectures inside of 12 months if all goes well. ;) I wish them luck.
As others have said, Nova Lake will have Coyote Cove (which is basically just Panther Cove with a few things missing and, as 511 pointed out, not to be confused with Panther Lake). The information we have so far is that its focus is on larger IPC gains, efficiency, and APX. https://www.tomshardware.com/pc-com...th-big-ipc-improvements-support-for-intel-apx
APX will require software developers to recompile, so don't expect instant gains from APX -- especially in initial reviews on older software. But over time, as software is updated, you'll see improvements.
 
  • Like
Reactions: Hulk

Thunder 57

Diamond Member
Aug 19, 2007
As others have said, Nova Lake will have Coyote Cove (which is basically just Panther Cove with a few things missing and, as 511 pointed out, not to be confused with Panther Lake). The information we have so far is that its focus is on larger IPC gains, efficiency, and APX. https://www.tomshardware.com/pc-com...th-big-ipc-improvements-support-for-intel-apx
APX will require software developers to recompile, so don't expect instant gains from APX -- especially in initial reviews on older software. But over time, as software is updated, you'll see improvements.

We've seen this movie before with 64-bit. 64-bit alone didn't do much in most cases, but software still needed to be recompiled to know those extra GPRs were there.
 

511

Diamond Member
Jul 12, 2024
There are a few things in APX that let your code have fewer branches IIRC, which is nice
 

dullard

Elite Member
May 21, 2001
We've seen this movie before with 64-bit. 64-bit alone didn't do much in most cases, but software still needed to be recompiled to know those extra GPRs were there.
True. Any CPU with significant new features often does better in hindsight than in the first reviews. This happens as software is recompiled for, or better yet optimized for, the new CPU.
 
  • Like
Reactions: Thunder 57

dullard

Elite Member
May 21, 2001
How does that work? Hopefully not like Branchless Doom :D .
The concept has been around for decades.

Standard Method
If-Then-Else statements, when compiled into machine language, produce a lot of code and a lot of jumps, and all of those jumps need to be predicted by the branch predictor for optimum speed. That prediction just doesn't work as well as you'd want once things get even remotely complex.
  1. The code needs to evaluate if something is true or false.
  2. If it is false, then run the false code.
    1. Then jump to the end of the true code.
  3. Otherwise run the true code.
  4. Then join the two paths back together.
Predicated If-Conversion Method
Instead of If-Then-Else statements, you can write just a couple of lines of code and have only the necessary code run: far fewer lines of machine language, no jumps, and nothing to predict.
  1. Run True code if necessary
  2. Run False code if necessary
Half as many pseudocode lines. No branching. Nothing to predict. https://en.wikipedia.org/wiki/Predication_(computer_architecture)#Overview
Now the compiler can schedule #1 and #2 in whatever order it determines is optimal, or the out-of-order hardware can execute both sides at once and keep only the result it needs.
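Here is a rough sketch in C of what those two shapes look like (the functions and names are made up for illustration; whether the compiler actually emits a branch or a conditional move depends on the compiler and optimization flags):

Code:
/* Standard method: compiles naturally into a compare, a conditional jump
   over one path, the two code paths, and a join point -- the jump is what
   the branch predictor has to guess. */
int clamp_branchy(int x, int limit)
{
    if (x > limit)        /* 1. evaluate true/false */
        return limit;     /* the "true" code        */
    else
        return x;         /* the "false" code       */
}

/* Predicated / if-converted method: compute both candidates and let the
   condition select one. Compilers commonly turn this into a conditional
   move (cmov) -- no jump, nothing to predict. */
int clamp_branchless(int x, int limit)
{
    return (x > limit) ? limit : x;   /* select, typically via cmov */
}

GCC and Clang already do this kind of if-conversion for small cases like the one above; the point of APX is to make it worthwhile for larger code regions.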

But the drawback is that if the code is complex, the predicated method just can't be done with such a limited number of registers. APX's extra registers give the compiler far more opportunities to remove the if-statements entirely.
 
  • Like
Reactions: Elfear and 511

Thunder 57

Diamond Member
Aug 19, 2007
The concept has been around for decades.

Standard Method
If-Then-Else statements, when compiled into machine language, produce a lot of code and a lot of jumps, and all of those jumps need to be predicted by the branch predictor for optimum speed. That prediction just doesn't work as well as you'd want once things get even remotely complex.
  1. The code needs to evaluate if something is true or false.
  2. If it is false, then run the false code.
    1. Then jump to the end of the true code.
  3. Otherwise run the true code.
  4. Then join the two paths back together.
Predicated If-Conversion Method
Instead of If-Then-Else statements, you can write just a couple of lines of code and have only the necessary code run: far fewer lines of machine language, no jumps, and nothing to predict.
  1. Run True code if necessary
  2. Run False code if necessary
Half as many pseudocode lines. No branching. Nothing to predict. https://en.wikipedia.org/wiki/Predication_(computer_architecture)#Overview

But the drawback is that if the code is complex, the predicated method just can't be done with such a limited number of registers. APX's extra registers give the compiler far more opportunities to remove the if-statements entirely.

If-Then-Else-Switch all take time to evaluate, but they've stuck around because they work. Branch prediction is extremely accurate. It's an interesting conversation, but hasn't the jump to 16 GPRs and register renaming already given us a lot of those gains? I'd like to see x86 match ARM with APX, but I believe the returns will be limited.

I appreciate the link though.
 

dullard

Elite Member
May 21, 2001
If-Then-Else-Switch all take time to evaluate, but they've stuck around because they work. Branch prediction is extremely accurate. It's an interesting conversation, but hasn't the jump to 16 GPRs and register renaming already given us a lot of those gains? I'd like to see x86 match ARM with APX, but I believe the returns will be limited.

I appreciate the link though.
Even with perfect prediction, you still have a jump and a join to execute. And any branch prediction miss is a major delay, so even missing 1% of the time can slow things down quite a bit.
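As a rough back-of-the-envelope illustration (made-up but plausible numbers): assume a mispredict costs ~15 cycles of thrown-away work and a correctly predicted branch is essentially free. A branch that misses 1% of the time then adds an average of 0.01 × 15 = 0.15 cycles per execution; in a loop iteration that otherwise does about 5 cycles of real work, that single branch makes the loop roughly 3% slower, and hot code rarely has just one branch.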

The x86 vs ARM compiler difference discussion is actually well past my knowledge. And it is best left for another thread.
 

MS_AT

Senior member
Jul 15, 2024
The concept has been around for decades.
I think you have gone a bit too far with respect to APX. It simply introduces more conditional instructions that operate based on the status flags modified by preceding instructions. The benefit is that you save branch predictor buffer entries; the negative is that you introduce an explicit dependency that cannot be reordered around. In other words, if your condition is very predictable, stick to branches; if your condition is on the more random side, use conditional instructions.
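A rough C sketch of that trade-off (plain C, not APX syntax -- just the general principle compilers already apply with cmov today):

Code:
/* Binary search: the chosen half feeds the NEXT iteration's memory access.
   If the select compiles to a conditional move, each iteration must wait
   for the load and compare to resolve before the next load can start (that
   is the explicit dependency). If it compiles to a branch, the predictor
   guesses and the core speculates ahead -- great when the guesses are right.
   Searching random keys is ~50/50, so the cmov form tends to win there;
   a very predictable condition favors the branch. */
int lower_bound(const int *a, int n, int key)
{
    int lo = 0, hi = n;                 /* search window [lo, hi)        */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < key)               /* may become a branch or a cmov */
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;                          /* first index with a[lo] >= key */
}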
 

dullard

Elite Member
May 21, 2001
I think you have gone a bit too far with respect to APX. It simply introduces more conditional instructions that operate based on the status flags modified by preceding instructions. The benefit is that you save branch predictor buffer entries; the negative is that you introduce an explicit dependency that cannot be reordered around. In other words, if your condition is very predictable, stick to branches; if your condition is on the more random side, use conditional instructions.
That is how I interpret statements like this:
APX also adds to the x86 ISA’s predicated-execution capabilities, which should help compilers eliminate performance-sapping, hard-to-predict branches.
https://www.techinsights.com/blog/apx-biggest-x86-addition-64-bits

and
These enhancements expand the applicability of if-conversion to much larger code regions, cutting down on the number of branches that may incur misprediction penalties.
https://www.intel.com/content/www/u...ical/advanced-performance-extensions-apx.html

So, I explained predicated-execution / if-conversion.
 

Thunder 57

Diamond Member
Aug 19, 2007
The x86 vs ARM compiler difference discussion is actually well past my knowledge. And it is best left for another thread.

Agreed. All I'll say is EPIC certainly didn't end the compiler problem. They ended EPIC, and OoOE lives on.
 

MS_AT

Senior member
Jul 15, 2024
So, I explained predicated-execution / if-conversion.
I do not dispute this, but I just found it a bit too complex in relation to what APX provides in reality. That's all;)

It's an interesting conversation, but hasn't the jump to 16 GPRs and register renaming already given us a lot of those gains? I'd like to see x86 match ARM with APX, but I believe the returns will be limited.
It's hard to say really how willing people will be to recompile. Actually I am looking forward to APX as I need additional GPRs for my... AVX512 code ;) [memory addresses and loop control are held in GPRs in case somebody is wondering].
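For anyone curious, a minimal sketch (made-up kernel; assumes an AVX-512 capable CPU and compiler flags along the lines of -mavx512f): the vector math lives in ZMM registers, but every base pointer, index, and loop bound sits in a GPR, and with a few more streams, strides, and counters in flight, 16 GPRs run out fast.

Code:
#include <immintrin.h>
#include <stddef.h>

/* Made-up example kernel: out[i] = k * a[i] + b[i].
   ZMM registers hold the data; GPRs hold a, b, out, i, and n -- add a few
   more arrays, strides, or counters and the compiler starts spilling GPRs
   to the stack. */
void scale_add(const float *a, const float *b, float *out, size_t n, float k)
{
    __m512 vk = _mm512_set1_ps(k);                  /* broadcast k           */
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {                  /* 16 floats per step    */
        __m512 va = _mm512_loadu_ps(a + i);         /* address a+i uses GPRs */
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(out + i, _mm512_fmadd_ps(vk, va, vb));
    }
    for (; i < n; i++)                              /* scalar tail           */
        out[i] = k * a[i] + b[i];
}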
 

511

Diamond Member
Jul 12, 2024
I do not dispute this, but I just found it a bit too complex in relation to what APX provides in reality. That's all;)


It's hard to say really how willing people will be to recompile. Actually I am looking forward to APX as I need additional GPRs for my... AVX512 code ;) [memory addresses and loop control are held in GPRs in case somebody is wondering].
NVL Buyer spotted 😛
 

Cardyak

Member
Sep 12, 2018
Not really. Arctic Wolf, for example, will likely be the 12-wide x86 core that Jim Keller was working on. 12-wide issue by itself will only bring a few single-digit % gains, so that's just an enabler. Other than that, we don't know anything about it. Ticks like Darkmont we can speculate about much more easily. How much of Skymont could we have guessed from Gracemont and Crestmont? Nothing, really.
The wording "12-wide" is a little ambiguous. Indeed, early rumours indicate that Arctic Wolf will be 12-wide at decode, but that doesn't guarantee it will be 12-wide at rename stage.

If you look at Skymont, it's 9-wide at decode but only 8-wide at rename. This isn't necessarily a waste as it allows the decode to "overfill" and ensure the 8-wide machine is well utilized, but it still limits the overall design to being an 8-wide core.

I strongly suspect Arctic Wolf will be a similar affair, I'd bet good money that it will be 12-wide at decode stage but 10-wide for rename/allocate.
 

511

Diamond Member
Jul 12, 2024
I strongly suspect Arctic Wolf will be a similar affair, I'd bet good money that it will be 12-wide at decode stage but 10-wide for rename/allocate
Both Coyote and Arctic Wolf are 12-wide decode, and both use clustered decode.
Retirement is 16-wide in Skymont IIRC, which is massive IMO
 

LightningZ71

Platinum Member
Mar 10, 2017
It can easily be crazy wide after decode without ballooning the XTOR budget if most of the ways are quite simple.
 

511

Diamond Member
Jul 12, 2024
@MS_AT Intel has thrown everything at the problem with NVL: packaging / extra cache / extra cores / extra PCIe / integrated TB5, and they have fixed the shortcomings of ARL as well
 
  • Haha
Reactions: Thunder 57

DavidC1

Platinum Member
Dec 29, 2023
It can easily be crazy wide after decode without ballooning the XTOR budget if most of the ways are quite simple.
Skymont is ~30% larger than Crestmont iso-process if we exclude the FP increases, meaning the area increase is roughly 1:1 with the performance increase it got, which is very good. Based on their history I'm confident they'll get linear gains again, but it won't happen without innovation, which is what we cannot guess.

Going wide is a waste, especially without better branch prediction, so that's basically the ceiling on how wide they can go before they quickly hit diminishing returns.
A lot? Isn't it just more of the same? Or are you able to name at least 3 distinct features which are not about making something bigger? (Clustered decode was there, it just got bigger; distributed schedulers were there, they got bigger; more execution units, a bigger BTB and reorder buffer.) Actually, from memory, I think only Nanocode stands out as something that is new and not just bigger. Of course, I might be wrong.
Would we have been able to guess they'd have a 16-wide retire, which they said was an attempt to efficiently increase resources? Or that they'd double the ALUs, all simple ones, because it was "cheap to add"? What about having more stores than loads, which is also contrary to established expectations? Those are quite distinct. That's why I focus on the E-core team doing things efficiently, not just expanding without thinking.

The wording "12-wide" is a little ambiguous. Indeed, early rumours indicate that Arctic Wolf will be 12-wide at decode, but that doesn't guarantee it will be 12-wide at rename stage.
If you look at Skymont, it's 9-wide at decode but only 8-wide at rename. This isn't necessarily a waste as it allows the decode to "overfill" and ensure the 8-wide machine is well utilized, but it still limits the overall design to being an 8-wide core.

I strongly suspect Arctic Wolf will be a similar affair, I'd bet good money that it will be 12-wide at decode stage but 10-wide for rename/allocate.
The typical use of "12-wide" refers to the decode side. If they go 10-wide for rename/allocate, it's still an overall similar result, because it's still a substantial 25% increase over the predecessor. Oh, and Gracemont was 5-wide while Crestmont was 6-wide at rename/allocate. We got maybe 3% out of that, and Crestmont had a few more small changes too. In reality, decode is just an enabler, and performance-wise just one out of maybe a dozen high-level features that dictate performance.

We don't even have the full performance picture of the simple Tick+ cores in Panther Lake. There's no way we can guess what's going on in Arctic Wolf.
 

Fjodor2001

Diamond Member
Feb 6, 2010
  • Like
Reactions: lightmanek