Discussion Intel’s Unified Core

poke01 · Jul 20, 2025

Why does the P core exist? Because of its 1T perf. So if the E-core exceeds the P core in absolute 1T perf, there is no need for the P core. That's probably what UC will do.

Joe NYC · Jul 20, 2025

Geddagod said:
Would be surprised if it doesn't have SMT, though that seems to be the case.

In a server environment, not having SMT would be a serious handicap vs. AMD.

In Intel's Lion Cove presentation, Intel said that SMT can be turned on and off, added when needed, implying Intel is not killing off SMT in P-core-only products.

Diamond Rapids is not that far away, so we should get a confirmation on this soon...

DavidC1 · Jul 20, 2025

Mind you, as impressive as Skymont is, the ultimate test is being able to go head to head against ARM, where the X925 is 30% better while fitting in a phone. Apple of course is even better. And this is in 2025, not 2027. Being on par with then-current ARM cores is what I expect they can do. I don't know whether they can reach the peak; it'll be very hard.

If they weren't in the x86 bubble, they would have felt that ten years ago.

Geddagod said:
While the E-core team is supposed to be the team in charge of Unified Core, it would be surprising for the E-cores to essentially be caught up in IPC already, even before that.

Darkmont, with trivial changes, gets 3-4% improvements, which eats up the majority of the ~10% advantage Lion Cove has over Skymont. That's less than a 7% advantage, which is basically the same generation in terms of "IPC".
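As a quick check of the arithmetic above (a sketch that assumes the ~10% Lion Cove lead over Skymont and the 3-4% Darkmont gain quoted in this post; `remaining_lead` is a hypothetical helper), the leftover advantage is a ratio of the two speedups rather than a plain subtraction:

```python
def remaining_lead(lead: float, catch_up: float) -> float:
    """Leader's remaining advantage after a follower improves.

    lead:     leader's advantage over a shared baseline (0.10 = 10%)
    catch_up: follower's gain over that same baseline (0.03 = 3%)
    """
    # Speedups compound, so divide the ratios instead of subtracting percentages.
    return (1.0 + lead) / (1.0 + catch_up) - 1.0


# Assumed inputs from the post: LNC ~10% over Skymont, Darkmont +3-4%.
for darkmont_gain in (0.03, 0.04):
    gap = remaining_lead(0.10, darkmont_gain)
    print(f"Darkmont +{darkmont_gain:.0%}: LNC lead is now {gap:.1%}")
# Darkmont +3%: LNC lead is now 6.8%
# Darkmont +4%: LNC lead is now 5.8%
```

The result, roughly 5.8-6.8%, is close to the simple 10 - 4 = 6 estimate.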

With such a massive core size difference, just doubling the shared caches might make up for the rest by itself, never mind a 4-cluster, 12-wide one...

To go from that to 10% is literally a fail.

Here are two scenarios I can see:
- The current company-wide failures at Intel eventually result in even the E-core team and other good divisions being terrible.
- The E-cores do really well, and then people credit Tan for saving Intel.

poke01 · Jul 20, 2025

DavidC1 said:
Mind you, as impressive as Skymont is, the ultimate test is being able to go head to head against ARM, where the X925 is 30% better while fitting in a phone. Apple of course is even better. And this is in 2025, not 2027. Being on par with then-current ARM cores is what I expect they can do. I don't know whether they can reach the peak; it'll be very hard.

I wouldn't be surprised if Qualcomm overtook Apple this year, now that they have NUVIA and have been working on V3 cores for a while.

Geddagod · Jul 20, 2025

Joe NYC said:
In a server environment, not having SMT would be a serious handicap vs. AMD.

Could be. We will see how it pans out.

Joe NYC said:
In Intel's Lion Cove presentation, Intel said that SMT can be turned on and off, added when needed, implying Intel is not killing off SMT in P-core-only products.

I don't remember the exact words, but yes, something to that effect, which is why I was surprised too.

Joe NYC said:
Diamond Rapids is not that far away, so we should get a confirmation on this soon...

Hopefully

DavidC1 said:
Darkmont, with trivial changes, gets 3-4% improvements, which eats up the majority of the ~10% advantage Lion Cove has over Skymont. That's less than a 7% advantage, which is basically the same generation in terms of "IPC".

If Darkmont is a 3-4% improvement over Skymont, wouldn't that by definition not be eating up the majority of the 10% advantage LNC has over Skymont? Lol

DavidC1 said:
To go from that to 10% is literally a fail.

Unless they never were planning for this core to be a large int IPC improvement in the first place...

poke01 said:
I wouldn't be surprised if Qualcomm overtook Apple this year, now that they have NUVIA and have been working on V3 cores for a while.

Funnily enough, I have always thought that, at least this generation, Qualcomm might have done just as well going with ARM's "vanilla" cores rather than their own custom Oryon cores.

511 · Jul 20, 2025

Joe NYC said:
I am a little confused here: Isn't Xeon 7 P Core called Diamond Rapids (DMR)? Are you saying Diamond Rapids will not have HT?

Yes, it doesn't have HT, as PNC doesn't have HT.

511 · Jul 20, 2025

Geddagod said:
If this is the case, the IO die, iGPU die, and Wildcat Lake can all be shifted over to different nodes or have their plans changed to accommodate 8+16 dies on 18A-P.
Intel is wasting a bunch of money, and is also getting a bunch of bad investor press, by going external. So they have the money to make large payments to TSMC to use their N2 node, but not enough to expand capacity for 18A? You aren't even building whole new fabs; all you are doing is expanding capacity.
The capacity reason makes zero sense. Even Intel isn't claiming this is the case; it's performance and timing, apparently.
And the timing reason is BS too...

If only you knew the truth, but I am done arguing over it. We will see when David Huang compares Cougar Cove and Lion Cove.

Geddagod said:
But if the SoC die is different like it is rn, they lose a bunch of battery life, so Intel would have to figure something out for that.
But my main point there was that the 8+16 N2 die looks like it is going to get a lot of use lol.

The SoC is shared across all mobile and desktop NVL SKUs.

DavidC1 · Jul 20, 2025

Geddagod said:
If Darkmont is a 3-4% improvement over Skymont, wouldn't that by definition not be eating up the majority of the 10% advantage LNC has over Skymont? Lol

Why would anyone think a 5-6% advantage is significant in any terms? It's already quite small at only 10%. It's so close that in Arrow Lake just overclocking the E-cores makes 1+16 faster than 8P or 8P+16E in games.

While you are arguing they can't do this, in some cases they already have.

Geddagod said:
Unless they never were planning for this core to be a large int IPC improvement in the first place...

And why would that be the case, if they want to replace the predecessor? Advancing general-purpose scalar integer performance has been the dream of CPU engineers for 40 years.

Also, getting 256-bit and AVX support by itself will result in minimal gains in the majority of applications. It would also fly in the face of their doubling of the FP units so that it benefits everybody.

The fact that there are still doubts about the E-core being vastly better suggests that they can pull more rabbits out of their hat. Pentium M and its successors were in the same position literally until the Core 2 reveal.

511 · Jul 20, 2025

DavidC1 said:
And why would that be the case, if they want to replace the predecessor? Advancing general-purpose scalar integer performance has been the dream of CPU engineers for 40 years.

Well, AVX-512 took quite a lot out of their development time, so the Int improvement would be smaller than the FP improvement.

DavidC1 · Jul 20, 2025

511 said:
Well, AVX-512 took quite a lot out of their development time, so the Int improvement would be smaller than the FP improvement.

I disagree. They have made bigger changes for many, many years. And like I said, AVX-512 can be made quite small, especially if it's using 256-bit width.

If there's a team that can do it, it's them.

511 · Jul 20, 2025

DavidC1 said:
I disagree. They have made bigger changes for many, many years. And like I said, AVX-512 can be made quite small, especially if it's using 256-bit width.

If there's a team that can do it, it's them.

That wouldn't change the fact that the Int improvement will be smaller than FP, considering they are going from 4x128b units to 4x256b units.

Geddagod · Jul 20, 2025

511 said:
If only you knew the truth

That N2 is likely a full node better than 18A and 18A-P?

511 said:
We will see when David Huang compares Cougar Cove and Lion Cove.

The current LNC curve is scuffed. Huang even commented on this and tried to figure out why when he was looking at LNC in LNL vs LNC in ARL, in workloads that primarily fit in the core-private caches.

DavidC1 said:
Why would anyone think a 5-6% advantage is significant in any terms? It's already quite small at only 10%.

It's interesting to see the gap be significantly larger in other common client workloads, though, GB6 specifically.
But anyway, you think that Arctic Wolf will have ~10% higher IPC than PTC, keeping with the 30% improvement trend? Or even with a 20% improvement trend, you think that Arctic Wolf will have outright higher IPC than a P-core?
I mean, if this is the case, it would undeniably be funny. Unless @Doug S has another objection about my sense of humor lol.

DavidC1 said:
While you are arguing they can't do this, in some cases they already have.

I have never said they can't do this, I've always said I find it hard to believe.
And also, they have not expanded past 128-bit FPU width in the past several generations.

DavidC1 said:
And why would that be the case, if they want to replace the predecessor? Advancing general-purpose scalar integer performance has been the dream of CPU engineers for 40 years.

Because a good bit of the engineering and area is going to go toward increasing the vector width.
And they presumably still need to maintain that ~1:4 P-core to E-core cluster ratio.

DavidC1 said:
Also, getting 256-bit and AVX support by itself will result in minimal gains in the majority of applications

And yet that's apparently what's going to be happening.

DavidC1 said:
The fact that there are still doubts about the E-core being vastly better suggests that they can pull more rabbits out of their hat.

What?
Obviously they are not, considering they are now apparently the ones in charge of Unified Core, despite the large political hurdles against them.

DavidC1 said:
And like I said, AVX-512 can be made quite small, especially if it's using 256-bit width.

And yet AMD just changing how AVX-512 is implemented in Zen 5 doubled their FPU area.

DavidC1 said:
If there's a team that can do it, it's them.

Holy glaze lol

DavidC1 · Jul 20, 2025

511 said:
That wouldn't change the fact that the Int improvement will be smaller than FP, considering they are going from 4x128b units to 4x256b units.

They literally doubled FP in Skymont and got an extra 25% on top of the already impressive 30% gain. The great thing about aiming for scalar integer is that you gain in FP as well, because you improve everything the instructions pass through.

So without the doubling we would have gotten 30/30% (int/FP), rather than 30/65%. This means 2x FP = 1.25x gain.
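The 1.25x figure can be sanity-checked by dividing the FP gain with the doubling by the FP gain without it. This is a sketch that assumes, as the post does, that FP would otherwise have tracked the ~30% integer gain; the numbers are the post's, not official specs:

```python
# Figures from the post (assumed, not official specs):
int_gain = 0.30               # scalar integer IPC gain
fp_gain_doubled = 0.65        # FP IPC gain with the doubled FP units
fp_gain_baseline = int_gain   # assumption: FP would have tracked integer without doubling

# Extra FP factor attributable to doubling the FP units.
fp_doubling_factor = (1 + fp_gain_doubled) / (1 + fp_gain_baseline)
print(f"2x FP units -> {fp_doubling_factor:.2f}x extra FP")
# 2x FP units -> 1.27x extra FP
```

The ratio comes out to about 1.27x, which the post rounds to 1.25x.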

Going from 128-bit to 256-bit is a more nebulous gain, because even nowadays AVX2 isn't universally supported, or isn't in enough of the codebase to really matter. Compare Sandy Bridge (AVX) and Haswell (AVX2) against their predecessors in modern applications: most gains are in line with the general-purpose gain, which is 10-20%.

DavidC1 · Jul 20, 2025

Geddagod said:
But anyway, you think that Arctic Wolf will have ~10% higher IPC than PTC, keeping with the 30% improvement trend? Or even with a 20% improvement trend, you think that Arctic Wolf will have outright higher IPC than a P-core?
I mean, if this is the case, it would undeniably be funny. Unless @Doug S has another objection about my sense of humor lol.

Yeah, that's exactly what happened when Pentium M, Pentium 4M, and Pentium 4 Mobile coexisted: confusingly small gains from going to the power-hungry P4 parts. Of course, the P-cores will clock high enough to negate the difference and more.

Geddagod said:
Because a good bit of the engineering and area is going to go toward increasing the vector width.
And they presumably still need to maintain that ~1:4 P-core to E-core cluster ratio.

They won't. It's already 1:3. It's going to 1:2 in NVL for a reason. The E-core does need to grow. I'm expecting 30% growth due to the overall uarch and 25% on top of that, for a total of 60-70%.
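The two growth estimates compound multiplicatively rather than adding, which is how 30% and 25% land inside the 60-70% range. A minimal sketch using only the post's own estimates (both inputs are the poster's guesses, not known figures):

```python
uarch_growth = 0.30  # area growth from overall uarch changes (post's estimate)
extra_growth = 0.25  # further growth "on top of that" (post's estimate)

compounded = (1 + uarch_growth) * (1 + extra_growth) - 1
additive = uarch_growth + extra_growth
print(f"compounded: {compounded:.1%}, additive: {additive:.1%}")
# compounded: 62.5%, additive: 55.0%
```

Simply adding the two would give 55%, below the stated range; compounding gives 62.5%.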

Geddagod said:
And yet that's apparently what's going to be happening.

Not only.

Geddagod said:
What?
Obviously they are not, considering they are now apparently the ones in charge of Unified Core, despite the large political hurdles against them.

You think Arctic Wolf is also going to have small changes.

Geddagod said:
And yet AMD just changing how AVX-512 is implemented in Zen 5 doubled their FPU area.

AMD is in the big picture following Intel, just with better execution.

Geddagod said:
Holy glaze lol

There's plenty of historical evidence, and this is all you could say?

511 · Jul 20, 2025

Geddagod said:
The current LNC curve is scuffed. Huang even commented on this and tried to figure out why when he was looking at LNC in LNL vs LNC in ARL, in workloads that primarily fit in the core-private caches.

Not for Lunar; that was perfectly fine.

Geddagod said:
That N2 is likely a full node better than 18A and 18A-P?

18A and 18A-P have an 8% difference in PPW. Also, I am done with this; get the simulation details for 18A and N3B yourself lol.

poke01 · Jul 20, 2025

We should just wait for Panther Lake to judge 18A.

Geddagod · Jul 20, 2025

DavidC1 said:
Yeah, that's exactly what happened when Pentium M, Pentium 4M, and Pentium 4 Mobile coexisted.

We will see. You must at least understand why, if that is the case, it would be surprising.

DavidC1 said:
They won't. It's already 1:3.

The ratio has been getting worse, but you can still roughly equate a P-core to an E-core cluster, which is where the 1:4 ratio is coming from.

DavidC1 said:
It's going to 1:2 in NVL for a reason

Where has this been rumored? I've not heard of it, tbh.

DavidC1 said:
I'm expecting 30% growth due to the overall uarch and 25% on top of that, for a total of 60-70%.

If the area increase is that drastic, sure, then Arctic Wolf can have such a large IPC improvement.
Never heard of anyone saying it would be that large though.

DavidC1 said:
You think Arctic Wolf is also going to have small changes.

For a lack of trying.

DavidC1 said:
AMD is in the big picture following Intel, just with better execution.

That's not the point; the point is that getting to 256-bit width to support AVX-512 will be a substantial area cost.

DavidC1 said:
There's plenty of historical evidence, and this is all you could say?

I think people overhype the E-cores too much.
It's great in area. But in power and perf, arguably the two more important categories, it's OK-ish.
They could be sacrificing a bunch of power and perf to chase after area, but that's the way things are.

511 said:
Not for Lunar; that was perfectly fine.

Sure, that's fine.

511 said:
18A and 18A-P have an 8% difference in PPW

A massive difference....

511 said:
Also, I am done with this; get the simulation details for 18A and N3B yourself lol.

Intel is also done with this, hence why they are using N2 over 18A-P for NVL-S lol. Hopefully the power-on is announced in the earnings call this week.

poke01 said:
We should just wait for Panther Lake to judge 18A.

The best comparison should be NVL 4+8 tiles vs NVL 8+16 tiles.

511 · Jul 20, 2025

Geddagod said:
The best comparison should be NVL 4+8 tiles vs NVL 8+16 tiles.

We can simply do a core comparison in PPA, as the cores are the same and so is the SoC.

DavidC1 · Jul 20, 2025

Geddagod said:
We will see. You must at least understand why, if that is the case, it would be surprising.

Confusing decisions happen because these are megacorporations, while we treat them as essentially a hive mind.

Geddagod said:
The ratio has been getting worse, but you can still roughly equate a P-core to an E-core cluster, which is where the 1:4 ratio is coming from.

Where has this been rumored? I've not heard of it, tbh.

It's logic. 8+16 means 2:1 area.

Geddagod said:
If the area increase is that drastic, sure, then Arctic Wolf can have such a large IPC improvement.
Never heard of anyone saying it would be that large though.

That's not large for a potential 1.3x in scalar integer (meaning everything) plus gains in vector. The P-cores got 1.15-1.2x scalar gains with 1.5x the core size.

Geddagod said:
That's not the point; the point is that getting to 256-bit width to support AVX-512 will be a substantial area cost.

Of course it is. It's still going to end up substantially smaller (2:1).

Geddagod · Jul 21, 2025

DavidC1 said:
It's logic. 8+16 means 2:1 area.

It's 8+16 rn, but one P-core is still being swapped out for a 4-core E-core cluster?
What's the obvious logic that points to Arctic Wolf dramatically increasing its area to the point where one P-core is swapped out for a 2-core E-core cluster, or two P-cores are swapped out for a 4-core E-core cluster?

DavidC1 said:
That's not large for a potential 1.3x in scalar integer (meaning everything) plus gains in vector. The P-cores got 1.15-1.2x scalar gains with 1.5x the core size.

For the perf improvement, sure, it won't be large...
But it would screw up the area ratio that Intel has been cultivating since, IIRC, Tremont?

DavidC1 · Jul 21, 2025

Geddagod said:
For the perf improvement, sure, it won't be large...
But it would screw up the area ratio that Intel has been cultivating since, IIRC, Tremont?

No, the E core team unexpectedly caught up, just as Pentium M was originally designed to address Transmeta's lineup, but eventually came to be their bread and butter.

Then, when they really struggled with 10nm while facing AMD's Ryzen, they needed a quick solution. Hence the janky hybrid implementation. If you hear about what was going on in their minds at that time, some high-level managers denied 10nm would be delayed further, so by the time they responded, it was basically a panic decision. That's why the server team didn't have a successor for such a long time. The server manager at Intel at that time said she knew that 10nm might not be ready, but there were denials all over.

All the weirdness, such as having AVX-512 on Alder Lake but disabling it later, is not due to carefully planned decisions made years in advance.

We know the P-core vs E-core comparison isn't just about E-cores being straight-up better, but also about P-cores sucking against everyone else, such as Apple's M4 (and yes, it's smaller than Lion Cove), and even against their direct x86 competition, Zen 4 and Zen 5. Lion Cove is similar in perf to Zen 5 while being larger, even though Zen 5 is on a less dense process.

Geddagod · Jul 21, 2025

DavidC1 said:
such as Apple's M4 (and yes, it's smaller than Lion Cove)

Only when you count the fact that LNC has large core private caches.
What really saves Apple a bunch of area is their caching hierarchy.
An LNC core without the L1.5 + L2 SRAM arrays and the associated logic for handling them is similar in area to the M3.
Meanwhile Xiaomi's X925 implementation with the core private caches is still smaller than the M4 P-core.
The M4 P-core is massive.

DavidC1 said:
No, the E core team unexpectedly caught up, just as Pentium M was originally designed to address Transmeta's lineup, but eventually came to be their bread and butter.

But the E-cores aren't on NVL for ST performance though.
Why would Intel want the E-cores to be so much larger on NVL, when the entire point of them being on there is nT perf/watt?

DavidC1 said:
Lion Cove is similar in perf to Zen 5 while being larger even though Zen 5 is on a less dense process.

LNC is the worst P-core from any of the major vendors, I agree, but LNC is only that large because it is stuffed with massive core-private caches.

The problem is that you don't necessarily need caches that big to have Zen 5-like perf. Willow Cove, for example, had a 2.5x increase in L2 cache capacity over SNC, yet IPC didn't change much and actually regressed in TGL vs ICL. RPC increased L2 cache by 60%, and RPL IPC is only low single digits better than GLC IPC. From Chips and Cheese's article armchair-quarterbacking GLC, we see in simulations that using AMD's L2 cache, which cuts L2 capacity nearly in third vs GLC, results in IPC losses of <3%.
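The 2.5x and 60% L2 figures can be checked against per-core L2 sizes of the client parts. This sketch assumes 512 KiB for Sunny Cove (Ice Lake), 1.25 MiB for Willow Cove (Tiger Lake) and Golden Cove (Alder Lake), and 2 MiB for Raptor Cove (Raptor Lake); `L2_KIB` and `l2_ratio` are hypothetical names for illustration:

```python
# Per-core L2 capacity in KiB for client parts (assumed values, see lead-in).
L2_KIB = {"SNC": 512, "WLC": 1280, "GLC": 1280, "RPC": 2048}


def l2_ratio(new: str, old: str) -> float:
    """Capacity of the newer core's L2 relative to the older core's."""
    return L2_KIB[new] / L2_KIB[old]


print(f"WLC vs SNC: {l2_ratio('WLC', 'SNC'):.2f}x")  # 2.50x
print(f"RPC vs GLC: {l2_ratio('RPC', 'GLC'):.2f}x")  # 1.60x, i.e. +60%
```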

Intel's huge core-private caches seem to be there for energy efficiency and for isolating the cores from the terrible uncore, both in server and in client. From a perf perspective, they could prob reduce the caches to a large degree and still retain most of the perf, if area were that much of a concern for them. Or if their uncore were just better.

Looking at just the core logic + L1, Zen 5 doesn't seem to hold on to the area lead AMD used to have. I think LNC dramatically improved Intel's area competitiveness. Though this might also just be because LNC cut away AVX-512...

Doug S · Jul 21, 2025

poke01 said:
I wouldn't be surprised if Qualcomm overtook Apple this year, now that they have NUVIA and have been working on V3 cores for a while.

Sounds like they will be binning on frequency, which should be worth at least 10% for the top bin, so I agree Qualcomm is likely to beat Apple on raw performance. Performance per watt is likely a different story, though.

Kepler_L2 · Jul 21, 2025

Geddagod said:
Where has this been rumored? I've not heard of it, tbh.

Just need someone to "ENHANCE!" this image and we can figure it out https://pbs.twimg.com/media/GvOsGkYWoAATRMq?format=jpg&name=small

511 · Jul 21, 2025

Kepler_L2 said:
Just need someone to "ENHANCE!" this image and we can figure it out https://pbs.twimg.com/media/GvOsGkYWoAATRMq?format=jpg&name=small

The image is too blurred for even AI to enhance without generating garbage.
