Discussion Intel’s Unified Core: There is hope


poke01

Diamond Member
Mar 8, 2022
4,309
5,633
106
Why does the P-core exist? Because of its 1T perf. So if the E-core exceeds the P-core in absolute 1T, there is no need for the P-core. That's probably what UC will do.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,689
5,229
136
Would be surprised if it doesn't have SMT, though that seems to be the case.

In server environment, not having SMT would be a serious handicap vs. AMD.

In Intel's Lion Cove presentation, Intel said that SMT can be turned on and off, added when needed. Implying Intel is not killing off SMT in P-Core only products.

Diamond Rapids is not that far away, so we should get a confirmation on this soon...
 

DavidC1

Golden Member
Dec 29, 2023
1,898
3,043
96
Mind you as impressive as Skymont is, the ultimate test is being able to go head to head against ARM, which is 30% better for X925, while fitting in a phone. Apple of course is even better. And this is in 2025, not 2027. Being par with then-current ARM cores is what I expect they can do. Don't know whether they can be peak, it'll be very hard.

If they weren't in the x86 bubble, they would have felt that ten years ago.
While the E-core team is supposed to be the team in charge of unified core, it would be surprising for the E-cores to essentially already be caught up in IPC even before that.
Darkmont with trivial changes gets 3-4% improvements, which eats up majority of the ~10% advantage it has over Skymont. That's less than 7% advantage, which is basically same generation in terms of "IPC".

With such a massive core size difference, just doubling the shared caches might make up for the rest by itself, never mind a 4 cluster, 12-wide one...

To go from that to 10% is literally a fail.

Here are two scenarios I can see:
-The current Intel company wide failures eventually result in even the E core and other good divisions being terrible.
-The E cores do really well, then people would credit Tan for saving Intel.
 

poke01

Diamond Member
Mar 8, 2022
4,309
5,633
106
Mind you as impressive as Skymont is, the ultimate test is being able to go head to head against ARM, which is 30% better for X925, while fitting in a phone. Apple of course is even better. And this is in 2025, not 2027. Being par with then-current ARM cores is what I expect they can do. Don't know whether they can be peak, it'll be very hard.
I wouldn't be surprised if Qualcomm overtook Apple this year, now that they've got NUVIA and have been working on V3 cores for a while.
 

Geddagod

Golden Member
Dec 28, 2021
1,543
1,630
106
In server environment, not having SMT would be a serious handicap vs. AMD.
Could be. We will see how it pans out.
In Intel's Lion Cove presentation, Intel said that SMT can be turned on and off, added when needed. Implying Intel is not killing off SMT in P-Core only products.
I don't remember the exact words, but yes something similar to that effect, which is why I was surprised too.
Diamond Rapids is not that far away, so we should get a confirmation on this soon...
Hopefully
Darkmont with trivial changes gets 3-4% improvements, which eats up majority of the ~10% advantage it has over Skymont. That's less than 7% advantage, which is basically same generation in terms of "IPC".
If Darkmont is a 3-4% improvement over Skymont, wouldn't that by definition not be eating up the majority of the 10% advantage LNC has over Skymont? Lol
To go from that to 10% is literally a fail.
Unless they never were planning for this core to be a large int IPC improvement in the first place...
I wouldn't be surprised if Qualcomm overtook Apple this year, now that they've got NUVIA and have been working on V3 cores for a while.
Funnily enough I have always thought that, at least this generation, Qualcomm might have been doing just as fine going with ARM's "vanilla" cores rather than their own custom Oryon cores.
 

511

Diamond Member
Jul 12, 2024
4,654
4,244
106
If this is the case, the IO die, iGPU die, and Wildcat Lake can all be shifted over to different nodes or have their plans changed to accommodate 8+16 dies on 18A-P.
Intel is wasting a bunch of money, and is also getting a bunch of bad investor press, by going external. So they have the money to do large payments to TSMC to use their N2 node, but not enough to expand capacity for 18A? You aren't even building whole new fabs, all you are doing is expanding capacity.
Capacity reason makes 0 sense. Even Intel isn't claiming this is the case- it's performance and timing apparently.
And the timing reason is BS too...
If only you knew the truth, but I am done arguing over it. We will see when David Huang compares Cougar Cove and Lion Cove.
But if the SOC die is different like it is rn, they lose a bunch of battery life, so Intel would have to figure something out for that.
But my main point there was that the 8+16 N2 die looks like it is going to get a lot of use lol.
The SoC is shared in NVL across all Mobile+Desktop SKUs.
 

DavidC1

Golden Member
Dec 29, 2023
1,898
3,043
96
If Darkmont is a 3-4% improvement over Skymont, wouldn't that by definition not be eating up the majority of the 10% advantage LNC has over Skymont? Lol
Why would anyone think 5-6% advantage is significant in any terms? It's already quite small at only 10%. It's so close that in Arrow Lake just overclocking the E cores makes 1+16 faster than 8P or 8P+16E in games.
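
For what it's worth, the 5-6% and "less than 7%" figures fall out of simple ratio math. Here is a quick Python sketch, assuming the ~10% Lion Cove lead over Skymont and the 3-4% Darkmont gain quoted in this thread (the function name is made up for illustration, and the inputs are forum estimates, not measurements):

```python
def remaining_gap(lead_pct: float, catch_up_pct: float) -> float:
    """Relative lead (in %) left after the trailing core improves by catch_up_pct."""
    return ((1 + lead_pct / 100) / (1 + catch_up_pct / 100) - 1) * 100

# Assumed inputs from the thread: LNC leads Skymont by ~10%,
# and Darkmont improves on Skymont by 3-4%.
for gain in (3, 4):
    gap = remaining_gap(10, gain)
    print(f"Darkmont +{gain}% -> Lion Cove lead of about {gap:.1f}%")
```

With those inputs the remaining lead lands between roughly 5.8% and 6.8%, which is where both numbers in the discussion come from.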

While you are arguing they can't do this, in some cases they already have.
Unless they never were planning for this core to be a large int IPC improvement in the first place...
And why would that be the case, if they want to replace the predecessor? Advancing general purpose scalar integer performance has been dreams and aspirations for CPU engineers for 40 years.

Also getting 256-bit and AVX support just by itself will result in minimal gains in the majority of applications. This would also fly in the face of them doubling the FP units so it benefits everybody.

The fact that there are still doubts about E core being vastly better suggests that they can pull more rabbits out of their hat. Pentium M and its successors were in the same position literally until Core 2 reveal.
 

511

Diamond Member
Jul 12, 2024
4,654
4,244
106
And why would that be the case, if they want to replace the predecessor? Advancing general purpose scalar integer performance has been dreams and aspirations for CPU engineers for 40 years.
Well AVX-512 took quite a lot out of their development time so the Int performance would be smaller than FP improvement
 

DavidC1

Golden Member
Dec 29, 2023
1,898
3,043
96
Well AVX-512 took quite a lot out of their development time so the Int performance would be smaller than FP improvement
I disagree. They made bigger changes for many many years. And like I said AVX-512 can be made quite small, especially if it's using 256-bit width.

If there's a team that can do it, it's them.
 

511

Diamond Member
Jul 12, 2024
4,654
4,244
106
I disagree. They made bigger changes for many many years. And like I said AVX-512 can be made quite small, especially if it's using 256-bit width.

If there's a team that can do it, it's them.
That wouldn't change the fact that Int improvement will be less than FP considering they are going from 4*128b units to 4*256b units
 

Geddagod

Golden Member
Dec 28, 2021
1,543
1,630
106
If only you knew the truth
That N2 is likely a full node better than 18A and 18A-P?
We will see when David Huang compares Cougar Cove and Lion Cove.
The current LNC curve is scuffed. Huang even comments on such and tried to figure out why when he was looking at LNC in LNL vs LNC in ARL and workloads that primarily fit in the core private caches.
Why would anyone think 5-6% advantage is significant in any terms? It's already quite small at only 10%.
It's interesting to see the gap be significantly high in other common client workloads though. GB6 specifically.
But anyway, you think that Arctic Wolf will have ~10% higher IPC than PTC, keeping with the 30% improvement trend? Or even with a 20% improvement trend, you think that Arctic Wolf will have outright higher IPC than a P-core?
I mean even if this is the case, would undeniably be funny. Unless @Doug S has another objection about my sense of humor lol.
While you are arguing they can't do this, in some cases they already have.
I have never said they can't do this, I've always said I find it hard to believe.
And also, they have not expanded past 128 bit FPU width in the past several generations.
And why would that be the case, if they want to replace the predecessor? Advancing general purpose scalar integer performance has been dreams and aspirations for CPU engineers for 40 years.
Because a good bit of the engineering and area is going to be going to increasing the vector width.
And they presumably still need to maintain that ~1:4 P-core to E-core cluster ratio.
Also getting 256-bit and AVX support just by itself will result in minimal gains in the majority of applications
And yet that's apparently what's going to be happening.
The fact that there are still doubts about E core being vastly better suggests that they can pull more rabbits out of their hat.
What?
Obviously they are not, considering they are now apparently the ones in charge of unified core, despite the large political hurdles against them.
And like I said AVX-512 can be made quite small, especially if it's using 256-bit width.
And yet AMD even just changing how AVX-512 is implemented in Zen 5 doubled their FPU area.
If there's a team that can do it, it's them.
Holy glaze lol
 

DavidC1

Golden Member
Dec 29, 2023
1,898
3,043
96
That wouldn't change the fact that Int improvement will be less than FP considering they are going from 4*128b units to 4*256b units
They literally doubled FP in Skymont and got an extra 25% on top of the already impressive 30% gain. The great thing about aiming for scalar integer is that you gain that in FP as well, because you improve everything the instructions are passing through.

So without the doubling we would have got 30/30, rather than 30/65%. This means 2x FP = 1.25x gain
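
A rough check of that arithmetic in Python, assuming the two gains compound multiplicatively and using the 30% and 65% figures from this post (the helper name is hypothetical):

```python
def extra_factor(total_gain_pct: float, base_gain_pct: float) -> float:
    """Factor left over once the baseline gain is divided out of the total gain."""
    return (1 + total_gain_pct / 100) / (1 + base_gain_pct / 100)

# Assumed inputs: ~30% integer gain and ~65% FP gain for Skymont.
fp_factor = extra_factor(65, 30)
print(f"FP doubling alone is worth about {fp_factor:.2f}x")

# Going the other way: 30% compounded with an extra 25% gives about 62.5%.
combined_pct = (1.30 * 1.25 - 1) * 100
print(f"30% then 25% compounds to about {combined_pct:.1f}%")
```

So "2x FP = 1.25x" is a slight round-down of the ratio implied by 30/65, and 30% plus 25% on top only reaches 65% because the gains multiply rather than add.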

Going from 128-bit to 256-bit is more nebulous of a gain, because even nowadays AVX2 isn't universally supported, or not enough of a codebase to really matter. Go compare Sandy Bridge with AVX, Haswell with AVX2 and see how they perform in modern applications against their predecessors. Most gains are in line with the general purpose gain, which is 10-20%.
 

DavidC1

Golden Member
Dec 29, 2023
1,898
3,043
96
But anyway, you think that Arctic Wolf will have ~10% higher IPC than PTC, keeping with the 30% improvement trend? Or even with a 20% improvement trend, you think that Arctic Wolf will have outright higher IPC than a P-core?
I mean even if this is the case, would undeniably be funny. Unless @Doug S has another objection about my sense of humor lol.
Yea, that's exactly what happened when Pentium M and Pentium 4M and Pentium 4 Mobile coexisted. Confusingly small gains by going to power hungry P4 parts. Of course the P cores will clock high enough to negate the difference and more.
Because a good bit of the engineering and area is going to be going to increasing the vector width.
And they presumably still need to maintain that ~1:4 P-core to E-core cluster ratio.
They won't. It's already 1:3. It's going to 1:2 in NVL for a reason. E core does need to grow. I'm expecting 30% growth due to overall uarch and 25% on top of that for total of 60-70%.
And yet that's apparently what's going to be happening.
Not only.
What?
Obviously they are not, considering they are now apparently the ones in charge of unified core, despite the large political hurdles against them.
You think Arctic Wolf is also going to have small changes.
And yet AMD even just changing how AVX-512 is implemented in Zen 5 doubled their FPU area.
AMD is in the big picture following Intel, just with better execution.
Holy glaze lol
Enough past historical evidence and this is all you could say?
 

511

Diamond Member
Jul 12, 2024
4,654
4,244
106
The current LNC curve is scuffed. Huang even comments on such and tried to figure out why when he was looking at LNC in LNL vs LNC in ARL and workloads that primarily fit in the core private caches.
Not for Lunar that was perfectly fine
That N2 is likely a full node better than 18A and 18A-P?
18A and 18A-P have an 8% difference in PPW. Also, I am done with this; get the simulation details yourself for 18A and N3B lol.
 

Geddagod

Golden Member
Dec 28, 2021
1,543
1,630
106
Yea, that's exactly what happened when Pentium M and Pentium 4M and Pentium 4 Mobile coexisted.
We will see. You must understand why at the very least, if that is the case, it would be surprising.
They won't. It's already 1:3.
The ratio has been getting worse, but you can still roughly equate a P-core to an E-core cluster, which is where the 1:4 ratio is coming from.
It's going to 1:2 in NVL for a reason
Where has this been rumored? I've not heard of it tbh.
I'm expecting 30% growth due to overall uarch and 25% on top of that for total of 60-70%.
If the area increase is that drastic, sure, then Arctic Wolf can have such a large IPC improvement.
Never heard of anyone saying it would be that large though.
You think Arctic Wolf is also going to have small changes.
For a lack of trying.
AMD is in the big picture following Intel, just with better execution.
That's not the point, the point is that getting to 256 bit width to support AVX-512 will be a substantial area cost.
Enough past historical evidence and this is all you could say?
I think people overhype the E-cores too much.
It's great in area. But in power and perf, arguably the more important two categories, it's okish.
They could be sacrificing a bunch of power and perf to chase after area, but that's the way things are.
Not for Lunar that was perfectly fine
Sure, that's fine.
18A and 18A-P have an 8% difference in PPW
A massive difference....
Also, I am done with this; get the simulation details yourself for 18A and N3B lol.
Intel is also done with this. Hence why they are using N2 over 18A-P for NVL-S lol. Power on announced in the earnings call this week hopefully.
We should just wait for Panther lake to judge 18A.
The best comparison should be NVL 4+8 tiles vs NVL 8+16 tiles.
 

DavidC1

Golden Member
Dec 29, 2023
1,898
3,043
96
We will see. You must understand why at the very least, if that is the case, it would be surprising.
Confusing decisions happen because these are mega corporations, while we treat them as essentially a hive mind.
The ratio has been getting worse, but you can still roughly equate a P-core to an E-core cluster, which is where the 1:4 ratio is coming from.

Where has this been rumored? I've not heard of it tbh.
It's logic. 8+16 means 2:1 area.
If the area increase is that drastic, sure, then Arctic Wolf can have such a large IPC improvement.
Never heard of anyone saying it would be that large though.
That's not large for potential 1.3x in scalar integer(meaning everything) plus gains in vector. P cores were 1.15-1.2x in scalar gains with 1.5x core size.
That's not the point, the point is that getting to 256 bit width to support AVX-512 will be a substantial area cost.
Of course it is. It's still going to end up substantially small(2:1).
 

Geddagod

Golden Member
Dec 28, 2021
1,543
1,630
106
It's logic. 8+16 means 2:1 area.
It's 8+16 rn, but one P-core is still being swapped out by a 4 core E-core cluster?
What's the obvious logic that points to Arctic Wolf dramatically increasing its area to the point where one P-core is going to be swapped out by a 2 core E-core cluster, or two P-cores will be swapped out by a 4 core E-core cluster?
That's not large for potential 1.3x in scalar integer(meaning everything) plus gains in vector. P cores were 1.15-1.2x in scalar gains with 1.5x core size.
For the perf improvement, sure, it won't be large...
But it would screw up the area ratio that Intel has been cultivating since, IIRC, Tremont?
 

DavidC1

Golden Member
Dec 29, 2023
1,898
3,043
96
For the perf improvement, sure, it won't be large...
But it would screw up the area ratio that Intel has been cultivating since, IIRC, Tremont?
No, the E core team unexpectedly caught up, just as Pentium M was originally designed to address Transmeta's lineup, but eventually came to be their bread and butter.

Then when they really struggled with 10nm, with AMD's Ryzen, they needed a quick solution. Hence the janky hybrid implementation. If you hear about what was going on in their minds at that time, some high level managers denied 10nm would be delayed further, so by the time they responded, it was basically a panic decision. That's why the server team didn't have a successor for such a long time. The server manager at Intel at that time said she knew that 10nm might not be ready, but there were denials all over.

All the weirdness, such as having AVX512 on Alder but disabling it later, is not due to carefully planned decisions made years in advance.

We know the P cores vs E core comparison isn't just about E cores being straight-up better, but P cores also sucking against everyone else, such as Apple's M4(and yes it's smaller than Lion Cove) and even against their direct x86 competition, Zen 4 and Zen 5. Lion Cove is similar in perf to Zen 5 while being larger even though Zen 5 is on a less dense process.
 

Geddagod

Golden Member
Dec 28, 2021
1,543
1,630
106
such as Apple's M4(and yes it's smaller than Lion Cove)
Only when you count the fact that LNC has large core private caches.
What really saves Apple a bunch of area is their caching hierarchy.
A LNC core without the L1.5 + L2 SRAM arrays and associated logic related to handling that is similar in area to the M3.
Meanwhile Xiaomi's X925 implementation with the core private caches is still smaller than the M4 P-core.
The M4 P-core is massive.
No, the E core team unexpectedly caught up, just as Pentium M was originally designed to address Transmeta's lineup, but eventually came to be their bread and butter.
But the E-cores aren't on NVL for ST performance though.
Why would Intel want the E-cores to be so much larger on NVL, when the entire point of them being on there is nT perf/watt?
Lion Cove is similar in perf to Zen 5 while being larger even though Zen 5 is on a less dense process.
LNC is the worst P-core by any of the major vendors, I agree, but LNC is only that large because of them being stuffed with massive core private caches.

The problem is that you don't necessarily need caches that big to have similar Zen 5 perf. Willow Cove, for example, had a 2.5x increase in L2 cache capacity over SNC, yet IPC didn't change much, and actually regressed in TGL vs ICL. RPC increased L2 cache by 60%, and RPL IPC is low single digits better than GLC IPC. From Chips and Cheese's article about armchair QBing GLC, we see in simulations that using AMD's L2 cache, which cuts L2 capacity nearly in third vs GLC, results in only <3% IPC losses.

Intel's huge core private caches seem to be there for energy efficiency and isolating them from the terrible uncore, both in server and in client. From a perf perspective, they can prob reduce caches to a large degree and still retain most of the perf, if area was that much of a concern for them. Or, if their uncore was just better.

Looking at just the core logic + L1, Zen 5 doesn't seem to hold on to the area lead AMD used to have. I think LNC dramatically improved Intel's area competitiveness. Though this might also just be because LNC cut away AVX-512...
 

Doug S

Diamond Member
Feb 8, 2020
3,620
6,401
136
I wouldn't be surprised if Qualcomm overtook Apple this year, now that they've got NUVIA and have been working on V3 cores for a while.

Sounds like they will be binning on frequency which should be worth at least 10% for the top bin, so I agree Qualcomm is likely to beat Apple on raw performance. Performance per watt is likely a different story though.