Discussion Intel’s Unified Core: There is hope


Io Magnesso

Senior member
Jun 12, 2025
Only when you count the fact that LNC has large core private caches.
What really saves Apple a bunch of area is their caching hierarchy.
A LNC core without the L1.5 + L2 SRAM arrays and associated logic related to handling that is similar in area to the M3.
Meanwhile Xiaomi's X925 implementation with the core private caches is still smaller than the M4 P-core.
The M4 P-core is massive.

But the E-cores aren't on NVL for ST performance though.
Why would Intel want the E-cores to be so much larger on NVL, when the entire point of them being on there is nT perf/watt?

LNC is the worst P-core from any of the major vendors, I agree, but LNC is only that large because it's stuffed with massive core private caches.

The problem is that you don't necessarily need caches that big to have similar Zen 5 perf. Willow Cove for example had a 2.5x increase in L2 cache capacity over SNC, IPC didn't change much, actually regressed in TGL vs ICL. RPC increased L2 cache by 60%, and RPL IPC is low single digits better than GLC IPC. From Chips and Cheese's article about arm chair QBing GLC, we see in simulations that using AMD's L2 cache, which cuts L2 capacity nearly in third vs GLC, results in only <3% IPC losses.
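As a sanity check on the capacity ratios above, here is the arithmetic. It assumes the commonly cited per-core L2 sizes (Sunny Cove 512 KB, Willow Cove and Golden Cove 1.25 MB, Raptor Cove 2 MB); those sizes are supplied here and are not taken from the post.

```python
# Per-core L2 capacities in KB. These are commonly cited figures supplied
# as assumptions; the post itself only gives the ratios.
L2_KB = {
    "SNC": 512,    # Sunny Cove (Ice Lake)
    "WLC": 1280,   # Willow Cove (Tiger Lake)
    "GLC": 1280,   # Golden Cove (Alder Lake)
    "RPC": 2048,   # Raptor Cove (Raptor Lake)
}


def capacity_ratio(new: str, old: str) -> float:
    """Return how many times larger core `new`'s L2 is than core `old`'s."""
    return L2_KB[new] / L2_KB[old]


if __name__ == "__main__":
    wlc = capacity_ratio("WLC", "SNC")
    rpc = capacity_ratio("RPC", "GLC")
    print(f"WLC vs SNC: {wlc:.2f}x")
    print(f"RPC vs GLC: +{(rpc - 1) * 100:.0f}%")
```

Both results line up with the 2.5x and +60% figures in the post, which is the point: large capacity jumps bought only small IPC changes.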

Intel's huge core private caches seem to be there for energy efficiency and isolating them from the terrible uncore, both in server and in client. From a perf perspective, they can prob reduce caches to a large degree and still retain most of the perf, if area was that much of a concern for them. Or, if their uncore was just better.

Looking at just the core logic + L1, Zen 5 doesn't seem to hold on to the area lead AMD used to have. I think LNC dramatically improved Intel's area competitiveness. Though this might also just be because LNC cut away AVX-512...
I accept
LNC is
We will see. You must understand why at the very least, if that is the case, it would be surprising.

The ratio has been getting worse, but you can still roughly equate a P-core to an E-core cluster, which is where the 1:4 ratio is coming from.

Where has this been rumored? I've not heard of it tbh.

If the area increase is that drastic, sure, then Arctic Wolf can have such a large IPC improvement.
Never heard of anyone saying it would be that large though.

For a lack of trying.

That's not the point, the point is that getting to 256 bit width to support AVX-512 will be a substantial area cost.

I think people overhype the E-cores too much.
It's great in area. But in power and perf, arguably the more important two categories, it's okish.
They could be sacrificing a bunch of power and perf to chase after area, but that's the way things are.

Sure, that's fine.

A massive difference....

Intel is also done with this. Hence why they are using N2 over 18A-P for NVL-S lol. Power on announced in the earnings call this week hopefully.

The best comparison should be NVL 4+8 tiles vs NVL 8+16 tiles.
Is the difference between PPW true?
 

LightningZ71

Platinum Member
Mar 10, 2017
A couple things to keep in mind when comparing 18A and N2: while N2 has the structures in place for BSPDN, it isn't actually part of the PDK for that node. It's getting that density without involving it. 18A has BSPDN as an intrinsic, but is going with the easiest to implement of the three known designs, which also gives the least improvement in density. It appears to me that the BSPDN thermal issues are highly localized to each transistor, as opposed to just heat-soaking the entire die region. The challenge is getting the heat out of each Xtor while keeping density high enough to make the approach even worth it in the first place.

I suspect that 18A won't suffer from the same thermal issues that TSMC's BSPDN designs will because of the density difference, but there will still be thermal issues surrounding their BSPDN strategy. For those touting 18A-P as being amazing, note that N2P will also be a thing. I doubt that either will be able to implement marketable products at their highest achievable densities and the balance they can strike will be most important.

IMHO, I think TSMC's greater density will serve them well in DC applications, and their less aggressive nodes will be fine for consumer-grade products.
 

poke01

Diamond Member
Mar 8, 2022
[Attached images: two Cinebench charts (not preserved)]
Note: Cinebench tests taken from Geekerwan.

Even Lunar Lake's excellent perf/W and its use of MOP and a PMIC to "hide" Lion Cove's bad design get ripped apart when tested against great designs. I like how cut-throat this statement is.

Under such circumstances, an excellent microarchitecture design should have significantly better power consumption than competing products. For example, the single-threaded energy efficiency and performance of Apple M3/M4 are much better than Strix Point. However, Lion Cove's performance is only as good as that of M2 and as efficient as that of M1.

David makes some good points.
  • Considering that the 255H is an N3B processor and the HX 370 is an N4P, the energy efficiency measured in this article is hardly satisfactory. Even without considering factors such as SMT, Lion Cove's performance is clearly insufficient in high power consumption and completely defeated in low power consumption.
    • In a previous Lunar Lake review article, a reader commented that the N3B HP has no advantages over the N4P HD in terms of medium and low voltage.
    • At that time, the energy efficiency issues of Intel's microarchitecture could be covered up by Lunar Lake's own low-power design (such as power-optimized MSC, MOP, PMIC, etc.), making it close to the overall energy efficiency curve of AMD processors. But when Arrow Lake's Lion Cove showed such terrible results in low power consumption, how many reasons could be used to explain it?
    • What's more, why does AMD use the old generation HD process to manufacture a smaller core, whose single-threaded absolute performance and energy efficiency are approximately equal to Intel's N3B HP core? What is the huge gap in micro-architecture design capabilities and even the entire semiconductor engineering capabilities behind this?

For Intel, it seems that process node advancements are negated by bad design, and AMD's better microarchitecture outstrips Intel's node advantage.
Intel needs the E-core team to go all out, they have fresh new ideas.
 

DavidC1

Golden Member
Dec 29, 2023
However, Lion Cove's performance is only as good as that of M2 and as efficient as that of M1.
There's another big difference that current x86 vendors can never do. This isn't the fault of the ISA, it's the fault of the two vendors.

Go that low in power while having respectable performance. 9W at 8000 for M2 is spectacular. At 9W, Intel/AMD chips crash in performance. Like "why is this computer slow?" sluggish.

This means the overall silicon design is vastly superior, down to the uncore and power management; as you can see, the uncore takes up an astonishing 3.4W, or 40% of the M4. For all the talk of power management features, the Apple chip shames them. This is more than just vertical integration; it's simply a better product and execution.
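A quick back-of-envelope on the figures above (8000 points at 9 W for the M2, and 3.4 W being 40% of the M4). Reading "40% of the M4" as 40% of the chip's total power is an assumption; the sketch only does the division.

```python
def points_per_watt(score: float, watts: float) -> float:
    """Benchmark score divided by measured power."""
    return score / watts


def total_from_share(part_watts: float, share: float) -> float:
    """Back out the total power when `part_watts` is `share` of it."""
    return part_watts / share


m2_eff = points_per_watt(8000, 9.0)      # M2 at 9 W, per the post
m4_total = total_from_share(3.4, 0.40)   # assumption: 3.4 W is 40% of M4 total power
print(f"M2: {m2_eff:.0f} pts/W")
print(f"Implied M4 total: {m4_total:.1f} W")
```

That works out to roughly 890 points per watt for the M2 and an implied total of about 8.5 W for the M4.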

Intel needs to make changes at a bigger level than Lunar Lake, at the SoC level. The fact is, though, that they did achieve this back in 2014 with Bay Trail, which was a cheap-as-chips chip.
 

DavidC1

Golden Member
Dec 29, 2023
Only when you count the fact that LNC has large core private caches.
What really saves Apple a bunch of area is their caching hierarchy.
A LNC core without the L1.5 + L2 SRAM arrays and associated logic related to handling that is similar in area to the M3.
I don't care about L2 caches, they are still easy to add. Adding L2 caches as part of the rectangular size was hard maybe 15 years ago but it's clear design capabilities are improving and it's much easier. L1 is the hard part.

Lion Cove is 3.4mm2 for the core on a denser N3B. Apple M4 is 3mm2 on N3E. If N3B is ~6% denser, then it's 3.6mm2 vs 3mm2. Never mind that M4 spanks it in performance per watt, and even in absolute performance.
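The node normalization above is a simple scaling. This sketch assumes, as the post does, that N3B is about 6% denser than N3E, and that area scales linearly with that density gap (a simplification, since logic and SRAM don't scale identically).

```python
def normalize_area(area_mm2: float, density_advantage: float) -> float:
    """Scale an area measured on a denser node to its size on a less dense one.

    density_advantage=0.06 means the source node is assumed 6% denser,
    so the same design would be ~6% larger on the target node.
    """
    return area_mm2 * (1.0 + density_advantage)


lnc_on_n3e = normalize_area(3.4, 0.06)  # Lion Cove core, N3B -> N3E-equivalent
m4_p_core = 3.0                         # Apple M4 P-core on N3E, per the post
print(f"LNC ~{lnc_on_n3e:.2f} mm2 vs M4 {m4_p_core:.1f} mm2 "
      f"({lnc_on_n3e / m4_p_core:.2f}x)")
```

Under those assumptions Lion Cove comes out about 20% larger than the M4 P-core.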

If M4 belonged to Intel, the marketers would put brand name towards every feature and rub it in our face every month about how good theirs are. Intel in 2025 has really, really degraded in all aspects.

People talk about the Apple RDF but while that might be true as a whole company, their chips are indeed really excellent. x86 RDF exists too.
 

poke01

Diamond Member
Mar 8, 2022
This isn't the fault of the ISA, it's the fault of the two vendors.
This. The best example is Xiaomi's X925 implementation compared to MediaTek's. It's the same CPU core, but it's much better, as it's faster and lower power because of the better backend. I have no idea why Intel/AMD don't focus on this, is it cause of cost and time?

It would also help them massively in DC where perf/w is important
 

DavidC1

Golden Member
Dec 29, 2023
I have no idea why Intel/AMD don't focus on this, is it cause of cost and time?
More than that, otherwise 2015 Intel would be the best Intel.

It's cause they don't have to, and the ISA differences shield them from reality. In fact, if the regulators really cared about advancements (they don't), then x86 would have been opened up in the 80's. Courts have sided with Intel so strongly that Transmeta and Nvidia had to settle in court, promising they would never translate the x86 ISA. Intel prevented them from even making a software translator of its ISA!

Intel knew both Transmeta and Nvidia were a threat.

Transmeta was a startup that made a VLIW chip, ran it through a complex ISA translator that actually had decent compatibility and all ran at a super low power. They optimized it at the transistor level to lower idle power too.

-Now imagine you are Intel, and this tiny startup creates a whole new market and does decently despite having a massive deficit.
-Now imagine how well Transmeta would have done if they could have made a native x86 chip.
-Now imagine what Nvidia could have done

Celeron - Response to Via
Pentium M - Response to Transmeta
Conroe - Response to AMD.
 

Io Magnesso

Senior member
Jun 12, 2025
There's nothing wrong with the ISA.
The ISA isn't the reason you can't do something.
Architecture does everything.
 

Io Magnesso

Senior member
Jun 12, 2025
I've never heard of an NVIDIA CPU being contested by Intel in court.
Intel and NVIDIA did argue over the cross-licensing of chipsets; I've heard about that,
but I've never heard anything more than that.
 

511

Diamond Member
Jul 12, 2024
Intel's P-core sucks so much that it's not even funny; sometimes their generational process gap hid this.
 

Geddagod

Golden Member
Dec 28, 2021
Lion Cove is 3.4mm2 for the core on a denser N3B. Apple M4 is 3mm2 on N3E. If N3B is ~6% denser, then it's 3.6mm2 vs 3mm2. Never mind that M4 spanks it in performance per watt, and even in absolute performance.
Not including power gates and only including up to the L1, It's ~2.62 for LNC and ~3.09mm2 for the M4. Mind you, I also think LNC wastes a bunch of area in the CPL/clock region of the core, because of how the geometry of the core lines up, but whatever.
I'm also pretty sure LNC spends significantly more area on the FPU of the core.
You over-exaggerate how bad LNC is I think, and then comparatively overhype the E-cores.
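To make the gap explicit: with both cores counted only up to the L1 and without power gates, the estimates above put LNC roughly 15% smaller than the M4 P-core. The helper below just computes that percentage; the 2.62 and 3.09 mm2 figures are Geddagod's estimates, not official numbers.

```python
def pct_smaller(a_mm2: float, b_mm2: float) -> float:
    """Fraction by which area `a_mm2` is smaller than area `b_mm2`."""
    return (b_mm2 - a_mm2) / b_mm2


lnc = 2.62  # LNC up to L1, excluding power gates (Geddagod's estimate)
m4 = 3.09   # Apple M4 P-core, same boundary (Geddagod's estimate)
print(f"LNC is {pct_smaller(lnc, m4):.0%} smaller than the M4 P-core")
```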
 

Io Magnesso

Senior member
Jun 12, 2025
Does anyone have any evidence that a court ruled against NVIDIA over the x86 ISA?
 

511

Diamond Member
Jul 12, 2024
Not including power gates and only including up to the L1, It's ~2.62 for LNC and ~3.09mm2 for the M4. Mind you, I also think LNC wastes a bunch of area in the CPL/clock region of the core, because of how the geometry of the core lines up, but whatever.
I'm also pretty sure LNC spends significantly more area on the FPU of the core.
You over-exaggerate how bad LNC is I think, and then comparatively overhype the E-cores.
And the useful part of it is disabled 🤣
 

Io Magnesso

Senior member
Jun 12, 2025
According to the information I've gathered, NVIDIA wasn't trying to create an x86 processor itself.
NVIDIA made northbridge and southbridge chipsets for Intel CPU platforms.
I've never heard of a davidc1 statement
 
Jul 27, 2020
I've never heard of a davidc1 statement

As he explained, “Nvidia brought out a product called Denver. It was actually that same design. It originally started as an x86 [CPU], but through certain legal issues, had to turn itself into an Arm CPU.”

They were ready to create an x86 CPU in 2009: https://www.techpowerup.com/87098/nvidia-to-try-and-develop-x86-cpu-in-two-to-three-years

BUT: https://arstechnica.com/information-technology/2011/02/nvidia-30-and-the-riscification-of-x86/

Huang was quite clear that NVIDIA could have chosen to produce an x86 processor—he described the licensing and technical problems associated with making an x86 CPU as "solvable" for NVIDIA. But he gave two reasons why the company opted not to go down that road.
 

DZero

Golden Member
Jun 20, 2024
(Quoting poke01's Cinebench post above.)
The more I see, the more I fear the harsh truth: x86 is at its limits, and AMD tries to break past them without success. Apple's ARM (not the ARM we know) is the BEST design across all systems right now.

And even NVIDIA realizes that ARM is a lost race due to the advantage of Apple and Qualcomm, so they are starting to eye RISC-V.
 

poke01

Diamond Member
Mar 8, 2022
x86 is on the limits
x86 isn’t the problem; it’s Intel’s and AMD’s way of designing their cores, where it’s performance before efficiency. If you give Apple an x86 license, the first thing they will do is focus on perf/W and make the CPU suitable for a phone environment.

x86 is very good, but in low-power scenarios the current designs fall behind, as they are made for high performance in a server/desktop, not for a phone running as low as 5-8 watts.

This is what I’m hoping Unified Core does; the Atom team has its roots in embedded and low-power devices.
 

Io Magnesso

Senior member
Jun 12, 2025
x86 isn’t the problem; it’s Intel’s and AMD’s way of designing their cores, where it’s performance before efficiency. If you give Apple an x86 license, the first thing they will do is focus on perf/W and make the CPU suitable for a phone environment.

x86 is very good, but in low-power scenarios the current designs fall behind, as they are made for high performance in a server/desktop, not for a phone running as low as 5-8 watts.

This is what I’m hoping Unified Core does; the Atom team has its roots in embedded and low-power devices.
That's right, I agree.
The ISA doesn't make it impossible.
For historical reasons, x86 didn't focus on the same areas as ARM.
 

DavidC1

Golden Member
Dec 29, 2023
Not including power gates and only including up to the L1, It's ~2.62 for LNC and ~3.09mm2 for the M4. Mind you, I also think LNC wastes a bunch of area in the CPL/clock region of the core, because of how the geometry of the core lines up, but whatever.
I'm also pretty sure LNC spends significantly more area on the FPU of the core.
I did analysis again with video from High Yield. I got 3.09mm2 for Lion Cove and 1.04mm2 for Skymont. It takes out the L2 caches, the tags, and L2 control area and the 192KB noise.
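For context on the P-core vs E-core area ratio discussed earlier in the thread (roughly one P-core per four-core E-core cluster), here is the ratio implied by these core-only numbers. Caveat: a real E-core cluster also carries its shared L2, which these figures exclude, so this is not a like-for-like cluster comparison.

```python
lnc_core_mm2 = 3.09  # Lion Cove, excluding L2 arrays, tags and L2 control (DavidC1's estimate)
skt_core_mm2 = 1.04  # Skymont, same exclusions (DavidC1's estimate)

ratio = lnc_core_mm2 / skt_core_mm2  # how many Skymont cores fit in one LNC by area
four_skt = 4 * skt_core_mm2          # four Skymont cores, shared L2 not included
print(f"One LNC core ~ {ratio:.1f} Skymont cores by area")
print(f"Four Skymont cores: {four_skt:.2f} mm2 vs one LNC: {lnc_core_mm2:.2f} mm2")
```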

Not to forget that M4 is still on a less dense process and it outright outperforms it while consuming less than half the power to do so plus in laptop form factors it gets to power levels that even Lunarlake can't touch while performing decently. What x86 chip gets to 9W total chip power while getting 9000 points in Cinebench?!?

Note that Apple has been sort of coasting since Gerard Williams III defected as well. Someone else pointed this out.
You over-exaggerate how bad LNC is I think, and then comparatively overhype the E-cores.
No, because LNC is actually bad.

And E cores have consistently done much more dramatic changes for the past decade without regressions. This is just a logical progression. Past doesn't necessarily indicate the future, but the P core execution has been extremely shoddy compared to the E core one. The P-cores shouldn't be outperformed in branch prediction, but they are. Lion Cove managed to not really advance from the predecessor, the first time in history of major uarch changes!
 

Geddagod

Golden Member
Dec 28, 2021
I did analysis again with video from High Yield. I got 3.09mm2 for Lion Cove and 1.04mm2 for Skymont. It takes out the L2 caches, the tags, and L2 control area and the 192KB noise.
Area in red is ~2.65mm2
[Attached image with the measured area highlighted in red (not preserved)]
Smaller than that of the M4 p-core.
Not to forget that M4 is still on a less dense process
More dense by what, single digits?
And a worse yielding, lower perf node. N3E is outright the better node.
and it outright outperforms it while consuming less than half the power to do so plus in laptop form factors it gets to power levels that even Lunarlake can't touch while performing decently. What x86 chip gets to 9W total chip power while getting 9000 points in Cinebench?!?
No one is arguing LNC is better than the M4.
Note that Apple has been sort of coasting since Gerard Williams III defected as well. Someone else pointed this out.
Idk how much of that is "coasting" vs them not being able to improve IPC much, and finding it easier to push frequency instead.
No, because LNC is actually bad.
Hence my use of "over exaggerate" and not "lie".
And E cores have consistently done much more dramatic changes for the past decade without regressions. This is just a logical progression. Past doesn't necessarily indicate the future, but the P core execution has been extremely shoddy compared to the E core one.
It's also harder to scale up performance on the leading edge.
Lion Cove managed to not really advance from the predecessor, the first time in history of major uarch changes!
It does, the word "really" is doing a lot of heavy lifting there.
Also, from a resource standpoint, LNC appears not to have pushed things as far as previous Intel tocks.
Change vs. predecessor   LNC     RWC     SNC
ROB capacity             +13%    +45%    +57%
uOP cache                +28%    +78%    +50%
Rename width             +33%    +20%    +25%
L2 BTB                   +0%     +140%   +25%
L1 BTB                   +100%   -50%    +0%
Store buffer capacity    +5%     +58%    +29%
Rather than dramatically expand queue and structure capacity, LNC seems more geared towards a re-org and better utilizing what resources they already had.
Being competitive in ST with Zen 5 makes the mediocre IPC and perf uplift bearable for Intel, though obviously not preferred.
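One crude way to boil the table above down to a single number per core is the geometric mean of (1 + change) across the six structures. This is a sketch of that summary, not an IPC estimate: it weights every structure equally, which is a simplification. The column labels are kept exactly as posted.

```python
from math import prod

# Generation-over-generation changes from the table, as fractions.
GROWTH = {
    "LNC": [0.13, 0.28, 0.33, 0.00, 1.00, 0.05],
    "RWC": [0.45, 0.78, 0.20, 1.40, -0.50, 0.58],
    "SNC": [0.57, 0.50, 0.25, 0.25, 0.00, 0.29],
}


def geomean_growth(changes: list[float]) -> float:
    """Geometric mean of (1 + change) across structures, minus 1."""
    factors = [1.0 + c for c in changes]
    return prod(factors) ** (1.0 / len(factors)) - 1.0


for core, changes in GROWTH.items():
    print(f"{core}: {geomean_growth(changes):+.1%} per structure (geomean)")
```

On this rough measure LNC comes out lowest of the three columns, which is consistent with the "re-org rather than expansion" reading.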
 