Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
847
799
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The two tiles are connected through UCIe rather than D2D, a first for Intel. Launch is expected in Q2 2026, around Computex. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am also throwing in LNL and the upcoming MediaTek D9500 SoC.

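| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
|---|---|---|---|---|
| Launch Date | Q1 2023 | Q2 2026 ? | Q3 2024 | Q3 2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| CPU Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Memory Size | 16 GB | ? | 32 GB | 24 GB ? |
| Memory Bandwidth | | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| GPU EUs / Xe cores | 32 EU | 2 Xe | 8 Xe | 12 |
| GPU Max Clock | 1.25 GHz | | 2 GHz | |
| NPU | N/A | 18 TOPS | 48 TOPS | 100 TOPS ? |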
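The bandwidth row follows from bus width × transfer rate. A quick sanity check, assuming these are theoretical peak figures (sustained bandwidth is lower in practice): the formula gives about 54.4 GB/s for the WCL configuration and about 136.5 GB/s for LNL, in line with the table, while the D9500 comes out at about 85.3 GB/s versus the 85.6 GB/s listed, and the ADL-N configuration at 38.4 GB/s. The function name is just for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal: 1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mts * 1e6 / 1e9

# Memory configurations as listed in the table (WCL's is still tentative).
configs = {
    "Alder Lake-N (64-bit LPDDR5-4800)": (64, 4800),
    "Wildcat Lake (64-bit LPDDR5-6800)": (64, 6800),
    "Lunar Lake (128-bit LPDDR5X-8533)": (128, 8533),
    "Dimensity 9500 (64-bit LPDDR5X-10667)": (64, 10667),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```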









With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction for Intel: moving to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, Intel's first process to use EUV lithography. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what the Intel roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



 


DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
To kill off a successful product segment for BOM optimization is a mistake. If LNL is a sales success, they could charge more to offset the higher costs.
Something more important than margins and revenue is the ability to create new markets and shut off competition.

It's the same with the arguments regarding iGPUs. Sure, it increases die costs a bit, but without it you are completely locked out of the majority of markets. That outweighs any savings a smaller die would give you.
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
Given the size of the NPU, they could have used that area for 24 MB of MLC and further lowered power and improved performance. It's way, way too large.
Nah, this is a misconception likely originating from MLID, about Lion Cove being part of Royal Core/an entirely new project and uarch team led by Jim Keller. That's simply not true. Lion Cove still has a lot in common with Redwood Cove and all prior Intel core uarchs, as shown by C&C's and David Huang's articles. The core still behaves largely the same in a lot of key metrics.
Yup.

Layout changed substantially on Prescott too; it just did not perform well. Lion Cove is the size of Zen 5 despite a substantially denser process, with no top clock speed advantage (both at 5.7 GHz), and without SMT, whose removal according to Intel should give noticeable ST and die area advantages.

The biggest red flag to me is the branch prediction regression. Branch prediction is THE key to improving performance, and it's now worse than that of the distant successor to the "Atom" core, which had very, very humble beginnings. The branch prediction regression tells me the team is really, really struggling.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,108
9,865
106
the costs are worth it
Nope.
and required to compete at the highest level.
Premium tablet chips are very very very niche.
I think it’d be in the best interest of Intel to create a new product segment based off of LNL
It's literally dead already.
PTL and NVL are both focused on BOM optimization.
To kill off a successful product segment
It's not. Premium Windows tablets have been dead for like 10 years.
Something more important than margins and revenue is the ability to create new markets and shut off competition.
It's not a new market, Core Y is like 12 years old.
It's same with the arguments regarding iGPUs
Fat iGP is niche. Which is why the ULV PTL has a tiny one.
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,761
106
Comparison of CPU core areas
[Image: CPU core area comparison chart]
This was posted on reddit.
Private caches have been included in the core area:
Intel/AMD = L0, L1, L2.
Apple/Qualcomm = L1.

It seems Lion Cove is still rather bloated, compared to the competition. On the other hand, Skymont area efficiency is impressive.
 

The Hardcard

Senior member
Oct 19, 2021
332
419
106
It seems Lion Cove is still rather bloated, compared to the competition. On the other hand, Skymont area efficiency is impressive.
Is it bloated? The L2 cache is what it is and it’s 2.5 MB. It would be interesting if logic area could be compared.
 

poke01

Diamond Member
Mar 8, 2022
4,205
5,553
106
It's not a new market, Core Y is like 12 years old.
Remember the 12” MacBook? Ugh, that Core M, but still.
It seems Lion Cove is still rather bloated, compared to the competition. On the other hand, Skymont area efficiency is impressive.
I think Intel got lucky with N3B; if Lunar Lake were on N3E, Lion Cove would have been even bigger.
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
Re: Geekbench 6 testing. I set my P-cores to 2.7 GHz to simulate 8 more E-cores. No Hyper-Threading, of course. The result shows GB6 scaling basically going to 0 by the time you get to 24 cores. By that I mean the 24th core should theoretically increase the overall score by 4.3%, but in reality it only adds 0.6%, or 13.2% of that theoretical amount.

Those first 8 or so cores are really important for a high GB6 score.
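The 4.3% figure is what perfectly linear scaling predicts: going from 23 to 24 cores adds 1/23 of the 23-core score. A minimal sketch of that arithmetic, assuming equal per-core throughput (roughly what clocking the P-cores down to 2.7 GHz aims for); the function names are illustrative. With the rounded 0.6% measured gain the efficiency comes out near 13.8%, so the 13.2% above presumably comes from the unrounded measurement.

```python
def ideal_marginal_gain(n: int) -> float:
    """Fractional score gain from adding the n-th core, assuming perfectly
    linear scaling and equal per-core throughput (n-1 cores -> n cores)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 1 / (n - 1)

def scaling_efficiency(measured_gain: float, n: int) -> float:
    """Share of the ideal marginal gain that the n-th core actually delivered."""
    return measured_gain / ideal_marginal_gain(n)

print(f"ideal gain from the 24th core: {ideal_marginal_gain(24):.1%}")  # 4.3%
print(f"efficiency with a 0.6% measured gain: {scaling_efficiency(0.006, 24):.1%}")
```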

 

511

Diamond Member
Jul 12, 2024
4,525
4,145
106
I think they still have AVX-512-related logic and such in the core; it's just fricking disabled. Intel's presentation slides show that.
 

AMDK11

Senior member
Jul 15, 2019
473
407
136
Nah, this is a misconception likely originating from MLID, about Lion Cove being part of Royal Core/an entirely new project and uarch team led by Jim Keller. That's simply not true. Lion Cove still has a lot in common with Redwood Cove and all prior Intel core uarchs, as shown by C&C's and David Huang's articles. The core still behaves largely the same in a lot of key metrics.

Of course it is. This is not due to any Keller or Royal Core.

LionCove moves from a unified scheduler feeding 3x FP/ALU and 2x ALU ports to separate 4x FP + 6x ALU. This is a radical change from the design used since the days of the Pentium Pro (P6).
 

AMDK11

Senior member
Jul 15, 2019
473
407
136
Given the size of the NPU, they could have used that area for 24 MB of MLC and further lowered power and improved performance. It's way, way too large.

Yup.

Layout changed substantially on Prescott too; it just did not perform well. Lion Cove is the size of Zen 5 despite a substantially denser process, with no top clock speed advantage (both at 5.7 GHz), and without SMT, whose removal according to Intel should give noticeable ST and die area advantages.

The biggest red flag to me is the branch prediction regression. Branch prediction is THE key to improving performance, and it's now worse than that of the distant successor to the "Atom" core, which had very, very humble beginnings. The branch prediction regression tells me the team is really, really struggling.
Even though I agree that Intel is pushing for clock speeds like in the Pentium 4 days, you are applying the comparison too literally to LionCove. Compared to the Pentium III, the Pentium 4 regressed from a 3-wide decoder to a 1-wide one.

LionCove, unlike the Pentium 4, is pushing for higher IPC.
 

DrMrLordX

Lifer
Apr 27, 2000
22,902
12,971
136
Which means Lunar Lake is so efficient at the system level that they can feed more power to the SoC yet offer the same battery life, ending up as a superior product.

That doesn't make any sense. If it pulls more power then it's going to wear out the battery faster unless it somehow is racing to idle quickly enough to justify the additional power draw. That doesn't seem to be the case based on the results of the Geekerwan video.

@Abwx also has a point. Why are the SoC power draw numbers so different? They don't conform to the board power limits, and there's no indication that the competing system has parasitic losses outside of the SoC. Also, nobody really wants to address the main point he was making, which is that the Lunar Lake SoC requires more power to achieve higher benchmark results. The whole point of the test wasn't to speculate about parasitic losses or platform efficiency; it was to isolate each SoC and determine perf/watt. If Geekerwan intended for both SoCs to use the same amount of power, then he failed in execution, because they simply did not.
It's quite possible it was carried over from Alder Lake.

The only way anyone would have noticed such a problem on Alder Lake is if they somehow tuned it to pull the same TVB voltage (1.6 V+) that could be requested on a Raptor Lake system. Honestly, I have no idea if Alder Lake can exhibit that behavior.
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,761
106
And Intel/AMD need to follow that direction, because I suspect large L1 caches also contribute to efficiency in their designs: they keep a lot of data from going out to slower, higher-power cache levels and memory.

The basis of Apple's design is also having a stellar design team, because being able to hit 4 GHz clocks at such low power with just 9 pipeline stages and a humongous L1 cache with 3-cycle latency is amazing.
Apple's cache hierarchy is also more cost/area effective. If you compare M3 vs Lunar Lake, Intel is spending more capacity (and hence area!) on caches.

Apple M3
4P + 4E
L1: (192 KB pL1i/128 KB pL1d)×4 + (128 KB pL1i/64 KB pL1d)×4
L2: 16 MB sL2 + 4 MB sL2
SLC: 8 MB

Lunar Lake
4P + 4LPE
L0: (48 KB pL0d)×4 + 0
L1: (64 KB pL1i/192 KB pL1d)×4 + (64 KB pL1i/32 KB pL1d)×4
L2: (2.5 MB pL2)×4 + 4 MB sL2
L3: 12 MB sL3 + 0
SLC: 8 MB
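Tallying the capacities makes the comparison concrete. A rough sketch that counts capacity only (SRAM area per KB differs by level and process), with 1 MB = 1024 KB and Lion Cove's L1i counted as 64 KB, the figure AMDK11 lists later in the thread: Lunar Lake comes to about 35.6 MB versus 30 MB for M3, or about 27.6 MB versus 22 MB excluding the SLC. The dictionary and function names are illustrative.

```python
# Cache capacities in KB (1 MB = 1024 KB); Lion Cove's L1i counted as 64 KB.
APPLE_M3 = {
    "P-core L1i+L1d": 4 * (192 + 128),
    "E-core L1i+L1d": 4 * (128 + 64),
    "shared L2 (P + E clusters)": (16 + 4) * 1024,
    "SLC": 8 * 1024,
}
LUNAR_LAKE = {
    "P-core L0d": 4 * 48,
    "P-core L1i+L1d": 4 * (64 + 192),
    "LP E-core L1i+L1d": 4 * (64 + 32),
    "P-core private L2": 4 * 2560,
    "LP E-core shared L2": 4 * 1024,
    "P-core shared L3": 12 * 1024,
    "SLC": 8 * 1024,
}

def total_mb(levels: dict[str, int], include_slc: bool = True) -> float:
    """Sum of all listed capacities in MB, optionally leaving out the SLC."""
    kb = sum(size for name, size in levels.items() if include_slc or name != "SLC")
    return kb / 1024

for chip, levels in (("Apple M3", APPLE_M3), ("Lunar Lake", LUNAR_LAKE)):
    print(f"{chip}: {total_mb(levels):.2f} MB total, "
          f"{total_mb(levels, include_slc=False):.2f} MB without SLC")
```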
 

poke01

Diamond Member
Mar 8, 2022
4,205
5,553
106
Apple's cache hierarchy is also more cost/area effective. If you compare M3 vs Lunar Lake, Intel is spending more capacity (and hence area!) on caches.

So where does this idea “Apple has more caches than Intel/AMD” come from then?
 

FlameTail

Diamond Member
Dec 15, 2021
4,384
2,761
106
So where does this idea “Apple has more caches than Intel/AMD” come from then?
Good question. Two reasons, I think:

(1) When Apple M1 came out, it did have much larger caches than its Intel/AMD peers. Since then (M1 -> M4), however, Apple has hardly increased the sizes at all (L1i/L1d have stayed the same size, and P-core L2 went from 12 MB to 16 MB). Meanwhile, Intel/AMD made large increases to their cache capacities over the same period, so much so that Intel has now surpassed Apple (as you can see in my previous post).

(2) Apple has more cache at each equivalent level than Intel/AMD. Apple's L1 and L2 caches are huge (but they don't have an L3 like Intel/AMD do).
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
That doesn't make any sense. If it pulls more power then it's going to wear out the battery faster unless it somehow is racing to idle quickly enough to justify the additional power draw.
You are also missing the point, or maybe you aren't paying attention. AMD likely has incorrect sensor readings: two outlets have pointed out that its system power is comparable to the Intel platform's even though the Intel SoC's TDP is set 5 W or so higher.

You have a handheld and you are gaming. One has a 22 W SoC but 30 W system power, giving 2 hours of gaming on a 60 Wh battery. The other has a 15 W SoC but the same 30 W system power, giving the same 2 hours of gaming on a 60 Wh battery.

If Geekerwan or other outlets set the same TDP on the Intel and AMD platforms, the Intel system power would end up LESS than the AMD system's, so you would end up with better battery life.

As a user, do you really care about SoC power? Especially when the system power numbers differ from what you'd expect?
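Runtime is battery capacity divided by total system draw, which is why the SoC's share alone doesn't settle it. A minimal sketch of the hypothetical handheld example (the numbers are DavidC1's illustration, not measurements; the names are illustrative): both devices get 2.0 h, but device B spends 15 W outside the SoC versus 8 W for device A.

```python
def runtime_hours(battery_wh: float, system_power_w: float) -> float:
    """Runtime depends on total system draw, not on the SoC's share of it."""
    return battery_wh / system_power_w

BATTERY_WH = 60  # hypothetical 60 Wh handheld battery from the example

# Hypothetical devices from the example: different SoC power, same system power.
devices = {
    "A": {"soc_w": 22, "system_w": 30},
    "B": {"soc_w": 15, "system_w": 30},
}

for name, d in devices.items():
    rest_of_platform_w = d["system_w"] - d["soc_w"]
    hours = runtime_hours(BATTERY_WH, d["system_w"])
    print(f"device {name}: SoC {d['soc_w']} W, rest of platform "
          f"{rest_of_platform_w} W, runtime {hours:.1f} h")
```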
So where does this idea “Apple has more caches than Intel/AMD” come from then?
Apple has way more L1 cache than either vendor. It's the fastest cache, so it performs better, and it's the closest, so you save power because data doesn't have to travel as far.
Is it bloated? The L2 cache is what it is and it’s 2.5 MB. It would be interesting if logic area could be compared.
Lion Cove is on the N3B process, which affords at least a 30% density advantage over N4, so Zen 5 on N3B would end up around 3.2 mm², making Lion Cove almost 50% larger.
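"30% density advantage" is ambiguous. A sketch assuming the usual reading, 1.3× as many transistors per mm², so area scales by 1/1.3; that is a ~23% area reduction, not 30%. Working backwards from the 3.2 mm² figure, Zen 5 would need to be about 4.16 mm² on N4 under this reading, or about 4.57 mm² if the 30% is applied as a straight area cut. The post doesn't give the measured core areas, so these are only the implied inputs; the function names are illustrative.

```python
def scaled_area(area_mm2: float, density_gain: float) -> float:
    """Area after moving to a denser node, reading an X% density advantage
    as (1 + X) times more transistors per mm^2, so area shrinks by 1 / (1 + X)."""
    return area_mm2 / (1 + density_gain)

def implied_original_area(scaled_mm2: float, density_gain: float) -> float:
    """Inverse of scaled_area: the area on the older node that maps to scaled_mm2."""
    return scaled_mm2 * (1 + density_gain)

GAIN = 0.30  # the post's "at least 30%" N3B-over-N4 density advantage

print(f"area reduction at 30% density gain: {1 - scaled_area(1.0, GAIN):.1%}")
print(f"implied N4 area for 3.2 mm^2 on N3B: {implied_original_area(3.2, GAIN):.2f} mm^2")
print(f"same, if 30% is a straight area cut: {3.2 / (1 - GAIN):.2f} mm^2")
```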
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
It's not a new market, Core Y is like 12 years old.
That's because the Y chips didn't excel in any area. Battery life was nowhere near competitive unless the device was thicker than most tablets, and the Y chips were super slow in both single-thread and iGPU performance. Lunar Lake, by contrast, offers excellent battery life and top-notch single-thread and iGPU performance.

It is still a market they should pursue, if only to keep WoA out. You don't abandon a market because it sucks for one year.

The BOM argument is short-sighted. The only reason WoA had any chance of taking even 1% market share is that Intel refused to make a platform with truly good battery life. It also matters on a psychological level to show that x86 CAN rival ARM chips in battery life, which, judging from comments on LNL reviews, surprises people. Of course, Intel is likely going to do what you expect and just focus on BOM, because they are really a finance company that happens to hire engineers.
Fat iGP is niche. Which is why the ULV PTL has a tiny one.
I agree with this. But I'm pretty sure you are smart enough to figure out that wasn't my point.
 

poke01

Diamond Member
Mar 8, 2022
4,205
5,553
106
The only reason WoA had any chance of taking even 1% marketshare is because Intel refused to make a truly good battery life Intel platform.
Is WoA ever going to be a threat to Intel? First, MS/Qualcomm need to fix basic productivity software like the Adobe suite, Blender, etc., and then Qualcomm needs to make a low-power design, which the next-gen X Elite isn't going to be.

I don't think it's Qualcomm that Intel should be worried about, although if that 5 GHz core happens and it's efficient, that's different. Intel should be worried about Nvidia: if the rumours of Nvidia entering the WoA CPU market are true, that's more of a threat to Intel. Nvidia has potent engineers, mindshare and excellent GPU IP. If Nvidia is smart, they would release an SoC with 4-12 P-cores, a 20 SM GPU and up to 128 GB of RAM.

This is exactly what Thor Jetson is meant to be; if Nvidia releases this on the WoA platform, it will take more than 1%.
 

AMDK11

Senior member
Jul 15, 2019
473
407
136
As for the LionCove vs Zen5 core area:

LionCove
L1-I 64KB
L0-D 48KB
L1-D 192KB
L2 2.5MB

Zen5
L1-I 32KB
L1-D 48KB
L2 1MB

In addition, LionCove has larger buffers, schedulers, ROB, etc. All of this costs transistors. Besides, a single 8-wide decoder makes the front end more complicated.

We'll see what LionCove shows in Arrow Lake. I suspect that, outside of the L2, the LionCove logic in Lunar Lake has been slimmed down. Confirmation will be available in a few days.
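Summing AMDK11's per-core private cache figures shows the size of the capacity gap. A sketch that counts KB only (1 MB = 1024 KB) and says nothing about logic area, which is the part still to be confirmed: Lion Cove carries 2864 KB of private cache per core versus 1104 KB for Zen 5, roughly 2.6×. The names are illustrative.

```python
# Per-core private caches from AMDK11's list, in KB (1 MB = 1024 KB).
LION_COVE_KB = {"L1-I": 64, "L0-D": 48, "L1-D": 192, "L2": 2560}
ZEN5_KB = {"L1-I": 32, "L1-D": 48, "L2": 1024}

def private_cache_kb(levels: dict[str, int]) -> int:
    """Total private cache capacity per core (capacity only, not area)."""
    return sum(levels.values())

lion_cove = private_cache_kb(LION_COVE_KB)
zen5 = private_cache_kb(ZEN5_KB)
print(f"Lion Cove: {lion_cove} KB, Zen 5: {zen5} KB, ratio {lion_cove / zen5:.2f}x")
```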