Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. They are connected through UCIe, not D2D; a first from Intel. Launch is expected in Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18-A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Size | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | | ~ 55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| Max Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
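
For anyone wanting to sanity-check the Bandwidth figures, here is a minimal sketch of the usual peak-bandwidth arithmetic: bus width in bytes times transfer rate. The function name and the dictionary are illustrative only, the inputs are taken from the Memory figures above, and the WCL entry is still a rumored spec. These are theoretical peaks, not measured numbers.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfer_rate_mts / 1000


# Bus width (bits) and transfer rate (MT/s) as listed in the comparison.
configs = {
    "ADL-N (64-bit LPDDR5-4800)": (64, 4800),
    "WCL (64-bit LPDDR5-6800, rumored)": (64, 6800),
    "LNL (128-bit LPDDR5X-8533)": (128, 8533),
    "D9500 (64-bit LPDDR5X-10667)": (64, 10667),
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(bits, rate):.1f} GB/s")
```

The 64-bit LPDDR5X-10667 result lands slightly under the 85.6 GB/s listed, which probably just reflects rounding of the transfer rate.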






PPT1.jpg
PPT2.jpg
PPT3.jpg



With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, a first from Intel to use GAA transistors, called RibbonFET.



LNL-MX.png
 


lightisgood

Lunar Lake E Cores are now able to talk to each other through their L1 cache, which should dramatically improve core - core latency: https://hothardware.com/reviews/intel-lunar-lake-deep-dive?page=3

Meaning we avoid this in Arrow Lake: https://www.anandtech.com/show/1704...hybrid-performance-brings-hybrid-complexity/6

Previously, core communication required a trip through the ring bus or, in the case of LP cores, Meteor Lake’s LP Scalable Fabric. See also https://chipsandcheese.com/2024/05/13/meteor-lakes-e-cores-crestmont-makes-incremental-progress/

Damned good design changes

I remember that this L1$-to-L1$ link was adopted for C2D (Merom) in 2006...
I had been thinking that Alder Lake, the 1st gen x86 hybrid, is a very primitive design.
So, I was correct.
 

ondma

That isn't the goal with the P cores, and likely never will be. The P cores are to have a single task done ASAP at the cost of high power. Move to a new architecture, or process and the goal is still the same: complete a single task ASAP at the cost of high power. The P cores are for when you want something very responsive and fluid. But, you can't have large numbers of cores all doing tasks at the cost of high power. There is no free lunch. With 8 P cores, running at 125 W, each gets ~15.6 W. Those P cores can clock a lot faster than 16 P cores each with only ~7.8 W. No matter the architecture or process, when you split your power budget up amongst more and more cores, each core gets less and less to work with.
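
A quick sketch of the even-split arithmetic in the paragraph above, assuming the whole 125 W package budget goes to the cores in equal shares. That is a simplification: uncore, memory and GPU power are ignored, and the function name is made up for illustration.

```python
def per_core_power_w(package_power_w: float, core_count: int) -> float:
    """Power available to each core if the budget is split evenly."""
    if core_count <= 0:
        raise ValueError("core_count must be positive")
    return package_power_w / core_count


# Same 125 W budget, two hypothetical P-core counts.
for cores in (8, 16):
    print(f"{cores} P cores: {per_core_power_w(125, cores):.1f} W each")
```

Doubling the core count at a fixed budget halves what each core gets, which is the "no free lunch" point being made.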

The E cores are designed to be the workhorses that you can spam in large numbers to do grunt work. The real issue was when the P/E core was first released, the E cores were clocked too high and there were too few of them. The result was that the first E cores were neither that efficient nor that good at grunt work. So, people got the whole idea of P and E cores backwards in their mind thinking that P cores were for the grunt work. You have to switch your mindset. You want more E cores for more work done.
I mean, you just gave a textbook justification of hybrid architecture. You didn't really address the point of my post though. Sorry to keep bringing up AMD in an Intel thread, but they are able to put 16 big cores into a chip and still have excellent performance and power consumption. I guess what I am trying to say is that Lion Cove still seems behind in performance and/or power consumption, or they would not have to bother with the E cores. It is also disappointing that Lunar Lake and the most performant Arrow Lake are on a TSMC node. What happened to process leadership? I thought 20A was supposed to bring leadership. Are we depending on 18A now? And if it is simply a matter of supply, I don't consider a process leading edge if it can't provide sufficient wafers with adequate yields to satisfy production demands.
 

poke01

1717571252264.png
Talking about Lunar Lake while wearing an Apple shirt, love the irony. :p
Can't wait for the deepdive from them.

Intel has implemented similar power management to the M1; these are the best chips to come out of Intel in a long time.
 

TwistedAndy

I kinda don't understand why Lunar has 4P+4E cores instead of 2+8 for example.

That was made to achieve better efficiency. In Lunar Lake, E-cores are always active. Having more E cores will increase the idle power consumption. Intel is planning to turn off the whole P-cluster when it's not used.
 

TESKATLIPOKA

Because customers don't run Cinebench on an ultrabook.
Thanks, you didn't disappoint with your "useful" reply as always.

So once more, why did they choose a 4+4 config instead of 2+8 for example, which would be comparable in size if not a bit smaller?

Lion Cove is for max ST performance and responsiveness, so it's understandable to use them, but why 4, when this is intended for ultrabooks with a limited TDP?
Skymont cluster offers better perf/W than a Lion Cove cluster and is also a lot smaller, 2 of them would provide significantly higher performance than a single Lion Cove cluster.

That was made to achieve better efficiency. In Lunar Lake, E-cores are always active. Having more E cores will increase the idle power consumption. Intel is planning to turn off the whole P-cluster when it's not used.
Why can't there be 3 clusters? One with 2 P-cores and 2 clusters with 4 E-cores each?
And Intel could keep active only a single E-core cluster.
 

Joe NYC

Intel Lunar Lake Technical Deep Dive - So many Revolutions in One Chip

E-core looks great on paper.
skymont-17-.jpg
skymont-18-.jpg
skymont-19-.jpg

Even a CPU with only Skymont cores would be strong.

P.S. I am kinda more excited about Lunar Lake than Strix.

You are excited about lower clock speeds and lower IPC?

Seems like you are excited about Zen 3 in the era of Zen 5...
 

Joe NYC

I am expecting a good 15-20% total single core uplift (ipc + clocks) over Raptor Lake. Multicore is going to come down to what process gets used due to power limits. The more power efficient the process is, the better the performance. We could see Intel lead AMD by a substantial amount here, but they are also (allegedly) pulling back power limits to be similar to AMD’s limits, so who knows?

Raptor Lake goes up to 6.2 GHz (or 6.0 GHz). Do you expect +1% to +6% clock speed increase?

If 5.7 GHz is the clock speed of Arrow Lake, then it is a -5% to -8% clock speed regression.
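
The clock comparison here can be checked with a small sketch. The helper names are illustrative, and the 14% IPC gain is an assumption borrowed from the Lion Cove figure quoted elsewhere in this thread; it is not stated in this post. The second function treats total uplift as IPC gain multiplied by clock gain, which gives slightly smaller numbers than simply subtracting percentages.

```python
def percent_change(new: float, old: float) -> float:
    """Relative change from old to new, in percent."""
    return (new / old - 1) * 100


def implied_clock_gain(total_uplift_pct: float, ipc_gain_pct: float) -> float:
    """Clock gain needed to reach a total uplift, given an IPC gain (multiplicative model)."""
    return ((1 + total_uplift_pct / 100) / (1 + ipc_gain_pct / 100) - 1) * 100


# A 5.7 GHz part compared against 6.0 GHz and 6.2 GHz Raptor Lake clocks.
for old_clock in (6.0, 6.2):
    print(f"5.7 GHz vs {old_clock} GHz: {percent_change(5.7, old_clock):+.1f}%")

# Clock gain implied by a 15-20% single-core uplift with an assumed 14% IPC gain.
for total in (15, 20):
    print(f"{total}% total uplift: {implied_clock_gain(total, 14):+.1f}% clocks")
```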
 

mikk

Here is a graphics/power comparison between LNL and MTL capped at 60 fps. Reminder that on LNL the on package RAM is included in the package power (roughly 2W), it's not even a fair comparison. It's MTL-H there (Arc graphics)
 

TESKATLIPOKA

You are excited about lower clock speeds and lower IPC?

Seems like you are excited about Zen 3 in the era of Zen 5...
Both types of cores have higher IPC than the predecessors and LNL is a low TDP SoC, so boost clockspeed doesn't necessarily need to be much lower than MTL-U(5GHz/3.8GHz) or RPL-U(5.2GHz/3.9GHz) for either core.
Not sure about sustained clocks during full load, we will have to wait for reviews.

It will be very interesting to limit MTL-U, LNL, PHX(2) and Strix to 15W-30W and see how they perform in CB.
 

DavidC1

Also Intel stated that at iso power Lion Cove in Lunar Lake is up to an 18% performance uplift, not 14. It just depends on where you sit on the power curve.

Something most are missing is that they're describing a 14% uplift in the Lunar Lake iteration, not in all implementations.
That has nothing to do with perf/clock.

The curve has shifted likely due to design/process change which benefits lower power.
Another tidbit from Chips n' Cheese:


So we're looking at possibilities of, on Arrow Lake DT:
- Bigger cache
- Return of HT
- L1 to L2 bandwidth to 110B per cycle

Intel's new modern sea-of-cells design really allows for finer-grained changes that fit different markets. Quite interesting.
Granite Rapids according to Pat: ten-plus % changes in the core
 

TwistedAndy

Why can't there be 3 clusters? One with 2 P-cores and 2 clusters with 4 E-cores each?
And Intel could keep active only a single E-core cluster.

Intel probably decided that there was no sense in having three independent clusters because of the power and memory latency issues. Intel had to introduce a separate 8MB side cache to make the current approach with two independent clusters work.
 

DavidC1

Thanks, you didn't disappoint with your "useful" reply as always.

So once more, why did they choose a 4+4 config instead of 2+8 for example, which would be comparable in size if not a bit smaller?
4+4 might be better since Lunar Lake is focused on low power (I mean for battery life, not TDP).

Skymont even at lower clocks has high enough performance to cover most needs, and two cores is a little bit small nowadays, so they bumped it up to 4.

2x P cores again is under the core requirements, so for lightly threaded applications that require higher responsiveness, 4 is a good number.

This is just a guess, there might be technical reasons to do so, but Apple also does something similar.
 

FlameTail

Reminder that on LNL the on package RAM is included in the package power (roughly 2W), it's not even a fair comparison.
I suppose that scales with RAM capacity?

16 GB = 1W
32 GB = 2W
64 GB = 4W
128 GB = 8W
256 GB = 16 W
256 TB = 16384W
 

mikk

I suppose that scales with RAM capacity?

16 GB = 1W
32 GB = 2W
64 GB = 4W
128 GB = 8W
256 GB = 16 W
256 TB = 16384W

No idea how it scales but Intel thinks it's 2W and that's why LNL uses 17W and 30W TDP instead of the usual 15W and 28W.
 

DavidC1

I suppose that scales with RAM capacity?

16 GB = 1W
32 GB = 2W
64 GB = 4W
128 GB = 8W
256 GB = 16 W
256 TB = 16384W
No. Capacity has little to do with power because only the parts being actively accessed need to be active, since it's not compute.
 

DavidC1

He's basically saying what @Exist50 has said.

P core design is in shambles, in addition to the E core team being excellent.

Third: @adroc_thurston doesn't really have sources. I was waiting and waiting to see whether what he says is true.
CWF is 18A so that's even better, possibly.
Either way, the thing is basically Z4c with worse SIMD.
Cope. Again, and again. Can you at least admit you are wrong once in a while? Or at least don't be like AI and pretend everything you say is written in stone?
 

FlameTail

Intel's P-cores are clearly excessively bloated.

Where's the Lunar Lake die shot? I want to compare Lion Cove and Apple M3-P core die area.

@poke01 you said you would make a Lunar Lake vs M3 thread sometime?
 

coercitiv

Thanks, you didn't disappoint with your "useful" reply as always.

So once more, why did they choose a 4+4 config instead of 2+8 for example, which would be comparable in size if not a bit smaller?

Lion Cove is for max ST performance and responsiveness, so it's understandable to use them, but why 4, when this is intended for ultrabooks with a limited TDP?
Skymont cluster offers better perf/W than a Lion Cove cluster and is also a lot smaller, 2 of them would provide significantly higher performance than a single Lion Cove cluster.
His reply may have seemed cryptic because you're less focused on the needs of the users who will be buying this product. Workloads will be relatively lightly threaded and rather latency sensitive, 4P cores will make the device look snappy, more cores overall will only help in isolated cases. (in fact most of them see a "real" MT workload when they boot or when they make OS updates)

Browsing and apps built on chromium will probably make up quite a good chunk of the user scenarios. Modern browsers can scale to 6+ cores, but what is more important for browser speed is ST performance of the cores being used. This is in stark contrast with Cinebench, where available throughput is all that matters, because software scaling is... well... embarrassing :p

For the upper range of TDP covered by LNL it would be nice if it came with something like 4+8 (my favorite would still be 6+4, with a better P core), but the NPU stole the rest of the pizza, sorry.

lnl-layout.jpg
 

DavidC1

His reply may have seemed cryptic because you're less focused on the needs of the users who will be buying this product. Workloads will be relatively lightly threaded and rather latency sensitive, 4P cores will make the device look snappy, more cores overall will only help in isolated cases. (in fact most of them see a "real" MT workload when they boot or when they make OS updates)

Browsing and apps built on chromium will probably make up quite a good chunk of the user scenarios. Modern browsers can scale to 6+ cores, but what is more important for browser speed is ST performance of the cores being used. This is in stark contrast with Cinebench, where available throughput is all that matters, because software scaling is... well... embarrassing :p

For the upper range of TDP covered by LNL it would be nice if it came with something like 4+8 (my favorite would still be 6+4, with a better P core), but the NPU stole the rest of the pizza, sorry.

View attachment 100557
Let's think of that die shot.
-Take out the P cores
-Take out the NPUs

There's probably enough room left to put a 20 Xe core monster in there. So much for "AI revolution". 20 Xe cores is 320 EUs in old Intel terminology. Skymont is more than fast enough to feed such a GPU.
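
The Xe-to-EU conversion used here works out to 16 EUs per Xe core (320 / 20). A tiny sketch of that rule of thumb follows; the constant is inferred from this post's own numbers rather than taken from an Intel spec, and the function name is illustrative.

```python
# Rule of thumb inferred from the post: 20 Xe cores ~ 320 EUs.
EUS_PER_XE_CORE = 16


def xe_cores_to_eus(xe_cores: int) -> int:
    """Rough EU-equivalent count for a given number of Xe cores."""
    return xe_cores * EUS_PER_XE_CORE


# Hypothetical 20 Xe part, plus the 8 Xe and 2 Xe configs mentioned in the thread.
for xe in (20, 8, 2):
    print(f"{xe} Xe cores ~ {xe_cores_to_eus(xe)} EUs")
```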