Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
846
799
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die consisting of CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. They are connected through UCIe rather than D2D, a first from Intel. Expect a launch around Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming MediaTek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max Clock (CPU) | 3.8 GHz | ? | 5 GHz | – |
| L3 Cache | 6 MB | ? | 12 MB | – |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Memory | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | – | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | – | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| Max Clock (GPU) | 1.25 GHz | – | 2 GHz | – |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
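
For a rough cross-check of the bandwidth row: peak bandwidth is just bus width times transfer rate. A minimal sketch (configs from the memory row above; labels and rounding are my own):

```python
# Back-of-the-envelope peak bandwidth: bus width (bytes) x transfer rate (MT/s).
# Configs come from the memory row above; labels and rounding are my own.
configs = {
    "ADL-N (64-bit LPDDR5-4800)":   (64, 4800),
    "WCL (64-bit LPDDR5-6800 ?)":   (64, 6800),
    "LNL (128-bit LPDDR5X-8533)":   (128, 8533),
    "D9500 (64-bit LPDDR5X-10667)": (64, 10667),
}

for name, (bus_bits, mts) in configs.items():
    gb_s = bus_bits / 8 * mts / 1000  # GB/s
    print(f"{name}: ~{gb_s:.1f} GB/s")
```

That lines up with the ~55 GB/s, 136 GB/s and 85.6 GB/s entries, and would put ADL-N at roughly 38 GB/s.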

As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent the new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, the first from Intel to use GAA transistors, called RibbonFET.




Attachments

  • PantherLake.png
  • LNL.png
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
  • Clockspeed.png
Last edited:

majord

Senior member
Jul 26, 2015
509
711
136
Yeah, for all the Zen5% memes, PPA is still pretty solid on these even accounting for the bulkiest SIMD implementation among the modern CPU cores.

Now if only someone slaughtered a poor, poor Turin-D ES to find out the Z5c@N3E area.

someone just needs to get one to Fritz and it will happen!
 

Meteor Late

Senior member
Dec 15, 2023
299
323
96
It's on a better node, so it's kind of a given that it will be smaller. The performance improvements are pretty mundane as well.

The laptop version is always faster, so a laptop chip based on the 8 Elite is going to be around 20% faster than the X Elite in SC if the 8 Elite is 10% faster; there is usually around a 10% frequency increase going from Apple's smartphone chips to their M chips, and the two ~10% gains compound to roughly 20%.
 

Meteor Late

Senior member
Dec 15, 2023
299
323
96
My estimate puts the Lion Cove P core around 25% larger than an equivalent Zen 5 P core (iso-node & without L2). And that's horrible considering there is no extra performance to justify the extra silicon. Lion Cove doesn't just suck... it sucks on a whole new level! It's fatter & slower than the other two.

Some napkin math (if all of them were on N3B):
  • Lion Cove - 3.4 mm²
  • Zen 5 P core - 2.74 mm²
  • Apple M4 P core - 2.8 mm²
Another interesting observation: the Zen 5 P core appears to be slightly smaller than the Apple M4 P core (given the same node & without L2).
Well yeah but M4 P core is much better than Zen 5 core and consumes much less power, so it makes sense.

An interesting comparison will be Oryon P core vs Zen 5 core, that core is barely above 2 mm2 IIRC.
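
For reference, a quick sketch making the quoted napkin math explicit, using the iso-node (N3B, no L2) estimates above:

```python
# Relative core sizes from the quoted iso-node (N3B, no L2) estimates above.
areas_mm2 = {"Lion Cove": 3.4, "Zen 5": 2.74, "Apple M4 P": 2.8}

zen5 = areas_mm2["Zen 5"]
for core, area in areas_mm2.items():
    print(f"{core}: {area:.2f} mm^2 ({(area / zen5 - 1) * 100:+.0f}% vs Zen 5)")
```

That is where the ~25% figure and the "slightly smaller than M4" observation come from.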
 
Last edited:

trivik12

Senior member
Jan 26, 2006
350
318
136
Raichu is saying there is a change to the front end (FE) for Darkmont that will result in 3-5% IPC improvements.
 

511

Diamond Member
Jul 12, 2024
4,523
4,144
106
Well yeah but M4 P core is much better than Zen 5 core and consumes much less power, so it makes sense.

An interesting comparison will be Oryon P core vs Zen 5 core, that core is barely above 2 mm2 IIRC.
Zen 5 dedicates a lot of area to AVX-512, so the comparison isn't fair to Zen 5; its integer performance might lag, but not FP/SIMD.
 

OneEng2

Senior member
Sep 19, 2022
840
1,105
106
Wow. You guys jumped all over my discussion vector. This is exactly where I was going with the original question. Awesome information in this thread, and very interesting consequences as well.

So AMD is able to be very competitive across many markets with a design on a less dense, less expensive node. That makes the Zen 5 design very impressive in my book, and Lion Cove ..... just sad, guys. WTH? How is it even possible for Lion Cove to be this bad in comparison?
SoC area comparison (Measurements my own).

View attachment 110839

Notes
- Lunar Lake and M4 are on 3nm, whereas X Elite and Strix Point are on 4nm. So areas are not directly comparable between them.
- All numbers are in mm²
- Core areas marked with an asterisk (*) include the private L2 cache
- Lunar Lake SoC area is the N3B Compute Tile
- Apple M4 NPU area is suspiciously small, but I have double checked with their iPhone SoCs, and they also have ~5 mm² NPUs

Sources
- Lunar Lake and Strix Point die shot annotations by Nemez
- M4 die shot annotation by Frederic Orange
- X Elite die shot annotation by Piglin
Great summary! This is what I was gearing up to do with my original question! Great chart.
True. That makes AMD's core super impressive, and it's on N4P. Zen 6 on N3E/P is going to be great to look forward to.
It definitely gives some much needed context to many of our design discussions in this and other threads.
My estimate puts the Lion Cove P core around 25% larger than an equivalent Zen 5 P core (iso-node & without L2). And that's horrible considering there is no extra performance to justify the extra silicon. Lion Cove doesn't just suck... it sucks on a whole new level! It's fatter & slower than the other two.

Some napkin math (if all of them were on N3B):
  • Lion Cove - 3.4 mm²
  • Zen 5 P core - 2.74 mm²
  • Apple M4 P core - 2.8 mm²
Another interesting observation: the Zen 5 P core appears to be slightly smaller than the Apple M4 P core (given the same node & without L2).
Indeed! Now the only thing missing from your observations is a cross-tab of the major features for each core (SMT, AVX-512, etc.).

I keep hearing how impressive the M4 and M3 are; however, neither one has SMT and neither one supports AVX-512 (or 256-bit for that matter, I believe).

Considering what they do well though, it seems to come down to an argument about how specific you want your CPU design to be to a particular market segment.

Seems like the M3/4 design is uniquely qualified for thin-and-light laptops and tablets, but totally useless for a DC processor design.

Zen 5, on the other hand, seems to cover bases up and down the market chain, but is not as good as the M3/4 where the M3/4 is strongest. That makes a great deal of sense to me, since AMD is targeting the high-margin (and rapidly growing) DC market where the M3/4 are not (as far as I know, anyway).
Well yeah but M4 P core is much better than Zen 5 core and consumes much less power, so it makes sense.

An interesting comparison will be Oryon P core vs Zen 5 core, that core is barely above 2 mm2 IIRC.
Better where? Better how?
 
  • Like
Reactions: Tlh97 and Joe NYC

poke01

Diamond Member
Mar 8, 2022
4,202
5,552
106
Better where? Better how?
In apps where AVX-512 isn't used, the M4 core is better. Essentially, in non-SIMD tasks it's great.

It also delivers much higher single-threaded performance while using much less power.
Like you said, for the target market, i.e. laptops, they are great.
 

cannedlake240

Senior member
Jul 4, 2024
247
138
76
Raichu is saying there is a change to the front end (FE) for Darkmont that will result in 3-5% IPC improvements.
Clearwater will be a decent jump over SRF and GNR-AP. Makes the rumor about the Atom-line Xeons/RRF being prematurely canned look more questionable... Unfortunately Pat hinted at this on the earnings call; apparently they don't like the 'complexity' of the dual-track P/E Xeon lineup.
 

Meteor Late

Senior member
Dec 15, 2023
299
323
96
I mean, if one can do a P core and an E core, one can also do an AVX-512 core and a non-AVX-512 core.
AVX-512 is more or less irrelevant in the laptop and desktop segments; I prefer Apple's approach if one has to choose.
 
  • Like
Reactions: Tlh97 and Gideon

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
If this is the big plan Intel came up with to catch up with V-Cache, it looks like an inelegant, brute-force approach.

Bloating the compute die, on the most expensive node, with 144 MB of SRAM is not going to be very cost-efficient.

JFC… here we go again. For a Western fab that claims to have packaging prowess, it sure is sad that they couldn't do something better than simply brute-forcing more SRAM onto the compute die.

I'm also not even sure how much benefit 144 MB of SRAM on the same compute tile will bring given diminishing returns, since there's a latency trade-off with larger caches, and this cache will be further away from the cores since it's all on the same die, vs. 3D stacking with vias where the cache is physically closer.
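
On the diminishing-returns point, the usual rule of thumb is that miss rate scales roughly with the inverse square root of capacity. A minimal sketch, with a purely hypothetical 36 MB / 10% baseline rather than measurements of any Intel design:

```python
# Illustrative only: power-law rule of thumb, miss_rate ~ capacity ** -0.5.
# The 36 MB / 10% miss-rate baseline is hypothetical, not an Intel figure.
base_mb, base_miss = 36, 0.10

for mb in (36, 72, 144):
    miss = base_miss * (base_mb / mb) ** 0.5
    print(f"{mb:>3} MB LLC -> miss rate ~{miss:.3f} ({miss / base_miss:.2f}x baseline)")
```

Under that rule, quadrupling capacity only halves the miss rate, while every extra megabyte costs the same expensive die area, before even accounting for the latency hit of a physically larger, farther-away cache.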

 

poke01

Diamond Member
Mar 8, 2022
4,202
5,552
106
JFC… here we go again. For a Western fab that claims to have packaging prowess, it sure is sad that they couldn't do something better than simply brute-forcing more SRAM onto the compute die.

I'm also not even sure how much benefit 144 MB of SRAM on the same compute tile will bring given diminishing returns, since there's a latency trade-off with larger caches, and this cache will be further away from the cores since it's all on the same die, vs. 3D stacking with vias where the cache is physically closer.

There is also this comment.
 

511

Diamond Member
Jul 12, 2024
4,523
4,144
106
Clearwater will be a decent jump over SRF and GNR-AP. Makes the rumor about the Atom-line Xeons/RRF being prematurely canned look more questionable... Unfortunately Pat hinted at this on the earnings call; apparently they don't like the 'complexity' of the dual-track P/E Xeon lineup.
He meant the complexity of multiple platforms for validation, like 6700P/E and 6900P/E; they would make a common P/E platform instead.
 

Thunder 57

Diamond Member
Aug 19, 2007
4,026
6,741
136
Or....

Intel is using a new tech for the 144MB rather than 6T SRAM.

The Darkmont changes for branch prediction sound similar to the ones Zen 5 got, by the way. Still lots of innovations coming from the E-core team.

I thought they used 8T SRAM?
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
I thought they used 8T SRAM?
That's for the lower cache levels. 8T is 33% larger, and the lower cache levels are larger even at the same "T" because they're optimized for speed, leakage, and latency; then on top of that they use 8T.

I think it was during the Nehalem era that they said they started using 8T, and the idea stuck. Kind of like how the mass media thought Intel Gen graphics were PowerVR because they used tile rendering, when PowerVR used deferred rendering while Intel used immediate mode.
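
The 33% figure falls straight out of the cell topology, assuming area scales roughly with transistors per cell; in practice it is a lower bound, since the extra read port also adds wiring:

```python
# 8T adds a dedicated read port (2 extra transistors) to the 6T cell.
# Assuming area scales with transistor count (ignoring the extra read word/bit lines):
t6, t8 = 6, 8
print(f"8T vs 6T cell area overhead: ~{(t8 / t6 - 1) * 100:.0f}%")  # ~33%
```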
 
Last edited:

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
So, is that a 16 MB wad for the core L3 and a 128 MB L4? Or is that just 12 MB slices at 12 stations?

Probably 12 MB at 12 stations. There's nothing preventing Intel from doing so; L3 latency mostly comes from the ring, not the cache array, so L3 latency is fine. The only reason they aren't doing large L3 caches is that it needs bigger silicon, which drives costs up. For Intel's internal process that's a fine strategy; it's actually strange they haven't done it earlier.
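
To put a rough number on "latency mostly comes from the ring": a minimal sketch for a 12-stop bidirectional ring with addresses hashed evenly across slices. The one-cycle-per-hop and 4-cycle array figures are illustrative assumptions, not Intel numbers:

```python
# Average hop count to a uniformly random L3 slice on a bidirectional ring.
# Per-hop cost and array latency below are illustrative assumptions only.
N = 12                                                 # ring stops / L3 slices
avg_hops = sum(min(d, N - d) for d in range(N)) / N    # = 3.0 for N = 12

cycles_per_hop, array_cycles = 1, 4                    # assumed values
ring_cycles = 2 * avg_hops * cycles_per_hop            # request + response traversal
print(f"avg hops: {avg_hops}, ring: ~{ring_cycles:.0f} cycles, array: ~{array_cycles} cycles")
```

Even with those generous assumptions, the interconnect round trip is already of the same order as the array access, which is why growing the slices does not have to blow up L3 latency the way it would if the array dominated.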
 
  • Like
Reactions: lightmanek

cannedlake240

Senior member
Jul 4, 2024
247
138
76
Probably 12 MB at 12 stations. There's nothing preventing Intel from doing so; L3 latency mostly comes from the ring, not the cache array, so L3 latency is fine. The only reason they aren't doing large L3 caches is that it needs bigger silicon, which drives costs up. For Intel's internal process that's a fine strategy; it's actually strange they haven't done it earlier.
Really? So they can just increase the L3 by 4x without incurring a massive latency penalty? V-Cache is a 4-5 cycle penalty, for instance.