Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
846
799
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die consisting of CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. They are connected through UCIe rather than D2D, a first from Intel. Expect a launch around Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming MediaTek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max Clock (CPU) | 3.8 GHz | ? | 5 GHz | – |
| L3 Cache | 6 MB | ? | 12 MB | – |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Memory | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | – | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | – | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| Max Clock (GPU) | 1.25 GHz | – | 2 GHz | – |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
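
For a rough cross-check of the bandwidth row: peak bandwidth is just bus width times transfer rate. A minimal sketch (configs from the memory row above; labels and rounding are my own):

```python
# Back-of-the-envelope peak bandwidth: bus width (bytes) x transfer rate (MT/s).
# Configs come from the memory row above; labels and rounding are my own.
configs = {
    "ADL-N (64-bit LPDDR5-4800)":   (64, 4800),
    "WCL (64-bit LPDDR5-6800 ?)":   (64, 6800),
    "LNL (128-bit LPDDR5X-8533)":   (128, 8533),
    "D9500 (64-bit LPDDR5X-10667)": (64, 10667),
}

for name, (bus_bits, mts) in configs.items():
    gb_s = bus_bits / 8 * mts / 1000  # GB/s
    print(f"{name}: ~{gb_s:.1f} GB/s")
```

That lines up with the ~55 GB/s, 136 GB/s and 85.6 GB/s entries, and would put ADL-N at roughly 38 GB/s.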

As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent the new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, the first from Intel to use GAA transistors, called RibbonFET.




Attachments

  • PantherLake.png
  • LNL.png
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
  • Clockspeed.png
Last edited:

majord

Senior member
Jul 26, 2015
509
711
136
Yeah, for all the Zen5% memes, PPA is still pretty solid on these even accounting for the bulkiest SIMD implementation among the modern CPU cores.

Now if only someone slaughtered a poor, poor Turin-D ES to find out the Z5c@N3E area.

someone just needs to get one to Fritz and it will happen!
 

Meteor Late

Senior member
Dec 15, 2023
299
323
96
It's on a better node, so it's kind of a given that it will be smaller. The performance improvements are pretty mundane as well.

The laptop version is always faster, so a laptop chip based on the 8 Elite is going to be around 20% faster than the X Elite in SC if the 8 Elite is 10% faster; there is usually around a 10% frequency increase going from Apple's smartphone chips to their M chips, and the two ~10% gains compound to roughly 20%.
 

Meteor Late

Senior member
Dec 15, 2023
299
323
96
My estimate puts the Lion Cove P core around 25% larger than an equivalent Zen 5 P core (iso-node & without L2). And that's horrible considering there is no extra performance to justify the extra silicon. Lion Cove doesn't just suck... it sucks on a whole new level! It's fatter & slower than the other two.

Some napkin math (if all of them were on N3B):
  • Lion Cove - 3.4 mm²
  • Zen 5 P core - 2.74 mm²
  • Apple M4 P core - 2.8 mm²
Another interesting observation: the Zen 5 P core appears to be slightly smaller than the Apple M4 P core (given the same node & without L2).
Well yeah but M4 P core is much better than Zen 5 core and consumes much less power, so it makes sense.

An interesting comparison will be Oryon P core vs Zen 5 core, that core is barely above 2 mm2 IIRC.
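
For reference, a quick sketch making the quoted napkin math explicit, using the iso-node (N3B, no L2) estimates above:

```python
# Relative core sizes from the quoted iso-node (N3B, no L2) estimates above.
areas_mm2 = {"Lion Cove": 3.4, "Zen 5": 2.74, "Apple M4 P": 2.8}

zen5 = areas_mm2["Zen 5"]
for core, area in areas_mm2.items():
    print(f"{core}: {area:.2f} mm^2 ({(area / zen5 - 1) * 100:+.0f}% vs Zen 5)")
```

That is where the ~25% figure and the "slightly smaller than M4" observation come from.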
 
Last edited:

trivik12

Senior member
Jan 26, 2006
350
318
136
Raichu is saying there is a change to the front end (FE) for Darkmont that will result in 3-5% IPC improvements.
 

511

Diamond Member
Jul 12, 2024
4,523
4,144
106
Well yeah but M4 P core is much better than Zen 5 core and consumes much less power, so it makes sense.

An interesting comparison will be Oryon P core vs Zen 5 core, that core is barely above 2 mm2 IIRC.
Zen 5 dedicates a lot of area to AVX-512, so the comparison isn't fair to Zen 5; its integer performance might lag, but not FP/SIMD.
 

OneEng2

Senior member
Sep 19, 2022
840
1,105
106
Wow. You guys jumped all over my discussion vector. This is exactly where I was going with the original question. Awesome information in this thread, and very interesting consequences as well.

So AMD is able to be very competitive across many markets with a design on a less dense, less expensive node. That makes the Zen 5 design very impressive in my book, and Lion Cove ..... just sad, guys. WTH? How is it even possible for Lion Cove to be this bad in comparison?
SoC area comparison (Measurements my own).

View attachment 110839

Notes
- Lunar Lake and M4 are on 3nm, whereas X Elite and Strix Point are on 4nm. So areas are not directly comparable between them.
- All numbers are in mm²
- Core areas marked with an asterisk (*) include the private L2 cache
- Lunar Lake SoC area is the N3B Compute Tile
- Apple M4 NPU area is suspiciously small, but I have double checked with their iPhone SoCs, and they also have ~5 mm² NPUs

Sources
- Lunar Lake and Strix Point die shot annotations by Nemez
- M4 die shot annotation by Frederic Orange
- X Elite die shot annotation by Piglin
Great summary! This is what I was gearing up to do with my original question! Great chart.
True. That makes AMD's core super impressive, and it's on N4P. Zen 6 on N3E/P is going to be great to look forward to.
It definitely gives some much needed context to many of our design discussions in this and other threads.
My estimate puts the Lion Cove P core around 25% larger than an equivalent Zen 5 P core (iso-node & without L2). And that's horrible considering there is no extra performance to justify the extra silicon. Lion Cove doesn't just suck... it sucks on a whole new level! It's fatter & slower than the other two.

Some napkin math (if all of them were on N3B):
  • Lion Cove - 3.4 mm²
  • Zen 5 P core - 2.74 mm²
  • Apple M4 P core - 2.8 mm²
Another interesting observation: the Zen 5 P core appears to be slightly smaller than the Apple M4 P core (given the same node & without L2).
Indeed! Now the only thing missing from your observations is a cross-tab of the major features for each core (SMT, AVX-512, etc.).

I keep hearing how impressive the M4 and M3 are; however, neither one has SMT and neither one supports AVX-512 (or 256-bit for that matter, I believe).

Considering what they do well though, it seems to come down to an argument about how specific you want your CPU design to be to a particular market segment.

Seems like the M3/4 design is uniquely qualified for thin-and-light laptops and tablets, but totally useless for a DC processor design.

Zen 5, on the other hand, seems to cover bases up and down the market chain, but is not as good as the M3/4 where the M3/4 is strongest. That makes a great deal of sense to me, since AMD is targeting the high-margin (and rapidly growing) DC market where the M3/4 are not (as far as I know, anyway).
Well yeah but M4 P core is much better than Zen 5 core and consumes much less power, so it makes sense.

An interesting comparison will be Oryon P core vs Zen 5 core, that core is barely above 2 mm2 IIRC.
Better where? Better how?
 
  • Like
Reactions: Tlh97 and Joe NYC

poke01

Diamond Member
Mar 8, 2022
4,202
5,552
106
Better where? Better how?
In apps where AVX-512 isn't used, the M4 core is better. Essentially, in non-SIMD tasks it's great.

It also delivers much higher single-threaded performance while using much less power.
Like you said, for the target market, i.e. laptops, they are great.
 

cannedlake240

Senior member
Jul 4, 2024
247
138
76
Raichu is saying there is a change to the front end (FE) for Darkmont that will result in 3-5% IPC improvements.
Clearwater will be a decent jump over SRF and GNR-AP. Makes the rumor about the Atom-line Xeons/RRF being prematurely canned look more questionable... Unfortunately Pat hinted at this on the earnings call; apparently they don't like the 'complexity' of the dual-track P/E Xeon lineup.
 

Meteor Late

Senior member
Dec 15, 2023
299
323
96
I mean, if one can do a P core and an E core, one can also do an AVX-512 core and a non-AVX-512 core.
AVX-512 is more or less irrelevant in the laptop and desktop segments; I prefer Apple's approach if one has to choose.
 
  • Like
Reactions: Tlh97 and Gideon

Saylick

Diamond Member
Sep 10, 2012
4,036
9,456
136
If this is the big plan Intel came up with to catch up with V-Cache, it looks like an inelegant, brute-force approach.

Bloating the compute die, on the most expensive node, with 144 MB of SRAM is not going to be very cost-efficient.

JFC… here we go again. For a Western fab that claims to have packaging prowess, it sure is sad that they couldn't do something better than simply brute-forcing more SRAM onto the compute die.

I'm also not even sure how much benefit 144 MB of SRAM on the same compute tile will bring given diminishing returns, since there's a latency trade-off with larger caches, and this cache will be further away from the cores since it's all on the same die, vs. 3D stacking with vias where the cache is physically closer.
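
On the diminishing-returns point, the usual rule of thumb is that miss rate scales roughly with the inverse square root of capacity. A minimal sketch, with a purely hypothetical 36 MB / 10% baseline rather than measurements of any Intel design:

```python
# Illustrative only: power-law rule of thumb, miss_rate ~ capacity ** -0.5.
# The 36 MB / 10% miss-rate baseline is hypothetical, not an Intel figure.
base_mb, base_miss = 36, 0.10

for mb in (36, 72, 144):
    miss = base_miss * (base_mb / mb) ** 0.5
    print(f"{mb:>3} MB LLC -> miss rate ~{miss:.3f} ({miss / base_miss:.2f}x baseline)")
```

Under that rule, quadrupling capacity only halves the miss rate, while every extra megabyte costs the same expensive die area, before even accounting for the latency hit of a physically larger, farther-away cache.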

 

poke01

Diamond Member
Mar 8, 2022
4,202
5,552
106
JFC… here we go again. For a Western fab that claims to have packaging prowess, it sure is sad that they couldn't do something better than simply brute-forcing more SRAM onto the compute die.

I'm also not even sure how much benefit 144 MB of SRAM on the same compute tile will bring given diminishing returns, since there's a latency trade-off with larger caches, and this cache will be further away from the cores since it's all on the same die, vs. 3D stacking with vias where the cache is physically closer.

There is also this comment.
 

511

Diamond Member
Jul 12, 2024
4,523
4,144
106
Clearwater will be a decent jump over SRF and GNR-AP. Makes the rumor about the Atom-line Xeons/RRF being prematurely canned look more questionable... Unfortunately Pat hinted at this on the earnings call; apparently they don't like the 'complexity' of the dual-track P/E Xeon lineup.
He meant the complexity of multiple platforms for validation, like 6700P/E and 6900P/E; they would make a common P/E platform instead.
 

Thunder 57

Diamond Member
Aug 19, 2007
4,026
6,741
136
Or....

Intel is using a new tech for the 144MB rather than 6T SRAM.

The Darkmont changes for branch prediction sound similar to the ones Zen 5 got, by the way. Still lots of innovations coming from the E-core team.

I thought they used 8T SRAM?
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
I thought they used 8T SRAM?
That's for the lower cache levels. 8T is 33% larger, and the lower cache levels are larger even at the same "T" because they're optimized for speed, leakage, and latency; then on top of that they use 8T.

I think it was during the Nehalem era that they said they started using 8T, and the idea stuck. Kind of like how the mass media thought Intel Gen graphics were PowerVR because they used tile rendering, when PowerVR used deferred rendering while Intel used immediate mode.
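
The 33% figure falls straight out of the cell topology, assuming area scales roughly with transistors per cell; in practice it is a lower bound, since the extra read port also adds wiring:

```python
# 8T adds a dedicated read port (2 extra transistors) to the 6T cell.
# Assuming area scales with transistor count (ignoring the extra read word/bit lines):
t6, t8 = 6, 8
print(f"8T vs 6T cell area overhead: ~{(t8 / t6 - 1) * 100:.0f}%")  # ~33%
```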
 
Last edited:

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
So, is that a 16 MB wad for the core L3 and a 128 MB L4? Or is that just 12 MB slices at 12 stations?

Probably 12 MB at 12 stations. There's nothing preventing Intel from doing so; L3 latency mostly comes from the ring, not the cache array, so L3 latency is fine. The only reason they aren't doing large L3 caches is that it needs bigger silicon, which drives costs up. For Intel's internal process that's a fine strategy; it's actually strange they haven't done it earlier.
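
To put a rough number on "latency mostly comes from the ring": a minimal sketch for a 12-stop bidirectional ring with addresses hashed evenly across slices. The one-cycle-per-hop and 4-cycle array figures are illustrative assumptions, not Intel numbers:

```python
# Average hop count to a uniformly random L3 slice on a bidirectional ring.
# Per-hop cost and array latency below are illustrative assumptions only.
N = 12                                                 # ring stops / L3 slices
avg_hops = sum(min(d, N - d) for d in range(N)) / N    # = 3.0 for N = 12

cycles_per_hop, array_cycles = 1, 4                    # assumed values
ring_cycles = 2 * avg_hops * cycles_per_hop            # request + response traversal
print(f"avg hops: {avg_hops}, ring: ~{ring_cycles:.0f} cycles, array: ~{array_cycles} cycles")
```

Even with those generous assumptions, the interconnect round trip is already of the same order as the array access, which is why growing the slices does not have to blow up L3 latency the way it would if the array dominated.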
 
  • Like
Reactions: lightmanek

cannedlake240

Senior member
Jul 4, 2024
247
138
76
Probably 12 MB at 12 stations. There's nothing preventing Intel from doing so; L3 latency mostly comes from the ring, not the cache array, so L3 latency is fine. The only reason they aren't doing large L3 caches is that it needs bigger silicon, which drives costs up. For Intel's internal process that's a fine strategy; it's actually strange they haven't done it earlier.
Really? So they can just increase the L3 by 4x without incurring a massive latency penalty? V-Cache is a 4-5 cycle penalty, for instance.