Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
851
802
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. They are connected through UCIe, not D2D; a first from Intel. Launch is expected in Q2/Computex 2026. In case people don't remember AlderLake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

Spec            | Intel Alder Lake-N | Intel Wildcat Lake      | Intel Lunar Lake        | Mediatek D9500
Launch Date     | Q1-2023            | Q2-2026 ?               | Q3-2024                 | Q3-2025
Model           | Intel N300         | ?                       | Core Ultra 7 268V       | Dimensity 9500 5G
Dies            | 2                  | 2                       | 2                       | 1
Node            | Intel 7 + ?        | Intel 18-A + TSMC N6    | TSMC N3B + N6           | TSMC N3P
CPU             | 8 E-cores          | 2 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores | C1 1+3+4
Threads         | 8                  | 6                       | 8                       | 8
Max Clock (CPU) | 3.8 GHz            | ?                       | 5 GHz                   |
L3 Cache        | 6 MB               | ?                       | 12 MB                   |
TDP             | 7 W                | Fanless ?               | 17 W                    | Fanless
Memory          | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ?    | 128-bit LPDDR5X-8533    | 64-bit LPDDR5X-10667
Size            | 16 GB              | ?                       | 32 GB                   | 24 GB ?
Bandwidth       |                    | ~ 55 GB/s               | 136 GB/s                | 85.6 GB/s
GPU             | UHD Graphics       |                         | Arc 140V                | G1 Ultra
EU / Xe         | 32 EU              | 2 Xe                    | 8 Xe                    | 12
Max Clock (GPU) | 1.25 GHz           |                         | 2 GHz                   |
NPU             | NA                 | 18 TOPS                 | 48 TOPS                 | 100 TOPS ?
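
The Bandwidth row follows from the Memory row: theoretical peak bandwidth is the bus width in bytes times the transfer rate. Here is a minimal Python sketch of that arithmetic, assuming the number in a name like LPDDR5X-8533 is the rate in MT/s and using decimal GB. The function name `peak_bandwidth_gbs` is made up for illustration, and the WCL memory configuration is still unconfirmed.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in decimal GB/s: bytes per transfer * MT/s / 1000."""
    return bus_width_bits / 8 * mega_transfers_per_s / 1000


# (bus width in bits, transfer rate in MT/s), copied from the Memory row.
configs = {
    "ADL-N": (64, 4800),
    "WCL": (64, 6800),      # assumption: this entry is marked "?" in the table
    "LNL": (128, 8533),
    "D9500": (64, 10667),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

This gives about 54.4 GB/s for WCL and 136.5 GB/s for LNL, matching the table. For the D9500 it gives about 85.3 GB/s; the table's 85.6 GB/s would correspond to rounding the rate up to 10700 MT/s.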






PPT1.jpg
PPT2.jpg
PPT3.jpg



With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



LNL-MX.png
 

Jul 27, 2020
28,107
19,174
146
When it comes to better hardware utilization, for better "AI", just using more CPU cores (that have already been there for years) more efficiently would be one area.

Then there is AVX, AVX512. There are more gaming PCs / CPUs that have AVX512 vs NPU.
This remains to be benchmarked. Can 24 CPU threads in ARL outsmart the embedded NPU with more TOPS?

AVX-512 in Ryzens, yes. But Intel will have to wait till AVX10 for their 2nd chance at AVX-512 in consumer desktops. Until then, they gotta depend on the NPU.
 

Hulk

Diamond Member
Oct 9, 1999
5,141
3,734
136
Skylake - SunnyCove
micro-ops(decode + uop cache) from 11 to 11 +0%
Dispatch/Rename from 4 to 5 +25%
execution ports from 8 to 10 +25%
With 2xFP/ALU + 2xALU, 1xS/D + 3xAGU
for 2xFP/ALU + 2xALU, 2xS/D + 4xAGU
IPC average +18%

SunnyCove - GoldenCove
micro-ops(decode + uop cache) from 11 to 14 +27%
Dispatch/Rename from 5 to 6 +20%
execution ports from 10 to 12 +20%
With 2xFP/ALU + 2xALU, 2xS/D + 4xAGU
for 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
FPU+ALU from 4 to 5 +25%
IPC average +19%

GoldenCove - LionCove
micro-ops(decode + uop cache) from 14 to 24 +71.4%
Dispatch/Rename from 6 to 8 +33.3%
execution ports from 12 to 18 +50%
With 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
up to 4xFPU, 6xALU, 2xS/D + 6xAGU
FPU+ALU from 5 to 10 +100%
IPC increase +??%
This is great information. Thank you for taking the time to organize and post it.

What is included in "decode + uop cache" and "dispatch/rename?"

How many decoders? How many uop entries?
Reorder buffers? In-Flight Loads/Stores?

I'm not fully following.
 

Saylick

Diamond Member
Sep 10, 2012
4,052
9,472
136
Skylake - SunnyCove
micro-ops(decode + uop cache) from 11 to 11 +0%
Dispatch/Rename from 4 to 5 +25%
execution ports from 8 to 10 +25%
With 2xFP/ALU + 2xALU, 1xS/D + 3xAGU
for 2xFP/ALU + 2xALU, 2xS/D + 4xAGU
IPC average +18%

SunnyCove - GoldenCove
micro-ops(decode + uop cache) from 11 to 14 +27%
Dispatch/Rename from 5 to 6 +20%
execution ports from 10 to 12 +20%
With 2xFP/ALU + 2xALU, 2xS/D + 4xAGU
for 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
FPU+ALU from 4 to 5 +25%
IPC average +19%

GoldenCove - LionCove
micro-ops(decode + uop cache) from 14 to 24 +71.4%
Dispatch/Rename from 6 to 8 +33.3%
execution ports from 12 to 18 +50%
With 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
up to 4xFPU, 6xALU, 2xS/D + 6xAGU
FPU+ALU from 5 to 10 +100%
IPC increase +??%
WCCFTech be like:
giphy.gif
 

Joe NYC

Diamond Member
Jun 26, 2021
3,655
5,199
136
This remains to be benchmarked. Can 24 CPU threads in ARL outsmart the embedded NPU with more TOPS?

AVX-512 in Ryzens, yes. But Intel will have to wait till AVX10 for their 2nd chance at AVX-512 in consumer desktops. Until then, they gotta depend on the NPU.

Software support for hardware features lags too far behind for a new feature to be a selling point for the hardware.

Intel now talks a lot about a "Centrino moment", but that hardware feature, WiFi, was built into the OS and was available to all applications as part of already widely used networking capabilities.

Software support (for a questionable feature) by 100s of ISVs and game development teams is not happening in the short run.
 

AMDK11

Senior member
Jul 15, 2019
473
407
136
Fix for Skylake-SunnyCove:

Skylake - SunnyCove
micro-ops(decode + uop cache) from 11 to 11 +0%
Dispatch/Rename from 4 to 5 +25%
execution ports from 8 to 10 +25%
With 2xFP/ALU + 2xALU, 1xS/D + 3xAGU
for 3xFP/ALU + 1xALU, 2xS/D + 4xAGU
IPC average +18%

SunnyCove - GoldenCove
micro-ops(decode + uop cache) from 11 to 14 +27%
Dispatch/Rename from 5 to 6 +20%
execution ports from 10 to 12 +20%
With 3xFP/ALU + 1xALU, 2xS/D + 4xAGU
for 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
FPU+ALU from 4 to 5 +25%
IPC average +19%

GoldenCove - LionCove
micro-ops(decode + uop cache) from 14 to 24 +71.4%
Dispatch/Rename from 6 to 8 +33.3%
execution ports from 12 to 18 +50%
With 3xFP/ALU + 2xALU, 2xS/D + 5xAGU
up to 4xFPU, 6xALU, 2xS/D + 6xAGU
FPU+ALU from 5 to 10 +100%
IPC increase +??%
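
The percentages in these lists are plain relative increases, (new - old) / old. A short Python sketch for checking them, using only the widths quoted above; the dictionary layout and the name `pct_increase` are just for illustration:

```python
def pct_increase(old: int, new: int) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


# (old, new) widths per generation step, copied from the lists above.
steps = {
    "Skylake -> SunnyCove": {"uops": (11, 11), "rename": (4, 5), "ports": (8, 10)},
    "SunnyCove -> GoldenCove": {"uops": (11, 14), "rename": (5, 6), "ports": (10, 12)},
    "GoldenCove -> LionCove": {"uops": (14, 24), "rename": (6, 8), "ports": (12, 18)},
}

for step, metrics in steps.items():
    for metric, (old, new) in metrics.items():
        print(f"{step} {metric}: {old} -> {new} = +{pct_increase(old, new):.1f}%")
```

The GoldenCove to LionCove step comes out at +71.4% for uops, +33.3% for rename and +50.0% for ports, as listed.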

Two LionCove diagrams from one LunarLake graphic:

LionCove Core Diagram1.png
 

Ghostsonplanets

Senior member
Mar 1, 2024
774
1,228
96

AMDK11

Senior member
Jul 15, 2019
473
407
136
This is great information. Thank you for taking the time to organize and post it.

What is included in "decode + uop cache" and "dispatch/rename?"

How many decoders? How many uop entries?
Reorder buffers? In-Flight Loads/Stores?

I'm not fully following.
I gave a general summary of decoding + sending from uop Cache because the LionCove diagram is of too low quality and does not specify how much it is for the decoder.

GoldenCove has 14 uops, including 6 from the decoder and 8 from the uop Cache.

LionCove has 24 uops, but it is not 100% sure whether there are 8 from the decoder and 16 from the uop cache or maybe 10 from the decoder and 14 from the uop cache.

I provided the data that can be read from the LionCove diagram. Much is still unknown.


I won't be surprised at all if there is a 10-Way decoder because Skymont has 3x 3-Way, i.e. a total of 9-Way. But it can also be 8-Way.

I think Intel will present the LionCove microarchitecture in June, and as you can see, they already have core diagrams ready for presentation.
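
To make the two candidate LionCove splits concrete, here is a small Python sketch that takes the 24 uops total read off the diagram and shows what uop cache width each decoder guess implies. The decoder widths 8 and 10 are the speculated values from this post, not confirmed figures, and `uop_cache_width` is a hypothetical helper.

```python
TOTAL_UOPS_GOLDEN_COVE = 14  # 6 from the decoder + 8 from the uop cache
TOTAL_UOPS_LION_COVE = 24    # read off the diagram; the split is unknown


def uop_cache_width(total_uops: int, decoder_width: int) -> int:
    """Uop cache delivery width implied by a total and a decoder width."""
    return total_uops - decoder_width


# Sanity check against the known GoldenCove split.
assert uop_cache_width(TOTAL_UOPS_GOLDEN_COVE, 6) == 8

for decoder in (8, 10):  # speculated LionCove decoder widths
    cache = uop_cache_width(TOTAL_UOPS_LION_COVE, decoder)
    print(f"{decoder}-way decode -> {cache} uops from the uop cache")
```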
 

Ghostsonplanets

Senior member
Mar 1, 2024
774
1,228
96
Not sure if this is reliable. Take with a huge grain of salt.

But, supposedly, ARL-U also gets the new N4P iGP tile from ARL/S (64 EU): N4(P?), XMX units added back and higher clocks.

Basically MTL-U/M with smaller Compute and GPU tile (nodelet shrink to Intel 3 and N4(P?) respectively) for higher yields and cheaper costs.

XMX units will probably provide the TOPS throughput for ARL-U to meet AI PC requirements. Cheaper alternative to Lunar Lake for mainstream U-series designs.
 

Saylick

Diamond Member
Sep 10, 2012
4,052
9,472
136
Holy sheet I had never read their comment section, it truly is below Reddit.

Best quote: "You are the noob as there is nothing wrong with userbenchmark."
If you thought reading that Prakhar guy's tweets made you lose brain cells, WTFTech's comments section devolves you to a Neanderthal.

I refuse to take anyone seriously who regularly posts in their comments section, even if I see the same people making reasonable, logical statements on Xitter outside of it. I simply will not respect their opinion because they consciously made a decision to be a willing participant in that cesspool to begin with. It's like if I knew someone was actively participating in a Nazi get-together but was a normal behaving human being outside of it. It doesn't matter how that person behaves in a regular setting; they are a Nazi, regardless.

But what can you do. That comment section is probably where they get more than half their ad impressions, and WCCFTech knows it so they will never moderate it.
 

Joe NYC

Diamond Member
Jun 26, 2021
3,655
5,199
136
I don't get it, how is AVX512 any substitute for an NPU ?

Not exactly a substitute, but as a hardware feature for ISV / game developers to use.

It is a very slow process for a hardware feature to become widely used. It's completely unrealistic to expect to have a game released precisely at the time of ARL release supporting a unique feature of ARL.

Example of a far more consequential hardware feature: the x86-64 instruction set. First introduced in 2003, it took 2 years for the first version to support it in 2005, and then it took another decade for games to start switching to 64-bit.
 

Ghostsonplanets

Senior member
Mar 1, 2024
774
1,228
96
Mainstream is still raptor.
ARL-U is in a very weird position overall, the only boon is platform compatibility with MTL-U.
Right. ARL-U is an odd duck because it still won't be cheap for mainstream, even with the nodelet shrink. But it also can't be priced as premium because the product doesn't justify it.

Intel's lineup will be in a weird position next year

LNL - >$999
ARL-U -> $700 - $900
RPL-U -> $600 and below
 

Hulk

Diamond Member
Oct 9, 1999
5,141
3,734
136
I gave a general summary of decoding + sending from uop Cache because the LionCove diagram is of too low quality and does not specify how much it is for the decoder.

GoldenCove has 14 uops, including 6 from the decoder and 8 from the uop Cache.

LionCove has 24 uops, but it is not 100% sure whether there are 8 from the decoder and 16 from the uop cache or maybe 10 from the decoder and 14 from the uop cache.

I provided the data that can be read from the LionCove diagram. Much is still unknown.


I won't be surprised at all if there is a 10-Way decoder because Skymont has 3x 3-Way, i.e. a total of 9-Way. But it can also be 8-Way.

I think Intel will present the LionCove microarchitecture in June, and as you can see, they already have core diagrams ready for presentation.
Thanks, that makes perfect sense. They are really opening up both the front end and back end of Lion Cove. I still think the IPC increase will be 20% as that seems to always be Intel's target, maybe 25% if things really work out well for them. That 24 uops is a huge increase, which makes me think the front end is currently the bottleneck.
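
For a sense of scale, per-generation IPC gains compound multiplicatively. A rough Python sketch using the +18% and +19% averages quoted in this thread, plus the 20% and 25% guesses for Lion Cove; those last two are speculation, not measured numbers, and `cumulative_gain` is an illustrative helper:

```python
def cumulative_gain(*step_gains_pct: float) -> float:
    """Compound per-generation gains (in percent) into one total gain (in percent)."""
    total = 1.0
    for gain in step_gains_pct:
        total *= 1 + gain / 100
    return (total - 1) * 100


# Skylake -> SunnyCove (+18%) -> GoldenCove (+19%), as quoted in the thread.
print(f"Skylake -> GoldenCove: +{cumulative_gain(18, 19):.1f}%")

# Speculative Lion Cove gains on top of that.
for guess in (20, 25):
    total = cumulative_gain(18, 19, guess)
    print(f"with +{guess}% for LionCove: +{total:.1f}% over Skylake")
```

Under those assumptions Golden Cove sits about 40% above Skylake, and a 20% to 25% Lion Cove gain would put it roughly 69% to 76% above Skylake.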
 

Henry swagger

Senior member
Feb 9, 2022
512
313
106

"From what I hear LNL has good number of design wins in last quater...."

"Not going to put numbers because, earlier design wins itself are high... Having something that big on top of them .... I am not confident enough"

Lunar Lake getting a lot of design wins. It tracks with previous leaks that LNL would have 3x as many design wins as Meteor Lake.
Of course LNL will have design wins. Intel has 88% of market share; even Apple with superior battery life can't eat into it. LNL will be Intel's M1 for x86.
 

DrMrLordX

Lifer
Apr 27, 2000
22,915
12,988
136
All of that seems fine and good, and would certainly negate the need for trained human supervision. I'm perhaps a little jaded on that whole aspect because this is all work I've been trained to do throughout my adult life: finding trends in data, editing photos, concise and clear communications... That's a broadcasting and journalism degree, in so many words. I guess I'm not the target audience, lol.

The goal here is not to necessarily replace you (the operator) performing those functions, but to instead make it easier for you to do the things you already know how to do. Ideally so that someone trained as you are can do the work of 2-3 people, meaning that your boss then gets to lay off some of your colleagues without raising your pay by much, if at all. Isn't that great?

And that is why there's so much hype around NPUs!
 

DavidC1

Golden Member
Dec 29, 2023
1,873
3,000
96
Is it possible that Skymont might get a micro-op cache?
No. If anything uop caches will be sharing the fate of Hyperthreading and be axed in the future.

Uop caches are a remnant of the extreme clock focus of Netburst. Rather than thinking it brings more performance, you should think of it as maintaining performance while allowing it to raise clocks.
I won't be surprised at all if there is a 10-Way decoder because Skymont has 3x 3-Way, i.e. a total of 9-Way. But it can also be 8-Way.

I think Intel will present the LionCove microarchitecture in June, and as you can see, they already have core diagrams ready for presentation.
Raichu and others said 8-way decode for Lion Cove.

Also, for Skymont to be 8-way, it has to be 2x4. Since the point of the cluster approach is to reduce the power/area impact of the decoders, that makes no sense, as Intel said the 3-way cluster minimizes the impact of the decoders. Raichu might have thought it was 8-way in the beginning, but later clarified that it's 3x3-way.
 

AMDK11

Senior member
Jul 15, 2019
473
407
136
Maybe Raichu found out about the width-8 Dispatch/Rename and assumed that the decoder is also 8-Way?
 