Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
854
804
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. The tiles are connected through UCIe, not D2D; a first from Intel. Launch is expected in Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming MediaTek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18-A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| CPU Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Memory Size | 16 GB | ? | 32 GB | 24 GB ? |
| Memory Bandwidth | | ~ 55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| GPU Max Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
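The bandwidth row can be sanity-checked from the memory row. A minimal sketch, assuming theoretical peak bandwidth is simply bus width in bytes times transfer rate, with 1 GB = 10^9 bytes; the function name and the dictionary labels are made up for illustration, and the WCL memory speed is the unconfirmed figure from the table:

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s
    return bytes_per_transfer * transfer_rate_mts / 1000


# (bus width in bits, transfer rate in MT/s), taken from the table's Memory row
configs = {
    "ADL-N, 64-bit LPDDR5-4800": (64, 4800),
    "WCL, 64-bit LPDDR5-6800 (unconfirmed)": (64, 6800),
    "LNL, 128-bit LPDDR5X-8533": (128, 8533),
    "D9500, 64-bit LPDDR5X-10667": (64, 10667),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

These are peak numbers only; sustained bandwidth in real workloads will be lower, and the table's figures appear to be rounded slightly differently.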






PPT1.jpg
PPT2.jpg
PPT3.jpg



With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



LNL-MX.png
 

Attachments

  • PantherLake.png
  • LNL.png
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
  • Clockspeed.png

mzocyteae

Member
Dec 29, 2020
26
19
81
The 2+8 ADL-P and ADL-U chips do just fine in most general use cases. 2 P cores is plenty for web browsing and light usage. It's not amazing, but, having used them for work, it's not hampering me in any way.
Browsers have lots of threads/processes and workloads are irregular.
2P versus 0P will make little or no difference unless there is some steady workload (usually JS + WebGL, but then 2P is probably insufficient).
 

mzocyteae

Member
Dec 29, 2020
26
19
81
Falling behind at high perf levels is normal, as currently they are built with a different performance target in mind (both core layout and cache structure are aimed at PPA). You can think of this as the reverse of what happened to Zen and Zen C: when optimized for PPA, the Zen core loses efficiency at high clocks (and clocks lower too). So in theory, if you wanted to make a P core out of Skymont, you would optimize the layout to ensure better voltage scaling at 4-5 GHz and give it access to proper L2/L3. The core would be bigger and use a bit more power at lower perf levels, but would scale much better at high clocks.

I think folks in the forum should talk less about replacing P cores with E cores, and more about replacing Cove with Mont (arch families). P and E are roles, and they can even be played by the same arch with some tweaks (as shown by AMD). A real-world product based on the pedigree of the Mont cores would benefit from a properly planned architecture, with this target in mind from the moment the core is planned. Ideally we would want a core close in size to the latest Coves, but with performance to justify the area.
Coves aren't that big -- any core will be similarly big with a big cache (under similar process/library).
Maybe Intel should consider Telum 2's shared L2/3 approach (plus vertical cache), which looks pretty good on numbers.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,525
3,220
136
While the numbers are empirically measured, you aren't necessarily comparing apples to apples. The Lion Cove cores, as implemented in Arrow Lake, have different frequency targets, and as such, have more transistors added to help with getting there. In addition, they have more blank space between transistors on critical paths to assist with speed as well.

A more accurate comparison would be to compare the number of core logic transistors that it takes for each one to function. Unfortunately, Intel no longer releases those numbers. I suspect that, were one to add the needed transistors for frequency scaling to the Mont cores and also intentionally add buffer space to critical paths, they would find that the size comparison is much closer. Also, keep in mind, Lion Cove is carrying unused functional blocks for SMT and AVX-512 as the core will be reused in servers.
 

mzocyteae

Member
Dec 29, 2020
26
19
81
From what I gather, (without L2, L3, etc.), the sizes of the core (with just L1) are:
  • Lion Cove - 3.4 mm²
  • Skymont - 1.15 mm²
The size ratio of Skymont to Lion Cove is 1 : 3 which is massive! A Lion Cove is like 3X bigger than a Skymont!
This is what you wrote:
Actually, coves are big. They're super big compared to M3, Skymont, Zen 5, Zen 5C, etc.
M3 is around 2.5 mm², so Lion Cove is about 1.36x that, which is normally not in the range of "super big".
Skymont doesn't reach the performance level of Lion Cove, so the comparison is moot.
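For reference, the ratios being argued over can be reproduced from the quoted areas. A minimal sketch, assuming the core-only (L1 included, no L2/L3) figures given in this exchange; the dictionary and helper names are hypothetical, and the areas are the posters' estimates rather than official numbers:

```python
# Core areas in mm^2 as quoted in the posts above (estimates, not official data)
core_area_mm2 = {
    "Lion Cove": 3.4,
    "Skymont": 1.15,
    "M3": 2.5,
}


def area_ratio(bigger: str, smaller: str) -> float:
    """How many times larger the first core is than the second."""
    return core_area_mm2[bigger] / core_area_mm2[smaller]


print(f"Lion Cove vs Skymont: {area_ratio('Lion Cove', 'Skymont'):.2f}x")
print(f"Lion Cove vs M3:      {area_ratio('Lion Cove', 'M3'):.2f}x")
```

So the "3X" claim against Skymont and the "1.36x" claim against M3 are both consistent with the same 3.4 mm² figure; the disagreement is about which comparison is the fair one.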
 

LightningZ71

Platinum Member
Mar 10, 2017
2,525
3,220
136
It's not.
Woah... I thought the "dirty little secret" with Lion Cove was that they have a full SMT implementation in the cores as lithographed in Lunar and Arrow Lake CPU CCDs, and that it wasn't enabled because validation couldn't be completed by their ship-to-market deadlines. It's been discussed on the forum several times... I thought I read the same for AVX-512 being in the core, but not enabled on client.
 

511

Diamond Member
Jul 12, 2024
4,596
4,217
106
Someone needs to do an autopsy on ARL silicon to verify it. Btw, if it had HT and AVX-512 and they were disabled in silicon, that would have been a plus point that might justify this area.
 

DavidC1

Golden Member
Dec 29, 2023
1,885
3,033
96
While the numbers are empirically measured, you aren't necessarily comparing apples to apples. The Lion Cove cores, as implemented in Arrow Lake, have different frequency targets, and as such, have more transistors added to help with getting there. In addition, they have more blank space between transistors on critical paths to assist with speed as well.
Yes, but it doesn't take 3x the size for the mediocre gains over Skymont. Some tests show even gaming performs better with most of the Lion Cove cores off.

And a more straightforward comparison is against AMD. Less dense process but the chip clocks just as high, supports SMT, and is slightly faster, while being a smaller core too.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,525
3,220
136
I'm just discussing the approach differences between Skymont and Lion Cove. One is aimed for higher clocks, particularly sustained higher clocks, and that takes more transistors and effective density sacrifices. I'm not saying that Lion is particularly area efficient in and of itself, just that it does have different targets and, to my personal understanding, transistors present for unexposed features.

The actual functional transistor count difference between the two is not as great as implementation area makes it seem.
 

GTracing

Senior member
Aug 6, 2021
478
1,114
106
I'm just discussing the approach differences between Skymont and Lion Cove. One is aimed for higher clocks, particularly sustained higher clocks, and that takes more transistors and effective density sacrifices. I'm not saying that Lion is particularly area efficient in and of itself, just that it does have different targets and, to my personal understanding, transistors present for unexposed features.

The actual functional transistor count difference between the two is not as great as implementation area makes it seem.
The thing is it's not hard to increase clocks. That'll be the easiest thing to change if the Mont lineage takes over as the P-core. Intel also needs to greatly widen the FPU and add AVX10, improve the L3 cache latency, and get another 30%+ int IPC (which they've been doing every two years, but it's not guaranteed that they can keep up the pace).

But I wouldn't be surprised to see Intel go wider and slower like the Arm cores anyway.
 

DavidC1

Golden Member
Dec 29, 2023
1,885
3,033
96
The thing is it's not hard to increase clocks.
It takes heroic effort beyond the 5 GHz frequency range.

Proof? A 9-stage CPU clocks at 4.5 GHz while a 19-stage one does 5.7 GHz.

All the while:
-The lower clocked chip performs better in absolute performance
-It uses multiples of power to reach those clocks
-Cores are much larger
-Sucks for mobile and server, while being mediocre even for desktops

Never mind chips like Raptor Lake.

It used to be that doubling the stages got you 80%+ gains. How is it worth it now? Unified Core should aim for low-5 GHz at maximum.
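To put numbers on that, here is a rough back-of-the-envelope sketch using only the stage counts and clocks quoted above. It assumes clock speed can be compared per doubling of pipeline depth, which ignores process, voltage, and design differences between the two (unnamed) CPUs, so treat it as illustration rather than measurement; the variable names are invented:

```python
import math

# Figures quoted in the post; which CPUs they refer to is not stated there.
shallow = {"stages": 9, "clock_ghz": 4.5}
deep = {"stages": 19, "clock_ghz": 5.7}

stage_ratio = deep["stages"] / shallow["stages"]        # how much deeper the pipeline is
clock_ratio = deep["clock_ghz"] / shallow["clock_ghz"]  # how much higher it clocks

# Normalise to "clock gain per doubling of pipeline depth" so it can be
# compared with the historical "doubling the stages got you 80%+" claim.
doublings = math.log2(stage_ratio)
gain_per_doubling = clock_ratio ** (1 / doublings)

print(f"Pipeline depth: {stage_ratio:.2f}x")
print(f"Clock speed:    {clock_ratio:.2f}x")
print(f"Clock gain per doubling of stages: {(gain_per_doubling - 1) * 100:.0f}%")
```

With these inputs, a bit more than twice the stages buys only about 27% more clock, which is the gap the post is pointing at.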

As for FP, it should stay 256-bit for client. 512-bit is a waste, and doing it with AVX-512 was a mistake. It should have been AVX3-256.