Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
851
802
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The two are connected through UCIe rather than D2D, a first from Intel. I'm expecting a launch in Q2 2026, around Computex. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max CPU Clock | 3.8 GHz | ? | 5 GHz | ? |
| L3 Cache | 6 MB | ? | 12 MB | ? |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Size | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | ? | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | ? | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| Max GPU Clock | 1.25 GHz | ? | 2 GHz | ? |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
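The bandwidth figures in the table follow directly from bus width and transfer rate (peak theoretical bandwidth, not sustained). A quick sanity check in Python, with the table's memory configurations plugged in:

```python
# Peak theoretical memory bandwidth: bytes per transfer x transfers per second.
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    """Bandwidth in GB/s for a given bus width (bits) and data rate (MT/s)."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000

# Lunar Lake: 128-bit LPDDR5X-8533
print(peak_bandwidth_gbs(128, 8533))   # ~136.5 GB/s
# Dimensity 9500: 64-bit LPDDR5X-10667
print(peak_bandwidth_gbs(64, 10667))   # ~85.3 GB/s
# Wildcat Lake (rumored): 64-bit LPDDR5-6800
print(peak_bandwidth_gbs(64, 6800))    # ~54.4 GB/s
```

The 64-bit LPDDR5-6800 result (~54.4 GB/s) is what makes the ~55 GB/s figure line up with the rumored WCL memory configuration.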









As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.




Attachments

  • PantherLake.png (283.5 KB)
  • LNL.png (881.8 KB)
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg (181.4 KB)
  • Clockspeed.png (611.8 KB)

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
The ability to use different data types is theoretically possible with NPUs. But the main issues are memory, bandwidth, and power. If you only need 4 bits (AI often only needs 4 or 8 bits), then using hardware set up for 512 bits is quite a waste. Using 512 bits when your application needs 4 bits requires 128x more memory, moves 128x more data around, and processes 128x more of that data, using much more power, all while only being able to run much smaller AI models due to those limits. So it isn't really efficient to use something set up for 512 bits on 4-bit data.

The reverse is true too. If you have an NPU optimized and designed for, say, 4-bit math and you need 16-bit data, then you have to move that data around in four chunks, which takes more time, and you have memory to store only a quarter of the data. It can work, but it just won't be as performant as you want.
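The bit-width arithmetic in the post above is easy to make concrete. A rough sketch, assuming a hypothetical 7-billion-parameter model, of how much storage (and hence bandwidth to stream the weights once) each precision costs:

```python
# Back-of-the-envelope: memory needed to hold model weights at different precisions.
# Storing weights wider than the model needs multiplies memory footprint and the
# data that must be moved, which is the waste described above.
def model_bytes(num_params, bits_per_param):
    """Bytes required to store num_params weights at bits_per_param each."""
    return num_params * bits_per_param // 8

params = 7_000_000_000  # hypothetical 7B-parameter model
for bits in (4, 8, 16, 32):
    gib = model_bytes(params, bits) / 2**30
    print(f"{bits:2d}-bit weights: {gib:.1f} GiB")
```

The 128x figure from the post is the same ratio taken to the extreme: 512-bit storage for 4-bit data is 512 / 4 = 128 times the memory and traffic.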

The whole point of an NPU is optimized hardware for very short data types with simplified instructions. The FPU does very complex instructions on long data types, exactly the opposite optimization point from an NPU. And the FPU isn't actually a co-processor anymore; it's part of the ISA and cannot be removed without losing compatibility. And since it's part of the ISA, pretty much every program uses it by default whenever float or double variables appear, because it's faster to handle them with the FPU than with the integer units.
 
  • Like
Reactions: Tlh97 and moinmoin

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Not sure if this is news, but Microsoft just announced "Surface AI PCs". They utilize Intel MTL and come in 2 models, the Surface Pro 10 for Business and Surface Laptop 6 for Business.

 

eek2121

Diamond Member
Aug 2, 2005
3,415
5,056
136
The whole point of an NPU is optimized hardware for very short data types with simplified instructions. The FPU does very complex instructions on long data types, exactly the opposite optimization point from an NPU. And the FPU isn't actually a co-processor anymore; it's part of the ISA and cannot be removed without losing compatibility. And since it's part of the ISA, pretty much every program uses it by default whenever float or double variables appear, because it's faster to handle them with the FPU than with the integer units.
While I do agree, technology does evolve over time and this is what I am getting at. I was thinking in terms of literal decades when I said that.

The NPU will likely take on more responsibility as time goes on. I will be shocked if x86 looks the same 20 years from now as it does today. New wafer costs keep going up, and having duplicate functionality on multiple parts of the package wastes die space.

I am terrible at predicting, but if I had to, I would say the lines between the NPU, CPU cores, and GPU cores are going to blur.

IIRC someone mentioned AMD has a patent for catching exceptions from missing instructions and redirecting the workload off-chip. The reason it got brought up was Intel's missing AVX-512. I could absolutely see something similar happening here.
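The trap-and-redirect idea mentioned above can be sketched at a very high level: the fast path faults when the hardware feature is absent, and a handler transparently reruns the work on a slower path. This is only an illustrative sketch; all names here are hypothetical, and a real implementation would live in microcode or the OS fault handler, not application code.

```python
# Hypothetical sketch of "trap on a missing instruction, redirect the workload":
# the fast path raises when the hardware feature is absent, and the dispatcher
# transparently falls back to a software implementation.
class UnsupportedInstruction(Exception):
    """Stands in for the fault a CPU raises on an unimplemented opcode."""

HAVE_WIDE_VECTOR_UNIT = False  # pretend the AVX-512-style unit is missing

def vector_add_fast(a, b):
    if not HAVE_WIDE_VECTOR_UNIT:
        raise UnsupportedInstruction("wide vector add not available")
    raise NotImplementedError  # would dispatch to the hardware path

def vector_add_fallback(a, b):
    # Scalar software emulation: slower, but always available.
    return [x + y for x, y in zip(a, b)]

def vector_add(a, b):
    try:
        return vector_add_fast(a, b)
    except UnsupportedInstruction:
        return vector_add_fallback(a, b)  # the "redirect" step

print(vector_add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```

The appeal of the scheme is that callers only ever invoke `vector_add`; whether the fast or fallback path ran is invisible to them, which is exactly why compiled binaries would not need to change.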

Someone mentioned latency, but once the compilers are updated, there would be no performance penalty, and performance may actually increase.

Me personally? I want socketed NPUs so competitors can play too, but of course we won't get that. We are getting PCIe accelerators soon, however.

We are still in the very early days of AI. If you compare AI to the invention of the internet, we are at the equivalent of where they were in the 70s.
 
  • Like
Reactions: Tlh97 and Thibsie

adroc_thurston

Diamond Member
Jul 2, 2023
7,192
9,969
106
Speculation was about replacing FPU with NPU
did you just invent GPUs.
like dawg we already invented silly parallel SIMD crunchers. in 2005. In Xenos, from Xbox 360.
The NPU will likely take on more responsibility as time goes on.
It does dumb matrix math.
are you daft
I will be shocked if x86 looks the same 20 years from now as it does today. New wafer costs keep going up and having duplicate functionality on multiple parts of the package wastes die space.
THE FUTURE IS FUSION™
but if I had to I would say the lines between the NPU, CPU cores, and GPU cores are going to blur.
They'll be more clear-cut than ever.
 

DavidC1

Golden Member
Dec 29, 2023
1,882
3,027
96
Technology-wise, the Atom team's work has indeed been more interesting to follow for quite some time now, like a decade by now?
After falling out of the spotlight with Silvermont/Airmont, they've been working quietly behind the scenes.

They've been doing a consistent cadence of using new ideas in one generation and optimization/expansion the next.

Bonnell - 2 way in order
New Ideas: Silvermont - 2 way out of order, lowered pipeline stages
New Ideas: Goldmont - 3 way out of order(added OoOE FP) + 16KB predecode
Exp/Opt: Goldmont Plus - 3 way + wider backend, quadrupled(64KB) predecode

New Ideas: Tremont - Clustered, 2x3 way, 128KB predecode, greatly improved branch prediction
Exp/Opt: Gracemont - Improved clustered 2x3 way, so the throughput is effective 6-wide. Predecode cache replaced with OD-ILD that predecodes on the fly for better performance under more demanding workloads and higher area/power efficiency

New Ideas?: Skymont - Clustered 3x3 way + ??
Darkmont - 18A shrink
Exp/Opt: Arctic Wolf - Clustered 3x3 way with backend optimizations?

Gracemont is already superior to Golden Cove in the fetch department: it can fetch 2x32B (2x16B from the OD-ILD) to feed the two clustered decoders, while Golden Cove can only work with 1x32B for its 6-wide decoder.

Not only is SMT going to go away; eventually I suspect uop caches will too, as the primary purpose of the uop cache is to increase clocks while attempting to minimize the branch misprediction penalties that come with increased pipeline stages.
 

Henry swagger

Senior member
Feb 9, 2022
512
313
106
After falling out of the spotlight with Silvermont/Airmont, they've been working quietly behind the scenes.

They've been doing a consistent cadence of using new ideas in one generation and optimization/expansion the next.

Bonnell - 2 way in order
New Ideas: Silvermont - 2 way out of order, lowered pipeline stages
New Ideas: Goldmont - 3 way out of order(added OoOE FP) + 16KB predecode
Exp/Opt: Goldmont Plus - 3 way + wider backend, quadrupled(64KB) predecode

New Ideas: Tremont - Clustered, 2x3 way, 128KB predecode, greatly improved branch prediction
Exp/Opt: Gracemont - Improved clustered 2x3 way, so the throughput is effective 6-wide. Predecode cache replaced with OD-ILD that predecodes on the fly for better performance under more demanding workloads and higher area/power efficiency

New Ideas?: Skymont - Clustered 3x3 way + ??
Darkmont - 18A shrink
Exp/Opt: Arctic Wolf - Clustered 3x3 way with backend optimizations?

Gracemont is already superior to Golden Cove in the fetch department: it can fetch 2x32B (2x16B from the OD-ILD) to feed the two clustered decoders, while Golden Cove can only work with 1x32B for its 6-wide decoder.

Not only is SMT going to go away; eventually I suspect uop caches will too, as the primary purpose of the uop cache is to increase clocks while attempting to minimize the branch misprediction penalties that come with increased pipeline stages.
Raichu said Skymont will be 8 wide, so 4x4 way, and is targeting Rocket Lake to Golden Cove IPC.