Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
851
802
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The two are connected through UCIe rather than D2D, a first from Intel. I'm expecting a launch in Q2 2026, around Computex. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max CPU Clock | 3.8 GHz | ? | 5 GHz | ? |
| L3 Cache | 6 MB | ? | 12 MB | ? |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Size | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | ? | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | ? | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| Max GPU Clock | 1.25 GHz | ? | 2 GHz | ? |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
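The bandwidth figures in the table follow directly from bus width and transfer rate (peak theoretical bandwidth, not sustained). A quick sanity check in Python, with the table's memory configurations plugged in:

```python
# Peak theoretical memory bandwidth: bytes per transfer x transfers per second.
def peak_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    """Bandwidth in GB/s for a given bus width (bits) and data rate (MT/s)."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000

# Lunar Lake: 128-bit LPDDR5X-8533
print(peak_bandwidth_gbs(128, 8533))   # ~136.5 GB/s
# Dimensity 9500: 64-bit LPDDR5X-10667
print(peak_bandwidth_gbs(64, 10667))   # ~85.3 GB/s
# Wildcat Lake (rumored): 64-bit LPDDR5-6800
print(peak_bandwidth_gbs(64, 6800))    # ~54.4 GB/s
```

The 64-bit LPDDR5-6800 result (~54.4 GB/s) is what makes the ~55 GB/s figure line up with the rumored WCL memory configuration.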









As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.




Attachments

  • PantherLake.png (283.5 KB)
  • LNL.png (881.8 KB)
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg (181.4 KB)
  • Clockspeed.png (611.8 KB)

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
The ability to use different data types is theoretically possible with NPUs. But the main issues are memory, bandwidth, and power. If you only need 4 bits (AI often only needs 4 or 8 bits), then using hardware set up for 512 bits is quite a waste. Using 512 bits when your application needs 4 bits requires 128x more memory, moves 128x more data around, and processes 128x more of that data, using much more power, all while only being able to run much smaller AI models due to those limits. So it isn't really efficient to use something set up for 512 bits on 4-bit data.

The reverse is true too. If you have an NPU optimized and designed for, say, 4-bit math and you need 16-bit data, then you have to move that data around in four chunks, which takes more time, and you have memory to store only a quarter of the data. It can work, but it just won't be as performant as you want.
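The bit-width arithmetic in the post above is easy to make concrete. A rough sketch, assuming a hypothetical 7-billion-parameter model, of how much storage (and hence bandwidth to stream the weights once) each precision costs:

```python
# Back-of-the-envelope: memory needed to hold model weights at different precisions.
# Storing weights wider than the model needs multiplies memory footprint and the
# data that must be moved, which is the waste described above.
def model_bytes(num_params, bits_per_param):
    """Bytes required to store num_params weights at bits_per_param each."""
    return num_params * bits_per_param // 8

params = 7_000_000_000  # hypothetical 7B-parameter model
for bits in (4, 8, 16, 32):
    gib = model_bytes(params, bits) / 2**30
    print(f"{bits:2d}-bit weights: {gib:.1f} GiB")
```

The 128x figure from the post is the same ratio taken to the extreme: 512-bit storage for 4-bit data is 512 / 4 = 128 times the memory and traffic.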

The whole point of an NPU is optimized hardware for very short data types with simplified instructions. The FPU does very complex instructions on long data types, exactly the opposite optimization point from an NPU. And the FPU isn't actually a co-processor anymore; it's part of the ISA and cannot be removed without losing compatibility. And since it's part of the ISA, pretty much every program uses it by default whenever float or double variables appear, because it's faster to handle them with the FPU than with the integer units.
 
  • Like
Reactions: Tlh97 and moinmoin

Hitman928

Diamond Member
Apr 15, 2012
6,695
12,370
136
Not sure if this is news, but Microsoft just announced "Surface AI PCs". They utilize Intel MTL and come in 2 models, the Surface Pro 10 for Business and Surface Laptop 6 for Business.

 

eek2121

Diamond Member
Aug 2, 2005
3,415
5,056
136
The whole point of an NPU is optimized hardware for very short data types with simplified instructions. The FPU does very complex instructions on long data types, exactly the opposite optimization point from an NPU. And the FPU isn't actually a co-processor anymore; it's part of the ISA and cannot be removed without losing compatibility. And since it's part of the ISA, pretty much every program uses it by default whenever float or double variables appear, because it's faster to handle them with the FPU than with the integer units.
While I do agree, technology does evolve over time and this is what I am getting at. I was thinking in terms of literal decades when I said that.

The NPU will likely take on more responsibility as time goes on. I will be shocked if x86 looks the same 20 years from now as it does today. New wafer costs keep going up, and having duplicate functionality on multiple parts of the package wastes die space.

I am terrible at predicting, but if I had to, I would say the lines between the NPU, CPU cores, and GPU cores are going to blur.

IIRC someone mentioned AMD has a patent for catching exceptions from missing instructions and redirecting the workload off-chip. The reason it got brought up was Intel's missing AVX-512. I could absolutely see something similar happening here.
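The trap-and-redirect idea mentioned above can be sketched at a very high level: the fast path faults when the hardware feature is absent, and a handler transparently reruns the work on a slower path. This is only an illustrative sketch; all names here are hypothetical, and a real implementation would live in microcode or the OS fault handler, not application code.

```python
# Hypothetical sketch of "trap on a missing instruction, redirect the workload":
# the fast path raises when the hardware feature is absent, and the dispatcher
# transparently falls back to a software implementation.
class UnsupportedInstruction(Exception):
    """Stands in for the fault a CPU raises on an unimplemented opcode."""

HAVE_WIDE_VECTOR_UNIT = False  # pretend the AVX-512-style unit is missing

def vector_add_fast(a, b):
    if not HAVE_WIDE_VECTOR_UNIT:
        raise UnsupportedInstruction("wide vector add not available")
    raise NotImplementedError  # would dispatch to the hardware path

def vector_add_fallback(a, b):
    # Scalar software emulation: slower, but always available.
    return [x + y for x, y in zip(a, b)]

def vector_add(a, b):
    try:
        return vector_add_fast(a, b)
    except UnsupportedInstruction:
        return vector_add_fallback(a, b)  # the "redirect" step

print(vector_add([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```

The appeal of the scheme is that callers only ever invoke `vector_add`; whether the fast or fallback path ran is invisible to them, which is exactly why compiled binaries would not need to change.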

Someone mentioned latency, but once the compilers are updated, there would be no performance penalty, and performance may actually increase.

Me personally? I want socketed NPUs so competitors can play too, but of course we won't get that. We are getting PCIe accelerators soon, however.

We are still in the very early days of AI. If you compare AI to the invention of the internet, we are at the equivalent of where they were in the 70s.
 
  • Like
Reactions: Tlh97 and Thibsie

adroc_thurston

Diamond Member
Jul 2, 2023
7,192
9,969
106
Speculation was about replacing FPU with NPU
did you just invent GPUs.
like dawg we already invented silly parallel SIMD crunchers. in 2005. In Xenos, from Xbox 360.
The NPU will likely take on more responsibility as time goes on.
It does dumb matrix math.
are you daft
I will be shocked if x86 looks the same 20 years from now as it does today. New wafer costs keep going up and having duplicate functionality on multiple parts of the package wastes die space.
THE FUTURE IS FUSION™
but if I had to I would say the lines between the NPU, CPU cores, and GPU cores are going to blur.
They'll be more clear-cut than ever.
 

DavidC1

Golden Member
Dec 29, 2023
1,882
3,027
96
Technology-wise, the Atom team's work has indeed been more interesting to follow for quite some time now, like a decade by now?
After falling out of the spotlight with Silvermont/Airmont, they've been working quietly behind the scenes.

They've been doing a consistent cadence of using new ideas in one generation and optimization/expansion the next.

Bonnell - 2 way in order
New Ideas: Silvermont - 2 way out of order, lowered pipeline stages
New Ideas: Goldmont - 3 way out of order(added OoOE FP) + 16KB predecode
Exp/Opt: Goldmont Plus - 3 way + wider backend, quadrupled(64KB) predecode

New Ideas: Tremont - Clustered, 2x3 way, 128KB predecode, greatly improved branch prediction
Exp/Opt: Gracemont - Improved clustered 2x3 way, so the throughput is effective 6-wide. Predecode cache replaced with OD-ILD that predecodes on the fly for better performance under more demanding workloads and higher area/power efficiency

New Ideas?: Skymont - Clustered 3x3 way + ??
Darkmont - 18A shrink
Exp/Opt: Arctic Wolf - Clustered 3x3 way with backend optimizations?

Gracemont is already superior to Golden Cove in the fetch department: it can fetch 2x32B (2x16B from the OD-ILD) to feed the two clustered decoders, while Golden Cove can only work with 1x32B for its 6-wide decoder.

Not only is SMT going to go away; eventually I suspect uop caches will too, as the primary purpose of the uop cache is to increase clocks while attempting to minimize the branch misprediction penalties that come with increased pipeline stages.
 

Henry swagger

Senior member
Feb 9, 2022
512
313
106
After falling out of the spotlight with Silvermont/Airmont, they've been working quietly behind the scenes.

They've been doing a consistent cadence of using new ideas in one generation and optimization/expansion the next.

Bonnell - 2 way in order
New Ideas: Silvermont - 2 way out of order, lowered pipeline stages
New Ideas: Goldmont - 3 way out of order(added OoOE FP) + 16KB predecode
Exp/Opt: Goldmont Plus - 3 way + wider backend, quadrupled(64KB) predecode

New Ideas: Tremont - Clustered, 2x3 way, 128KB predecode, greatly improved branch prediction
Exp/Opt: Gracemont - Improved clustered 2x3 way, so the throughput is effective 6-wide. Predecode cache replaced with OD-ILD that predecodes on the fly for better performance under more demanding workloads and higher area/power efficiency

New Ideas?: Skymont - Clustered 3x3 way + ??
Darkmont - 18A shrink
Exp/Opt: Arctic Wolf - Clustered 3x3 way with backend optimizations?

Gracemont is already superior to Golden Cove in the fetch department: it can fetch 2x32B (2x16B from the OD-ILD) to feed the two clustered decoders, while Golden Cove can only work with 1x32B for its 6-wide decoder.

Not only is SMT going to go away; eventually I suspect uop caches will too, as the primary purpose of the uop cache is to increase clocks while attempting to minimize the branch misprediction penalties that come with increased pipeline stages.
Raichu said Skymont will be 8 wide, so 4x4 way, and is targeting Rocket Lake to Golden Cove IPC.