Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads

Tigerick · Aug 22, 2022

Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: compute tile and PCD tile. It is true single die consists of CPU, GPU and NPU that is fabbed by 18-A process. Last time I checked, PCD tile is fabbed by TSMC N6 process. They are connected through UCIe, not D2D; a first from Intel. Expecting launching in Q2/Computex 2026. In case people don't remember AlderLake-N, I have created a table below to compare the detail specs of ADL-N and WCL. Just for fun, I am throwing LNL and upcoming Mediatek D9500 SoC.

	Intel Alder Lake - N	Intel Wildcat Lake	Intel Lunar Lake	Mediatek D9500
Launch Date	Q1-2023	Q2-2026 ?	Q3-2024	Q3-2025
Model	Intel N300	?	Core Ultra 7 268V	Dimensity 9500 5G
Dies	2	2	2	1
Node	Intel 7 + ?	Intel 18-A + TSMC N6	TSMC N3B + N6	TSMC N3P

CPU	8 E-cores	2 P-core + 4 LP E-cores	4 P-core + 4 LP E-cores	C1 1+3+4
Threads	8	6	8	8
Max Clock	3.8 GHz	?	5 GHz
L3 Cache	6 MB	?	12 MB
TDP	7 W	Fanless ?	17 W	Fanless

Memory	64-bit LPDDR5-4800	64-bit LPDDR5-6800 ?	128-bit LPDDR5X-8533	64-bit LPDDR5X-10667
Size	16 GB	?	32 GB	24 GB ?
Bandwidth		~ 55 GB/s	136 GB/s	85.6 GB/s

GPU	UHD Graphics		Arc 140V	G1 Ultra
EU / Xe	32 EU	2 Xe	8 Xe	12
Max Clock	1.25 GHz		2 GHz

NPU	NA	18 TOPS	48 TOPS	100 TOPS ?

As Hot Chips 34 starting this week, Intel will unveil technical information of upcoming Meteor Lake (MTL) and Arrow Lake (ARL), new generation platform after Raptor Lake. Both MTL and ARL represent new direction which Intel will move to multiple chiplets and combine as one SoC platform.

MTL also represents new compute tile that based on Intel 4 process which is based on EUV lithography, a first from Intel. Intel expects to ship MTL mobile SoC in 2023.

ARL will come after MTL so Intel should be shipping it in 2024, that is what Intel roadmap is telling us. ARL compute tile will be manufactured by Intel 20A process, a first from Intel to use GAA transistors called RibbonFET.

coercitiv · Mar 18, 2024

uzzi38 said:
Yeah XMX has to come back, because MS's TOPs requirements are only getting larger as time goes on. It's a lot more area efficient (AKA cheaper to produce and for the end consumer later) to use XMX than it is to slap on an even _bigger_ NPU, even if the NPU would be more power efficient.

Never really bothered to to get more in-depth with the subject, but my basic understanding is ML based tasks in personal computing will be split in two categories:

"Low" compute tasks where efficiency is important, such as video call background blur, noise reduction, recognition, translation, dictation, grammar & auto correct etc. The NPU should handle them, so it needs to be scaled to their scope and made as efficient as possible.
Heavy compute tasks using generative models (language, multimedia, science & engineering) where performance is important. These will leverage the GPU mostly, because this way the compute area can be used for both AI and graphics, which is a good compromise for a consumer chip.

moinmoin · Mar 18, 2024

SiliconFly said:
If the former is true, then having AVX-512 support in the die and then fused off is a complete waste of expensive silicon and adds up to cost significantly (25% is a ton of money). Having two separate designs makes a lot of sense considering LNC is new.

It would be especially silly considering the whole reason of existence for E-cores is area efficiency. But with AVX-512 existing but disabled the P-cores are essentially artificially bloated without reason. And I have a hard time imagining combining P and E-cores with all the hard- and software changes that necessitates is cheaper than just optimizing P-cores.

Geddagod · Mar 18, 2024

SiliconFly said:
I remember reading some articles that had mixed views about the AVX-512 die area during the Linus Torvalds AVX controversy. Many claimed that AVX-512 instructions take up significant die space (as much as 25% per core) due to it's complex logic. While a few others claimed that AVX-512 support doesn't take up significant space in the total die area.

If the former is true, then having AVX-512 support in the die and then fused off is a complete waste of expensive silicon and adds up to cost significantly (25% is a ton of money). Having two separate designs makes a lot of sense considering LNC is new.

Where did you hear AVX-512 adds 25% to the core area?
Also, even if it is true (I doubt it is lmao), that's just the core. It's not adding 25% to the whole die, the impact there is gonna be way, waaaay smaller

SiliconFly · Mar 18, 2024

Geddagod said:
Where did you hear AVX-512 adds 25% to the core area?
Also, even if it is true (I doubt it is lmao)

You should. Like I said, AVX-512 % die area projections differs from one die to die. Some are from trust worthy sources while others are plain speak, possibly just rumors or guesses (may or may not be accurate). You be your own judge, cos Intel doesn't publish exact figures.

Link, link & link.

Geddagod said:
...It's not adding 25% to the whole die, the impact there is gonna be way, waaaay smaller

Your accuracy amazes me! Assuming you can read correctly, I clearly mention their claims upto 25% per core. I never said 25% of the whole die area. Then also, I'm sure it's definitely not waaaay smaller. Definitely not a single digit % number.

igor_kavinski · Mar 18, 2024

If we assume each P core in Intel CPUs contains two AVX-512 units and same core is used for server and consumer with the latter having one unit disabled, that's a lot of die area being wasted in the name of segmentation.

Geddagod · Mar 18, 2024

SiliconFly said:
You should. Like I said, AVX-512 % die area projections differs from one die to die. Some are from trust worthy sources while others are plain speak, possibly just rumors or guesses (may or may not be accurate). You be your own judge, cos Intel doesn't publish exact figures.

You can literally just look at skylake client and then skylake server (which has AVX-512).

SiliconFly said:
Assuming you can read correctly, I clearly mention their claims upto 25% per core. I never said 25% of the whole die area.

Shouldn't have said this then

If the former is true, then having AVX-512 support in the die and then fused off is a complete waste of expensive silicon and adds up to cost significantly (25% is a ton of money)

You are talking about the die in one sentence, and then in parenthesis just mention 25% is a ton of money? It would be generous of me to assume you are talking about the core area tbh, though if it was some other people who typed that I would have just made that assumption lol

SiliconFly said:
Then also, I'm sure it's definitely not waaaay smaller. Definitely not a single digit % number.

Using very optimistic calculations, at best it looks like on AVX-512 is ~15% of a skylake server core. A skylake core looks to be ~ 2/3 of a skylake "block". There's a lot of stuff on the CPU that's not just skylake "blocks" but let's just ignore that. It very conceivably can be a single digit % number, even if the number is 25% as you think it is.

Geddagod · Mar 18, 2024

SiliconFly said:
Intel might have a marginal lead in client with their upcoming products.

Lol

SiliconFly said:
But, AMD's upcoming cores appear to be better suited for data center than Intel's upcoming cores. Diamond rapids may not match Zen5 series in overall server performance and/or efficiency.

DMR not matching Zen 5 would be kinda pathetic since DMR would be launching pretty much near Zen 6

Markfw said:
don't think the transistors are in it to enable it, right ? While googling on the subject, I found this from tomshardware.com

That doesn't mean the transistors won't/can't be there....

eek2121 said:
AMD drops performance in exchange for area/cost for the Zen4c cores. This leaves Intel in a situation where they need a faster Atom or a more efficient Cove core, but they have neither.

Why?

SiliconFly said:
Ya. Agree. Nothing special. But still gonna be light years ahead of Zen5 I presume.

"I presume"

AMDK11 said:
LionCove+ will be something like RaptorCove compared to GoldenCove or RedwoodCove. Nothing more.

Uhh I think Bionc said something about changes to the L0 and L1, but I could be misremembering

dullard · Mar 18, 2024

Geddagod said:
You are talking about the die in one sentence, and then in parenthesis just mention 25% is a ton of money?

So many of the arguments here are simple misunderstandings like that. People here tend to change subject midsentence and not tell anyone about the change of subject. Or, my pet peeve, use a pronoun that doesn't refer to anything remotely close to the sentence the pronoun is in. The way I read his post was that paragraph was talking about dies, and thus a 25% cost would naturally be assumed to be referring to the entire die cost.

Geddagod · Mar 18, 2024

dullard said:
So many of the arguments here are simple misunderstandings like that. People here tend to change subject midsentence and not tell anyone about the change of subject. Or, my pet peeve, use a pronoun that doesn't refer to anything remotely close to the sentence the pronoun is in. The way I read his post was that paragraph was talking about dies, and thus a 25% cost would naturally be assumed to be referring to the entire die cost.

Yup. Regardless, idk why he was so mad, other than the "I doubt it is lmao", nothing in my reply was thaaaat annoying. And that was referring to the core, not the total die.
But whatever, I* don't use this site much anymore due to how many times it loads super slowly, or just doesn't load at all.

SiliconFly · Mar 19, 2024

igor_kavinski said:
Once done benchmarking, put the CPUs in water and measure their "density" using the Archimedes Principle

I know it sounds like a stretch, but I think it is kinda possible in theory to calculate the transistor density of a cpu using Archimedes principle if we know the volume of a transistor (and other materials like packaging, etc). All we need is a very large container filled with water and a lot of cpus to calculate the displacement. Then subtract and divide. Voila!

Geddagod · Mar 19, 2024

Lmao Xino said ARL will be the same perf as RPL or maybe a singe digits level improvement (prob referring to ST)

S'renne · Mar 19, 2024

Geddagod said:
Lmao Xino said ARL will be the same perf as RPL or maybe a singe digits level improvement (prob referring to ST)

Who's Xino

S'renne · Mar 19, 2024

Geddagod said:
Lmao Xino said ARL will be the same perf as RPL or maybe a singe digits level improvement (prob referring to ST)

Oh wait their accounts have all been suspended when was this lmao

SiliconFly · Mar 19, 2024

Geddagod said:
Lmao Xino said ARL will be the same perf as RPL or maybe a singe digits level improvement (prob referring to ST)

Well, generally speaking, there are 3 possibilities for ARL...

(1) There is a very high probability that ARL might have a clock regression of up to 10% to 15%. Or maybe not. Hard to say at this point. If a clock regression is there AND if LNC's IPC gains are in the order of 20% to 25% only (which sounds reasonable), then we may end up with ARL having only single digit level IPC gains. Imho, thats not exactly a bad assessment.

(2) Next is, similar to MILD's claims, if LNC ends up having massive IPC gains in the order of 30% or 40%, and there isn't much clock regression, then ARL is gonna be awesome. The likelihood of this happening isn't that high if you ask me. But quite a possibility.

(3) Then there is a third but remote possibility that ARL might have a slight performance regression over RPL, cos RPL screams at a mind-numbing clock of 6.2 GHz. And if LNC's IPC gains aren't large enough, we may end up with something very similar to MTL, a slight performance regression. But the probability of something like this happening is pretty low. But still a possibility.

Ghostsonplanets · Mar 19, 2024

Geddagod said:
Lmao Xino said ARL will be the same perf as RPL or maybe a singe digits level improvement (prob referring to ST)

That would be really unfortunate for a brand new generation. Specially coming after the impressive Sunny and Golden Cove generations, which both had ~20% IPC increase over the past uArch gen.

But, quite frankly, I'm more interested into how Lunar Lake will shape up than ARL. The return of low power x86 in the vein of Core M is much needed to fight against Apple M and X Elite. Intel ST performance is already very competitive so that a single digit uplift would still maintain them in the fight. But figuring high performance at low power and with good efficiency is key and hence why LNL is such an interesting prospect to me.

Geddagod · Mar 19, 2024

S'renne said:
Oh wait their accounts have all been suspended when was this lmao

No it hasn't. It was today.

SiliconFly said:
Then there is a third but remote possibility that ARL might have a slight performance regression over RPL, cos RPL screams at a mind-numbing clock of 6.2 GHz.

Xino claims ARL might get up to 5.6ghz, but I doubt he is talking about the 14900ks, most people when comparing generations don't include the KS parts.

Ghostsonplanets said:
That would be really unfortunate for a brand new generation. Specially coming after the impressive Sunny and Golden Cove generations, which both had ~20% IPC increase over the past uArch gen.

Might be closer to 10% than 20%, unfortunately. I agree though, after all the LNC hype...

Ghostsonplanets said:
But figuring high performance at low power and with good efficiency is key and hence why LNL is such an interesting prospect to me.

Intel's low power optimization is just so cooked it's wild. Crossing my fingers for LNL (though I prob won't end up getting it anyway).

eek2121 · Mar 19, 2024

coercitiv said:
Never really bothered to to get more in-depth with the subject, but my basic understanding is ML based tasks in personal computing will be split in two categories:

"Low" compute tasks where efficiency is important, such as video call background blur, noise reduction, recognition, translation, dictation, grammar & auto correct etc. The NPU should handle them, so it needs to be scaled to their scope and made as efficient as possible.

Heavy compute tasks using generative models (language, multimedia, science & engineering) where performance is important. These will leverage the GPU mostly, because this way the compute area can be used for both AI and graphics, which is a good compromise for a consumer chip.

Meh, both can be used for point #2. I am actually more curious if this becomes a move back to CMT. Put enough instructions and speed on the NPU and suddenly the CPU doesn’t need to have an FPU anymore. 🙃

igor_kavinski · Mar 19, 2024

eek2121 said:
Put enough instructions and speed on the NPU and suddenly the CPU doesn’t need to have an FPU anymore. 🙃

Wouldn't that break compatibility with existing software? If they somehow divert those instructions from CPU decoder to NPU, there would be a latency hit involved.

SiliconFly · Mar 19, 2024

igor_kavinski said:
Wouldn't that break compatibility with existing software? If they somehow divert those instructions from CPU decoder to NPU, there would be a latency hit involved.

I think it's possible. Not very sure though. If the FP instructions are removed, the CPU will throw an exception when it doesn't recognize the instruction, the OS has to catch it and divert it to the NPU. There's gonna be a lot of latency involved.

And if I'm right, too many programs use FP these days I think. Possibly even browsers and apps like MS Office and lots of games too I think. Again, not sure though. If thats the case, then we're stuck with FP forever!

naukkis · Mar 19, 2024

eek2121 said:
Meh, both can be used for point #2. I am actually more curious if this becomes a move back to CMT. Put enough instructions and speed on the NPU and suddenly the CPU doesn’t need to have an FPU anymore. 🙃

NPU is exact opposite of FPU. Floating point instructions are varying point so number expression range can be huge, like from 2^-64 to 2^64 and calculations can be done between opposite extremes. NPU instead is relying extremely short integers, like 4 and 8 bits - only 16 or 256 values. If we think normal integer(fixed point math) cpu as middle point NPU is other way and FPU the other, they are absolutely not alternatives.

adroc_thurston · Mar 19, 2024

naukkis said:
NPU is exact opposite of FPU

?
They do bf/fp8|16 math just fine.

naukkis said:
NPU instead is relying extremely short integers, like 4 and 8 bits - only 16 or 256 values.

Most support FP16/BF16 just fine.

igor_kavinski · Mar 19, 2024

adroc_thurston said:
Most support FP16/BF16 just fine.

Can they match or exceed an AVX-512 CPU?

adroc_thurston · Mar 19, 2024

igor_kavinski said:
Can they match or exceed an AVX-512 CPU?

Yes they're dedicated matrix crunchers.
GEMM is the only thing they do.

H433x0n · Mar 19, 2024

An fmax of 5.6ghz would be fine.

At this point they should be receiving ES2 for a launch in October. If the practical IPC is ~15% I guess that makes sense once taking into account the 2-4% penalty from tile overhead.

Unfortunately in typical leaker fashion Xino wasn’t very specific. Was this test at JEDEC-4800? Was it with most recent stepping of SoC tile? Are the IPC figures from mobile or desktop ARL? Which version of RPL is he talking about? The 13900K or a 14900KS 1T performance?

Expectations are pretty low but if IPC bump is <15% then they deserve to get clobbered.

FlameTail · Mar 19, 2024

adroc_thurston said:
Yes they're dedicated matrix crunchers.
GEMM is the only thing they do.

GEMM?

Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads

Senior member

Attachments

Diamond Member

Diamond Member

Golden Member

Golden Member

Lifer

Golden Member

Golden Member

Elite Member

Golden Member

Golden Member

Golden Member

Member

Member

Golden Member

Senior member

Golden Member

Diamond Member

Lifer

Golden Member

Golden Member

Diamond Member

Lifer

Diamond Member

Golden Member

Diamond Member