Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
854
804
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. They are connected through UCIe rather than D2D, a first from Intel. Expect a launch around Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming MediaTek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Size | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | ~55 GB/s | ? | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | ? | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| Max Clock | 1.25 GHz | ? | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
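The bandwidth figures in the table follow directly from bus width times transfer rate. A quick sketch to check them; the WCL number is my own estimate from the rumored 64-bit LPDDR5-6800, not a leaked figure:

```python
def peak_bw_gbs(bus_width_bits: int, transfer_rate_mtps: int) -> float:
    """Theoretical peak bandwidth: (bus width in bytes) x (MT/s), in GB/s."""
    return bus_width_bits / 8 * transfer_rate_mtps / 1000

# Lunar Lake: 128-bit LPDDR5X-8533 -> ~136.5 GB/s (matches the table's 136 GB/s)
print(round(peak_bw_gbs(128, 8533), 1))   # 136.5
# Dimensity 9500: 64-bit LPDDR5X-10667 -> ~85.3 GB/s (table: 85.6 GB/s)
print(round(peak_bw_gbs(64, 10667), 1))   # 85.3
# WCL, if the rumored 64-bit LPDDR5-6800 holds (my estimate)
print(round(peak_bw_gbs(64, 6800), 1))    # 54.4
```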









As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent the new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, the first from Intel to use GAA transistors, called RibbonFET.



 

Attachments

  • PantherLake.png (283.5 KB)
  • LNL.png (881.8 KB)
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg (181.4 KB)
  • Clockspeed.png (611.8 KB)

coercitiv

Diamond Member
Jan 24, 2014
7,361
17,450
136
Yeah XMX has to come back, because MS's TOPs requirements are only getting larger as time goes on. It's a lot more area efficient (AKA cheaper to produce and for the end consumer later) to use XMX than it is to slap on an even _bigger_ NPU, even if the NPU would be more power efficient.
Never really bothered to get more in-depth with the subject, but my basic understanding is that ML-based tasks in personal computing will be split into two categories:
  • "Low" compute tasks where efficiency is important, such as video call background blur, noise reduction, recognition, translation, dictation, grammar & auto correct etc. The NPU should handle them, so it needs to be scaled to their scope and made as efficient as possible.
  • Heavy compute tasks using generative models (language, multimedia, science & engineering) where performance is important. These will leverage the GPU mostly, because this way the compute area can be used for both AI and graphics, which is a good compromise for a consumer chip.
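The split above can be sketched as a toy dispatcher. Everything here (task names, the TOPS threshold, the 48 TOPS NPU budget) is invented for illustration, not how Windows or any vendor runtime actually routes work:

```python
from dataclasses import dataclass

@dataclass
class MLTask:
    name: str
    tops_required: float     # rough sustained compute demand
    background: bool         # runs continuously, so efficiency matters most

def pick_accelerator(task: MLTask, npu_tops: float = 48.0) -> str:
    """Toy heuristic: low-compute background tasks that fit the NPU's budget
    stay on the NPU for efficiency; heavy generative workloads go to the GPU,
    whose area is shared with graphics anyway."""
    if task.background and task.tops_required <= npu_tops:
        return "NPU"
    return "GPU"

print(pick_accelerator(MLTask("video call background blur", 2.0, True)))   # NPU
print(pick_accelerator(MLTask("local image generation", 200.0, False)))    # GPU
```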
 

moinmoin

Diamond Member
Jun 1, 2017
5,247
8,462
136
If the former is true, then having AVX-512 support in the die and then fused off is a complete waste of expensive silicon and adds up to cost significantly (25% is a ton of money). Having two separate designs makes a lot of sense considering LNC is new.
It would be especially silly considering the whole reason E-cores exist is area efficiency. But with AVX-512 present yet disabled, the P-cores are essentially artificially bloated without reason. And I have a hard time imagining that combining P- and E-cores, with all the hardware and software changes that necessitates, is cheaper than just optimizing P-cores.
 

Geddagod

Golden Member
Dec 28, 2021
1,531
1,627
106
I remember reading some articles that had mixed views about the AVX-512 die area during the Linus Torvalds AVX controversy. Many claimed that AVX-512 instructions take up significant die space (as much as 25% per core) due to their complex logic, while a few others claimed that AVX-512 support doesn't take up significant space in the total die area.

If the former is true, then having AVX-512 support in the die and then fused off is a complete waste of expensive silicon and adds up to cost significantly (25% is a ton of money). Having two separate designs makes a lot of sense considering LNC is new.
Where did you hear AVX-512 adds 25% to the core area?
Also, even if it is true (I doubt it is lmao), that's just the core. It's not adding 25% to the whole die, the impact there is gonna be way, waaaay smaller
 

SiliconFly

Golden Member
Mar 10, 2023
1,924
1,284
106
Where did you hear AVX-512 adds 25% to the core area?
Also, even if it is true (I doubt it is lmao)
You should. Like I said, AVX-512 die-area projections differ from one die to another. Some are from trustworthy sources while others are plain speculation, possibly just rumors or guesses (may or may not be accurate). You be your own judge, cos Intel doesn't publish exact figures.

Link, link & link.

...It's not adding 25% to the whole die, the impact there is gonna be way, waaaay smaller
Your accuracy amazes me! Assuming you can read correctly, I clearly mentioned their claims of up to 25% per core. I never said 25% of the whole die area. Even then, I'm sure it's definitely not waaaay smaller. Definitely not a single-digit % number.
 
Jul 27, 2020
28,173
19,191
146
If we assume each P-core in Intel CPUs contains two AVX-512 units, and the same core is used for server and consumer with the latter having one unit disabled, that's a lot of die area being wasted in the name of segmentation.
 

Geddagod

Golden Member
Dec 28, 2021
1,531
1,627
106
You should. Like I said, AVX-512 % die area projections differs from one die to die. Some are from trust worthy sources while others are plain speak, possibly just rumors or guesses (may or may not be accurate). You be your own judge, cos Intel doesn't publish exact figures.
You can literally just look at skylake client and then skylake server (which has AVX-512).
Assuming you can read correctly, I clearly mention their claims upto 25% per core. I never said 25% of the whole die area.
Shouldn't have said this then
If the former is true, then having AVX-512 support in the die and then fused off is a complete waste of expensive silicon and adds up to cost significantly (25% is a ton of money)
You are talking about the die in one sentence, and then in parentheses just mention 25% is a ton of money? It would be generous of me to assume you are talking about the core area tbh, though if someone else had typed that I would have just made that assumption lol
Then also, I'm sure it's definitely not waaaay smaller. Definitely not a single digit % number.
Using very optimistic calculations, at best it looks like AVX-512 is ~15% of a Skylake server core. A Skylake core looks to be ~2/3 of a Skylake "block". There's a lot of stuff on the CPU that's not just Skylake "blocks", but let's ignore that. It very conceivably can be a single-digit % number, even if the per-core figure is 25% as you think it is.
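The napkin math above reduces to multiplying the shares out. A sketch with made-up but plausible proportions (the 50% "blocks share of die" is my own assumption for uncore, GPU and I/O taking the rest):

```python
def avx512_die_share(share_of_core: float, core_share_of_block: float,
                     blocks_share_of_die: float) -> float:
    """Fraction of the whole die spent on AVX-512, multiplying the shares out."""
    return share_of_core * core_share_of_block * blocks_share_of_die

# Even granting 25% of the core, with cores ~2/3 of their "block" and the
# core blocks ~50% of the die, the die-level cost lands in single digits:
print(round(avx512_die_share(0.25, 2/3, 0.5) * 100, 1))  # 8.3
# With the ~15% per-core estimate it's ~5%:
print(round(avx512_die_share(0.15, 2/3, 0.5) * 100, 1))  # 5.0
```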
 

Geddagod

Golden Member
Dec 28, 2021
1,531
1,627
106
Intel might have a marginal lead in client with their upcoming products.
Lol
But AMD's upcoming cores appear to be better suited for the data center than Intel's upcoming cores. Diamond Rapids may not match the Zen 5 series in overall server performance and/or efficiency.
DMR not matching Zen 5 would be kinda pathetic since DMR would be launching pretty much near Zen 6
don't think the transistors are in it to enable it, right ? While googling on the subject, I found this from tomshardware.com
That doesn't mean the transistors won't/can't be there....
AMD drops performance in exchange for area/cost for the Zen4c cores. This leaves Intel in a situation where they need a faster Atom or a more efficient Cove core, but they have neither.
Why?
Ya. Agree. Nothing special. But still gonna be light years ahead of Zen5 I presume.
"I presume"
LionCove+ will be something like RaptorCove compared to GoldenCove or RedwoodCove. Nothing more.
Uhh I think Bionc said something about changes to the L0 and L1, but I could be misremembering
 

dullard

Elite Member
May 21, 2001
26,024
4,650
126
You are talking about the die in one sentence, and then in parenthesis just mention 25% is a ton of money?
So many of the arguments here are simple misunderstandings like that. People here tend to change subject midsentence and not tell anyone about the change of subject. Or, my pet peeve, use a pronoun that doesn't refer to anything remotely close to the sentence the pronoun is in. The way I read his post, that paragraph was talking about dies, and thus a 25% cost would naturally be assumed to refer to the entire die cost.
 

Geddagod

Golden Member
Dec 28, 2021
1,531
1,627
106
So many of the arguments here are simple misunderstandings like that. People here tend to change subject midsentence and not tell anyone about the change of subject. Or, my pet peeve, use a pronoun that doesn't refer to anything remotely close to the sentence the pronoun is in. The way I read his post was that paragraph was talking about dies, and thus a 25% cost would naturally be assumed to be referring to the entire die cost.
Yup. Regardless, idk why he was so mad, other than the "I doubt it is lmao", nothing in my reply was thaaaat annoying. And that was referring to the core, not the total die.
But whatever, I don't use this site much anymore due to how many times it loads super slowly, or just doesn't load at all.
 

SiliconFly

Golden Member
Mar 10, 2023
1,924
1,284
106
Once done benchmarking, put the CPUs in water and measure their "density" using the Archimedes Principle :p
I know it sounds like a stretch, but I think it's kinda possible in theory to calculate the density of a CPU using Archimedes' principle, if we know the volume of a transistor (and other materials like packaging, etc). All we need is a very large container filled with water and a lot of CPUs to measure the displacement. Then subtract and divide. Voila! ;)
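Playing along with the joke, the displacement arithmetic would look something like this; every number below is invented, and of course this gives you the package density, not anything about the transistors:

```python
# Archimedes-style "density": submerge N CPUs, measure the displaced water.
n_cpus = 100
displaced_ml = 470.0      # total water displaced (invented number)
package_mass_g = 45.5     # mass of one packaged CPU (invented number)

volume_per_cpu_cm3 = displaced_ml / n_cpus        # 1 mL == 1 cm^3
density_g_per_cm3 = package_mass_g / volume_per_cpu_cm3
print(round(density_g_per_cm3, 2))  # 9.68
```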
 

SiliconFly

Golden Member
Mar 10, 2023
1,924
1,284
106
Lmao Xino said ARL will be the same perf as RPL or maybe a single-digit level improvement (prob referring to ST)
Well, generally speaking, there are 3 possibilities for ARL...

(1) There is a very high probability that ARL might have a clock regression of up to 10% to 15%. Or maybe not; hard to say at this point. If there is a clock regression AND LNC's IPC gains are only in the order of 20% to 25% (which sounds reasonable), then we may end up with ARL having only single-digit overall performance gains. Imho, that's not exactly a bad assessment.

(2) Next, similar to MILD's claims: if LNC ends up having massive IPC gains in the order of 30% or 40%, and there isn't much clock regression, then ARL is gonna be awesome. The likelihood of this happening isn't that high if you ask me, but it's quite a possibility.

(3) Then there is a third but remote possibility that ARL might have a slight performance regression over RPL, cos RPL screams at a mind-numbing 6.2 GHz. If LNC's IPC gains aren't large enough, we may end up with something very similar to MTL: a slight performance regression. The probability of this happening is pretty low, but it's still a possibility.
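The three scenarios above boil down to one multiplication, since ST performance is roughly IPC times clock. A sketch with example numbers in the ranges from the post (the exact figures are my own picks within those ranges):

```python
def st_perf_change(ipc_gain: float, clock_change: float) -> float:
    """Net single-thread change: (1 + IPC gain) * (1 + clock change) - 1."""
    return (1 + ipc_gain) * (1 + clock_change) - 1

# (1) +20% IPC with a -12% clock regression -> mid single digits
print(f"{st_perf_change(0.20, -0.12):+.1%}")  # +5.6%
# (2) +35% IPC, flat clocks -> comfortably ahead
print(f"{st_perf_change(0.35, 0.0):+.1%}")    # +35.0%
# (3) +15% IPC with a -15% clock regression -> a slight net regression
print(f"{st_perf_change(0.15, -0.15):+.1%}")
```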
 

Ghostsonplanets

Senior member
Mar 1, 2024
774
1,228
96
Lmao Xino said ARL will be the same perf as RPL or maybe a single-digit level improvement (prob referring to ST)
That would be really unfortunate for a brand new generation. Especially coming after the impressive Sunny and Golden Cove generations, which both had ~20% IPC increases over the previous uArch gen.

But, quite frankly, I'm more interested in how Lunar Lake will shape up than ARL. The return of low-power x86 in the vein of Core M is much needed to fight against Apple M and X Elite. Intel's ST performance is already very competitive, so a single-digit uplift would still keep them in the fight. But figuring out high performance at low power with good efficiency is key, and hence why LNL is such an interesting prospect to me.
 

Geddagod

Golden Member
Dec 28, 2021
1,531
1,627
106
Oh wait their accounts have all been suspended when was this lmao
No it hasn't. It was today.
Then there is a third but remote possibility that ARL might have a slight performance regression over RPL, cos RPL screams at a mind-numbing clock of 6.2 GHz.
Xino claims ARL might get up to 5.6 GHz, but I doubt he is talking about the 14900KS; most people don't include the KS parts when comparing generations.
That would be really unfortunate for a brand new generation. Especially coming after the impressive Sunny and Golden Cove generations, which both had ~20% IPC increases over the previous uArch gen.
Might be closer to 10% than 20%, unfortunately. I agree though, after all the LNC hype...
But figuring high performance at low power and with good efficiency is key and hence why LNL is such an interesting prospect to me.
Intel's low power optimization is just so cooked it's wild. Crossing my fingers for LNL (though I prob won't end up getting it anyway).
 

eek2121

Diamond Member
Aug 2, 2005
3,415
5,056
136
Never really bothered to get more in-depth with the subject, but my basic understanding is that ML-based tasks in personal computing will be split into two categories:
  • "Low" compute tasks where efficiency is important, such as video call background blur, noise reduction, recognition, translation, dictation, grammar & auto correct etc. The NPU should handle them, so it needs to be scaled to their scope and made as efficient as possible.
  • Heavy compute tasks using generative models (language, multimedia, science & engineering) where performance is important. These will leverage the GPU mostly, because this way the compute area can be used for both AI and graphics, which is a good compromise for a consumer chip.

Meh, both can be used for point #2. I am actually more curious if this becomes a move back to CMT. Put enough instructions and speed on the NPU and suddenly the CPU doesn’t need to have an FPU anymore. 🙃
 
Jul 27, 2020
28,173
19,191
146
Put enough instructions and speed on the NPU and suddenly the CPU doesn’t need to have an FPU anymore. 🙃
Wouldn't that break compatibility with existing software? If they somehow divert those instructions from CPU decoder to NPU, there would be a latency hit involved.
 

SiliconFly

Golden Member
Mar 10, 2023
1,924
1,284
106
Wouldn't that break compatibility with existing software? If they somehow divert those instructions from CPU decoder to NPU, there would be a latency hit involved.
I think it's possible. Not very sure though. If the FP instructions are removed, the CPU will throw an exception when it doesn't recognize the instruction; the OS has to catch it and divert it to the NPU. There's gonna be a lot of latency involved.

And if I'm right, too many programs use FP these days. Possibly even browsers, apps like MS Office, and lots of games too. Again, not sure though. If that's the case, then we're stuck with FP forever!
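The latency concern is easy to put numbers on with a toy cost model. Every cycle count below is invented (a trap-and-emulate round trip through the OS is commonly estimated in the thousands of cycles, versus a handful for a hardware FP op):

```python
# Toy model of trap-and-emulate cost: every FP instruction raises an
# invalid-opcode exception, the OS catches it and emulates the operation.
def slowdown(fp_fraction: float, native_cycles: float = 4.0,
             trap_cycles: float = 2000.0, other_cycles: float = 1.0) -> float:
    """Ratio of average cycles-per-instruction: emulated vs native FP."""
    native = fp_fraction * native_cycles + (1 - fp_fraction) * other_cycles
    emulated = fp_fraction * trap_cycles + (1 - fp_fraction) * other_cycles
    return emulated / native

# Even if only 5% of a program's instructions are FP, it runs ~88x slower:
print(round(slowdown(0.05), 1))  # 87.8
```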
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
Meh, both can be used for point #2. I am actually more curious if this becomes a move back to CMT. Put enough instructions and speed on the NPU and suddenly the CPU doesn’t need to have an FPU anymore. 🙃

The NPU is the exact opposite of an FPU. Floating point is "varying point": the exponent moves, so the representable range is huge, roughly from 2^-64 to 2^64, and calculations can mix those opposite extremes. The NPU instead relies on extremely short integers, like 4 and 8 bits, with only 16 or 256 possible values. If we think of a normal integer (fixed-point math) CPU as the middle point, the NPU sits at one extreme and the FPU at the other; they are absolutely not alternatives.
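The range gap described above, made concrete (using float32 rather than the post's rough 2^-64..2^64 bounds, since its limits are well known):

```python
import math

# A signed 4-bit integer has exactly 16 representable values:
int4_values = list(range(-8, 8))
print(len(int4_values))  # 16

# ...while even float32 spans about 76 orders of magnitude in the normal
# range, from ~2^-126 up to just under 2^128:
print(round(math.log2(3.4028235e38)))   # 128  (float32 max)
print(round(math.log2(1.1754944e-38)))  # -126 (float32 min normal)
```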
 

H433x0n

Golden Member
Mar 15, 2023
1,224
1,606
106
An fmax of 5.6 GHz would be fine.

At this point they should be receiving ES2 for a launch in October. If the practical IPC gain is ~15%, I guess that makes sense once you take into account the 2-4% penalty from tile overhead.

Unfortunately, in typical leaker fashion, Xino wasn't very specific. Was this test at JEDEC-4800? Was it with the most recent stepping of the SoC tile? Are the IPC figures from mobile or desktop ARL? Which version of RPL is he talking about: 13900K or 14900KS 1T performance?

Expectations are pretty low, but if the IPC bump is <15% then they deserve to get clobbered.
 