Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
850
801
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first from Intel. I'm expecting a launch around Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
| --- | --- | --- | --- | --- |
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max CPU Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Memory | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe Cores | 32 EU | 2 Xe | 8 Xe | 12 |
| Max GPU Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
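A quick sanity check on the bandwidth row (my own arithmetic, not vendor figures): peak bandwidth is roughly bus width in bytes times transfer rate. 128-bit LPDDR5X-8533 gives 16 B × 8533 MT/s ≈ 136.5 GB/s (LNL); 64-bit LPDDR5X-10667 gives 8 B × 10667 MT/s ≈ 85.3 GB/s (D9500); 64-bit LPDDR5-6800 gives 8 B × 6800 MT/s ≈ 54.4 GB/s, which is where the ~55 GB/s for WCL would come from if the rumored memory config holds. The same math puts ADL-N's 64-bit LPDDR5-4800 at about 38.4 GB/s.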









As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, the first from Intel to use GAA transistors, which it calls RibbonFET.



 

Attachments

  • PantherLake.png
    PantherLake.png
    283.5 KB · Views: 24,028
  • LNL.png
    LNL.png
    881.8 KB · Views: 25,522
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
    INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
    181.4 KB · Views: 72,430
  • Clockspeed.png
    Clockspeed.png
    611.8 KB · Views: 72,318
Last edited:

poke01

Diamond Member
Mar 8, 2022
4,231
5,568
106
Might be a typo; Aussie site PLE lists the 265K and 285K as no-HSF boxes.


 

Josh128

Golden Member
Oct 14, 2022
1,325
1,996
106
Might be a typo; Aussie site PLE lists the 265K and 285K as no-HSF boxes.


Hmm. Same weird thin box form factor as Zen 5. Pinching pennies on both sides, lol.

BTW, Redwood Cove and Lion Cove are not "disasters" as has been stated here. You guys are missing the forest for the trees. The problem is MCM. M C M!! There are tradeoffs that even the pretty, tightly packed "tiles" cannot nullify. That's why Intel waited so long to go that route. AMD also has tightly packed chiplets with RDNA 3, and despite the node advantage over RDNA 2, it was quite disappointing and power hungry compared to Nvidia's still-monolithic design. Nvidia sticking with monolithic was also a conscious decision. There is no magic bullet to overcome these limitations, yet. Both Intel and AMD have very smart and talented engineers, but they all have to work within the confines of the current understanding of physics and materials science.
 

Nothingness

Diamond Member
Jul 3, 2013
3,301
2,373
136
But it doesn't mean much, because for a given workload the Arm ISA has to execute more instructions to get the same work done when compared to a CISC ISA like that of LNC & Zen 5. I'm only talking about the number of instructions for the sake of IPC (not instruction width, complexity, etc.). This too makes literal IPC comparison meaningless.
This is incorrect.

Talking about SPECint 2017: if I take the geometric mean of the difference in instructions executed, x86-64 and AArch64 are surprisingly close, with x86-64 executing ~1.17% more instructions.
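(That is, for each SPECint 2017 sub-test take the ratio of dynamic instruction counts, x86-64 over AArch64, and then take the geometric mean of those ratios; a geomean of roughly 1.012 is what the ~1.17% figure corresponds to.)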
 

511

Diamond Member
Jul 12, 2024
4,550
4,168
106
Anyways, I had a discussion with my friend at Nvidia and he said the Grace Hopper CPU is better than Intel Xeon 6 / AMD GNR 🤣
 
Last edited:
  • Haha
Reactions: Elfear

Nothingness

Diamond Member
Jul 3, 2013
3,301
2,373
136
Not even close. SPECint is a very narrow representation of the much wider instruction set. I've worked with x86 instructions and I've gone through a bit of ARM instructions too. RISC by definition uses more, simpler instructions to perform a task, whereas CISC uses fewer but more complex instructions to do the same. That's the core tenet of RISC vs CISC. Pretty much well established. Something you don't agree with.

For example, to display a window using a Windows system call on a PC, you tend to have fewer but more complex assembly instructions to set up the parameters for the call. Doing the same on WoA, you'll need a lot more instructions. Just FYI:
  • movzx eax, byte ptr [rcx+rax+69h]
  • lea eax, [rdi+rsi]
Confronted with the reality of a dozen different benchmarks, you provide a two-line example. That's interesting.
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
Not even close. SPECint is a very narrow representation of the much wider instruction set. I've worked with x86 instructions and I've gone through a bit of ARM instructions too. RISC by definition uses more, simpler instructions to perform a task, whereas CISC uses fewer but more complex instructions to do the same. That's the core tenet of RISC vs CISC. Pretty much well established. Something you don't agree with.

For example, to display a window using a Windows system call on a PC, you tend to have fewer but more complex assembly instructions to set up the parameters for the call. Doing the same on WoA, you'll need a lot more instructions. Just FYI:
  • movzx eax, byte ptr [rcx+rax+69h]
  • lea eax, [rdi+rsi]

ARM isn't very RISC, and conversely, x86's complex instructions are avoidable because they are slow. That 3-operand lea in your example is one of those instructions that get replaced with simpler ones when performance matters. And because ARM has more general-purpose registers, fewer instructions are needed to spill and restore registers. So yeah, in real-life code ARM and x86 have very similar code density, and ARM actually being denser wouldn't be a surprise.

EDIT: I read it wrong; you used a 2-operand lea. AArch64 supports those addressing modes, and even more complex ones like auto-increment and register-pair loads; only the 3-operand version isn't supported.
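To make that concrete, here are rough AArch64 equivalents of the two x86 examples above (my own sketch of typical compiler output; register choices are purely illustrative):
  • lea eax, [rdi+rsi] → add w0, w1, w2 ; one instruction on either side
  • movzx eax, byte ptr [rcx+rax+69h] → add x2, x1, x0 then ldrb w0, [x2, #0x69] ; ldrb zero-extends and supports base+register or base+immediate addressing, but not base+register+immediate in one instruction, so this one takes two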
 
Last edited:

511

Diamond Member
Jul 12, 2024
4,550
4,168
106
ARM isn't very RISC, and conversely, x86's complex instructions are avoidable because they are slow. That 3-operand lea in your example is one of those instructions that get replaced with simpler ones when performance matters. And because ARM has more general-purpose registers, fewer instructions are needed to spill and restore registers. So yeah, in real-life code ARM and x86 have very similar code density, and ARM actually being denser wouldn't be a surprise.
About this: if you count instructions, ARM will require an equal or greater number of instructions than x86 to do the same task, if we are not counting specialized instructions.
 

OneEng2

Senior member
Sep 19, 2022
840
1,107
106
That's something they should aim for, but it will be a struggle for them, as historically Intel has had poor perf/watt in productivity benchmarks against AMD even with a node advantage, and it doesn't look like it's changed moving to TSMC 3nm
I have observed this as well. Sure, this peaked in the NetBurst days with Intel's no-holds-barred pursuit of performance through power rather than efficiency, but it does seem like it has been a recurring issue for them. They do seem to have cycles where they fall back to more efficient cores (like Core and Core 2, and now Lion Cove/Skymont). I am encouraged (unlike many here) by Intel's return to efficiency in this cycle.
Redwood Cove was a disaster but Raptor and Golden Cove were OK? So a tock is OK but a tick is a disaster all of a sudden? Intel themselves said that Raptor Cove wasn't even supposed to exist, lol. Golden is the tock and Redwood is the tick; it's all a result of the foundry debacle.
I think this whole tick/tock thing is in the past. The "tick" generally involved a die shrink. I think any company that believes it can rely on a big process improvement every other CPU release is headed for pain and misery, as it is confronted with a very new reality: the financial implications of such a strategy, as well as the grossly diminishing returns each shrink will deliver.
All of these P cores are pretty comparable performance-wise to their competition, so basically all P cores for the past 5 years have been a "disaster."

"Disaster" means a serious disruption to functioning. AFAIK all of these chips actually function.
Agree. I think we need to be watching the financial success of the product lineup. Being "competitive" from a performance standpoint is enough, even if you are not leading by 25-40% (which I think is a thing of the past, per my earlier argument about process improvements having diminishing returns). Other factors are going to be more important, IMHO.
Hmm. Same weird thin box form factor as Zen 5. Pinching pennies on both sides, lol.

BTW, Redwood Cove and Lion Cove are not "disasters" as has been stated here. You guys are missing the forest for the trees. The problem is MCM. M C M!! There are tradeoffs that even the pretty, tightly packed "tiles" cannot nullify. That's why Intel waited so long to go that route. AMD also has tightly packed chiplets with RDNA 3, and despite the node advantage over RDNA 2, it was quite disappointing and power hungry compared to Nvidia's still-monolithic design. Nvidia sticking with monolithic was also a conscious decision. There is no magic bullet to overcome these limitations, yet. Both Intel and AMD have very smart and talented engineers, but they all have to work within the confines of the current understanding of physics and materials science.
Intel was late to the MCM game, and it arguably hurt them financially. I even wonder about Intel's tile implementation, which requires another layer of silicon in addition to the actual tiles (but that is another subject for another thread :) ). Nvidia seems to be relying on their leadership position in the AI market, which basically lets them charge whatever they want. I don't see this as a good long-term position, as eventually cost will need to be lowered and yields will need to be higher to compete.
Omg! I assumed you would get the full picture by looking at those instructions and what they mean. Those two are among the most common instructions and something a CISC can do in one instruction but a RISC can't. Also, when benchmarking general app performance like office, browsers, etc., these types of instructions are used more than int/fp.
It is my understanding that for a very long time now, CISC processors' early decode stages have translated CISC instructions into a RISC-like, fixed-length internal format for pipelining and superscalar execution. I suspect that this is a gross oversimplification of what actually happens, though.
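Roughly, the idea (a purely conceptual sketch; the actual internal µop encodings Intel and AMD use are undocumented and more involved) is something like:
  • add eax, dword ptr [rbx+8] ; one x86 instruction that both loads from memory and adds
  • → µop 1: tmp ← load32 [rbx+8], µop 2: eax ← eax + tmp ; two fixed-format, RISC-like operations the out-of-order core can schedule separately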
 

OneEng2

Senior member
Sep 19, 2022
840
1,107
106
Yep. I remember reading about this too. Once decoded into µops, the rest of the pipeline typically functions like a RISC CPU (both Intel & AMD, I think). But how similar, I don't know. Maybe some forum member with better insight can shed some light.
I must admit, your two assembly language examples took me back :). The only thing I use asm for these days (and I assure you, when I say "I use", I mean my engineering team) is some of the initial setup instructions for the embedded micro. Even that is going by the wayside, as many microcontroller suppliers now offer a "setup utility" that generates the setup asm file for you. Kids! Pretty soon, none of them will have the vaguest idea how a CPU works :).

I feel pretty old now as I find myself saying "back in my day" and explaining to young engineers how to handle a problem when the "setup utility" doesn't get it done. Yep, you have to read the engineering documents on how the different setup registers ACTUALLY WORK and then do your own bit and byte calculations (God forbid!).

Anyway, if someone on the forum has a more detailed understanding of the CISC to RISC (ish) decode and pipelining logic, I would be interested in hearing more about it as well.
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
I must admit, your two assembly language examples took me back :). The only thing I use asm for these days (and I assure you, when I say "I use", I mean my engineering team) is some of the initial setup instructions for the embedded micro. Even that is going by the wayside, as many microcontroller suppliers now offer a "setup utility" that generates the setup asm file for you. Kids! Pretty soon, none of them will have the vaguest idea how a CPU works :).

I feel pretty old now as I find myself saying "back in my day" and explaining to young engineers how to handle a problem when the "setup utility" doesn't get it done. Yep, you have to read the engineering documents on how the different setup registers ACTUALLY WORK and then do your own bit and byte calculations (God forbid!).

Anyway, if someone on the forum has a more detailed understanding of the CISC to RISC (ish) decode and pipelining logic, I would be interested in hearing more about it as well.
Anybody know who came up with micro-ops and when? Saved x86 I would think?
 

Doug S

Diamond Member
Feb 8, 2020
3,581
6,320
136
Apple's P core hasn't had a 10%+ gen-on-gen IPC increase since 2019, before they even launched the M series. The M4 P core is the biggest increase they've had since, and that's 8%; it's ludicrous to suggest a 25% increase in M5 at this stage.

The fact Apple has been getting consistent IPC increases despite gaining 10% frequency with each step from A12Z->M1->M2->M3->M4 seems to be pretty underrated. Where would Intel and AMD frequencies be if they'd been gaining 10% with each new design over the past five years?

Who knows whether they will continue to bump frequency or whether their desire for power efficiency will halt those increases soon, and, if they do halt them, whether that will mean more effort toward IPC increases (though since IPC is harder to improve the higher it gets, that may not be reflected in double-digit numbers; at some point even managing 5% at constant frequency will be a huge victory).
 

jdubs03

Golden Member
Oct 1, 2013
1,282
902
136
The fact Apple has been getting consistent IPC increases despite gaining 10% frequency with each step from A12Z->M1->M2->M3->M4 seems to be pretty underrated. Where would Intel and AMD frequencies be if they'd been gaining 10% with each new design over the past five years?

Who knows whether they will continue to bump frequency or whether their desire for power efficiency will halt those increases soon, and, if they do halt them, whether that will mean more effort toward IPC increases (though since IPC is harder to improve the higher it gets, that may not be reflected in double-digit numbers; at some point even managing 5% at constant frequency will be a huge victory).
I’d assume that they’ll continue proceeding in the manner of their recent history. If I had to hazard a guess, they will lift both IPC and frequency by about the same amount. They usually bump each gen by 15% overall on ST, so they have to get there somehow. I suspect there will be some decent IPC gains ahead. If the ARM X925 can gain 15% in IPC, then there still seems to be headroom for further gains.

It’s really just an ARMs race (heh), vanilla with great gains vs. custom staying just ahead. X930 vs M5/A19(P) should be interesting.

But for x86, it would take something revolutionary to reach the performance efficiency of the aforementioned competitors. For Intel, it’ll be that unified core that is their first attempt to bridge the gap.
 

Doug S

Diamond Member
Feb 8, 2020
3,581
6,320
136
If the ARM X925 can gain 15% in IPC, then there still seems to be headroom for further gains.

But that's the thing: they were able to gain 15% because they're starting from further behind. There's always headroom when you're further behind. Apple increased performance by 70% between the A8 and A9 precisely because they were well behind on frequency and had a lot more low-hanging fruit to grab for IPC. They're at the top of the IPC heap now, so it will be harder for them to increase IPC by x% than it is for someone else who is 20% behind them in IPC.