Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
850
801
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first from Intel. I'm expecting a launch around Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
| --- | --- | --- | --- | --- |
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max CPU Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Memory | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe Cores | 32 EU | 2 Xe | 8 Xe | 12 |
| Max GPU Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
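A quick sanity check on the bandwidth row (my own arithmetic, not vendor figures): peak bandwidth is roughly bus width in bytes times transfer rate. 128-bit LPDDR5X-8533 gives 16 B × 8533 MT/s ≈ 136.5 GB/s (LNL); 64-bit LPDDR5X-10667 gives 8 B × 10667 MT/s ≈ 85.3 GB/s (D9500); 64-bit LPDDR5-6800 gives 8 B × 6800 MT/s ≈ 54.4 GB/s, which is where the ~55 GB/s for WCL would come from if the rumored memory config holds. The same math puts ADL-N's 64-bit LPDDR5-4800 at about 38.4 GB/s.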









As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, the first from Intel to use GAA transistors, which it calls RibbonFET.



 

Attachments

  • PantherLake.png
    PantherLake.png
    283.5 KB · Views: 24,028
  • LNL.png
    LNL.png
    881.8 KB · Views: 25,522
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
    INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
    181.4 KB · Views: 72,430
  • Clockspeed.png
    Clockspeed.png
    611.8 KB · Views: 72,318
Last edited:

poke01

Diamond Member
Mar 8, 2022
4,231
5,568
106
Might be a typo; Aussie site PLE lists the 265K and 285K as no-HSF boxes.


 

Josh128

Golden Member
Oct 14, 2022
1,325
1,996
106
Might be a typo; Aussie site PLE lists the 265K and 285K as no-HSF boxes.


Hmm. Same weird thin box form factor as Zen 5. Pinching pennies on both sides, lol.

BTW, Redwood Cove and Lion Cove are not "disasters" as has been stated here. You guys are missing the forest for the trees. The problem is MCM. M C M!! There are tradeoffs that even the pretty, tightly packed "tiles" cannot nullify. That's why Intel waited so long to go that route. AMD also has tightly packed chiplets with RDNA 3, and despite the node advantage over RDNA 2, it was quite disappointing and power hungry compared to Nvidia's still-monolithic design. Nvidia sticking with monolithic was also a conscious decision. There is no magic bullet to overcome these limitations, yet. Both Intel and AMD have very smart and talented engineers, but they all have to work within the confines of the current understanding of physics and materials science.
 

Nothingness

Diamond Member
Jul 3, 2013
3,301
2,373
136
But it doesn't mean much, because for a given workload the Arm ISA has to execute more instructions to get the same work done when compared to a CISC ISA like that of LNC & Zen 5. I'm only talking about the number of instructions for the sake of IPC (not instruction width, complexity, etc.). This too makes literal IPC comparison meaningless.
This is incorrect.

Talking about SPECint 2017: if I take the geometric mean of the difference in instructions executed, x86-64 and AArch64 are surprisingly close, with x86-64 executing ~1.17% more instructions.
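(That is, for each SPECint 2017 sub-test take the ratio of dynamic instruction counts, x86-64 over AArch64, and then take the geometric mean of those ratios; a geomean of roughly 1.012 is what the ~1.17% figure corresponds to.)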
 

511

Diamond Member
Jul 12, 2024
4,550
4,168
106
Anyways, I had a discussion with my friend at Nvidia and he said the Grace Hopper CPU is better than Intel Xeon 6 / AMD GNR 🤣
 
Last edited:
  • Haha
Reactions: Elfear

Nothingness

Diamond Member
Jul 3, 2013
3,301
2,373
136
Not even close. SPECint is a very narrow representation of the much wider instruction set. I've worked with x86 instructions and I've gone through a bit of ARM instructions too. RISC by definition uses more, simpler instructions to perform a task, whereas CISC uses fewer but more complex instructions to do the same. That's the core tenet of RISC vs CISC. Pretty much well established. Something you don't agree with.

For example, to display a window using a Windows system call on a PC, you tend to have fewer but more complex assembly instructions to set up the parameters for the call. Doing the same on WoA, you'll need a lot more instructions. Just FYI:
  • movzx eax, byte ptr [rcx+rax+69h]
  • lea eax, [rdi+rsi]
Confronted with the reality of a dozen different benchmarks, you provide a two-line example. That's interesting.
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
Not even close. SPECint is a very narrow representation of the much wider instruction set. I've worked with x86 instructions and I've gone through a bit of ARM instructions too. RISC by definition uses more, simpler instructions to perform a task, whereas CISC uses fewer but more complex instructions to do the same. That's the core tenet of RISC vs CISC. Pretty much well established. Something you don't agree with.

For example, to display a window using a Windows system call on a PC, you tend to have fewer but more complex assembly instructions to set up the parameters for the call. Doing the same on WoA, you'll need a lot more instructions. Just FYI:
  • movzx eax, byte ptr [rcx+rax+69h]
  • lea eax, [rdi+rsi]

ARM isn't very RISC, and conversely, x86's complex instructions are avoidable because they are slow. That 3-operand lea in your example is one of those instructions that get replaced with simpler ones when performance matters. And because ARM has more general-purpose registers, fewer instructions are needed to spill and restore registers. So yeah, in real-life code ARM and x86 have very similar code density, and ARM actually being denser wouldn't be a surprise.

EDIT: I read it wrong; you used a 2-operand lea. AArch64 supports those addressing modes, and even more complex ones like auto-increment and register-pair loads; only the 3-operand version isn't supported.
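To make that concrete, here are rough AArch64 equivalents of the two x86 examples above (my own sketch of typical compiler output; register choices are purely illustrative):
  • lea eax, [rdi+rsi] → add w0, w1, w2 ; one instruction on either side
  • movzx eax, byte ptr [rcx+rax+69h] → add x2, x1, x0 then ldrb w0, [x2, #0x69] ; ldrb zero-extends and supports base+register or base+immediate addressing, but not base+register+immediate in one instruction, so this one takes two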
 
Last edited:

511

Diamond Member
Jul 12, 2024
4,550
4,168
106
ARM isn't very RISC, and conversely, x86's complex instructions are avoidable because they are slow. That 3-operand lea in your example is one of those instructions that get replaced with simpler ones when performance matters. And because ARM has more general-purpose registers, fewer instructions are needed to spill and restore registers. So yeah, in real-life code ARM and x86 have very similar code density, and ARM actually being denser wouldn't be a surprise.
About this: if you count instructions, ARM will require an equal or greater number of instructions than x86 to do the same task, if we are not counting specialized instructions.
 

OneEng2

Senior member
Sep 19, 2022
840
1,107
106
That's something they should aim for, but it will be a struggle for them, as historically Intel has had poor perf/watt in productivity benchmarks against AMD even with a node advantage, and it doesn't look like it's changed moving to TSMC 3nm
I have observed this as well. Sure, this peaked in the NetBurst days with Intel's no-holds-barred pursuit of performance through power rather than efficiency, but it does seem like it has been a recurring issue for them. They do seem to have cycles where they fall back to more efficient cores (like Core and Core 2, and now Lion Cove/Skymont). I am encouraged (unlike many here) by Intel's return to efficiency in this cycle.
Redwood Cove was a disaster but Raptor and Golden Cove were OK? So a tock is OK but a tick is a disaster all of a sudden? Intel themselves said that Raptor Cove wasn't even supposed to exist, lol. Golden is the tock and Redwood is the tick; it's all a result of the foundry debacle.
I think this whole tick/tock thing is in the past. The "tick" generally involved a die shrink. I think any company that believes it can rely on a big process improvement every other CPU release is headed for pain and misery, as it is confronted with a very new reality: the financial implications of such a strategy, as well as the grossly diminishing returns each shrink will deliver.
All of these P cores are pretty comparable performance-wise to their competition, so basically all P cores for the past 5 years have been a "disaster."

"Disaster" means a serious disruption to functioning. AFAIK all of these chips actually function.
Agree. I think we need to be watching the financial success of the product lineup. Being "competitive" from a performance standpoint is enough, even if you are not leading by 25-40% (which I think is a thing of the past, per my earlier argument about process improvements having diminishing returns). Other factors are going to be more important, IMHO.
Hmm. Same weird thin box form factor as Zen 5. Pinching pennies on both sides, lol.

BTW, Redwood Cove and Lion Cove are not "disasters" as has been stated here. You guys are missing the forest for the trees. The problem is MCM. M C M!! There are tradeoffs that even the pretty, tightly packed "tiles" cannot nullify. That's why Intel waited so long to go that route. AMD also has tightly packed chiplets with RDNA 3, and despite the node advantage over RDNA 2, it was quite disappointing and power hungry compared to Nvidia's still-monolithic design. Nvidia sticking with monolithic was also a conscious decision. There is no magic bullet to overcome these limitations, yet. Both Intel and AMD have very smart and talented engineers, but they all have to work within the confines of the current understanding of physics and materials science.
Intel was late to the MCM game, and it arguably hurt them financially. I even wonder about Intel's tile implementation, which requires another layer of silicon in addition to the actual tiles (but that is another subject for another thread :) ). Nvidia seems to be relying on their leadership position in the AI market, which basically lets them charge whatever they want. I don't see this as a good long-term position, as eventually cost will need to be lowered and yields will need to be higher to compete.
Omg! I assumed you would get the full picture by looking at those instructions and what they mean. Those two are among the most common instructions and something a CISC can do in one instruction but a RISC can't. Also, when benchmarking general app performance like office, browsers, etc., these types of instructions are used more than int/fp.
It is my understanding that for a very long time now, CISC processors' early decode stages have translated CISC instructions into a RISC-like, fixed-length internal format for pipelining and superscalar execution. I suspect that this is a gross oversimplification of what actually happens, though.
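Roughly, the idea (a purely conceptual sketch; the actual internal µop encodings Intel and AMD use are undocumented and more involved) is something like:
  • add eax, dword ptr [rbx+8] ; one x86 instruction that both loads from memory and adds
  • → µop 1: tmp ← load32 [rbx+8], µop 2: eax ← eax + tmp ; two fixed-format, RISC-like operations the out-of-order core can schedule separately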
 

OneEng2

Senior member
Sep 19, 2022
840
1,107
106
Yep. I remember reading about this too. Once decoded into µops, the rest of the pipeline typically functions like a RISC CPU (both Intel & AMD, I think). But how similar, I don't know. Maybe some forum member with better insight can shed some light.
I must admit, your two assembly language examples took me back :). The only thing I use asm for these days (and I assure you, when I say "I use", I mean my engineering team) is some of the initial setup instructions for the embedded micro. Even that is going by the wayside, as many microcontroller suppliers now offer a "setup utility" that generates the setup asm file for you. Kids! Pretty soon, none of them will have the vaguest idea how a CPU works :).

I feel pretty old now as I find myself saying "back in my day" and explaining to young engineers how to handle a problem when the "setup utility" doesn't get it done. Yep, you have to read the engineering documents on how the different setup registers ACTUALLY WORK and then do your own bit and byte calculations (God forbid!).

Anyway, if someone on the forum has a more detailed understanding of the CISC to RISC (ish) decode and pipelining logic, I would be interested in hearing more about it as well.
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
I must admit, your two assembly language examples took me back :). The only thing I use asm for these days (and I assure you, when I say "I use", I mean my engineering team) is some of the initial setup instructions for the embedded micro. Even that is going by the wayside, as many microcontroller suppliers now offer a "setup utility" that generates the setup asm file for you. Kids! Pretty soon, none of them will have the vaguest idea how a CPU works :).

I feel pretty old now as I find myself saying "back in my day" and explaining to young engineers how to handle a problem when the "setup utility" doesn't get it done. Yep, you have to read the engineering documents on how the different setup registers ACTUALLY WORK and then do your own bit and byte calculations (God forbid!).

Anyway, if someone on the forum has a more detailed understanding of the CISC to RISC (ish) decode and pipelining logic, I would be interested in hearing more about it as well.
Anybody know who came up with micro-ops and when? Saved x86 I would think?
 

Doug S

Diamond Member
Feb 8, 2020
3,581
6,320
136
Apple's P core hasn't had a 10%+ gen-on-gen IPC increase since 2019, before they even launched the M series. The M4 P core is the biggest increase they've had since, and that's 8%; it's ludicrous to suggest a 25% increase in M5 at this stage.

The fact Apple has been getting consistent IPC increases despite gaining 10% frequency with each step from A12Z->M1->M2->M3->M4 seems to be pretty underrated. Where would Intel and AMD frequencies be if they'd been gaining 10% with each new design over the past five years?

Who knows whether they will continue to bump frequency or whether their desire for power efficiency will halt those increases soon, and, if they do halt them, whether that will mean more effort toward IPC increases (though since IPC is harder to improve the higher it gets, that may not be reflected in double-digit numbers; at some point even managing 5% at constant frequency will be a huge victory).
 

jdubs03

Golden Member
Oct 1, 2013
1,282
902
136
The fact Apple has been getting consistent IPC increases despite gaining 10% frequency with each step from A12Z->M1->M2->M3->M4 seems to be pretty underrated. Where would Intel and AMD frequencies be if they'd been gaining 10% with each new design over the past five years?

Who knows whether they will continue to bump frequency or whether their desire for power efficiency will halt those increases soon, and, if they do halt them, whether that will mean more effort toward IPC increases (though since IPC is harder to improve the higher it gets, that may not be reflected in double-digit numbers; at some point even managing 5% at constant frequency will be a huge victory).
I’d assume that they’ll continue proceeding in the manner of their recent history. If I had to hazard a guess, they will lift both IPC and frequency by about the same amount. They usually bump each gen by 15% overall on ST, so they have to get there somehow. I suspect there will be some decent IPC gains ahead. If the ARM X925 can gain 15% in IPC, then there still seems to be headroom for further gains.

It’s really just an ARMs race (heh), vanilla with great gains vs. custom staying just ahead. X930 vs M5/A19(P) should be interesting.

But for x86, it would take something revolutionary to reach the performance efficiency of the aforementioned competitors. For Intel, it’ll be that unified core that is their first attempt to bridge the gap.
 

Doug S

Diamond Member
Feb 8, 2020
3,581
6,320
136
If the ARM X925 can gain 15% in IPC, then there still seems to be headroom for further gains.

But that's the thing: they were able to gain 15% because they're starting from further behind. There's always headroom when you're further behind. Apple increased performance by 70% between the A8 and A9 precisely because they were well behind on frequency and had a lot more low-hanging fruit to grab for IPC. They're at the top of the IPC heap now, so it will be harder for them to increase IPC by x% than it is for someone else who is 20% behind them in IPC.