Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads

Tigerick · Aug 22, 2022

Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. They are connected through UCIe, not D2D; a first from Intel. Expected to launch in Q1 2026.

	Intel Raptor Lake U	Intel Wildcat Lake 15W?	Intel Lunar Lake	Intel Panther Lake 4+0+4
Launch Date	Q1-2024	Q2-2026	Q3-2024	Q1-2026
Model	Intel 150U	Intel Core 7	Core Ultra 7 268V	Core Ultra 7 365
Dies	2	2	2	3
Node	Intel 7 + ?	Intel 18-A + TSMC N6	TSMC N3B + N6	Intel 18-A + Intel 3 + TSMC N6

CPU	2 P-core + 8 E-cores	2 P-core + 4 LP E-cores	4 P-core + 4 LP E-cores	4 P-core + 4 LP E-cores
Threads	12	6	8	8
Max Clock	5.4 GHz	?	5 GHz	4.8 GHz
L3 Cache	12 MB		12 MB	12 MB
TDP	15 - 55 W	15 W ?	17 - 37 W	25 - 55 W

Memory	128-bit LPDDR5-5200	64-bit LPDDR5	128-bit LPDDR5x-8533	128-bit LPDDR5x-7467
Size	96 GB		32 GB	128 GB
Bandwidth			136 GB/s

GPU	Intel Graphics	Intel Graphics	Arc 140V	Intel Graphics
RT	No	No	YES	YES
EU / Xe	96 EU	2 Xe	8 Xe	4 Xe
Max Clock	1.3 GHz	?	2 GHz	2.5 GHz

NPU	GNA 3.0	18 TOPS	48 TOPS	49 TOPS
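The Bandwidth row can be sanity-checked from the Memory row. A minimal sketch, assuming theoretical peak bandwidth is simply bus width times transfer rate (the function name is made up for illustration, and WCL is left out because the table gives no data rate for it):

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Bus width (bits) and data rate (MT/s) copied from the Memory row of the table.
configs = {
    "Raptor Lake U (128-bit LPDDR5-5200)": (128, 5200),
    "Lunar Lake (128-bit LPDDR5x-8533)": (128, 8533),
    "Panther Lake (128-bit LPDDR5x-7467)": (128, 7467),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

For Lunar Lake this gives about 136.5 GB/s, in line with the 136 GB/s listed in the table.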

As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.

511 · Nov 30, 2024

Anyways anyone having area value for 4C Skymont+L2 ?

msj10 · Nov 30, 2024

511 said:
Anyways anyone having area value for 4C Skymont+L2 ?

6.8 mm^2

DavidC1 · Nov 30, 2024

511 said:
Anyways anyone having area value for 4C Skymont+L2 ?

Can't really figure out the 4C+L2 size from that figure. The compute tile has other stuff such as the interconnect necessary to connect to other tiles. It has EMIB too, so that takes a bit. If it needs a router, it'll take little bit more too.

55mm2 for 24 cores on 18A is ok considering.

mzocyteae · Nov 30, 2024

511 said:
https://twitter.com/x/status/1862509554736800177
Darkmont area efficiency the 24 core compute tile is just 55mm2 with the L2 on 18A😮

Sierra Forest is ~578mm^2/38 cluster with IMC+L3+mesh.
55mm2/24 core (6 cluster?) is unsurprising, while the smaller overall size definitely helps interconnect latency.
It would be mostly interesting to see how much L3 it gets.
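For reference, the per-cluster arithmetic behind that comparison can be sketched as follows. This only divides the figures quoted in this thread, reads the Sierra Forest number as mm^2, and assumes the 24-core tile is 6 clusters of 4 cores (the "6 cluster?" guess above); none of it is a measurement.

```python
def area_per_unit(total_area_mm2: float, units: int) -> float:
    """Average silicon area per unit (cluster or core), in mm^2."""
    return total_area_mm2 / units


# 55 mm^2 compute tile with 24 cores, assumed to be 6 clusters of 4 cores.
tile_per_cluster = area_per_unit(55.0, 6)
tile_per_core = area_per_unit(55.0, 24)

# Sierra Forest figure as quoted in the post: ~578 mm^2 over 38 clusters,
# which also includes IMC + L3 + mesh, so the two are not like-for-like.
srf_per_cluster = area_per_unit(578.0, 38)

print(f"24-core tile:  {tile_per_cluster:.2f} mm^2/cluster, {tile_per_core:.2f} mm^2/core")
print(f"Sierra Forest: {srf_per_cluster:.2f} mm^2/cluster (incl. IMC + L3 + mesh)")
```

That works out to roughly 9.2 mm^2 per cluster for the 55 mm^2 tile versus roughly 15.2 mm^2 per cluster for Sierra Forest with its uncore included.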

cannedlake240 · Nov 30, 2024

511 said:
https://twitter.com/x/status/1862509554736800177
Darkmont area efficiency the 24 core compute tile is just 55mm2 with the L2 on 18A😮

The tile apparently has no L3 cache so 55mm2 isn't surprising

mzocyteae · Nov 30, 2024

DavidC1 said:
It used to be doubling the stages got you 80%+ gains. How is it worth now? Unified Core should aim for at maximum low-5GHz.

As for FP, it should stay 256-bit for client. 512-bit is a waste, and was a mistake to do it with AVX-512. It should have been AVX3-256.

It is Intel's (political) stubbornness to not implement avx512 in client cores. AMD already proved that avx512 can be implemented with 256 datapath and gets reasonable throughput gains.
If you ever touched simd codes, you won't spam "AVX3-256" bullshit. Most (all?) avx512 instructions have xmm/ymm counterparts.

IMO neither Intel nor Arm did it correctly.
SIMD should be designed with both fixed-size vector width and predicate registers.
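To illustrate the two ideas in this post (a 512-bit operation executed over a 256-bit datapath, and per-lane predication), here is a toy model in plain Python. It is only a conceptual sketch: the function names are hypothetical, lanes are modeled as list elements, and it says nothing about how any real core schedules the two halves.

```python
LANES_512 = 16  # sixteen 32-bit lanes in a 512-bit vector
LANES_256 = 8   # eight 32-bit lanes in a 256-bit datapath


def add_256(a, b, mask, passthrough):
    """One pass through a hypothetical 256-bit adder with per-lane predication.

    Lanes whose mask bit is False keep the passthrough value (merge masking).
    """
    return [x + y if m else p for x, y, m, p in zip(a, b, mask, passthrough)]


def masked_add_512(a, b, mask, passthrough):
    """Model a 512-bit masked add as two passes over a 256-bit datapath."""
    assert len(a) == len(b) == len(mask) == len(passthrough) == LANES_512
    lo = add_256(a[:LANES_256], b[:LANES_256], mask[:LANES_256], passthrough[:LANES_256])
    hi = add_256(a[LANES_256:], b[LANES_256:], mask[LANES_256:], passthrough[LANES_256:])
    return lo + hi


a = list(range(16))
b = [100] * 16
mask = [i % 2 == 0 for i in range(16)]  # only even lanes are active
result = masked_add_512(a, b, mask, [-1] * 16)
print(result)
```

The caller sees one 16-lane operation with a predicate; whether the hardware does it in one pass or two is invisible at the ISA level, which is the point being argued.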

511 · Nov 30, 2024

DavidC1 said:
Can't really figure out the 4C+L2 size from that figure. The compute tile has other stuff such as the interconnect necessary to connect to other tiles. It has EMIB too, so that takes a bit. If it needs a router, it'll take little bit more too.

I meant for Arrow Lake 4C+L2 on N3B
This confirms the die has empty area to facilitate EMIB+Mesh+TSV

https://twitter.com/x/status/1862790501524455522

511 · Nov 30, 2024

cannedlake240 said:
The tile apparently has no L3 cache so 55mm2 isn't surprising

That's what i said lol

dttprofessor · Nov 30, 2024

55mm2?
so small?

511 · Nov 30, 2024

dttprofessor said:
55mm2?
so small?

Only 24 Darkmont cores, L2, TSVs and EMIB in that small area

DavidC1 · Nov 30, 2024

If cluster size is similar to Skymont, and doesn't have more L2 cache, then it means 18A is similar to N3B in size.
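A rough way to put numbers on that "if": take the 6.8 mm^2 figure posted above for a 4C Skymont+L2 cluster (unverified, and assumed here to be on N3B) and see how much of the 55 mm^2 tile six such clusters would occupy. The 6-cluster count is also an assumption from earlier in the thread.

```python
SKYMONT_CLUSTER_MM2 = 6.8   # 4C Skymont + L2, figure posted in this thread (unverified)
TILE_AREA_MM2 = 55.0        # 24-core 18A compute tile, from the linked post
CLUSTERS_PER_TILE = 6       # assumed: 24 cores / 4 cores per cluster

cluster_area = SKYMONT_CLUSTER_MM2 * CLUSTERS_PER_TILE
remaining_area = TILE_AREA_MM2 - cluster_area
cluster_share = cluster_area / TILE_AREA_MM2

print(f"Clusters at Skymont/N3B size: {cluster_area:.1f} mm^2")
print(f"Left for interconnect, TSVs, EMIB, etc.: {remaining_area:.1f} mm^2")
print(f"Cluster share of tile: {cluster_share:.0%}")
```

Under those assumptions the clusters would take about 40.8 mm^2, leaving about 14.2 mm^2 for everything else on the tile.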

cannedlake240 · Nov 30, 2024

Continued Momentum for Intel 18A

Progress on lead product designs and process readiness is enabling us to bridge from Intel 20A earlier than we’d planned.

www.intel.com

Are these CLF wafer pics?

SteinFG · Nov 30, 2024

cannedlake240 said:
Continued Momentum for Intel 18A

Progress on lead product designs and process readiness is enabling us to bridge from Intel 20A earlier than we’d planned.

www.intel.com

Are these CLF wafer pics?

yep, seems like it. 6 clusters per chip

dttprofessor · Dec 1, 2024

511 said:
Only 24 Darkmont cores, L2, TSVs and EMIB in that small area

emib should be on the base tile?

DavidC1 · Dec 1, 2024

mzocyteae said:
It is Intel's (political) stubbornness to not implement avx512 in client cores. AMD already proved that avx512 can be implemented with 256 datapath and gets reasonable throughput gains.
If you ever touched simd codes, you won't spam "AVX3-256" bullshit. Most (all?) avx512 instructions have xmm/ymm counterparts.

And what's the BS? That 512-bit FPUs are overkill for CPUs in this day and age?

AVX3-256 means ALL AVX-512 instructions should be kept, without needing 512-bit registers and FPU.

AMD cores are also very far away from being the most efficient design so it very much applies. Go look at many of Intel's older presentations. 512-bit was solely to stave off Nvidia's advance in HPC; note the first AVX512 product was supposed to be Xeon Phi. And they paid for it with decreased clocks and fragmented ISAs, something they could have avoided if they had focused on GPUs way earlier and stayed on a far saner 256-bit FPU.

ARM vendors and Skymont take a far better approach of adding more FPUs. It straight up benefits everything without putting the burden on users and programmers.

adroc_thurston · Dec 1, 2024

DavidC1 said:
ARM vendors and Skymont take a far better approach of adding more FPUs

Lmao no, 2 FMAs is the maximum anything real SIMD code would saturate.

DavidC1 said:
without putting the burden on users and programmers.

It literally puts the burden directly on the SIMD slave. You need to juggle more math to saturate moar FMA units.
Man, you really never ever heard SIMD people ranting?
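The "juggle more math" point can be made concrete with a back-of-the-envelope rule: to keep fully pipelined FMA units busy every cycle, code needs about units x latency independent operations in flight. A minimal sketch, where the 4-cycle latency is purely an illustrative assumption and not a figure from this thread:

```python
def independent_chains_needed(fma_units: int, fma_latency_cycles: int) -> int:
    """Independent FMA dependency chains needed to issue to every unit each cycle."""
    return fma_units * fma_latency_cycles


ASSUMED_FMA_LATENCY = 4  # hypothetical latency in cycles, for illustration only

for units in (2, 4):
    chains = independent_chains_needed(units, ASSUMED_FMA_LATENCY)
    print(f"{units} FMA units -> {chains} independent accumulators to saturate them")
```

With that assumed latency, going from 2 to 4 FMA units doubles the number of independent accumulators the programmer (or compiler) has to find, from 8 to 16.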

igor_kavinski · Dec 1, 2024

adroc_thurston said:
Man, you really never ever heard SIMD people ranting?

Share a few dozen links of rants please

mzocyteae · Dec 1, 2024

DavidC1 said:
AVX3-256 means ALL AVX-512 instructions should be kept, without needing 512-bit registers and FPU.

Lmao, do you understand what an ISA is?
And you can't even understand this:

avx512 instructions have xmm/ymm counterparts.

DavidC1 said:
ARM vendors and Skymont take a far better approach of adding more FPUs. It straight up benefits everything without putting the burden on users and programmers.

avx512 requires much less programming effort than avx256.
The only (albeit big) problem is ISA fragmentation caused by Intel's silly decision to not implement avx512 in client cores.

511 · Dec 1, 2024

mzocyteae said:
The only (albeit big) problem is ISA fragmentation caused by Intel's silly decision to not implement avx512 in client cores.

Also deprecation of instructions on their whim

OneEng2 · Dec 1, 2024

DavidC1 said:
If cluster size is similar to Skymont, and doesn't have more L2 cache, then it means 18A is similar to N3B in size.

Which is what has been the general consensus among speculations with the information available to date.

It looks to me like AMD intends to compete with these chips from N3P in desktop and laptop while in server, N2 will be used.

... which again brings me back to the financial implications of Intel using a more expensive process and more expensive equipment. Of course some of this cost is mitigated by Intel not having to pay for TSMC's profit.

I wonder if CWF has AVX512 and SMT? Hard to see how it can compete in DC and HPC without them.

511 · Dec 1, 2024

OneEng2 said:
Which is what has been the general consensus among speculations with the information available to date.

Yes, we have all been saying N3 density and slightly better performance than N3P, but a bit less than N2, and Daniel Nenni confirms it

Intel 18A "too good" but design lags

Got this from X but too interesting to let it go by: "The design service and design enablement is still fairly weak at Intel right now, but the technology is just way too good." He noted Samsung's MOL had issues, but no comments on TSMC.

semiwiki.com

OneEng2 said:
It looks to me like AMD intends to compete with these chips from N3P in desktop and laptop while in server, N2 will be used.

... which again brings me back to the financial implications of Intel using a more expensive process and more expensive equipment. Of course some of this cost is mitigated by Intel not having to pay for TSMC's profit.

Where do you get the expensive equipment from? Both use almost the same equipment; there is no dual sourcing for many of the critical things in semi manufacturing. Intel even sells one such item, masks, which Intel's subsidiary produces, so they can charge TSMC more for it lol.
How do you rate one process as more expensive than the other without proof? And you simply said the most important point, the foundry, doesn't matter. You know that if foundry has a 30% margin on 18A and product another 30%, that is roughly 70% margin on a chip, vs AMD's, which would not be that much.
Also, Zen 6 is not arriving before 2H26 at the earliest, so it has like 1 year of reign.
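The "roughly 70%" figure depends on how the two 30% layers are stacked. A quick sketch of both readings (the function names are made up for illustration; the 30% values are the hypothetical ones from the post, not reported financials):

```python
def stacked_markup(*markups: float) -> float:
    """Total markup over cost when each layer marks up the previous layer's price."""
    total = 1.0
    for m in markups:
        total *= 1.0 + m
    return total - 1.0


def stacked_gross_margin(*margins: float) -> float:
    """Total gross margin when each layer keeps a fraction of its own selling price."""
    cost_fraction = 1.0
    for m in margins:
        cost_fraction *= 1.0 - m
    return 1.0 - cost_fraction


print(f"Two 30% markups compound to {stacked_markup(0.30, 0.30):.0%} over cost")
print(f"Two 30% gross margins combine to {stacked_gross_margin(0.30, 0.30):.0%} of final price")
```

Read as markups over cost, two 30% layers compound to 69%, which matches the "roughly 70%" above; read as gross margins on selling price, they combine to 51%.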

OneEng2 said:
I wonder if CWF has AVX512 and SMT? Hard to see how it can compete in DC and HPC without them.

First thing, it is not an HPC chip. It would lose to 5C in AVX-512, but that's about it. Does it state anywhere that SMT is necessary? It is entirely dependent on the people who buy it and how much they prioritize security; SMT is not necessary. AVX-512 might be, which might be mitigated by AVX10/256 somewhat.

OneEng2 · Dec 1, 2024

511 said:
Where do you get the expensive equipment from? Both use almost the same equipment; there is no dual sourcing for many of the critical things in semi manufacturing. Intel even sells one such item, masks, which Intel's subsidiary produces, so they can charge TSMC more for it lol.
How do you rate one process as more expensive than the other without proof? And you simply said the most important point, the foundry, doesn't matter. You know that if foundry has a 30% margin on 18A and product another 30%, that is roughly 70% margin on a chip, vs AMD's, which would not be that much.
Also, Zen 6 is not arriving before 2H26 at the earliest, so it has like 1 year of reign.

First thing it is not a HPC Chip it would loose to 5C in AVX-512 but that's about it does it states anywhere SMT is necessary it is entirely dependent on the people who buy it how much they feature Security SMT is not necessary make or break AVX-512 might be

Your last sentence is a bit confusing, can you clarify?

Intel originally purchased 5000 series ASML machines (high NA) for 18A. Now, Intel intends to wring out the high NA machines in 2025, but not use them for production.

GAA and BSPD both require more process passes than FinFET and FSPD do. This makes the process more expensive. Intel also produces fewer chips on their high end equipment than TSMC does, which makes the amortization costs higher for Intel. Furthermore, without High NA, Intel will have to rely on double patterning which will also raise costs. Throughout 2025, Intel will be producing only CWF chips on 18A as I understand it. This is a pretty low volume chip compared to desktop and laptop markets.

AMD using N3P for the high volume desktop and laptop segments significantly reduces their cost over Intel's use of N3B today (N3B also has more passes as I understand it than N3E, N3P and N3X) and (my guess) Intel's own 18A process.

It seems like Intel is willing to throw money at their chips to keep them competitive while AMD manages to maintain competitive products at much lower production costs on less expensive process nodes.

cannedlake240 · Dec 1, 2024

OneEng2 said:
Throughout 2025, Intel will be producing only CWF chips on 18A as I understand it

18A production fab won't be online until 2H 25, where it'll make PTL, CLF and external foundry chips. And CLF isn't very low volume, each one has 660mm2 of 18A silicon.
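The 660mm2 figure lines up with the 55mm2 tile discussed earlier in the thread. A quick check, assuming all of that 18A silicon is made up of the 24-core compute tiles:

```python
TOTAL_18A_AREA_MM2 = 660.0  # per-package 18A silicon, from the post above
TILE_AREA_MM2 = 55.0        # 24-core compute tile, from earlier in the thread
CORES_PER_TILE = 24


def implied_tiles(total_area_mm2: float, tile_area_mm2: float) -> float:
    """Number of compute tiles implied if all the area is identical tiles."""
    return total_area_mm2 / tile_area_mm2


tiles = implied_tiles(TOTAL_18A_AREA_MM2, TILE_AREA_MM2)
cores = tiles * CORES_PER_TILE
print(f"Implied compute tiles per package: {tiles:.0f}")
print(f"Implied cores per package: {cores:.0f}")
```

Under that assumption the two numbers imply 12 compute tiles and 288 cores per package.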

511 · Dec 1, 2024

OneEng2 said:
Your last sentence is a bit confusing, can you clarify?

I meant that SMT is not a make-or-break feature, as some people will turn it off and some may not. As for AVX-512, CWF supports AVX10/256, so it gets everything AVX-512 has, just with a 256-bit data path.

OneEng2 said:
Intel originally purchased 5000 series ASML machines (high NA) for 18A. Now, Intel intends to wring out the high NA machines in 2025, but not use them for production.

That is a gamble; whether it will pay off or not, only time will tell.

OneEng2 said:
GAA and BSPD both require more process passes than FinFET and FSPD do. This makes the process more expensive. Intel also produces fewer chips on their high end equipment than TSMC does, which makes the amortization costs higher for Intel. Furthermore, without High NA, Intel will have to rely on double patterning which will also raise costs. Throughout 2025, Intel will be producing only CWF chips on 18A as I understand it. This is a pretty low volume chip compared to desktop and laptop markets.

Both TSMC and Intel will use multi-patterning at N2/18A, but BSPDN allows Intel to relax pitches a bit, though it increases complexity. As for how expensive it is, only Intel/TSMC know the cost.

OneEng2 said:
AMD using N3P for the high volume desktop and laptop segments significantly reduces their cost over Intel's use of N3B today (N3B also has more passes as I understand it than N3E, N3P and N3X) and (my guess) Intel's own 18A process.

N3P products (Zen 6) won't be launching before 2H26. As for the cost of 18A vs N3B/N3P, you are assuming this on the basis of the Intel 7 cost structure. Also, the N3B and N3P price difference won't be significant; both are in the N3 family.

We will know more details by IEDM 24 in December, so hold your horses before drawing conclusions 🙂

OneEng2 said:
It seems like Intel is willing to throw money at their chips to keep them competitive while AMD manages to maintain competitive products at much lower production costs on less expensive process nodes.

This is true rn, but how true will it be next year once it ramps the process?

cannedlake240 · Dec 1, 2024

511 said:
This is true rn, but how true will it be next year once it ramps the process?

NVL uses N2 on premium skus apparently, probably just to end up losing against Zen 6 Vcache... Unless it's that 144mb L3 tile
