Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die combining CPU, GPU and NPU, fabbed on the Intel 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first for Intel. Expect a launch around Q2 2026 / Computex. In case people don't remember Alder Lake-N, I have created the table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming MediaTek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| CPU Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Memory | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe Cores | 32 EU | 2 Xe | 8 Xe | 12 |
| GPU Max Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
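For reference on the Bandwidth row, peak theoretical bandwidth is just bus width in bytes times the transfer rate. A minimal sketch of that arithmetic, using the memory configs from the table (the WCL entry is the rumored spec, not a confirmed one):

```python
# Peak theoretical memory bandwidth = (bus width in bits / 8) bytes per transfer
# * transfer rate in MT/s. Configs are taken from the table above; the WCL entry
# is a rumored spec, not a confirmed one.
configs = {
    "ADL-N (64-bit LPDDR5-4800)":    (64, 4800),
    "WCL   (64-bit LPDDR5-6800 ?)":  (64, 6800),
    "LNL   (128-bit LPDDR5X-8533)":  (128, 8533),
    "D9500 (64-bit LPDDR5X-10667)":  (64, 10667),
}

for name, (bus_bits, mt_per_s) in configs.items():
    gb_per_s = bus_bits / 8 * mt_per_s / 1000  # bytes per transfer * MT/s -> GB/s
    print(f"{name}: {gb_per_s:5.1f} GB/s")
```

That works out to roughly 38 GB/s for ADL-N, ~54 GB/s for the rumored WCL configuration, ~136 GB/s for LNL and ~85 GB/s for the D9500, which is where the ~55 / 136 / 85.6 GB/s figures in the Bandwidth row come from.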









With Hot Chips 34 starting this week, Intel will unveil technical information about the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



 


511

Diamond Member
Jul 12, 2024
Well, no, their IP is just not up to snuff.
That's not true, tbf, if we are talking about Xe3 and beyond. Xe1 was a dud and Xe2 improved on it by a lot. Also, by IP I mean the architecture, not the physical implementation, which sucks big time.
 

ondma

Diamond Member
Mar 18, 2018
My comment was directed specifically at the gen-on-gen performance difference between vanilla non-X3D Zen 5 and vanilla non-X3D Zen 6. There are only three cases where I would expect an X3D part to be slower in ST performance than its predecessor or its non-X3D sibling:
1 - a notable peak clock speed deficit, largely gone with Zen 5.
2 - thermal throttling due to heavy MT loads running concurrently, or poor cooling leading to heat soak. The vanilla part should generate slightly less thermal load and should maintain slightly higher clocks.
3 - a weird corner case that exposes the minor latency hit that the 3D cache causes.

My argument for Zen 6 is that, if the rumors are true, the 12-core CCX will have 48 MB of L3 cache at a latency comparable to the 8-core, 32 MB L3 CCX in Zen 5. The 50% larger L3 would theoretically be available for a pure ST scenario, helping any apps that depend on it. It should also be less affected by cache pollution, as the larger cache has more room to tolerate it. Add in the expected 10% IPC improvement from the rumor slide and it should be able to best Arrow Lake too.
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
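To put rough numbers on that question, here is a minimal sketch of the arithmetic, using the rumored Zen 6 CCX figures (48 MB / 12 cores) alongside Zen 5; the split between "per-core share" and "what one thread can see" is the crux of the answers below:

```python
# Cache per core (all cores busy) vs. cache visible to a single thread,
# assuming a unified L3 shared across the whole CCX.
# The Zen 6 figures (48 MB, 12 cores) are rumored, not confirmed.
ccx_configs = {
    "Zen 5 CCX":           (32, 8),
    "Zen 6 CCX (rumored)": (48, 12),
}

for name, (l3_mb, cores) in ccx_configs.items():
    per_core_share = l3_mb / cores   # even split when every core is loaded
    lone_thread_view = l3_mb         # one thread can fill the whole shared L3
    print(f"{name}: {per_core_share:.0f} MB/core fully loaded, "
          f"up to {lone_thread_view} MB for a lone thread")
```

So the per-core share stays at 4 MB in both cases, but a lightly threaded workload would see 48 MB instead of 32 MB, which is the distinction the replies below draw.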
 

Kepler_L2

Golden Member
Sep 6, 2020
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
You can look at Zen2 vs Zen3, both had 32MB L3 but on Zen2 only 16MB were available for each core due to split CCX design.
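A tiny sketch of that Zen 2 vs Zen 3 point, under the simplifying assumption that a core can only allocate into the L3 of its own CCX:

```python
# L3 reachable by a single core = total die L3 / number of independent CCX
# partitions (Zen 2: two 16 MB CCXs per CCD, Zen 3: one unified 32 MB CCX).
def l3_visible_to_one_core(total_l3_mb: int, ccx_count: int) -> float:
    return total_l3_mb / ccx_count

print("Zen 2 CCD:", l3_visible_to_one_core(32, 2), "MB")  # 16.0 MB
print("Zen 3 CCD:", l3_visible_to_one_core(32, 1), "MB")  # 32.0 MB
```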
 

adroc_thurston

Diamond Member
Jul 2, 2023
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.
You want more cache in general for 1T or gaming, and more cache per core for anything nT.
Venice-D goes to 4M L3@core despite a generational membw bump for a reason.
 

dangerman1337

Senior member
Sep 16, 2010
As long as it offers the best gaming performance it is gonna sell; 30-35 mm² of extra die will be worth it.
Hopefully they do a P-core-only version. Imagine 12 or more Griffin Cove cores on RZL-S with all that extra cache and crazy fast, low-latency DDR5? Insane gaming performance.
 

DavidC1

Platinum Member
Dec 29, 2023
Why is it not? That's a 1 MB increase for 1 cycle. Skymont is 19 cycles for a 4 MB L2.
Latency is also affected by design choices, so you can't compare 1:1 with Skymont, which is lower power and whose L2 is also shared across 4 cores.

A 1-cycle increase for a mere 33% capacity increase is nothing special. Even if the latency had stayed the same I wouldn't call it impressive, and even against Skymont it's just a 1-cycle reduction. You'd think a "performance"-focused core in 2027 would do better than an E-core from 2025.

The last Intel core with an impressive cache structure was Sandy Bridge. It could overclock to 4.5 GHz, the cache ran at the same clock as the core, and at 8 MB capacity it had 25-cycle latency, despite being an L3. I wonder how it would fare on 18A?
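For a concrete sense of scale, load-to-use latency in nanoseconds is just cycles divided by clock, which is why a cache running at core clock benefits directly from overclocking. A rough sketch; the clock speeds here are illustrative assumptions, not measured figures:

```python
# Convert a cache latency quoted in core cycles into nanoseconds at an assumed clock.
def latency_ns(cycles: int, clock_ghz: float) -> float:
    return cycles / clock_ghz

examples = [
    ("Sandy Bridge 8 MB L3, cache at core clock", 25, 4.5),  # 4.5 GHz OC mentioned above
    ("Skymont 4 MB shared L2 (assumed ~4.0 GHz)", 19, 4.0),  # clock is an assumption
]

for name, cycles, ghz in examples:
    print(f"{name}: {cycles} cycles @ {ghz} GHz ~ {latency_ns(cycles, ghz):.1f} ns")
```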
If only they weren't a bunch of idiots in the Intel DC GPU space, cancelling everything.
That's because they weren't selling. A lot of vendors were on board with mobile Arc GPUs until they found the perf/W was bad and the drivers were atrocious. The last famous Intel DC GPU was Ponte Vecchio, whose enormously complicated packaging made the Lunar Lake MoP complaint look like it added a penny to the BoM, and it was maybe 20% faster in corner-case scenarios.

The last JPR dGPU market share report showed Intel isn't even a blip on the radar now; they are at 0% according to it. They probably sold a few thousand to low tens of thousands of units. The best case is 0.49%, since the numbers are rounded down.
 

511

Diamond Member
Jul 12, 2024
Latency is also affected by design choices, so you can't compare 1:1 with Skymont, which is lower power and whose L2 is also shared across 4 cores.

A 1-cycle increase for a mere 33% capacity increase is nothing special. Even if the latency had stayed the same I wouldn't call it impressive, and even against Skymont it's just a 1-cycle reduction. You'd think a "performance"-focused core in 2027 would do better than an E-core from 2025.
It's good tbh, and it's also shared between 2 cores. As for P-core vs E-core IPC, I would think the P-core and E-core will have similar IPC by H2 '26 when Nova Lake launches.
The last Intel core with an impressive cache structure was Sandy Bridge. It could overclock to 4.5 GHz, the cache ran at the same clock as the core, and at 8 MB capacity it had 25-cycle latency, despite being an L3. I wonder how it would fare on 18A?
8 MB at 25 cycles is pretty good. I wonder what the cycle count will be for the NVL L3; anything under 50 would be good imo.
That's because they weren't selling. A lot of vendors were on board with mobile Arc GPUs until they found the perf/W was bad and the drivers were atrocious. The last famous Intel DC GPU was Ponte Vecchio, whose enormously complicated packaging made the Lunar Lake MoP complaint look like it added a penny to the BoM, and it was maybe 20% faster in corner-case scenarios.
Not to mention Arc has been delayed so much.
The last JPR dGPU market share report showed Intel isn't even a blip on the radar now; they are at 0% according to it. They probably sold a few thousand to low tens of thousands of units. The best case is 0.49%, since the numbers are rounded down.
Well, maybe they already shipped in Q4 '25, when they were at 1%, and shipments have been low after that.
 

DavidC1

Platinum Member
Dec 29, 2023
It's good tbh, and it's also shared between 2 cores. As for P-core vs E-core IPC, I would think the P-core and E-core will have similar IPC by H2 '26 when Nova Lake launches.
In Sandy Bridge, it went from 41 cycles to 25 cycles, nearly a 40% reduction, while clocking much higher in the new Turbo mode consistently as well.

They aren't losing money on Arc because of a high BoM; that is nonsense. They are losing money on Arc because there's basically no volume. They could have a $50 BoM and it would still lose them money.
 

AcrosTinus

Senior member
Jun 23, 2024
Yeah, but not anymore; going forward the private alley is going away and the 2 P-cores have to share 😂.


If only they weren't a bunch of idiots in the Intel DC GPU space, cancelling everything.


Why is it not? That's a 1 MB increase for 1 cycle. Skymont is 19 cycles for a 4 MB L2.

Their heyday died with the 10nm delays lol.
I have a feeling this is the secret to how they were able to increase the P-core count. Instead of having a ring stop per P-core and per E-core cluster, 2 P-cores share a stop, and maybe the E-core cluster is now 8 cores big. That sounds more realistic to me than two compute dies with two separate ring buses, each having 12 stops.

It could also be a way to reduce the stops per ring to 8, essentially having 16 stops in total if two dies really are employed in Nova Lake.
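For what it's worth, the stop counts being tossed around fall out of simple arithmetic. A minimal sketch, assuming the rumored 8P + 16E layout per compute die and ignoring any extra stops for memory/SoC agents (all of this is speculation, not a confirmed topology):

```python
# Ring stops per compute die = P-core stops + E-core cluster stops.
# The 8P + 16E per-die split is the rumored Nova Lake layout being discussed;
# agent/IMC stops are ignored, so these are illustrative counts only.
def ring_stops(p_cores: int, p_per_stop: int, e_cores: int, e_per_cluster: int) -> int:
    return p_cores // p_per_stop + e_cores // e_per_cluster

print("1 P-core per stop, 4-core E-clusters :", ring_stops(8, 1, 16, 4))  # 12 stops
print("2 P-cores per stop, 4-core E-clusters:", ring_stops(8, 2, 16, 4))  # 8 stops
print("2 P-cores per stop, 8-core E-clusters:", ring_stops(8, 2, 16, 8))  # 6 stops
```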
 

511

Diamond Member
Jul 12, 2024
In Sandy Bridge, it went from 41 cycles to 25 cycles, nearly a 40% reduction, while clocking much higher in the new Turbo mode consistently as well.
Didn't know that; it's an insane improvement lol.
They aren't losing money on Arc because of a high BoM; that is nonsense. They are losing money on Arc because there's basically no volume. They could have a $50 BoM and it would still lose them money.
Yes, but I think the volume they have now is due to the prepayments they made for Arc.
I have a feeling this is the secret to how they were able to increase the P-core count. Instead of having a ring stop per P-core and per E-core cluster, 2 P-cores share a stop, and maybe the E-core cluster is now 8 cores big. That sounds more realistic to me than two compute dies with two separate ring buses, each having 12 stops.
Yes, though I doubt the 8 E-core cluster. 12 -> 8 is a good amount of reduction in ring stops.
It could also be a way to reduce the stops per ring to 8, essentially having 16 stops in total if two dies really are employed in Nova Lake.
Each die has a separate ring, and they are connected through some shared fabric.
 

DavidC1

Platinum Member
Dec 29, 2023
It could also be a way to reduce the stops per ring to 8, essentially having 16 stops in total if two dies really are employed in Nova Lake.
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or rethink the mesh, do something new. The 2011 Sandy Bridge ring design is showing its age very much.
Didn't know that; it's an insane improvement lol.
Yes, that is due to the ring, which was a well-thought-out and novel design. They have regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
 

Io Magnesso

Senior member
Jun 12, 2025
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or rethink the mesh, do something new. The 2011 Sandy Bridge ring design is showing its age very much.

Yes, that is due to the ring, which was a well-thought-out and novel design. They have regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
There are rumors that the NEX division will be given up, but I don't think it's possible to let go of the networking/WiFi business.
I think the dismantling of the NEX division will merely be a reshuffling of personnel within Intel.
 

AcrosTinus

Senior member
Jun 23, 2024
And AMD doesn't have this problem. An engineering issue, or should I say a lack of engineering? Oh right, because they lack engineers. Go back to a crossbar, or rethink the mesh, do something new. The 2011 Sandy Bridge ring design is showing its age very much.

Yes, that is due to the ring, which was a well-thought-out and novel design. They have regressed every gen since then, and their fabric has been mediocre at best since. These are the details that get lost when you have brain drain.

They want to give up their Networking/WiFi division now? What is on their minds?
That is true. Intel introduced the mesh on HEDT, and benchmarks show that, if clocked high enough, its penalty compared to the ring is minimal while its scaling is vastly superior. Had they invested some time in a mainstream variant, the mesh could have been far more performant, but who knows...

AMD being on a mesh is news to me; that would explain the sub-20 ns core-to-core latency within a CCD.
 

Doug S

Diamond Member
Feb 8, 2020
I am not a chip designer, so this is a legitimate question, not a criticism. Which is more important, the absolute amount of cache, or the cache per core? I ask this because even though the proposed Zen 6 CCD has 50% more cache, it also has 50% more cores, so the cache per core is 4MB in both configurations.

It is the same cache per core only if you use all the cores.

In the world most of us occupy, our CPUs are typically loading only a few cores at a time, so you get more cache per core in those circumstances. And even if you're the outlier who often runs all cores at 100%, you aren't any worse off than before, and now you have 50% more cores for those outlier tasks.
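A small sketch of that point, assuming an even split of a unified L3 among whichever cores are actually busy (using the rumored 48 MB / 12-core Zen 6 CCX figures from earlier in the thread):

```python
# Share of a unified L3 per *active* core: the headline capacity divided by how
# many cores are actually loaded, not by the core count on the die.
L3_MB, CORES = 48, 12  # rumored Zen 6 CCX figures, used only for illustration

for active in (1, 2, 4, 8, CORES):
    print(f"{active:2d} active core(s): up to {L3_MB / active:.1f} MB each")
```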
 

Thibsie

Golden Member
Apr 25, 2017
It is the same cache per core only if you use all the cores.

In the world most of us occupy, our CPUs are typically loading only a few cores at a time, so you get more cache per core in those circumstances. And even if you're the outlier who often runs all cores at 100%, you aren't any worse off than before, and now you have 50% more cores for those outlier tasks.

Yeah, but might a thread 'eat' the second core's cache? I mean, both cores will compete for the cache then, no?
Also, could more read/write ports slow cache access (speed/latency) or add complexity?
This might be completely wrong, I don't know much about how caches work.