Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022



With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code name | Date | TDP | Node | Tiles | Main tile | CPU | LP E-core | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |



Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|
| Platform | Mobile H/U only | Desktop & Mobile H/HX | Mobile U only | Mobile H |
| Process node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Desktop: Q4 2024, H/HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 36 MB ? | 12 MB | ? |
| tCPU | 66.48 mm² | | | |
| tGPU | 44.45 mm² | | | |
| SoC | 96.77 mm² | | | |
| IOE | 44.45 mm² | | | |
| Total | 252.15 mm² | | | |


Intel Core Ultra 100 - Meteor Lake


As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the compute (CPU) tile and the Foveros base tile. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments: PantherLake.png, LNL.png

Nothingness

Diamond Member
Jul 3, 2013
This is the reason the PC and Apple can and will continue to coexist. They both serve different markets within the giant computer "tent." Some people, like me, want more control over the PC from both a software and hardware point of view. I want to build exactly what I want, and I want to be able to run software I might have from 30 years ago (QuickBooks, for example).
This old software has to be run with emulators, because Windows dropped support for the older APIs, no? So how do you run your 30-year-old software? How do you run 16-bit software? If you have to use emulation, then any machine, no matter the CPU or the OS, might run it equally well.

I doubt I can run any of my old Glide games without using some unsupported library or even full emulation. No existing PC would accept my Glide card (yes, I still have an old Voodoo2 card). And that's even without talking about some painful copy protections that might not work anymore because of how low-level they were.

At some point legacy becomes a hindrance to progress. And you have no guarantee that full compatibility is maintained by newer versions of Windows and by new PCs (or even CPUs).

That being said I agree Apple and PC ecosystems are vastly different and it's much easier for Apple to move forward.
 

Doug S

Diamond Member
Feb 8, 2020
2. Gerard Williams III's designs are top notch, but his teams have freedoms x86 designers don't have. They aren't accountable to customers running software written in the 1980s and 1990s. Apple's design team is the freest; they can toss features the company pushed as the new direction developers had to adopt less than a decade earlier.

That's probably why Intel started talking about "X86-S", which would be a 64 bit only version of x86 that cuts out legacy cruft like segmentation, rings 1 & 2, 16 bit addressing, and some obsolete I/O related features. That doesn't address issues with variable length instructions with all the prefixes, but it would mark the first time Intel produced an x86 CPU that wasn't capable of running as an 8086.
 

511

Platinum Member
Jul 12, 2024
I think they need to team up with AMD on X86-S; it's the only way forward for the x86 brigade. Apple's ability to cut things whenever they want is a PITA for many: what works now may not work tomorrow, whereas with x86 we are used to things still working even when they are 20 years old. Apple has the least complexity in their ISA/design, and their designs are quite good as well. Also, the Intel P-cores support AVX-512 in hardware; it's just fused off at the factory.
 

Raqia

Member
Nov 19, 2008
That's probably why Intel started talking about "X86-S", which would be a 64 bit only version of x86 that cuts out legacy cruft like segmentation, rings 1 & 2, 16 bit addressing, and some obsolete I/O related features. That doesn't address issues with variable length instructions with all the prefixes, but it would mark the first time Intel produced an x86 CPU that wasn't capable of running as an 8086.
The cost in terms of PPA for those features is likely de minimis on today's nodes. However, the cost of designing in and validating those decades of features in a new u-arch is enormous. Their biggest advantage in paring things down to X86-S will be reducing those issues in getting a new CPU to market.
 

DavidC1

Golden Member
Dec 29, 2023
That's not true, at least not based on the Lunar Lake numbers FlameTail listed a few posts earlier - are those not correct?
FlameTail got the I-cache number wrong. In Redwood Cove you have 48KB L1D and 64KB L1i. Lion Cove keeps the same 64KB L1i, but splits L1D into two levels - 48KB and 192KB. It is an improvement but not "L1" as Intel likes you to believe.

Apple can do 3-cycle latency for its 192KB L1i and 128KB L1D; Lion Cove is 4 cycles for the 48KB "L0" and 9 cycles for the 192KB "L1". It is more correct to say Lion Cove has a 48KB L1D and a 192KB L1.5: as C&C points out, not only is the latency substantially higher, sitting between typical L1 and L2 latencies, but the bandwidth is a lot lower than an L1's too.

The 48KB "L0" they call it doesn't decrease in cycles too much either. It goes from 5 to 4. Calling it L0 is just marketing. It would be more accurate to say the uop cache is L0, at least it's doing something much greater than L1.
Bloat would be about the logic area, which is much closer than flat area calculations that include megabytes of cache others aren’t using.
And Lion Cove is bloated, because without the L2 cache it's 3.4 mm² for Lion Cove and 2.5 mm² for M3.
 

FlameTail

Diamond Member
Dec 15, 2021
FlameTail got the I-cache number wrong. In Redwood Cove you have 48KB L1D and 64KB L1i. Lion Cove keeps the same 64KB L1i, but splits L1D into two levels - 48KB and 192KB. It is an improvement but not "L1" as Intel likes you to believe.
Yeah, I got the L1d and L1i numbers mixed up for Lion Cove.
| | M3-P | Lion Cove |
|---|---|---|
| L0d | – | 48 KB |
| L1d | 128 KB | 192 KB |
| L1i | 192 KB | 64 KB |
| L2 | 16 MB (shared) | 2.5 MB (private) |
| L3 | – | 12 MB (shared) |

Intel is spending more on data caches than the instruction cache. Any idea why?

Apple and other ARM vendors spend more on instruction caches than data caches.
Apple can do 3-cycle latency for its 192KB L1i and 128KB L1D; Lion Cove is 4 cycles for the 48KB "L0" and 9 cycles for the 192KB "L1". It is more correct to say Lion Cove has a 48KB L1D and a 192KB L1.5: as C&C points out, not only is the latency substantially higher, sitting between typical L1 and L2 latencies, but the bandwidth is a lot lower than an L1's too.
Indeed. Although Intel matches Apple in terms of the quantity of cache (number of bytes), the quality of the cache itself is inferior (latency, bandwidth, etc.).
And Lion Cove is bloated, because without the L2 cache it's 3.4mm2 for Lion Cove and 2.5mm2 for M4.
2.5 mm² is for M3 P-core (N3B).
M4 P-core (N3E) is nearly 3 mm².
 

MS_AT

Senior member
Jul 15, 2024
Both the Monts and the newest ARM designs are not using uop caches, IIRC, which could explain the difference: without a uop cache you spend more on the instruction cache, while a core with a uop cache can get away with a smaller L1i relative to its data caches.

Another angle could be instruction density but I am unaware of any data that could support the claim that x64 would be more efficient in code footprint than ARM.
 

The Hardcard

Senior member
Oct 19, 2021
Yeah, I got the L1d and L1i numbers mixed up for Lion Cove.
| | M3-P | Lion Cove |
|---|---|---|
| L0d | – | 48 KB |
| L1d | 128 KB | 192 KB |
| L1i | 192 KB | 64 KB |
| L2 | 16 MB (shared) | 2.5 MB (private) |
| L3 | – | 12 MB (shared) |

Intel is spending more on data caches than the instruction cache. Any idea why?

Apple and other ARM vendors spend more on instruction caches than data caches.

Indeed. Although Intel matches Apple in terms of the quantity of cache (number of bytes), the quality of the cache itself is inferior (latency, bandwidth, etc..).

2.5 mm² is for M3 P-core (N3B).
M4 P-core (N3E) is nearly 3 mm².
Both AMD and Intel have large separate micro-op caches that store decoded instructions. x86 is full of complex, variable-length instructions that their front ends burn a lot of energy breaking down each time. That is part of the reason for the old claim of x86 ISA liabilities that some people still spout today: that RISC (PowerPC, ARM, RISC-V) will take compute to places x86 can't follow, and thus x86 will die.

Micro-op caches are a clever and effective way to mitigate x86's decode costs, and as of 2024 both AMD and Intel have huge micro-op caches. They massively reduce the latency and power requirements of the x86 ISA. I actually don't know how useful the L1 instruction cache still is. Both companies keep them, so I guess there is some reason, but it's not clear to me whether growing them would help more than continuing to improve the micro-op caches.
 

Nothingness

Diamond Member
Jul 3, 2013
Another angle could be instruction density but I am unaware of any data that could support the claim that x64 would be more efficient in code footprint than ARM.
I once measured dynamic code size (I mean the total number of instruction bytes executed) on a SPEC 2006 403.gcc run. x86-64 used about 10% fewer bytes than AArch64 (despite a significant use of rep stos/mov instructions which are denser than the Arm equivalent sequences). That was years ago, but I don't think things changed significantly.

Of course this should be redone with proper I-cache simulation and various sizes. It could be done by a motivated programmer using a QEMU plugin; I'm too lazy :)
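For the motivated: a minimal sketch of such a TCG plugin. The file name, build flags and benchmark invocation below are my own assumptions, but the qemu_plugin_* calls are the stock plugin API that ships with QEMU. It simply sums the encoded size of every instruction in a translation block and adds that total each time the block executes:

```c
/* insnbytes.c - count dynamically executed instruction bytes under QEMU.
 * Rough sketch only; file name and paths are assumptions.
 * Build: gcc -shared -fPIC -I$QEMU_SRC/include/qemu -o libinsnbytes.so insnbytes.c
 * Run:   qemu-x86_64  -plugin ./libinsnbytes.so ./403.gcc <args>
 *        qemu-aarch64 -plugin ./libinsnbytes.so ./403.gcc <args> */
#include <stdio.h>
#include <inttypes.h>
#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

static uint64_t total_bytes;   /* not atomic: fine for single-threaded user-mode runs */

/* Called every time a translated block executes; udata carries its byte size. */
static void vcpu_tb_exec(unsigned int vcpu_index, void *udata)
{
    total_bytes += (uintptr_t)udata;
}

/* Called once per translated block: sum the encoded size of its instructions. */
static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    size_t n = qemu_plugin_tb_n_insns(tb);
    uintptr_t bytes = 0;
    for (size_t i = 0; i < n; i++) {
        bytes += qemu_plugin_insn_size(qemu_plugin_tb_get_insn(tb, i));
    }
    qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
                                         QEMU_PLUGIN_CB_NO_REGS,
                                         (void *)bytes);
}

static void plugin_exit(qemu_plugin_id_t id, void *p)
{
    char msg[64];
    snprintf(msg, sizeof msg, "executed insn bytes: %" PRIu64 "\n", total_bytes);
    qemu_plugin_outs(msg);
}

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                           const qemu_info_t *info,
                                           int argc, char **argv)
{
    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
    return 0;
}
```

Running the same workload under qemu-x86_64 and qemu-aarch64 with this loaded would give the dynamic byte counts to compare; a real study would still want the I-cache simulation on top.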
 

DavidC1

Golden Member
Dec 29, 2023
I did not realize you could fit that much cache in that little of extra space. Well yeah, 3.4 mm2 for core logic on N3B? There is some bloat there.
Caches are very space and energy efficient.* There is a logic that says if adding 1 mm² of logic area doesn't provide a performance improvement equal to adding 1 mm² of additional cache, then it shouldn't be done.

*Space efficiency is a reason why SRAM has hit its scaling limits first. DRAM, by the way, reached its limits long, long ago, because its cells are far smaller than SRAM's. Ultimately the limiter is the process, and the limits are far, far larger than a silicon atom: actual feature sizes are 10-20 nm, not much different from the 22 nm generation.
WTH is a L0 ?
Newest Intel marketing term for L1.
 

DavidC1

Golden Member
Dec 29, 2023
Both AMD and Intel have large separate micro-op caches that store decoded instructions. x86 is full of complex, variable-length instructions that their front ends burn a lot of energy breaking down each time. That is part of the reason for the old claim of x86 ISA liabilities that some people still spout today: that RISC (PowerPC, ARM, RISC-V) will take compute to places x86 can't follow, and thus x86 will die.
The ISA argument again, eh? It'll never stop. There's no such thing as an apples-to-apples comparison in the real world; you only think that way if you are stuck in the PC-enthusiast benchmarking mindset where you have to equalize everything.

Look at Lunar Lake again. Many comments bring up the ISA too. Guess why LNL happened? Because Apple basically slapped them silly, and Intel was forced to confront reality outside of its artificially created x86 licensing bubble. People create products, not the other way around.

@adroc_thurston And I keep telling you that Intel thinking BOM is the greatest thing is what brought them into this perilous situation in the first place. Unless you believe them and are as short-sighted as the whole of Intel is?
 

ikjadoon

Senior member
Sep 4, 2006
I think they need to team up with AMD on x86S

FWIW, AMD has been studying Intel's X86-S proposal, according to this TechPowerUp interview of AMD's David McAfee in July 2023:

TPU: Any thoughts on Intel's X86-S proposal that focuses on 64-bit and gets rid of a lot of legacy capability, so cores can be simplified and use less silicon area at the same time?

David: We have absolutely been looking at that. We've been evaluating similar proposals for a long, long time. It is both incredibly beneficial to make that break, also very, very complicated. I think it is a non-trivial exercise to strip out legacy compatibility in a core architecture as well as time that in a way so that it matches up perfectly with an OS transition that also eliminates a lot of these legacy compatibilities. I would say "very interesting," something that really, we would have to look at as an industry and make that move in concert. We find Intel's proposal pretty intriguing as we look at that.
 

9949asd

Member
Jul 12, 2024
Other outlets I’ve seen haven’t shown this large discrepancy between sensors and power draw. For instance, when in as close to an identical laptop as possible, STX and MTL systems pull nearly identical wall power when both are set to 28 W limits.

An even more direct measurement of the cables on desktop systems show that the sensors are spot on once factoring in the VRM efficiency. Certainly no less accurate than Intel’s.


I’d need to see much more proof of this AMD sensors are bad theory as we have clear evidence to the contrary. Using different laptops, even with the display off, has too many variables to try and use it to test SoC power/performance at the system level. It’s a silly test. System level power/performance, sure, but it’s going to be individual to the laptop tested.

Edit: Another example in the opposite direction of why drawing SoC conclusions on system power draw is silly,

You know the Geekerwan test is using board power (excluding the screen) limited to 30 W, right?

258V: on-board devices use 8 W
370: on-board devices use 15 W
185H: on-board devices use 8 W

So why is there a 7 W difference on the board? They all have RAM and an M.2 drive.

Also, the battery life test will not lie. [attached chart: IMG_3139.jpeg]

The sensor types and locations on Intel and AMD chips will certainly be different; even Lunar Lake and MTL will differ. The software readings are not guaranteed to be correct; that's why they use system power or board power.
 

itsmydamnation

Diamond Member
Feb 6, 2011
Yeah, I got the L1d and L1i numbers mixed up for Lion Cove.
| | M3-P | Lion Cove |
|---|---|---|
| L0d | – | 48 KB |
| L1d | 128 KB | 192 KB |
| L1i | 192 KB | 64 KB |
| L2 | 16 MB (shared) | 2.5 MB (private) |
| L3 | – | 12 MB (shared) |

Intel is spending more on data caches than the instruction cache. Any idea why?

Apple and other ARM vendors spend more on instruction caches than data caches.

Indeed. Although Intel matches Apple in terms of the quantity of cache (number of bytes), the quality of the cache itself is inferior (latency, bandwidth, etc..).

2.5 mm² is for M3 P-core (N3B).
M4 P-core (N3E) is nearly 3 mm².
This table is meaningless without latencies.
But if I was to guess why Intel went with this structure, it's again because of TLB lookups and the 4 KB page size; it makes scaling the first cache level really hard on x86.
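For anyone wondering why the page size matters, the usual back-of-the-envelope (the classic VIPT constraint, ignoring tricks like way prediction or alias handling that let designers stretch it) is:

max VIPT cache size = associativity x page size

- 4 KB pages x 12 ways = 48 KB, which is right around where x86 first-level data caches sit
- a 192 KB first level with 4 KB pages would need 48 ways, or it has to wait for the TLB and eat latency
- 16 KB pages x 8 ways = 128 KB, so Apple's L1d fits the constraint comfortably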
 

FlameTail

Diamond Member
Dec 15, 2021
FWIW, AMD has been studying Intel's X86-S proposal, according to this TechPowerUp interview of AMD's David McAfee in July 2023:
The mention of syncing the x86 -> X86-S transition to an OS transition is interesting.

I wish Microsoft would make the next version of Windows (Windows 12?) ditch support for the decades of compatibility, and make it a clean new OS built from the ground up to support modern CPU architectures (X86-S, ARMv9, RISC-V).
 

itsmydamnation

Diamond Member
Feb 6, 2011
The mention of syncing the x86 -> X86-S transition to an OS transition is interesting.

I wish Microsoft would make the next version of Windows (Windows 12?) ditch support for the decades of compatibility, and make it a clean new OS built from the ground up to support modern CPU architectures (X86-S, ARMv9, RISC-V).

No they will just f it up.

Current Windows Server is so goddamn janky, with its Settings (really terrible) vs. Control Panel (terrible, but different and less so), and icons that no longer exist but that you still need/want, like hdwwiz. The last thing I need is them trying something that architecturally complex...
 

Mopetar

Diamond Member
Jan 31, 2011
They could easily just make an emulator for running legacy stuff without much of a problem. Apple did it twice when they moved CPU architectures, which is a far bigger jump. Modern chips are more than capable of supporting any extra overhead that a sandbox like that would require.

A fresh start and an opportunity to reverse decades of bad decisions that modern versions of Windows are beholden to would be good. Also if they chain all of their UI/UX people up in the basement so they stop moving stuff around for no good reason I'd appreciate it greatly.
 

poke01

Diamond Member
Mar 8, 2022
There won't be any major changes to x86 till Nova Lake. So for people who want exciting changes, it won't happen till late 2026.

We are in a slog till then. Arrow Lake is fine, saved only by Skymont, and Panther Lake won't have APX. Still, the CPU industry is better than ever; looking at the consumer GPU side of things, the CPU side is nowhere near as bleak.
 

AMDK11

Senior member
Jul 15, 2019

Chips and Cheese Interviews Ronak Singhal​


From this interview, it appears that Redwood Cove's 8-wide decoder for Granite Rapids is a typo. Granite Rapids and Meteor Lake are based on exactly the same Redwood Cove (6-wide decoder), and the difference mainly comes down to 2x512-bit (AVX-512) and AMX.
 

Hulk

Diamond Member
Oct 9, 1999
This old software has to be run with emulators, because Windows dropped support for the older APIs, no? So how do you run your 30-year-old software? How do you run 16-bit software? If you have to use emulation, then any machine, no matter the CPU or the OS, might run it equally well.

I doubt I can run any of my old Glide games without using some unsupported library or even full emulation. No existing PC would accept my Glide card (yes, I still have an old Voodoo2 card). And that's even without talking about some painful copy protections that might not work anymore because of how low-level they were.

At some point legacy becomes a hindrance to progress. And you have no guarantee that full compatibility is maintained by newer versions of Windows and by new PCs (or even CPUs).

That being said I agree Apple and PC ecosystems are vastly different and it's much easier for Apple to move forward.
One older application that I still use that comes to mind is QuickBooks 99. I am running it on a few PCs, and I don't remember exactly how I got it working, but it's some type of compatibility mode. I haven't had to reinstall in a long time. The application is so old I had to copy the eight or nine floppy disks! I have a few others like that, but I'd have to look on my computer to remember them. That same QuickBooks 99 install has business records going back to 1998, if I remember correctly. It has saved me a ton of money: no upgrades, no nothing, I just keep using it.