Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022



With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code name | Date | TDP | Node | Tiles | Main tile | CPU | LP E-core | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 12 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |



Comparison of the die size of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|
| Platform | Mobile H/U only | Desktop & Mobile H/HX | Mobile U only | Mobile H |
| Process node | Intel 4 | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Desktop: Q4 2024, H/HX: Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full die | 6P + 8E | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 36 MB ? | 12 MB | ? |
| tCPU | 66.48 mm² | | | |
| tGPU | 44.45 mm² | | | |
| SoC | 96.77 mm² | | | |
| IOE | 44.45 mm² | | | |
| Total | 252.15 mm² | | | |


Intel Core Ultra 100 - Meteor Lake


As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the compute (CPU) tile and the Foveros base tile. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 

Attachments: PantherLake.png, LNL.png

Nothingness

Diamond Member
Jul 3, 2013
This is the reason the PC and Apple can and will continue to coexist. They both serve different markets within the giant computer "tent." Some people, like me, want more control over the PC from both a software and hardware point of view. I want to build exactly what I want, and I want to be able to run software I might have from 30 years ago (QuickBooks, for example).
This old software has to be run with emulators, because Windows dropped support for the older APIs, no? So how do you run your 30-year-old software? How do you run 16-bit software? If you have to use emulation, then any machine, no matter the CPU or the OS, might run it equally well.

I doubt I can run any of my old Glide games without using some unsupported library or even full emulation. No existing PC would accept my Glide card (yes, I still have an old Voodoo2 card). And that's even without talking about some painful copy protections that might not work anymore because of how low-level they were.

At some point legacy becomes a hindrance to progress. And you have no guarantee that full compatibility is maintained by newer versions of Windows and by new PCs (or even CPUs).

That being said I agree Apple and PC ecosystems are vastly different and it's much easier for Apple to move forward.
 

Doug S

Diamond Member
Feb 8, 2020
2. Gerard Williams III's designs are top notch, but his teams have freedoms x86 designers don't have. They aren't accountable to customers running software written in the 1980s and 1990s. Apple's design team is the freest; they can toss features the company pushed as the new direction developers had to adopt less than a decade earlier.

That's probably why Intel started talking about "X86-S", which would be a 64 bit only version of x86 that cuts out legacy cruft like segmentation, rings 1 & 2, 16 bit addressing, and some obsolete I/O related features. That doesn't address issues with variable length instructions with all the prefixes, but it would mark the first time Intel produced an x86 CPU that wasn't capable of running as an 8086.
 

511

Platinum Member
Jul 12, 2024
I think they need to team up with AMD on X86-S; it's the only way forward for the x86 brigade. Apple's ability to cut things whenever they want is a PITA for many: what works now may not work tomorrow, whereas with x86 we are used to things still working even when they are 20 years old. Apple has the least complexity in their ISA/design, and their designs are quite good as well. Also, the Intel P-cores support AVX-512 in hardware; it's just fused off at the factory.
 

Raqia

Member
Nov 19, 2008
That's probably why Intel started talking about "X86-S", which would be a 64 bit only version of x86 that cuts out legacy cruft like segmentation, rings 1 & 2, 16 bit addressing, and some obsolete I/O related features. That doesn't address issues with variable length instructions with all the prefixes, but it would mark the first time Intel produced an x86 CPU that wasn't capable of running as an 8086.
The cost in terms of PPA for those features is likely de minimis on today's nodes. However, the cost of designing in and validating those decades of features in a new u-arch is enormous. Their biggest advantage in paring things down to X86-S will be reducing those issues in getting a new CPU to market.
 

DavidC1

Golden Member
Dec 29, 2023
That's not true, at least not based on the Lunar Lake numbers FlameTail listed a few posts earlier - are those not correct?
FlameTail got the I-cache number wrong. In Redwood Cove you have 48KB L1D and 64KB L1i. Lion Cove keeps the same 64KB L1i, but splits L1D into two levels - 48KB and 192KB. It is an improvement but not "L1" as Intel likes you to believe.

Apple can do 3-cycle latency for its 192KB L1i and 128KB L1D; Lion Cove is 4 cycles for the 48KB "L0" and 9 cycles for the 192KB "L1". It is more correct to say Lion Cove has a 48KB L1D and a 192KB L1.5: as C&C points out, not only is the latency substantially higher, sitting between typical L1 and L2 latencies, but the bandwidth is a lot lower than an L1's too.

The 48KB "L0" they call it doesn't decrease in cycles too much either. It goes from 5 to 4. Calling it L0 is just marketing. It would be more accurate to say the uop cache is L0, at least it's doing something much greater than L1.
Bloat would be about the logic area, which is much closer than flat area calculations that include megabytes of cache others aren’t using.
And Lion Cove is bloated, because without the L2 cache it's 3.4 mm² for Lion Cove and 2.5 mm² for M3.
 

FlameTail

Diamond Member
Dec 15, 2021
FlameTail got the I-cache number wrong. In Redwood Cove you have 48KB L1D and 64KB L1i. Lion Cove keeps the same 64KB L1i, but splits L1D into two levels - 48KB and 192KB. It is an improvement but not "L1" as Intel likes you to believe.
Yeah, I got the L1d and L1i numbers mixed up for Lion Cove.
| | M3-P | Lion Cove |
|---|---|---|
| L0d | – | 48 KB |
| L1d | 128 KB | 192 KB |
| L1i | 192 KB | 64 KB |
| L2 | 16 MB (shared) | 2.5 MB (private) |
| L3 | – | 12 MB (shared) |

Intel is spending more on data caches than the instruction cache. Any idea why?

Apple and other ARM vendors spend more on instruction caches than data caches.
Apple can do 3-cycle latency for its 192KB L1i and 128KB L1D; Lion Cove is 4 cycles for the 48KB "L0" and 9 cycles for the 192KB "L1". It is more correct to say Lion Cove has a 48KB L1D and a 192KB L1.5: as C&C points out, not only is the latency substantially higher, sitting between typical L1 and L2 latencies, but the bandwidth is a lot lower than an L1's too.
Indeed. Although Intel matches Apple in terms of the quantity of cache (number of bytes), the quality of the cache itself is inferior (latency, bandwidth, etc.).
And Lion Cove is bloated, because without the L2 cache it's 3.4mm2 for Lion Cove and 2.5mm2 for M4.
2.5 mm² is for M3 P-core (N3B).
M4 P-core (N3E) is nearly 3 mm².
 

MS_AT

Senior member
Jul 15, 2024
Both the Monts and the newest ARM designs are not using uop caches, IIRC, which could explain the difference: without a uop cache you spend more on the instruction cache, while a core with a uop cache can get away with a smaller L1i relative to its data caches.

Another angle could be instruction density but I am unaware of any data that could support the claim that x64 would be more efficient in code footprint than ARM.
 

The Hardcard

Senior member
Oct 19, 2021
Yeah, I got the L1d and L1i numbers mixed up for Lion Cove.
| | M3-P | Lion Cove |
|---|---|---|
| L0d | – | 48 KB |
| L1d | 128 KB | 192 KB |
| L1i | 192 KB | 64 KB |
| L2 | 16 MB (shared) | 2.5 MB (private) |
| L3 | – | 12 MB (shared) |

Intel is spending more on data caches than the instruction cache. Any idea why?

Apple and other ARM vendors spend more on instruction caches than data caches.

Indeed. Although Intel matches Apple in terms of the quantity of cache (number of bytes), the quality of the cache itself is inferior (latency, bandwidth, etc..).

2.5 mm² is for M3 P-core (N3B).
M4 P-core (N3E) is nearly 3 mm².
Both AMD and Intel have large separate micro-op caches that store decoded instructions. x86 is full of complex, variable-length instructions that their front ends burn a lot of energy breaking down each time. That is part of the reason for the old claim of x86 ISA liabilities that some people still spout today: that RISC (PowerPC, ARM, RISC-V) will take compute to places x86 can't follow, and thus x86 will die.

Micro-op caches are a clever and effective way to mitigate x86's decode costs, and as of 2024 both AMD and Intel have huge micro-op caches. They massively reduce the latency and power requirements of the x86 ISA. I actually don't know how useful the L1 instruction cache still is. Both companies keep them, so I guess there is some reason, but it's not clear to me whether growing them would help more than continuing to improve the micro-op caches.
 

Nothingness

Diamond Member
Jul 3, 2013
Another angle could be instruction density but I am unaware of any data that could support the claim that x64 would be more efficient in code footprint than ARM.
I once measured dynamic code size (I mean the total number of instruction bytes executed) on a SPEC 2006 403.gcc run. x86-64 used about 10% fewer bytes than AArch64 (despite a significant use of rep stos/mov instructions which are denser than the Arm equivalent sequences). That was years ago, but I don't think things changed significantly.

Of course this should be redone with proper I-cache simulation and various sizes. It could be done by a motivated programmer using a QEMU plugin; I'm too lazy :)
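For the motivated: a minimal sketch of such a TCG plugin. The file name, build flags and benchmark invocation below are my own assumptions, but the qemu_plugin_* calls are the stock plugin API that ships with QEMU. It simply sums the encoded size of every instruction in a translation block and adds that total each time the block executes:

```c
/* insnbytes.c - count dynamically executed instruction bytes under QEMU.
 * Rough sketch only; file name and paths are assumptions.
 * Build: gcc -shared -fPIC -I$QEMU_SRC/include/qemu -o libinsnbytes.so insnbytes.c
 * Run:   qemu-x86_64  -plugin ./libinsnbytes.so ./403.gcc <args>
 *        qemu-aarch64 -plugin ./libinsnbytes.so ./403.gcc <args> */
#include <stdio.h>
#include <inttypes.h>
#include <qemu-plugin.h>

QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

static uint64_t total_bytes;   /* not atomic: fine for single-threaded user-mode runs */

/* Called every time a translated block executes; udata carries its byte size. */
static void vcpu_tb_exec(unsigned int vcpu_index, void *udata)
{
    total_bytes += (uintptr_t)udata;
}

/* Called once per translated block: sum the encoded size of its instructions. */
static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    size_t n = qemu_plugin_tb_n_insns(tb);
    uintptr_t bytes = 0;
    for (size_t i = 0; i < n; i++) {
        bytes += qemu_plugin_insn_size(qemu_plugin_tb_get_insn(tb, i));
    }
    qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
                                         QEMU_PLUGIN_CB_NO_REGS,
                                         (void *)bytes);
}

static void plugin_exit(qemu_plugin_id_t id, void *p)
{
    char msg[64];
    snprintf(msg, sizeof msg, "executed insn bytes: %" PRIu64 "\n", total_bytes);
    qemu_plugin_outs(msg);
}

QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                           const qemu_info_t *info,
                                           int argc, char **argv)
{
    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
    return 0;
}
```

Running the same workload under qemu-x86_64 and qemu-aarch64 with this loaded would give the dynamic byte counts to compare; a real study would still want the I-cache simulation on top.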
 

DavidC1

Golden Member
Dec 29, 2023
I did not realize you could fit that much cache in that little of extra space. Well yeah, 3.4 mm2 for core logic on N3B? There is some bloat there.
Caches are very space and energy efficient.* There is a logic that says if adding 1 mm² of logic area doesn't provide a performance improvement equal to adding 1 mm² of additional cache, then it shouldn't be done.

*Space efficiency is a reason why SRAM has hit its scaling limits first. DRAM, by the way, reached its limits long, long ago, because its cells are far smaller than SRAM's. Ultimately the limiter is the process, and the limits are far, far larger than a silicon atom: actual feature sizes are 10-20 nm, not much different from the 22 nm generation.
WTH is a L0 ?
Newest Intel marketing term for L1.
 

DavidC1

Golden Member
Dec 29, 2023
Both AMD and Intel have large separate micro-op caches that store decoded instructions. x86 is full of complex, variable-length instructions that their front ends burn a lot of energy breaking down each time. That is part of the reason for the old claim of x86 ISA liabilities that some people still spout today: that RISC (PowerPC, ARM, RISC-V) will take compute to places x86 can't follow, and thus x86 will die.
The ISA argument again, eh? It'll never stop. There's no such thing as an apples-to-apples comparison in the real world; you only think that way if you are stuck in the PC-enthusiast benchmarking mindset where you have to equalize everything.

Look at Lunar Lake again. Many comments bring up the ISA too. Guess why LNL happened? Because Apple basically slapped them silly, and Intel was forced to confront reality outside of its artificially created x86 licensing bubble. People create products, not the other way around.

@adroc_thurston And I keep telling you that Intel thinking BOM is the greatest thing is what brought them into this perilous situation in the first place. Unless you believe them and are as short-sighted as the whole of Intel is?
 

ikjadoon

Senior member
Sep 4, 2006
I think they need to team up with AMD on x86S

FWIW, AMD has been studying Intel's X86-S proposal, according to this TechPowerUp interview of AMD's David McAfee in July 2023:

TPU: Any thoughts on Intel's X86-S proposal that focuses on 64-bit and gets rid of a lot of legacy capability, so cores can be simplified and use less silicon area at the same time?

David: We have absolutely been looking at that. We've been evaluating similar proposals for a long, long time. It is both incredibly beneficial to make that break, also very, very complicated. I think it is a non-trivial exercise to strip out legacy compatibility in a core architecture as well as time that in a way so that it matches up perfectly with an OS transition that also eliminates a lot of these legacy compatibilities. I would say "very interesting," something that really, we would have to look at as an industry and make that move in concert. We find Intel's proposal pretty intriguing as we look at that.
 

9949asd

Member
Jul 12, 2024
Other outlets I’ve seen haven’t shown this large discrepancy between sensors and power draw. For instance, when in as close to an identical laptop as possible, STX and MTL systems pull nearly identical wall power when both are set to 28 W limits.

An even more direct measurement of the cables on desktop systems show that the sensors are spot on once factoring in the VRM efficiency. Certainly no less accurate than Intel’s.


I’d need to see much more proof of this AMD sensors are bad theory as we have clear evidence to the contrary. Using different laptops, even with the display off, has too many variables to try and use it to test SoC power/performance at the system level. It’s a silly test. System level power/performance, sure, but it’s going to be individual to the laptop tested.

Edit: Another example in the opposite direction of why drawing SoC conclusions on system power draw is silly,

You know the Geekerwan test is using board power (excluding the screen) limited to 30 W, right?

258V: on-board devices use 8 W
370: on-board devices use 15 W
185H: on-board devices use 8 W

So why is there a 7 W difference on the board? They all have RAM and an M.2 drive.

Also, the battery life test will not lie. [attached chart: IMG_3139.jpeg]

The sensor types and locations on Intel and AMD chips will certainly be different; even Lunar Lake and MTL will differ. The software readings are not guaranteed to be correct; that's why they use system power or board power.
 

itsmydamnation

Diamond Member
Feb 6, 2011
Yeah, I got the L1d and L1i numbers mixed up for Lion Cove.
| | M3-P | Lion Cove |
|---|---|---|
| L0d | – | 48 KB |
| L1d | 128 KB | 192 KB |
| L1i | 192 KB | 64 KB |
| L2 | 16 MB (shared) | 2.5 MB (private) |
| L3 | – | 12 MB (shared) |

Intel is spending more on data caches than the instruction cache. Any idea why?

Apple and other ARM vendors spend more on instruction caches than data caches.

Indeed. Although Intel matches Apple in terms of the quantity of cache (number of bytes), the quality of the cache itself is inferior (latency, bandwidth, etc..).

2.5 mm² is for M3 P-core (N3B).
M4 P-core (N3E) is nearly 3 mm².
This table is meaningless without latencies.
But if I was to guess why Intel went with this structure, it's again because of TLB lookups and the 4 KB page size; it makes scaling the first cache level really hard on x86.
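For anyone wondering why the page size matters, the usual back-of-the-envelope (the classic VIPT constraint, ignoring tricks like way prediction or alias handling that let designers stretch it) is:

max VIPT cache size = associativity x page size

- 4 KB pages x 12 ways = 48 KB, which is right around where x86 first-level data caches sit
- a 192 KB first level with 4 KB pages would need 48 ways, or it has to wait for the TLB and eat latency
- 16 KB pages x 8 ways = 128 KB, so Apple's L1d fits the constraint comfortably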
 

FlameTail

Diamond Member
Dec 15, 2021
FWIW, AMD has been studying Intel's X86-S proposal, according to this TechPowerUp interview of AMD's David McAfee in July 2023:
The mention of syncing the x86 -> X86-S transition to an OS transition is interesting.

I wish Microsoft would make the next version of Windows (Windows 12?) ditch support for the decades of compatibility, and make it a clean new OS built from the ground up to support modern CPU architectures (X86-S, ARMv9, RISC-V).
 

itsmydamnation

Diamond Member
Feb 6, 2011
The mention of syncing the x86 -> X86-S transition to an OS transition is interesting.

I wish Microsoft would make the next version of Windows (Windows 12?) ditch support for the decades of compatibility, and make it a clean new OS built from the ground up to support modern CPU architectures (X86-S, ARMv9, RISC-V).

No they will just f it up.

Current Windows Server is so goddamn janky, with its Settings (really terrible) vs. Control Panel (terrible, but different and less so), and icons that no longer exist but that you still need/want, like hdwwiz. The last thing I need is them trying something that architecturally complex...
 

Mopetar

Diamond Member
Jan 31, 2011
They could easily just make an emulator for running legacy stuff without much of a problem. Apple did it twice when they moved CPU architectures, which is a far bigger jump. Modern chips are more than capable of supporting any extra overhead that a sandbox like that would require.

A fresh start and an opportunity to reverse decades of bad decisions that modern versions of Windows are beholden to would be good. Also if they chain all of their UI/UX people up in the basement so they stop moving stuff around for no good reason I'd appreciate it greatly.
 

poke01

Diamond Member
Mar 8, 2022
There won't be any major changes to x86 till Nova Lake. So for people who want exciting changes, it won't happen till late 2026.

We are in a slog till then. Arrow Lake is fine, saved only by Skymont, and Panther Lake won't have APX. Still, the CPU industry is better than ever; looking at the consumer GPU side of things, the CPU side is nowhere near as bleak.
 

AMDK11

Senior member
Jul 15, 2019

Chips and Cheese Interviews Ronak Singhal​


From this interview, it appears that Redwood Cove's 8-wide decoder for Granite Rapids is a typo. Granite Rapids and Meteor Lake are based on exactly the same Redwood Cove (6-wide decoder), and the difference mainly comes down to 2x512-bit (AVX-512) and AMX.
 

Hulk

Diamond Member
Oct 9, 1999
This old software has to be run with emulators, because Windows dropped support for the older APIs, no? So how do you run your 30-year-old software? How do you run 16-bit software? If you have to use emulation, then any machine, no matter the CPU or the OS, might run it equally well.

I doubt I can run any of my old Glide games without using some unsupported library or even full emulation. No existing PC would accept my Glide card (yes, I still have an old Voodoo2 card). And that's even without talking about some painful copy protections that might not work anymore because of how low-level they were.

At some point legacy becomes a hindrance to progress. And you have no guarantee that full compatibility is maintained by newer versions of Windows and by new PCs (or even CPUs).

That being said I agree Apple and PC ecosystems are vastly different and it's much easier for Apple to move forward.
One older application that I still use that comes to mind is QuickBooks 99. I am running it on a few PCs, and I don't remember exactly how I got it working, but it's some type of compatibility mode. I haven't had to reinstall in a long time. The application is so old I had to copy the eight or nine floppy disks! I have a few others like that, but I'd have to look on my computer to remember them. That same QuickBooks 99 install has business records going back to 1998, if I remember correctly. It has saved me a ton of money: no upgrades, no nothing, I just keep using it.