Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022



As Hot Chips 34 starts this week, Intel will unveil technical information about the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first with GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

| Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Core | LLC | GPU | Xe-cores |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Core Ultra 100U | Meteor Lake | Q4 2023 | 15 - 57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4 |
| ? | Lunar Lake | Q4 2024 | 17 - 30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 8 MB | Arc | 8 |
| ? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12 |



Comparison of die sizes of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

| | Meteor Lake | Arrow Lake (20A) | Arrow Lake (N3B) | Lunar Lake | Panther Lake |
|---|---|---|---|---|---|
| Platform | Mobile H/U only | Desktop only | Desktop & Mobile H&HX | Mobile U only | Mobile H |
| Process Node | Intel 4 | Intel 20A | TSMC N3B | TSMC N3B | Intel 18A |
| Date | Q4 2023 | Q1 2025 ? | Desktop Q4 2024, H&HX Q1 2025 | Q4 2024 | Q1 2026 ? |
| Full Die | 6P + 8E | 6P + 8E ? | 8P + 16E | 4P + 4E | 4P + 8E |
| LLC | 24 MB | 24 MB ? | 36 MB ? | 12 MB | ? |
| tCPU (mm²) | 66.48 | | | | |
| tGPU (mm²) | 44.45 | | | | |
| SoC (mm²) | 96.77 | | | | |
| IOE (mm²) | 44.45 | | | | |
| Total (mm²) | 252.15 | | | | |


Intel Core Ultra 100 - Meteor Lake


As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros base tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



 


Hulk

Diamond Member
Oct 9, 1999
Speaking of Cinebench: apparently HUB got access to a Zen 5 gaming PC before the keynote, and the Cinebench score was roughly the same as Zen 4's. They ended up pulling the demo and said it was pre-production silicon, so maybe it's not representative. It was discussed in their podcast released today if you're curious.
Thanks, but I based my calcs on the following chart from AMD. I have much more faith in this AMD-released chart, and I think AMD must as well to have released it. Even Intel didn't show how specific apps do IPC-wise with LNL.

 

H433x0n

Golden Member
Mar 15, 2023
Efficiency is more than just process, though. Intel's big cores seem powerful performance-wise, but inefficient. I doubt a modestly more efficient process node will be enough to overcome AMD's efficiency in core design.
Lion Cove is a departure from previous cores to address this. They backed off from widening the ROB by their typical 50%, split the scheduler, and introduced a new level of cache that increases capacity by 50% while adding only a single cycle of latency. I think this may be the first P-core Intel has designed with efficiency as a top priority.
 

AMDK11

Senior member
Jul 15, 2019
Thanks but I based my calcs on the following from AMD. I have much more faith in this AMD released chart and I think AMD must as well to have released it. Even Intel didn't show how specific apps do with LNL IPC-wise.

That Intel does not specify which test each bar in its IPC growth charts refers to is nothing new. Intel has been doing this since Sunny Cove.

Tests later tend to confirm it, though: the measured average is similar to what Intel reports.
 

Hulk

Diamond Member
Oct 9, 1999
The fact that Intel does not provide details on the IPC growth chart to which test a given bar refers to is nothing new. Intel has been doing this since SunnyCove.

Although tests later confirm this, the average is similar to what Intel reports.
I didn't write that they usually do this. I was simply making the comparison to illustrate my contention that AMD must be quite confident in their IPC increase to have released such specific data.
 

Hulk

Diamond Member
Oct 9, 1999
Lion Cove is a departure from previous cores to address this. They backed off on widening the ROB by their typical 50%, they split the scheduler and introduced a new level of cache to increase capacity by 50% while only adding a single cycle of latency. I think this may be the first P core intel has designed that has had efficiency as a top priority.
If I remember correctly, the L0 (formerly called L1) actually has one cycle lower latency, down from 5 cycles to 4. The new L1 (really an L1.5) has 9-cycle latency and, according to Intel, exists to avoid trips to L2 in many more situations than with Raptor Cove.
 

poke01

Platinum Member
Mar 8, 2022
Thanks but I based my calcs on the following from AMD. I have much more faith in this AMD released chart and I think AMD must as well to have released it. Even Intel didn't show how specific apps do with LNL IPC-wise.

AMD's Zen 5 CPUs/SoCs launch next month, so those numbers are set in stone. Intel's performance metrics may still increase or decrease, so they wait until close to release.
 

poke01

Platinum Member
Mar 8, 2022


"Intel also switched from using proprietary design tools to industry-standard tools optimized for its use. Intel’s old architectures were designed with “Fubs” (functional blocks) of tens of thousands of cells consisting of manually drawn circuits, but it has now moved to using big, synthesized partitions of hundreds of thousands to millions of cells. The removal of the artificial boundaries improves design time, increases utilization, and reduces area.

This also allowed for the addition of more configuration knobs into the design to spin off customized SoC-specific designs faster, with the lead architect saying this allows for more customization between the cores used for Lunar Lake and Arrow Lake. This design methodology also makes 99% of the design transferable to other process nodes, a key advance that prevents the stumbles we’ve seen in the past where intel’s new architectures were delayed by massive process node delays (as with 10nm, for instance)."

It appears that Jim Keller contributed to the Lion Cove project.

EDIT:
"Intel says it widened the prediction block by 8X over the previous architecture while maintaining accuracy. Intel also tripled the request bandwidth from the instruction cache to the L2 and doubled the instruction fetch bandwidth from 64 to 128 bytes per second. Additionally, decode bandwidth was bumped up from 6 to 8 instructions per cycle while the micro-op cache was increased along with its read bandwidth. The micro-op queue was also increased from 144 entries to 192 entries."

EDIT:
It appears that Lion Cove is a completely redesigned core, built from the ground up with a new approach.
This is Intel basically catching up to ARM and co. Nice to see it.
 

FlameTail

Diamond Member
Dec 15, 2021
Furthermore, it looks like Intel is going to incorporate LPDDR5X on-package memory again with Panther Lake-U low-power SKUs for thin and light platforms. Intel has already confirmed that its Panther Lake lineup will scale up what Lunar Lake had to offer and will offer more flexible DRAM configurations so you won't be limited to just 16 GB or 32 GB LPDDR5X capacities. A more recent Panther Lake leak also confirmed the "H" SKUs with 12 Xe cores based on the Celestial graphics IP.
Panther Lake will continue to use on-package memory.
 

Henry swagger

Senior member
Feb 9, 2022

DavidC1

Senior member
Dec 29, 2023
I guess it is 2+8.
If Intel achieves manufacturing ARL-S's large compute tile on 20A, that is clearly superb.
From the Sierra Forest review, we can see the Intel 3 process is good.

While I believe a lot of Meteor Lake's problems are due to it being delayed, it may also be that Intel 4 is a pipecleaner version of Intel 3.

In that case, we should not expect anything fancy from 20A. The good one will be 18A, which may be why the parts using 20A are limited.
 

Wolverine2349

Senior member
Oct 9, 2022
From Sierra Forest review, we can see the Intel 3 process is good.

While I believe lot of Meteorlake's problems are due to being delayed, it may also be due to Intel 4 being a pipecleaner version of Intel 3.

In this case, we should not expect anything fancy with 20A. The good one will be 18A, and may be the reason why parts using 20A is limited.

Why don't they just make everything for Arrow Lake using the Intel 18A process if it is the real good one? If they are doing Sierra Forest with Intel 3, why not Arrow Lake?
 

Hulk

Diamond Member
Oct 9, 1999
From Sierra Forest review, we can see the Intel 3 process is good.

While I believe lot of Meteorlake's problems are due to being delayed, it may also be due to Intel 4 being a pipecleaner version of Intel 3.

In this case, we should not expect anything fancy with 20A. The good one will be 18A, and may be the reason why parts using 20A is limited.
What Intel node do you think will be comparable to N3B? I realize this is kind of a wild guess since actual data is virtually non-existent.
 

DavidC1

Senior member
Dec 29, 2023
Why don't they just make everything for Arrow Lake using the Intel 18A process if it is the real good one? If they are doing Sierra Forest with Intel 3, why not Arrow Lake?
Because, like I said, Intel 4 and 20A are "pipecleaner" processes. You need real-world data and experience before you can move on to the next one, because the next process is undoubtedly more complex in every way, so experience from the previous generation is pretty much a requirement.

This is cutting-edge work, where there is little to no data about what is needed, so the engineers themselves are learning as they go. Obviously you can't skip this process (pun not intended).
 

DavidC1

Senior member
Dec 29, 2023
What Intel node do you think will be comparable to N3B? I realize this is kind of a wild guess since actual data is virtually non existent.
Intel is typically known for making processes that are a generation or more ahead in transistor performance, but half a generation behind in density.

Intel 3 is likely to beat even N2 on performance, but sit at N4/N5 level for density.
 

Wolverine2349

Senior member
Oct 9, 2022
Because like I said, Intel 4 and 20A is a "pipecleaner" process. You need real world data and experience before you can move onto the next one, because undoubtedly the next process is more complex in every way, thus the experience from the previous generation is pretty much a requirement.

This is cutting edge work, where there is little to no data about what is needed. So the engineers themselves are learning as they go along. Obviously you can't skip this process(pun not intended).

How will Intel 20A compare to the TSMC process that the Core Ultra 275 and 285, or a hypothetical rumored 12 Lion Cove P-core Core Ultra 295, will be built on?

Which is the better process node if yields were not a factor?
 

lightisgood

Senior member
May 27, 2022
Intel is typically known for making processes that are a generation or more ahead in transistor performance, but half a generation behind in density.

Intel 3 is likely to beat even N2 on performance but be N4/N5 level for density.

We should pay attention to the decline in N3E's density compared with the original N3...
 

DavidC1

Senior member
Dec 29, 2023
How will Intel 20A compare to TSMC process they are going to put the Core Ultra 275 and 285 or a hypothetical rumored 12 Lion Cove P core Core Ultra 295 on?

Which is the better process node if yields were not a thing?
20A might also have limited libraries, just like Intel 4 does, and thus not be suited to all chips.

Look at Intel 10nm as an example. Cannon Lake existed, but with horrible performance figures, and they couldn't activate the iGPU. They said the problem was not defect density as much as parametric yield: you could have all functionality working, but it doesn't matter if the chip doesn't clock high enough, for example. But they still needed to get it out. Once they did, we got Ice Lake very quickly, a much improved chip, even against 14nm.
 

ShimmerBlade

Junior Member
Jun 13, 2024
What Intel node do you think will be comparable to N3B? I realize this is kind of a wild guess since actual data is virtually non existent.
An Intel engineer weighed in on a similar question under High Yield's YouTube video - "Why Next Generation Chips Separate Data and Power". Quote:
20A is a 3nm competitor, and 18A is an improved version, so it can be called a 2nm or 3nm+ competitor. However, I don't control how they sell this stuff and that's all it really is.
 

Hulk

Diamond Member
Oct 9, 1999
So I'm examining/transcribing the presentation about Skymont from the lead architect and came across this interesting paragraph on L1-to-L1 cache transfers within a Skymont cluster. I think it's very interesting and wanted to share it here for further discussion/clarification. I have reached a new level of geekdom in that I now transcribe these architecture presentations in my down time and actually enjoy it. Somehow it clears my head of the day-to-day stuff that clutters it. Anyway...

"The other thing was, this is kind of fun, some of you out there, tech press, you benchmark our cores, you run micros, and you notice things, and some of you noticed something. When we have data where multiple cores in the same module want to use it at the same time, and specifically, when one core has modified data, and it's still in the first level cache, and another core wants to access it, we do a funny thing.

We don't say, here's the data, we can execute it, and this is Gracemont and Crestmont. What we do is we pretend like it misses the L2, we send it to the fabric, the fabric comes back and asks us for the data, we provide the data to the fabric, the fabric gives this back to us. So, suddenly people were surprised, hey, the data is near and the latency is high. In fact, the latency is a little bit longer than a normal cache hit, and that's because of this sort of roundtrip behavior.

So it was nice of you guys to notice that, but the good news here is we went ahead and fixed it. In Skymont we have what we call L1 to L1 transfers. What this means is that when one core asks the L2 for the data, we see that it is resident in another core, and we don't go to the fabric anymore, the L2 goes and says, hey please give me the data, it grabs the data, provides it to the core locally, the fabric isn't involved anymore. This is more reliable performance for the cases where people have really tight pipelines and they're sharing data within a module in local time and space. So that's a more reliable latency for cooperative workloads."