Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022



With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap tells us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



Comparison of Intel's upcoming U-series CPUs: Core Ultra 100U, Lunar Lake and Panther Lake

Model | Code Name | Date | TDP | Node | Tiles | Main Tile | CPU | LP E-Cores | LLC | GPU | Xe-cores
Core Ultra 100U | Meteor Lake | Q4 2023 | 15-57 W | Intel 4 + N5 + N6 | 4 | tCPU | 2P + 8E | 2 | 12 MB | Intel Graphics | 4
? | Lunar Lake | Q4 2024 | 17-30 W | N3B + N6 | 2 | CPU + GPU & IMC | 4P + 4E | 0 | 8 MB | Arc | 8
? | Panther Lake | Q1 2026 ? | ? | Intel 18A + N3E | 3 | CPU + MC | 4P + 8E | 4 | ? | Arc | 12



Comparison of die sizes of each tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

Product | Platform | Process Node | Date | Full Die | LLC
Meteor Lake | Mobile H/U only | Intel 4 | Q4 2023 | 6P + 8E | 24 MB
Arrow Lake (20A) | Desktop only | Intel 20A | Q1 2025 ? | 6P + 8E ? | 24 MB ?
Arrow Lake (N3B) | Desktop & Mobile H/HX | TSMC N3B | Desktop: Q4 2024; H/HX: Q1 2025 | 8P + 16E | 36 MB ?
Arrow Lake Refresh (N3B) | Desktop only | TSMC N3B | Q4 2025 ? | 8P + 32E | ?
Lunar Lake | Mobile U only | TSMC N3B | Q4 2024 | 4P + 4E | 8 MB
Panther Lake | Mobile H | Intel 18A | Q1 2026 ? | 4P + 8E | ?

Meteor Lake tile sizes (mm²): tCPU 66.48 | tGPU 44.45 | SoC 96.77 | IOE 44.45 | Total 252.15



Intel Core Ultra 100 - Meteor Lake


As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)



Thunder 57

Platinum Member
Aug 19, 2007
Hyperthreading/SMT was once considered by every engineer to be a must-have, but now it is not far from being relegated to the niches that VLIW architectures occupy.

Just because an alternative hasn't been thought of doesn't mean it will remain that way forever.

Of course not but I see no reason to get rid of uop caches, and you have not given any reason as to why they would/should. FWIW I disagree with ditching HT too. It's throwing away mostly free performance for... security? Maybe?
 

cytg111

Lifer
Mar 17, 2008
What is the proportion of the reasons you are watching the channel: because she has visually appealing features, or because she offers information that confirms your bias?

When it comes to gigantic-scale mass manufacturing such as modern semiconductors, being out there in the real world at scale is king. They weren't able to get 10nm out in any capacity until they shipped a little of it with Cannon Lake, and only then were they able to ramp 10nm to acceptable yields. Intel may take the lead, but it'll take extraordinary measures to do so, and perhaps TSMC messing up. They may take the lead, but that's still hope more than anything.
In terms of logical fallacies, that is not much of a choice you're giving me here :):):).

Basically, I think Intel is returning to the front; consider geopolitics, Taiwan, AI, the CHIPS Act, etc. The West can't afford second place and can't leave it up to a coin toss (Taiwan), so I think money will be thrown at the problem until Intel is No. 1 again.

With this conviction, I am trying to deduce where and when to throw my money in. That is my motivation.
But hey, if she were to hit me up for a date, would I say no? Probably not.
 

dullard

Elite Member
May 21, 2001
What does he mean by "it does not reach more than 7"?
Performing a translation of a crude translation from a vague source isn't my forte. I assume it is referring to the number of Xe cores. Intel labels Meteor Lake CPUs with more Xe cores as having Intel Arc Graphics, but labels Meteor Lake CPUs with fewer Xe cores as having plain Intel Graphics.

For example, the 165H has 8 Xe cores and is labeled Intel Arc Graphics.
https://ark.intel.com/content/www/u...-processor-165h-24m-cache-up-to-5-00-ghz.html


But the 165U has 4 Xe cores and is labeled Intel Graphics (without the "Arc" in the name).
https://ark.intel.com/content/www/u...-processor-165u-12m-cache-up-to-4-90-ghz.html
 

DrMrLordX

Lifer
Apr 27, 2000
Anyway, I think I am taking that bet and will be buying up some INTC in the near future.

Good luck! Nobody can really tell whether Intel's nodes are actually competitive with TSMC's, since Intel seems to be struggling with volume, design, or both. Arrow Lake doesn't look like it will be a great CPU, despite Intel having a newer and mostly better TSMC node available for it than the competition, which is still on TSMC N4P. It is rather embarrassing for Intel that they're going to release a new flagship CPU late this year on a TSMC node; it's even more embarrassing that said flagship will most likely be slower than a competitor's chip on the older N4P.
 

SiliconFly

Golden Member
Mar 10, 2023
To add to the confusion, there are some rumors of Arrow Lake-U being on Intel 3. Rumor here: https://wccftech.com/intel-arrow-la...ernative-lunar-lake-intel-3-perf-watt-uplift/ And our own rumors: https://forums.anandtech.com/thread...ure-lakes-rapids-thread.2509080/post-40656166

But I haven't seen anything from Intel mentioning that.
Well, I think Intel 4 and Intel 3 share the same HP libraries (possibly with a few tweaks). In that case, ARL-U being just an MTL-U refresh squares well with the leak.

If true, we're going to see RWC yet again. Yuck.
 

DavidC1

Senior member
Dec 29, 2023
Of course not but I see no reason to get rid of uop caches, and you have not given any reason as to why they would/should. FWIW I disagree with ditching HT too. It's throwing away mostly free performance for... security? Maybe?
For SMT, the real cost is that it complicates validation. In terms of transistors it's pretty much free: only 5-10% of the core, not of the total die.

Increased validation complexity brings potential pitfalls to every product's development. If dropping SMT can expedite development, then it's an overall win. Recently revealed security issues exacerbate the problem, and the histories of both x86 vendors are rife with execution stumbles too.

The other CPU vendors do not use SMT. The ARM vendors with far superior execution and performance do not use SMT. Clearly there's more than one way.
 

Hulk

Diamond Member
Oct 9, 1999
I could use a little clarification regarding SMT implementation.

I understand that Intel appears to be removing SMT from future P cores in order to focus them on ST performance. In the effort to increase IPC, the core is getting wider and wider. Wouldn't that also imply that more of the core will go unused, since there is limited parallelism in ST code? Using those otherwise idle parts of the core was, I believe, the original intent of SMT.

Or is the OoO "smarts" of the CPU going to be so good that it is able to keep the core relatively saturated?

It just seems to me that the wider the core gets, the more suited it is for SMT?

I have a limited understanding of microprocessor architecture, which is why I'm asking.

Also, do the structures in the SMT enabled CPU hinder ST performance in any way even if only an ST thread is being executed?
 

Wolverine2349

Member
Oct 9, 2022
I could use a little clarification regarding SMT implementation.

I understand that Intel appears to be removing SMT from future P cores in order to focus them on ST performance. In the effort to increase IPC, the core is getting wider and wider. Wouldn't that also imply that more of the core will go unused, since there is limited parallelism in ST code? Using those otherwise idle parts of the core was, I believe, the original intent of SMT.

Or is the OoO "smarts" of the CPU going to be so good that it is able to keep the core relatively saturated?

It just seems to me that the wider the core gets, the more suited it is for SMT?

I have a limited understanding of microprocessor architecture, which is why I'm asking.

Also, do the structures in the SMT enabled CPU hinder ST performance in any way even if only an ST thread is being executed?


Interesting points. I have heard that hyperthreading is not always the best, that more cores on the same node are far better than HT, and that HT/SMT can even be detrimental.

I mean, the Core 2 family of CPUs did not have it, and they were one of the best innovations ever. Some have stated the only reason Core 2 did not have it was that it was based on the P6 microarchitecture, which did not have it. Well, Pentium 4 did not have it either until Intel added it. So I seriously doubt Intel could not have added HT to Core 2 if they wanted to.
 

dullard

Elite Member
May 21, 2001
I could use a little clarification regarding SMT implementation.

I understand that Intel appears to be removing SMT from future P cores in order to focus them on ST performance. In the effort to increase IPC, the core is getting wider and wider. Wouldn't that also imply that more of the core will go unused, since there is limited parallelism in ST code? Using those otherwise idle parts of the core was, I believe, the original intent of SMT.

Or is the OoO "smarts" of the CPU going to be so good that it is able to keep the core relatively saturated?

It just seems to me that the wider the core gets, the more suited it is for SMT?

I have a limited understanding of microprocessor architecture, which is why I'm asking.

Also, do the structures in the SMT enabled CPU hinder ST performance in any way even if only an ST thread is being executed?
I think you are looking at the trees and missing the forest. Yes, without SMT there will be unused parts of the core. That WAS a big problem in the past, when CPUs had slow clock speeds and few cores.

It is no longer as big a problem. SMT both (1) limits potential clock speeds and (2) requires added security checks, which add extra work to ensure no information can be accessed accidentally. Thus, SMT can slow a CPU down more than it helps (much industrial software has long recommended turning SMT off to increase performance). The clock-speed limit should be self-explanatory: when more of the CPU is active more of the time, it produces more heat from switching transistors, capping the clock speed it could otherwise have reached. How many times have you seen security fixes reduce performance by 1%, 2%, 10%? That all adds up, and it is almost never included in SMT on/off benchmark comparisons (to do the comparison right, you would also have to toggle the security mitigations that are only needed because of SMT). The security-related performance drops often outweigh any gains from utilizing the unused parts of a core.

Plus, we now have plenty of other cores onto which to offload extra threads. It is better to give those threads a real core than to take the time to slip them in and out of the unused parts of another.
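Incidentally, you can see how your own machine pairs logical CPUs onto physical cores: Linux exposes this through sysfs. A minimal sketch (the sysfs paths are standard Linux; the helper names are mine):

```python
from pathlib import Path

def parse_siblings(s: str) -> list[int]:
    """Parse a thread_siblings_list string like '0,12' or '0-1' into CPU ids."""
    ids = []
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            ids.extend(range(int(lo), int(hi) + 1))
        else:
            ids.append(int(part))
    return ids

def smt_active(control_text: str) -> bool:
    """Interpret /sys/devices/system/cpu/smt/control ('on', 'off', 'forceoff', 'notsupported')."""
    return control_text.strip() == "on"

if __name__ == "__main__":
    ctrl = Path("/sys/devices/system/cpu/smt/control")
    if ctrl.exists():
        print("SMT:", ctrl.read_text().strip())
    sib = Path("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list")
    if sib.exists():
        print("cpu0 shares a core with logical CPUs:", parse_siblings(sib.read_text()))
```

On a machine with SMT disabled, each siblings list contains a single CPU id.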
 

dullard

Elite Member
May 21, 2001
In terms of transistors it's pretty much free. Only 5-10% of the core, not the total die.
Let's go with your numbers: 5% to 10% of 24 cores is an extra 1.2 to 2.4 cores you could have had instead. I'll take ~2 more real cores over cramming extra threads that the 24 cores can't handle into unused gaps.
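As a trivial sanity check of that arithmetic (using the 5-10% area figures quoted above, which are the other poster's estimates, not measured data):

```python
cores = 24
for overhead in (0.05, 0.10):
    extra = cores * overhead  # core-equivalents of silicon spent on SMT support
    print(f"{overhead:.0%} of {cores} cores = {extra:.1f} core-equivalents")
```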
 
Jul 27, 2020
I personally think Intel should've done a special part with a "sea of LP E-cores": smaller, less performant E-cores, but 24 to 64 of them. That would've been nice for people who love threads (the people who buy Threadrippers), but Intel's part would've been targeted more at the professional crowd who didn't want to pay for HEDT. They could've priced these parts at up to $1,200 depending on core configuration, and as long as they worked in B760/H770/Z790 mobos, I'm positive they would've sold well. Sadly, Intel was obsessed with dethroning AMD rather than actually innovating.
 
Mar 8, 2024
I personally think Intel should've done a special part with a "sea of LP E-cores": smaller, less performant E-cores, but 24 to 64 of them. That would've been nice for people who love threads (the people who buy Threadrippers), but Intel's part would've been targeted more at the professional crowd who didn't want to pay for HEDT. They could've priced these parts at up to $1,200 depending on core configuration, and as long as they worked in B760/H770/Z790 mobos, I'm positive they would've sold well. Sadly, Intel was obsessed with dethroning AMD rather than actually innovating.

They sell 24-core Atom boards, but those are peaky at best, yeah.
 

DavidC1

Senior member
Dec 29, 2023
I could use a little clarification regarding SMT implementation.

I understand that Intel appears to be removing SMT from future P cores in order to focus them on ST performance. In the effort to increase IPC, the core is getting wider and wider. Wouldn't that also imply that more of the core will go unused, since there is limited parallelism in ST code? Using those otherwise idle parts of the core was, I believe, the original intent of SMT.
Of course there are more potential opportunities with a wider core, but that assumes the engineers are doing nothing to improve the utilization of that width.

Core 2 is far wider than Netburst, but the opportunity for HT is actually smaller, because Core 2 utilizes its resources much better.
I mean, the Core 2 family of CPUs did not have it, and they were one of the best innovations ever. Some have stated the only reason Core 2 did not have it was that it was based on the P6 microarchitecture, which did not have it. Well, Pentium 4 did not have it either until Intel added it. So I seriously doubt Intel could not have added HT to Core 2 if they wanted to.
Intel did not add HT to Core 2 because the Israel Design Center at the time wasn't experienced with it, while the American team that worked on Netburst was. That gives you a rough idea of the impact of the extra validation needed to make HT work properly.

Pentium 4 always had HT in silicon, but disabled; the Xeons had it enabled before the client parts.
Let's go with your numbers: 5% to 10% of 24 cores is an extra 1.2 to 2.4 cores you could have had instead. I'll take ~2 more real cores over cramming extra threads that the 24 cores can't handle into unused gaps.
You might want to read up on HT in AMD's and Intel's designs and see how insignificant the additional physical structures needed to enable it are.

The replicated structures amount to an insignificant area, and the rest, such as the load/store buffers, are simply shared or partitioned. I looked over the data again, and it put the area impact at under 5%; 2-4% is probably realistic.

You can make SMT perform far better with smaller losses, but that requires much more work, as IBM does; neither AMD nor Intel goes that far.

In the days of far fewer cores, that extra 5-10% of area yielded roughly a 30% performance improvement for roughly a 30% increase in power. Few other structures provide that level of efficiency, not in power and certainly not in die size.

In CPUs with only 2 cores, such as the early Core i-series chips, HT added a substantial performance gain even in games, since games were typically 3 or 4 threaded.
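Taking this post's figures at face value (5-10% extra core area for ~30% more throughput), the area-efficiency side of the argument can be sketched numerically. This is an illustration only: `perf_per_area` is a made-up helper, and the "extra core" row assumes perfect scaling, which is generous.

```python
def perf_per_area(perf_gain: float, area_cost: float) -> float:
    """Marginal throughput gained per unit of core area spent (toy model)."""
    return perf_gain / area_cost

# SMT: ~30% more throughput for 5-10% more core area (figures from the post)
smt_low = perf_per_area(0.30, 0.10)    # pessimistic area estimate
smt_high = perf_per_area(0.30, 0.05)   # optimistic area estimate

# Adding a whole extra core: 100% of a core's area for at most ~100% of
# one core's throughput, assuming the workload scales perfectly
extra_core = perf_per_area(1.00, 1.00)

print(f"SMT: {smt_low:.1f}-{smt_high:.1f}x perf per unit area; extra core: {extra_core:.1f}x")
```

By this crude measure SMT silicon is several times more area-efficient than a whole extra core, which is the trade-off both posters are arguing over.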
 

Hulk

Diamond Member
Oct 9, 1999
Of course there are more potential opportunities with a wider core, but that assumes the engineers are doing nothing to improve the utilization of that width.

Core 2 is far wider than Netburst, but the opportunity for HT is actually smaller, because Core 2 utilizes its resources much better.

Intel did not add HT to Core 2 because the Israel Design Center at the time wasn't experienced with it, while the American team that worked on Netburst was. That gives you a rough idea of the impact of the extra validation needed to make HT work properly.

Pentium 4 always had HT in silicon, but disabled; the Xeons had it enabled before the client parts.

You might want to read up on HT in AMD's and Intel's designs and see how insignificant the additional physical structures needed to enable it are.

The replicated structures amount to an insignificant area, and the rest, such as the load/store buffers, are simply shared or partitioned. I looked over the data again, and it put the area impact at under 5%; 2-4% is probably realistic.

You can make SMT perform far better with smaller losses, but that requires much more work, as IBM does; neither AMD nor Intel goes that far.

In the days of far fewer cores, that extra 5-10% of area yielded roughly a 30% performance improvement for roughly a 30% increase in power. Few other structures provide that level of efficiency, not in power and certainly not in die size.

In CPUs with only 2 cores, such as the early Core i-series chips, HT added a substantial performance gain even in games, since games were typically 3 or 4 threaded.
Thanks for the detailed and informative reply.
I remember reading that by the time Prescott was introduced, the P4 pipeline had grown to 31 stages; pipeline stalls were costly, and HT helped offset the lost throughput during those stalls.

My first processor that could execute more than one thread was a Northwood P4 3.06. I remember being amazed at how much more responsive the system was just from that one additional logical core. Of course, as core counts grew, the relative gains diminished.

I'm excited to see what ARL brings.
 

Wolverine2349

Member
Oct 9, 2022
Pentium 4 always had HT in silicon, but disabled; the Xeons had it enabled before the client parts.

Even the original Willamette Pentium 4 at 1.3 GHz with a 400 MHz FSB had it, just disabled? Not only the first 3.06 GHz, 533 MHz FSB Northwood that came in late 2002?
 

AMDK11

Senior member
Jul 15, 2019
Interesting points. I have heard that hyperthreading is not always the best, that more cores on the same node are far better than HT, and that HT/SMT can even be detrimental.

I mean, the Core 2 family of CPUs did not have it, and they were one of the best innovations ever. Some have stated the only reason Core 2 did not have it was that it was based on the P6 microarchitecture, which did not have it. Well, Pentium 4 did not have it either until Intel added it. So I seriously doubt Intel could not have added HT to Core 2 if they wanted to.
Why would they add HT to Core 2 when it already provided a large IPC increase compared to Core 1, and even a gigantic one compared to P4 (Netburst)? HT was added to the expanded and redesigned Nehalem. That was the plan, and a selling point of the new LGA1366 platform.