Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

As Hot Chips 34 starting this week, Intel will unveil technical information of upcoming Meteor Lake (MTL) and Arrow Lake (ARL), new generation platform after Raptor Lake. Both MTL and ARL represent new direction which Intel will move to multiple chiplets and combine as one SoC platform.

MTL also represents new compute tile that based on Intel 4 process which is based on EUV lithography, a first from Intel. Intel expects to ship MTL mobile SoC in 2023.

ARL will come after MTL so Intel should be shipping it in 2024, that is what Intel roadmap is telling us. ARL compute tile will be manufactured by Intel 20A process, a first from Intel to use GAA transistors called RibbonFET.

Comparison of upcoming Intel's U-series CPU: Core Ultra 100U, Lunar Lake and Panther Lake

Model	Code-Name	Date	TDP	Node	Tiles	Main Tile	CPU	LP E-Core	LLC	GPU	Xe-cores
Core Ultra 100U	Meteor Lake	Q4 2023	15 - 57 W	Intel 4 + N5 + N6	4	tCPU	2P + 8E	2	12 MB	Intel Graphics	4
?	Lunar Lake	Q4 2024	17 - 30 W	N3B + N6	2	CPU + GPU & IMC	4P + 4E	0	8 MB	Arc	8
?	Panther Lake	Q1 2026 ?	?	Intel 18A + N3E	3	CPU + MC	4P + 8E	4	?	Arc	12

Comparison of die size of Each Tile of Meteor Lake, Arrow Lake, Lunar Lake and Panther Lake

	Meteor Lake	Arrow Lake (20A)	Arrow Lake (N3B)	Arrow Lake Refresh (N3B)	Lunar Lake	Panther Lake
Platform	Mobile H/U Only	Desktop Only	Desktop & Mobile H&HX	Desktop Only	Mobile U Only	Mobile H
Process Node	Intel 4	Intel 20A	TSMC N3B	TSMC N3B	TSMC N3B	Intel 18A
Date	Q4 2023	Q1 2025 ?	Desktop-Q4-2024 H&HX-Q1-2025	Q4 2025 ?	Q4 2024	Q1 2026 ?
Full Die	6P + 8P	6P + 8E ?	8P + 16E	8P + 32E	4P + 4E	4P + 8E
LLC	24 MB	24 MB ?	36 MB ?	?	8 MB	?
tCPU	66.48
tGPU	44.45
SoC	96.77
IOE	44.45
Total	252.15

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tomshardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

coercitiv · May 8, 2024

Hulk said:
Since ASML makes all of the machinery to mass produce these chip patterns if a couple of companies each have the latest and greatest how is anyone in the lead? Or is the machinery only half the battle and the rest is knowing how to use it?

Some assembly required. Recipe book not included.

Thunder 57 · May 8, 2024

DavidC1 said:
Hyperthreading/SMT was once thought by every engineer to be something that was required to have but now it's not far away from being relegated to niches that VLIW architectures occupy.

Just because an alternative hasn't been thought of doesn't mean it will remain that way forever.

Of course not but I see no reason to get rid of uop caches, and you have not given any reason as to why they would/should. FWIW I disagree with ditching HT too. It's throwing away mostly free performance for... security? Maybe?

cytg111 · May 8, 2024

DavidC1 said:
What is the proportion of the reasons you are watching the channel? Because she has visually appealing features or that she offers information that pertains to your bias?

When it comes to gigantic scale mass manufacturing such as modern semiconductors, being out there in the real world with scale is king. They weren't able to get 10nm out in any capacity until they decided to do a little with Cannonlake and very quickly they were able to ramp 10nm to yields. Intel may take the lead but it'll take extraordinary measures to do so and perhaps TSMC messing up. They may take the lead but that's still hope more than anything.

In terms of logical fallacies that is not much of a choice you're giving me here

.

Basically I think Intel is returning to the front, take geo-politics, Taiwan, AI, Chips act etc. The western front cant afford second place and cant let it be up to a coin toss(Taiwan), so I think there is going to be thrown money at the problem until Intel is nr.1 again.

With this conviction I am trying to deduce where and when to throw my money in. That is my motivation.
But hey, If she was to hit me up for a date, would I say no? Probably not.

dullard · May 8, 2024

igor_kavinski said:
Source?

I believe the source is Golden Pig here, start at the 4th line from the bottom: https://cdn.videocardz.com/1/2024/03/GOLDEN-PIG-RUMORS.png

igor_kavinski · May 8, 2024

dullard said:
I believe the source is Golden Pig here, start at the 4th line from the bottom: https://cdn.videocardz.com/1/2024/03/GOLDEN-PIG-RUMORS.png

What does he mean by "it does not reach more than 7"?

dullard · May 8, 2024

igor_kavinski said:
What does he mean by "it does not reach more than 7"?

Performing a translation of a crude translation from a vague source isn't my forte. I assume it is referring to the number of Xe cores. Intel labels Meteor Lake CPUs with more Xe cores as having Intel Arc Graphics, but labels Meteor Lake CPUs with few Xe cores as having Intel Graphics.

For example, the 165H has 8 Xe cores and is labeled Intel Arc Graphics.
https://ark.intel.com/content/www/u...-processor-165h-24m-cache-up-to-5-00-ghz.html

But the 165U has 4 Xe cores and is labeled Intel Graphics (without the "Arc" in the name).
https://ark.intel.com/content/www/u...-processor-165u-12m-cache-up-to-4-90-ghz.html

DrMrLordX · May 8, 2024

cytg111 said:
Anyway I think I am taking that bet and be buying up some INTC the coming time.

Good luck! Nobody can really tell if Intel's nodes are actually competitive with TSMC's since Intel seems to be struggling with either volume or design (or both?). Arrow Lake doesn't look like it's going to be a great CPU despite Intel having a newer/mostly better TSMC node available for it than some competition which is coming up on TSMC N4P. So while it is rather embarrassing for Intel that they're going to release a new flagship CPU late this year on a TSMC node, it's even more embarrassing for them that said flagship CPU will most likely be slower than a competitor's chip on the older N4P.

dullard · May 8, 2024

To add to the confusion, there are some rumors of Arrow Lake-U being on Intel 3. Rumor here: https://wccftech.com/intel-arrow-la...ernative-lunar-lake-intel-3-perf-watt-uplift/ And our own rumors: https://forums.anandtech.com/thread...ure-lakes-rapids-thread.2509080/post-40656166

But I haven't seen anything from Intel mention that.

SiliconFly · May 8, 2024

dullard said:
To add to the confusion, there are some rumors of Arrow Lake-U being on Intel 3. Rumor here: https://wccftech.com/intel-arrow-la...ernative-lunar-lake-intel-3-perf-watt-uplift/ And our own rumors: https://forums.anandtech.com/thread...ure-lakes-rapids-thread.2509080/post-40656166

But I haven't seen anything from Intel mention that.

Well, I think intel 4 & Intel 3 share the same HP libraries (possibly with a few tweaks). Then, ARL-U being just a MTL-U refresh bodes really well with the leak.

If true, we're going to see RWC yet again. yuck.

DavidC1 · May 9, 2024

Thunder 57 said:
Of course not but I see no reason to get rid of uop caches, and you have not given any reason as to why they would/should. FWIW I disagree with ditching HT too. It's throwing away mostly free performance for... security? Maybe?

For SMT it complicates validation time. In terms of transistors it's pretty much free. Only 5-10% of the core, not the total die.

Increased validation complexity brings potential pitfalls for every product development. If lack of it can expedite that, then it's an overall win. Recently revealed security issues exacerbate the issue. The history of two x86 vendors are rife with countless execution issues too.

The other CPU vendors do not use SMT. The ARM vendors with far superior execution and performance do not use SMT. Clearly there's more than one way.

Hulk · May 9, 2024

I could use a little clarification regarding SMT implementation.

I understand that it appears that Intel will be removing SMT from future P cores in order to focus the P cores on ST performance. In an effort to increase IPC the core is getting wider and wider. Wouldn't that also imply that more of the core will go unused due to the fact that there is limited parallelism in ST code, which I believe was the original intent of SMT? To use "unused" parts of the core I mean.

Or is the OoO "smarts" of the CPU going to be so good that it is able to keep the core relatively saturated?

It just seems to me that the wider the core gets the more suited it is for SMT?

I have a limited understanding of microprocessor architecture, which is why I'm asking.

Also, do the structures in the SMT enabled CPU hinder ST performance in any way even if only an ST thread is being executed?

Wolverine2349 · May 9, 2024

Hulk said:
I could use a little clarification regarding SMT implementation.

I understand that it appears that Intel will be removing SMT from future P cores in order to focus the P cores on ST performance. In an effort to increase IPC the core is getting wider and wider. Wouldn't that also imply that more of the core will go unused due to the fact that there is limited parallelism in ST code, which I believe was the original intent of SMT? To use "unused" parts of the core I mean.

Or is the OoO "smarts" of the CPU going to be so good that it is able to keep the core relatively saturated?

It just seems to me that the wider the core gets the more suited it is for SMT?

I have a limited understanding of microprocessor architecture, which is why I'm asking.

Also, do the structures in the SMT enabled CPU hinder ST performance in any way even if only an ST thread is being executed?

Interesting points. I have heard hyper threading is not always the best and more cores on same node is way way way better than HT and HT/SMT can be detrimental.

I mean Core 2 family of CPUs did not have it and they were one of the best innovations ever. Some have stated only reason Core 2 did not have it was because it was based on P6 micro architecture and it did not have it. Well Pentium 4 did not either until they added it. So I seriously doubt Intel could not have added HT to Core 2 if they wanted to.

dullard · May 9, 2024

Hulk said:
I could use a little clarification regarding SMT implementation.

I understand that it appears that Intel will be removing SMT from future P cores in order to focus the P cores on ST performance. In an effort to increase IPC the core is getting wider and wider. Wouldn't that also imply that more of the core will go unused due to the fact that there is limited parallelism in ST code, which I believe was the original intent of SMT? To use "unused" parts of the core I mean.

Or is the OoO "smarts" of the CPU going to be so good that it is able to keep the core relatively saturated?

It just seems to me that the wider the core gets the more suited it is for SMT?

I have a limited understanding of microprocessor architecture, which is why I'm asking.

Also, do the structures in the SMT enabled CPU hinder ST performance in any way even if only an ST thread is being executed?

I think you are looking at the trees and missing the forest. Yes, without SMT there will be unused parts of the core. That WAS a big problem in the past when CPUs had slow clock speeds and few cores.

That is no longer as big of a problem as it once was. Use of SMT both (1) limits potential clock speeds and (2) requires added security checks that add way more work to perform to ensure no information can accidentally be accessed. Thus, SMT can slow down a CPU more than it helps (many industrial power use software has always recommended turning SMT off to increase performance). The limited potential clock speeds should be self-explanatory (when you use more of the CPU more of the time, it produces more heat when energizing and draining transistors, limiting the clock speeds that it could have been at). How many times have you seen security fixes reduce performance by 1%, 2%, 10%, etc? That all adds up and almost never is included in the comparisons of SMT on/off benchmarks (to do benchmarking comparisons right, they should also turn on/off the necessary security checks that are only needed because of SMT). The security performance drops often outweigh any possible performance gains from utilizing any unused part of a core.

Plus, we now have plenty of other cores on which to offload extra threads. It is better to have a real core for these threads than to take the time to try to slip a thread in and out of unused parts of a core.

dullard · May 9, 2024

DavidC1 said:
In terms of transistors it's pretty much free. Only 5-10% of the core, not the total die.

Lets go with your numbers. 5% to 10% of 24 cores is an extra 1.2 to 2.4 cores you could have had instead. I'll take ~2 more real cores over cramming extra threads that the 24 cores can't handle into unused gaps.

igor_kavinski · May 9, 2024

dullard said:
That all adds up and almost never is included in the comparisons of SMT on/off benchmarks (to do benchmarking comparisons right, they should also turn on/off the necessary security checks that are only needed because of SMT).

Excellent point!

igor_kavinski · May 9, 2024

I personally think Intel should've done a special part with a "sea of LP E-cores". Smaller, less performant E-cores but 24 to 64 of them. That could've been nice to have for people who love threads (and who buy Threadrippers) but Intel's part would've been targeted more at the professional crowd who didn't want to pay for HEDT. They could've priced these parts up to $1200 based on the core configuration but working in B760/H770/Z790 mobos and I'm positive they would've sold well. Sadly, Intel was obsessed with dethroning AMD rather than actually innovating.

reggie_fils_aime · May 9, 2024

igor_kavinski said:
I personally think Intel should've done a special part with a "sea of LP E-cores". Smaller, less performant E-cores but 24 to 64 of them. That could've been nice to have for people who love threads (and who buy Threadrippers) but Intel's part would've been targeted more at the professional crowd who didn't want to pay for HEDT. They could've priced these parts up to $1200 based on the core configuration but working in B760/H770/Z790 mobos and I'm positive they would've sold well. Sadly, Intel was obsessed with dethroning AMD rather than actually innovating.

They sell 24-core atom boards, but those are peaky at best, yeah.

DavidC1 · May 9, 2024

Hulk said:
I could use a little clarification regarding SMT implementation.

I understand that it appears that Intel will be removing SMT from future P cores in order to focus the P cores on ST performance. In an effort to increase IPC the core is getting wider and wider. Wouldn't that also imply that more of the core will go unused due to the fact that there is limited parallelism in ST code, which I believe was the original intent of SMT? To use "unused" parts of the core I mean.

Of course there are more potential opportunities with a wider core, but it assumes the engineers are also doing zero to enhance the utilization of the wider core.

Core 2 is far wider than Netburst, but the opportunities for HT are actually less because it utilizes it much better.

Wolverine2349 said:
I mean Core 2 family of CPUs did not have it and they were one of the best innovations ever. Some have stated only reason Core 2 did not have it was because it was based on P6 micro architecture and it did not have it. Well Pentium 4 did not either until they added it. So I seriously doubt Intel could not have added HT to Core 2 if they wanted to.

Intel did not add HT on Core 2 because the Israel Design Center at the time wasn't experienced with it while the American team worked on Netburst so they were. So that gives you a rough idea on the impact the extra validation needed to make it work properly.

Pentium 4 always had HT, but disabled. The Xeons were enabled before the client parts.

dullard said:
Lets go with your numbers. 5% to 10% of 24 cores is an extra 1.2 to 2.4 cores you could have had instead. I'll take ~2 more real cores over cramming extra threads that the 24 cores can't handle into unused gaps.

You might want to read into HT for AMD and Intel and see how insignificant the addition of physical structures are to enable it.

The Dark Knight: Intel's Core i7

www.anandtech.com

The replicated structures are an insignificant amount and rest are just shared or partitioned such as the load/store buffers. I looked over the data again and it said the area impact is under 5%. 2-4% is probably realistic.

You can make it perform far higher with less losses but requires much more work such as with IBM, but neither AMD/Intel does this.

During the days of much fewer cores, the extra 5-10% resulted in 30% improvement in performance at roughly 30% increase in power. Many other structures do not provide that level of efficiency, not in power, and certainly not in die size.

In CPUs with only 2 cores such as the early Core ix chips, HT added a substantial performance gain even in games, since games were typically 3 or 4 threaded.

Hulk · May 9, 2024

DavidC1 said:
Of course there are more potential opportunities with a wider core, but it assumes the engineers are also doing zero to enhance the utilization of the wider core.

Core 2 is far wider than Netburst, but the opportunities for HT are actually less because it utilizes it much better.

Intel did not add HT on Core 2 because the Israel Design Center at the time wasn't experienced with it while the American team worked on Netburst so they were. So that gives you a rough idea on the impact the extra validation needed to make it work properly.

Pentium 4 always had HT, but disabled. The Xeons were enabled before the client parts.

You might want to read into HT for AMD and Intel and see how insignificant the addition of physical structures are to enable it.

The Dark Knight: Intel's Core i7

www.anandtech.com

The replicated structures are an insignificant amount and rest are just shared or partitioned such as the load/store buffers. I looked over the data again and it said the area impact is under 5%. 2-4% is probably realistic.

You can make it perform far higher with less losses but requires much more work such as with IBM, but neither AMD/Intel does this.

During the days of much fewer cores, the extra 5-10% resulted in 30% improvement in performance at roughly 30% increase in power. Many other structures do not provide that level of efficiency, not in power, and certainly not in die size.

In CPUs with only 2 cores such as the early Core ix chips, HT added a substantial performance gain even in games, since games were typically 3 or 4 threaded.

Thanks for the detailed and informative reply.
I remember reading that by the time Prescott was introduced the P4 pipeline had grown to 31 stages and pipeline stalls were costly and HT helped offset the loss of productivity during those stalls.

My first processor that could execute more than one thread was a Northwood P4 3.06. I remember being amazed at how much more responsive the system was just due to that additional logical core. Of course as the core count grew the relative gains diminished.

I'm excited to see what ARL brings.

Wolverine2349 · May 9, 2024

Pentium 4 always had HT, but disabled. The Xeons were enabled before the client parts.

Even the original Williamite Pentium 4 1.3GHz 400MHz FSB original had it just disabled? Not just the first 3.06GHz 533MHz FSB Northwood that came in late 2002?

Thunder 57 · May 9, 2024

Wolverine2349 said:
Even the original Williamite Pentium 4 1.3GHz 400MHz FSB original had it just disabled? Not just the first 3.06GHz 533MHz FSB Northwood that came in late 2002?

Northwood always had it. Williamette, I'm not sure.

AMDK11 · May 9, 2024

Wolverine2349 said:
Interesting points. I have heard hyper threading is not always the best and more cores on same node is way way way better than HT and HT/SMT can be detrimental.

I mean Core 2 family of CPUs did not have it and they were one of the best innovations ever. Some have stated only reason Core 2 did not have it was because it was based on P6 micro architecture and it did not have it. Well Pentium 4 did not either until they added it. So I seriously doubt Intel could not have added HT to Core 2 if they wanted to.

Why would they add HT to Core 2 when it already provided a large (compared to Core 1) IPC increase, even a gigantic one compared to P4(Netbrust). HT has been added to the expanded and redesigned Nehalem. This was the plan and bargaining chip of the new LGA1366 platform.

naukkis · May 10, 2024

Wolverine2349 said:
Even the original Williamite Pentium 4 1.3GHz 400MHz FSB original had it just disabled? Not just the first 3.06GHz 533MHz FSB Northwood that came in late 2002?

Yes, Willamette server cpu's had it enabled.

igor_kavinski · May 10, 2024

naukkis said:
Yes, Willamette server cpu's had it enabled.

What was the most powerful P4 server CPU? Did those get to have anything like 32GB RDRAM?

Gideon · May 10, 2024

igor_kavinski said:
What was the most powerful P4 server CPU? Did those get to have anything like 32GB RDRAM?

wasn't P4 32bit only. wouldn't that limit RAM to 4GB?

Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Senior member

Attachments

Diamond Member

Platinum Member

Lifer

Elite Member

Lifer

Elite Member

Lifer

Elite Member

Golden Member

Senior member

Diamond Member

Member

Elite Member

Elite Member

Lifer

Lifer

Member

Senior member

Diamond Member

Member

Platinum Member

Senior member

Senior member

Lifer

Golden Member