Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads


Tigerick

Senior member
Apr 1, 2022



With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction for Intel: multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what the Intel roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.




Intel Core Ultra 100 - Meteor Lake


As reported by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)




DavidC1

Golden Member
Dec 29, 2023
Is there anyone who can estimate approximately how many transistors are in the L2, e.g. Skylake 256KB, Sunny/Cypress Cove 512KB, Golden Cove 1.25MB-2MB, Lion Cove 3MB?
That's easy.

6/8 transistors per bit
8 bits per byte
1024 x 1024

~50/67 million transistors per MB.
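Here is that arithmetic as a small Python sketch, applied to the L2 sizes in the question. It counts only the bit cells (6T or 8T) and ignores tags, decoders and sense amplifiers, and it assumes 1 MB = 1024 × 1024 bytes; the helper name is just for illustration.

```python
# Rough SRAM transistor estimate: bit cells only (no tags, decoders, sense amps).
BYTES_PER_MB = 1024 * 1024
BITS_PER_BYTE = 8


def cell_transistors(size_mb: float, transistors_per_bit: int) -> float:
    """Transistors in the bit-cell array of a cache of size_mb megabytes."""
    return size_mb * BYTES_PER_MB * BITS_PER_BYTE * transistors_per_bit


# L2 sizes from the question above, in MB.
L2_SIZES = {
    "Skylake (256KB)": 0.25,
    "Sunny/Cypress Cove (512KB)": 0.5,
    "Golden Cove (1.25MB)": 1.25,
    "Golden Cove (2MB)": 2.0,
    "Lion Cove (3MB)": 3.0,
}

if __name__ == "__main__":
    print(f"per MB: 6T = {cell_transistors(1, 6) / 1e6:.1f}M, "
          f"8T = {cell_transistors(1, 8) / 1e6:.1f}M")
    for name, mb in L2_SIZES.items():
        six_t = cell_transistors(mb, 6) / 1e6
        eight_t = cell_transistors(mb, 8) / 1e6
        print(f"{name}: ~{six_t:.0f}M (6T) to ~{eight_t:.0f}M (8T)")
```

For Lion Cove's 3MB L2 this gives roughly 151M transistors with 6T cells, before any tag or peripheral overhead.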
Seems like all modern x86 processors have a pretty wide decode. Skymont has a very elaborate front end and can decode 9 instructions per cycle.
Skymont is smaller than Zen 5, Lion Cove, and M4, so despite looking super wide, it has the area of an E core even from an ARM-camp perspective, so some sacrifices are being made.
There is no perfect comparison because memory subsystems are different, but a comparison that selectively excludes core-private resources from one core's area while factoring them in for another one is anything but fair.
The reason you exclude the later levels is two-fold:
1. Caches are easy to add. Once you have the basic block, you just copy and paste. Logic transistors take great planning and thought to add.
2. L1 caches are included because they're very close to the core and fundamental to its operation. You have separate instruction and data caches, for example. L2 is generalized.

And yep, Lion Cove's cache levels are marketing. "L0" would have made sense if they had greatly expanded the uop cache, or added a second, larger uop cache level and called that "L0". But in this case L0 = L1. I wouldn't even include the 192KB "L1.5" in the calculation. I hope the lies go away with the marketing team shift, but it's "AI"... so I'm not pinning my hopes on it.

Even if you exclude it, M4 has a *huge* advantage over Lion Cove. Its perf/W is unparalleled. It does it at a much lower clock too, meaning it has headroom, unlike Intel, which is killing its chips to reach peak frequency.

Do you guys realize how obnoxious Intel would be if the situation was flipped and M4 was their chip and 285K was from Apple?
 

DavidC1

Golden Member
Dec 29, 2023
Normalized, Zen 5C is 1.6X more area than Skymont 😂 (1.9/1.14). Skymont wins just by a little bit
Gracemont was good but Skymont is a better core in all areas, ability to scale, perf/watt, absolute performance.

It also compared poorly against Zen 4C. It was a 1mm2 core on Intel 4 versus 1.4mm2 for Zen 4C on N5. The latter was ~40% faster per clock, with FP having a greater advantage.

On Skymont vs Zen 5C it goes like this:
- 30% Int gap, 60% FP gap --> 10% Int gap, 18% FP gap
- 1mm2 on Intel 4 vs 1.4mm2 on N5 --> 1.1mm2 on N3B vs 1.7mm2 on N3E
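To make the area side of that comparison concrete, here is a rough sketch that divides each relative per-clock performance gap by the relative core area, using only the numbers in the list above (Intel E-core = 1.0). It deliberately ignores that the cores sit on different nodes and that per-clock gaps are not the same as throughput, so treat it as an illustration, not a measurement.

```python
# Relative per-clock performance and core area, taken from the figures in the post.
COMPARISONS = {
    # name: (int_perf, fp_perf, amd_area_mm2, intel_area_mm2), Intel E-core perf = 1.0
    "Zen 4C vs Gracemont": (1.30, 1.60, 1.4, 1.0),
    "Zen 5C vs Skymont": (1.10, 1.18, 1.7, 1.1),
}


def perf_per_area(perf: float, amd_area: float, intel_area: float) -> float:
    """AMD core's perf/area relative to the Intel E-core (1.0 = parity)."""
    return perf / (amd_area / intel_area)


for name, (int_perf, fp_perf, amd_area, intel_area) in COMPARISONS.items():
    area_ratio = amd_area / intel_area
    print(f"{name}: area x{area_ratio:.2f}, "
          f"int perf/area {perf_per_area(int_perf, amd_area, intel_area):.2f}, "
          f"fp perf/area {perf_per_area(fp_perf, amd_area, intel_area):.2f}")
```

Values below 1.0 mean the Intel E-core delivers more per-clock performance per mm². On these numbers Zen 4C was ahead on FP density, while Zen 5C falls behind Skymont on both integer and FP.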

This is why you need a 144-core Sierra Forest to beat a 64-core Zen 4C by only 10-20%. In some workloads, like Cloud and VM, SRF came close to the 128-core Zen 4C. This suggests the lack of FP performance was a significant hindrance in head-to-head comparisons, which will improve greatly with Skymont. Also, Gracemont likely had corner-case instructions that were weaker than on a big core, plus its 4-core-per-cluster module and memory-level parallelism aren't as robust.
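The Sierra Forest comparison can be turned into a rough per-core figure. This back-of-the-envelope sketch assumes socket throughput scales linearly with core count and ignores clocks, SMT (Zen 4C runs two threads per core) and memory bandwidth; the function name is hypothetical.

```python
def per_core_ratio(cores_a: int, cores_b: int, a_lead: float) -> float:
    """Work done by one core of chip B relative to one core of chip A,
    given that chip A (cores_a) beats chip B (cores_b) by a_lead (0.15 = 15%).
    Assumes socket throughput scales linearly with core count."""
    return (cores_a / cores_b) / (1.0 + a_lead)


# 144-core Sierra Forest ahead of 64-core Zen 4C by 10-20%.
for lead in (0.10, 0.20):
    ratio = per_core_ratio(144, 64, lead)
    print(f"SRF ahead by {lead:.0%}: one Zen 4C core ~ {ratio:.2f} SRF cores")
```

Under those assumptions each Zen 4C core is doing roughly 1.9-2x the work of an SRF E-core.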

The problem for Clearwater Forest is the nearly 1-year delay. Without it, it would have gone up against Zen 5C, which would have been very favorable.

Addendum: Someone said Skymont has twice the transistors of Gracemont. That would be true if you assume N5 was 1.84x denser than N7, but Angstronomics said it's really 1.5x. Based on that, it's a 70% larger core. And since they straight up doubled the FP block, which is usually responsible for 20-25% of the core size, without that it would have been only a 30-40% larger core for a 30%+ integer gain, which beats Pollack's rule (performance scaling with the square root of area).
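The rule of thumb usually used for this kind of area-vs-performance argument is Pollack's rule: performance grows roughly with the square root of core area. A quick check of what it would predict for a 30-40% larger core, using the post's estimates (it is a heuristic, not a law):

```python
import math


def pollack_gain(area_ratio: float) -> float:
    """Expected performance multiplier under Pollack's rule (perf ~ sqrt(area))."""
    return math.sqrt(area_ratio)


for growth in (0.30, 0.40):
    expected = pollack_gain(1.0 + growth) - 1.0
    print(f"{growth:.0%} more area -> ~{expected:.0%} more performance expected")
```

So a 30%+ integer gain from 30-40% more area lands well above the ~14-18% the rule would predict.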
 

511

Platinum Member
Jul 12, 2024
That's easy.

6/8 transistors per bit
8 bits per byte
1024 x 1024

~50/67 million transistors per MB.
What about macros and tags? It's not that easy, lol.
Skymont is smaller than Zen 5, Lion Cove, and M4, so despite looking super wide, it has the area of an E core even from an ARM-camp perspective, so some sacrifices are being made.

The reason you exclude the later levels is two-fold:
1. Caches are easy to add. Once you have the basic block, you just copy and paste. Logic transistors take great planning and thought to add.
2. L1 caches are included because they're very close to the core and fundamental to its operation. You have separate instruction and data caches, for example. L2 is generalized.

And yep, Lion Cove's cache levels are marketing. "L0" would have made sense if they had greatly expanded the uop cache, or added a second, larger uop cache level and called that "L0". But in this case L0 = L1. I wouldn't even include the 192KB "L1.5" in the calculation. I hope the lies go away with the marketing team shift, but it's "AI"... so I'm not pinning my hopes on it.

Even if you exclude it, M4 has a *huge* advantage over Lion Cove. Its perf/W is unparalleled. It does it at a much lower clock too, meaning it has headroom, unlike Intel, which is killing its chips to reach peak frequency.

Do you guys realize how obnoxious Intel would be if the situation was flipped and M4 was their chip and 285K was from Apple?
If that were the case, Apple would never have left Intel.
 

DavidC1

Golden Member
Dec 29, 2023
What about macros and tags? It's not that easy, lol.
Pentium III - 9.5 million transistors
Pentium III Coppermine - 28 million, 256KB on-die L2 = +18.5 million
Pentium III Tualatin - 44 million, 512KB on-die L2 = +16 million

Pentium 4 Willamette - 42 million, 256KB L2
Pentium 4 Northwood - 55 million, 512KB L2 = +13 million

They don't really matter that much.

They are also a lot less dense than SRAM, which you can plainly see in die shots: you can see gaps, whereas SRAM is densely filled with beautiful colors.
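Those transistor-count deltas can be turned into an implied transistors-per-MB figure and compared with the ~50M per MB of bare 6T cells. This sketch charges the entire transistor increase to the added L2, which overstates the cache cost because each revision changed other logic too, so the overhead it shows is likely an upper bound.

```python
# (step, previous transistors in millions, new transistors in millions,
#  previous on-die L2 in KB, new on-die L2 in KB), from the list in the post.
STEPS = [
    ("Pentium III -> Coppermine", 9.5, 28.0, 0, 256),
    ("Coppermine -> Tualatin", 28.0, 44.0, 256, 512),
    ("Willamette -> Northwood", 42.0, 55.0, 256, 512),
]

# Bare 6T bit cells only: 6 transistors x 8 bits x 1024 x 1024 bytes, in millions.
SIX_T_PER_MB_M = 6 * 8 * 1024 * 1024 / 1e6


def implied_per_mb(prev_t: float, new_t: float, prev_kb: int, new_kb: int) -> float:
    """Millions of transistors per MB of added L2, charging the whole
    transistor increase to the cache (so it overstates the cache cost)."""
    added_mb = (new_kb - prev_kb) / 1024
    return (new_t - prev_t) / added_mb


for step, prev_t, new_t, prev_kb, new_kb in STEPS:
    per_mb = implied_per_mb(prev_t, new_t, prev_kb, new_kb)
    print(f"{step}: ~{per_mb:.0f}M per MB, {per_mb / SIX_T_PER_MB_M:.2f}x bare 6T cells")
```

The Northwood step comes out at about 52M per MB, within a few percent of the bare-cell figure, which fits the point that tags and peripheral circuitry don't change the picture dramatically.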

If you want to learn about CPUs, read older articles and look at history.
 

511

Platinum Member
Jul 12, 2024
Gracemont was good but Skymont is a better core in all areas, ability to scale, perf/watt, absolute performance.

It also compared poorly against Zen 4C. It was a 1mm2 core on Intel 4 versus 1.4mm2 for Zen 4C on N5. The latter was ~40% faster per clock, with FP having a greater advantage.

On Skymont vs Zen 5C it goes like this:
- 30% Int gap, 60% FP gap --> 10% Int gap, 18% FP gap
- 1mm2 on Intel 4 vs 1.4mm2 on N5 --> 1.1mm2 on N3B vs 1.7mm2 on N3E

This is why you need a 144-core Sierra Forest to beat a 64-core Zen 4C by only 10-20%. In some workloads, like Cloud and VM, SRF came close to the 128-core Zen 4C. This suggests the lack of FP performance was a significant hindrance in head-to-head comparisons, which will improve greatly with Skymont. Also, Gracemont likely had corner-case instructions that were weaker than on a big core, plus its 4-core-per-cluster module and memory-level parallelism aren't as robust.

The problem for Clearwater Forest is the nearly 1-year delay. Without it, it would have gone up against Zen 5C, which would have been very favorable.
Addendum: Someone said Skymont has twice the transistors of Gracemont. That would be true if you assume N5 was 1.84x denser than N7, but Angstronomics said it's really 1.5x. Based on that, it's a 70% larger core. And since they straight up doubled the FP block, which is usually responsible for 20-25% of the core size, without that it would have been only a 30-40% larger core for a 30%+ integer gain, which beats Pollack's rule (performance scaling with the square root of area).
Half a year, tbh, but yeah, it's a bit late in the cycle. Unless DMR is delayed by half a year, it's game over for Intel in DC.

Gracemont density is roughly 60 Mxtor/mm² (H408G60), from TechInsights' reverse engineering. They have also done a teardown of SRF; if only someone had the report and could let us know the pitches. 🙂

Also, I believe that if we go by classic Intel methodology, they would have used the highest-performance library, which is also the largest one. So I doubt Skymont would be 2X larger; if anything it would be less, considering the 3-fin N3E library (H221G54) would be roughly ~124 Mxtor/mm². Add in the fact that Intel must have relaxed it even more, and it would come in under 2X.
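The density argument above is a simple ratio of the two library densities quoted (~124 Mxtor/mm² for 3-fin N3E vs ~60 Mxtor/mm² for Gracemont). This sketch shows how any relaxation of the N3E layout pulls the ratio below 2X; the relaxation factors are hypothetical and picked only for illustration.

```python
GRACEMONT_DENSITY = 60.0  # Mxtor/mm^2, as quoted in the post (H408G60)
N3E_3FIN_DENSITY = 124.0  # Mxtor/mm^2, as quoted in the post (H221G54)


def density_ratio(relaxation: float = 0.0) -> float:
    """N3E-to-Gracemont density ratio after relaxing the N3E layout by
    `relaxation` (0.05 = 5% lower achieved density). Hypothetical knob."""
    return N3E_3FIN_DENSITY * (1.0 - relaxation) / GRACEMONT_DENSITY


for relax in (0.0, 0.05, 0.10):
    print(f"relaxation {relax:.0%}: density ratio ~{density_ratio(relax):.2f}x")
```

With no relaxation the ratio sits just above 2X; even a ~5% relaxation brings it under 2X.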
I don't remember macros and tags being so big that it would suddenly go from 50 million transistors to 150 million+.

They are also a lot less dense than SRAM, which you can plainly see in die shots: you can see gaps, whereas SRAM is densely filled with beautiful colors.
It would go from 50 to 60-70 million, though; yeah, they are less dense. There is additional circuitry for SRAM as well, like the sense amplifiers.
 

DavidC1

Golden Member
Dec 29, 2023
It would go from 50 to 60-70 million, though; yeah, they are less dense. There is additional circuitry for SRAM as well, like the sense amplifiers.
The only noticeable change is the 8T cells (which aren't used in the L3 caches, and I doubt they are used at 1MB+ sizes either).

Look at history if you really want to learn. See my post above.
Gracemont density is roughly 60 Mxtor/mm² (H408G60), from TechInsights' reverse engineering. They have also done a teardown of SRF; if only someone had the report and could let us know the pitches. 🙂
Transistor count is a useless metric. How large the die is, how much power it consumes, and how it performs are the only things that matter.
 
Jul 27, 2020
Apple shows how that can work.
There is more to it I'm sure. One guy cannot enforce anything. It's a culture thing there and someone needs to spill the beans on how they get so much productivity out of their employees.

In my job, I wouldn't have gone anywhere if I hadn't been left to my own devices. I was put in charge of a product, saw where things could be improved, and took the initiative without asking for anyone's permission. They didn't get mad at me and saw that I had done something useful to improve the processes. Had I been micromanaged by a manager who wanted a complete timesheet filled out daily on how I had spent my time, I wouldn't have lasted even a month in this job. Employees need to be given the freedom to perform, especially ones who like solving problems. Is Apple doing that? Maybe. Maybe their managers are very forgiving? Maybe they analyze the skills and capabilities of each of their subordinates and don't expect more than they can manage? Maybe when they see the team falling behind, instead of threatening them with grave consequences, they listen intently to their employees and figure out how to support them with whatever they need to get the work done faster and better?
 

OneEng2

Senior member
Sep 19, 2022
Wow. Lots of great information ;).

M4 does lots of things right; however, it does so by utilizing a big monolithic die.

Additionally, M4 (and ARM in general) is not as well equipped to serve DC as Zen 5/6 are.... at least that is how it appears at this time.

M4 is very well designed for its target purpose though. Everyone praises the power efficiency and single thread performance; however, I suspect that the very things that make it a super star in a phone or tablet will kill it in DC.

AMD has specifically stated they are designing for "DC First". This is where the margins, growth, and future are. Sadly, I think desktop performance is going to lean more toward HPC in the future, and everything else will be laptops (most of them just a bunch of business users and home users browsing the internet). I think that LNL is a very good design; however, I also fear that it isn't cost-effective for Intel.

Skymont, like M4, is good at what it does, but it isn't good for everything. People rightly criticize Intel's new P core; however, without it, ARL and LNL would have been quite pathetic IMO.
 

511

Platinum Member
Jul 12, 2024
Wow. Lots of great information ;).

M4 does lots of things right; however, it does so by utilizing a big monolithic die.

Additionally, M4 (and ARM in general) is not as well equipped to serve DC as Zen 5/6 are.... at least that is how it appears at this time.

M4 is very well designed for its target purpose though. Everyone praises the power efficiency and single thread performance; however, I suspect that the very things that make it a super star in a phone or tablet will kill it in DC.

AMD has specifically stated they are designing for "DC First". This is where the margins, growth, and future are. Sadly, I think desktop performance is going to lean more toward HPC in the future, and everything else will be laptops (most of them just a bunch of business users and home users browsing the internet). I think that LNL is a very good design; however, I also fear that it isn't cost-effective for Intel.

Skymont, like M4, is good at what it does, but it isn't good for everything. People rightly criticize Intel's new P core; however, without it, ARL and LNL would have been quite pathetic IMO.
Intel needs a new P-core architecture, but they don't have a replacement, so they are extending their current P cores until Unified Core.

LNL's issue is the memory part. Also, if it didn't have the die waste unit, it would have been better for their margins.
 

511

Platinum Member
Jul 12, 2024
It's not a "die" waste unit because it's not dying and refuses to die. Panther Lake will still have it. It's right now pretty much an "alive" waste unit.
Here "die" means a square/rectangular piece of silicon, not "die" as in "dead". Also, in PTL the NPU is substantially smaller: they shrank the tile count from 6 to 3 while delivering slightly better TOPS.
 

Doug S

Diamond Member
Feb 8, 2020
Not that I disagree but what about patent laws?

Those are mostly irrelevant when it comes to the big companies competing, as they all have enough patents that they generally enter into cross-licensing agreements with each other. Even if they don't have a formal cross-licensing agreement, if one tried to sue another, the one being sued could come up with some patents the other was violating, and the outcome would be a cross-licensing agreement.

They worry about patent trolls a lot more than they worry about each other.
 

Doug S

Diamond Member
Feb 8, 2020
Additionally, M4 (and ARM in general) is not as well equipped to serve DC as Zen 5/6 are.... at least that is how it appears at this time.

See, that's the baseless BS my post was talking about. No evidence is offered, just "how it appears at this time". They haven't shipped a server chip, so obviously M4 (and somehow, by extension, ARM as a whole) is "not as well equipped". Total BS!

What, specifically, do you think Apple is lacking that makes them less well equipped to handle DC roles than Zen 5? If Apple wanted to, they could design a chip with a bunch of P cores, leave off the GPU and other stuff, and if necessary use their UltraFusion interconnect to tie multiple such chips together. That would be VERY competitive with ARM's DC stuff.

Apple isn't interested in that market though, since consumers and small businesses don't buy big iron servers - though it sounds like they're building something for their own datacenters.
 

OneEng2

Senior member
Sep 19, 2022
See, that's the baseless BS my post was talking about. No evidence is offered, just "how it appears at this time". They haven't shipped a server chip, so obviously M4 (and somehow, by extension, ARM as a whole) is "not as well equipped". Total BS!

What, specifically, do you think Apple is lacking that makes them less well equipped to handle DC roles than Zen 5? If Apple wanted to, they could design a chip with a bunch of P cores, leave off the GPU and other stuff, and if necessary use their UltraFusion interconnect to tie multiple such chips together. That would be VERY competitive with ARM's DC stuff.

Apple isn't interested in that market though, since consumers and small businesses don't buy big iron servers - though it sounds like they're building something for their own datacenters.

It kinda looks like (which is why I specifically used that wording) ARM does not do well in the current state of servers.

You can, of course, say that an M4 based server would do much better, but it doesn't exist. What does exist doesn't seem to be competitive .... at this time.

Do you have other data that would indicate something different?
 

poke01

Diamond Member
Mar 8, 2022

It kinda looks like (which is why I specifically used that wording) ARM does not do well in the current state of servers.

You can, of course, say that an M4 based server would do much better, but it doesn't exist. What does exist doesn't seem to be competitive .... at this time.

Do you have other data that would indicate something different?
These Intel Lakes are all consumer products, so comparing ARM CPUs with Intel's equivalents like Lunar Lake or LNC for client is about laptops, mini PCs, or desktops.

Why does there need to be an M4-based server to validate M4's performance? CPUs can be tailor-made for client use, and that's fine.
 

OneEng2

Senior member
Sep 19, 2022
These Intel Lakes are all consumer products, so comparing ARM CPUs with Intel's equivalents like Lunar Lake or LNC for client is about laptops, mini PCs, or desktops.

Why does there need to be an M4-based server to validate M4's performance? CPUs can be tailor-made for client use, and that's fine.
I certainly never implied that M4 wasn't good for what it was designed for.

I guess I am defending x86 from a business standpoint, as the fastest-growing segment is DC and it also has the highest margins... so if you make processors for a living, it is a good strategy to design for DC and trickle down to client.

Client, especially desktop, is a much lower-margin market, which makes it more utilitarian and more like a commodity. In other words, it's a lot more difficult to make money doing it.

This is why chiplets/tiles are becoming so important. They allow more cores, more types of cores, etc., with smaller dies. I will agree that this comes at a performance cost; however, as AMD has shown, you can get good performance AND use chiplets.

M4, I believe, was tailor-made for mobile, correct?
 
Jul 27, 2020
Let's see. M4 does not run Linux or Windows. What exactly are cloud providers going to use it for then? macOS Server (if a current version even exists) is almost never used by businesses.

Also, Apple can't push their App Store through server M4 deployments. Businesses use custom software on servers, mostly open source. No double dipping for Apple there.
 

poke01

Diamond Member
Mar 8, 2022
I certainly never implied that M4 wasn't good for what it was designed for.

I guess I am defending x86 from a business standpoint, as the fastest-growing segment is DC and it also has the highest margins... so if you make processors for a living, it is a good strategy to design for DC and trickle down to client.

Client, especially desktop, is a much lower-margin market, which makes it more utilitarian and more like a commodity. In other words, it's a lot more difficult to make money doing it.

This is why chiplets/tiles are becoming so important. They allow more cores, more types of cores, etc., with smaller dies. I will agree that this comes at a performance cost; however, as AMD has shown, you can get good performance AND use chiplets.

M4, I believe, was tailor-made for mobile, correct?
AMD doesn’t use only chiplets for laptop parts; Zen 5 mobile comes in both monolithic and chiplet designs. M4 is made for mobile, but its architecture is scalable, and then you get laptop-only parts like the M4 Pro/Max, which are not for tablets, passively cooled laptops, etc.
Let's see. M4 does not run Linux or Windows. What exactly are cloud providers going to use it for then? macOS Server (if a current version even exists) is almost never used by businesses.
Nope, it’s not designed for cloud/DC, just as Lunar Lake or Strix Point isn’t designed for cloud.

You don’t see these x86 SoCs being used in the cloud, do you?
 

poke01

Diamond Member
Mar 8, 2022
3,533
4,859
106
No, but you most certainly see Zen 5 and Redwood Cove/Lion Cove in DC everywhere.

There is a difference between the same die being used and the same cores being used.
Exactly, and the reason M4 P cores aren’t in servers is that Apple doesn’t make server CPUs to sell to others, because that’s not their business.

In the end, what matters most is compatibility and efficiency, and companies will choose based on that. DC is a big market, and it is the primary focus for Intel and AMD. Like you said, designs for their primary market will trickle down to client too.
 