Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
910
828
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the Intel 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The two are connected through UCIe rather than Intel's D2D interconnect, a first for Intel. I expect a launch at Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming MediaTek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
| --- | --- | --- | --- | --- |
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max Clock (CPU) | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Max Memory | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe cores | 32 EU | 2 Xe | 8 Xe | 12 |
| Max Clock (GPU) | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
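For reference, the bandwidth row follows directly from transfer rate and bus width. A minimal sketch using the table's MT/s figures (peak theoretical numbers, not measured bandwidth):

```python
# Peak theoretical bandwidth (GB/s) = transfer rate (MT/s) * bus width (bytes) / 1000
def peak_bw_gbps(mt_per_s: int, bus_bits: int) -> float:
    return mt_per_s * (bus_bits / 8) / 1000

print(peak_bw_gbps(4800, 64))     # ADL-N:  64-bit LPDDR5-4800   -> ~38.4 GB/s
print(peak_bw_gbps(6800, 64))     # WCL:    64-bit LPDDR5-6800   -> ~54.4 GB/s ("~55 GB/s")
print(peak_bw_gbps(8533, 128))    # LNL:   128-bit LPDDR5X-8533  -> ~136.5 GB/s
print(peak_bw_gbps(10667, 64))    # D9500:  64-bit LPDDR5X-10667 -> ~85.3 GB/s
```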







With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, at least according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



 

Attachments: PantherLake.png, LNL.png, INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg, Clockspeed.png

poke01

Diamond Member
Mar 8, 2022
4,555
5,848
106
AMD/Intel put a lot of transistors into SIMD; Apple uses that budget for integer performance.
Don't Apple also have good FP, especially from M4 onwards?
Not as great as AMD's, but good enough.
 

poke01

Diamond Member
Mar 8, 2022
4,555
5,848
106
SIMD != FP
Yeah, look at the die shot of the Meteor Lake CPU from SemiAnalysis; AVX-512 occupies a lot of space in Redwood Cove.

Maybe there should be some SKUs where ST performance is given preference over SIMD, especially in products like Panther Lake and Lunar Lake.
 

coercitiv

Diamond Member
Jan 24, 2014
7,443
17,731
136
Why would they design Panther Lake to hit 5 GHz?
To lower performance. /joke

Backside power delivery changed the design rules and maybe even Intel's priorities for a mobile product: they may have favored density over peak frequency. Listen to this 3-minute segment on the 18A transition from the perspective of a chip architect. I'll try to summarize via bullet points:
  • power delivery is no longer embedded, that gives us higher density and better vdroop (lower resistance)
  • in traditional designs, power routes were used to shield signal paths, which enabled higher frequencies
  • those power routes are gone, so now they rely on spacing and on making sure "we're not routing things next to each other"
  • "it's not terribly complicated, it's just a change"

My interpretation of the above is that they get a choice between reduced density gains to keep SNR, or reduced clocks to capitalize more on density gains. The price to pay is probably double: there's a penalty on clocks due to noise, and another due to thermals as density increases.

In the end this gets compounded with 18A yields/quality. Based solely on their lineup and the drop in fmax across SKUs, I think yields aren't helping.
 

511

Diamond Member
Jul 12, 2024
5,009
4,522
106
Yeah, look at the die shot of the Meteor Lake CPU from SemiAnalysis; AVX-512 occupies a lot of space in Redwood Cove.

Maybe there should be some SKUs where ST performance is given preference over SIMD, especially in products like Panther Lake and Lunar Lake.
Even in Zen 5, look at the ports occupying a significant amount of area.
 

Meteor Late

Senior member
Dec 15, 2023
347
382
96
Oh they are binned.

OK, let's clarify: they are binned way less aggressively to hit those frequencies. Meaning, you can get 4.6 GHz from an enormous number of samples, unlike Intel's 5.1 GHz chip, which will come in very limited quantities; the most common chips we will see will be the sub-5 GHz ones.
 

OneEng2

Senior member
Sep 19, 2022
946
1,158
106
Yeah, look at the die shot of the Meteor Lake CPU from SemiAnalysis; AVX-512 occupies a lot of space in Redwood Cove.

Maybe there should be some SKUs where ST performance is given preference over SIMD, especially in products like Panther Lake and Lunar Lake.
It's my understanding that Intel intends to eventually add AVX10 to the "mont" processors; however, I believe they will be relegated to a 128-bit wide implementation (versus 512) to save space/power.

This is why I scoff when people claim that "mont" will simply become a P core (complete with SMT) and retain its current size advantage and thermal properties.
 

511

Diamond Member
Jul 12, 2024
5,009
4,522
106
It's my understanding that Intel intends to eventually add AVX10 to the "mont" processors; however, I believe they will be relegated to a 128-bit wide implementation (versus 512) to save space/power.

This is why I scoff when people claim that "mont" will simply become a P core (complete with SMT) and retain its current size advantage and thermal properties.
It can become the P core while being smaller than the current P core; that's entirely possible. It would be a bit larger than the current Atom implementation, but still smaller than the Cove.
 

OneEng2

Senior member
Sep 19, 2022
946
1,158
106
It can become the P core while being smaller than the current P core; that's entirely possible. It would be a bit larger than the current Atom implementation, but still smaller than the Cove.
It is demonstrably true that Intel's P cores have worse PPA than AMD's P cores. This is a decent comparison as well, since both cores accomplish mostly the same thing (although Zen still supports AVX-512).

So one could say that the Intel P core architecture must have some downsides compared to Zen 5.

There really isn't an AMD version of "mont" as "mont" does way fewer things than a Zen 5 core does. This makes it much more difficult to know, with any certainty, what "mont" would look like if it DID support everything.
 

511

Diamond Member
Jul 12, 2024
5,009
4,522
106
It is demonstrably true that Intel's P cores have worse PPA than AMD's P cores. This is a decent comparison as well, since both cores accomplish mostly the same thing (although Zen still supports AVX-512).
So does Intel's P core; it's just fused off, but physically present.
There really isn't an AMD version of "mont" as "mont" does way fewer things than a Zen 5 core does. This makes it much more difficult to know, with any certainty, what "mont" would look like if it DID support everything.
AMD didn't have the resources to create two cores, so they created one and scaled it according to need. Intel did, but now they are running out of resources to keep two separate cores.
 

OneEng2

Senior member
Sep 19, 2022
946
1,158
106
So does Intel's P core; it's just fused off, but physically present.
That would explain some of the bloat. Thanks.
AMD didn't have the resources to create two cores, so they created one and scaled it according to need. Intel did, but now they are running out of resources to keep two separate cores.
I am still questioning the strategy of making two cores with totally different architectures and gluing them together in a single processor to do essentially the same kinds of tasks.

It's different when you make a core designed for specific tasks (like graphics or AI). But two cores that are different doing the same job?

It will be VERY interesting to see how the 288c Clearwater Forest holds up against the Venice D 192c. On paper, a single Zen 5 core is worth about 1.4 "mont" cores. With this metric, the 288c part wins.

Can't wait to see a detailed writeup on these two.
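A back-of-the-envelope version of that comparison, as a sketch; the 1.4x Zen 5 : "mont" per-core ratio is the estimate from the post above, not a measured number:

```python
# Convert both parts into "Zen 5 core equivalents" under the assumed 1.4x per-core ratio.
ZEN5_PER_MONT = 1 / 1.4                 # one "mont" core ~= 0.71 of a Zen 5 core (assumption)

cwf_equiv = 288 * ZEN5_PER_MONT         # 288c Clearwater Forest -> ~205.7 Zen 5 equivalents
venice_d_equiv = 192 * 1.0              # 192c Venice D          -> 192 Zen 5 equivalents

print(f"CWF 288c ~ {cwf_equiv:.0f} Zen 5 equivalents vs Venice D {venice_d_equiv:.0f}")
# On this paper metric, the 288c part comes out roughly 7% ahead.
```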
 

poke01

Diamond Member
Mar 8, 2022
4,555
5,848
106
There is a fundamental issue with Intel's CPU strategy in laptop and server.

Unlike desktop, you cannot power your way through server and mobile, where perf/W is paramount.
 

Josh128

Golden Member
Oct 14, 2022
1,500
2,249
106
It will be VERY interesting to see how the 288c Clearwater Forest holds up against the Venice D 192c. On paper, a single Zen 5 core is worth about 1.4 "mont" cores. With this metric, the 288c part wins.

Can't wait to see a detailed writeup on these two.
I'm more interested to see how the 256c Venice D compares to the 288c CWF, in both performance and power efficiency. 288 is exactly 12.5% more than 256, so core-for-core comparisons will be pretty straightforward between the two. I suspect it will be a bludgeoning in favor of Zen 6D.
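A quick sanity check of that ratio, as a minimal sketch:

```python
# 288 CWF cores vs 256 Venice D cores: the core-count gap is exactly 12.5%.
cwf_cores, venice_d_cores = 288, 256
print(f"{cwf_cores / venice_d_cores - 1:.1%}")  # -> 12.5%
```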
 

alcoholbob

Diamond Member
May 24, 2005
6,390
469
126
It's an interesting concept I suppose.

I guess I believe that in this day and age, there aren't any mysterious CPU architectures that magically work better than everything that came before it.

I believe that Apple, Intel, and AMD all have equivalent engineering teams and tools. The difference is what you target your architecture to do and what things you decide to prioritize and what things you decide to give up.

I believe you can't say I want it all and actually get it all. If you say I want a core that is very power efficient, you can't also say I want a core that clocks higher than the competition.

You can't say I want the core to be very small AND I want 4 way SMT, AVX512, etc, etc.

I do agree that Lion Cove appears to have lower PPA than Zen 5, although I think ARL in general gets a pretty bad rap on the basis of its poor showing in latency sensitive applications (which is mostly a ring bus issue IMO vs a core problem).

I think it is a pretty tall order to take ANY derivative of Skymont and make it compete with Zen 5 across the board. I think you can make it do some things better, but at the expense of doing other things worse.

I just don't see getting something for nothing in engineering.

I assume "ringbus issue" is not synonymous with ring bus frequency? Because you can OC the ring bus to match Raptor Lake's ring frequency and it's still miles slower. In fact, even when frequency normalized and ring-frequency normalized, hell, even if you overclock both of these faster than a comparable RPL processor, it's still slower than a similarly tuned Raptor Lake CPU by a good 20% in gaming if you look at Skatterbencher's or DannyZReviews' numbers. To me it's got to be some kind of cache latency or die-to-die communication problem.
 

511

Diamond Member
Jul 12, 2024
5,009
4,522
106
I'm more interested to see how the 256c Venice D compares to the 288c CWF, in both performance and power efficiency. 288 is exactly 12.5% more than 256, so core-for-core comparisons will be pretty straightforward between the two. I suspect it will be a bludgeoning in favor of Zen 6D.
Pretty sure Venice Dense will win; it's a much newer arch + HT + AVX-512. VD's real competitor is more like DMR.
 

DavidC1

Platinum Member
Dec 29, 2023
2,003
3,148
96
Pretty sure Venice Dense will win; it's a much newer arch + HT + AVX-512. VD's real competitor is more like DMR.
Clearwater Forest was a formidable part in its original timeline. As it now stands, it's at least 6 months delayed. I speculate the delays are also going to result in slightly lower specs than originally planned, meaning it'll be both late and worse. Just like Knights Landing Xeon Phi, which got a 9-month delay plus 10% higher power and 10% lower performance. Now CWF has, at best, a 6-month TTM advantage over Venice, which is bad. Then conveniently sales will tank and they will say "no one wanted E-core servers" when it's really because of execution failures.
I guess I believe that in this day and age, there aren't any mysterious CPU architectures that magically work better than everything that came before it.

I just don't see getting something for nothing in engineering.
Isn't that ignoring the reality that there are always people who are better than you at something? There is *always* someone, some group, that's better, faster, stronger than the rest. They just are. And that gets compounded when you are talking about a group of people, a team. Have you seen MasterChef? How some contestants just fumble? Aren't they given the same ingredients and tools as everyone else? How do some consistently do well while others fumble? Aren't they all under the same "engineering limits"?
I assume "ringbus issue" is not synonymous with ring bus frequency? Because you can OC the ring bus to match Raptor Lake's ring frequency and it's still miles slower. In fact, even when frequency normalized and ring-frequency normalized, hell, even if you overclock both of these faster than a comparable RPL processor, it's still slower than a similarly tuned Raptor Lake CPU by a good 20% in gaming if you look at Skatterbencher's or DannyZReviews' numbers. To me it's got to be some kind of cache latency or die-to-die communication problem.
Things like the ring bus are high-level blocks. There are many details that aren't being told, the result of countless decisions that will never be seen by the public. The chaos of multiple CEO changes, process issues, employee turnover, and brain drain all takes its toll. So even if you run the ring at the same frequency, it's just not the same as before. Maybe the ring is OK, but the caches are not. Maybe the routers are underperforming. The "chaos" we have seen is internal problems being so bad that they ooze out externally, so we get glimpses of how bad it is.

There are also much simpler factors at play. If you are an engineer at Intel, a company the public sees as the weaker player in a fading market (whether that is true or not, it's the perception), and you hear of endless layoffs, how motivated will you be, and how hard will you work? Roughly two-thirds of employees say they would leak NDA-covered material if they were laid off or fired and saw that action as unjust. Now carry that thought, and those actions, up to a company of 100K people.

I think it can simultaneously be true that 18A is OK and yet it has underperformed in a product. It's not enough for 18A to be OK; it's just the baseline for building something. And because even in the most optimistic scenario 18A is still not attractive enough for a third-party company to spend money porting to it, we only have Intel products to judge it by. So regardless of whether it's entirely the fault of 18A, a combination of 18A and the design, or entirely the design, the end result is the same: if it disappoints, it disappoints, and if it doesn't, it won't.
 

DavidC1

Platinum Member
Dec 29, 2023
2,003
3,148
96
I'll ask again, what is the 18A Fmax when power isn't limited? How does that compare to Intel's other nodes?
Dullard, @Meteor Late already responded to you in post #22,721. Let me jog your memory.

He said:
That's completely irrelevant, because those power levels are only reached in multithreaded apps; none of these are going to consume 45W or more in 1T scenarios, which is where those frequencies happen. That is to say, Panther Lake doesn't have a lower fmax just because it has a lower PL2; that's not how it works.
Lunar Lake has a 37W PL2 at most, but it uses ~10W to hit its 5.1 GHz turbo. That's not even close to the low-end TDP (PL1) limit of 15W. Not even desktop processors need 40W to reach peak turbo, whether you are talking about 5.7 GHz Arrow Lake or 6 GHz Raptor Lake.

So we can rule out power as the fmax limiter. The culprit is elsewhere.
SIMD != FP
Well, it's being used for the same purpose.
Yeah, look at the die shot of the Meteor Lake CPU from SemiAnalysis; AVX-512 occupies a lot of space in Redwood Cove.

Maybe there should be some SKUs where ST performance is given preference over SIMD, especially in products like Panther Lake and Lunar Lake.
This is what I mean: AVX-512 should have been "AVX3-256". Even in servers, the 512-bit width gets you only ~15% on average; it's the ISA that brings most of the gains over AVX2. You hurt everything else just to satisfy the HPC crowd. In CPUs you are bound by everything else: you could have more L/S units, more memory bandwidth, more cache bandwidth, which speeds up every other application. I've seen tests for a Xeon where it got ~10% of its theoretical FLOPS. Heck, if you use Intel LINPACK and test for FLOPS, it doesn't even reach 100% of theoretical FLOPS on a test that is designed to measure FLOPS!

Now, I can speculate that the way they justify the extra space for 512-bit over 256-bit is to look at the core size and say: if it delivers 15% perf for 15% area, it makes sense. But AVX-512 only offers that in a select set of applications. What if that 15% area were spent on beefing up general-purpose performance instead?

What is better overall?
A) 256-bit AVX with 1.15x in everything
B) 512-bit AVX, but 1x in everything

Remember, option A means the 1.15x is a boost in AVX code as well. So if an application gained 30% by going from 256-bit to 512-bit, option A's 1.15x uplift means you are comparing 1.3/1.15, roughly a 13% disadvantage in that AVX-512-favoring application, while being 15% faster in everything else.
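A worked version of that trade-off, as a sketch; the 15% general uplift and 30% AVX-512 gain are the assumptions from the post above, not measured figures:

```python
# Option A: 256-bit AVX, but the saved area buys +15% performance everywhere (including AVX code).
# Option B: 512-bit AVX, baseline 1.0x everywhere else.
GENERAL_UPLIFT_A = 1.15   # assumed gain from spending the extra area on general-purpose perf
AVX512_APP_GAIN  = 1.30   # assumed gain of 512-bit over 256-bit in an AVX-512-friendly app

b_over_a_in_avx = AVX512_APP_GAIN / GENERAL_UPLIFT_A
print(f"Option B lead in AVX-512-heavy code: {b_over_a_in_avx:.2f}x")   # ~1.13x, i.e. ~13%
print(f"Option A lead everywhere else:       {GENERAL_UPLIFT_A:.2f}x")  # 1.15x, i.e. 15%
```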
 

Joe NYC

Diamond Member
Jun 26, 2021
3,888
5,410
136
This is what I mean: AVX-512 should have been "AVX3-256". Even in servers, the 512-bit width gets you only ~15% on average; it's the ISA that brings most of the gains over AVX2. You hurt everything else just to satisfy the HPC crowd. In CPUs you are bound by everything else: you could have more L/S units, more memory bandwidth, more cache bandwidth, which speeds up every other application. I've seen tests for a Xeon where it got ~10% of its theoretical FLOPS. Heck, if you use Intel LINPACK and test for FLOPS, it doesn't even reach 100% of theoretical FLOPS on a test that is designed to measure FLOPS!

Now, I can speculate that the way they justify the extra space for 512-bit over 256-bit is to look at the core size and say: if it delivers 15% perf for 15% area, it makes sense. But AVX-512 only offers that in a select set of applications. What if that 15% area were spent on beefing up general-purpose performance instead?

What is better overall?
A) 256-bit AVX with 1.15x in everything
B) 512-bit AVX, but 1x in everything

Remember, option A means the 1.15x is a boost in AVX code as well. So if an application gained 30% by going from 256-bit to 512-bit, option A's 1.15x uplift means you are comparing 1.3/1.15, roughly a 13% disadvantage in that AVX-512-favoring application, while being 15% faster in everything else.

That's water under the bridge, at this point.

What is interesting is what comes next. For example, on the AMD side, after spending the die area on full AVX-512 in Zen 5 and having to squeeze it into the N4 node, what comes next with a two-node jump and a jump in transistor budget?

The vast majority of the transistor budget was spent on extra cores. And it's the same story on the Intel side with NVL.

Maybe the path of throwing more transistors at extracting more ST performance (mentioned above in the thread) is very constrained...
 

DavidC1

Platinum Member
Dec 29, 2023
2,003
3,148
96
That's water under the bridge, at this point.

What is interesting is what comes next. For example, on the AMD side, after spending the die area on full AVX-512 in Zen 5 and having to squeeze it into the N4 node, what comes next with a two-node jump and a jump in transistor budget?
N2 isn't really a two-node jump, more like one at best. N2 gives you almost no density improvement. And this is by 2010+ standards, not the golden age. Compared to the golden age of scaling, we're talking 0.5x.

The video that @coercitiv linked about 18A continues the trend of smaller gains requiring more effort. We went from the golden age of scaling to dung level.
The vast majority of the transistor budget was spent on extra cores. And it's the same story on the Intel side with NVL.

Maybe the path of throwing more transistors at extracting more ST performance (mentioned above in the thread) is very constrained...
It's not that constrained. If you expand everything (the decoders, the ports, the units), you'll get that performance. Even Keller said you can easily have 1K-entry-ROB CPUs. Intel is much closer there, but their monts aren't, and AMD isn't there either. Google said the idea of a decode limit at 2-3 wide is wrong, because there is plenty of code that can sustain 6, 8, or even more. Many have also said decode isn't sized for the average but for the peaks. Also, average ILP has increased compared to the old days, when everything, like branch prediction and caches, was slower and smaller. So you can't take the knowledge from the Pentium Pro days and apply it to today, because even the original Raspberry Pi is better than that. No one expands decode while leaving everything else alone. It's a general-purpose CPU, so you have to beef up everything else anyway. So extra decoders come with more and faster BTBs, better branch algorithms, faster and wider caches, more compute units, more buffers, and more L/S units.

Actually, some expressed doubt there would be big gains after the Athlon 64! The trick is not doing it the Intel P-core way, which causes bloat. You need innovation otherwise, which doesn't fall from the sky but comes as new ideas from engineers. This is what @OneEng2 overlooks. The E core needed new innovations to make that "catch-up": predecode bits in L1, a multi-level predecode cache, clustered decode, nanocode; they are all new ideas. Sandy Bridge was good too, because it brought physical register files and the uop cache. Further efficiency gains will need additional new ideas.

Let's imagine a hypothetical future where CPUs are like everything else, meaning no more Moore's Law: the last process node ever came two years ago. What do you do? No gains from then on? Absolutely not. Designs will continue to be refined. Even internal combustion engines keep getting better, and there's no such thing as Moore's Law there. And some believe they can improve much further with things like rotary-valve engines.
 