Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
846
799
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18-A process. Last time I checked, the PCD tile is fabbed on the TSMC N6 process. They are connected through UCIe, not D2D; a first from Intel. Launch is expected in Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below to compare the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming Mediatek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | Mediatek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18-A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-core + 4 LP E-cores | 4 P-core + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| CPU Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Memory Size | 16 GB | ? | 32 GB | 24 GB ? |
| Memory Bandwidth | | ~ 55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| GPU Max Clock | 1.25 GHz | | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
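
For anyone who wants to sanity-check the bandwidth row, here is a minimal Python sketch. It assumes theoretical peak bandwidth is simply bus width in bytes times transfer rate, with 1 GB = 10^9 bytes. The function name and the dictionary are mine, not from any Intel or Mediatek material, and the WCL memory speed is still a guess (note the "?" in the table).

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 to get GB/s
    return bytes_per_transfer * transfer_rate_mtps / 1000


# (bus width in bits, transfer rate in MT/s), taken from the Memory row of the table
configs = {
    "Alder Lake-N": (64, 4800),
    "Wildcat Lake": (64, 6800),    # speculative entry in the table
    "Lunar Lake": (128, 8533),
    "Dimensity 9500": (64, 10667),
}

bandwidths = {name: peak_bandwidth_gbs(*cfg) for name, cfg in configs.items()}

for name, bw in bandwidths.items():
    print(f"{name}: {bw:.1f} GB/s")
```

With this formula WCL lands right around the ~55 GB/s in the table and LNL around 136 GB/s. The D9500 comes out slightly under the 85.6 GB/s listed, so that entry was probably rounded from a marginally higher transfer rate.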









With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.




511

Diamond Member
Jul 12, 2024
4,520
4,137
106
Intel is saying Skymont is +32% IPC over Gracemont so that is where I based my figure. 32% over Gracemont does not equal Raptor Cove. That would be more like 48%.
32% is the integer IPC figure; for FP it is 72%. As you can see, FP is where they were lacking.
Intel-Raptor-Cove-13th-Gen-Raptor-Lake-AMD-Zen-4-Ryzen-7000-CPU-Core-IPC-Performance-_-DDR5-60...png
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
I have gone over the Intel provided CB R24 data in some detail and here is what I have found. Please let me know where I've lost the trail.
It's fine if you got it wrong. You are asking questions, and it challenges us too. It's all good.
Based on my testing of my 14900K in CB R24 MT, Raptor Cove does about 21.2 points/GHz (no HT), 27.8 points/GHz (with HT) and Gracemont about 12.8 in MT.
This is from Chips and cheese analysis of Cinebench 2024.
When hitting the execution units, Cinebench 2024 uses scalar and 128-bit packed floating point operations. Wider vector execution units are not useful. Scalar integer performance plays an important role in keeping the FP execution units fed.
So both scalar performance and 128-bit floating point performance are important. Scalar performance improves 32% on Skymont, and on top of that they are doubling the number of FP units, meaning you get results that are better.

What is the difference between E and P in R23? Techpowerup shows ~50% between E and P for Alder Lake. So yes, Skymont can improve to a point where it's on par with (or even better than) Raptor Cove.
These results are R23 on your system:
CB ST shows Raptor to have 38% better IPC than Gracemont. Of course Raptor loses its HT capability here.
So R24 must increase FP load for the difference to be greater, where Skymont entirely closes the gap.

Why do I also have a feeling the gap between the E and P is greater than it should be for your system? A guy with a 14700K on YouTube is getting 75 for E and 128 for P. That's 70%, but at default clocks, where the P clock is quite a bit higher.
 

511

Diamond Member
Jul 12, 2024
4,520
4,137
106
GNR is underperforming in Tomshardware review too. Behind Emerald Rapids in most of their tests. Strange they don't mention this.
Well, they won't. I think I know why it's happening, though this is only my guess: frequency is affecting the performance. Zen 5 may have higher clocks at low core load than Granite Rapids, and the workloads are not embarrassingly parallel, so GNR doesn't show its strengths. There may also be a few bugs with software like NAMD.
Screenshot_20240927-221942.png
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
Well, they won't. I think I know why it's happening, though this is only my guess: frequency is affecting the performance. Zen 5 may have higher clocks at low core load than Granite Rapids, and the workloads are not embarrassingly parallel, so GNR doesn't show its strengths. There may also be a few bugs with software like NAMD.
Clock doesn't explain a lot of the losses.

They screwed up or the platform is rushed out. Based on Sierra Forest's scaling problems it may not be fixed until the Clearwater Forest generation.
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
@Hulk Your Gracemont cores are underperforming. Are you sure you tested CB2024 ST?


3.4GHz Turbo clock N100 is getting 60 points. That's 17.6 points per GHz.
The E's perform differently when working with the P's. The P's can be fully isolated by turning off the E's. Then the E's can be "determined" by making the final score match the actual score. When you do that you will end up with about 12.8. Run them with 1 P at 800MHz and you will get around the number you quoted.

But if you use that number along with the number found for the P's with the E's off, you will get an outrageously high total score. As the E's join in with the P's in greater numbers, their IPC decreases in both versions of Cinebench. I've noticed this for years. I'm thinking it might have something to do with the shared L3 cache or something? That's what I mean when I write that benching the E's is hard, because they are slippery and refuse to be nailed down.
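
To make the method concrete, here is a rough Python sketch of the bookkeeping, assuming the MT score is just the sum of cores x clock x points/GHz for each core type. The points/GHz figures (27.8 for P with HT, 12.8 for E in MT, 17.6 for a standalone N100) come from this thread. The core counts and the 5.5 GHz / 4.3 GHz all-core clocks are placeholder assumptions for illustration, not measurements, and the function names are made up.

```python
def mt_score(n_p, p_ghz, p_pts_per_ghz, n_e, e_ghz, e_pts_per_ghz):
    """Additive model: total score = P-core contribution + E-core contribution."""
    return n_p * p_ghz * p_pts_per_ghz + n_e * e_ghz * e_pts_per_ghz


def implied_e_pts_per_ghz(total_score, n_p, p_ghz, p_pts_per_ghz, n_e, e_ghz):
    """Back out the E-core points/GHz that makes the model match a measured total."""
    p_contribution = n_p * p_ghz * p_pts_per_ghz
    return (total_score - p_contribution) / (n_e * e_ghz)


# Hypothetical configuration: 8 P-cores at 5.5 GHz, 16 E-cores at 4.3 GHz (assumed clocks)
N_P, P_GHZ, N_E, E_GHZ = 8, 5.5, 16, 4.3
P_PTS = 27.8            # P-core points/GHz with HT, from the thread
E_PTS_ISOLATED = 17.6   # standalone E-core figure (N100), from the thread
E_PTS_IN_MT = 12.8      # E-core figure backed out of the MT score, from the thread

naive_total = mt_score(N_P, P_GHZ, P_PTS, N_E, E_GHZ, E_PTS_ISOLATED)
matched_total = mt_score(N_P, P_GHZ, P_PTS, N_E, E_GHZ, E_PTS_IN_MT)

print(f"total using isolated E figure: {naive_total:.0f}")
print(f"total using backed-out E figure: {matched_total:.0f}")
print(f"implied E pts/GHz: {implied_e_pts_per_ghz(matched_total, N_P, P_GHZ, P_PTS, N_E, E_GHZ):.1f}")
```

Plugging the isolated E figure into the model inflates the total well past what the backed-out figure gives, which is the "outrageously high total score" effect. Swap in your own clocks and a measured score to get the implied E number for your system.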
 

511

Diamond Member
Jul 12, 2024
4,520
4,137
106
Clock doesn't explain a lot of the losses.

They screwed up or the platform is rushed out. Based on Sierra Forest's scaling problems it may not be fixed until the Clearwater Forest generation.
I think it will be fixed by Q1 25
 

Thunder 57

Diamond Member
Aug 19, 2007
4,026
6,741
136
You’re predicting Zen 5 to be 15-20% more efficient than ARL? Guess we’ll find out on the 24th.

What if ARL is more efficient? Does that mean we can finally move on from everybody pretending their desktop PC is a server rack with strict TCO requirements?

I don't think most people care about power cost, which in many areas is negligible. Rather, I think many, like myself, are more concerned about extra heat being pumped into the room.
 

DrMrLordX

Lifer
Apr 27, 2000
22,901
12,967
136

CPU A gets N performance at 200W. CPU B gets N * 1.01 performance at 300W.

Lower the power target to 125W for both CPUs. Which do you expect to be faster @125W? A or B? The only way the answer is B is if the frequency scaling past 125W is insanely bad.
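
A quick way to put a number on "insanely bad" is the Python sketch below. It assumes both CPUs follow the same simple power law, perf proportional to power^k, anchored at the operating points above. That shared exponent is my simplification, not something either vendor publishes, and real frequency/voltage curves are messier.

```python
import math


def perf_at(power_w, ref_perf, ref_power_w, exponent):
    """Power-law model: perf scales as (power / reference power) ** exponent."""
    return ref_perf * (power_w / ref_power_w) ** exponent


def breakeven_exponent(perf_ratio_b_over_a, power_a_w, power_b_w):
    """Exponent below which B still beats A when both are capped to the same power.

    B wins at a common cap iff perf_ratio > (power_b / power_a) ** k,
    i.e. k < ln(perf_ratio) / ln(power_b / power_a). The cap itself cancels out.
    """
    return math.log(perf_ratio_b_over_a) / math.log(power_b_w / power_a_w)


# CPU A: performance N (normalised to 1.0) at 200 W; CPU B: 1.01 * N at 300 W
k_limit = breakeven_exponent(1.01, 200, 300)
print(f"B only wins at equal power if k < {k_limit:.4f}")

for k in (0.01, 0.3):  # illustrative exponents, not measured values
    a = perf_at(125, 1.0, 200, k)
    b = perf_at(125, 1.01, 300, k)
    print(f"k={k}: A={a:.3f}, B={b:.3f}")
```

Under this model B only comes out ahead if performance is almost completely flat with respect to power, which is the same conclusion as above.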

I don't think most people care about power cost, which in many areas is negligible. Rather, I think many, like myself, are more concerned about extra heat being pumped into the room.
It's not just the extra heat being pumped into the room. It's removing it from the CPU in the first place. Most CPUs at that power level require custom water. Even an AiO would struggle to keep up.
 

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
But if you use that number along with the number found for the P's with the E's off, you will get an outrageously high total score. As the E's join in with the P's in greater numbers, their IPC decreases in both versions of Cinebench. I've noticed this for years. I'm thinking it might have something to do with the shared L3 cache or something? That's what I mean when I write that benching the E's is hard, because they are slippery and refuse to be nailed down.
E-cores have a shared L2, which leaves them bandwidth-starved when many E-cores need bandwidth at once. Skymont doubles that L2 bandwidth vs Gracemont. And you are right that a 4-core cluster has only the same amount of L3 bandwidth as a single P-core, so L3-dependent workloads may become L3 bandwidth-starved.
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
But if you use that number along with the number found for the P's with the E's off, you will get an outrageously high total score. As the E's join in with the P's in greater numbers, their IPC decreases in both versions of Cinebench. I've noticed this for years. I'm thinking it might have something to do with the shared L3 cache or something? That's what I mean when I write that benching the E's is hard, because they are slippery and refuse to be nailed down.
Yeah, there's probably an impact due to that. They said the 32%/72% numbers in ST become 32%/55% in MT.
E-cores have a shared L2, which leaves them bandwidth-starved when many E-cores need bandwidth at once. Skymont doubles that L2 bandwidth vs Gracemont. And you are right that a 4-core cluster has only the same amount of L3 bandwidth as a single P-core, so L3-dependent workloads may become L3 bandwidth-starved.
There's also the ability to do L1-to-L1 transfers, which will improve performance and scaling in MT scenarios.

We don't know for sure how it'll perform in a wide range of workloads until we see Arrow Lake in people's hands.

Numbers for Lunar Lake and Arrow Lake show that lacking a high-performance L3 cache is enough to starve Skymont to a point where it barely outperforms the predecessor. That also means Skymont is high-performing enough that such things become a serious bottleneck. 4.6GHz Raptor Cove class cores would indeed be starved.
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
E-cores have a shared L2, which leaves them bandwidth-starved when many E-cores need bandwidth at once. Skymont doubles that L2 bandwidth vs Gracemont. And you are right that a 4-core cluster has only the same amount of L3 bandwidth as a single P-core, so L3-dependent workloads may become L3 bandwidth-starved.
@Hulk The above theory would be easy to test. Change the ring frequency and see if performance in MT workload changes.

I would also change memory speeds, since it would show if it's memory bandwidth starved. I don't know how the CB2024 behaves. I know earlier versions basically didn't care about memory speeds after a certain point. It behaved very much like SpecInt_1T. Since it had a nice mix of Integer and FP, it was an easy test for what people like to erroneously call "IPC".

A 4-core cluster means the ring also has to transfer data from the memory controller, and that's being shared.

On a different topic, I would have said forget the P cores and go with all Skymont cores for Wildcat Lake. And if it had some of the optimizations that Lunar Lake had, such as the SLC, then it would have been a great low-cost laptop chip and would completely kill WoA.
 

AMDK11

Senior member
Jul 15, 2019
473
407
136
Since the P cores are similar architectures we'll start with the +9% for Lion Cove over Raptor Cove, using the non HT Raptor 21.2 points/GHz and increase it to 23.1.

LionCove is not even roughly the same microarchitecture as RaptorCove despite a modest average 9% IPC increase. LionCove loses a lot in the overall construction of ArrowLake.

The execution engine in LionCove has been thoroughly rebuilt and now closely resembles Zen5 and Skymont.

I hope to see a detailed analysis of LionCove with ArrowLake and an IPC test with HTT disabled in Raptor.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,696
3,260
136
I don't know, Arrow Lake is not that exciting to me, but to be fair I moved from desktop to laptop camp a long time ago.
LNL is exciting although not particularly powerful.
To me Panther Lake-P looks the most exciting next thing, although not that keen on 3 different core clusters, but whatever.
That 12 Xe3 IGP looks particularly appealing, although BW will be a problem unless they put some SLC inside.

@igor_kavinski: LPDDR6 should start at 10.667Gbps and that's only 25% more than what LNL has. And there is still the question of whether PTL will use it. I vote for getting rid of the NPU and putting in SLC instead; it would help BW and save on power.

edit: I take it back. LPDDR6-10667 has 28.5GBps effective BW, while 8500 only has 17GBps, so a 67.5% increase.
That would be enough, but still at least a 16MB SLC wouldn't hurt.;)
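
For reference, the two percentages can be checked with a few lines of Python. The per-pin rates are the 10667 from this post and LNL's LPDDR5X-8533, and the effective bandwidth figures are the 28.5 and 17 quoted in the edit; nothing here is derived from the LPDDR6 spec itself.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of new over old, in percent."""
    return (new / old - 1) * 100


# Per-pin data rate: LPDDR6-10667 vs LNL's LPDDR5X-8533 (MT/s)
pin_rate_gain = percent_increase(10667, 8533)

# Effective bandwidth figures quoted in the post (GB/s)
effective_gain = percent_increase(28.5, 17.0)

print(f"per-pin rate gain: {pin_rate_gain:.1f}%")
print(f"effective bandwidth gain: {effective_gain:.1f}%")
```

The per-pin gain comes out at roughly 25% and the effective gain in the high 60s, in line with the numbers above.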
 