Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
846
799
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of 2 tiles: a compute tile and a PCD tile. The compute tile is a true single die integrating CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than Intel's D2D interconnect, a first for Intel. I'm expecting a launch around Computex in Q2 2026. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I'm throwing in LNL and the upcoming MediaTek D9500 SoC.

|               | Intel Alder Lake-N   | Intel Wildcat Lake        | Intel Lunar Lake          | MediaTek D9500        |
|---------------|----------------------|---------------------------|---------------------------|-----------------------|
| Launch Date   | Q1-2023              | Q2-2026 ?                 | Q3-2024                   | Q3-2025               |
| Model         | Intel N300           | ?                         | Core Ultra 7 268V         | Dimensity 9500 5G     |
| Dies          | 2                    | 2                         | 2                         | 1                     |
| Node          | Intel 7 + ?          | Intel 18A + TSMC N6       | TSMC N3B + N6             | TSMC N3P              |
| CPU           | 8 E-cores            | 2 P-cores + 4 LP E-cores  | 4 P-cores + 4 LP E-cores  | C1 1+3+4              |
| Threads       | 8                    | 6                         | 8                         | 8                     |
| Max CPU Clock | 3.8 GHz              | ?                         | 5 GHz                     |                       |
| L3 Cache      | 6 MB                 | ?                         | 12 MB                     |                       |
| TDP           | 7 W                  | Fanless ?                 | 17 W                      | Fanless               |
| Memory        | 64-bit LPDDR5-4800   | 64-bit LPDDR5-6800 ?      | 128-bit LPDDR5X-8533      | 64-bit LPDDR5X-10667  |
| Memory Size   | 16 GB                | ?                         | 32 GB                     | 24 GB ?               |
| Bandwidth     | ~55 GB/s             |                           | 136 GB/s                  | 85.6 GB/s             |
| GPU           | UHD Graphics         |                           | Arc 140V                  | G1 Ultra              |
| EU / Xe Cores | 32 EU                | 2 Xe                      | 8 Xe                      | 12                    |
| Max GPU Clock | 1.25 GHz             |                           | 2 GHz                     |                       |
| NPU           | NA                   | 18 TOPS                   | 48 TOPS                   | 100 TOPS ?            |






[Attached slides: PPT1.jpg, PPT2.jpg, PPT3.jpg]



With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, which is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first use of GAA transistors, which it calls RibbonFET.



LNL-MX.png
 

Attachments

  • PantherLake.png
  • LNL.png
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
  • Clockspeed.png
Last edited:

naukkis

Golden Member
Jun 5, 2002
1,020
853
136

511

Diamond Member
Jul 12, 2024
4,523
4,144
106
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
Nope, it doesn't happen now. It's only a ~200 MHz drop; before, it was a severe drop.
Screenshot_20241104-235750.png
 

Josh128

Golden Member
Oct 14, 2022
1,319
1,986
106
Yep this is accurate. My 7700X does it in 182 seconds.
The chart is not accurate for Zen 5. I got 139 seconds on the 9900X and checked with some 9950X owners on the Zen 5 thread, who also get similar times to mine or faster. Something is off with that chart, at least for Zen 5, as it's over 10% slower than what we are posting. I'm running a mild CO/+50 MHz boost override, but that's all. Dolphin uses at most a couple of cores/threads. I remember when they released the multithreading-capable version and it significantly improved performance; this benchmark basically just has the emulator run some predetermined renders, so there shouldn't be much difference between any of the Zen 5 SKUs.

There is a 265K Arrow Lake owner at WCCFTECH currently assembling his build; he said he would run the bench when he gets it done. I'm curious to see if his ARL results are off as well.
 

MS_AT

Senior member
Jul 15, 2024
869
1,763
96
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
The problem wasn't strictly that the clocks dropped, but that the throttling was based on a license scheme. In other words, using any AVX-512 "heavy" instruction forced lower clocks for many cycles, even if it was only one such instruction per 1k instructions. Plus, transitions into AVX-512 mode were time-consuming. The consequence was that unless AVX-512 instructions made up most of the program, it was better to avoid them altogether. But these problems were mostly addressed with Ice Lake, and were never present in Zen 4/5 to begin with.
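To make the "one heavy instruction per 1k" point concrete, here is a minimal C sketch (my own illustration, not anything from Intel's documentation): a loop that is almost entirely scalar but issues a single 512-bit FMA every thousand iterations. On Skylake-SP-class parts, that sparse use could still hold the core in the lower AVX-512 frequency license for long stretches, so even the scalar work paid the clock penalty. The iteration counts are arbitrary.

```c
/* Illustrative only: sparse "heavy" AVX-512 use in an otherwise scalar loop.
 * Compile with: gcc -O2 -mavx512f sparse_avx512.c
 * On Skylake-SP this pattern could keep the core in the reduced AVX-512
 * frequency license most of the time, slowing the scalar work too. */
#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m512 acc = _mm512_set1_ps(1.0f);
    __m512 x   = _mm512_set1_ps(1.000001f);
    double scalar = 0.0;

    for (long i = 0; i < 100000000L; i++) {
        scalar += (double)i * 1e-9;           /* "light" scalar work   */
        if (i % 1000 == 0)
            acc = _mm512_fmadd_ps(acc, x, x); /* one heavy 512-bit FMA */
    }

    /* Print results so the compiler can't discard either loop body. */
    printf("%f %f\n", scalar, _mm512_cvtss_f32(acc));
    return 0;
}
```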
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
Clocks only dropped on 14nm Intel parts, because AVX-512 was too much for the 28-core 14nm CPUs. Had they waited a little longer for newer processes, it would have been fine; indeed, that is the case with Ice Lake.

Intel went from 128-bit vectors in Nehalem at 45nm to 256-bit in Sandy Bridge at 32nm, added FMA in 22nm Haswell, and went 512-bit in 14nm Skylake.
 

Jan Olšan

Senior member
Jan 12, 2017
574
1,131
136
Sure :)

9950X PBO DDR5 6400 with AVX512 enabled in BIOS (2703 / 28719): https://browser.geekbench.com/v5/cpu/23027280

9950X PBO DDR5 6400 with AVX512 disabled in BIOS (2458 / 28569): https://browser.geekbench.com/v5/cpu/23027302

So, AVX-512 only shows up in the AES-XTS ST score, but not in MT.
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling, so AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, and the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no or much smaller gains.
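A toy min(compute, bandwidth) model makes the same point with numbers. Everything below is a made-up illustration: the per-copy AES-XTS rates and the 80 GB/s memory figure are placeholders, not measured 9950X values. In ST the compute term is the smaller one, so AVX-512 shows up in the score; with 16 copies running, both ISA variants slam into the same bandwidth ceiling.

```c
/* Toy "roofline" for a streaming kernel like GB5 AES-XTS:
 * achieved throughput = min(compute capability, memory bandwidth).
 * All figures are placeholders, not measurements. */
#include <stdio.h>

static double achieved(double compute_gbps, double mem_bw_gbps)
{
    return compute_gbps < mem_bw_gbps ? compute_gbps : mem_bw_gbps;
}

int main(void)
{
    double mem_bw  = 80.0;                     /* GB/s, assumed DDR5 bandwidth */
    double avx2_1t = 12.0, avx512_1t = 20.0;   /* GB/s per copy, assumed       */
    int    copies  = 16;                       /* GB5 MT = independent copies  */

    printf("1T AVX2    : %5.1f GB/s\n", achieved(avx2_1t,            mem_bw));
    printf("1T AVX-512 : %5.1f GB/s\n", achieved(avx512_1t,          mem_bw));
    printf("MT AVX2    : %5.1f GB/s\n", achieved(avx2_1t   * copies, mem_bw));
    printf("MT AVX-512 : %5.1f GB/s\n", achieved(avx512_1t * copies, mem_bw));
    return 0;
}
```

The two ST lines differ (12 vs 20), while both MT lines are pinned at 80: the same shape as the Geekbench scores above.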
 

poke01

Diamond Member
Mar 8, 2022
4,202
5,551
106
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling, so AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, and the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no or much smaller gains.
Will Strix Halo, or any CPU with gobs of memory bandwidth, change that?
 
  • Like
Reactions: Tlh97 and Joe NYC

naukkis

Golden Member
Jun 5, 2002
1,020
853
136
GB6's "MT" is so far from "all cores" it shouldn't cause bottlenecks to this degree in relation to ST.

This is GB5, and it just launches multiple copies of jobs to run in parallel. It doesn't scale 100% even without AVX-512 optimizations, so reduced scaling with better single-thread optimizations is pretty much inevitable. And that's how unrealistic that kind of benchmark is; with GB6's kind of execution, code optimization should actually make MT scaling better, not worse.
 

511

Diamond Member
Jul 12, 2024
4,523
4,144
106
It's going to improve. Final clocks are 5.4 GHz for ST, and coupled with 14% IPC over RWC we should have around 20% more ST perf.
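Back-of-envelope for the ~20% figure, written out in C just to keep the arithmetic visible; the 5.4 GHz clock and +14% IPC are from the post, while the 5.1 GHz RWC baseline is my own assumption:

```c
/* Compound ST gain = clock ratio x IPC ratio.
 * 5.4 GHz and +14% IPC come from the post above; the 5.1 GHz
 * RWC baseline clock is an assumption. */
#include <stdio.h>

int main(void)
{
    double rwc_clock = 5.1;   /* GHz, assumed RWC ST boost */
    double lnc_clock = 5.4;   /* GHz, claimed LNC ST boost */
    double ipc_gain  = 1.14;  /* +14% IPC claim            */

    double st_gain = (lnc_clock / rwc_clock) * ipc_gain;
    printf("projected ST gain: +%.1f%%\n", (st_gain - 1.0) * 100.0); /* ~ +20.7% */
    return 0;
}
```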
 
  • Like
Reactions: Tlh97 and sgs_x86

sgs_x86

Junior Member
Dec 20, 2020
17
26
91
It's going to improve. Final clocks are 5.4 GHz for ST, and coupled with 14% IPC over RWC we should have around 20% more ST perf.
It will be less than a 14% IPC improvement over RWC. The LNC core in Lunar Lake has the advantage of an on-die memory controller; the LNC core in Arrow Lake does not have an on-die IMC. So the IPC gain over RWC should be between 9% and 14%.
 

511

Diamond Member
Jul 12, 2024
4,523
4,144
106
It will be less than a 14% IPC improvement over RWC. The LNC core in Lunar Lake has the advantage of an on-die memory controller; the LNC core in Arrow Lake does not have an on-die IMC. So the IPC gain over RWC should be between 9% and 14%.
Yes, but RWC regressed vs RPC, so it will definitely be higher than 9% lol
 
  • Like
Reactions: Tlh97 and sgs_x86

OneEng2

Senior member
Sep 19, 2022
840
1,105
106
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling, so AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, and the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no or much smaller gains.
I am wondering now (based on the MT memory performance of Arrow Lake in CB2024) if we are seeing more applications become memory-bandwidth limited rather than CPU limited as CPUs, compilers, and applications become more and more MT-friendly. You have a CPU pumping through more data than the memory subsystem can provide, so adding more cores or wider SIMD just doesn't matter.
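A quick way to see how that plays out as core counts grow (all numbers below are assumptions for illustration, not Arrow Lake measurements): give each core a fixed streaming demand and cap the total at a DDR5-ish figure, and the fraction of each core's demand that actually gets fed collapses once the aggregate passes the ceiling.

```c
/* Aggregate demand vs. DRAM ceiling: once N cores together want more
 * bytes/s than the memory subsystem supplies, extra cores (or wider
 * SIMD per core) stop adding throughput. Figures are placeholders. */
#include <stdio.h>

int main(void)
{
    double per_core = 10.0;  /* GB/s each core wants to stream, assumed */
    double dram_bw  = 90.0;  /* GB/s total, assumed dual-channel DDR5   */

    for (int cores = 1; cores <= 24; cores *= 2) {
        double demand   = per_core * cores;
        double achieved = demand < dram_bw ? demand : dram_bw;
        printf("%2d cores: demand %5.0f GB/s, achieved %5.0f GB/s (%3.0f%% fed)\n",
               cores, demand, achieved, 100.0 * achieved / demand);
    }
    return 0;
}
```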