Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
942
858
106
Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first from Intel. Expecting a launch in Q1 2026.

Spec          | Intel Raptor Lake U | Intel Wildcat Lake 15W? | Intel Lunar Lake | Intel Panther Lake 4+0+4
Launch date   | Q1 2024 | Q2 2026 | Q3 2024 | Q1 2026
Model         | Intel 150U | Intel Core 7 | Core Ultra 7 268V | Core Ultra 7 365
Dies          | 2 | 2 | 2 | 3
Node          | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | Intel 18A + Intel 3 + TSMC N6
CPU cores     | 2 P + 8 E | 2 P + 4 LP-E | 4 P + 4 LP-E | 4 P + 4 LP-E
Threads       | 12 | 6 | 8 | 8
CPU max clock | 5.4 GHz | ? | 5.0 GHz | 4.8 GHz
L3 cache      | 12 MB | ? | 12 MB | 12 MB
TDP           | 15 - 55 W | 15 W ? | 17 - 37 W | 25 - 55 W
Memory        | 128-bit LPDDR5-5200 | 64-bit LPDDR5 | 128-bit LPDDR5X-8533 | 128-bit LPDDR5X-7467
Max capacity  | 96 GB | ? | 32 GB | 128 GB
Bandwidth     | ? | ? | 136 GB/s | ?
GPU           | Intel Graphics | Intel Graphics | Arc 140V | Intel Graphics
Ray tracing   | No | No | Yes | Yes
EU / Xe       | 96 EU | 2 Xe | 8 Xe | 4 Xe
GPU max clock | 1.3 GHz | ? | 2.0 GHz | 2.5 GHz
NPU           | GNA 3.0 | 18 TOPS | 48 TOPS | 49 TOPS






With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first built on EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024 according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.




Attachments

  • PantherLake.png (283.5 KB)
  • LNL.png (881.8 KB)
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg (181.4 KB)
  • Clockspeed.png (611.8 KB)

naukkis

Golden Member
Jun 5, 2002
1,030
854
136

511

Diamond Member
Jul 12, 2024
5,490
4,896
106
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
Nope, it doesn't happen now. It's only a 200 MHz drop; before, it was a severe drop.
Screenshot_20241104-235750.png
 

Josh128

Banned
Oct 14, 2022
1,542
2,295
106
Yep this is accurate. My 7700X does it in 182 seconds.
The chart is not accurate for Zen 5. I got 139 seconds on the 9900X and checked with some 9950X owners in the Zen 5 thread, who also get times similar to mine or faster. Something is off with that chart, at least for Zen 5, as it's over 10% slower than what we are posting. I'm running a mild CO / +50 MHz boost override, but that's all. Dolphin uses at most a couple of cores/threads. I remember when they released the multithreading-capable version and it significantly improved performance, and this benchmark basically just has the emulator run some predetermined renders, so there shouldn't be much difference between any of the Zen 5 SKUs.

There is a 265K Arrow Lake owner at WCCFTECH currently assembling his build; he said he would run the bench when he gets it done. I'm curious to see if their ARL results are off as well.
 

MS_AT

Senior member
Jul 15, 2024
943
1,868
96
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
The problem wasn't strictly that the clocks dropped, but that the throttling was based on a license scheme. In other words, using any AVX-512 "heavy" instruction forced lower clocks for many cycles, even if it was only one such instruction per 1k instructions. Plus, transitions into AVX-512 mode were time-consuming. The consequence was that unless AVX-512 instructions made up most of the program, it was better to avoid them altogether. But these problems were mostly addressed with Ice Lake, and were never present in Zen 4/5 to begin with.
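For illustration, here is a minimal sketch (my own, not from this thread or any Intel document) of what a license-"heavy" AVX-512 kernel looks like. On the Skylake-era parts being discussed, a sustained run of 512-bit FMAs like this pushes the core into the lowest frequency license, and the reduced clock lingers for a while after the AVX-512 section ends, which is why sprinkling a few such instructions into otherwise scalar code could make the whole program slower:

Code:
// Build (assumed): g++ -O2 -mavx512f -std=c++17 avx512_license.cpp
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(64) float data[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    __m512 x   = _mm512_load_ps(data);   // 512-bit vector of floats
    __m512 y   = _mm512_set1_ps(1e-9f);
    __m512 acc = _mm512_setzero_ps();

    // Sustained 512-bit FMAs are "heavy" instructions under the old license
    // scheme: on Skylake-SP/X this loop drops the core to the lowest AVX-512
    // frequency license, and the lowered clock persists after the loop ends.
    for (long i = 0; i < 200000000; ++i)
        acc = _mm512_fmadd_ps(x, y, acc);

    alignas(64) float out[16];
    _mm512_store_ps(out, acc);
    std::printf("lane0 = %f\n", out[0]);  // keep the loop from being optimized away
    return 0;
}

On Ice Lake and later (and on Zen 4/5), the same loop no longer triggers those drastic license-based clock drops.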
 

DavidC1

Platinum Member
Dec 29, 2023
2,187
3,340
106
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
Clocks only dropped on 14nm Intel parts, because AVX-512 was too much for the 28-core 14nm CPUs. Had they waited a little longer for future processes, it would have been fine. Indeed, that is the case with Ice Lake.

Intel went from 128-bit vectors in Nehalem at 45nm to 256-bit in Sandy Bridge at 32nm, added FMA in 22nm Haswell, and went to 512-bit in 14nm Skylake.
 

Jan Olšan

Senior member
Jan 12, 2017
626
1,262
136
Sure :)

9950X PBO DDR5 6400 with AVX512 enabled in BIOS (2703 / 28719): https://browser.geekbench.com/v5/cpu/23027280
View attachment 110944

9950X PBO DDR5 6400 with AVX512 disabled in BIOS (2458 / 28569): https://browser.geekbench.com/v5/cpu/23027302
View attachment 110947

So, the AVX-512 gain only shows up in AES-XTS ST, but not in MT.
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling and AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, so the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no gains or much lower ones.
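A standalone sketch of that ceiling (my own illustration, not from the thread): a multi-threaded streaming reduction over a working set far larger than the caches tops out at DRAM bandwidth, so vectorizing the inner loop harder (AVX2 vs AVX-512) changes the single-thread time but barely moves the all-core throughput:

Code:
// Build (assumed): g++ -O3 -std=c++17 -pthread bw_ceiling.cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = std::size_t(1) << 27;   // 128M doubles = 1 GiB, far beyond any cache
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> a(n, 1.0);
    std::vector<double> partial(nthreads, 0.0);

    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            const std::size_t begin = n / nthreads * t;
            const std::size_t end   = (t + 1 == nthreads) ? n : n / nthreads * (t + 1);
            double s = 0.0;
            // Whether this loop is scalar, AVX2 or AVX-512 barely matters once
            // all threads together saturate DRAM: each core mostly waits on
            // memory, so the aggregate GB/s (and the "score") stays the same.
            for (std::size_t i = begin; i < end; ++i)
                s += a[i];
            partial[t] = s;
        });
    }
    for (auto& th : pool) th.join();
    auto t1 = std::chrono::steady_clock::now();

    double total = 0.0;
    for (double p : partial) total += p;
    const double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("threads=%u  sum=%.0f  ~%.1f GB/s of reads\n",
                nthreads, total, n * sizeof(double) / secs / 1e9);
    return 0;
}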
 

poke01

Diamond Member
Mar 8, 2022
4,884
6,224
106
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling and AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, so the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no gains or much lower ones.
Will Strix Halo change that or any CPU that’s got gobs of memory bandwidth?
 
  • Like
Reactions: Tlh97 and Joe NYC

naukkis

Golden Member
Jun 5, 2002
1,030
854
136
GB6's "MT" is so far from "all cores" it shouldn't cause bottlenecks to this degree in relation to ST.

This is GB5, and it just launches multiple copies of jobs to run in parallel. It doesn't scale 100% even without AVX-512 optimizations, so less scaling with better single-thread optimizations is pretty much inevitable. And that's how unrealistic that kind of benchmark is: with GB6-style execution, code optimization should actually make MT scaling better, not worse.
 

511

Diamond Member
Jul 12, 2024
5,490
4,896
106
It's going to improve. Final clocks are 5.4 GHz for ST, and coupled with 14% IPC over RWC we should have around 20% more ST perf.
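As a rough sanity check on how those two factors combine (the ~5.1 GHz Redwood Cove ST clock below is my assumption, not something stated in this thread):

perf ratio ≈ IPC ratio × clock ratio ≈ 1.14 × (5.4 GHz / 5.1 GHz) ≈ 1.21

which is roughly where "around 20%" lands.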
 
  • Like
Reactions: Tlh97 and sgs_x86

sgs_x86

Junior Member
Dec 20, 2020
17
26
91
It's going to improve. Final clocks are 5.4 GHz for ST, and coupled with 14% IPC over RWC we should have around 20% more ST perf.
It will be less than a 14% IPC improvement over RWC. The LNC core in Lunar Lake has the advantage of an on-die memory controller. The LNC core in Arrow Lake does not have an on-die IMC. So the IPC gain over RWC should be between 9% and 14%.
 

511

Diamond Member
Jul 12, 2024
5,490
4,896
106
It will be less than a 14% IPC improvement over RWC. The LNC core in Lunar Lake has the advantage of an on-die memory controller. The LNC core in Arrow Lake does not have an on-die IMC. So the IPC gain over RWC should be between 9% and 14%.
Yes, but RWC regressed vs RPC, so it will definitely be higher than 9% lol
 
  • Like
Reactions: Tlh97 and sgs_x86

OneEng2

Golden Member
Sep 19, 2022
1,011
1,212
106
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling and AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, so the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no gains or much lower ones.
I am wondering now (based on the MT memory performance of Arrow Lake in CB2024) if we are seeing more applications become memory-bandwidth limited rather than CPU limited as CPUs, compilers, and applications become more and more MT friendly. You have a CPU pumping through more data than the memory subsystem can provide, so adding more cores or better vector (SIMD) execution just doesn't matter.
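To put rough, assumed numbers on that (DDR5-6000, dual channel, 24 threads; none of these figures are from the post):

2 channels × 64-bit × 6000 MT/s = 16 B × 6.0 GT/s = 96 GB/s
96 GB/s ÷ 24 threads ≈ 4 GB/s per thread

A single modern core running vectorized code can easily consume several times that 4 GB/s, so past some thread count the extra cores mostly end up waiting on DRAM.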