Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
942
858
106
Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The tiles are connected through UCIe rather than D2D, a first from Intel. Expecting a launch in Q1 2026.

Spec          | Intel Raptor Lake U | Intel Wildcat Lake 15W? | Intel Lunar Lake | Intel Panther Lake 4+0+4
Launch date   | Q1 2024 | Q2 2026 | Q3 2024 | Q1 2026
Model         | Intel 150U | Intel Core 7 | Core Ultra 7 268V | Core Ultra 7 365
Dies          | 2 | 2 | 2 | 3
Node          | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | Intel 18A + Intel 3 + TSMC N6
CPU cores     | 2 P + 8 E | 2 P + 4 LP-E | 4 P + 4 LP-E | 4 P + 4 LP-E
Threads       | 12 | 6 | 8 | 8
CPU max clock | 5.4 GHz | ? | 5.0 GHz | 4.8 GHz
L3 cache      | 12 MB | ? | 12 MB | 12 MB
TDP           | 15 - 55 W | 15 W ? | 17 - 37 W | 25 - 55 W
Memory        | 128-bit LPDDR5-5200 | 64-bit LPDDR5 | 128-bit LPDDR5X-8533 | 128-bit LPDDR5X-7467
Max capacity  | 96 GB | ? | 32 GB | 128 GB
Bandwidth     | ? | ? | 136 GB/s | ?
GPU           | Intel Graphics | Intel Graphics | Arc 140V | Intel Graphics
Ray tracing   | No | No | Yes | Yes
EU / Xe       | 96 EU | 2 Xe | 8 Xe | 4 Xe
GPU max clock | 1.3 GHz | ? | 2.0 GHz | 2.5 GHz
NPU           | GNA 3.0 | 18 TOPS | 48 TOPS | 49 TOPS






With Hot Chips 34 starting this week, Intel will unveil technical details of the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, Intel's first built on EUV lithography. Intel expects to ship MTL mobile SoCs in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024 according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.




Attachments

  • PantherLake.png (283.5 KB)
  • LNL.png (881.8 KB)
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg (181.4 KB)
  • Clockspeed.png (611.8 KB)

naukkis

Golden Member
Jun 5, 2002
1,030
854
136

511

Diamond Member
Jul 12, 2024
5,490
4,896
106
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
Nope, it doesn't happen now. It's only a 200 MHz drop; before, it was a severe drop.
Screenshot_20241104-235750.png
 

Josh128

Banned
Oct 14, 2022
1,542
2,295
106
Yep this is accurate. My 7700X does it in 182 seconds.
The chart is not accurate for Zen 5. I got 139 seconds on the 9900X and checked with some 9950X owners in the Zen 5 thread, who also get times similar to mine or faster. Something is off with that chart, at least for Zen 5, as it's over 10% slower than what we are posting. I'm running a mild CO / +50 MHz boost override, but that's all. Dolphin uses at most a couple of cores/threads. I remember when they released the multithreading-capable version and it significantly improved performance, and this benchmark basically just has the emulator run some predetermined renders, so there shouldn't be much difference between any of the Zen 5 SKUs.

There is a 265K Arrow Lake owner at WCCFTECH currently assembling his build; he said he would run the bench when he gets it done. I'm curious to see if their ARL results are off as well.
 

MS_AT

Senior member
Jul 15, 2024
943
1,868
96
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
The problem wasn't strictly that the clocks dropped, but that the throttling was based on a license scheme. In other words, using any AVX-512 "heavy" instruction forced lower clocks for many cycles, even if it was only one such instruction per 1k instructions. Plus, transitions into AVX-512 mode were time-consuming. The consequence was that unless AVX-512 instructions made up most of the program, it was better to avoid them altogether. But these problems were mostly addressed with Ice Lake, and were never present in Zen 4/5 to begin with.
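For illustration, here is a minimal sketch (my own, not from this thread or any Intel document) of what a license-"heavy" AVX-512 kernel looks like. On the Skylake-era parts being discussed, a sustained run of 512-bit FMAs like this pushes the core into the lowest frequency license, and the reduced clock lingers for a while after the AVX-512 section ends, which is why sprinkling a few such instructions into otherwise scalar code could make the whole program slower:

Code:
// Build (assumed): g++ -O2 -mavx512f -std=c++17 avx512_license.cpp
#include <immintrin.h>
#include <cstdio>

int main() {
    alignas(64) float data[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    __m512 x   = _mm512_load_ps(data);   // 512-bit vector of floats
    __m512 y   = _mm512_set1_ps(1e-9f);
    __m512 acc = _mm512_setzero_ps();

    // Sustained 512-bit FMAs are "heavy" instructions under the old license
    // scheme: on Skylake-SP/X this loop drops the core to the lowest AVX-512
    // frequency license, and the lowered clock persists after the loop ends.
    for (long i = 0; i < 200000000; ++i)
        acc = _mm512_fmadd_ps(x, y, acc);

    alignas(64) float out[16];
    _mm512_store_ps(out, acc);
    std::printf("lane0 = %f\n", out[0]);  // keep the loop from being optimized away
    return 0;
}

On Ice Lake and later (and on Zen 4/5), the same loop no longer triggers those drastic license-based clock drops.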
 

DavidC1

Platinum Member
Dec 29, 2023
2,187
3,340
106
Probably thanks to Intel having thermal issues with their implementation where all-core AVX-512 may have been causing clocks to drop so low that it wasn't worth keeping it on in MT?
Clocks only dropped on 14nm Intel parts, because AVX-512 was too much for the 28-core 14nm CPUs. Had they waited a little longer for future processes, it would have been fine. Indeed, that is the case with Ice Lake.

Intel went from 128-bit vectors in Nehalem at 45nm to 256-bit in Sandy Bridge at 32nm, added FMA in 22nm Haswell, and went to 512-bit in 14nm Skylake.
 

Jan Olšan

Senior member
Jan 12, 2017
626
1,262
136
Sure :)

9950X PBO DDR5 6400 with AVX512 enabled in BIOS (2703 / 28719): https://browser.geekbench.com/v5/cpu/23027280
View attachment 110944

9950X PBO DDR5 6400 with AVX512 disabled in BIOS (2458 / 28569): https://browser.geekbench.com/v5/cpu/23027302
View attachment 110947

So, the AVX-512 gain only shows up in AES-XTS ST, but not in MT.
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling and AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, so the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no gains or much lower ones.
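A standalone sketch of that ceiling (my own illustration, not from the thread): a multi-threaded streaming reduction over a working set far larger than the caches tops out at DRAM bandwidth, so vectorizing the inner loop harder (AVX2 vs AVX-512) changes the single-thread time but barely moves the all-core throughput:

Code:
// Build (assumed): g++ -O3 -std=c++17 -pthread bw_ceiling.cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = std::size_t(1) << 27;   // 128M doubles = 1 GiB, far beyond any cache
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> a(n, 1.0);
    std::vector<double> partial(nthreads, 0.0);

    auto t0 = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        pool.emplace_back([&, t] {
            const std::size_t begin = n / nthreads * t;
            const std::size_t end   = (t + 1 == nthreads) ? n : n / nthreads * (t + 1);
            double s = 0.0;
            // Whether this loop is scalar, AVX2 or AVX-512 barely matters once
            // all threads together saturate DRAM: each core mostly waits on
            // memory, so the aggregate GB/s (and the "score") stays the same.
            for (std::size_t i = begin; i < end; ++i)
                s += a[i];
            partial[t] = s;
        });
    }
    for (auto& th : pool) th.join();
    auto t1 = std::chrono::steady_clock::now();

    double total = 0.0;
    for (double p : partial) total += p;
    const double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("threads=%u  sum=%.0f  ~%.1f GB/s of reads\n",
                nthreads, total, n * sizeof(double) / secs / 1e9);
    return 0;
}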
 

poke01

Diamond Member
Mar 8, 2022
4,884
6,224
106
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling and AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, so the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no gains or much lower ones.
Will Strix Halo change that or any CPU that’s got gobs of memory bandwidth?
 
  • Like
Reactions: Tlh97 and Joe NYC

naukkis

Golden Member
Jun 5, 2002
1,030
854
136
GB6's "MT" is so far from "all cores" it shouldn't cause bottlenecks to this degree in relation to ST.

This is GB5, and it just launches multiple copies of jobs to run in parallel. It doesn't scale 100% even without AVX-512 optimizations, so less scaling with better single-thread optimizations is pretty much inevitable. And that's how unrealistic that kind of benchmark is: with GB6-style execution, code optimization should actually make MT scaling better, not worse.
 

511

Diamond Member
Jul 12, 2024
5,490
4,896
106
It's going to improve. Final clocks are 5.4 GHz for ST, and coupled with 14% IPC over RWC we should have around 20% more ST perf.
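As a rough sanity check on how those two factors combine (the ~5.1 GHz Redwood Cove ST clock below is my assumption, not something stated in this thread):

perf ratio ≈ IPC ratio × clock ratio ≈ 1.14 × (5.4 GHz / 5.1 GHz) ≈ 1.21

which is roughly where "around 20%" lands.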
 
  • Like
Reactions: Tlh97 and sgs_x86

sgs_x86

Junior Member
Dec 20, 2020
17
26
91
It's going to improve. Final clocks are 5.4 GHz for ST, and coupled with 14% IPC over RWC we should have around 20% more ST perf.
It will be less than a 14% IPC improvement over RWC. The LNC core in Lunar Lake has the advantage of an on-die memory controller. The LNC core in Arrow Lake does not have an on-die IMC. So the IPC gain over RWC should be between 9% and 14%.
 

511

Diamond Member
Jul 12, 2024
5,490
4,896
106
It will be less than a 14% IPC improvement over RWC. The LNC core in Lunar Lake has the advantage of an on-die memory controller. The LNC core in Arrow Lake does not have an on-die IMC. So the IPC gain over RWC should be between 9% and 14%.
Yes, but RWC regressed vs RPC, so it will definitely be higher than 9% lol
 
  • Like
Reactions: Tlh97 and sgs_x86

OneEng2

Golden Member
Sep 19, 2022
1,011
1,212
106
What likely happens is that ultimately the score (processing speed) caps out when memory bandwidth is exhausted. In single-thread you do not reach that limit, so you see the AVX-512 boost in the result.
In the MT test, though, you bump into that ceiling and AVX-512 doesn't help: the processor finishes the particular code faster, but then it waits for memory, so the end result is the same as with AVX2. The faster execution didn't matter; the score is pretty much based only on memory bandwidth.

It's the same reason why you can get an AVX-512 boost in 1T yCruncher while MT yCruncher shows no gains or much lower ones.
I am wondering now (based on the MT memory performance of Arrow Lake in CB2024) if we are seeing more applications become memory-bandwidth limited rather than CPU limited as CPUs, compilers, and applications become more and more MT friendly. You have a CPU pumping through more data than the memory subsystem can provide, so adding more cores or better vector (SIMD) execution just doesn't matter.
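To put rough, assumed numbers on that (DDR5-6000, dual channel, 24 threads; none of these figures are from the post):

2 channels × 64-bit × 6000 MT/s = 16 B × 6.0 GT/s = 96 GB/s
96 GB/s ÷ 24 threads ≈ 4 GB/s per thread

A single modern core running vectorized code can easily consume several times that 4 GB/s, so past some thread count the extra cores mostly end up waiting on DRAM.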