Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
846
799
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The two tiles are connected through UCIe rather than D2D, a first from Intel. Expect a launch around Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created the table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming MediaTek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
|---|---|---|---|---|
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| Max Clock (CPU) | 3.8 GHz | ? | 5 GHz | – |
| L3 Cache | 6 MB | ? | 12 MB | – |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Size | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | – | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | – | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| Max Clock (GPU) | 1.25 GHz | – | 2 GHz | – |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
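If anyone wants to sanity-check the bandwidth row, here is a quick sketch (my own back-of-the-envelope, not from any official material) of how the figures follow from transfer rate × bus width; the WCL number assumes the rumored 64-bit LPDDR5-6800 configuration:

```python
# Bandwidth (GB/s) = transfer rate (MT/s) * bus width (bits) / 8 / 1000.
# Transfer rates below are the table's values; the WCL figure is the rumored LPDDR5-6800.
configs = {
    "ADL-N  (64-bit LPDDR5-4800)":   (4800, 64),
    "WCL    (64-bit LPDDR5-6800?)":  (6800, 64),
    "LNL    (128-bit LPDDR5X-8533)": (8533, 128),
    "D9500  (64-bit LPDDR5X-10667)": (10667, 64),
}

for name, (mt_s, bus_bits) in configs.items():
    gb_s = mt_s * bus_bits / 8 / 1000   # bytes per second, expressed in GB/s
    print(f"{name}: {gb_s:.1f} GB/s")
# -> ~38.4, ~54.4, ~136.5 and ~85.3 GB/s; the last three line up with the
#    ~55 / 136 / 85.6 figures in the table above.
```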









As Hot Chips 34 starts this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent the new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first from Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, a first from Intel to use GAA transistors, which it calls RibbonFET.



 

Attachments

  • PantherLake.png
  • LNL.png
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
  • Clockspeed.png
Last edited:
Jul 27, 2020
28,023
19,126
146
Just keep in mind all of that Royal Core IP up to disbandment is still there for usage in future projects.
Except the key personnel from that team have left and the new team may not understand that IP well enough to be able to integrate any unique features into a new design.
 

jdubs03

Golden Member
Oct 1, 2013
1,280
902
136
Except the key personnel from that team have left and the new team may not understand that IP well enough to be able to integrate any unique features into a new design.
I would imagine there is plenty of documentation. But it could definitely take longer to implement, with this team having to get up to speed with those features.
 

cannedlake240

Senior member
Jul 4, 2024
247
138
76
Something interesting just popped up on X/Twitter:


@InstLatX64 mentions the following:

#CougarCove (Panther Lake)
#PantherCove (?)
#PantherCoveX (Diamond Rapids)
#RazerLake

But what is the second entry, Panther Cove? It appears Panther Cove is still in development and is part of Nova Lake (instead of Coyote Cove). Even if Coyote Cove is real, it may just be a low-end/minor refresh of Cougar Cove on the next iteration of 18A (like ARL-U).

The post says PNC has big IPC improvements, inclusion of APX/AVX10, etc. And the timeline fits perfectly with Nova Lake.
Rumor mill has it that Intel doesn't want the public to know that they didn't get to pair Panther Lake with Panther Cove, so for Nova they'll market it as Coyote. And the 'big IPC increase' is probably relative to the minor refresh/tick that is Cougar Cove.
 
  • Like
Reactions: SiliconFly

cannedlake240

Senior member
Jul 4, 2024
247
138
76
Once you've lost your mind and think a 24-wide design isn't the stupidest thing ever, you'd better figure out a way to keep it doing something halfway useful when the magic compiler you're depending on fails to materialize (like they always do)
It wasn't a VLIW design lol. It was the complete opposite of that, and was supposed to be the pinnacle of OoO design based on x86S
 

Hulk

Diamond Member
Oct 9, 1999
5,138
3,727
136
Some light Monday night reading for my fellow CPU enthusiasts. I'm no expert and most everybody around here is more knowledgeable/smarter than I am, but here goes...

Here is a quick investigation of Geekbench 6 using my 14900K.
First, my 14900K running 5.5/4.4 without HT scores 3050/18000.
Isolating an E core for the ST test shows it scoring 1607. Just a little better than half the performance of Raptor Cove.

The throughput, or IPC, specific to Geekbench 6 for ST is as follows:
Gracemont gets 65% of the IPC of Raptor Cove, or you could look at it as Raptor Cove having 53% better IPC than Gracemont.

Raptor Cove at 2.8 GHz scores about the same in GB 6 as Gracemont at 4.4 GHz (as expected based on the above data).
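For anyone who wants to check the math, here's a quick sketch of the per-clock normalization using the scores and clocks above; it lands within rounding of the 65%/53% figures:

```python
# GB6 ST score divided by the clock it ran at, as a rough per-clock (IPC-ish) comparison.
p_score, p_clock = 3050, 5.5   # Raptor Cove P core at 5.5 GHz
e_score, e_clock = 1607, 4.4   # Gracemont E core at 4.4 GHz

p_per_ghz = p_score / p_clock  # ~554.5 points per GHz
e_per_ghz = e_score / e_clock  # ~365.2 points per GHz

print(f"Gracemont / Raptor Cove per clock: {e_per_ghz / p_per_ghz:.2f}")  # ~0.66
print(f"Raptor Cove advantage: {p_per_ghz / e_per_ghz - 1:.0%}")          # ~52%
```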

GB 6 MT does not scale linearly with core count. It's more like a cubic relationship as follows. Graphs for P cores alone and E cores alone (no HT).

[Attached graphs: GB6 MT score vs. core count, P cores alone and E cores alone]

Some generalizations:
Core scaling is pretty good, generally over 80% for 2 or 3 cores. 100% meaning linear scaling of performance for adding cores.

By the time you get to 6-8 cores you are only getting about half of the performance you would expect with linear scaling.

The scaling drops faster with stronger cores. The E's drop to 58% of full linear scaling at 8 cores, while the P's drop to 58% at 6 cores. What this means is that when you go from 7 to 8 cores, you would hope for, with linear/perfect scaling, an 8/7 increase in performance, but you get 58% of that.
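And here's roughly how the scaling percentages can be computed from raw scores; the numbers in the example are made-up placeholders just to show the calculation, not my measured data:

```python
# Scaling efficiency = measured MT score / (core count * single-core score).
def scaling_efficiency(scores_by_cores):
    base = scores_by_cores[1]
    return {n: score / (n * base) for n, score in scores_by_cores.items()}

# Hypothetical example scores, NOT measured data (real data is in the attached graphs):
example = {1: 3050, 2: 5600, 4: 9500, 6: 11800, 8: 13500}
for n, eff in scaling_efficiency(example).items():
    print(f"{n} cores: {eff:.0%} of linear scaling")
```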

Data supporting above conclusions.

Strangely, after 13 or so cores GB 6 hits an asymptote and scaling starts to get better. I could investigate further by subbing in P cores at 2.8 GHz to extend the graph, but I got tired of the computer restarts.

Hyperthreading does very little once you get past 14 cores or so, simply because the HT threads are weak and you are only getting 45% of expected performance.

My opinions
I like GB 6 more now that I know how it is behaving. It's a pretty good simulation of a lot of software today in that I don't generally "feel" much performance difference when going above 6-8 cores unless I'm doing some serious multitasking and/or rendering/encoding on well-threaded applications.

I think it's a pretty good indicator of performance for thin and light laptops where you are looking for snappy performance but not so interested in rendering and or encoding for hours on end.
 

Henry swagger

Senior member
Feb 9, 2022
511
313
106
Something interesting just popped up on X/Twitter:


@InstLatX64 mentions the following:

#CougarCove (Panther Lake)
#PantherCove (?)
#PantherCoveX (Diamond Rapids)
#RazerLake

But what is the second entry, Panther Cove? It appears Panther Cove is still in development and is part of Nova Lake (instead of Coyote Cove). Even if Coyote Cove is real, it may just be a low-end/minor refresh of Cougar Cove on the next iteration of 18A (like ARL-U).

The post says PNC has big IPC improvements, inclusion of APX/AVX10, etc. And the timeline fits perfectly with Nova Lake.
Panther Cove will be the new ground-up P-core architecture with Stephen's influence
 
  • Like
Reactions: SiliconFly

lopri

Elite Member
Jul 27, 2002
13,314
690
126
Some light Monday night reading for my fellow CPU enthusiasts. I'm no expert and most everybody around here is more knowledgeable/smarter than I am, but here goes...

Here is a quick investigation of Geekbench 6 using my 14900K.
First, my 14900K running 5.5/4.4 without HT scores 3050/18000.
Isolating an E core for the ST test shows it scoring 1607. Just a little better than half the performance of Raptor Cove.

The throughput, or IPC, specific to Geekbench 6 for ST is as follows:
Gracemont gets 65% of the IPC of Raptor Cove, or you could look at it as Raptor Cove having 53% better IPC than Gracemont.

Raptor Cove at 2.8 GHz scores about the same in GB 6 as Gracemont at 4.4 GHz (as expected based on the above data).

GB 6 MT does not scale linearly with core count. It's more like a cubic relationship as follows. Graphs for P cores alone and E cores alone (no HT).

View attachment 108485

View attachment 108486

Some generalizations:
Core scaling is pretty good, generally over 80% for 2 or 3 cores. 100% meaning linear scaling of performance for adding cores.

By the time you get to 6-8 cores you are only getting about half of the performance you would expect with linear scaling.

The scaling drops faster with stronger cores. The E's drop to 58% of full linear scaling at 8 cores, while the P's drop to 58% at 6 cores. What this means is that when you go from 7 to 8 cores, you would hope for, with linear/perfect scaling, an 8/7 increase in performance, but you get 58% of that.

Data supporting above conclusions.
View attachment 108487

View attachment 108488

Strangely, after 13 or so cores GB 6 hits an asymptote and scaling starts to get better. I could investigate further by subbing in P cores at 2.8 GHz to extend the graph, but I got tired of the computer restarts.

Hyperthreading does very little once you get past 14 cores or so, simply because the HT threads are weak and you are only getting 45% of expected performance.

My opinions
I like GB 6 more now that I know how it is behaving. It's a pretty good simulation of a lot of software today in that I don't generally "feel" much performance difference when going above 6-8 cores unless I'm doing some serious multitasking and/or rendering/encoding on well-threaded applications.

I think it's a pretty good indicator of performance for thin and light laptops where you are looking for snappy performance but not so interested in rendering and or encoding for hours on end.
That is fantastic work you've done there. My understanding is that the Geekbench 6 multicore test tries to solve a single problem using multiple cores (so to speak) instead of running an "embarrassingly parallel" workload. I think I like this way of testing, too. Not that it is a be-all-end-all benchmark, but there are other benchmarks that can showcase different multicore workloads (e.g. Cinebench). It is good to have multiple benchmarks that can be used for different purposes.
 

DrMrLordX

Lifer
Apr 27, 2000
22,902
12,971
136
This is obvious: users should target the applications they want to run. No other benchmark will give them the exact answer they need. And this applies to all benchmarks beyond Geekbench. Cinebench is utterly non-representative in the way it scales, except for users who do rendering. Running games at very low resolutions is also completely stupid. Does this mean they are useless?

Cinebench in particular was part of a suite of fp benchmarks that used to be a good indicator of how a CPU might perform in other fp tasks. The fact that it was embarrassingly parallel ensured that it would use all of a CPU's available resources, at least to the extent that the particular codebase could manage. Sadly, it has long since failed in this capacity, since it rarely taxes all of a CPU's fp resources by not utilizing the most advanced ISA extensions available on every CPU. Other rendering benchmarks still fulfill this task relatively well.

Of course, one of the reasons why people became obsessed with fp performance dates back to the Pentium II vs K6 days, where the K6 utterly failed as a gaming CPU thanks to its non-pipelined fp unit that performed rather poorly except in games that supported 3DNow! (on K6 generations that actually had 3DNow!). And even with that ISA extension, performance for the K6-2 wasn't that great compared to its Pentium competitors. It was taken as gospel that CPUs needed fp performance because, hey, look at how bad the K6/K6-2 were, can't let that happen again. The idea that fp performance is universal between applications and universally useful for home users who want to do at least some gaming is perhaps dated.

We've now reached the point that, yes, the most useful benchmarks are on an application-by-application basis. And that brings us back to Geekbench which tries to bundle a bunch of common applications/algorithms, doesn't really let us see the underlying codebase, and then produces an aggregate score (and a lot of subscores) and asks us to take these scores seriously.

That being said, what CPU benchmark do you know launches multiple completely unrelated benchmarks to mimic what we users do (running a game with browser running, etc.)?

Some old Anandtech benchmarks attempted to replicate these scenarios. Light/medium/heavy multitasking. They did it for storage reviews too. Not many reviewers have taken up the mantle here.

My point was you blaming Geekbench for not representing how *you* use your computer applies to all benchmarks.

I blame Geekbench for being superfluous. It may not represent how anyone uses their computer. If you really want to run a benchmark that captures what the statistically average PC does all day, it's probably going to be an Electron benchmark aimed at corpo laptops. And none of us would use that crap on our home PCs.

OMG! Royal Core V2 had SMT4! So Jim Keller saw potential in going with SMT4!

Wait are you saying that Jim Keller was Richie Rich all along?
 

Doug S

Diamond Member
Feb 8, 2020
3,575
6,312
136
It wasn't a VLIW design lol. It was the complete opposite of that, and was supposed to be the pinnacle of OoO design based on x86S

It would still require a magic compiler, because there's not nearly enough parallelism in typical code to support anything like 24-wide operation on a single thread. You'd have to handle 4 or 5 branches per cycle, which pretty much implies high-accuracy static prediction at compile time.
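A rough back-of-the-envelope for where the 4-5 branches per cycle comes from, assuming the usual rule of thumb of roughly one branch every 5-6 instructions in typical integer code (an assumption, not a measured figure):

```python
# Sustained width divided by instructions-per-branch gives branches retired per cycle.
width = 24
for insns_per_branch in (5, 6):
    branches = width / insns_per_branch
    print(f"~{branches:.1f} branches/cycle at 1 branch per {insns_per_branch} instructions")
# -> ~4.8 and ~4.0 branches per cycle, i.e. the 4-5 branches/cycle mentioned above.
```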
 

DavidC1

Golden Member
Dec 29, 2023
1,833
2,960
96
The scaling drops faster with stronger cores. The E's drop to 58% of full linear scaling at 8 cores while the P's get (drop) to 58% performance at 6 cores. What this means is you go from 7 to 8 cores and you would hope for, with linear/perfect scaling, 8/7 increase in performance but you get 58% of that.
Good work. Interesting to see.

The better scaling on E cores may have to do with the fact that the P cores are faster and are thus more memory-subsystem bound. Assuming a 50% per-clock advantage, it seems to line up with that in the graph too (rough check after the two comparisons below).

2 P cores = 3 E core scaling
8 P cores = 12 E core scaling
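
A quick restatement of that check, taking the 50% per-clock advantage purely as an assumption:

```python
# With an assumed ~50% per-clock advantage, n P cores do about as much work per clock
# as 1.5 * n E cores, so their scaling curves would be expected to line up at those points.
P_PER_CLOCK_ADVANTAGE = 1.5  # assumption stated above, not a measured figure

for p_cores in (2, 8):
    equivalent_e = p_cores * P_PER_CLOCK_ADVANTAGE
    print(f"{p_cores} P cores ~ {equivalent_e:.0f} E cores (per clock)")
# -> 2 P ~ 3 E and 8 P ~ 12 E, the pairs listed above.
```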
 

Magio

Member
May 13, 2024
170
201
76

I don't know if this was posted but if accurate, Skymont is legitimately the most exciting x86 core since at least Zen 1. Scales to lower power than any x86 core before it while delivering higher sub-1W performance than any x86 core ever has by a long shot.

Will also be interesting to see what it delivers in its normal form (not LP-E). Intel has a gem on their hands if they can continue to improve on this core design.