Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads


Tigerick

Senior member
Apr 1, 2022
941
857
106
Wildcat Lake (WCL) Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing Raptor Lake-U. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the Intel 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The two tiles are connected through UCIe rather than D2D, a first for Intel. Expect a launch in Q1 2026.

                 | Intel Raptor Lake-U    | Intel Wildcat Lake (15 W?) | Intel Lunar Lake          | Intel Panther Lake 4+4+4
Launch Date      | Q1-2024                | Q2-2026                    | Q3-2024                   | Q1-2026
Model            | Intel 150U             | Intel Core 7               | Core Ultra 7 268V         | Core Ultra 7 365
Dies             | 2                      | 2                          | 2                         | 3
Node             | Intel 7 + ?            | Intel 18A + TSMC N6        | TSMC N3B + N6             | Intel 18A + Intel 3 + TSMC N6
CPU              | 2 P-cores + 8 E-cores  | 2 P-cores + 4 LP E-cores   | 4 P-cores + 4 LP E-cores  | 4 P-cores + 4 LP E-cores
Threads          | 12                     | 6                          | 8                         | 8
CPU Max Clock    | 5.4 GHz                | ?                          | 5 GHz                     | 4.8 GHz
L3 Cache         | 12 MB                  | ?                          | 12 MB                     | 12 MB
TDP              | 15 - 55 W              | 15 W ?                     | 17 - 37 W                 | 25 - 55 W
Memory           | 128-bit LPDDR5-5200    | 64-bit LPDDR5              | 128-bit LPDDR5X-8533      | 128-bit LPDDR5X-7467
Max Memory       | 96 GB                  | ?                          | 32 GB                     | 128 GB
Bandwidth        | ?                      | ?                          | 136 GB/s                  | ?
GPU              | Intel Graphics         | Intel Graphics             | Arc 140V                  | Intel Graphics
Ray Tracing      | No                     | No                         | Yes                       | Yes
EU / Xe          | 96 EU                  | 2 Xe                       | 8 Xe                      | 4 Xe
GPU Max Clock    | 1.3 GHz                | ?                          | 2 GHz                     | 2.5 GHz
NPU              | GNA 3.0                | 18 TOPS                    | 48 TOPS                   | 49 TOPS
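For reference, the bandwidth row is just bus width times data rate; a quick back-of-the-envelope sketch using the bus widths and data rates from the table above (only the Lunar Lake figure appears in the table itself, the others are derived here):

```python
# Peak theoretical bandwidth = bus width (bytes) x data rate (MT/s).
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mts: int) -> float:
    """Return peak memory bandwidth in GB/s (1 GB = 10^9 bytes)."""
    return (bus_width_bits / 8) * data_rate_mts * 1e6 / 1e9

# Bus widths and data rates taken from the spec table above.
print(peak_bandwidth_gbs(128, 5200))  # Raptor Lake-U, LPDDR5-5200   -> ~83.2 GB/s
print(peak_bandwidth_gbs(128, 8533))  # Lunar Lake,    LPDDR5X-8533  -> ~136.5 GB/s
print(peak_bandwidth_gbs(128, 7467))  # Panther Lake,  LPDDR5X-7467  -> ~119.5 GB/s
```

The 136 GB/s listed for Lunar Lake matches the calculation; WCL's 64-bit bus means roughly half the bandwidth of the 128-bit designs at any given data rate.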









With Hot Chips 34 starting this week, Intel will unveil technical information about the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the next-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, Intel's first to use EUV lithography. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



[Image: LNL-MX.png]
 

Attachments: PantherLake.png · LNL.png · INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg · Clockspeed.png

511

Diamond Member
Jul 12, 2024
5,375
4,782
106
Well, if you have a leading-edge fab business like Intel and have outsourced 30% of your wafers, you will bleed money. AMD gave up on their fab business, and that was a good decision for them.
Not to mention, if your older nodes are very expensive and they make up the majority of your volume, the problems just line up.
 

MS_AT

Senior member
Jul 15, 2024
929
1,848
96
I have been thinking that code compilation might be one of those tasks as you have pointed out. I have some non-trivial C++ projects as well, but honestly, even those compile pretty quickly .... even when I force a "Build All".
By "Build All", do you mean a clean rebuild or just building everything? Most sane build systems lean heavily on incremental builds, ensuring you don't rebuild anything that hasn't changed, so in the best case only the .cpp file that was modified gets recompiled ;) It then has to be linked again, but depending on the size of the project and the linker you use, that may give a different meaning to "pretty quickly" ;)

As I said, it's very project specific, so it's hard to compare unless we go into much more detail or refer to some open-source project that can be used as a benchmark.

In general, compilation scales pretty well with the number of cores until it doesn't ;) and I don't want to take the thread off topic by discussing everything that can slow you down, what can be done to optimize build time, why some of those tricks cannot be universally applied, etc.
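To put a rough number on "until it doesn't": a minimal Amdahl-style sketch, assuming a made-up project where 90% of the wall time is perfectly parallel compilation and 10% is a serial link step (the ratio is purely illustrative, not measured):

```python
# Amdahl-style estimate: compilation is parallel, linking is serial.
def build_speedup(parallel_fraction: float, cores: int) -> float:
    """Expected speedup over 1 core if only parallel_fraction of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

# Hypothetical project: 90% of wall time compiling, 10% linking.
for cores in (4, 8, 16, 32, 64):
    print(cores, round(build_speedup(0.90, cores), 2))
# 4 -> 3.08, 8 -> 4.71, 16 -> 6.40, 32 -> 7.80, 64 -> 8.77
```

In this toy model, going from 16 to 64 cores buys only about a 1.37x gain, even before I/O or memory bandwidth enter the picture.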
 

OneEng2

Senior member
Sep 19, 2022
978
1,187
106
The reason developers should get the great machines is that they run in debug mode; optimization takes a back seat. If it runs OK on an above-average machine, it should run fine on a lesser machine in a release build. Running a profiler is not super taxing, but if you're doing it over and over, those minutes or even few seconds add up.

I probably wouldn't notice a lesser machine for everyday compilation. Even incremental builds on large projects wouldn't be horrible. I'd rather spend the extra few bucks on a good processor so that when I evaluate a large project or do updates, it takes less time.

The simple truth is: you can dumb down a faster machine (limit threads and such), but you can't smart up a slower one. A developer machine should at least meet the best metrics a program is being designed for, and probably exceed them for good measure.
While there is some wisdom to making developers use an average machine (to avoid the "it works fine on my machine" syndrome ;) ), the loss of productivity will always drive managers to purchase the best machines (laughably, today that means a laptop) for developers to compile on.
By "Build All", do you mean a clean rebuild or just building everything? Most sane build systems lean heavily on incremental builds, ensuring you don't rebuild anything that hasn't changed, so in the best case only the .cpp file that was modified gets recompiled ;) It then has to be linked again, but depending on the size of the project and the linker you use, that may give a different meaning to "pretty quickly" ;)

As I said, it's very project specific, so it's hard to compare unless we go into much more detail or refer to some open-source project that can be used as a benchmark.

In general, compilation scales pretty well with the number of cores until it doesn't ;) and I don't want to take the thread off topic by discussing everything that can slow you down, what can be done to optimize build time, why some of those tricks cannot be universally applied, etc.
Yea, I meant everything, regardless of whether it has been touched or not. Generally I only do this on the build machine as a final release step, and it's done with the command-line interface and a build script rather than the IDE.

I don't think it's off topic, though. Even this use case runs into scalability issues past a certain number of cores. My thought is that I/O in and out of the disk system becomes the bottleneck. Again, a workstation would likely be a better option than a high-core-count desktop.

The question I am wondering about is: are there ENOUGH use cases where a 52-core desktop would be worth the silicon to the OEM and the price tag to the user, and where the use case would not push the user to a workstation instead?
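One way to answer that for a specific project, rather than in the abstract, is to time clean rebuilds at increasing -j values and see where the curve flattens. A rough sketch, assuming a Make-based project at a placeholder path:

```python
import subprocess
import time

# Hypothetical project path and Make-based build; adjust for your own setup.
PROJECT_DIR = "/path/to/project"

for jobs in (4, 8, 16, 32, 64):
    subprocess.run(["make", "clean"], cwd=PROJECT_DIR, check=True,
                   stdout=subprocess.DEVNULL)
    start = time.perf_counter()
    subprocess.run(["make", f"-j{jobs}"], cwd=PROJECT_DIR, check=True,
                   stdout=subprocess.DEVNULL)
    print(f"-j{jobs}: {time.perf_counter() - start:.1f} s")
```

If the times stop improving while overall CPU utilization sits well below 100%, the bottleneck is more likely disk I/O or the serial link step than core count.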
 

511

Diamond Member
Jul 12, 2024
5,375
4,782
106
Except for prosumer/development use, no one should need more than 8 HT cores, or a 6+8 config in Intel's case.
 

reb0rn

Senior member
Dec 31, 2009
320
120
116
Yea, I am a real person.

What do you need more than 16 cores for? I am asking a real question.
Any task that can use 16 cores, or multiple programs running at the same time, will use 64 and finish way faster; if code can scale to 16 threads, it can scale to 64 or more.
 

reb0rn

Senior member
Dec 31, 2009
320
120
116
Everything other than games that I have run that uses 16 threads can scale to more, be it compiling, some calculations, crypto, encryption, encoding... there may be some that need more optimization, but so far in my limited use I have seen none.
 

Schmide

Diamond Member
Mar 7, 2002
5,788
1,092
126
While there is some wisdom to making developers use an average machine (to avoid the "it works fine on my machine" syndrome ;) ), the loss of productivity will always drive managers to purchase the best machines (laughably, today that means a laptop) for developers to compile on.

Another thing that justifies above-average machines in developers' hands:

Emulation, or rather platform replication. There are projects where you will have to run your own server, database, or other assets, often in an unoptimized state. You will design it, test it, break it, and reload it over and over, all on one machine.

Though now that I think of it, developers just need two machines. (Recurse until all the machines are mine.)
 

Hitman928

Diamond Member
Apr 15, 2012
6,753
12,492
136
Everything other than games that I have run that uses 16 threads can scale to more, be it compiling, some calculations, crypto, encryption, encoding... there may be some that need more optimization, but so far in my limited use I have seen none.

Many compilations won’t scale that high. Video encoding won’t. I won’t say all because I haven’t tested them all, but many encryption algorithms won’t scale like that either.
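To give one concrete mechanism: chained cipher modes. In CBC-style encryption, each block's input depends on the previous block's output, so encryption is inherently serial no matter how many cores you throw at it, whereas counter (CTR) style modes have no such dependency. A toy sketch of the dependency structure only (stand-in arithmetic, not real crypto):

```python
# Toy illustration of why chained modes serialize and counter modes don't.
def fake_cipher(block: int, key: int) -> int:
    return (block * 2654435761 + key) & 0xFFFFFFFF  # stand-in for a real cipher

def encrypt_cbc(blocks, key, iv=0):
    out, prev = [], iv
    for b in blocks:                       # each step needs the previous output,
        prev = fake_cipher(b ^ prev, key)  # so the loop cannot be split across cores
        out.append(prev)
    return out

def encrypt_ctr(blocks, key):
    # each block depends only on its own index -> trivially parallel
    return [b ^ fake_cipher(i, key) for i, b in enumerate(blocks)]
```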
 

OneEng2

Senior member
Sep 19, 2022
978
1,187
106
Many compilations won’t scale that high. Video encoding won’t. I won’t say all because I haven’t tested them all, but many encryption algorithms won’t scale like that either.
Yea, I don't know the actual number of real-world applications that do scale beyond 16C/32T, but my gut feeling is that most of those that DO are likely good candidates for a workstation rather than a high-core-count desktop.
 

reb0rn

Senior member
Dec 31, 2009
320
120
116
Why would I pay workstation prices if I can get the same, or almost the same, for 40% of the cost?

@Hitman928 Maybe some don't scale, but my use case is not limited to one app; my 9 PCs are heavily loaded, and if the price is right I would rather have 5 PCs with 64 cores each at a decent price.
What's more, most multithreaded apps just need a minor tweak to scale; others could be limited by RAM or NVMe, which is not the same thing.
 

OneEng2

Senior member
Sep 19, 2022
978
1,187
106
Why would I pay workstation prices if I can get the same, or almost the same, for 40% of the cost?
I am speculating that you can't get "almost the same" in most apps without the extra bandwidth that a workstation's multiple memory channels give you.

Additionally, I am speculating that for the kinds of applications that DO scale, many will involve the kind of work where the people doing it will be very happy to pay for a real workstation for the added productivity.

We will see next year. If Intel launches a 52-core part in H1 2026, we will see if it sells... at what price... and how well practical applications scale.
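To make the bandwidth angle concrete, a rough per-core comparison, assuming dual-channel DDR5-6400 on the desktop and 8-channel DDR5-6400 on the workstation (illustrative configurations, not any specific SKU):

```python
def ddr5_bandwidth_gbs(channels: int, data_rate_mts: int) -> float:
    """Peak bandwidth in GB/s; each DDR5 channel is 64 bits (8 bytes) wide."""
    return channels * 8 * data_rate_mts * 1e6 / 1e9

desktop = ddr5_bandwidth_gbs(2, 6400)      # ~102 GB/s, dual channel
workstation = ddr5_bandwidth_gbs(8, 6400)  # ~410 GB/s, 8 channels
print(desktop / 52, workstation / 64)      # GB/s per core: ~2.0 vs ~6.4
```

That works out to roughly 2 GB/s per core for a 52-core desktop versus about 6.4 GB/s per core for a 64-core workstation, which is the gap I'd expect bandwidth-hungry workloads to feel.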
 

reb0rn

Senior member
Dec 31, 2009
320
120
116
It will sell, as Intel needs to make it work, so the price will be very competitive. The other unknown is how good the new process will be; I am mostly interested in perf/watt in multithreaded use, plus AVX10.

We know we have lost any hope that a new Intel node will just work without a lot of issues. They still have dozens of technologies that are leading edge, and they surely need some luck, and to get their fabs in order.
 

coercitiv

Diamond Member
Jan 24, 2014
7,465
17,829
136
Meanwhile, Intel shaved $100 off the price of the Ultra 7:

This is what happens when you have extra cores but not the consistent ST performance uplift that users were expecting.
 

Thibsie

Golden Member
Apr 25, 2017
1,175
1,383
136
It will sell, as Intel needs to make it work, so the price will be very competitive. The other unknown is how good the new process will be; I am mostly interested in perf/watt in multithreaded use, plus AVX10.

Intel is bleeding cash, we'll see if competitive pricing is enough.
 

coercitiv

Diamond Member
Jan 24, 2014
7,465
17,829
136
At this moment, do we have a reliable / fresh source of information on whether the MC is on the compute tiles or the SoC tile? I remember a while ago there was talk about it moving to the compute tile, but in the light of this dual compute tile SKU I find it hard to believe it. Quad channel RAM and distributed memory controller sounds like a very complex and expensive solution for a niche consumer product. Makes very little sense to me, unless this is meant for HEDT / workstation and not consumer.
 

eek2121

Diamond Member
Aug 2, 2005
3,472
5,147
136
Didn't work for AMD. ST performance was more important to more people.
It absolutely did work for AMD. Ryzen trailed in single core performance for Zen, Zen+, and Zen 2, while leading in core counts. Zen only became a single core beast with Zen 3 and X3D.
The same will apply to this 52C NVL-S: compared to an optimized design, it will sacrifice gaming perf for productivity perf. The memory controller stays on the SoC tile, for a start. The tiles are identical 8+16; for a dual-tile part to perform well in gaming, it would need an exclusive P tile and an exclusive E tile. That would also increase MT perf, since the resulting core count would be something like 12P+40E due to the distribution across somewhat identically sized tiles. The obvious problem with asymmetrical tiles would be design cost (financial, manpower, time to market). Ironically, AMD is in a better position to execute such a setup with a 12+24 chip, but I really doubt they'll do it until Intel has something on the shelves that challenges their 3D cache setup.
AMD's memory controller is also on the SoC (I/O die). Intel is working on stacked cache. Just because you add more cores doesn't mean single-core performance has to suffer. AMD could drop a 64-96 core Threadripper part that hits 5.7 GHz if they wanted; they don't because they target the pro/workstation market with it. We do get 5.5 GHz parts, however.