Discussion Intel Meteor, Arrow, Lunar & Panther Lakes + WCL Discussion Threads

Page 673

Tigerick

Senior member
Apr 1, 2022
911
829
106
Wildcat Lake (WCL) Preliminary Specs

Intel Wildcat Lake (WCL) is an upcoming mobile SoC replacing ADL-N. WCL consists of two tiles: a compute tile and a PCD tile. The compute tile is a true single die containing the CPU, GPU and NPU, fabbed on the 18A process. Last time I checked, the PCD tile is fabbed on TSMC's N6 process. The two are connected through UCIe rather than D2D, a first for Intel. Expect a launch around Q2/Computex 2026. In case people don't remember Alder Lake-N, I have created a table below comparing the detailed specs of ADL-N and WCL. Just for fun, I am throwing in LNL and the upcoming MediaTek D9500 SoC.

| | Intel Alder Lake-N | Intel Wildcat Lake | Intel Lunar Lake | MediaTek D9500 |
| Launch Date | Q1-2023 | Q2-2026 ? | Q3-2024 | Q3-2025 |
| Model | Intel N300 | ? | Core Ultra 7 268V | Dimensity 9500 5G |
| Dies | 2 | 2 | 2 | 1 |
| Node | Intel 7 + ? | Intel 18A + TSMC N6 | TSMC N3B + N6 | TSMC N3P |
| CPU | 8 E-cores | 2 P-cores + 4 LP E-cores | 4 P-cores + 4 LP E-cores | C1 1+3+4 |
| Threads | 8 | 6 | 8 | 8 |
| CPU Max Clock | 3.8 GHz | ? | 5 GHz | |
| L3 Cache | 6 MB | ? | 12 MB | |
| TDP | 7 W | Fanless ? | 17 W | Fanless |
| Memory | 64-bit LPDDR5-4800 | 64-bit LPDDR5-6800 ? | 128-bit LPDDR5X-8533 | 64-bit LPDDR5X-10667 |
| Memory Size | 16 GB | ? | 32 GB | 24 GB ? |
| Bandwidth | | ~55 GB/s | 136 GB/s | 85.6 GB/s |
| GPU | UHD Graphics | ? | Arc 140V | G1 Ultra |
| EU / Xe | 32 EU | 2 Xe | 8 Xe | 12 |
| GPU Max Clock | 1.25 GHz | ? | 2 GHz | |
| NPU | NA | 18 TOPS | 48 TOPS | 100 TOPS ? |
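The bandwidth row is just bus width times transfer rate. A quick sketch (using the memory configs from the table; WCL's LPDDR5-6800 is the rumored figure above, not confirmed) reproduces the listed numbers:

```python
# Minimal sketch: theoretical peak memory bandwidth = bus width (bytes) x transfer rate.
# Configs are taken from the table above; WCL's 6800 MT/s is the rumored value.

def peak_bandwidth_gbps(bus_bits: int, mtps: int) -> float:
    """Peak bandwidth in GB/s for an LPDDR interface of bus_bits width at mtps MT/s."""
    return bus_bits / 8 * mtps / 1000  # bytes per transfer x mega-transfers/s -> GB/s

configs = {
    "ADL-N  (64-bit LPDDR5-4800)":   (64, 4800),
    "WCL    (64-bit LPDDR5-6800 ?)": (64, 6800),
    "LNL    (128-bit LPDDR5X-8533)": (128, 8533),
    "D9500  (64-bit LPDDR5X-10667)": (64, 10667),
}

for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbps(bits, rate):.1f} GB/s")
# -> ~38.4, ~54.4, ~136.5, ~85.3 GB/s, matching the ~55 / 136 / 85.6 figures in the table.
```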

PPT1.jpg
PPT2.jpg
PPT3.jpg

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new generation of platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile based on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024; that is what Intel's roadmap is telling us. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, which it calls RibbonFET.



LNL-MX.png
 

Attachments

  • PantherLake.png
  • LNL.png
  • INTEL-CORE-100-ULTRA-METEOR-LAKE-OFFCIAL-SLIDE-2.jpg
  • Clockspeed.png
Last edited:

511

Diamond Member
Jul 12, 2024
5,031
4,533
106
Thanks, Swan, for initiating such a mid release, and Pat for doubling down on it
 

LightningZ71

Platinum Member
Mar 10, 2017
2,625
3,308
136
Also keep in mind the 288 core Clearwater Forest is 12 dies of 24 cores each, with 24 cores being composed of 6x quad core clusters.
I missed that detail. So, in a certain way, it's similar to a hypothetical 12CCD Epyc processor where each CCD is a 6 core CCX, but the cores are 'mont quads. Assuming that they get their I/O setup right, it should be broadly competitive.
 

Saylick

Diamond Member
Sep 10, 2012
4,097
9,576
136
I missed that detail. So, in a certain way, it's similar to a hypothetical 12CCD Epyc processor where each CCD is a 6 core CCX, but the cores are 'mont quads. Assuming that they get their I/O setup right, it should be broadly competitive.
Not quite as similar to EPYC, I think. CCD-to-CCD communication has to go through the IOD in Zen, while Intel uses a mesh interconnect, so each cluster talks to each other via the on-die network, where latency is based on the number of hops between nodes in the mesh. It was this way for Sapphire Rapids, Emerald Rapids, and Granite Rapids. I don’t see why it would change for what comes next, even if there are more compute tiles.

For P-core server products, there’s one network node per core. I’ll have to double check, but it would not surprise me if for E-core server products, there’s one network node per E-core cluster.
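To make the hop-count point concrete, here is a toy sketch. It assumes a plain 2D mesh with X-then-Y routing; the grid coordinates and per-hop cycle cost are made-up numbers for illustration, not Intel's actual fabric parameters:

```python
# Rough sketch of "latency scales with hops" on a simple 2D mesh with
# dimension-ordered (X then Y) routing. All numbers are illustrative only.

def hops(src: tuple[int, int], dst: tuple[int, int]) -> int:
    """Manhattan distance between two mesh stops, given as (col, row)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

PER_HOP_CYCLES = 4          # assumed cost per mesh hop (illustrative)
cluster_stop = (0, 0)       # e.g. an E-core cluster in one corner
imc_stop = (7, 9)           # e.g. a memory controller across the die

n = hops(cluster_stop, imc_stop)
print(f"{n} hops -> ~{n * PER_HOP_CYCLES} cycles of fabric latency (one way)")
```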
1731382988143.jpeg
 
  • Like
Reactions: Elfear

OneEng2

Senior member
Sep 19, 2022
951
1,163
106
How are you calculating per-core performance? If AVX-512 is included, then yes; if not, then no. For pure integer workloads, which many are, both will have similar integer performance per thread.

I agree on the VM part, but there are customers who disable SMT, so it will give them more physical cores to work with.
On a side note, not a single site benchmarks the accelerators in these chips. They are niche but have decent use cases.
Zen 5c can effectively operate on 1.4 threads at a time per core. Skymont can operate on 1 thread at a time. If the single-core IPC of Zen 5c were exactly equal to Skymont's, and they were clocked at the same speed, Zen 5c would still deliver 1.4 times the per-core throughput of Skymont. This would be the worst case in an MT application.

Additionally, if there are any AVX-512 instructions in the workload, Zen 5c gets another big boost.

That is where I got the 40-60% "guesstimate" or SWAG :).
Yeah, I don't know about that... Unless 18A is complete garbage (which is a possibility), 40-60% higher perf seems too optimistic outside niche HPC/AI workloads, and Clearwater has more cores. EPYC Zen 4c had 50% higher perf per core... against 8-channel Sierra, which had Crestmont and 100 MB of L3. CLF has Darkmont, equipped with more L3 and likely faster 12-channel DDR5. Crestmont was slower than Zen 4 in integer perf, let alone FP, but Skymont has already narrowed that gap.
Clearwater does have more cores, but each core can only operate on 1 thread, while Zen 5c can operate on 1.4 threads at a time in an MT workload. Add in any AVX-512 or FP tasks in the workload and it isn't hard to see each Zen 5c core doing 1.5 times the work of each Skymont core.

Someone show me where my math is off here. Seems like lots of people think I am off base (and I might be).
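For what it's worth, here is the arithmetic as stated, as a tiny sketch. The 1.4x SMT factor and the equal-IPC/equal-clock baseline are the post's own assumptions, and the 10% AVX-512 bump in the example call is a hypothetical number of mine, not a measurement:

```python
# A sketch of the per-core comparison as laid out above; not a benchmark.

def relative_percore_throughput(ipc_ratio=1.0, clock_ratio=1.0,
                                smt_factor=1.4, avx512_factor=1.0):
    """Zen 5c MT throughput per core relative to a 1-thread Skymont core."""
    return ipc_ratio * clock_ratio * smt_factor * avx512_factor

print(relative_percore_throughput())                    # integer MT baseline: ~1.4x
print(relative_percore_throughput(avx512_factor=1.1))   # with a hypothetical AVX-512 bump: ~1.54x
```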
My biggest concern for Intel's very high core count 'mont server processors has nothing to do with the cores themselves, save if they can have a competitive AVX-512 implementation and remain compact and efficient enough, but is far more about Intel's mesh fabric connecting them all. 288 cores in clusters of 4 is still 72 reservation stations. How will the mesh affect performance for them?
That is a valid concern as well. Feeding 288 cores is a marvel all on its own. I think we are definitely looking at more bandwidth (and socket power) for future DC processors.
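As a back-of-the-envelope on the feeding problem: assuming 12 channels of DDR5-6400 (the thread only says "likely faster 12ch DDR5", so the speed is a guess), the per-core share of peak DRAM bandwidth comes out to roughly 2 GB/s:

```python
# Back-of-envelope for feeding 288 cores: per-core share of peak DRAM bandwidth.
# 12 channels of DDR5-6400 is an assumption, not a confirmed spec.

CHANNELS = 12
BYTES_PER_TRANSFER = 8      # 64-bit DDR5 channel = 8 bytes per transfer
MTPS = 6400                 # assumed DDR5-6400
CORES = 288

peak_gbs = CHANNELS * BYTES_PER_TRANSFER * MTPS / 1000   # GB/s
print(f"~{peak_gbs:.0f} GB/s peak DRAM bandwidth")
print(f"~{peak_gbs / CORES:.1f} GB/s per core across {CORES} cores")
```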
 
  • Like
Reactions: Tlh97
Jul 27, 2020
28,173
19,203
146
No need to be an Intel beta tester.
That's actually a great argument in trying to get a free 285K.

Dear Intel,

With my two prior RMA requests X and Y and now a third one, I think I have demonstrated quite consistently that I'm the sort of user who is perfectly suited to testing your CPUs and stressing them in normal workloads without any sort of overclocking involved. I think it would be prudent to let me have the 285K so the respective product teams can learn how actual users work in real life using Intel processors, instead of running Cinebench on a constant loop for X amount of hours and declaring a processor fit for public consumption.

Yours truly,
The "three times successful smasher of Intel CPUs" Hulk
 

jdubs03

Golden Member
Oct 1, 2013
1,333
935
136
That's actually a great argument in trying to get a free 285K.

Dear Intel,

With my two prior RMA requests X and Y and now a third one, I think I have demonstrated quite consistently that I'm the sort of user who is perfectly suited to testing your CPUs and stressing them in normal workloads without any sort of overclocking involved. I think it would be prudent to let me have the 285K so the respective product teams can learn how actual users work in real life using Intel processors, instead of running Cinebench on a constant loop for X amount of hours and declaring a processor fit for public consumption.

Yours truly,
The "three times successful smasher of Intel CPUs" Hulk
Heh, might work except for maybe that last sentence there.
Put the pressure on Mr. Banner!
 

Hulk

Diamond Member
Oct 9, 1999
5,213
3,842
136
That's actually a great argument in trying to get a free 285K.

Dear Intel,

With my two prior RMA requests X and Y and now a third one, I think I have demonstrated quite consistently that I'm the sort of user who is perfectly suited to testing your CPUs and stressing them in normal workloads without any sort of overclocking involved. I think it would be prudent to let me have the 285K so the respective product teams can learn how actual users work in real life using Intel processors, instead of running Cinebench on a constant loop for X amount of hours and declaring a processor fit for public consumption.

Yours truly,
The "three times successful smasher of Intel CPUs" Hulk
I also cap frequency at 5.5GHz and power at 200W and I still seem to burn them out.

Funny thing is, around 20 years ago I used to do a lot of beta testing and actually wrote a few "how to" books on some of the software I was testing. Even got to go to NAB in Vegas twice and speak as an "expert" on video editing and get paid for the trip. Then YouTube came and wiped out that market.
 
Jul 27, 2020
28,173
19,203
146
I also cap frequency at 5.5GHz and power at 200W and I still seem to burn them out.
Hey, maybe the CPU tries to compensate for the lack of power by overperforming in ST with higher than normal boosts since it isn't allowed to flex its muscles MT-wise? Definitely seems like some sort of boosting algo designed to win benchmarks in the short term.
 

DavidC1

Platinum Member
Dec 29, 2023
2,006
3,153
96
Not quite as similar to EPYC, I think. CCD-to-CCD communication has to go through the IOD in Zen, while Intel uses a mesh interconnect, so each cluster talks to each other via the on-die network, where latency is based on the number of hops between nodes in the mesh. It was this way for Sapphire Rapids, Emerald Rapids, and Granite Rapids. I don’t see why it would change for what comes next, even if there are more compute tiles.
While it uses the mesh, Clearwater Forest potentially has one big advantage over current server chips, and it's that it can use Foveros Direct to communicate, whereas now it's using EMIB.

Foveros Direct is the most advanced version:
1731392883307.png

So potentially the connections between the clusters can be much faster and with less power penalties.
 

adroc_thurston

Diamond Member
Jul 2, 2023
7,812
10,530
106
While it uses the mesh, Clearwater Forest potentially has one big advantage over current server chips, and it's that it can use Foveros Direct to communicate, whereas now it's using EMIB.
Nope. The bases and the I/O caps are still chained over EMIB.
Hybrid bonding just allows them to put cores on top of cache, same as PVC/MI300/Granite Ridge-X/you-name-it.
Plus you're still dealing with a rather xboxhueg mesh in any case, which is slow.
 

DavidC1

Platinum Member
Dec 29, 2023
2,006
3,153
96
Skymont is a beast. Lion Cove is a total letdown. I am really curious how much larger Skymont would be if they scaled it up to hit the same clocks as Lion Cove and gave it a similar instruction set, e.g. AVX-512.
The better way is to make it as wide as possible and keep it at 5 GHz or below. Like 50% faster per clock.

Lion Cove and Skymont should also be able to put out a few percent more if the SoC itself didn't suck.

@adroc_thurston That's a bit of a disappointment. I guess it's V-cache with cache as a base tile then.
 
  • Like
Reactions: Tlh97

DavidC1

Platinum Member
Dec 29, 2023
2,006
3,153
96
Each Clearwater Forest 24-core tile seems to be about 90 mm². That means the quad-core cluster is a little under 15 mm².

The total across the 12 compute tiles is ~1400 mm². The base is Intel 3. They can put a LOT of cache underneath if they want to. I read it's only around 1/2 GB though? If they make it take up the space underneath, they could get 1 GB of SRAM under there.
 
  • Like
Reactions: Tlh97

cannedlake240

Senior member
Jul 4, 2024
247
138
76
Each Clearwater Forest 24-core tile seems to be about 90 mm². That means the quad-core cluster is a little under 15 mm².

The total across the 12 compute tiles is ~1400 mm². The base is Intel 3. They can put a LOT of cache underneath if they want to. I read it's only around 1/2 GB though? If they make it take up the space underneath, they could get 1 GB of SRAM under there.
Apparently the 4x3 CLF layout Intel has been showing isn't what the actual package looks like. Bionic on Twitter said a while ago that Clearwater has more than 2x the L3 of SRF, so more than twice the 216 MB of the 288C Sierra. The base tiles house the EMIB PHYs, memory controllers, and the mesh fabric.

One could also take the "doubling of L3" as 2x over the 144C SRF, which would just be baffling lol... Imagine a 288C CPU with so much SRAM real estate only having a little over 200 MB of L3. If that's the case, they should at least double the cluster L2 as well; an 8 MB cluster L2 has been talked about since the Tremont days.
 
Last edited:
  • Like
Reactions: Tlh97

Saylick

Diamond Member
Sep 10, 2012
4,097
9,576
136
Better way is to make it wide as possible and keep it at 5GHz or below. Like 50% faster per clock.
It would be interesting to see how well the clustered decode approach scales. Why not just add another cluster of 3-wide decode at this point and then widen everything else downstream?
 

511

Diamond Member
Jul 12, 2024
5,031
4,533
106
I also cap frequency at 5.5GHz and power at 200W and I still seem to burn them out.

Funny thing is, around 20 years ago I used to do a lot of beta testing and actually wrote a few "how to" books on some of the software I was testing. Even got to go to NAB in Vegas twice and speak as an "expert" on video editing and get paid for the trip. Then YouTube came and wiped out that market.
If you have had multiple defective CPUs, either you are constantly getting bad CPUs or something on the motherboard is causing issues. My advice would be to cap the IA voltage at around 1.45 V, or maybe 1.5 V depending on the desired frequency; it will prevent degradation altogether on a new CPU.
How is Zen 5 so fast in FP?
They significantly improved the FP performance: full-fat AVX-512 with more units to feed it.
 
Last edited:

511

Diamond Member
Jul 12, 2024
5,031
4,533
106
Each Clearwater Forest 24-core tile seems to be about 90 mm². That means the quad-core cluster is a little under 15 mm².

The total across the 12 compute tiles is ~1400 mm². The base is Intel 3. They can put a LOT of cache underneath if they want to. I read it's only around 1/2 GB though? If they make it take up the space underneath, they could get 1 GB of SRAM under there.
Do you have a die shot available?
 

DavidC1

Platinum Member
Dec 29, 2023
2,006
3,153
96
Would be interesting to see how well the clustered decode approach scales. Why not just add another cluster of 3-wide decode at this point and then widen everything else downstream.
It is exactly what it says in the optimization manual for Gracemont:
This overall approach to x86 instruction decoding provides a clear path forward to very wide designs without needing to cache post-decoded instructions.
Can't be more optimistic than that.
- It saves on complexity, meaning less time
- It saves on transistors, meaning less power and area
- It can scale easily, while going above 8-wide traditionally is questionable
- Each cluster is only 3-wide, so easier to fill
- It works on both branches and loops
- Further opportunities for improvement, not just in the decode section but coupled with changes elsewhere

There was an X post about Keller having worked on Intel's next architecture with 12-wide decode. This is likely Arctic Wolf.

And I doubt they're widening it by 33% to get 5-10% gains. That's not what they've been doing. Branch predictor on Skymont is 27% over Gracemont. FP is 20-30% more area for 30% extra performance.

The AnandTech article about Atom said the design goals within that team were 1% power for 2% performance, and keeping the core compact. You need to be very balanced to do that; you can't go spending too much in one area and skimping on another. I would not be surprised if they bring some more new ideas to deliver on it.
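A toy model of why clustered decode scales: hand fetch blocks (instructions between taken branches) out round-robin to N 3-wide clusters and see what aggregate decode rate falls out. This is purely illustrative, with invented block lengths, and it ignores the real front end's fetch, queueing, and misprediction behavior:

```python
# Toy model of clustered decode scaling; not how Skymont's front end actually schedules.
import math
from itertools import cycle

def sustained_decode_rate(block_lengths, clusters=3, width=3):
    """Average decoded instructions per cycle, assuming a deep backlog of blocks."""
    busy = [0] * clusters                      # cycles of work queued per cluster
    for blk, c in zip(block_lengths, cycle(range(clusters))):
        busy[c] += math.ceil(blk / width)      # a cluster decodes `width` insts per cycle
    return sum(block_lengths) / max(busy)      # limited by the busiest cluster

blocks = [7, 4, 12, 3, 9, 5, 6, 8] * 50        # made-up taken-branch block lengths
for n in (2, 3, 4):                            # Gracemont has 2 clusters, Skymont 3; 4 is hypothetical
    print(f"{n} clusters: ~{sustained_decode_rate(blocks, clusters=n):.1f} decoded insts/cycle")
```

Under this (very generous) backlog assumption the rate scales almost linearly with cluster count, which is the manual's point about a path to very wide designs.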
 
Last edited:

DavidC1

Platinum Member
Dec 29, 2023
2,006
3,153
96
Dou you have a die shot available?
You don't need a die shot; you only need a shot of the package. You know the size of the LGA7529 package. The actual shot shows three narrow dies. The narrowness is because the dies are right next to each other, like on Meteor Lake. Each narrow die is actually 4 of them. You find the size of the narrow geometry and divide by 4.

It's possible 4 of the dies are connected by Foveros, and the connection to the other quad-die groups is done using EMIB. Then again, SPR uses EMIB only and the dies are pretty close together.

Same with how I got the size of Turin Dense's Zen 5c: you get the package size and measure the die against it. There's a clear separation between core and L2 for AMD, so it's even easier to find the core size alone.
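The package-shot trick is just scaling pixels by a known dimension. A sketch of that arithmetic, where every number is a placeholder rather than a real measurement of CLF or the LGA7529 package:

```python
# Sketch of the measurement trick: scale pixel measurements from a package photo
# by a known physical dimension. All values below are placeholders, not real data.

KNOWN_PACKAGE_WIDTH_MM = 105.0       # placeholder: physical width of the package
package_width_px = 2100.0            # placeholder: same width measured in the photo
mm_per_px = KNOWN_PACKAGE_WIDTH_MM / package_width_px

die_width_px, die_height_px = 160.0, 720.0   # placeholder pixel measurements of one narrow die
die_w_mm = die_width_px * mm_per_px
die_h_mm = die_height_px * mm_per_px
die_area = die_w_mm * die_h_mm

print(f"narrow die: {die_w_mm:.1f} x {die_h_mm:.1f} mm = {die_area:.0f} mm^2")
print(f"per compute tile (narrow die / 4): ~{die_area / 4:.0f} mm^2")
```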
 
Last edited:

DrMrLordX

Lifer
Apr 27, 2000
23,062
13,164
136
Is it likely that Intel put more resources into the development of Skymont because it does double duty, client and server? Whereas Lion Cove is pretty much just client.

Technically, Skymont won't see use in any server products. Darkmont will, though.

It is unfortunate that Intel doesn't have a Lion Cove-based server product and instead chose to use Redwood Cove in Granite Rapids.

Oh yeah! X3D didn’t. Turin didn’t. Same as GNR didn’t. And Lunar Lake didn’t.

Granite Rapids (assuming you're talking about that, and not Granite Ridge) isn't even using the same core as the 285k. It's Redwood Cove.

Lunar Lake doesn't share the same compute chiplet or even have the same package layout and is a niche product for 15W and below. It's good for what it is, but . . . not exactly the same thing!

Meanwhile, Turin, Granite Rapids, and Granite Rapids-X use the same CCDs. It's all the same product.

When Zen 5 desktop parts were launched it was a disaster

Why, because some reviewers didn't like the game performance on the 9950X? Please. It's the most lucrative disaster AMD ever had. And a few weeks later, X3D parts hit the streets and all was forgiven. Meanwhile, take a look at client market share for Q3 2024 and see what's really happening.
 
  • Like
Reactions: Tlh97 and misuspita