Discussion Intel Meteor, Arrow, Lunar & Panther Lakes Discussion Threads

Tigerick · Aug 22, 2022

With Hot Chips 34 starting this week, Intel will unveil technical information on the upcoming Meteor Lake (MTL) and Arrow Lake (ARL), the new-generation platforms after Raptor Lake. Both MTL and ARL represent a new direction in which Intel moves to multiple chiplets combined into one SoC platform.

MTL also introduces a new compute tile built on the Intel 4 process, which uses EUV lithography, a first for Intel. Intel expects to ship the MTL mobile SoC in 2023.

ARL will come after MTL, so Intel should be shipping it in 2024, according to Intel's roadmap. The ARL compute tile will be manufactured on the Intel 20A process, Intel's first to use GAA transistors, called RibbonFET.

Intel Core Ultra 100 - Meteor Lake

As mentioned by Tom's Hardware, TSMC will manufacture the I/O, SoC, and GPU tiles. That means Intel will manufacture only the CPU and Foveros tiles. (Notably, Intel calls the I/O tile an 'I/O Expander,' hence the IOE moniker.)

Darkmont · 2025-08-25T15:10:17-0400

Josh128 said:
What do you mean? The tweet clearly says Skymont is 3.18, RPC is 3.71. I remember Intel hyping up "Chadmont" saying it was going to have Raptor Cove +2% IPC. This tweet is saying Darkmont has Skymont+17% IPC and now Darkmont is equal to RPC, lol.

View attachment 129235

Sierra Forest is Crestmont

511 · 2025-08-25T15:12:11-0400

Josh128 said:
What do you mean? The tweet clearly says Skymont is 3.18, RPC is 3.71. I remember Intel hyping up "Chadmont" saying it was going to have Raptor Cove +2% IPC. This tweet is saying Darkmont has Skymont+17% IPC and now Darkmont is equal to RPC, lol.

View attachment 129235

Also, in the tweet, SRF (the 3.18 entry) is Sierra Forest, which is Crestmont.

Josh128 · 2025-08-25T15:27:49-0400

511 said:
View attachment 129236
Also, in the tweet, SRF (the 3.18 entry) is Sierra Forest, which is Crestmont.

Ah, thought it was Skymont. So, Darkmont is < Skymont. Odd.

511 · 2025-08-25T15:33:26-0400

Josh128 said:
Ah, thought it was Skymont. So, Darkmont is < Skymont. Odd.

Nope, Darkmont >= Skymont. Don't forget Intel cut the latency by 2 cycles on Darkmont+L2 vs Skymont+L2.

coercitiv · 2025-08-25T15:34:42-0400

Covfefe said:
I'd take this tweet with a grain of salt. The 17% number on the Intel slide is SpecIntRate, i.e. one specInt instance for every hardware thread. All the numbers in the tweet (except Darkmont) are specInt.

At the same time, it does provide us with a baseline for perf/clock uplift, and that baseline does not look particularly optimistic given the double digit difference we know exists between Skymont and Crestmont. I agree though that going for the RPC comparison is... not a great idea.

Josh128 said:
Ah, thought it was Skymont. So, Darkmont is < Skymont. Odd.

We would need to know relative SpecIntRate for Skymont/Crestmont in a similar server architecture, otherwise math goes sideways fast.

dullard · 2025-08-25T15:58:09-0400

511 said:
View attachment 129200

Here is the best photo that I see of it. https://www.servethehome.com/wp-content/uploads/2024/09/Intel-Xeon-6-Clearwater-Forest-Close.jpg

Covfefe · 2025-08-25T16:10:31-0400

coercitiv said:
At the same time, it does provide us with a baseline for perf/clock uplift, and that baseline does not look particularly optimistic given the double digit difference we know exists between Skymont and Crestmont. I agree though that going for the RPC comparison is... not a great idea.

I don't think it's useful even as a baseline estimate. Here are some SpecIntRate results for the 5 CPUs from the tweet.

CPU	SpecInt “ipc” (score/GHz)	SpecRate score	copies	score per copy	clockrate (nominal, GHz)	SpecRate “ipc”	specintrate link
Emerald Rapids(RPC)	14.5/3.9 = 3.71	1240	256	4.84	1.9	2.54	link
Granite Rapids(RWC)	13.6/3.9 = 3.48	1220	256	4.76	2.0	2.38	link
Sierra Forest(SRF)	9.54/3.0 = 3.18	1410	288	4.89	2.2	2.22	link
AMD EPYC 9654	14.3/3.7 = 3.86	1790	384	4.66	2.4	1.94	link
AMD EPYC 9755	18/4.1 = 4.39	2720	512	5.31	2.7	1.96	link

The SpecRate ipc I calculated here is obviously next to useless because the real-world clock speed could be miles apart from the nominal clock speed. Despite that, I think the SpecRate score and score per copy are enough to completely discredit the tweet's comparison. SpecRateInt and SpecInt should not be directly compared.
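For anyone who wants to redo the table's arithmetic, here is a minimal Python sketch. It assumes the scores, copy counts and nominal clocks are exactly the ones listed in the table; the function and variable names are made up for illustration. Results can differ from the table in the last digit, since the table's figures look truncated rather than rounded.

```python
# Figures copied from the table above: (SPECrate score, copies, nominal GHz).
SPECRATE_RESULTS = {
    "Emerald Rapids(RPC)": (1240, 256, 1.9),
    "Granite Rapids(RWC)": (1220, 256, 2.0),
    "Sierra Forest(SRF)": (1410, 288, 2.2),
    "AMD EPYC 9654": (1790, 384, 2.4),
    "AMD EPYC 9755": (2720, 512, 2.7),
}


def score_per_copy(score, copies):
    """SPECrate score divided by the number of copies that were run."""
    return score / copies


def specrate_ipc(score, copies, nominal_ghz):
    """Score per copy per nominal GHz: the crude SpecRate "ipc" column."""
    return score_per_copy(score, copies) / nominal_ghz


if __name__ == "__main__":
    for cpu, (score, copies, ghz) in SPECRATE_RESULTS.items():
        print(f"{cpu}: {score_per_copy(score, copies):.2f} per copy, "
              f"{specrate_ipc(score, copies, ghz):.2f} per copy per GHz")
```

Because the divisor is the nominal clock and not the clock actually sustained under load, this carries the same caveat as the table itself.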

desrever · 2025-08-25T16:13:09-0400

Covfefe said:
I don't think it's useful even as a baseline estimate. Here are some SpecIntRate results for the 5 CPUs from the tweet.

CPU	SpecInt “ipc” (score/GHz)	SpecRate score	copies	score per copy	clockrate (nominal, GHz)	SpecRate “ipc”	specintrate link
Emerald Rapids(RPC)	14.5/3.9 = 3.71	1240	256	4.84	1.9	2.54	www.spec.org
Granite Rapids(RWC)	13.6/3.9 = 3.48	1220	256	4.76	2	2.38	www.spec.org
Sierra Forest(SRF)	9.54/3.0 = 3.18	1410	288	4.89	2.2	2.22	www.spec.org
AMD EPYC 9654	14.3/3.7 = 3.86	1790	384	4.66	2.4	1.94	www.spec.org
AMD EPYC 9755	18/4.1 = 4.39	2720	512	5.31	2.7	1.96	www.spec.org

Linked results (www.spec.org):
Emerald Rapids(RPC): CPU2017 Floating Point Rate Result: ASUSTeK Computer Inc. ASUS RS720-E11-RS12U (1.90 GHz, Intel Xeon Platinum 8592+); SPECrate2017_fp_base: 1240; SPECrate2017_fp_peak: 1290
Granite Rapids(RWC): CPU2017 Integer Rate Result: IEIT Systems Co., Ltd. meta brain NF3290G8 (Intel Xeon 6980P); SPECrate2017_int_base: 1220; SPECrate2017_int_peak: 1260
Sierra Forest(SRF): CPU2017 Integer Rate Result: IEIT Systems Co., Ltd. meta brain NF5280G8 (Intel Xeon 6780E); SPECrate2017_int_base: 1410; SPECrate2017_int_peak: 1460
AMD EPYC 9654: CPU2017 Integer Rate Result: ASUSTeK Computer Inc. ASUS RS700A-E12(K14PP-D24) Server System 2.40 GHz, AMD EPYC 9654; SPECrate2017_int_base: 1790; SPECrate2017_int_peak: 1930
AMD EPYC 9755: CPU2017 Integer Rate Result: GIGA-BYTE TECHNOLOGY CO., LTD. R183-Z93-LAJ1-000 (2.70 GHz, AMD EPYC 9755) (test sponsored by Giga Computing Technology Co., Ltd.); SPECrate2017_int_base: 2720; SPECrate2017_int_peak: 2790

You are comparing SMT scores and dividing on it???

Covfefe · 2025-08-25T16:20:46-0400

desrever said:
You are comparing SMT scores and dividing on it???

Yes, I know it's a bad comparison. That's kind of my whole point.

coercitiv · 2025-08-25T16:43:34-0400

Covfefe said:
Yes, I know it's a bad comparison. That's kind of my whole point.

Let's look at a very crude "efficiency" metric, SpecRate "IPC" / SpecInt "IPC":

	SpecInt "IPC"	SpecRate "IPC"	"Efficiency"
Emerald Rapids(RPC)	3.71	2.54	0.68
Granite Rapids(RWC)	3.48	2.38	0.68
Sierra Forest(SRF)	3.18	2.22	0.70
AMD EPYC 9654	3.86	1.94	0.50
AMD EPYC 9755	4.39	1.96	0.45

Seems to me like arches from the same family that use similar interconnect tend to have a degree of semblance when it comes to scaling from SpecRate to SpecInt. It's not enough data to draw a conclusion, but not a chaotic comparison either.
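The ratio column is easy to reproduce. Below is a small Python sketch that divides the SpecRate "IPC" by the SpecInt "IPC" for each row. It assumes the two inputs are the rounded figures from the table above, and the names are illustrative only.

```python
# (SpecInt "IPC", SpecRate "IPC") pairs copied from the table above.
IPC_PAIRS = {
    "Emerald Rapids(RPC)": (3.71, 2.54),
    "Granite Rapids(RWC)": (3.48, 2.38),
    "Sierra Forest(SRF)": (3.18, 2.22),
    "AMD EPYC 9654": (3.86, 1.94),
    "AMD EPYC 9755": (4.39, 1.96),
}


def scaling_ratio(specint_ipc, specrate_ipc):
    """Crude "efficiency": rate-run per-copy IPC over single-copy IPC."""
    return specrate_ipc / specint_ipc


if __name__ == "__main__":
    for cpu, (specint_ipc, specrate_ipc) in IPC_PAIRS.items():
        print(f"{cpu}: {scaling_ratio(specint_ipc, specrate_ipc):.2f}")
```

The inputs already carry the nominal-clock caveat from the earlier table, so the ratio is only as meaningful as those figures.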

gdansk · 2025-08-25T17:05:22-0400

18A CWF Darkmont fits about 25% more cores in a given area than N3E Turin D Z5C when excluding LLC.
Good enough. Seems it all comes down to clocks.

Doug S · 2025-08-25T17:09:00-0400

Am I reading those slides right or did Intel marketing write them stupidly?

Are there actually 12 separate CPU chiplets, mounted four at a time on three base tiles?? If so, holy chiplets Batman!

adroc_thurston · 2025-08-25T17:10:51-0400

Doug S said:
Are there actually 12 separate CPU chiplets, mounted four at a time on three base tiles??

Yeah.
And two I/O caps.

Doug S · 2025-08-25T17:13:39-0400

Wow my understanding of packaging economics must be out of date because I can't see how that can possibly be cost effective? Ditto dicing wafers into tiny little E core sized chiplets. I haven't followed what Intel is doing that closely (obviously) but this seems crazy to me.

I guess it is good if you have crappy 18A yields though...

Hitman928 · 2025-08-25T17:21:29-0400

Doug S said:
Am I reading those slides right or did Intel marketing write them stupidly?

Are there actually 12 separate CPU chiplets, mounted four at a time on three base tiles?? If so, holy chiplets Batman!

YOLO

coercitiv · 2025-08-25T17:29:14-0400

Doug S said:
Are there actually 12 separate CPU chiplets, mounted four at a time on three base tiles?? If so, holy chiplets Batman!

You should see the other guy...

coercitiv · 2025-08-25T17:39:31-0400

A closeup of CF:

Source. (fixed)

Here's a look at the twelve 18A chiplets for Clearwater Forest, each housing 24 cores in 55 mm^2. The dies are stacked and connected with a mesh topology on 3 base dies on Intel 3.
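Taking the caption at face value, the totals work out as in the sketch below. It assumes all twelve chiplets are identical at 24 cores and 55 mm^2 each, and it counts only the 18A compute chiplets, not the base dies or I/O. The names are hypothetical.

```python
def compute_tile_totals(chiplets, cores_per_chiplet, chiplet_area_mm2):
    """Return (total cores, total compute-chiplet area in mm^2, cores per mm^2)."""
    total_cores = chiplets * cores_per_chiplet
    total_area = chiplets * chiplet_area_mm2
    return total_cores, total_area, total_cores / total_area


if __name__ == "__main__":
    cores, area, density = compute_tile_totals(12, 24, 55)
    print(f"{cores} cores on {area} mm^2 of compute chiplets "
          f"({density:.3f} cores per mm^2)")
```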

AcrosTinus · 2025-08-25T18:05:20-0400

was too negative about bugs...

Josh128 · 2025-08-25T18:07:14-0400

gdansk said:
18A CWF Darkmont fits about 25% more cores in a given area than N3E Turin D Z5C when excluding LLC.
Good enough. Seems it all comes down to clocks.

How does that jibe with 18A claimed density vs N3E? I know it's apples to oranges, but I'm guessing Darkmont has significantly more transistors than Crestmont, so maybe ballparkish to Zen 5C?

lightisgood · 2025-08-25T18:35:45-0400

coercitiv said:
A closeup of CF:
View attachment 129245

Source.

There are 12 islands for mesh-stop?
If this estimate is right, it's so cool...
I remember Sierra Forest had 36 islands.

gdansk · 2025-08-25T18:44:52-0400

Josh128 said:
How does that jibe with 18A claimed density vs N3E? I know it's apples to oranges, but I'm guessing Darkmont has significantly more transistors than Crestmont, so maybe ballparkish to Zen 5C?

It's beyond apples and oranges. I can't say anything useful about the processes from measuring die shots.

Saylick · 2025-08-25T18:54:40-0400

coercitiv said:
A closeup of CF:
View attachment 129245

Source.

Source link is broken.

Also, I wonder what the core-to-core latency plot will look like, specifically when one core needs to talk to another core on a separate die on the same base tile, versus a core on a separate die that is also on a separate base tile.

Then again, the use case for a product like this doesn't really necessitate low core-to-core latency so... eh.

adroc_thurston · 2025-08-25T19:19:53-0400

Saylick said:
Also, I wonder what the core-to-core latency plot will look like, specifically when one core needs to talk to another core on a separate die on the same base tile, versus a core on a separate die that is also on a separate base tile.

Characteristically the same as GNR, where local L3 is slow but manageable and far L3 is pretty bad.

DavidC1 · 2025-08-25T19:36:30-0400

Covfefe said:
The SpecRate ipc I calculated here is obviously next to useless because the real-world clock speed could be miles apart from the nominal clock speed. Despite that, I think the SpecRate score and score per copy are enough to completely discredit the tweet's comparison. SpecRateInt and SpecInt should not be directly compared.

The 17% is useful in that we could take the SpecIntRate score from Sierra Forest, multiply it by 1.9x for typical scaling with double the cores, and then apply 1.17x on top. That would be roughly 3130 for Clearwater Forest (1410 x 1.9 x 1.17).
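As a back-of-the-envelope check, the projection can be sketched in Python as below. The 1410 baseline is the Sierra Forest SPECrate score from the table earlier in the thread; the 1.9x core-doubling factor and the 1.17x uplift are the assumptions stated in this post, not measured values, and the function name is made up.

```python
def project_specrate(baseline_score, core_scaling, per_core_uplift):
    """Scale a baseline SPECrate score by a core-count factor and an uplift factor."""
    return baseline_score * core_scaling * per_core_uplift


if __name__ == "__main__":
    projected = project_specrate(1410, 1.9, 1.17)
    print(f"Projected Clearwater Forest SpecIntRate: {projected:.0f}")
```

The product lands a little above 3130, in the same ballpark as the figure in the post.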

I think they are doing this for two reasons:
-It hides how the final part performs
-Multi-thread performance is the most important metric for Clearwater Forest

The other issue in using Rate is that "IPC" for MT in comparison to SMT enabled cores is affected significantly by SMT. So if AMD benefits more from SMT, then the SpecRate "IPC" would look better for AMD versus if you used 1T SpecInt by a few %.

Actually, Darkmont is only a few single-digit % faster than Skymont, which will be easily eclipsed either way by the SoC implementation. So the really important part for Clearwater Forest is how it performs against its predecessor and the competition (including Intel) in actual server workloads, and in SpecIntRate.

With Sierra Forest, Intel had to compare a 144-core SRF against a 128-thread AMD part. If Clearwater Forest gets over 3000 for 2P, then it can be directly compared against the top 192-core AMD part. The Sierra Forest position wasn't good for Intel, as they had to sell it on perf/W and perf/$, and you need a high-end part for more revenue.

Joe NYC · 2025-08-25T20:42:43-0400

adroc_thurston said:
Yeah.
And two I/O caps.

It will be interesting to see these 288 cores all accessing the shared 500 MB LLC through the mesh interconnect, up to 2 hops over EMIB.

OTOH, AMD has fast L3, 128 MB per chiplet (1 GB per CPU), hiding all the L3 access traffic from the fabric...
