Discussion Intel current and future Lakes & Rapids thread


DavidC1

Golden Member
Dec 29, 2023
1,617
2,671
96
So 1P Turin is 20% faster and in 2P it's 40%, so GNR didn't scale as well. I was right about Turin being 20-30% faster, but the 2P scaling issue is something I didn't predict.
Yea, and it also includes the tests where GNR is doing horrible, as in last place behind Sierra Forest, and even Sapphire Rapids.

They obviously rushed it. Perhaps they decided fixing it would take too long and getting it finished after it's released was better.

Guesses for Sierra Forest-AP aka Xeon 6 6900E series:
My guess for SRF-AP is that it'll land somewhere around low-900 points, or right behind the current state of GNR in Phoronix Geomean. It has twice the cores, but not twice the TDP, so probably 10% lower all core clock.

Clearwater Forest:
I would like to see more details about the strength and weaknesses of Skymont, but the 288 core Clearwater Forest is going to beat top Granite Rapids in a lot of workloads.

I still think Phoronix benches have too many workloads that do not scale with many cores. Assuming a 10% gain from 144 to 176 cores, and a 30% gain at the low end for Clearwater Forest, it would rival 288 core SRF in Phoronix Geomean. Since FP gains a serious amount, and it'll probably add additional instructions that help it a lot, plus the large cache base tile, even the 176 core version will easily beat 288 core SRF, which is a problem as it should not except in certain scenarios.

176 core CWF-SP will lose to 288 core SRF-AP, but in floating point workloads it'll be pretty close.

By 2026, some are going to get Xeon 7 6900E over Granite Rapids, considering Xeon 7 6700/6900P is going to be over a year away.
 

511

Platinum Member
Jul 12, 2024
2,877
2,890
106
I am pretty sure that it's 6000 memory for AMD. That's the spec.
Yes, 6400 is selective; servers run at certified memory speed anyway. The Phoronix average will skew the result; the customer has to know which workload to buy what for, because there are use cases for both. One thing: GNR has dedicated silicon for accelerators like IAA, QAT, DLB etc., which no reviewer has tested 🤣
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,094
16,014
136
Yes, 6400 is selective; servers run at certified memory speed
The quote from the review "The AMD EPYC 9965 features 192 cores of Zen 5C "Turin Dense" with SMT to provide for 384 threads per socket, a 2.25GHz base clock frequency, 3.7GHz boost frequency, 128 lanes of PCIe Gen 5, 12 channel DDR5-6000 memory (or DDR5-6400 in some environments), "
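For a rough sense of what the two memory speeds in that quote mean, here is a minimal sketch of the peak theoretical bandwidth per socket. The 8 bytes per channel per transfer (64 data bits) is my assumption and not something the review quote states; the function name is just for illustration, and real sustained bandwidth will be lower.

```python
# Peak theoretical DRAM bandwidth per socket for the two speeds in the quote.
# Assumption (not from the review): 64 data bits = 8 bytes per channel per transfer.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels: int, mega_transfers: int) -> float:
    """Return peak bandwidth in GB/s (decimal) for a given channel count and MT/s."""
    return channels * mega_transfers * BYTES_PER_TRANSFER / 1000


ddr5_6000 = peak_bandwidth_gbs(12, 6000)  # 12 channel DDR5-6000
ddr5_6400 = peak_bandwidth_gbs(12, 6400)  # 12 channel DDR5-6400
uplift = ddr5_6400 / ddr5_6000 - 1        # relative gain from the faster memory

print(f"DDR5-6000: {ddr5_6000:.1f} GB/s")
print(f"DDR5-6400: {ddr5_6400:.1f} GB/s")
print(f"Uplift:    {uplift:.1%}")
```

Under that assumption, the gap between the two configurations is under 7% of peak bandwidth, so it only matters for workloads that are actually bandwidth bound.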
 

MS_AT

Senior member
Jul 15, 2024
738
1,489
96
Yea, and it also includes the tests where GNR is doing horrible, as in last place behind Sierra Forest, and even Sapphire Rapids.

They obviously rushed it. Perhaps they decided fixing it would take too long and getting it finished after it's released was better.

Guesses for Sierra Forest-AP aka Xeon 6 6900E series:
My guess for SRF-AP is that it'll land somewhere around low-900 points, or right behind the current state of GNR in Phoronix Geomean. It has twice the cores, but not twice the TDP, so probably 10% lower all core clock.

Clearwater Forest:
I would like to see more details about the strength and weaknesses of Skymont, but the 288 core Clearwater Forest is going to beat top Granite Rapids in a lot of workloads.

I still think Phoronix benches have too many workloads that do not scale with many cores. Assuming a 10% gain from 144 to 176 cores, and a 30% gain at the low end for Clearwater Forest, it would rival 288 core SRF in Phoronix Geomean. Since FP gains a serious amount, and it'll probably add additional instructions that help it a lot, plus the large cache base tile, even the 176 core version will easily beat 288 core SRF, which is a problem as it should not except in certain scenarios.

176 core CWF-SP will lose to 288 core SRF-AP, but in floating point workloads it'll be pretty close.

By 2026, some are going to get Xeon 7 6900E over Granite Rapids, considering Xeon 7 6700/6900P is going to be over a year away.
Disclaimer: I wrongly assumed Clearwater Forest is Skymont based, my bad. It's apparently Darkmont based and I know nothing of this one, so feel free to ignore anything that is written below.


Do you mean floating point or SIMD? Now Skymont is posting great improvements over Gracemont in SIMD because Gracemont was really bad at it, 2 Execution ports at 128b width. Skymont is 4x128b with each unit capable of FMA iirc. Zen5 is 4x512b with 2 units capable of FMA and 2 of addition. In mixed code Zen5 core can peak at 2048b per cycle, Skymont core only at 512b. In add or fma dominated code Skymont will still show 512b while Zen5 will do 1024b. So assuming the cores won't be bandwidth starved, you need at least 2x as many Skymont cores, at worst 4x as many. Skymont also doesn't support AVX512 which is important from feature point of view not just the register width.

In reality it will come down to bandwidth available and code quality.

Since we are discussing server chips, and SIMD workloads, I am assuming that for HPC and AI workloads the AVX512 code paths are already available.

I would expect Clearwater Forest to score a lot of wins against Zen5c, but not in SIMD. Reality will of course verify the expectations when both chips will be on the market;)
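The per-core numbers in this post can be checked with a small sketch. It only uses the pipe counts and widths stated above (Skymont 4x128b all FMA capable, Zen5 4x512b with 2 FMA and 2 add pipes); the class and function names are made up for illustration, and it ignores clocks, bandwidth and code quality entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimdCore:
    name: str
    width_bits: int  # width of each SIMD pipe
    fma_pipes: int   # pipes that can execute FMA
    add_pipes: int   # extra pipes that can only execute adds

    def peak_mixed(self) -> int:
        """Bits per cycle when the code mixes FMA and add and keeps every pipe busy."""
        return self.width_bits * (self.fma_pipes + self.add_pipes)

    def peak_fma_only(self) -> int:
        """Bits per cycle when the code is dominated by FMA."""
        return self.width_bits * self.fma_pipes


# Figures as quoted in the post, not independently verified.
SKYMONT = SimdCore("Skymont", width_bits=128, fma_pipes=4, add_pipes=0)
ZEN5 = SimdCore("Zen5", width_bits=512, fma_pipes=2, add_pipes=2)


def cores_needed(small: SimdCore, big: SimdCore) -> tuple[float, float]:
    """Return (best case, worst case) number of small cores per big core."""
    best = big.peak_fma_only() / small.peak_fma_only()
    worst = big.peak_mixed() / small.peak_mixed()
    return best, worst


best, worst = cores_needed(SKYMONT, ZEN5)
print(f"Skymont cores per Zen5 core: {best:.0f}x to {worst:.0f}x")
```

That reproduces the 2x (FMA dominated) and 4x (mixed code) ratios, assuming the cores are never bandwidth starved.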
 

511

Platinum Member
Jul 12, 2024
2,877
2,890
106
Do you mean floating point or SIMD? Now Skymont is posting great improvements over Gracemont in SIMD because Gracemont was really bad at it, 2 Execution ports at 128b width. Skymont is 4x128b with each unit capable of FMA iirc. Zen5 is 4x512b with 2 units capable of FMA and 2 of addition. In mixed code Zen5 core can peak at 2048b per cycle, Skymont core only at 512b. In add or fma dominated code Skymont will still show 512b while Zen5 will do 1024b. So assuming the cores won't be bandwidth starved, you need at least 2x as many Skymont cores, at worst 4x as many. Skymont also doesn't support AVX512 which is important from feature point of view not just the register width.

In reality it will come down to bandwidth available and code quality.

Since we are discussing server chips, and SIMD workloads, I am assuming that for HPC and AI workloads the AVX512 code paths are already available.

I would expect Clearwater Forest to score a lot of wins against Zen5c, but not in SIMD. Reality will of course verify the expectations when both chips will be on the market;)
Zen 5C will be the comparison against Darkmont tbf. I don't know the exact spec for 5C.
 

MS_AT

Senior member
Jul 15, 2024
738
1,489
96
Zen 5C will be the comparison against Darkmont tbf. I don't know the exact spec for 5C.
I meant the Turin Dense, released yesterday. But it seems I confused the 288 core Sierra Forest with Clearwater Forest using Darkmont. My bad. Since we know nothing about Darkmont I guess, my post can be thrown to the bin;)
 
Jul 27, 2020
26,010
17,949
146
Since we know nothing about Darkmont
But we do. It will be DARK.

Darker than a moonless night.

Darker than the blackest cat you've ever seen.

It will be so dark, it will throw us into a dangerous state of depression with its performance and we will be forced to question our very existence.

Man will not have known darkness of such nature before.
 

511

Platinum Member
Jul 12, 2024
2,877
2,890
106
I meant the Turin Dense, released yesterday. But it seems I confused the 288 core Sierra Forest with Clearwater Forest using Darkmont. My bad. Since we know nothing about Darkmont I guess, my post can be thrown to the bin;)
Turin Dense also has 4x512-bit? I thought it was limited to 2x512-bit. Still, I don't think Skymont to Darkmont is a big change anyway. Looks like the SpecInt performance for TD is around 3000, while 2 x 144 E-cores score around 1410.
Screenshot_20241011-152722.png
 

AcrosTinus

Senior member
Jun 23, 2024
206
209
76
Cause AMD's P core arch in Zen 5 is superior to the P core arch in GNR, that's one reason
Yeah, I forgot that it is not Lion Cove, they basically upgraded to Redwood Cove 8Wide to gain parity with Zen4 one year later. Now they are essentially lagging by 1 year.

The only chance for them to catch up is a delay in Zen6; this will be the only GAP in the Matrix for Intel to introduce an arch that is competitive and even on par with consumer or ahead.
 

MS_AT

Senior member
Jul 15, 2024
738
1,489
96
Turin Dense also has 4x512-bit? I thought it was limited to 2x512-bit. Still, I don't think Skymont to Darkmont is a big change anyway. Looks like the SpecInt performance for TD is around 3000, while 2 x 144 E-cores score around 1410.
Mobile only Zen5 [Strix Point and friends, except for Strix Halo] are 256b. Turin Dense is 512b according to all information available at the moment that I know of. So AMD has 4 Zen5 "designs" so to speak. But it's not so unimaginable, seeing that Turin Dense is on 3nm node vs 4nm for others.
 

511

Platinum Member
Jul 12, 2024
2,877
2,890
106
Yeah, I forgot that it is not Lion Cove, they basically upgraded to Redwood Cove 8Wide to gain parity with Zen4 one year later. Now they are essentially lagging by 1 year.

The only chance for them to catch up is a delay in Zen6; this will be the only GAP in the Matrix for Intel to introduce an arch that is competitive and even on par with consumer or ahead.
DMR has Panther Cove on 18A. We will have 3 P-core Cove archs in 3 years with IPC improvements though: LNC -> 14% vs RWC, Cougar Cove -> 8-10% IPC vs LNC, and Coyote/Panther Cove with another 20% on top, if the rumors regarding PNC/CYC being overhauled are true.
 

511

Platinum Member
Jul 12, 2024
2,877
2,890
106
Yeah, I forgot that it is not Lion Cove, they basically upgraded to Redwood Cove 8Wide to gain parity with Zen4 one year later. Now they are essentially lagging by 1 year.
Yeah, better than lagging 3+ years lol. I am still pissed that they put accelerators in silicon but don't market them well enough. If we throw them in the mix, some things will run insanely fast.
The only chance for them to catch up is a delay in Zen6; this will be the only GAP in the Matrix for Intel to introduce an arch that is competitive and even on par with consumer or ahead.
Yes, Intel needs an arch superior to Zen6 if they want leadership back
 

yuri69

Senior member
Jul 16, 2013
663
1,194
136
The only chance for them to catch up is a delay in Zen6; this will be the only GAP in the Matrix for Intel to introduce an arch that is competitive and even on par with consumer or ahead.
Well, with GNR/Atoms Intel is now notably *closer* to the current gen EPYC than it has been since 2019 Rome(?). It's not bad. Given Zen 6 is a 2026 evolution of Zen 5, DMR has a good chance.
 

DavidC1

Golden Member
Dec 29, 2023
1,617
2,671
96
Do you mean floating point or SIMD? Now Skymont is posting great improvements over Gracemont in SIMD because Gracemont was really bad at it, 2 Execution ports at 128b width. Skymont is 4x128b with each unit capable of FMA iirc. Zen5 is 4x512b with 2 units capable of FMA and 2 of addition. In mixed code Zen5 core can peak at 2048b per cycle, Skymont core only at 512b. In add or fma dominated code Skymont will still show 512b while Zen5 will do 1024b. So assuming the cores won't be bandwidth starved, you need at least 2x as many Skymont cores, at worst 4x as many. Skymont also doesn't support AVX512 which is important from feature point of view not just the register width.
My analysis is based on the current benchmark result of Sierra Forest, not the theoretical maximum.

Doubling FP units in Skymont only resulted in a 30% gain, as the 32% Integer gain is a microarchitecture gain that will also benefit FP by a roughly equal amount. It only takes 30% on top of that to reach the 72% FP gain. The gains will certainly be greater in certain HPC tests, but SpecFP is a little bit more varied than that.

100% gains due to doubling FP performance means the code is entirely FP unit bound, which outside of synthetic Whetstone benchmarks designed to test only the Floating Point unit will never happen.
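The arithmetic behind the 72% figure, and the point about FP-bound code, can be sketched like this. The compounding of 32% and 30% comes straight from the post; the second function uses a simple Amdahl-style model, which is my assumption and not something the post claims, to estimate what share of runtime would have to be FP unit bound for a doubling of FP units to give a particular speedup.

```python
def compound(*gains: float) -> float:
    """Combine independent fractional gains multiplicatively, e.g. 0.32 and 0.30."""
    total = 1.0
    for gain in gains:
        total *= 1.0 + gain
    return total - 1.0


def fp_bound_fraction(speedup: float, unit_scale: float = 2.0) -> float:
    """Fraction of runtime that must be FP unit bound for `speedup` to appear.

    Assumed Amdahl-style model: speedup = 1 / ((1 - f) + f / unit_scale),
    solved for f. This is an illustration, not a measured property of Skymont.
    """
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / unit_scale)


# 32% microarchitecture gain compounded with 30% from the doubled FP units.
print(f"Combined FP gain: {compound(0.32, 0.30):.1%}")

# Share of runtime that would need to be FP unit bound for a 30% gain from 2x units.
print(f"FP bound share for +30%:  {fp_bound_fraction(1.30):.1%}")

# A full 100% gain needs the code to be entirely FP unit bound.
print(f"FP bound share for +100%: {fp_bound_fraction(2.00):.1%}")
```

Under that model a 30% gain from doubled FP units corresponds to a bit under half of the runtime being FP unit bound, and only fully FP-bound code would see the whole 100%.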
 

AMDK11

Senior member
Jul 15, 2019
456
373
136
Yeah, I forgot that it is not Lion Cove, they basically upgraded to Redwood Cove 8Wide to gain parity with Zen4 one year later. Now they are essentially lagging by 1 year.

The only chance for them to catch up is a delay in Zen6; this will be the only GAP in the Matrix for Intel to introduce an arch that is competitive and even on par with consumer or ahead.
RedwoodCove in GraniteRapids is still 6-wide. On the Intel slide, the statement that RedwoodCove is 8-Wide was a typo. In the interview, it was confirmed that RedwoodCove in Meteor and GraniteRapids are basically the same thing.

The difference comes down to 2x512bit (AVX512) and AMX.
 
Jul 27, 2020
26,010
17,949
146
RedwoodCove in GraniteRapids is still 6-wide. On the Intel slide, the statement that RedwoodCove is 8-Wide was a typo. In the interview, it was confirmed that RedwoodCove in Meteor and GraniteRapids are basically the same thing.
The Cove that keeps on biting them in the butt. Amazing.