Discussion Intel current and future Lakes & Rapids thread

Page 891

H433x0n

Senior member
Mar 15, 2023
649
666
96
Higher 1T means nothing on server-grade products; SPR falls on its face once it hits its max TDP. That's why stock Xeon W9 parts are on par with Zen 3 Threadrippers in general MT workloads. I suspect that, due to the larger L3$ and better topology, EMR will be 15% faster at the same core count and TDP compared to SPR. That is progress, I suppose, but AMD with Genoa/Genoa-X/Bergamo is just too far ahead on core/performance/TCO.
I'm not going to really argue any of your points since I pretty much agree. Looking at the Phoronix benchmarks of SPR performance, if you were to add an additional 17% it gets pretty close to the Epyc 9554, but that's still a far cry from the 96-core 9654.
 

Abwx

Lifer
Apr 2, 2011
10,583
3,053
136
I'm not going to really argue any of your points since I pretty much agree. Looking at the Phoronix benchmarks of SPR performance, if you were to add an additional 17% it gets pretty close to the Epyc 9554, but that's still a far cry from the 96-core 9654.

But to get those 17% you'll have to increase wattage by 40-45% or so, and even at stock, perf/watt is at a disadvantage relative to the 9554 due to Epyc's node advantage...
 

Geddagod

Golden Member
Dec 28, 2021
1,033
874
96
Higher 1T means nothing on server-grade products; SPR falls on its face once it hits its max TDP. That's why stock Xeon W9 parts are on par with Zen 3 Threadrippers in general MT workloads. I suspect that, due to the larger L3$ and better topology, EMR will be 15% faster at the same core count and TDP compared to SPR. That is progress, I suppose, but AMD with Genoa/Genoa-X/Bergamo is just too far ahead on core/performance/TCO.
While literally less than a week ago I was arguing with someone on Reddit that ST perf isn't all that important in server products, I wouldn't say it's worth nothing. Higher 1T was the reason Jensen gave for pairing SPR with their H100 systems, and even if that was just a "fake" reason so that they would have an excuse not to use AMD, there are plenty of benches on websites such as Phoronix that care mostly about ST performance, and that whole website is dedicated to testing workloads that are representative of common server use cases.
AMD even recognizes this market as, even if niche, somewhat of an official market, which is why they sell frequency-optimized server SKUs (their -F SKUs).
Intel's self-quoted perf improvement with EMR was 17% improved perf/watt, btw.
 

H433x0n

Senior member
Mar 15, 2023
649
666
96
But to get those 17% you'll have to increase wattage by 40-45% or so, and even at stock, perf/watt is at a disadvantage relative to the 9554 due to Epyc's node advantage...
No, it's 17% more general compute performance at iso power (350W). I'm not claiming it's amazing but it's unambiguously more performant than SPR.
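
The two readings in this exchange lead to very different perf/watt outcomes. A minimal sketch of the arithmetic, using only the percentages quoted in the posts (17% more performance, either at the same 350W or at 40-45% more power); the function name is made up for illustration:

```python
def perf_per_watt_change(perf_gain, power_gain):
    """Relative change in perf/watt for fractional perf and power increases."""
    return (1 + perf_gain) / (1 + power_gain) - 1

# Reading 1: +17% performance at iso power (same 350W).
iso_power = perf_per_watt_change(0.17, 0.0)

# Reading 2: +17% performance bought with 40-45% more power.
more_power_low = perf_per_watt_change(0.17, 0.40)
more_power_high = perf_per_watt_change(0.17, 0.45)

print(f"iso power:  {iso_power:+.1%} perf/watt")
print(f"+40% power: {more_power_low:+.1%} perf/watt")
print(f"+45% power: {more_power_high:+.1%} perf/watt")
```

At iso power the perf/watt gain equals the performance gain; if the same 17% had to be bought with 40-45% more power, perf/watt would go down instead of up.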
 

Hitman928

Diamond Member
Apr 15, 2012
4,795
6,921
136
While literally less than a week ago I was arguing with someone on Reddit that ST perf isn't all that important in server products, I wouldn't say it's worth nothing. Higher 1T was the reason Jensen gave for pairing SPR with their H100 systems, and even if that was just a "fake" reason so that they would have an excuse not to use AMD, there are plenty of benches on websites such as Phoronix that care mostly about ST performance, and that whole website is dedicated to testing workloads that are representative of common server use cases.
AMD even recognizes this market as, even if niche, somewhat of an official market, which is why they sell frequency-optimized server SKUs (their -F SKUs).
Intel's self-quoted perf improvement with EMR was 17% improved perf/watt, btw.

There's a difference between 1T performance and per core performance. No one cares about 1T performance on server CPUs, but many customers do care about how each core performs once the CPU is loaded.
 

H433x0n

Senior member
Mar 15, 2023
649
666
96
While literally less than a week ago I was arguing with someone on Reddit that ST perf isn't all that important in server products, I wouldn't say it's worth nothing. Higher 1T was the reason Jensen gave for pairing SPR with their H100 systems, and even if that was just a "fake" reason so that they would have an excuse not to use AMD, there are plenty of benches on websites such as Phoronix that care mostly about ST performance, and that whole website is dedicated to testing workloads that are representative of common server use cases.
AMD even recognizes this market as, even if niche, somewhat of an official market, which is why they sell frequency-optimized server SKUs (their -F SKUs).
Intel's self-quoted perf improvement with EMR was 17% improved perf/watt, btw.
Everybody always has an issue when I link the Phoronix test results. Going by their results, SPR is within 25% of Genoa in their geometric mean of all results and outperforms Milan. I genuinely don't know how to interpret their findings though.
 

Geddagod

Golden Member
Dec 28, 2021
1,033
874
96
Everybody always has an issue when I link the Phoronix test results. Going by their results, SPR is within 25% of Genoa in their geometric mean of all results and outperforms Milan. I genuinely don't know how to interpret their findings though.
The geometric mean isn't all that useful. They mix in a bunch of ST, MT, and then AI-based workloads (where Intel obviously dominates because of accelerators) and call it a day. The averages in the individual breakdown of "sections" are much more useful.
 

H433x0n

Senior member
Mar 15, 2023
649
666
96
The geometric mean isn't all that useful. They mix in a bunch of ST, MT, and then AI-based workloads (where Intel obviously dominates because of accelerators) and call it a day. The averages in the individual breakdown of "sections" are much more useful.
In that case, they're beating Milan in every metric? All of these results are the exact opposite of what I've heard online, where Milan beats SPR and Genoa nearly doubles SPR performance.

It may seem that I’m trolling but I’m not. If those are legitimate / representative benchmarks then the tech enthusiast internet is a crazy place.
 

dullard

Elite Member
May 21, 2001
24,717
3,016
126
The geometric mean isn't all that useful. They mix in a bunch of ST, MT, and then AI-based workloads (where Intel obviously dominates because of accelerators) and call it a day. The averages in the individual breakdown of "sections" are much more useful.
The mathematically correct way to compare multiple benchmarks is a geometric mean, not just a simple average. That said, you are correct that a geometric mean of a mixture of irrelevant tests is not useful. If you care about MT, then do a geometric mean of just the MT tests. If you care about AI, then do a geometric mean of just the AI tests.
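
As a rough illustration of the point above, here is a minimal Python sketch that takes a geometric mean per category instead of over the whole mixed suite. The category names and ratio values are invented for illustration (they are not Phoronix results); each value is assumed to be a score relative to a baseline CPU, where higher is better.

```python
from statistics import geometric_mean

# Hypothetical relative scores (CPU A / CPU B), higher is better.
# These numbers are made up purely to show the effect of mixing categories.
results = {
    "MT": [0.80, 0.85, 0.90],
    "ST": [1.05, 1.10],
    "AI": [2.50, 3.00],
}

def per_category_geomean(scores):
    """Geometric mean of the ratios within each category."""
    return {name: geometric_mean(vals) for name, vals in scores.items()}

def overall_geomean(scores):
    """Geometric mean over every test, ignoring categories."""
    return geometric_mean([v for vals in scores.values() for v in vals])

by_category = per_category_geomean(results)
overall = overall_geomean(results)

for name, value in by_category.items():
    print(f"{name}: {value:.2f}")
print(f"overall: {overall:.2f}")
```

With these made-up inputs the overall figure lands above 1.0 even though the MT-only figure is below 1.0, which is the kind of distortion a mixed suite can produce if MT is the only thing you care about.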
 

H433x0n

Senior member
Mar 15, 2023
649
666
96
No, only on AVX-512 and specific accelerators; otherwise Milan will either match or beat SPR.
… AVX512 and accelerators are part of SPR though? If AVX512 and AMX provide a big enough uplift to let it take the win in Phoronix’s test suite for HPC, rendering, code compilation, etc., that’s an accurate representation of what a buyer would receive if they ordered SPR.
 

nicalandia

Diamond Member
Jan 10, 2019
3,327
5,279
136
… AVX512 and accelerators are part of SPR though? If AVX512 and AMX provide a big enough uplift to let it take the win in Phoronix’s test suite for HPC, rendering, code compilation, etc., that’s an accurate representation of what a buyer would receive if they ordered SPR.
But AVX512/AMX is not making any difference in many of those tests (HPC, rendering and code compilation) because the developers have yet to implement it; heck, even AVX-512 has hardly been implemented.

1695431779573.png

1695431803073.png

1695431834524.png

1695431882963.png

Those are at iso core count, so apples to apples. Perhaps with EMR they will be able to beat Milan?
 

H433x0n

Senior member
Mar 15, 2023
649
666
96
But AVX512/AMX is not making any difference in many of those tests (HPC, rendering and code compilation) because the developers have yet to implement it; heck, even AVX-512 has hardly been implemented.

View attachment 86127

View attachment 86128

View attachment 86129

View attachment 86130

Those are at iso core count, so apples to apples. Perhaps with EMR they will be able to beat Milan?
According to Phoronix they’re beating Milan when you’re comparing the entire processor and not just cherry-picking a rented-out slice of 8 cores.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,103
14,141
136
According to Phoronix they’re beating Milan when you’re comparing the entire processor and not just cherry-picking a rented-out slice of 8 cores.
But what happens when it is pitted against its competition, which is not Milan?
 

H433x0n

Senior member
Mar 15, 2023
649
666
96
But what happens when it is pitted against its competition, which is not Milan?
It loses.

I’m not exactly an SPR cheerleader, and I’m not advising anybody to buy it.

Don't. You will get a few people upset here. We all know how weak SPR looks against Genoa.
I honestly don’t care. I find a lot of the data center stuff boring and it’s not exactly competitive at the moment between Intel and AMD.
 

nicalandia

Diamond Member
Jan 10, 2019
3,327
5,279
136

H433x0n

Senior member
Mar 15, 2023
649
666
96
Well, data center is where they are "more" competitive due to AVX/AMX, because it gets worse on the desktop, where their niche accelerators don't help: https://www.pugetsystems.com/labs/articles/intel-xeon-w-3400-content-creation-review/

At stock it can't even beat Milan, and it's barely ahead of Rome-based Threadrippers.
To me, that’s essentially a Data Center comparison. Not exactly unexpected results either. To reiterate - I’m not white knighting for SPR.

I was just making a comment that the online discourse about this is wildly overstating the reality of the situation.
 

JoeRambo

Golden Member
Jun 13, 2013
1,805
2,084
136
According to Phoronix they’re beating Milan when you’re comparing the entire processor and not just cherry-picking a rented-out slice of 8 cores.

And that's the problem for Intel currently: such a slice is cherry-picked on AMD, but it is pretty much impossible to make it perform on Intel.

1) Without on-socket NUMA, a tenant of 8C size does not have enough on-core resources, and any memory traffic is basically an L3 cache miss and a looong trip to memory.
2) With quadrant mode, a tenant of 8C is awkwardly sized for a 14C slice and has an even more minuscule L3, with memory bandwidth cut to 2 channels. Not the right place to be, and memory subsystem performance is still hilariously bad for 2023.

Compare that to AMD, where such a tenant has a full chiplet to themselves, backed by an excellent 32MB L3 slice (96MB with X3D). And all L3 misses share the same IOD with 12 memory channels.

EMR fixes most of this by providing a larger slice and far more L3 in "half" mode, so Intel's fortunes in core-vs-core performance should improve quite a bit.
They won't be able to touch AMD's perf peaks, obviously, but there are so-called "licensing bastards", where software vendor shenanigans force orgs to buy servers of a certain size anyway, like 32C. Intel will be much more competitive in those if the price is right.
 

lightisgood

Member
May 27, 2022
147
58
61

Attachments

  • Screenshot from 2023-09-25 15-32-46.png

AMDK11

Member
Jul 15, 2019
129
86
101
Doesn't this slide say up to 288 cores on the 2S?
128279_2.jpg
 

nicalandia

Diamond Member
Jan 10, 2019
3,327
5,279
136
Doesn't this slide say up to 288 cores on the 2S?
128279_2.jpg
Yes. 2S = 2P, so two sockets. 2S SF 288c/288T vs 2S Bergamo 256c/512T.
A few here got too carried away by the "OMG, 288-core Sierra Forest" talk. Well, it's like the previous leaks: only 144 per socket and, get this, 36 quad clusters, each with 3 MiB of L3. Same internal ring per cluster. Good luck with that.
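
For anyone who wants to check the counts, a small sketch of the arithmetic using only the figures stated in this post (36 quad clusters per socket, 3 MiB of L3 per cluster, 288c/288T and 256c/512T for the 2S configs). The variable names are made up, and the total L3 figure is just the product of the post's own numbers, not a confirmed spec.

```python
SOCKETS = 2

# Sierra Forest, per the figures in the post above.
sf_clusters_per_socket = 36
sf_cores_per_cluster = 4          # "quad cluster"
sf_l3_per_cluster_mib = 3
sf_threads_per_core = 1           # 288c/288T implies no SMT

sf_cores_per_socket = sf_clusters_per_socket * sf_cores_per_cluster
sf_cores_2s = sf_cores_per_socket * SOCKETS
sf_threads_2s = sf_cores_2s * sf_threads_per_core
sf_l3_per_socket_mib = sf_clusters_per_socket * sf_l3_per_cluster_mib

# Bergamo, derived from the 2S 256c/512T figure in the post.
bergamo_cores_2s = 256
bergamo_threads_2s = 512
bergamo_cores_per_socket = bergamo_cores_2s // SOCKETS
bergamo_threads_per_core = bergamo_threads_2s // bergamo_cores_2s

print(f"SF: {sf_cores_per_socket} cores/socket, "
      f"{sf_cores_2s}c/{sf_threads_2s}T in 2S, "
      f"{sf_l3_per_socket_mib} MiB L3/socket across clusters")
print(f"Bergamo: {bergamo_cores_per_socket} cores/socket, "
      f"{bergamo_cores_2s}c/{bergamo_threads_2s}T in 2S, "
      f"{bergamo_threads_per_core} threads/core")
```

So the 2S Sierra Forest config has more cores but fewer threads than 2S Bergamo, which is the comparison being made here.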

Screenshot_20230926-171133_Chrome.jpg
 