Discussion Intel current and future Lakes & Rapids thread


nicalandia

Platinum Member
Jan 10, 2019
2,185
3,253
106
8S systems are extremely rare these days. Only a few select buyers want them. If that's the only place HBM2e Sapphire Rapids shows up, then it's not going to be terribly relevant compared to standard 1/2S Sapphire Rapids.
SPR-SP 2S with HBM2e will also be available (and more of them, of course).
 

Joe NYC

Senior member
Jun 26, 2021
821
875
96
8S systems are extremely rare these days. Only a few select buyers want them. If that's the only place HBM2e Sapphire Rapids shows up, then it's not going to be terribly relevant compared to standard 1/2S Sapphire Rapids.
I think the latest revision of Aurora supercomputer spec also has HBM SPR CPUs.
 

nicalandia

Platinum Member
Jan 10, 2019
2,185
3,253
106
I've been thinking hard about the Sapphire Rapids Cinebench R23 numbers posted so far, and I've messaged YuuKi_AnS about it, since most of the posted MT scores are about the same whether they come from 2S systems with 48 cores or with 60 cores per socket.

I believe he is running his developer motherboard with Sub-NUMA Clustering set to SNC2. If Sapphire Rapids is anything like Xeon Phi Knights Landing (and in many ways it is: HBM, the largest mesh so far, memory modes), that matters, because Knights Landing takes a huge performance hit in a quad-cluster domain, effectively cutting Cinebench R23 performance to a quarter of what the CPU is capable of.

Xeon Phi Knights Landing on SNC-4 gets only 2,500 MT points in CBR23, and with All2All it gets a roughly 4x boost to over 10,000 MT points.

1660685503562.png

4 tiles, SNC-4 on the Sapphire Rapids CPU package. I would not be too surprised if the 60,000 points the 2S systems are getting on CBR23 are actually with SNC-2/4 enabled, and the CPU can hit close to 70,000 per CPU alone and over 120,000 points in 2S systems.
 
  • Like
Reactions: moinmoin

lightisgood

Junior Member
May 27, 2022
11
6
41
HBM is DRAM, and slow-clocked DRAM at that!

Caches do one of two things:

reduce latency

amplify bandwidth.

HBM acting as a cache will provide no latency benefit. In fact, misses to memory will probably be worse than if it were acting as a scratchpad, because of the extra latency on an HBM cache miss.

We already know of a low-latency DRAM cache with a sophisticated cache hierarchy: here.

In theory, Intel could add an eDRAM cache, maybe manufactured on a sub-10nm process, as a "memory tile" on SPR/EMR or later.

I will wait and see how things go for a while.
 

DrMrLordX

Lifer
Apr 27, 2000
19,972
8,955
136
SPR-SP 2S with HBM2e will also be available(and more of them of course)
Okay. It seemed from the previous post that would not be the case.

I think the latest revision of Aurora supercomputer spec also has HBM SPR CPUs.
Aurora uses custom nodes configured specifically for that supercomputer though, doesn't it? The general market won't see products based on that.
 
  • Like
Reactions: Joe NYC

nicalandia

Platinum Member
Jan 10, 2019
2,185
3,253
106
Just heard back from YuuKi_AnS. We checked all the settings (most are locked to the default config), and sub-NUMA clustering is set to the default (All2All).

This run was performed a few minutes ago.
1660761261967.png

That CPU can't seem to perform as it should. I blame that on the Beta BIOS, because even Release quality CPUs are not performing.

That 70,000 points for that 2S system is just sad
 

Timmah!

Golden Member
Jul 24, 2010
1,052
276
136
Just heard back from YuuKi_AnS. We checked all the settings (most are locked to the default config), and sub-NUMA clustering is set to the default (All2All).

This run was performed a few minutes ago.
View attachment 66056

That CPU can't seem to perform as it should. I blame that on the Beta BIOS, because even Release quality CPUs are not performing.

That 70,000 points for that 2S system is just sad
Wait, 2x 56-core SPR score 70,000 together, when a single 13900K supposedly does 40,000?
"Sad" is an understatement here.
 

LightningZ71

Golden Member
Mar 10, 2017
1,396
1,542
136
(My random musings...)

I have to think that there is something seriously wrong in that setup. That's under-performing a dual Ice Lake 8380 setup (80 cores total, 160 threads) which achieves 74630 (https://www.tomshardware.com/news/amds-epyc-milan-breaks-cinebench-record-heres-a-10nm-ice-lake-xeon-comparison). Either Intel has drastically dropped the ball in some unprecedented way, or there's something hamstringing this setup behind the scenes in ways that we can't see.

Looking at some details, the Ice Lake processors should be operating at 2.3GHz at least (their base frequency), so they have a 15% frequency advantage over the 2.0GHz that Task Manager reports at the instant shown. Assuming CB23 is frequency-bound in this instance (no stalls for memory access), a 15% clock improvement for SPR should give roughly 80,000 points total. That's an improvement in 2S throughput over Ice Lake, but at the expense of 40% more cores for about 8% more throughput. That still doesn't sound right, and if the understood specs of the processor are correct and it has a base frequency of 2.0GHz, it isn't really going to be an improvement if SPR can't consistently boost above its base frequency.
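For anyone who wants to check that arithmetic, here is a rough sketch in Python. It uses only the numbers already quoted in this thread (70,000 for the 2S SPR run, 74,630 for the 2S Ice Lake 8380 run, 56 vs. 40 cores per socket) and assumes the score scales linearly with clock, which is the same simplification made above. The function and variable names are just for illustration.

```python
def scale_score(score, observed_ghz, target_ghz):
    """Scale a benchmark score linearly with clock frequency (a simplifying assumption)."""
    return score * target_ghz / observed_ghz

spr_score = 70_000   # 2S SPR result reported in this thread
icl_score = 74_630   # 2S Ice Lake 8380 result from the Tom's Hardware link
spr_cores = 2 * 56   # cores in the 2S SPR system
icl_cores = 2 * 40   # cores in the 2S Ice Lake system

# What SPR would score if it ran at Ice Lake's 2.3GHz base instead of 2.0GHz.
spr_at_2_3 = scale_score(spr_score, observed_ghz=2.0, target_ghz=2.3)

throughput_gain = spr_at_2_3 / icl_score - 1   # fraction, e.g. 0.08 means 8%
core_gain = spr_cores / icl_cores - 1

print(f"SPR scaled to 2.3GHz: {spr_at_2_3:,.0f} points")
print(f"Throughput vs Ice Lake: {throughput_gain:+.1%}")
print(f"Core count vs Ice Lake: {core_gain:+.1%}")
```

If the workload is not purely frequency-bound, the scaled number is an upper bound rather than a prediction.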

Looking at the bench, it seems to me that the likely issue is that SPR is running at its max power draw or is thermally limited to its base frequency (at least according to what I believe the UEFI is seeing). Given all of the uncore on each SPR tile, and the cost of the interconnect between the tiles, it is not unreasonable for the uncore power draw to be much higher than Ice Lake's. From past benchmarking, the 10nm ESF process advantage over 10nm+ was more apparent at higher frequencies and should pay dividends for boost performance, which we see in the desktop space. At the lower frequencies that servers operate at, this shouldn't be as pronounced, at least as I understand it. That being the case, near-idle power draw on SPR should be higher than Ice Lake's for any given performance tier, and a fully loaded run like the one we see here should really exacerbate the issue speculated above. This should be more of an issue for the workstation space than for the generic server space, as I don't usually see servers with every core pegged at 100% usage constantly. That leads me to believe that, while we do see SPR underperforming a bit here, for actual server workloads we may see SPR significantly outstrip Ice Lake in many areas, given its improved I/O, its likely better boosting behavior, the greater number of physical cores and threads available, and the larger amount of processor cache.

The disappointing part of the score is that Golden Cove is supposed to be a big update over Sunny Cove, so I would have expected a greater per-clock improvement than what we're seeing here...

Again, this is all just speculation on my part...
 

DrMrLordX

Lifer
Apr 27, 2000
19,972
8,955
136
The disappointing part of the score is that Golden Cove is supposed to be a big update over Sunny Cove, so I would have expected a greater per-clock improvement than what we're seeing here...
If the leakers could reduce thread count and use something like process lasso to restrict the application to one tile on one socket, it might prove to be enlightening.
 
  • Like
Reactions: igor_kavinski

nicalandia

Platinum Member
Jan 10, 2019
2,185
3,253
106
If the leakers could reduce thread count and use something like process lasso to restrict the application to one tile on one socket, it might prove to be enlightening.
They can't, not on a developer MB and beta BIOS. Those features are hard-locked on those. YuuKi_AnS has reported that to me multiple times after I nagged him about turning SMT off or just running one CPU per MB.

He is currently working on 4S SPR-SP. I am waiting on screenshots to update this group.
 

DrMrLordX

Lifer
Apr 27, 2000
19,972
8,955
136
They can't, not on a developer MB and beta BIOS. Those features are hard-locked on those. YuuKi_AnS has reported that to me multiple times after I nagged him about turning SMT off or just running one CPU per MB.

He is currently working on 4S SPR-SP. I am waiting on screenshots to update this group.
No, I mean do it in software. Cinebench, if I recall, lets you customize the number of threads it spawns. Then you just set affinity.

edit: so set it to spawn 14 threads, and then make sure they're all resident on one tile.
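
To make the "14 threads on one tile" idea concrete, here is a minimal sketch in Python. It assumes 4 tiles of 14 cores per socket (matching the 56-core parts discussed here) and, more importantly, assumes that cores are numbered contiguously per tile, which is not confirmed for SPR and would need checking against the real topology. `tile_cores` and `pin_current_process` are hypothetical helper names; `os.sched_setaffinity` is a real call but exists only on Linux, so on Windows you would feed the same core list to Task Manager or Process Lasso instead.

```python
import os

def tile_cores(tile, cores_per_tile=14, tiles_per_socket=4, socket=0):
    """Return the core indices of one tile, ASSUMING contiguous per-tile numbering."""
    if not 0 <= tile < tiles_per_socket:
        raise ValueError("tile index out of range")
    first = (socket * tiles_per_socket + tile) * cores_per_tile
    return list(range(first, first + cores_per_tile))

def pin_current_process(cpus):
    """Pin this process to the given CPUs; returns False where that is not possible."""
    if not hasattr(os, "sched_setaffinity"):   # not available on Windows/macOS
        return False
    wanted = set(cpus) & os.sched_getaffinity(0)
    if not wanted:                             # none of the requested CPUs exist here
        return False
    os.sched_setaffinity(0, wanted)
    return True

cores = tile_cores(tile=0)   # first tile of socket 0
print("Affinity list for one tile:", cores)
```

With the benchmark limited to 14 threads and pinned to that list, cross-tile and cross-socket traffic should drop out of the picture, which is the point of the experiment.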
 

diediealldie

Member
May 9, 2020
63
55
61
(My random musings...)

I have to think that there is something seriously wrong in that setup. That's under-performing a dual Ice Lake 8380 setup (80 cores total, 160 threads) which achieves 74630 (https://www.tomshardware.com/news/amds-epyc-milan-breaks-cinebench-record-heres-a-10nm-ice-lake-xeon-comparison). Either Intel has drastically dropped the ball in some unprecedented way, or there's something hamstringing this setup behind the scenes in ways that we can't see.
Yeah. If SPR were really that underwhelming, they could have shifted all capacity to Ice Lake Xeons (10ESF is expensive), except for the 4S and 8S variants needed to replace old Skylakes. I think SPR will show much better performance, although it has already missed its window of opportunity a bit (where SPR could be the performance king before Genoa comes out).

Maybe thread scaling (performance scaling vs. worker thread count) can show us a better picture? Not sure.
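
A thread-scaling check could be as simple as the sketch below: run the benchmark at several thread counts and compare each score against perfect linear scaling from the smallest run. The `sample` numbers are made-up placeholders purely to exercise the function, not measurements from any CPU, and `scaling_efficiency` is a hypothetical helper name.

```python
def scaling_efficiency(scores):
    """Map thread count -> efficiency relative to linear scaling from the smallest run.

    `scores` maps thread count to benchmark score. 1.0 means perfect scaling.
    """
    base_threads = min(scores)
    per_thread = scores[base_threads] / base_threads
    return {n: s / (per_thread * n) for n, s in sorted(scores.items())}

# Placeholder inputs only; replace with real scores from runs at each thread count.
sample = {1: 100.0, 2: 190.0, 4: 340.0}
eff = scaling_efficiency(sample)

for threads, e in eff.items():
    print(f"{threads:>3} threads: {e:.0%} of linear scaling")
```

A sharp drop in efficiency at the point where threads spill onto a second tile or a second socket would point at the interconnect rather than the cores.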
 

IntelUser2000

Elite Member
Oct 14, 2003
8,382
3,333
136
Intel's Ponte Vecchio has a DP FP performance rating of 52 TFLOPS. If they release it, they'll take leadership from both Hopper and MI200. It needs to, as it's very complex to manufacture.
 

TESKATLIPOKA

Senior member
May 1, 2020
739
841
106
Hi, guys.
If anyone has an i9-12900, please test it in CBR23 from 1 thread up to 16 threads and upload the results here or in the Zen 4 thread. Many thanks.

P.S. Actually, it can be up to 24 threads, so we can see how much SMT (HT) provides.
 

Timmah!

Golden Member
Jul 24, 2010
1,052
276
136
This 13900KS Does 42,000 points. Very HEDT
Meh, no CPU with just 8 "big" cores is HEDT in my book; no amount of little cores on top of those, helping it do well in CB, is going to change that. Scoring on par with two-year-old TRs is irrelevant. If CB supported GPU rendering, you could hypothetically run it on the iGPU and maybe do equally well (assuming the iGPU is not too crappy), and you would not call it HEDT because of that.
 
