
Discussion Intel current and future Lakes & Rapids thread

Page 470

DrMrLordX

Lifer
Apr 27, 2000
17,657
6,651
136
My point was simply that the claims of MTL in Q4'23-Q1'24 are, at this point, complete speculation, given Intel's last statement on the matter (and after the 7nm delay) is that they'd ship a year earlier than that.

When will MTL actually ship? Darned if I know. Probably Q1'23 at best. But I feel like if someone is going to assert a full year delay from the official timeline, they should at least justify that claim.
I think some of us are still feeling sore over Intel's infamous 10nm launch in 2017. Nobody wants Meteor Lake to be another Cannonlake, yet . . .

Then I got hammered for being too pro-Intel in an Intel thread. ¯\_(ツ)_/¯
Fair enough.

I know you still can't buy Ice Lake-SP. Honestly though, 90% of everything on Dell, HP, and Lenovo's sites right now is shipping Aug-Oct. It's frustrating.
Well, the problem is . . . if you tried buying Ice Lake-SP rackmounts two months ago, you still got an August delivery. There's really never been an earlier delivery date for Ice Lake-SP rackmounts from Dell, or anyone else that I can tell, unless you were a hyperscaler that got early shipments. Everything's delayed for various reasons, but a 1-2 month delay on a Cascade Lake-SP system ordered in March meant April, not August. Does that make sense?


If you're not talking about the latter, then yes, Meteor Lake lead die PRQ will probably take place during H1'23, making Bob Swan's statement to investors 100% truthful.
Why am I reminded of BK talking about Cannonlake launching in 2017?
 
  • Like
Reactions: Tlh97

Gideon

Golden Member
Nov 27, 2007
1,429
2,875
136

So according to this, Sapphire Rapids has 26 cores per socket at about 2.7 GHz. The power consumption of a server with 2 of these inside, including DDR5 and all the rest, is ... 800W o_O

EDIT:

It obviously also depends on how much RAM the system has.

The following comparison is obviously apples to oranges but IMO significant enough that I just can't not make it:

I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.

Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!

If this is anywhere near accurate and goes against 5nm 96 core Genoa (and the 128 Core successor) ....
 

Gideon

Golden Member
Nov 27, 2007
1,429
2,875
136
I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.
Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!
To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.

Then again it's AVX2 vs AVX-512, and Sapphire Rapids won't be facing Milan most of its product cycle.

BTW, I'm certainly not saying Sapphire Rapids will be useless against Milan, far from it. Stuff requiring PCIe 5.0, DDR5, HBM alone will win quite a few workloads hands down (even against V-cache when they scale to tens of GB range or above). It will also be the only game in town for workloads that use AVX-512 and AMX instructions extensively.

But in general purpose compute and virtual servers ... it will do quite badly if this info is accurate.

I mean, just look how well Ice Lake fares in the updated Milan review AnandTech did:


The most interesting comparisons today were pitting the 24- and 16-core Milan parts against Intel’s newest 28-core Xeon 6330 based on the new Ice Lake SP microarchitecture. The AMD parts are also in the same price range to Intel’s chip, at $2010 and $1565 versus $1894. The 16-core chip actually mostly matches the performance of the 28-competitor in many workloads while still showcasing a large per-thread performance advantage, while the 24-core part, being 6% more expensive, more notably showcases both large total +26% throughput and large +47% per-thread performance leadership. Database workloads are admittedly still AMD’s weakness here, but in every other scenario, it’s clear which is the better value proposition.
 

uzzi38

Golden Member
Oct 16, 2019
1,753
3,491
106

So according to this, Sapphire Rapids has 26 cores per socket at about 2.7 GHz. The power consumption of a server with 2 of these inside, including DDR5 and all the rest, is ... 800W o_O

EDIT:

It obviously also depends on how much RAM the system has.

The following comparison is obviously apples to oranges but IMO significant enough that I just can't not make it:

I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.

Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!

If this is anywhere near accurate and goes against 5nm 96 core Genoa (and the 128 Core successor) ....
For that sustained throughput the clocks would have to be absurdly high. Like 3.5GHz.
 
  • Like
Reactions: Tlh97 and Gideon

Mopetar

Diamond Member
Jan 31, 2011
6,116
2,937
136
To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.

Then again it's AVX2 vs AVX-512, and Sapphire Rapids won't be facing Milan most of its product cycle.
Really that's about the best case scenario for Intel though. They have a few other areas where they stack up well or even win out aside from AVX, but in almost anything that scales well with cores the Epyc system is going to leverage that massive core advantage that it has.

There were some rumors that Zen4 would have AVX-512, and even if anyone would say there's reason to doubt that, I think it's also fair to think that AMD would have an easier time adding AVX-512 (even if it isn't as good as Intel's implementation) to their chips than Intel will have matching AMD on core count.
 

Gideon

Golden Member
Nov 27, 2007
1,429
2,875
136
There were some rumors that Zen4 would have AVX-512, and even if anyone would say there's reason to doubt that, I think it's also fair to think that AMD would have an easier time adding AVX-512 (even if it isn't as good as Intel's implementation) to their chips than Intel will have matching AMD on core count.
I'm quite confident AMD will add new vector extensions in Zen4 even if the units themselves will remain 256bit wide. They have supported AVX2 since Carrizo (2015) after all.

And while I do hope they'll support AVX512 (at least up to 256bit wide instructions since there are many good new ones, but hopefully 512bit with microcode as well), what I really hope for is that they are brave enough to introduce an x86 SVE2 analog (calling it FPV or whatever). AVX-512, while offering many necessary updates to AVX2, is insanity compared to SVE IMHO. And the latter will produce binaries that will "just work" with future processors, be their vector units 1024 or 2048 bit wide.
 

NTMBK

Diamond Member
Nov 14, 2011
9,359
2,813
136
I'm quite confident AMD will add new vector extensions in Zen4 even if the units themselves will remain 256bit wide. They have supported AVX2 since Carrizo (2015) after all.

And while I do hope they'll support AVX512 (at least up to 256bit wide instructions since there are many good new ones, but hopefully 512bit with microcode as well), what I really hope for is that they are brave enough to introduce an x86 SVE2 analog (calling it FPV or whatever). AVX-512, while offering many necessary updates to AVX2, is insanity compared to SVE IMHO. And the latter will produce binaries that will "just work" with future processors, be their vector units 1024 or 2048 bit wide.
If AMD were going to try to push their own vector ISA, they should have done it on Zen 2 and put it in both next gen consoles. That would have put adoption through the roof.

As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.
 

yuri69

Member
Jul 16, 2013
136
161
116
As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.
This. AMD open source initiatives lag horribly behind the actual product launches. Good luck going against Intel.
 

Abwx

Diamond Member
Apr 2, 2011
9,230
1,111
126
If AMD were going to try to push their own vector ISA, they should have done it on Zen 2 and put it in both next gen consoles. That would have put adoption through the roof.

As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.
Methinks you didn't understand that he's talking about implementing AVX512, possibly using two 256b FP units.
 

Exist50

Senior member
Aug 18, 2016
319
341
136
In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.
 

jpiniero

Diamond Member
Oct 1, 2010
9,923
2,270
136
In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.
Given that AVX-512 is in many cases the sole reason a customer bought Xeon over something else, I doubt it.
 
  • Like
Reactions: NTMBK

Exist50

Senior member
Aug 18, 2016
319
341
136
Given that AVX-512 is in many cases the sole reason a customer bought Xeon over something else, I doubt it.
For how many is that the case though? For all Intel includes it in marketing, very, very few real world workloads are decided by AVX-512.

More to the point, this is about long term strategy. Business that's lost to AMD can still be won back. But an ecosystem shift to ARM may well be irreversible.
 
  • Like
Reactions: spursindonesia

IntelUser2000

Elite Member
Oct 14, 2003
7,586
2,437
136
To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.
That's theoretical throughput. So if this SPR leak is actual throughput, there's a potential 20% gain to be had, assuming 85%. The clock will have to exceed 4GHz to get that kind of Linpack flops.

5.8TFlops using 52 cores:
-Theoretical: 3.5GHz
-95%: 3.68GHz
-90%: 3.89GHz
-85%: 4.11GHz

This could be an ultra high clocked part with low amount of cores. In itself it doesn't tell much about Sapphire Rapids.

At 4GHz, under a matrix multiplication load fully using the double AVX-512 FMA units, it exceeds the peak Turbo frequency of any Ice Lake-SP chip out there. In fact, only the super high frequency parts of the Cascade Lake generation go over 4GHz.
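
For anyone who wants to replay the clock estimates above, here is a rough Python sketch. It assumes 32 double-precision flops per core per cycle (8 DP lanes per 512-bit vector x 2 FMA units x 2 flops per FMA) and uses 5824 GFlops as the target, i.e. the "5.8TFlops" figure at exactly 3.5GHz. The function name and parameters are made up for illustration, not taken from any leak or tool.

```python
def required_clock_ghz(target_gflops, cores, flops_per_cycle, efficiency):
    """Clock (GHz) needed to sustain target_gflops at a given Linpack efficiency.

    efficiency is the fraction of theoretical peak actually achieved (0 < e <= 1).
    """
    return target_gflops / (cores * flops_per_cycle * efficiency)


# Assumption: 8 DP lanes x 2 FMA units x 2 flops per FMA = 32 flops/cycle/core.
FLOPS_PER_CYCLE = 8 * 2 * 2

for eff in (1.00, 0.95, 0.90, 0.85):
    clock = required_clock_ghz(5824, 52, FLOPS_PER_CYCLE, eff)
    print(f"{eff:.0%} efficiency: {clock:.2f} GHz")
```

With normal rounding the 85% case comes out at about 4.12GHz rather than the 4.11GHz listed above; the conclusion is the same either way.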
 
  • Like
Reactions: lightmanek

Exist50

Senior member
Aug 18, 2016
319
341
136
That's theoretical throughput. So if this SPR leak is actual throughput, there's a potential 20% gain to be had, assuming 85%. The clock will have to exceed 4GHz to get that kind of Linpack flops.

5.8TFlops using 52 cores:
-Theoretical: 3.5GHz
-95%: 3.68GHz
-90%: 3.89GHz
-85%: 4.11GHz

This could be an ultra high clocked part with low amount of cores. In itself it doesn't tell much about Sapphire Rapids.

At 4GHz, under a matrix multiplication load fully using the double AVX-512 FMA units, it exceeds the peak Turbo frequency of any Ice Lake-SP chip out there. In fact, only the super high frequency parts of the Cascade Lake generation go over 4GHz.
Those numbers are using AVX, right? Could linpack not just be leveraging AMX/TMUL? Would explain the high scores.
 

IntelUser2000

Elite Member
Oct 14, 2003
7,586
2,437
136
Those numbers are using AVX, right? Could linpack not just be leveraging AMX/TMUL? Would explain the high scores.
From their website:
What numerical precision is required to run and benchmark and gain an entry in the Linpack Benchmark report?
In order to have an entry included in the Linpack Benchmark report the results must be computed using full precision. By full precision we generally mean 64 bit floating point arithmetic or higher. Note that this is not an issue of single or double precision as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.

If it was AMX or something else that result wouldn't be bad, it would be considered atrocious. I understand they screwed up in the past few years but there has to be some realism in coming to conclusions.
 

tamz_msc

Platinum Member
Jan 5, 2017
2,896
2,628
136
From their website:
What numerical precision is required to run and benchmark and gain an entry in the Linpack Benchmark report?
In order to have an entry included in the Linpack Benchmark report the results must be computed using full precision. By full precision we generally mean 64 bit floating point arithmetic or higher. Note that this is not an issue of single or double precision as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.

If it was AMX or something else that result wouldn't be bad, it would be considered atrocious. I understand they screwed up in the past few years but there has to be some realism in coming to conclusions.
Linpack would be a prime candidate for acceleration through AMX/TMUL though. Unless Intel added two more 512-bit wide units (for a total of four) compared to two in Sky/Cascade/Ice Lake, I see no other way they can achieve 5.8 TFLOP/s.
 

IntelUser2000

Elite Member
Oct 14, 2003
7,586
2,437
136
Linpack would be a prime candidate for acceleration through AMX/TMUL though. Unless Intel added two more 512-bit wide units (for a total of four) compared to two in Sky/Cascade/Ice Lake, I see no other way they can achieve 5.8 TFLOP/s.
Linpack is meant for general purpose supercomputers.

You forgot something in your calculation. Probably FMA. Let me show you:

52 cores x 3.5GHz x 8 DP Flops/FPU (512/64) x 2 FPUs x 2 (FMA) = 5824 GFlops

Core 2 = 128-bit vectors
Sandy Bridge = 256-bit vectors
Haswell = FMA
Skylake = 512-bit vectors
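
The same arithmetic as a small Python sketch, in case anyone wants to plug in other parts. The function and parameter names are hypothetical. The Milan line is an assumption on my side: it takes two 256-bit FMA units per Zen 3 core and the 2 GHz base clock of the 7713 mentioned earlier in the thread.

```python
def peak_gflops(cores, clock_ghz, vector_bits, fma_units, element_bits=64):
    """Theoretical peak GFlops: cores x clock x lanes x FMA units x 2.

    An FMA counts as 2 flops (one multiply plus one add) per lane per cycle.
    """
    lanes = vector_bits // element_bits  # e.g. 512 / 64 = 8 DP lanes
    return cores * clock_ghz * lanes * fma_units * 2


# 2 x 26-core Sapphire Rapids at 3.5GHz with two 512-bit FMA units per core
print(peak_gflops(52, 3.5, 512, 2))   # 5824.0

# Assumed: 2 x Epyc 7713 (128 cores, 2.0GHz base), two 256-bit FMA units per core
print(peak_gflops(128, 2.0, 256, 2))  # 4096.0
```

The second result lines up with the "about 4 TF/s" quoted for the 2 x 7713 earlier, which suggests that number was a theoretical peak at base clock rather than a measured Linpack score.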
 

jpiniero

Diamond Member
Oct 1, 2010
9,923
2,270
136
To catch up: Alder Lake-S lead product is 8+8 coming this year. Coming next year is 6+0?
There's a rumor that the 6+0 got cancelled in favor of more Rocket Lake. We'll have to see if that actually happens but you shouldn't count on it.
 

tamz_msc

Platinum Member
Jan 5, 2017
2,896
2,628
136
Linpack is meant for general purpose supercomputers.

You forgot something in your calculation. Probably FMA. Let me show you:

52 cores x 3.5GHz x 8 DP Flops/FPU (512/64) x 2 FPUs x 2 (FMA) = 5824 GFlops

Core 2 = 128-bit vectors
Sandy Bridge = 256-bit vectors
Haswell = FMA
Skylake = 512-bit vectors
Oh yeah, you're right.
 
  • Like
Reactions: IntelUser2000

Gideon

Golden Member
Nov 27, 2007
1,429
2,875
136
In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.
Agreed, that would be best. The problem is Intel and AMD would then want to release processors with support at approximately the same time, so even if such collaboration took place, the result would not bear fruit before Zen 5 or later. I still remember when AMD made Bulldozer support FMA4, as they thought that would be what Intel would adopt. Intel went with FMA3 instead to have parity with AMD; in the end AMD also deprecated FMA4 pretty quickly, supporting FMA3 since Piledriver.

Still, considering AMD went from 128 bit -> 256 bit with the transition from 12nm -> 7nm, it would be prime time for another FP unit shakeup with Zen 4 during the shrink to 5nm. I would prefer if they added more 256 bit units though, rather than creating a giant 512 bit one, but we shall see.
 
  • Like
Reactions: scineram and Tlh97

lobz

Golden Member
Feb 10, 2017
1,767
2,284
136
Well, Linpack is not GeekBench. Which means, SPR could very well end up being the real deal, I just wish Intel gets rid of at least SOME of the artificial segmentation.

@Zucker2k please close your eyes and pretend this comment doesn't exist. I don't want your entire world view to collapse.
 
