Discussion Intel current and future Lakes & Rapids thread


DrMrLordX

Lifer
Apr 27, 2000
21,629
10,841
136
My point was simply that the claims of MTL in Q4'23-Q1'24 are, at this point, complete speculation, given Intel's last statement on the matter (and after the 7nm delay) is that they'd ship a year earlier than that.

When will MTL actually ship? Darned if I know. Probably Q1'23 at best. But I feel like if someone is going to assert a full year delay from the official timeline, they should at least justify that claim.

I think some of us are still feeling sore over Intel's infamous 10nm launch in 2017. Nobody wants Meteor Lake to be another Cannonlake, yet . . .

Then I got hammered for being too pro-Intel in an Intel thread. ¯\_(ツ)_/¯

Fair enough.

I know you still can't buy Ice Lake-SP. Honestly though, 90% of everything on Dell, HP, and Lenovo's sites right now is shipping Aug-Oct. It's frustrating.

Well problem is . . . like, if you tried buying IceLake-SP rackmounts two months ago, you still got an August delivery. There's really never been an earlier delivery date for IceLake-SP rackmounts from Dell, or anyone else that I can tell, unless you were a hyperscaler that got early shipments. Everything's delayed for various reasons, but 1-2 months delay on a Cascade Lake-SP system in March was April, not August. Does that make sense?


If you're not talking about the latter, then yes, Meteor Lake lead die PRQ will probably take place during H1'23, making Bob Swan's statement to investors 100% truthful.

Why am I reminded of BK talking about Cannonlake launching in 2017?
 

Gideon

Golden Member
Nov 27, 2007
1,637
3,672
136

So according to this Sapphire Rapids has 26 cores per socket at about 2.7GHz. The power consumption of a server with 2 of these inside, including DDR5 and all the rest is ... 800W o_O

EDIT:

It obviously also depends on how much RAM the system has.

The following comparison is obviously apples to oranges but IMO significant enough that I just can't not make it:

I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.

Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!

If this is anywhere near accurate and goes against 5nm 96 core Genoa (and the 128 Core successor) ....
 

Gideon

Golden Member
Nov 27, 2007
1,637
3,672
136
I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.
Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!

To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.

Then again it's AVX2 vs AVX-512, and Sapphire Rapids won't be facing Milan most of its product cycle.

BTW, I'm certainly not saying Sapphire Rapids will be useless against Milan, far from it. Stuff requiring PCIe 5.0, DDR5, HBM alone will win quite a few workloads hands down (even against V-cache when they scale to tens of GB range or above). It will also be the only game in town for workloads that use AVX-512 and AMX instructions extensively.

But in general purpose compute and virtual servers ... it will do quite badly if this info is accurate.

I mean, just look how well Ice Lake fares in the updated Milan Review Anandtech did:


The most interesting comparisons today were pitting the 24- and 16-core Milan parts against Intel’s newest 28-core Xeon 6330 based on the new Ice Lake SP microarchitecture. The AMD parts are also in the same price range as Intel’s chip, at $2010 and $1565 versus $1894. The 16-core chip actually mostly matches the performance of the 28-core competitor in many workloads while still showcasing a large per-thread performance advantage, while the 24-core part, being 6% more expensive, more notably showcases both large total +26% throughput and large +47% per-thread performance leadership. Database workloads are admittedly still AMD’s weakness here, but in every other scenario, it’s clear which is the better value proposition.
 

uzzi38

Platinum Member
Oct 16, 2019
2,629
5,935
146

So according to this Sapphire Rapids has 26 cores per socket at about 2.7GHz. The power consumption of a server with 2 of these inside, including DDR5 and all the rest is ... 800W o_O

EDIT:

It obviously also depends on how much RAM the system has.

The following comparison is obviously apples to oranges but IMO significant enough that I just can't not make it:

I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.

Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!

If this is anywhere near accurate and goes against 5nm 96 core Genoa (and the 128 Core successor) ....
For that sustained throughput the clocks would have to be absurdly high. Like 3.5GHz.
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.

Then again it's AVX2 vs AVX-512, and Sapphire Rapids won't be facing Milan most of its product cycle.

Really that's about the best case scenario for Intel though. They have a few other areas where they stack up well or even win out aside from AVX, but in almost anything that scales well with cores the Epyc system is going to leverage that massive core advantage that it has.

There were some rumors that Zen4 would have AVX-512 and even if anyone would say there's reason to doubt that, I think it's also fair to think that AMD would have an easier time adding AVX-512 (even if it isn't as good as Intel's implementation) to their chips than Intel will have matching AMD on core count.
 

Gideon

Golden Member
Nov 27, 2007
1,637
3,672
136
There were some rumors that Zen4 would have AVX-512 and even if anyone would say there's reason to doubt that, I think it's also fair to think that AMD would have an easier time adding AVX-512 (even if it isn't as good as Intel's implementation) to their chips than Intel will have matching AMD on core count.

I'm quite confident AMD will add new vector extensions in Zen4 even if the units themselves will remain 256bit wide. They have supported AVX2 since Carrizo (2015) after all.

And while I do hope they'll support AVX512 (at least up to 256bit wide instructions since there are many good new ones, but hopefully 512bit with microcode as well), what I really hope for is that they are brave enough to introduce an x86 SVE2 analog (calling it FPV or whatever). AVX-512, while offering many necessary updates to AVX2, is insanity compared to SVE IMHO. And the latter will produce binaries that will "just work" with future processors, be their vector units 1024 or 2048 bit wide.
 

NTMBK

Lifer
Nov 14, 2011
10,237
5,019
136
I'm quite confident AMD will add new vector extensions in Zen4 even if the units themselves will remain 256bit wide. They have supported AVX2 since Carrizo (2015) after all.

And while I do hope they'll support AVX512 (at least up to 256bit wide instructions since there are many good new ones, but hopefully 512bit with microcode as well), what I really hope for is that they are brave enough to introduce an x86 SVE2 analog (calling it FPV or whatever). AVX-512, while offering many necessary updates to AVX2, is insanity compared to SVE IMHO. And the latter will produce binaries that will "just work" with future processors, be their vector units 1024 or 2048 bit wide.

If AMD were going to try to push their own vector ISA, they should have done it on Zen 2 and put it in both next gen consoles. That would have put adoption through the roof.

As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.
 

yuri69

Senior member
Jul 16, 2013
388
619
136
As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.
This. AMD's open source initiatives lag horribly behind the actual product launches. Good luck going against Intel.
 

Abwx

Lifer
Apr 2, 2011
10,947
3,457
136
If AMD were going to try to push their own vector ISA, they should have done it on Zen 2 and put it in both next gen consoles. That would have put adoption through the roof.

As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.

Methinks you didn't understand that he's talking about implementing AVX512, possibly using two 256b FP units.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.
 

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.

Given that AVX-512 is in many cases the sole reason a customer bought Xeon over something else, I doubt it.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Given that AVX-512 is in many cases the sole reason a customer bought Xeon over something else, I doubt it.

For how many is that the case though? For all Intel includes it in marketing, very, very few real world workloads are decided by AVX-512.

More to the point, this is about long term strategy. Business that's lost to AMD can still be won back. But an ecosystem shift to ARM may well be irreversible.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.

That's theoretical throughput. So if this SPR leak is actual throughput, there's a potential 20% gain to be had, assuming 85% efficiency. The clock will have to exceed 4GHz to get that kind of Linpack flops.

5.8TFlops using 52 cores:
-Theoretical: 3.5GHz
-95%: 3.68GHz
-90%: 3.89GHz
-85%: 4.11GHz
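
The clocks in that list are just the theoretical clock divided by the assumed Linpack efficiency. A rough sketch of the arithmetic, assuming 32 DP flops per cycle per core (two AVX-512 FMA units) and the leaked 52 cores / 5.8 TFlops; the function names are made up for illustration:

```python
def theoretical_clock_ghz(target_gflops, cores, flops_per_cycle):
    """Clock needed if every FMA slot were filled every cycle (100% efficiency)."""
    return target_gflops / (cores * flops_per_cycle)


def required_clock_ghz(theoretical_ghz, efficiency):
    """Clock needed when Linpack only reaches `efficiency` of theoretical peak."""
    return theoretical_ghz / efficiency


# Assumption: 52 cores, 32 DP flops/cycle/core, 5800 GFlops from the leak.
exact = theoretical_clock_ghz(5800, 52, 32)
print(f"Theoretical: {exact:.2f} GHz (rounded to 3.5 GHz in the list)")

# The list starts from the rounded 3.5 GHz figure, so do the same here.
for eff in (0.95, 0.90, 0.85):
    print(f"{eff:.0%}: {required_clock_ghz(3.5, eff):.2f} GHz")
```

This lands within rounding of the figures in the list; either way, anything below roughly 87% efficiency pushes the required clock past 4GHz.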

This could be an ultra high clocked part with a low number of cores. In itself it doesn't tell much about Sapphire Rapids.

At 4GHz, under a matrix multiplication load fully using the double AVX-512 FMA units, it exceeds the peak Turbo frequency of any Ice Lake-SP chip out there. In fact, only the super high frequency parts in the Cascade Lake generation go over 4GHz.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
That's theoretical throughput. So if this SPR leak is actual throughput, there's a potential 20% gain to be had, assuming 85% efficiency. The clock will have to exceed 4GHz to get that kind of Linpack flops.

5.8TFlops using 52 cores:
-Theoretical: 3.5GHz
-95%: 3.68GHz
-90%: 3.89GHz
-85%: 4.11GHz

This could be an ultra high clocked part with a low number of cores. In itself it doesn't tell much about Sapphire Rapids.

At 4GHz, under a matrix multiplication load fully using the double AVX-512 FMA units, it exceeds the peak Turbo frequency of any Ice Lake-SP chip out there. In fact, only the super high frequency parts in the Cascade Lake generation go over 4GHz.

Those numbers are using AVX, right? Could linpack not just be leveraging AMX/TMUL? Would explain the high scores.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Those numbers are using AVX, right? Could linpack not just be leveraging AMX/TMUL? Would explain the high scores.

From their website:
What numerical precision is required to run and benchmark and gain an entry in the Linpack Benchmark report?
In order to have an entry included in the Linpack Benchmark report the results must be computed using full precision. By full precision we generally mean 64 bit floating point arithmetic or higher. Note that this is not an issue of single or double precision as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.

If it was AMX or something else that result wouldn't be bad, it would be considered atrocious. I understand they screwed up in the past few years but there has to be some realism in coming to conclusions.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,795
3,626
136
From their website:
What numerical precision is required to run and benchmark and gain an entry in the Linpack Benchmark report?
In order to have an entry included in the Linpack Benchmark report the results must be computed using full precision. By full precision we generally mean 64 bit floating point arithmetic or higher. Note that this is not an issue of single or double precision as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.

If it was AMX or something else that result wouldn't be bad, it would be considered atrocious. I understand they screwed up in the past few years but there has to be some realism in coming to conclusions.
Linpack would be a prime candidate for acceleration through AMX/TMUL though. Unless Intel added two more 512-bit wide units (for a total of four) compared to two in Sky/Cascade/Ice Lake, I see no other way they can achieve 5.8 TFLOP/s.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Linpack would be a prime candidate for acceleration through AMX/TMUL though. Unless Intel added two more 512-bit wide units (for a total of four) compared to two in Sky/Cascade/Ice Lake, I see no other way they can achieve 5.8 TFLOP/s.

Linpack is meant for general purpose supercomputers.

You forgot something in your calculation. Probably FMA. Let me show you:

52 cores x 3.5GHz x 8 DP Flops/FPU (512/64) x 2 FPUs x 2 (FMA) = 5824 GFlops

Core 2 = 128-bit vectors
Sandy Bridge = 256-bit vectors
Haswell = FMA
Skylake = 512-bit vectors
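
For anyone who wants to plug in their own numbers, here is the same calculation as a small sketch. The per-generation entries are the commonly quoted peak figures (two FP pipes per core, add + multiply before FMA, two FMA pipes after), so treat them as assumptions rather than anything from the leak; the names are made up for illustration:

```python
def dp_flops_per_cycle(vector_bits, units, fma):
    """Peak double-precision flops per cycle for one core."""
    lanes = vector_bits // 64          # 64-bit doubles per vector
    ops_per_lane = 2 if fma else 1     # an FMA counts as multiply + add
    return lanes * units * ops_per_lane


def peak_gflops(cores, ghz, flops_per_cycle):
    """Theoretical peak, before any Linpack efficiency loss."""
    return cores * ghz * flops_per_cycle


# Assumed peak configurations per generation (2 FP pipes per core each).
GENERATIONS = {
    "Core 2":       dict(vector_bits=128, units=2, fma=False),
    "Sandy Bridge": dict(vector_bits=256, units=2, fma=False),
    "Haswell":      dict(vector_bits=256, units=2, fma=True),
    "Skylake-SP":   dict(vector_bits=512, units=2, fma=True),
}

for name, cfg in GENERATIONS.items():
    print(f"{name}: {dp_flops_per_cycle(**cfg)} DP flops/cycle/core")

# The 52-core, 3.5GHz case from the calculation above.
skylake = dp_flops_per_cycle(**GENERATIONS["Skylake-SP"])
print(peak_gflops(52, 3.5, skylake), "GFlops")
```

With 32 DP flops per cycle per core, two 512-bit FMA units are enough to reach the 5.8 TFlops figure at 3.5GHz; no extra units or AMX needed.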
 

jpiniero

Lifer
Oct 1, 2010
14,591
5,214
136
To catch up: Alder Lake-S lead product is 8+8 coming this year. Coming next year is 6+0?

There's a rumor that the 6+0 got cancelled in favor of more Rocket Lake. We'll have to see if that actually happens but you shouldn't count on it.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,795
3,626
136
Linpack is meant for general purpose supercomputers.

You forgot something in your calculation. Probably FMA. Let me show you:

52 cores x 3.5GHz x 8 DP Flops/FPU (512/64) x 2 FPUs x 2 (FMA) = 5824 GFlops

Core 2 = 128-bit vectors
Sandy Bridge = 256-bit vectors
Haswell = FMA
Skylake = 512-bit vectors
Oh yeah, you're right.
 