Discussion Intel current and future Lakes & Rapids thread

lobz · Jun 30, 2021

repoman27 said:
I think you're misinterpreting my tone.

That may have happened, absolutely. If yes, I'm sorry.

DrMrLordX · Jun 30, 2021

Exist50 said:
My point was simply that the claims of MTL in Q4'23-Q1'24 are, at this point, complete speculation, given Intel's last statement on the matter (and after the 7nm delay) is that they'd ship a year earlier than that.

When will MTL actually ship? Darned if I know. Probably Q1'23 at best. But I feel like if someone is going to assert a full year delay from the official timeline, they should at least justify that claim.

I think some of us are still feeling sore over Intel's infamous 10nm launch in 2017. Nobody wants Meteor Lake to be another Cannonlake, yet . . .

repoman27 said:
Then I got hammered for being too pro-Intel in an Intel thread. ¯\_(ツ)_/¯

Fair enough.

I know you still can't buy Ice Lake-SP. Honestly though, 90% of everything on Dell, HP, and Lenovo's sites right now is shipping Aug-Oct. It's frustrating.

Well problem is . . . like, if you tried buying IceLake-SP rackmounts two months ago, you still got an August delivery. There's really never been an earlier delivery date for IceLake-SP rackmounts from Dell, or anyone else that I can tell, unless you were a hyperscaler that got early shipments. Everything's delayed for various reasons, but a 1-2 month delay on a Cascade Lake-SP system ordered in March meant April, not August. Does that make sense?

If you're not talking about the latter, then yes, Meteor Lake lead die PRQ will probably take place during H1'23, making Bob Swan's statement to investors 100% truthful.

Why am I reminded of BK talking about Cannonlake launching in 2017?

Gideon · Jun 30, 2021

https://twitter.com/x/status/1410158472235208709

So according to this Sapphire Rapids has 26 cores per socket at about 2.7GHz. The power consumption of a server with 2 of these inside, including DDR5 and all the rest is ... 800W😵

EDIT:

It obviously also depends on how much RAM the system has.

The following comparison is obviously apples to oranges but IMO significant enough that I just can't not make it:

I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.

Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!

If this is anywhere near accurate and goes against 5nm 96 core Genoa (and the 128 Core successor) ....

DrMrLordX · Jun 30, 2021

Gideon said:
So according to this Sapphire Rapids has 26 cores per socket at about 2.7GHz. The power consumption of a server with 2 of these inside, including DDR5 and all the rest is ... 800W😵

That doesn't seem right somehow.

Gideon · Jun 30, 2021

Gideon said:
I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.
Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!

To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.

Then again it's AVX2 vs AVX-512, and Sapphire Rapids won't be facing Milan for most of its product cycle.

BTW, I'm certainly not saying Sapphire Rapids will be useless against Milan, far from it. Stuff requiring PCIe 5.0, DDR5, HBM alone will win quite a few workloads hands down (even against V-cache when they scale to tens of GB range or above). It will also be the only game in town for workloads that use AVX-512 and AMX instructions extensively.

But in general purpose compute and virtual servers ... it will do quite badly if this info is accurate.

I mean, just look how well Ice Lake fares in the updated Milan review AnandTech did:

The most interesting comparisons today were pitting the 24- and 16-core Milan parts against Intel’s newest 28-core Xeon 6330 based on the new Ice Lake SP microarchitecture. The AMD parts are also in the same price range to Intel’s chip, at $2010 and $1565 versus $1894. The 16-core chip actually mostly matches the performance of the 28-competitor in many workloads while still showcasing a large per-thread performance advantage, while the 24-core part, being 6% more expensive, more notably showcases both large total +26% throughput and large +47% per-thread performance leadership. Database workloads are admittedly still AMD’s weakness here, but in every other scenario, it’s clear which is the better value proposition.

uzzi38 · Jun 30, 2021

Gideon said:
https://twitter.com/x/status/1410158472235208709

So according to this Sapphire Rapids has 26 cores per socket at about 2.7GHz. The power consumption of a server with 2 of these inside, including DDR5 and all the rest is ... 800W😵

EDIT:

It obviously also depends on how much RAM the system has.

The following comparison is obviously apples to oranges but IMO significant enough that I just can't not make it:

I can order a dual-socket Dell PowerEdge 7525 with 2x Epyc 7713 (225W) with 128 cores (instead of 52 though yeah, only with 2 GHz base clock), fit it with up to and including 14 x 64GB 3200MT/s dimms, before it starts to complain that the 800W PSU is not enough.

Bear in mind the PSU definitely has a safety margin, so even with 16 such dimms it absolutely is drawing less than 800W!

If this is anywhere near accurate and goes against 5nm 96 core Genoa (and the 128 Core successor) ....

For that sustained throughput the clocks would have to be absurdly high. Like 3.5GHz.

Mopetar · Jun 30, 2021

Gideon said:
To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.

Then again it's AVX2 vs AVX-512, and Sapphire Rapids won't be facing Milan for most of its product cycle.

Really that's about the best case scenario for Intel though. They have a few other areas where they stack up well or even win out aside from AVX, but in almost anything that scales well with cores the Epyc system is going to leverage that massive core advantage that it has.

There were some rumors that Zen4 would have AVX-512, and even if there's reason to doubt that, I think it's also fair to think that AMD would have an easier time adding AVX-512 (even if it isn't as good as Intel's implementation) to their chips than Intel will have matching AMD on core count.

Gideon · Jun 30, 2021

Mopetar said:
There were some rumors that Zen4 would have AVX-512, and even if there's reason to doubt that, I think it's also fair to think that AMD would have an easier time adding AVX-512 (even if it isn't as good as Intel's implementation) to their chips than Intel will have matching AMD on core count.

I'm quite confident AMD will add new vector extensions in Zen4 even if the units themselves remain 256-bit wide. They have supported AVX2 since Carrizo (2015) after all.

And while I do hope they'll support AVX-512 (at least up to 256-bit wide instructions, since there are many good new ones, but hopefully 512-bit with microcode as well), what I really hope for is that they are brave enough to introduce an x86 SVE2 analog (calling it FPV or whatever). AVX-512, while offering many necessary updates to AVX2, is insanity compared to SVE IMHO. And the latter will produce binaries that will "just work" with future processors, be their vector units 1024 or 2048 bits wide.
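To illustrate the "just work" point, here is a minimal Python sketch of a vector-length-agnostic loop. It is not real SVE code: `vla_add` and the `lanes` parameter are hypothetical names standing in for a hardware vector width that the code only learns at run time, and the `min()` call plays the role of SVE's predicate mask for the tail.

```python
def vla_add(a, b, lanes):
    """Element-wise a + b, processed `lanes` elements at a time."""
    n = len(a)
    out = [0.0] * n
    i = 0
    while i < n:
        # Predicate: only lanes that are still in bounds are active,
        # so no separate scalar tail loop is needed.
        active = min(lanes, n - i)
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes  # the step is whatever width the "hardware" provides
    return out


a = [float(x) for x in range(37)]  # length is not a multiple of any width below
b = [2.0 * x for x in a]
reference = [x + y for x, y in zip(a, b)]

# 64-bit lanes for hypothetical 256-, 512-, 1024- and 2048-bit vector units.
for bits in (256, 512, 1024, 2048):
    assert vla_add(a, b, bits // 64) == reference
print("same result at every vector width")
```

The loop body never hard-codes a width, which is the property a fixed-width ISA such as AVX2 or AVX-512 lacks: there the width is baked into the instructions, so wider units need new code paths.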

NTMBK · Jun 30, 2021

Gideon said:
I'm quite confident AMD will add new vector extensions in Zen4 even if the units themselves remain 256-bit wide. They have supported AVX2 since Carrizo (2015) after all.

And while I do hope they'll support AVX-512 (at least up to 256-bit wide instructions, since there are many good new ones, but hopefully 512-bit with microcode as well), what I really hope for is that they are brave enough to introduce an x86 SVE2 analog (calling it FPV or whatever). AVX-512, while offering many necessary updates to AVX2, is insanity compared to SVE IMHO. And the latter will produce binaries that will "just work" with future processors, be their vector units 1024 or 2048 bits wide.

If AMD were going to try to push their own vector ISA, they should have done it on Zen 2 and put it in both next gen consoles. That would have put adoption through the roof.

As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.

yuri69 · Jun 30, 2021

NTMBK said:
As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.

This. AMD open source initiatives lag horribly behind the actual product launches. Good luck going against Intel.

Abwx · Jun 30, 2021

NTMBK said:
If AMD were going to try to push their own vector ISA, they should have done it on Zen 2 and put it in both next gen consoles. That would have put adoption through the roof.

As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.

Methinks you didn't understand that he's talking about implementing AVX-512, possibly using two 256b FP units.

NTMBK · Jun 30, 2021

Abwx said:
Methinks you didn't understand that he's talking about implementing AVX-512, possibly using two 256b FP units.

I was referring to his idea of having an "x86 SVE".

Exist50 · Jun 30, 2021

In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.

jpiniero · Jun 30, 2021

Exist50 said:
In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.

Given that AVX-512 is in many cases the sole reason a customer bought Xeon over something else, I doubt it.

Exist50 · Jun 30, 2021

jpiniero said:
Given that AVX-512 is in many cases the sole reason a customer bought Xeon over something else, I doubt it.

For how many is that the case though? For all Intel includes it in marketing, very, very few real world workloads are decided by AVX-512.

More to the point, this is about long term strategy. Business that's lost to AMD can still be won back. But an ecosystem shift to ARM may well be irreversible.

IntelUser2000 · Jun 30, 2021

Gideon said:
To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.

That's theoretical throughput. If this SPR leak is measured throughput instead, the theoretical peak would be close to 20% higher, assuming 85% Linpack efficiency, and the clock would have to exceed 4GHz to deliver that kind of Linpack flops.

5.8TFlops using 52 cores:
-Theoretical: 3.5GHz
-95%: 3.68GHz
-90%: 3.89GHz
-85%: 4.11GHz

This could be an ultra high clocked part with low amount of cores. In itself it doesn't tell much about Sapphire Rapids.

At 4GHz, under a matrix multiplication load fully using the double AVX-512 FMA units, it exceeds the peak Turbo frequency of any Ice Lake-SP chip out there. In fact, only the super high frequency parts of the Cascade Lake generation go over 4GHz.
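For anyone who wants to redo the arithmetic, here is a rough Python sketch. The function names are made up for illustration. The assumptions are 16 DP flops per cycle per core for Zen 3 (two 256-bit FMA pipes), 32 for a core with two 512-bit FMA units, and a 2.45GHz base clock for the 7763; the 2GHz base for the 7713 is the figure quoted earlier in the thread.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak: every core retires flops_per_cycle each cycle."""
    return cores * clock_ghz * flops_per_cycle


def required_clock_ghz(target_gflops, cores, flops_per_cycle, efficiency):
    """Clock needed to reach target_gflops at a given Linpack efficiency."""
    return target_gflops / (cores * flops_per_cycle * efficiency)


ZEN3_DP_FLOPS_PER_CYCLE = 16    # assumed: 2 x 256-bit FMA = 2 * 4 lanes * 2
AVX512_DP_FLOPS_PER_CYCLE = 32  # assumed: 2 x 512-bit FMA = 2 * 8 lanes * 2

# Dual-socket Milan at base clock, 128 cores total.
for name, base_ghz in (("2x Epyc 7713", 2.0), ("2x Epyc 7763", 2.45)):
    gflops = peak_gflops(128, base_ghz, ZEN3_DP_FLOPS_PER_CYCLE)
    print(f"{name}: {gflops:.0f} GFlops theoretical")

# Leaked 5.8 TFlops on 52 cores at different Linpack efficiencies.
for efficiency in (1.00, 0.95, 0.90, 0.85):
    ghz = required_clock_ghz(5800, 52, AVX512_DP_FLOPS_PER_CYCLE, efficiency)
    print(f"{efficiency:.0%} efficiency: {ghz:.2f} GHz")
```

The Milan lines land on roughly 4 and 5 TFlops, which is why those figures look like theoretical peaks at base clock. The SPR clocks come out a hair lower than the list above because the list starts from the rounded 3.5GHz rather than from 5800 / (52 x 32).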

Bouowmx · Jun 30, 2021

To catch up: Alder Lake-S lead product is 8+8 coming this year. Coming next year is 6+0?

Exist50 · Jun 30, 2021

IntelUser2000 said:
That's theoretical throughput. If this SPR leak is measured throughput instead, the theoretical peak would be close to 20% higher, assuming 85% Linpack efficiency, and the clock would have to exceed 4GHz to deliver that kind of Linpack flops.

5.8TFlops using 52 cores:
-Theoretical: 3.5GHz
-95%: 3.68GHz
-90%: 3.89GHz
-85%: 4.11GHz

This could be an ultra high clocked part with low amount of cores. In itself it doesn't tell much about Sapphire Rapids.

At 4GHz, under a matrix multiplication load fully using the double AVX-512 FMA units, it exceeds the peak Turbo frequency of any Ice Lake-SP chip out there. In fact, only the super high frequency parts of the Cascade Lake generation go over 4GHz.

Those numbers are using AVX, right? Could linpack not just be leveraging AMX/TMUL? Would explain the high scores.

IntelUser2000 · Jun 30, 2021

Exist50 said:
Those numbers are using AVX, right? Could linpack not just be leveraging AMX/TMUL? Would explain the high scores.

From their website:
What numerical precision is required to run and benchmark and gain an entry in the Linpack Benchmark report?
In order to have an entry included in the Linpack Benchmark report the results must be computed using full precision. By full precision we generally mean 64 bit floating point arithmetic or higher. Note that this is not an issue of single or double precision as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.

If it was AMX or something else that result wouldn't be bad, it would be considered atrocious. I understand they screwed up in the past few years but there has to be some realism in coming to conclusions.

tamz_msc · Jul 1, 2021

IntelUser2000 said:
From their website:
What numerical precision is required to run and benchmark and gain an entry in the Linpack Benchmark report?
In order to have an entry included in the Linpack Benchmark report the results must be computed using full precision. By full precision we generally mean 64 bit floating point arithmetic or higher. Note that this is not an issue of single or double precision as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.

If it was AMX or something else that result wouldn't be bad, it would be considered atrocious. I understand they screwed up in the past few years but there has to be some realism in coming to conclusions.

Linpack would be a prime candidate for acceleration through AMX/TMUL though. Unless Intel added two more 512-bit wide units (for a total of four) compared to two in Sky/Cascade/Ice Lake, I see no other way they can achieve 5.8 TFLOP/s.

IntelUser2000 · Jul 1, 2021

tamz_msc said:
Linpack would be a prime candidate for acceleration through AMX/TMUL though. Unless Intel added two more 512-bit wide units (for a total of four) compared to two in Sky/Cascade/Ice Lake, I see no other way they can achieve 5.8 TFLOP/s.

Linpack is meant for general purpose supercomputers.

You forgot something in your calculation. Probably FMA. Let me show you:

52 cores x 3.5GHz x 8 DP Flops/FPU (512/64) x 2 FPUs x 2 (FMA) = 5824GFlops

Core 2 = 128-bit vectors
Sandy Bridge = 256-bit vectors
Haswell = FMA
Skylake = 512-bit vectors
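The same bookkeeping as a small Python sketch, in case it helps to see where each factor comes from. The assumptions are two FP pipes per core in every generation listed (one add plus one multiply before FMA, two FMA pipes after) and the server Skylake parts with both 512-bit units; `dp_flops_per_cycle` is just an illustrative helper name.

```python
# (generation, vector width in bits, has FMA)
GENERATIONS = [
    ("Core 2", 128, False),
    ("Sandy Bridge", 256, False),
    ("Haswell", 256, True),
    ("Skylake-SP", 512, True),
]

FP_UNITS = 2  # assumed: two FP pipes per core in each generation


def dp_flops_per_cycle(vector_bits, fma, units=FP_UNITS):
    """Double-precision flops one core can retire per clock cycle."""
    lanes = vector_bits // 64         # 64-bit doubles per vector
    flops_per_op = 2 if fma else 1    # an FMA counts as multiply + add
    return lanes * units * flops_per_op


for name, bits, fma in GENERATIONS:
    print(f"{name}: {dp_flops_per_cycle(bits, fma)} DP flops/cycle/core")

# The leaked configuration: 52 cores at 3.5GHz with 512-bit FMA units.
total_gflops = 52 * 3.5 * dp_flops_per_cycle(512, True)
print(f"52 cores @ 3.5GHz: {total_gflops:.0f} GFlops")
```

Each row doubles the previous one, and dropping the FMA factor from the last row is exactly what turns 5824 into 2912 and makes the leak look like it needs four 512-bit units.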

jpiniero · Jul 1, 2021

Bouowmx said:
To catch up: Alder Lake-S lead product is 8+8 coming this year. Coming next year is 6+0?

There's a rumor that the 6+0 got cancelled in favor of more Rocket Lake. We'll have to see if that actually happens but you shouldn't count on it.

tamz_msc · Jul 1, 2021

IntelUser2000 said:
Linpack is meant for general purpose supercomputers.

You forgot something in your calculation. Probably FMA. Let me show you:

52 cores x 3.5GHz x 8 DP Flops/FPU (512/64) x 2 FPUs x 2 (FMA) = 5824GFlops

Core 2 = 128-bit vectors
Sandy Bridge = 256-bit vectors
Haswell = FMA
Skylake = 512-bit vectors

Oh yeah, you're right.

JoeRambo · Jul 1, 2021

If it really can sustain 5.8TF DP at 3.5GHz without dropping clocks, Intel did some good engineering there.

Gideon · Jul 1, 2021

Exist50 said:
In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.

Agreed, that would be best. The problem is that Intel and AMD would then want to release processors with support at approximately the same time, so even if such collaboration took place, the result would not bear fruit before Zen 5 or later. I still remember when AMD made Bulldozer support FMA4 because they thought that would be what Intel would adopt. Intel went with FMA3 instead to have parity with AMD; in the end AMD also deprecated FMA4 pretty quickly, supporting FMA3 since Piledriver.

Still, considering AMD went from 128-bit to 256-bit units with the transition from 12nm to 7nm, the shrink to 5nm would be prime time for another FP unit shakeup with Zen 4. I would prefer it if they added more 256-bit units rather than creating a giant 512-bit one, but we shall see.
