Gideon
Golden Member
- Nov 27, 2007
If @Kepler_L2 is to be believed, all B/H-series Alder Lake boards will be DDR4-only, as will even some Z690 boards. IMO this is a good move considering the scarcity of DDR5.
That may have happened, absolutely. If so, I'm sorry.
I think you're misinterpreting my tone.
My point was simply that the claims of MTL in Q4'23-Q1'24 are, at this point, complete speculation, given that Intel's last statement on the matter (made after the 7nm delay) was that they'd ship a year earlier than that.
When will MTL actually ship? Darned if I know. Probably Q1'23 at best. But I feel like if someone is going to assert a full-year delay from the official timeline, they should at least justify that claim.
Then I got hammered for being too pro-Intel in an Intel thread. ¯\_(ツ)_/¯
I know you still can't buy Ice Lake-SP. Honestly though, 90% of everything on Dell, HP, and Lenovo's sites right now is shipping Aug-Oct. It's frustrating.
If you're not talking about the latter, then yes, Meteor Lake lead die PRQ will probably take place during H1'23, making Bob Swan's statement to investors 100% truthful.
The most interesting comparisons today pitted the 24- and 16-core Milan parts against Intel’s newest 28-core Xeon 6330, based on the new Ice Lake SP microarchitecture. The AMD parts are also in the same price range as Intel’s chip, at $2010 and $1565 versus $1894. The 16-core chip mostly matches the performance of its 28-core competitor in many workloads while still showcasing a large per-thread performance advantage, while the 24-core part, being 6% more expensive, more notably showcases both a large +26% total throughput lead and a +47% per-thread performance lead. Database workloads are admittedly still AMD’s weakness here, but in every other scenario, it’s clear which is the better value proposition.
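The review's value claim can be sanity-checked with a few lines of Python, using only the prices and the +26% throughput figure from the quote. Treating "throughput per dollar" as the value metric is an assumption here; the review doesn't define one.

```python
# Recompute the price premium and value delta for the 24-core Milan part
# versus the Xeon 6330, using only figures from the quoted review.
milan_24c_usd = 2010
xeon_6330_usd = 1894
throughput_gain = 0.26  # +26% total throughput, from the quote

price_premium = milan_24c_usd / xeon_6330_usd - 1
# Assumed value metric: relative throughput divided by relative price.
perf_per_dollar = (1 + throughput_gain) / (1 + price_premium) - 1

print(f"price premium: {price_premium:.1%}")  # 6.1%
print(f"throughput per dollar: {perf_per_dollar:+.1%}")
```

So "6% more expensive" checks out, and on this simple metric the extra throughput more than pays for the premium.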
For that sustained throughput, the clocks would have to be absurdly high, like 3.5 GHz.
So according to this, Sapphire Rapids has 26 cores per socket at about 2.7 GHz. The power consumption of a server with two of these inside, including DDR5 and all the rest, is ... 800 W.
EDIT:
It obviously also depends on how much RAM the system has.
The following comparison is obviously apples to oranges but IMO significant enough that I just can't not make it:
I can order a dual-socket Dell PowerEdge R7525 with 2x Epyc 7713 (225 W) for 128 cores (instead of 52, though yeah, only with a 2 GHz base clock) and fit it with up to and including 14 x 64 GB 3200 MT/s DIMMs before it starts to complain that the 800 W PSU is not enough.
Bear in mind the PSU definitely has a safety margin, so even with 16 such DIMMs it is absolutely drawing less than 800 W!
If this is anywhere near accurate and it goes against the 5nm 96-core Genoa (and the 128-core successor) ....
To be fair, that same 2x 7713 can "only" do about 4 TF/s in Linpack, though the 2x 7763 (280 W) can do 5 TF/s, and I'm sure that will still draw well below 800 W while running Linpack.
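Those Milan figures line up with a back-of-the-envelope peak calculation. A minimal Python sketch, assuming each Zen 3 core retires two 256-bit FMAs per cycle (16 DP FLOPs/cycle) and that all cores sit at base clock (2.0 GHz for the 7713, 2.45 GHz for the 7763); `peak_tflops` is a hypothetical helper, and real all-core clocks and Linpack efficiency will shift the result either way.

```python
# Theoretical peak DP throughput of a dual-socket Milan server.
# Assumption: two 256-bit FMA pipes per Zen 3 core
#   = 2 pipes x 4 doubles x 2 FLOPs (multiply + add) = 16 DP FLOPs/cycle.
ZEN3_DP_FLOPS_PER_CYCLE = 2 * (256 // 64) * 2


def peak_tflops(sockets: int, cores_per_socket: int, ghz: float,
                flops_per_cycle: int = ZEN3_DP_FLOPS_PER_CYCLE) -> float:
    """Peak TFLOP/s at a fixed all-core clock (GHz x FLOPs/cycle = GFLOPS)."""
    gflops = sockets * cores_per_socket * ghz * flops_per_cycle
    return gflops / 1000


# Both parts have 64 cores per socket; clocks are base clocks.
print(f"2x 7713: {peak_tflops(2, 64, 2.0):.2f} TF/s")   # 4.10
print(f"2x 7763: {peak_tflops(2, 64, 2.45):.2f} TF/s")  # 5.02
```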
Then again, it's AVX2 vs AVX-512, and Sapphire Rapids won't be facing Milan for most of its product cycle.
There were some rumors that Zen 4 would have AVX-512, and even if someone says there's reason to doubt that, I think it's also fair to say that AMD would have an easier time adding AVX-512 to their chips (even if it isn't as good as Intel's implementation) than Intel will have matching AMD on core count.
I'm quite confident AMD will add new vector extensions in Zen 4 even if the units themselves remain 256 bits wide. They have supported AVX2 since Carrizo (2015), after all.
And while I do hope they'll support AVX-512 (at least the instructions up to 256 bits wide, since there are many good new ones, but hopefully the 512-bit ones via microcode as well), what I really hope for is that they are brave enough to introduce an x86 analog of SVE2 (calling it FPV or whatever). AVX-512, while offering many necessary updates to AVX2, is insanity compared to SVE, IMHO. And the latter produces binaries that will "just work" on future processors, whether their vector units are 1024 or 2048 bits wide.
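To illustrate that "just work" property, here is a toy Python model of a vector-length-agnostic loop in the SVE style. It is not real SVE code: `whilelt` only mimics the behaviour of SVE's WHILELT predicate instruction, and `daxpy_vla` and `vl_bits` are hypothetical names. The loop never hard-codes the vector width, so the same code produces identical results whether the "hardware" is 128 or 2048 bits wide.

```python
from typing import List


def whilelt(i: int, n: int, lanes: int) -> List[bool]:
    """Predicate mask: lane k is active while i + k < n (mimics SVE WHILELT)."""
    return [i + k < n for k in range(lanes)]


def daxpy_vla(a: float, x: List[float], y: List[float], vl_bits: int) -> List[float]:
    """Compute a*x + y in strips of whatever width the 'hardware' offers.

    vl_bits is a hypothetical stand-in for the machine's vector length;
    the loop body never depends on its value.
    """
    lanes = vl_bits // 64  # doubles per vector register
    out = list(y)
    i = 0
    while i < len(x):
        pred = whilelt(i, len(x), lanes)
        # One predicated "vector FMA": inactive lanes (the tail) are skipped,
        # so no scalar remainder loop is needed.
        for k, active in enumerate(pred):
            if active:
                out[i + k] = a * x[i + k] + y[i + k]
        i += lanes  # advance by the hardware's vector length
    return out


x = [float(i) for i in range(10)]
y = [1.0] * 10
for vl in (128, 512, 2048):
    print(vl, daxpy_vla(2.0, x, y, vl) == [2.0 * v + 1.0 for v in x])
```

Fixed-width AVX-512 code, by contrast, encodes the 512-bit register width in the instructions themselves, which is the portability gap the post is complaining about.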
If AMD were going to try to push their own vector ISA, they should have done it on Zen 2 and put it in both next-gen consoles. That would have put adoption through the roof.
As it is, I don't think AMD has the ecosystem they need to make a move like that. They don't have Intel's big software teams to help optimise libraries, add compiler support, and so on.
This. AMD's open-source initiatives lag horribly behind the actual product launches. Good luck going against Intel.
Methinks you didn't understand that he's talking about implementing AVX-512, possibly using two 256-bit FP units.
In the vein of the "x86 SVE" topic, I do wish Intel (and to a lesser degree, AMD) would approach ISA more collaboratively. It made sense to pursue exclusivity when it was just Intel vs AMD, but now they should really start looking at it as x86 vs ARM. Especially for Intel, now that AMD has enough of a presence back that Intel can't make extensions ubiquitous by themselves.
Given that AVX-512 is in many cases the sole reason a customer bought Xeon over something else, I doubt it.
To be fair that same 2 x 7713 can "only" do about 4 TF/s in linpack, though the 2x 7763 (280W) can do 5TF/s, which I'm sure will still draw well below 800W while running linpack.
That's theoretical throughput. So if this SPR leak is actual throughput, there's a potential ~20% gain to be had, assuming 85% Linpack efficiency. The clock would have to exceed 4 GHz to get that kind of Linpack FLOPS.
5.8 TFLOPS using 52 cores requires a clock of:
- Theoretical peak: 3.5 GHz
- 95% efficiency: 3.68 GHz
- 90% efficiency: 3.89 GHz
- 85% efficiency: 4.12 GHz
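The clock figures for each efficiency level can be reproduced with a short Python sketch. It assumes 52 cores with two AVX-512 FMA units each (32 DP FLOPs/cycle) and, like the list, takes 5824 GFLOPS (the 3.5 GHz theoretical peak) as the target; `required_ghz` is a hypothetical helper name.

```python
# Assumption from the thread: 2 AVX-512 FMA units per core
#   = 2 units x 8 doubles x 2 FLOPs (multiply + add) = 32 DP FLOPs/cycle.
FLOPS_PER_CYCLE = 2 * (512 // 64) * 2


def required_ghz(target_gflops: float, cores: int, efficiency: float) -> float:
    """All-core clock needed to hit target_gflops at a given Linpack efficiency."""
    return target_gflops / (cores * FLOPS_PER_CYCLE * efficiency)


for eff in (1.00, 0.95, 0.90, 0.85):
    print(f"{eff:.0%}: {required_ghz(5824, 52, eff):.2f} GHz")
# 100%: 3.50 GHz
# 95%: 3.68 GHz
# 90%: 3.89 GHz
# 85%: 4.12 GHz
```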
This could be an ultra-high-clocked part with a low core count. In itself, it doesn't tell us much about Sapphire Rapids.
At 4 GHz, under a matrix multiplication load fully using both AVX-512 FMA units, it would exceed the peak Turbo frequency of any Ice Lake-SP chip out there. In fact, only the super-high-frequency parts in the Cascade Lake generation go over 4 GHz.
Those numbers are using AVX, right? Could Linpack not just be leveraging AMX/TMUL? That would explain the high scores.
Linpack would be a prime candidate for acceleration through AMX/TMUL, though. Unless Intel added two more 512-bit-wide units (for a total of four, compared to two in Sky/Cascade/Ice Lake), I see no other way they can achieve 5.8 TFLOP/s.
From their website:
What numerical precision is required to run and benchmark and gain an entry in the Linpack Benchmark report?
In order to have an entry included in the Linpack Benchmark report the results must be computed using full precision. By full precision we generally mean 64 bit floating point arithmetic or higher. Note that this is not an issue of single or double precision as some systems have 64-bit floating point arithmetic as single precision. It is a function of the arithmetic used.
If it were AMX or something else, that result wouldn't just be bad, it would be considered atrocious. I understand they've screwed up in the past few years, but there has to be some realism in coming to conclusions.
To catch up: the Alder Lake-S lead product, coming this year, is 8+8. Coming next year is 6+0?
Oh yeah, you're right.
Linpack is meant for general-purpose supercomputers.
You forgot something in your calculation. Probably FMA. Let me show you:
52 cores x 3.5 GHz x 8 DP FLOPs/FPU (512/64) x 2 FPUs x 2 (FMA) = 5824 GFLOPS
Core 2 = 128-bit vectors
Sandy Bridge = 256-bit vectors
Haswell = FMA
Skylake = 512-bit vectors
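The generation list works out to a doubling of peak DP FLOPs per core per cycle at each step. A minimal Python sketch of that; the pipe counts are assumptions taken from commonly published microarchitecture figures, not from the thread: separate add and multiply pipes before Haswell, two FMA pipes from Haswell on, and Skylake counted as a Skylake-SP part with both AVX-512 FMA units enabled (some SKUs have only one).

```python
from typing import List, Tuple

# (name, vector width in bits, FP pipes, FLOPs per lane per pipe per cycle)
# Assumed figures: pre-Haswell cores issue one add + one mul per cycle
# (1 FLOP per lane per pipe); FMA pipes do multiply + add (2 FLOPs per lane).
GENERATIONS: List[Tuple[str, int, int, int]] = [
    ("Core 2", 128, 2, 1),
    ("Sandy Bridge", 256, 2, 1),
    ("Haswell", 256, 2, 2),
    ("Skylake-SP", 512, 2, 2),
]


def dp_flops_per_cycle(bits: int, pipes: int, flops_per_lane: int) -> int:
    """Peak double-precision FLOPs per core per cycle."""
    return (bits // 64) * pipes * flops_per_lane


for name, bits, pipes, fpl in GENERATIONS:
    print(f"{name:<13} {dp_flops_per_cycle(bits, pipes, fpl):>3} DP FLOPs/cycle")
```

The Skylake-SP figure of 32 is the same per-core term used in the 5824 GFLOPS calculation (8 DP FLOPs x 2 FPUs x 2 for FMA).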