Discussion [Tomshardware] EPYC Genoa and Radeon Instinct to Power Two-Exaflop DOE Supercomputer

DisEnchantment

Senior member
Mar 3, 2017
531
940
106
EPYC win for AMD in a two-exaflop exascale supercomputer.
EPYC Genoa + Future Radeon Instinct
Zen4 + DDR5 + Radeon Instinct

Big win for ROCm and users of open source SW.


I hope this gives a big boost to ROCm.
As a Linux user I am really looking forward to ROCm support being upstreamed for most frameworks, so the community can benefit without having to reverse engineer their ass off to find bugs and root causes when something is not working in the SW.

(Intel is also a big OSS contributor so I hope they win some too)


Update 1
AT Link

However the most interesting claim is that these IF 3.0 device nodes will support unified memory across the CPU and GPU, which is something AMD doesn’t offer today.
I hope it means what it sounds like.
That we can write code which treats GPU memory the same way we treat memory shared with another thread/core on the CPU? Sounds like a dream come true.


Best bit for me

Scott said, “As part of this procurement, the Department of Energy has provided additional funds beyond the purchase of the machine to fund non-recurring engineering efforts and one major piece of that is to work closely with AMD on enhancing the programming environment for their new CPU-GPU architecture.” Work is ongoing by all three partners to take the critical applications and workloads forward and optimize them to get the best performance in the machine when El Capitan is delivered.

Update 2
It is time to start a new Zen 4 thread :)
 

JasonLD

Senior member
Aug 22, 2017
221
162
86
Hmmm, interesting that it uses Genoa, I would have assumed it would be using Zen 5.
Does it mean Genoa will not come out until 2022 at the earliest?
 

prtskg

Senior member
Oct 26, 2015
246
83
101
Yeah, good win. ROCm is also receiving some funding. Honestly though, I was expecting IBM+Nvidia to win this one, so that it would be one Intel, one AMD and one IBM+Nvidia. I guess combining products from two different companies raises cost.
 

Hitman928

Platinum Member
Apr 15, 2012
2,541
1,712
136
Hmmm, interesting that it uses Genoa, I would have assumed it would be using Zen 5.
Does it mean Genoa will not come out until 2022 at the earliest?
My guess is Zen3 gets released 4Q this year, still on AM4, and then AMD gives a bit more time between generations to allow for the transition to AM5 (or whatever they call it) and the corresponding server socket, which means that Zen4 (Genoa) will be released late 1H 2022. Just a guess though.
 

amrnuke

Senior member
Apr 24, 2019
838
1,015
96
Fairly incredible, even just looking at efficiency. Frontier at 30 MW for 1.5 EF and El Capitan at "fairly substantially under" 40 MW for 2 EF means that efficiency is expected to continue to increase substantially.
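
For anyone who wants to check the arithmetic, here is a quick sketch in Python. The helper name `gflops_per_watt` and the 35 MW figure are my own assumptions for illustration; only the 1.5 EF / 30 MW and 2 EF / 40 MW numbers come from the figures in this post.

```python
def gflops_per_watt(exaflops: float, megawatts: float) -> float:
    """Convert a rating in exaflops and a power draw in megawatts to GFLOPS per watt."""
    flops = exaflops * 1e18   # 1 EF = 1e18 FLOPS
    watts = megawatts * 1e6   # 1 MW = 1e6 W
    return flops / watts / 1e9

# Figures as stated in this post.
frontier = gflops_per_watt(1.5, 30.0)
# Upper bound on power for El Capitan, i.e. the worst case for efficiency.
el_capitan_ceiling = gflops_per_watt(2.0, 40.0)
# Hypothetical: 35 MW is an assumed value for "fairly substantially under" 40 MW.
el_capitan_at_35 = gflops_per_watt(2.0, 35.0)

print(f"Frontier:              {frontier:.1f} GFLOPS/W")
print(f"El Capitan at 40 MW:   {el_capitan_ceiling:.1f} GFLOPS/W")
print(f"El Capitan at 35 MW:   {el_capitan_at_35:.1f} GFLOPS/W")
```

At exactly 40 MW El Capitan would only match Frontier's 50 GFLOPS/W, so any efficiency gain over Frontier comes from how far under 40 MW it actually lands.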

Hmmm, interesting that it uses Genoa, I would have assumed it would be using Zen 5.
Does it mean Genoa will not come out until 2022 at the earliest?
Frontier will likely be on Zen3 (expected release this year) but is being completed in 2021. However, close customers are getting Zen3 earlier.
Same with Zen4. LLNL could conceivably get Zen4 in 2021, but actually building a system this massive takes time, I'm guessing.
 

JasonLD

Senior member
Aug 22, 2017
221
162
86
Why do you think it can't come in 2021? DDR5?
Usually a supercomputer gets the latest, or even not yet commercially available, tech at the time of its build. It is just my assumption that the sole latest EPYC product AMD will have for 2022 is Zen 4 Genoa. If Zen 5 were available (or close to release), it would have used that.

My guess is Zen3 gets released 4Q this year, still on AM4, and then AMD gives a bit more time between generations to allow for the transition to AM5 (or whatever they call it) and the corresponding server socket, which means that Zen4 (Genoa) will be released late 1H 2022. Just a guess though.
That sounds possible. Or we might get Milan refresh in 2021 and Genoa in 2H 2022.
 

DisEnchantment

Senior member
Mar 3, 2017
531
940
106
I think Zen5 would not leave much wiggle room should there be issues. Besides, the system will come online in early 2023.
I am imagining something like this: Zen 3 in Q2-Q3 2020, Zen4 in Q3-Q4 2021, Zen5 in Q4 2022-Q1 2023. That is a cadence of around 14-16 months, as mentioned by Lisa in the earnings call.
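
A rough way to sanity-check that cadence is to take the midpoint of each guessed window and count the months between them. This is only a sketch: the windows are my guesses from above, and the helper names `quarter_start` and `window_midpoint` are made up for illustration.

```python
def quarter_start(year: int, quarter: int) -> int:
    """Month index (months since year 0) of the first month of a quarter."""
    return year * 12 + (quarter - 1) * 3

def window_midpoint(first: tuple, last: tuple) -> float:
    """Midpoint, in months, of a window running from the start of `first` to the end of `last`."""
    start = quarter_start(*first)
    end = quarter_start(*last) + 3  # end of the last quarter
    return (start + end) / 2

# Guessed release windows as (year, quarter) pairs; speculation, not announced dates.
windows = {
    "Zen 3": ((2020, 2), (2020, 3)),
    "Zen 4": ((2021, 3), (2021, 4)),
    "Zen 5": ((2022, 4), (2023, 1)),
}

midpoints = {name: window_midpoint(*w) for name, w in windows.items()}
gaps = {
    "Zen 3 -> Zen 4": midpoints["Zen 4"] - midpoints["Zen 3"],
    "Zen 4 -> Zen 5": midpoints["Zen 5"] - midpoints["Zen 4"],
}

for step, months in gaps.items():
    print(f"{step}: {months:.0f} months")
```

Midpoint to midpoint, both steps land inside the 14-16 month range.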

And to think, Mike Clark was already starting work on the Zen5 high-level architecture in late 2018, as mentioned in their video with John Taylor!!!
 

Topweasel

Diamond Member
Oct 19, 2000
5,326
1,521
136
Usually a supercomputer gets the latest, or even not yet commercially available, tech at the time of its build. It is just my assumption that the sole latest EPYC product AMD will have for 2022 is Zen 4 Genoa. If Zen 5 were available (or close to release), it would have used that.



That sounds possible. Or we might get Milan refresh in 2021 and Genoa in 2H 2022.
You have to look at time for implementation, CPU shipments and stuff like that. Not only that, but it's not like when AMD and Intel are working through ODMs. AMD will be working with Cray, who will then be building and installing this. I don't know exact numbers, but assume this is 20k of AMD's bestest chips at the time. They could be shipping pre-release Zen 4: after the design is final, but while holding off on shipping seed units and such to other server clients. But 20k of the bestest, fastest EPYC configurations would tremendously slow down retail CPU shipping.

My personal guess is that this is a year-long project through 2022, that the server comes online well before 2023, and that 2023 is when the full system is finished.
 

eek2121

Senior member
Aug 2, 2005
439
316
136
They left out just enough details to keep us from getting an idea of what Genoa is capable of. ;)

Regarding the launch date: it takes time to source the parts and put together a freakin supercomputer! 😉
 

jpiniero

Diamond Member
Oct 1, 2010
7,816
1,126
126
What would be nice would be for them to put GPU chiplets on the package. And an active interposer...
 

soresu

Golden Member
Dec 19, 2014
1,225
464
136
I must be confused.... wasn't the whole point of Fusion/HSA a unified memory architecture?

Wasn't that already fully realised as of Volcanic Islands/GCN3?
 

uzzi38

Senior member
Oct 16, 2019
815
963
96
Impressive!

So an El Capitan node will look something like this, perhaps, with socketed GPUs with HBM on interposer, and cache-coherent Infinity Fabric 3 interconnect on the motherboard:

Maybe not.

Just a bit of a guess of mine, but I'd say there's a possibility both Frontier and El Capitan utilise SoIC to get both CPU and GPU on custom packages.
 

Atari2600

Golden Member
Nov 22, 2016
1,182
1,263
106
I must be confused.... wasn't the whole point of Fusion/HSA a unified memory architecture?

Wasn't that already fully realised as of Volcanic Islands/GCN3?
Alright Kenobi - it says this in the AnandTech piece:

For Infinity Fabric 3.0, AMD is promising further improvements to inter-chip bandwidth and latency. However the most interesting claim is that these IF 3.0 device nodes will support unified memory across the CPU and GPU, which is something AMD doesn’t offer today. Indeed even Frontier is only slated to offer coherency between the processors which is a step below a true unified memory model.
I do realise the goal of Fusion - circa 10 years ago now! - was a unified memory architecture, and I thought that, given the introduction of the High Bandwidth Cache Controller on Vega, we were getting very close. But then they stepped back from that a bit.
 

DisEnchantment

Senior member
Mar 3, 2017
531
940
106
Apparently this is not all there is to this deal.
There will be a smaller-scale clone of El Capitan which will be available for use by the general scientific community.

There will be a “smaller clone” of El Capitan that does not have a name that will be available for the broader scientific community. This smaller system should be faster than the current #1 Summit supercomputer.

BTW, GENCI will be announcing an 'exascale class' system from Atos using AMD EPYC. Not sure about the GPU though.
 

soresu

Golden Member
Dec 19, 2014
1,225
464
136
Interesting that one slide says "Next Generation High Bandwidth Memory".

Perhaps the HBM3 spec is finally nearing completion.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
20,276
7,922
136
