Discussion [Tomshardware] EPYC Genoa and Radeon Instinct to Power Two-Exaflop DOE Supercomputer

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
EPYC win for AMD: a two-exaflop exascale supercomputer.
EPYC Genoa + Future Radeon Instinct
Zen4 + DDR5 + Radeon Instinct

Big win for ROCm and users of open source SW.


I hope this gives a big boost to ROCm.
As a Linux user I am really looking forward to ROCm support being upstreamed for most frameworks, so the community can benefit without having to reverse engineer their asses off to find bugs and root causes when something is not working in the SW.

(Intel is also a big OSS contributor so I hope they win some too)


Update 1
AT Link

However the most interesting claim is that these IF 3.0 device nodes will support unified memory across the CPU and GPU, which is something AMD doesn’t offer today.

I hope it means what it sounds like.
That we can write code which treats memory on the GPU the way we treat memory shared with another thread/core on the CPU? Sounds like a dream come true.


Best bit for me

Scott said, “As part of this procurement, the Department of Energy has provided additional funds beyond the purchase of the machine to fund non-recurring engineering efforts and one major piece of that is to work closely with AMD on enhancing the programming environment for their new CPU-GPU architecture.” Work is ongoing by all three partners to take the critical applications and workloads forward and optimize them to get the best performance in the machine when El Capitan is delivered.


Update 2
It is time to start a new Zen 4 thread :)
 

prtskg

Senior member
Oct 26, 2015
261
94
101
Yeah, good win. ROCm is also receiving some funding. Honestly though, I was expecting IBM+Nvidia to win this one: one Intel, one AMD and one by IBM+Nvidia. Guess products from two different companies raise the cost.
 

Hitman928

Diamond Member
Apr 15, 2012
5,243
7,792
136
Hmmm, interesting that it uses Genoa, I would have assumed it would be using Zen 5.
Does it mean Genoa will not come out until 2022 at the earliest?

My guess is Zen3 gets released 4Q this year, still on AM4, and then AMD gives a bit more time between generations to allow for the transition to AM5 (or whatever they call it) and the corresponding server socket, which means that Zen4 (Genoa) will be released late 1H 2022. Just a guess though.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Fairly incredible, even just looking at efficiency. Frontier 30MW for 1.5 EF, El Capitan "fairly substantially under" 40MW for 2 EF means that efficiency is expected to continue to increase substantially.
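
To put numbers on that, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above (30 MW for 1.5 EF, and 40 MW as an upper bound for 2 EF). The 35 MW case is a purely hypothetical value picked to illustrate "under 40 MW", not a published figure.

```python
def gflops_per_watt(exaflops: float, megawatts: float) -> float:
    """Convert a (performance, power) pair into GFLOPS per watt."""
    flops = exaflops * 1e18   # 1 EF = 1e18 FLOPS
    watts = megawatts * 1e6   # 1 MW = 1e6 W
    return flops / watts / 1e9

# Figures quoted in the post.
frontier = gflops_per_watt(1.5, 30.0)

# El Capitan at the stated 40 MW upper bound (worst case for efficiency).
el_capitan_at_ceiling = gflops_per_watt(2.0, 40.0)

# Hypothetical draw below the ceiling; 35 MW is an assumption for illustration.
el_capitan_hypothetical = gflops_per_watt(2.0, 35.0)

print(f"Frontier:                 {frontier:.1f} GFLOPS/W")
print(f"El Capitan @ 40 MW:       {el_capitan_at_ceiling:.1f} GFLOPS/W")
print(f"El Capitan @ 35 MW (hyp): {el_capitan_hypothetical:.1f} GFLOPS/W")
```

At exactly 40 MW, El Capitan would land on the same 50 GFLOPS/W as Frontier, so the size of any efficiency gain depends on how far below 40 MW the real machine comes in.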

Hmmm, interesting that it uses Genoa, I would have assumed it would be using Zen 5.
Does it mean Genoa will not come out until 2022 at the earliest?
Frontier will likely be on Zen3 (expected release this year) but is being completed in 2021. However, close customers are getting Zen3 earlier.
Same with Zen4. LLNL could conceivably get Zen4 in 2021, but actually building a system this massive takes time, I'm guessing.
 

JasonLD

Senior member
Aug 22, 2017
485
445
136
Why do you think it can't come in 2021? DDR5?

Usually a supercomputer gets the latest tech available at the time of its build, sometimes even parts that are not yet commercially available. It is just my assumption that the only new EPYC product AMD will have for 2022 is Zen 4 Genoa. If Zen 5 were available (or close to release), it would have used that.

My guess is Zen3 gets released 4Q this year, still on AM4, and then AMD gives a bit more time between generations to allow for the transition to AM5 (or whatever they call it) and the corresponding server socket, which means that Zen4 (Genoa) will be released late 1H 2022. Just a guess though.

That sounds possible. Or we might get Milan refresh in 2021 and Genoa in 2H 2022.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
I think Zen5 would not leave much wiggle room should there be issues. Besides, the system will come online in early 2023.
I am imagining something like this: Zen 3 in Q2-Q3 2020, Zen 4 in Q3-Q4 2021, Zen 5 in Q4 2022-Q1 2023. That is around 14-16 months between generations, as mentioned by Lisa in the earnings call.

And to think, Mike Clark was already starting work on the Zen5 high-level architecture in late 2018, as mentioned in their video with John Taylor!
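
As a sanity check on that cadence, here is a small Python sketch that measures the gap between the guessed launch windows. The windows are just the guesses from this post, not announced dates, and the helper names are made up for illustration.

```python
def quarter_to_month_index(year: int, quarter: int) -> int:
    """Months since year 0 for the first month of the given quarter."""
    return year * 12 + (quarter - 1) * 3

# Guessed launch windows from the post: (first quarter, last quarter).
windows = {
    "Zen 3": ((2020, 2), (2020, 3)),
    "Zen 4": ((2021, 3), (2021, 4)),
    "Zen 5": ((2022, 4), (2023, 1)),
}

def gap_months(earlier: str, later: str) -> int:
    """Gap in months between the starts of two guessed launch windows."""
    start_a = quarter_to_month_index(*windows[earlier][0])
    start_b = quarter_to_month_index(*windows[later][0])
    return start_b - start_a

zen3_to_zen4 = gap_months("Zen 3", "Zen 4")
zen4_to_zen5 = gap_months("Zen 4", "Zen 5")

print(f"Zen 3 -> Zen 4: {zen3_to_zen4} months")
print(f"Zen 4 -> Zen 5: {zen4_to_zen5} months")
```

Both gaps come out at 15 months, which sits inside the 14-16 month range.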
 

Topweasel

Diamond Member
Oct 19, 2000
5,436
1,654
136
Usually a supercomputer gets the latest tech available at the time of its build, sometimes even parts that are not yet commercially available. It is just my assumption that the only new EPYC product AMD will have for 2022 is Zen 4 Genoa. If Zen 5 were available (or close to release), it would have used that.



That sounds possible. Or we might get Milan refresh in 2021 and Genoa in 2H 2022.
You have to look at time for implementation, CPU shipments and stuff like that. Not only that, but it's not like when AMD and Intel are working through ODMs: AMD will be working with Cray, who will then be building and installing this. I don't know exact numbers, but assume this is something like 20k of AMD's best chips at the time. They could be shipping pre-release Zen 4 once the design is final, while holding off on shipping seed units and such to other server clients. But 20k of the fastest EPYC configurations would tremendously slow down retail CPU shipments.

My personal guess is that this is a year-long project running through 2022, that the system comes online well before 2023, and that 2023 is when the full system is finished.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
They left out just enough details to keep us from getting an idea of what Genoa is capable of. ;)

Regarding the launch date: it takes time to source the parts and put together a freakin' supercomputer! 😉
 

Vattila

Senior member
Oct 22, 2004
799
1,351
136
Impressive!

So an El Capitan node will look something like this, perhaps, with socketed GPUs with HBM on interposer, and cache-coherent Infinity Fabric 3 interconnect on the motherboard:

[Image: mock-up of a possible El Capitan node]
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
I must be confused.... wasn't the whole point of Fusion/HSA a unified memory architecture?

Wasn't that already fully realised as of Volcanic Islands/GCN3?
 

uzzi38

Platinum Member
Oct 16, 2019
2,622
5,880
146
Impressive!

So an El Capitan node will look something like this, perhaps, with socketed GPUs with HBM on interposer, and cache-coherent Infinity Fabric 3 interconnect on the motherboard:

[Image: mock-up of a possible El Capitan node]
Maybe not.

Just a bit of a guess of mine, but I'd say there's a possibility both Frontier and El Capitan utilise SoIC to get both CPU and GPU on custom packages.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
I must be confused.... wasn't the whole point of Fusion/HSA a unified memory architecture?

Wasn't that already fully realised as of Volcanic Islands/GCN3?

Alright Kenobi, it says this in the AnandTech piece:

For Infinity Fabric 3.0, AMD is promising further improvements to inter-chip bandwidth and latency. However the most interesting claim is that these IF 3.0 device nodes will support unified memory across the CPU and GPU, which is something AMD doesn’t offer today. Indeed even Frontier is only slated to offer coherency between the processors which is a step below a true unified memory model.

I do realise the goal of Fusion (circa 10 years ago now!) was a unified memory architecture, and I thought that, given the introduction of the High Bandwidth Cache Controller on Vega, we were getting very close. But then they stepped back from that a bit.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Apparently this is not all there is to this deal.
There will be a smaller-scale clone of El Capitan which will be available to the general scientific community:

There will be a “smaller clone” of El Capitan that does not have a name that will be available for the broader scientific community. This smaller system should be faster than the current #1 Summit supercomputer.


BTW, GENCI will be announcing an 'exascale class' system from Atos using AMD EPYC. Not sure about the GPU though.
 

soresu

Platinum Member
Dec 19, 2014
2,657
1,858
136
Interesting that one slide says "Next Generation High Bandwidth Memory".

Perhaps the HBM3 spec is finally nearing completion.
 
