News First diagram of AMD Trento node for first exascale supercomputer Frontier emerges

Joe NYC

Golden Member
Jun 26, 2021
[Attachment: 1632934895282.png]

- The server CPU is Trento, a Milan derivative
- The promised IO die update surprisingly has a new memory controller: 8 channels of DDR5
- MI is the new MI200 datacenter GPGPU, to be announced soon
- Green links are presumably PCIe Gen 4 x16 each, 2x per card, which IIRC has never been done before
- MI200 cards have their own mesh interconnect (purple)
- Green and purple links are presumably implementing the 3rd gen Infinity Fabric architecture
- Each MI200 card has its own PCIe x16 connection for external networking, to connect to a PCIe switch
- An additional PCIe x4 connection for storage (which would add up to 132 lanes? see the lane-count sketch below)
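
To sanity-check the lane counts in the list above, here is a minimal Python sketch of the node as I read the diagram. Every count is an assumption taken from the bullets, not a confirmed spec:

```python
# Node topology as read off the diagram (assumed, not confirmed):
# 1 Trento CPU, 4 MI200 cards, 2 green links (PCIe Gen 4 x16 / IF3) per card to the CPU,
# plus 1 x16 per card out to a PCIe switch for networking and a single x4 for storage.

CARDS = 4
CPU_LINKS_PER_CARD = 2       # green links between the CPU and each card
LANES_PER_LINK = 16
NIC_LANES_PER_CARD = 16      # per-card x16 to the PCIe switch (hangs off the GPU, not the CPU)
STORAGE_LANES = 4            # the extra x4 for storage

cpu_gpu_lanes = CARDS * CPU_LINKS_PER_CARD * LANES_PER_LINK   # 4 * 2 * 16 = 128
total_from_cpu = cpu_gpu_lanes + STORAGE_LANES                # 132 -> the "132 lanes?" question
nic_lanes = CARDS * NIC_LANES_PER_CARD                        # 64 lanes toward the network switch

print(f"CPU <-> GPU lanes: {cpu_gpu_lanes}")
print(f"CPU lanes including the storage x4: {total_from_cpu} (vs. 128 on a stock Milan socket)")
print(f"GPU <-> switch lanes for networking: {nic_lanes}")
```

Which is exactly where the "132 lanes?" question comes from: the storage x4 doesn't fit a plain 128-lane budget.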

Credit to Hardware Times:
AMD Epyc Next-Gen Server CPU Layout Surfaces w/ 3rd Gen Infinity Fabric Link Architecture - Hardware Times
 

Joe NYC

Golden Member
Jun 26, 2021
Maybe Hardware Times got this wrong and it is not Trento, but Genoa?

Some people say that Trento is still DDR4. AMD's official deck does not specify DDR5, just DDR:
[Attachment: 1632937027613.png]

Edit: so probably just the "DDR5" part was wrong and the other points are OK, so it's still Trento...

Maybe we will get more analysis from other tech sites, including AnandTech.
 

zir_blazer

Golden Member
Jun 6, 2013
So is Trento supposed to be using Zen 3 chiplets like Milan, but with a different IO die and perhaps a different socket pinout? Or was that idea discarded and Trento is actually the correct codename for Milan-X, or are they two different products?

Also, the last image doesn't specify how many PCIe lanes there are in total. The sole PCIe link coming from the processor may well be the extra PCIe lane added in Rome for BMC purposes. A 4-lane port for storage would be inadequate, since you would then be left without a BMC if putting all 128 lanes toward GPUs, which was the issue with Naples (or maybe you can bifurcate it to x2 for NVMe, x1 for the BMC and x1 for something else like a NIC, which is still workable).
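
A quick back-of-the-envelope version of that budget, taking the Rome/Milan figure of 128 general-purpose lanes plus 1 auxiliary BMC lane as the baseline; the x4 split at the end is just the hypothetical one from the paragraph above:

```python
# Lane budget sketch based on the Rome/Milan layout: 128 general-purpose lanes
# plus 1 auxiliary lane that boards typically dedicate to the BMC.
GENERAL_LANES = 128

gpu_lanes = 4 * 2 * 16                # four cards, two x16 links each
leftover = GENERAL_LANES - gpu_lanes
print(f"General lanes left after the GPUs: {leftover}")   # 0 -> a storage x4 has to come from somewhere else

# Hypothetical bifurcation of a spare x4, as speculated above:
x4_split = {"NVMe": 2, "BMC": 1, "NIC": 1}
assert sum(x4_split.values()) == 4
print("Possible x4 split:", x4_split)
```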
 

Joe NYC

Golden Member
Jun 26, 2021
zir_blazer said:
So is Trento supposed to be using Zen 3 chiplets like Milan, but with a different IO die and perhaps a different socket pinout? Or was that idea discarded and Trento is actually the correct codename for Milan-X, or are they two different products?

Also, the last image doesn't specify how many PCIe lanes there are in total. The sole PCIe link coming from the processor may well be the extra PCIe lane added in Rome for BMC purposes. A 4-lane port for storage would be inadequate, since you would then be left without a BMC if putting all 128 lanes toward GPUs, which was the issue with Naples (or maybe you can bifurcate it to x2 for NVMe, x1 for the BMC and x1 for something else like a NIC, which is still workable).

The assumption is that the socket and pinout are the same as Milan's, and the I/O die just has some optimizations to allow unified memory access. Not a wholesale change...

Milan and the Milan socket support 128 lanes for single-socket servers.

It could just be that there is some flexibility to re-assign individual lanes from the x16 links, say 2 from each, leaving 2x14 lanes to each card. Then 2 lanes x 2 links x 4 cards = 16 lanes would be unused and re-assignable to the chipset, M.2, a network card, etc. That could be another feature of the updated Trento IO die.

Just speculating here.
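
For what it's worth, the arithmetic behind that speculation works out like this (purely hypothetical; x14 is not a standard PCIe link width, as the next post points out):

```python
# Speculative arithmetic only: steal 2 lanes from each x16 link to each card.
CARDS = 4
LINKS_PER_CARD = 2
STOLEN_PER_LINK = 2

freed = STOLEN_PER_LINK * LINKS_PER_CARD * CARDS     # 2 x 2 x 4 = 16 lanes freed
per_card = LINKS_PER_CARD * (16 - STOLEN_PER_LINK)   # 2 x 14 = 28 lanes left per card
print(f"Freed lanes: {freed}, remaining lanes per card: {per_card}")
```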
 

zir_blazer

Golden Member
Jun 6, 2013
Joe NYC said:
Milan and the Milan socket support 128 lanes for single-socket servers.

A single Milan has 129 PCIe lanes, not 128. Rome introduced an extra auxiliary PCIe lane intended for the BMC, which is why motherboards that made use of it are not compatible with Naples at all. Milan stuck with that.

And as far as I know, PCIe link widths come in fixed sizes (x1, x2, x4, x8, x16), so you can't just downgrade an x16 slot to 14 usable lanes. But you can actually team up two x16 links for an x32. I think that stuff should be in the PCIe specification, but I'm too lazy to check it.
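
For illustration, a tiny check against the link widths that are actually implemented in practice (if memory serves, the spec also defines x12 and x32, but x14 is not a valid width in any case; this is just a sketch, not the spec):

```python
# PCIe link widths commonly implemented in practice; x12 and x32 exist in the
# spec (if memory serves) but are essentially never used, and x14 is not valid at all.
VALID_WIDTHS = {1, 2, 4, 8, 16}

def can_train(width: int) -> bool:
    """Return True if a link of this width could train at a standard size."""
    return width in VALID_WIDTHS

print(can_train(14))   # False -> you can't just run an x16 slot as x14
print(can_train(16))   # True
```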
 

Joe NYC

Golden Member
Jun 26, 2021
zir_blazer said:
And as far as I know, PCIe link widths come in fixed sizes (x1, x2, x4, x8, x16), so you can't just downgrade an x16 slot to 14 usable lanes. But you can actually team up two x16 links for an x32. I think that stuff should be in the PCIe specification, but I'm too lazy to check it.

AMD has been looking very hard at extending PCIe bandwidth in the server environment. In an interview (I think on AnandTech), Forrest Norrod talked about overclocking the PCIe bus to get more bandwidth.

So it seems quite likely that, with AMD using a dual-slot card, AMD would be thinking hard about how to grab the bandwidth from the second slot.

If it means extending PCIe functionality, which for now would only be between the Epyc server and the MI200 card, I could see AMD doing it.

So, it could be making PCIe grouping and teaming more granular...
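
As a rough illustration of the bandwidth at stake, using standard PCIe Gen 4 numbers (the actual link rates Frontier's Infinity Fabric runs at aren't public, so treat this as a ballpark lower bound):

```python
# Rough PCIe Gen 4 math: 16 GT/s per lane with 128b/130b encoding.
TRANSFERS_PER_S = 16e9
ENCODING = 128 / 130
LANES = 16

def x16_bandwidth_gb_s() -> float:
    """Approximate one-direction bandwidth of a Gen 4 x16 link in GB/s."""
    return LANES * TRANSFERS_PER_S * ENCODING / 8 / 1e9

single = x16_bandwidth_gb_s()
print(f"One Gen 4 x16 link: ~{single:.1f} GB/s per direction")            # ~31.5 GB/s
print(f"Two links per MI200 card: ~{2 * single:.1f} GB/s per direction")  # ~63 GB/s
```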
 

Mopetar

Diamond Member
Jan 31, 2011
Any information on the MI200? Seems crazy that it would need that much bandwidth, but I'm not really familiar with those products or what they need.
 

Joe NYC

Golden Member
Jun 26, 2021
Mopetar said:
Any information on the MI200? Seems crazy that it would need that much bandwidth, but I'm not really familiar with those products or what they need.

It should be officially unveiled at the upcoming Supercomputing Conference in mid-November, but there was an announcement from the US Department of Energy that it is already being delivered for the Frontier supercomputer.

Some (incomplete) specs here. Roughly 2x the MI100, but it may have some changes. It has HBM2e memory and coherent, uniform memory addressing, so it opens up more direct reads and writes to GPU memory by the CPU, which is probably why they expect it to be bandwidth hungry.
AMD Instinct MI200 with MCM Aldebaran GPU might feature 110 Compute Units - VideoCardz.com

BTW, my confusion about the form factor of the MI200 may come from the picture in the article, which (likely) shows the MI100.
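
To put the "bandwidth hungry" point in perspective, here is the same kind of rough math comparing the host links to the GPU's local HBM. The HBM2e figure below is my own guess, roughly in line with the leaks (MI100's HBM2 is about 1.2 TB/s); the link number is the ~31.5 GB/s Gen 4 x16 estimate from earlier:

```python
# Rough comparison: CPU <-> GPU link bandwidth vs. the GPU's local memory bandwidth.
host_link_gb_s = 2 * 31.5    # two Gen 4 x16-class links per card, one direction
hbm_gb_s = 1600.0            # guessed HBM2e aggregate for MI200 (assumption, not a spec)

ratio = hbm_gb_s / host_link_gb_s
print(f"Host links: ~{host_link_gb_s:.0f} GB/s, local HBM: ~{hbm_gb_s:.0f} GB/s")
print(f"Local memory is roughly {ratio:.0f}x faster, so coherent CPU reads and writes "
      "into GPU memory will lean on every lane available.")
```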
 

Joe NYC

Golden Member
Jun 26, 2021
Here is another article, from a Japanese website. It seems that everyone gets something right, something wrong, and something new. Here is their diagram:

[Attachment: 1633320156384.png]

The article mentions that there are 128 PCIe 4.0 lanes + 1, which agrees with what @zir_blazer posted, and each card would then have the full 2 x16 PCIe 4.0 links.

The article calls the links CCIX, which stands for Cache Coherent Interconnect for Accelerators. This is apparently a technology that is similar to, but not the same as, the CXL that Intel has been pushing.

Link to the article:
ASCII.jp: Frontier's Node Configurations Visible in AMD's Announcement (1/3)
 

moinmoin

Diamond Member
Jun 1, 2017
Joe NYC said:
The article calls the links CCIX, which stands for Cache Coherent Interconnect for Accelerators. This is apparently a technology that is similar to, but not the same as, the CXL that Intel has been pushing.

CCIX (and Gen-Z, for that matter) are long-running, widely supported consortium efforts that we regularly discussed on this forum. CCIX was announced back in 2016.

Of course it was impossible for Intel, which missed the train initially, to join late and save face, so they set up CXL in 2019 instead. In 2020, CXL and Gen-Z essentially combined their efforts.
 

LightningZ71

Golden Member
Mar 10, 2017
Going forward, that's supposedly where their focus will be, but CCIX is FAR from dead. On the ARM side, Neoverse N2, the X1, and the CMN-700 mesh include active support as of Q2 this year, and Xilinx, which is being acquired by AMD, is actively producing products with it. AMD had products in the pipeline that supported it. It's not going to simply disappear. However, with Intel backing CXL, that's definitely the way the industry will be moving, especially with AMD, Nvidia, ARM and Xilinx all current members.
 

moinmoin

Diamond Member
Jun 1, 2017
I thought AMD had largely abandoned CCIX for CXL?
No, CXL largely builds upon previous efforts but isn't old enough yet to affect projects currently at the stage of being realized. As CCIX has existed for far longer, there are plenty of projects with plans to include cache coherency that were, are, and will be using it. Frontier is just one of them.
 

DrMrLordX

Lifer
Apr 27, 2000
moinmoin said:
No, CXL largely builds upon previous efforts but isn't old enough yet to affect projects currently at the stage of being realized. As CCIX has existed for far longer, there are plenty of projects with plans to include cache coherency that were, are, and will be using it. Frontier is just one of them.

So we'll see CCIX on existing and near-future builds involving EPYC, but eventually they'll move to CXL?