Question HBM Genoa


LightningZ71

Platinum Member
Mar 10, 2017
2,317
2,909
136
Who could forget Kid Icky?!?!

I do think that Nintendo does a better job of making sure that their first-party IP products aren't hot garbage on release. That's more than most of their competition manage these days.
 

StefanR5R

Elite Member
Dec 10, 2016
6,554
10,305
136
While this thread was meant to be about MI300C specifically, here is a closely related article, Chips and Cheese's MI300A deep dive by Chester Lam:

(How MI300A's Infinity Fabric is structured, with latency and bandwidth measurements. Edit: it also touches on topics such as the pros and cons of memory-side cache versus cache in the core complexes; why Genoa crams many cores into dual-socket nodes rather than scaling up to quad-socket nodes; the noisy-neighbor problem; SPEC 1T performance of Zen 4 in MI300A compared with desktop Zen 4 and desktop Zen 2; MI300A's CPU–GPU memory sharing compared to some desktop and mobile implementations; Infinity Fabric as a tool to manage hardware development complexity…)

Chester Lam said:
The Radeon Instinct MI300A’s memory subsystem may not be kind to its Zen 4 cores from a latency perspective. From the bandwidth side though, it’s an all-you-can-eat buffet where Infinity Fabric links between each CCD and the rest of the system is your plate. […] Unlike desktop Zen 4, hitting Infinity Fabric or DRAM bandwidth limits with the CPU cores is simply impossible.
Hmm, doesn't this make MI300C a somewhat unbalanced product?
 
Last edited:

StefanR5R

Elite Member
Dec 10, 2016
6,554
10,305
136
Chester Lam said:
Unlike desktop Zen 4, hitting Infinity Fabric or DRAM bandwidth limits with the CPU cores is simply impossible.
Hmm, doesn't this make MI300C a somewhat unbalanced product?
AFAIU, the SERDES-based interface between CCD and IOD was improved somewhat in Turin over Genoa, but not substantially. In MI300*A*, CPU bandwidth tests show the CCD-to-IOD interface to be the bottleneck, ahead of the IOD-to-IOD IF and memory bandwidth. That is natural for such a configuration, in which the more bandwidth-hungry parts are the GPUs, not the CPUs. Yet I was wondering how well the raw memory bandwidth can be translated into actual performance in MI300*C*, given the CCD-to-IOD IF limitation (Genoa-style GMI-Wide with 2×32 B/cycle in each direction).
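For illustration, a quick back-of-the-envelope in Python, using the link width above. The FCLK value is my assumption, not a published MI300 figure:

```python
# Back-of-the-envelope for a Genoa-style GMI-Wide link:
# 2 links x 32 B/cycle in each direction, clocked at FCLK.
LINKS = 2
BYTES_PER_CYCLE = 32   # per link, per direction
FCLK_GHZ = 2.0         # assumed fabric clock -- not a published MI300 spec

peak_per_dir_gbs = LINKS * BYTES_PER_CYCLE * FCLK_GHZ
print(f"theoretical per-CCD bandwidth, one direction: {peak_per_dir_gbs:.0f} GB/s")

# Chips and Cheese measured ~71.5 GB/s read per CCD on MI300A:
measured_read_gbs = 71.5
print(f"measured read / theoretical: {measured_read_gbs / peak_per_dir_gbs:.0%}")
```

Under that assumed FCLK the link would top out at 128 GB/s per direction per CCD, which would put the measured 71.5 GB/s read at roughly half of theoretical.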

But I should simply have looked up the figures that have been published so far:
– The aggregate HBM3 bandwidth of MI300A is 5.3 TB/s, according to Chips and Cheese.
– AMD even claim 6.9 TB/s for MI300C in STREAM Triad.
– Chips and Cheese measured MI300A's per-CCD bandwidth at 71.5 GB/s read and 60.7 GB/s write with their own microbenchmarking software.
– If the 12 CCDs of MI300C performed the same, that would be 858 GB/s read and 728 GB/s write.
This doesn't make sense to me. What am I missing?
Or are the 5.3 and 6.9 TB/s for four sockets together, not for a single MI300?

Edit: Looking back at Microsoft's announcement of Azure HBv5 virtual machines, the 6.9 TB/s do indeed appear to be the sum across four sockets. If so, this matches Chips and Cheese's MI300A measurements reasonably well.
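To make the arithmetic explicit, a quick sanity check using only the figures quoted above, assuming the 6.9 TB/s spans four sockets:

```python
# Only the figures quoted above; assumes 6.9 TB/s spans four sockets.
per_ccd_read_gbs = 71.5     # Chips and Cheese, MI300A
per_ccd_write_gbs = 60.7
ccds = 12                   # CCDs per MI300C socket

read_gbs = ccds * per_ccd_read_gbs      # 858.0
write_gbs = ccds * per_ccd_write_gbs    # 728.4
print(f"per socket: {read_gbs:.0f} read + {write_gbs:.0f} write "
      f"= {read_gbs + write_gbs:.0f} GB/s combined")

triad_per_socket_gbs = 6.9e3 / 4        # HBv5's 6.9 TB/s over four sockets
print(f"STREAM Triad per socket: {triad_per_socket_gbs:.0f} GB/s")
```

That gives ~1,586 GB/s combined read+write per socket from the per-CCD measurements, against ~1,725 GB/s per socket from the Triad figure; in the same ballpark (Triad weights reads and writes 2:1, so the two numbers needn't match exactly).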

Sounds like Zen5 would love the HBM hookup even more than Zen4.
A hypothetical Zen 5-based MI300A and/or MI300C successor (with Zen 5's considerably wider vector arithmetic execution compared to Zen 4) would apparently benefit from some sort of upgrade of the CCD-to-IOD IF. The existing Zen 5 CCD may not be prepared for such an upgrade.
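To put a very rough number on that: with full-width AVX-512 datapaths, the load bandwidth a single CCD could theoretically consume dwarfs the link estimate from above. All inputs here are illustrative assumptions, not published specs:

```python
# Illustrative only: streaming load demand of one hypothetical Zen 5 CCD
# with full-width AVX-512, vs. the Genoa-style link estimate from above.
cores = 8                  # assumed cores per CCD
clock_ghz = 3.5            # assumed sustained all-core clock
loads_per_cycle = 2        # assumed 512-bit loads per core per cycle
bytes_per_load = 64        # 512 bits

demand_gbs = cores * clock_ghz * loads_per_cycle * bytes_per_load
link_gbs = 2 * 32 * 2.0    # per-direction GMI-Wide estimate from above
print(f"vector load demand: {demand_gbs:.0f} GB/s vs. ~{link_gbs:.0f} GB/s link")
```

Even with generous rounding, the cores could ask for an order of magnitude more than the link can deliver, which is why a wider CCD-to-IOD interface would matter more for Zen 5 than it did for Zen 4.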
 
Last edited: