Chester Lam said:
Unlike desktop Zen 4, hitting Infinity Fabric or DRAM bandwidth limits with the CPU cores is simply impossible.
Hmm, doesn't this make MI300C a somewhat unbalanced product?
AFAIU, the SERDES-based interface between CCD and IOD was improved somewhat in Turin over Genoa, but not substantially. In MI300*A*, CPU bandwidth tests show the CCD-to-IOD interface to be the bottleneck, ahead of the IOD-to-IOD Infinity Fabric and memory bandwidth. That's natural, though, for a configuration in which the more bandwidth-hungry parts are the GPUs, not the CPUs. Still, I was wondering how well the raw memory bandwidth can translate into actual performance in MI300*C*, given the CCD-to-IOD Infinity Fabric limitation (Genoa-style GMI-Wide, 2 × 32 B/cycle, bidirectional).
But I should simply have looked up the figures published so far:
– The aggregated HBM3 bandwidth on MI300A is 5.3 TB/s, according to Chips and Cheese.
– AMD even claims a STREAM Triad result of 6.9 TB/s for MI300C.
– Chips and Cheese measured MI300A's per-CCD bandwidth at 71.5 GB/s read and 60.7 GB/s write using their own microbenchmarking software.
– If the 12 CCDs of MI300C performed the same, that would be 858 GB/s read and 728 GB/s write.
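To make the gap concrete, here is a quick back-of-the-envelope sketch of the arithmetic above. It assumes the 12 CCDs scale perfectly linearly (no shared bottleneck beyond each CCD's own link) and uses MI300A's 5.3 TB/s HBM3 figure as a stand-in for MI300C; the constant and function names are just illustrative.

```python
# Per-CCD bandwidth measured by Chips and Cheese on MI300A (GB/s)
PER_CCD_READ_GBPS = 71.5
PER_CCD_WRITE_GBPS = 60.7

# Assumption: all 12 CCDs of MI300C behave like MI300A's and scale linearly
NUM_CCDS = 12

HBM3_PEAK_TBPS = 5.3           # aggregate HBM3 bandwidth quoted for MI300A
STREAM_TRIAD_CLAIM_TBPS = 6.9  # AMD's MI300C STREAM Triad figure


def aggregate_gbps(per_ccd_gbps: float, num_ccds: int = NUM_CCDS) -> float:
    """Scale a per-CCD bandwidth figure linearly to num_ccds CCDs."""
    return per_ccd_gbps * num_ccds


if __name__ == "__main__":
    read_total = aggregate_gbps(PER_CCD_READ_GBPS)
    write_total = aggregate_gbps(PER_CCD_WRITE_GBPS)
    print(f"12-CCD read:  {read_total:.1f} GB/s")
    print(f"12-CCD write: {write_total:.1f} GB/s")
    # Fraction of each headline figure that CCD reads alone could reach
    print(f"read / HBM3 peak:    {read_total / (HBM3_PEAK_TBPS * 1000):.1%}")
    print(f"read / STREAM claim: {read_total / (STREAM_TRIAD_CLAIM_TBPS * 1000):.1%}")
```

Under those assumptions the CPU side would reach only roughly 16% of the HBM3 figure and about 12% of the STREAM claim, which is exactly the mismatch that puzzled me.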
This doesn't make sense to me. What am I missing?
Or are the 5.3 and 6.9 TB/s figures for four sockets combined rather than for a single MI300?
Edit: Looking back at Microsoft's announcement of Azure HBv5 virtual machines, the 6.9 TB/s figure does appear to be the sum across four sockets. If so, it would match well with Chips and Cheese's MI300A measurements.
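A rough sketch of that per-socket comparison, assuming the 6.9 TB/s is split evenly across four sockets and that reads and writes can overlap on the bidirectional CCD links (so the 12-CCD read and write ceilings are simply added). Both assumptions are mine, not AMD's or Microsoft's; the names are illustrative.

```python
# STREAM Triad figure from the HBv5 announcement, assumed here to cover four sockets (TB/s)
HBV5_TRIAD_TBPS = 6.9
SOCKETS = 4

# 12-CCD aggregates derived from Chips and Cheese's MI300A per-CCD numbers (GB/s)
CCD_READ_TOTAL_GBPS = 71.5 * 12
CCD_WRITE_TOTAL_GBPS = 60.7 * 12


def per_socket_gbps(total_tbps: float, sockets: int) -> float:
    """Split a multi-socket TB/s figure evenly across sockets, returned in GB/s."""
    return total_tbps * 1000 / sockets


if __name__ == "__main__":
    share = per_socket_gbps(HBV5_TRIAD_TBPS, SOCKETS)
    # Rough ceiling: assumes reads and writes proceed concurrently on the CCD links
    combined = CCD_READ_TOTAL_GBPS + CCD_WRITE_TOTAL_GBPS
    print(f"per-socket Triad share:      {share:.0f} GB/s")
    print(f"12-CCD read + write ceiling: {combined:.0f} GB/s")
```

That puts roughly 1.7 TB/s per socket next to a combined CCD read plus write ceiling of roughly 1.6 TB/s, i.e. the same ballpark rather than a factor of four apart.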
Sounds like Zen 5 would love the HBM hookup even more than Zen 4.
A hypothetical Zen 5-based MI300A and/or MI300C successor (with Zen 5's considerably wider vector execution compared with Zen 4) would apparently profit from some sort of upgrade to the CCD-to-IOD Infinity Fabric link. The existing Zen 5 CCD may not be prepared for such an upgrade.