[Sorry for going a bit OT in the Zen3 thread]
Keep in mind that the key customers, LLNL and ORNL, are not so interested in TOPS; they want DPFP (double-precision floating-point) performance.
“Our workloads are primarily not deep learning models, although we are exploring something we call cognitive simulation, which brings deep learning and other AI models to bear on our workloads by evaluating how they can accelerate our simulations and how they can also improve their accuracy and find where they actually work,” explained de Supinski.
El Capitan, for example, is targeted to have 2 exaflops of DPFP.
The El Capitan system will have in excess of 2 exaflops of peak double precision performance
As the steward of the nuclear weapon arsenal for the United States government, it is probably not an overstatement to say that Lawrence Livermore National Laboratory, one of the main supercomputer and scientific research facilities operated by the Department of Energy, is keenly interested in...
www.nextplatform.com
AMD is lucky to have won both the Frontier and El Capitan contracts. That gives them a lot of flexibility in designing CDNA: they have a captive market for these products, with the hardware development and, to some extent, the software development paid for. On top of that, scientists at US labs such as LLNL and ORNL will contribute actively to ROCm (as stated on AMD's own pages for Frontier and El Capitan).
The government researchers have laid out a complete roadmap for replacing CUDA, using elements proposed by AMD but mainly centered on OpenMP. (For the life of me, I cannot find the link again.)
That said...
I wouldn't assume that CDNA1 is going to be a trivial upgrade over MI60. I doubt that just scaling up TOPS would be such a big challenge for AMD.
Just doing packed int4 would put MI60 above 100 TOPS with no other changes, and that is before adding more CUs. With MFMA they can chain multiple matrix operations in a single wave. If they can pack mixed precision in there too, the gain is really incredible.
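As a rough sanity check on the packed int4 figure, here is a back-of-the-envelope sketch in Python. The CU count, lane width, and peak clock are assumptions taken from public MI60 spec sheets as I remember them, and the per-lane op rates (2 for an FMA, 8 for a 4-way int8 dot product, 16 for an 8-way int4 dot product) are my own simplification, not anything confirmed for CDNA.

```python
# Back-of-the-envelope peak throughput for an MI60-like part.
# All hardware figures below are ASSUMPTIONS, not measured values.
CUS = 64            # compute units (assumed)
LANES_PER_CU = 64   # SIMD lanes per CU (assumed)
CLOCK_GHZ = 1.8     # peak engine clock in GHz (assumed)

# Operations per lane per clock (assumed simplification):
# an FMA counts as 2 ops, a 4-way int8 dot product with accumulate as 8,
# an 8-way int4 dot product with accumulate as 16.
OPS_PER_LANE_PER_CLOCK = {"fp32": 2, "int8": 8, "int4": 16}


def peak_tops(cus, lanes_per_cu, clock_ghz, ops_per_lane_per_clock):
    """Return peak throughput in tera-operations per second."""
    giga_ops = cus * lanes_per_cu * clock_ghz * ops_per_lane_per_clock
    return giga_ops / 1000.0


def mi60_like_peaks():
    """Peak rate per data type for the assumed configuration."""
    return {
        name: peak_tops(CUS, LANES_PER_CU, CLOCK_GHZ, rate)
        for name, rate in OPS_PER_LANE_PER_CLOCK.items()
    }


if __name__ == "__main__":
    for name, tops in mi60_like_peaks().items():
        print(f"{name}: {tops:.1f} T(FL)OPS peak")
```

Under those assumptions the int4 peak comes out just under 120 TOPS, and since the formula is linear in CU count and clock, adding CUs scales it directly.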
The main kernel work for Arcturus has centered on networking GPUs to achieve the first step in workload sharing: data coherency between the GPUs.
So I think that was always the main focus: 2nd Gen Infinity Architecture.