
Discussion CDNA 6 / Instinct MI500 series thread

1000x vs MI300X

Is it 10x?

I really don't know how to interpret this:

AMD shared additional details at CES on the next-generation AMD Instinct MI500 GPUs, planned to launch in 2027. The MI500 Series is on track to deliver up to a 1,000x increase in AI performance compared to the AMD Instinct MI300X GPUs introduced in 2023.[1]

[1] "Based on engineering projections by AMD Performance Labs in December 2025, to estimate the peak theoretical precision performance of AMD Instinct™ MI500 Series GPU powered AI Rack vs. an AMD Instinct MI300X platform. Results subject to change when products are released in market."

Is the 1,000x a Helios rack vs. a single "platform" of 8x MI300X??? And is the 10x vs. a single MI400 GPU, or the "Helios" rack vs. the MI500 "Titan" rack (with 3.5x the number of GPUs)?
 
for reference

AMD CES 2026 live blog

AMD CES 2026 Keynote Live Coverage
By Ryan Smith - January 5, 2026

 
news coverage


AMD unwraps Instinct MI500 boasting 1,000X more performance versus MI300X — setting the stage for the era of YottaFLOPS data centers

By Anton Shilov
Next-generation CDNA 6 architecture on-track for 2027.

AMD's Instinct MI500X-series accelerators are set to be based on the CDNA 6 architecture (no UDNA yet?), with their compute chiplets made on one of TSMC's N2-series fabrication processes (2nm-class). AMD says that its Instinct MI500X GPUs will offer up to 1,000 times higher AI performance compared to the Instinct MI300X accelerator from late 2023, but does not exactly define the comparison metrics.


Demand for AI data center compute capability is set to increase dramatically, from around 100 ZettaFLOPS today to 10+ YottaFLOPS* in the next five years (roughly a 100x increase), according to AMD.

*One YottaFLOPS equals 1,000 ZettaFLOPS, or one million ExaFLOPS.


 
AFAICT David Wang either didn't properly explain it to the interviewer, or he misinterpreted something said during an RTG meeting with the higher-ups and AMD simply didn't refute it, because confusion about their roadmap works in their favor against competitors as much as it keeps us REEEEEEing at them 😅

UDNA was never really anything more than a long-term strategy to keep:

#1. Latest CDNA in line with fairly recent CU µArch improvements - no more than a generation behind RDNA if Kepler is correct.

So RDNA development is basically dogfooding µArch features for the more fiscally important CDNA, and bugs that would otherwise be swept under the rug for an RDNA release (like Next Gen Geometry) can be fixed before they are implemented in a future CDNA.

#2. Latest RDNA supported in ROCm, because a lot of the work has to be done anyway to enable it for the next CDNA.

In short it keeps CDNA relatively modern on the general compute side, and RDNA usable on the ROCm side.

i.e. the best of both worlds, while still allowing for domain-specific optimisation.
 
for reference

AMD CES 2026 live blog

AMD CES 2026 Keynote Live Coverage
By Ryan Smith - January 5, 2026

"Ryan Smith"? That name sounds familiar!
 
bugs that would otherwise be swept under the rug for an RDNA release (like Next Gen Geometry) can be fixed before they are implemented in a future CDNA.
Why would CDNA need Next Gen Geometry? A lot of the complex (and thus bug-prone) cool new stuff from patents seems to be useful only for RDNA, with zero value for CDNA.
 
Why would CDNA need Next Gen Geometry? A lot of the complex (and thus bug-prone) cool new stuff from patents seems to be useful only for RDNA, with zero value for CDNA.
I was just using that as an example of a major bug making it into a production µArch.
 
I was just using that as an example of a major bug making it into a production µArch.
Yeah, I get that, but bugs are more likely in complex new things exactly like you mentioned, and that's a non-issue for CDNA.

If I had to guess, what they want to use RDNA for is testing new and better caching and scheduling (the kind of subtle bugs that make it into final silicon), plus avoiding the RDNA3 issue with lower clocks. On the other hand, if RDNA sticks to an N-1 process, then it's not really a like-for-like comparison anyway.
 
So RDNA development is basically dogfooding µArch features for the more fiscally important CDNA, and bugs that would otherwise be swept under the rug for an RDNA release (like Next Gen Geometry) can be fixed before they are implemented in a future CDNA.
Not really; the gfx13 CU is derived from gfx1250, not gfx1200/1201.
The roadmap kind of eats itself now, with client iterating into DC into client into DC into yaddayaddayadda.
 
I already explained it here: 1,000x for an MI500 system vs. an MI300X system is very reasonable:
I think it is rather simple: an 8-GPU MI300X cluster vs. a full MI500 rack.
An MI300X delivers 1.3 PFLOPS of FP16 (matrix) compute. An MI300X platform is an 8-GPU cluster, so that works out to 10.4 PFLOPS.

MI455X will bring FP4 support, and at Helios rack level that results in 3 ExaFLOPS. Now double the number of GPUs per rack for MI500, make the GPUs themselves 1.75x faster (plausible if, e.g., the GPU size increases by using 3x or 4x base die tiles instead of 2x), and we land at 10.5 ExaFLOPS per rack.

10.5 ExaFLOPS / 10.4 PFLOPS ≈ 1,000x 😉
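
A minimal sketch of that arithmetic (the 1.3 PFLOPS FP16 figure is MI300X's published matrix throughput; the 3 EF Helios rack number, the 2x GPU count, and the 1.75x per-GPU uplift are the assumptions above, not AMD-confirmed figures, and note the ratio mixes FP16 on one side and FP4 on the other):

```cpp
#include <cstdio>

int main() {
    // Inputs from the post above; projected values are the poster's assumptions.
    const double mi300x_pflops   = 1.3;                    // FP16 matrix, per GPU
    const double platform_pflops = 8.0 * mi300x_pflops;    // 8-GPU platform: 10.4 PF
    const double helios_ef       = 3.0;                    // MI455X rack, FP4 (projection)
    const double mi500_rack_ef   = helios_ef * 2.0 * 1.75; // 2x GPUs, 1.75x speed: 10.5 EF
    // 1 ExaFLOPS = 1,000 PFLOPS
    const double ratio = (mi500_rack_ef * 1000.0) / platform_pflops;
    std::printf("MI500 rack vs. MI300X platform: ~%.0fx\n", ratio); // ~1010x
    return 0;
}
```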
 
Gfx1250 (CDNA5) and presumably later parts (gfx13/RDNA5/CDNA6) are seemingly deprecating the CDNA1-4 MFMA intrinsics in favor of the more modern WMMA matrix intrinsics seen on gfx11/RDNA3 and onwards (per the ROCDL dialect in the LLVM GitHub repo).
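
For anyone who hasn't touched these, here's a rough sketch of what the two intrinsic families look like at the HIP builtin level. The builtin names and operand shapes below are the existing gfx90a/gfx942 MFMA and gfx11 WMMA forms from clang; how gfx1250 actually exposes its matrix ops is exactly what the ROCDL change hints at, so treat this as illustrative only:

```cpp
#include <hip/hip_runtime.h>

// Per-thread fragment types; layouts across the wave differ per architecture.
typedef __fp16 half4   __attribute__((ext_vector_type(4)));
typedef __fp16 half16  __attribute__((ext_vector_type(16)));
typedef float  floatx4 __attribute__((ext_vector_type(4)));
typedef float  floatx8 __attribute__((ext_vector_type(8)));

#if defined(__gfx90a__) || defined(__gfx942__)
// CDNA1-4 style: MFMA, wave64 fragments, extra cbsz/abid/blgp control operands
__device__ floatx4 mma_16x16x16(half4 a, half4 b, floatx4 acc) {
    return __builtin_amdgcn_mfma_f32_16x16x16f16(a, b, acc, 0, 0, 0);
}
#elif defined(__gfx1100__)
// RDNA3 style: WMMA, wave32 fragments, no control operands
__device__ floatx8 mma_16x16x16(half16 a, half16 b, floatx8 acc) {
    return __builtin_amdgcn_wmma_f32_16x16x16_f16_w32(a, b, acc);
}
#endif
```

Both compute a 16x16x16 FP16 multiply-accumulate into FP32; the open question is which surface (and which fragment layout) gfx1250-era hardware standardizes on.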
 
Do we have any idea how Rubin Ultra vs. MI500 compares in terms of launch schedule and GPU performance?
I know NVIDIA has rebuilt the entire DC stack with the SW moat on top, but will they continue to win, or does NVIDIA have to pull Feynman forward to H2 2027?
 
Do we have any idea how Rubin Ultra vs. MI500 compares in terms of launch schedule and GPU performance?
They're both H2'27 and dunno.
I know NVIDIA has rebuilt the entire DC stack with the SW moat on top, but will they continue to win, or does NVIDIA have to pull Feynman forward to H2 2027?
They can't "pull in" anything since their hweng mines are already overworked to death.
 

HLRS director reveals existence of previously unannounced AMD MI600 AI chip

Comments were made as part of a discussion regarding the center’s forthcoming supercomputers
July 08, 2025 By Charlotte Trueman

The director of the High-Performance Computing Center (HLRS) in Stuttgart, Germany, has revealed the existence of the AMD MI600 AI chip, something that has not been previously disclosed by the chipmaker.

During a discussion with journalists about the procurement processes for the successor to the center’s forthcoming system, Herder, Professor Dr. Michael Resch said: “We are not so much interested in the MI400… We are already interested in MI500, 600.”

 