Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

Krteq · Nov 8, 2021

MI200 "die shot"

https://twitter.com/x/status/1457711975925002241

Stuka87 · Nov 8, 2021

Obviously a CG representation, but looks like two dies, each having four HBM memory stacks. Not quite the chiplet design I had imagined. Then a question is, does each module show up as a single GPU?

DisEnchantment · Nov 8, 2021

Stuka87 said:
Obviously a CG representation, but looks like two dies, each having four HBM memory stacks. Not quite the chiplet design I had imagined. Then a question is, does each module show up as a single GPU?

This one is interposer based only; it is like a high-bandwidth version of EPYC Naples chiplets. The full 3D stacked SoIC versions will come with RDNA3 and MI300.

leoneazzurro · Nov 8, 2021

AMD Instinct MI200 MCM Aldebaran GPU pictured - VideoCardz.com

AMD’s first multi-chip-module general-purpose graphics processor (GPGPU) has been leaked ahead of launch. ExecutableFix has leaked the photo of the actual Instinct MI200 compute accelerator. The render appears to confirm it...

videocardz.com

Better pictures.

biostud · Nov 8, 2021

More rumors regarding infinity cache

AMD Next-Gen RDNA GPUs Might Feature 3D Infinity Cache Technology, MCM GPUs With 3D Stacking

AMD's next-generation RDNA 3 and CDNA GPU are expected to feature brand new 3D Stacked Infinity Cache technology for higher bandwidth.

wccftech.com

eek2121 · Nov 8, 2021

MI200 has 95TF of fp32 performance. 🤯

tamz_msc · Nov 8, 2021

eek2121 said:
MI200 has 95TF of fp32 performance. 🤯

That's for matrix ops. Even more crazy is that it has the same for double precision matrix ops. I wonder what workloads need such matrix throughput. Most probably not AI-type workloads.

gdansk · Nov 8, 2021

I wonder where it is the most efficient. 560W seems like a lot.

Krteq · Nov 8, 2021

eek2121 said:
MI200 has 95TF of fp32 performance. 🤯

I think it's TF32, not FP32

moinmoin · Nov 8, 2021

https://twitter.com/x/status/1457532290104545280

Should be a good omen for RDNA3.

Asterox · Nov 8, 2021

As usual, AMD product trailers or presentations are CGI eye medicine.

leoneazzurro · Nov 8, 2021

Krteq said:
I think it's TF32, not FP32

It is packed FP32 peak throughput.

AtenRa · Nov 9, 2021

Lots of interesting things from this video.

beginner99 · Nov 9, 2021

Stuka87 said:
Then a question is, does each module show up as a single GPU?

Seems like it appears as 2 GPUs and isn't all that connected really. For sure wouldn't work for gaming.

Stuka87 · Nov 9, 2021

beginner99 said:
Seems like it appears as 2 GPUs and isn't all that connected really. For sure wouldn't work for gaming.

Which is to be expected. It's clearly a compute-only part.

Saylick · Nov 9, 2021

beginner99 said:
Seems like it appears as 2 GPUs and isn't all that connected really. For sure wouldn't work for gaming.

Stuka87 said:
Which is to be expected. It's clearly a compute-only part.

I believe RDNA3 is going to use an embedded bridge to connect the GCDs, rather than regular IF links like MI200 does. The silicon bridges in MI200 are between the GCD and HBM modules, which makes sense given the bandwidth required there. Same goes for RDNA3 between the GCDs.

Mopetar · Nov 9, 2021

Some of the numbers are getting to the point of downright nutty. If you look at the BF/FP16 matrix numbers we're getting to the point where it's only a few more generations before we start having to measure the performance numbers in PFLOPs.

gdansk · Nov 9, 2021

Mopetar said:
Some of the numbers are getting to the point of downright nutty. If you look at the BF/FP16 matrix numbers we're getting to the point where it's only a few more generations before we start having to measure the performance numbers in PFLOPs.

Another downright nutty figure is the TDP. In a few generations (maybe just one?) it'll be measured in kilowatts.

Saylick · Nov 9, 2021

gdansk said:
Another downright nutty figure is the TDP. In a few generations (maybe just one?) it'll be measured in kilowatts.

An unavoidable side effect of packing more and more silicon in the same package. It used to not be possible to jam this much silicon onto the package due to reticle limits, but MCM and other advanced packaging techniques eliminate that. As long as perf/W and perf/socket increase, increasing package power is of little consequence.

eek2121 · Nov 9, 2021

gdansk said:
Another downright nutty figure is the TDP. In a few generations (maybe just one?) it'll be measured in kilowatts.

Meh, it is on an older node, and with 2 GPUs no less. My RTX 3090 peaks at around 420W, and can’t come close to these numbers, though the Instinct doesn’t use CUDA, so many would opt for NVIDIA anyway.

beginner99 · Nov 9, 2021

gdansk said:
Another downright nutty figure is the TDP. In a few generations (maybe just one?) it'll be measured in kilowatts.

Isn't Ponte Vecchio actually rumored to be close to that?

Saylick · Nov 9, 2021

beginner99 said:
Isn't Ponte Vecchio actually rumored to be close to that?

600W, allegedly. Water-cooled.

https://twitter.com/x/status/1375443623324491785

gdansk · Nov 9, 2021

eek2121 said:
Meh, it is on an older node, and with 2 GPUs no less. My RTX 3090 peaks at around 420W, and can’t come close to these numbers, though the Instinct doesn’t use CUDA, so many would opt for NVIDIA anyway.

beginner99 said:
Isn't Ponte Vecchio actually rumored to be close to that?

I'm not saying it's AMD specific. It's an industry wide trend. Intel, Nvidia, AMD are all doing it in response to some demand.

biostud · Nov 9, 2021

gdansk said:
Another downright nutty figure is the TDP. In a few generations (maybe just one?) it'll be measured in kilowatts.

We've already been there with 3- and 4-way CF/SLI.
But since they now can put multiple GPUs in a single package, it is to be expected that power consumption goes up as well.

leoneazzurro · Nov 9, 2021

In the end, in these applications, only perf/W matters. Even if a single package now has 560W of max power draw, that doesn't matter if it delivers >2x the performance of a part with 300W draw. The PC environment is different.
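To put rough numbers on that, here is a quick back-of-the-envelope sketch. It only uses the 560W and 300W figures quoted above and treats the ">2x" as exactly 2x for illustration. The function name is made up for this post, and real workloads rarely run at max package power, so take it as an assumption-laden estimate rather than a measurement.

```python
def perf_per_watt_ratio(perf_ratio: float, new_watts: float, old_watts: float) -> float:
    """Return (new perf/W) divided by (old perf/W).

    perf_ratio is new performance / old performance (hypothetical input).
    """
    power_ratio = new_watts / old_watts
    return perf_ratio / power_ratio


# Figures from the post: 560W package vs a 300W part, assumed exactly 2x performance.
ratio = perf_per_watt_ratio(2.0, 560.0, 300.0)

# Break-even: the speedup needed just to match the old part's perf/W.
break_even = 560.0 / 300.0

print(f"perf/W gain at 2x performance: {ratio:.2f}x")
print(f"speedup needed to break even:  {break_even:.2f}x")
```

So at exactly 2x the performance, perf/W only improves by about 7%; anything above roughly 1.87x is a net efficiency win, and the further past 2x it goes, the better the 560W package looks.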
