Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

Det0x · Jan 23, 2021

AMD Navi 31 rumored to be dual 80CU chiplet design, up to 10240 cores? - VideoCardz.com

AMD Navi 31, the first desktop chiplet-based GPU? Navi 31 shaping into a true compute monster. We have heard rumors about the upcoming Navi 31 GPU for a while now. In fact, there have been rumors about Navi 41 already. The Navi 31 might be AMD’s first MCM (multi-chip module) design. NVIDIA is...

videocardz.com

AMD Navi 31, the first desktop chiplet-based GPU?
Navi 31 shaping into a true compute monster.

We have heard rumors about the upcoming Navi 31 GPU for a while now. In fact, there have been rumors about Navi 41 already. The Navi 31 might be AMD’s first MCM (multi-chip module) design. NVIDIA is too expected to take the same route with its Hopper series, however, it remains unclear if the architecture is meant for gaming or compute workloads. On the other hand, AMD made it clear that RDNA3 has Radeon DNA and it is for sure aiming at the gaming market.

The successor to Instinct MI100 (no longer called Radeon) based on Arcturus GPU and CDNA architecture will compete with NVIDIA’s Gx100 compute chips. The CDNA is more than likely to take the same path with multi-chiplet design at some point in the future – it is simply easier to synchronize simple compute workloads across multiple dies rather than complex graphics. Even Intel’s Xe-HP architecture will be based on ’tiles’, which might be the industry’s first attempt at GPGPU chiplet design.

AMD Navi 31 had already appeared in a macOS11 leak. But we have not heard much about the architecture ever since. AMD actually did talk about RDNA 3 architecture back in November, but the information provided by AMD’s EVP Rick Bergman was only related to one topic – power efficiency. In an interview with The Street back in November, Rick Bergman confirmed that AMD is committed to delivering the same performance per watt improvement over RDNA 2.

Bergman: “Let’s step back and talk about the benefits of both. So why did we target, pretty aggressively, performance per watt [improvements for] our RDNA 2 [GPUs]. And then yes, we have the same commitment on RDNA 3.”
“So [there are] actually a lot of efficiencies…if you can improve your perf-per-watt substantially. On the notebook side, that’s of course even more obvious, because you’re in a very constrained space, you can just bring more performance to that platform again without some exotic cooling solutions…We focused on that on RDNA 2. It’s a big focus on RDNA 3 as well.”
— AMD EVP, Rick Bergman, via The Street

A new rumor had appeared on January 1st. We have only noticed it because it was retweeted by 3DCenter.org. While it is impossible to confirm those revelations, I thought it might still be interesting to our readers. According to Twitter user @Kepler_L2, it is rumored that Navi 31 has a dual chiplet design with 80 Compute Units each. This means that the GPU could offer up to 160 CUs in total, twice as many as Navi 21

https://twitter.com/x/status/1352552165613727744

AMD RDNA 3 is also expected to bring a noticeable performance upgrade in ray tracing. Based on reviews, the RDNA 2 implementation of ray tracing hardware acceleration is clearly not as good as NVIDIA 2nd Gen RT core, so AMD has a lot of work to catch up. We are still waiting for more details on AMD’s implementation of Deep Learning Super Sampling for AI-based super-resolution technology. Those, however, would be based on Tensor-like compute cores and AMD has so far not implemented such cores into their GPUs. Would Navi 31 offer such a type of core? It is unknown yet.

Patent-hunting Twitter user @Underfox3 has recently discovered that AMD has is already developing a technology to synchronize workload between MCM based GPUs and a new command processor orchestrating ray tracing pipeline for next-gen GPUs. Those two may appear in RDNA 3 architectures.

AMD has never confirmed when exactly could RDNA 3 architecture be announced, but based on the roadmaps, AMD is expected to talk about the architecture by 2022. It would appear that AMD might lift a curtain on RDNA3 somewhere later this year.

GodisanAtheist · Jan 23, 2021

Oh my God we have one foot off the RDNA2 hype train...and it looks like it has found purchase on the RDNA3 hype train!

Chugga chugga choo...

Honestly, I think the first implementations of MCM GPUs are going to be compute based. That is relatively low hanging fruit for massive increases in performance per board.

If either AMD or NV deliver a true MCM gpu in 2022/next gen I will be absolutely floored.

uzzi38 · Jan 23, 2021

GodisanAtheist said:
Oh my God we have one foot off the RDNA2 hype train...and it looks like it has found purchase on the RDNA3 hype train!

Chugga chugga choo...

Honestly, I think the first implementations of MCM GPUs are going to be compute based. That is relatively low hanging fruit for massive increases in performance per board.

If either AMD or NV deliver a true MCM gpu in 2022/next gen I will be absolutely floored.

That's nothing. The real hype train begins here:

2.5x perf.

Need I say more?

Glo. · Jan 23, 2021

If they will hit this performance target, nothing what we will say right now will make any sense.

Insanity.

Mopetar · Jan 23, 2021

Normally I'd say not to get too wrapped up in the hype that you put off buying a GPU for the supposed performance uplift that's rumored in the next generation, but at the rate things are going we may well be waiting anyhow

Glo. · Jan 23, 2021

Mopetar said:
Normally I'd say not to get too wrapped up in the hype that you put off buying a GPU for the supposed performance uplift that's rumored in the next generation, but at the rate things are going we may well be waiting anyhow

Well don't expect that crypto boom will cool down before the end of the year.

So you might just be correct on this one

gdansk · Jan 23, 2021

Hard to temper expectations after what they achieved without a node shrink. The poor hype train must be exhausted.

Glo. · Jan 23, 2021

gdansk said:
Hard to temper expectations after what they achieved without a node shrink. The poor hype train must be exhausted.

What do you mean by "hype train must be exhausted"?

Thomas the Tank Engine that is piloting the hype train will be just fed not with coal but with cocaine!

DisEnchantment · Jan 24, 2021

Source is Forum Member @Kepler_L2

Maybe he could enlighten us here.

Kuiva maa · Jan 24, 2021

Perhaps RDNA2 infinity cache was a testbed more than anything else.

Kepler_L2 · Jan 24, 2021

Wish I could share the source of this info but it's mostly internal AMD stuff so...

Anyway I'm 99% sure RDNA 3 is MCM, the 2x 80 CU is a bit of speculation from date of first Navi31 VBIOS and the MacOS leak being very close.

beginner99 · Jan 25, 2021

Mopetar said:
Normally I'd say not to get too wrapped up in the hype that you put off buying a GPU for the supposed performance uplift that's rumored in the next generation, but at the rate things are going we may well be waiting anyhow

Given supply waiting seems to be the only option anyway...(still on a 290x...)

Paul98 · Jan 25, 2021

beginner99 said:
Given supply waiting seems to be the only option anyway...(still on a 290x...)

I am in the same boat(but with a 290), I was hopeful I could finally get an upgrade with the 3080 or 6800xt. Now it just feels like I will be waiting and waiting for supply.

beginner99 · Jan 26, 2021

Paul98 said:
I am in the same boat(but with a 290), I was hopeful I could finally get an upgrade with the 3080 or 6800xt. Now it just feels like I will be waiting and waiting for supply.

Even with MSRP I will probably need to settle for a 6700x / 3060 TI. Actually I kind of missed my chance as 3060 Tis were in stock for maybe 2 days, but just too expensive for mid-range (and still on 1080p). Well I'm obviously not upgrading my screen before I have a new GPU.

moinmoin · Jan 26, 2021

Kuiva maa said:
Perhaps RDNA2 infinity cache was a testbed more than anything else.

Definitely was a necessary step toward decoupling required bandwidth from RAM which benefits a chiplets based approach as well.

I think AMD prepares RDNA/CDNA for a Naples/Zen 1 (Epyc 7001) like step, i.e. having self sufficient dies that can be combined in multi-die packages. Not going to call it MCM since the kind of connection between the dies is exactly the issue why this has taken so long to happen: package substrat doesn't deliver the bandwidth and bits per joules to allow a workable product. So the high bandwidth requirement of GPUs has to be tamed (Infinity Cache) and an interposer with good energy efficiency properties found (this is still a wildcard as it ideally shouldn't raise complexity of assembling).

jrdls · Jan 30, 2021

Kepler_L2 said:
the 2x 80 CU is a bit of speculation from date of first Navi31 VBIOS and the MacOS leak being very close.

Wouldn't a card with two 80 CU chiplets consume too much power? I'm guestimating the TGP of such a configuration would be between 350 and 390 W (+ the extra power required for the rest of the board and 3rd gen IF). If RDNA 3 uses an IO die, then the TBP would only go higher. That said, my back-of-the-envelope calculations could be wrong. What do you guys think the TGP and the TBP of such a GPU would be?

Mopetar · Jan 30, 2021

I don't think it will be a problem. Back in the day these kinds of dual-GPU cards were fairly common. For example the 5970 was just two 5870's on one board instead of needing to run two cards in crossfire.

A 5870 had a TDP of 228W, but the 5970 was only 294W. All you really need to do is drop the clocks enough to where the chips don't need as much voltage and you can make it work.

Moving to something like HBM that uses less power than GDDR memory could also help as well. There may be other architecture changes that help reduce power consumption if some resources that two chips would normally run independently could be unified.

GodisanAtheist · Jan 30, 2021

AMD Files Patent for Chiplet Machine Learning Accelerator to be Paired With GPU, Cache Chiplets

AMD has filed a patent whereby they describe a MLA (Machine Learning Accelerator) chiplet design that can then be paired with a GPU unit (such as RDNA 3) and a cache unit (likely a GPU-excised version of AMD's Infinity Cache design debuted with RDNA 2) to create what AMD is calling an "APD"...

www.techpowerup.com

Maybe we're all thinking "chiplets" while AMD is thinking "SoC" and breaking up component design that would have gone into a monolithic GPU into individual blocks on the substrate.

One way or the other, the Silicon Wall is fast approaching and all chip and software designers need to really start thinking out of the box about what comes next until the next big leap forward in material science returns us to generational improvements on monolithic designs.

moinmoin · Mar 5, 2021

Underfox shared a lot of more AMD patents today, many of them related to GPUs (which is why I put it here, be sure to read the whole Twitter thread if interested):

https://twitter.com/x/status/1367778625840414722

Kepler_L2 · Apr 4, 2021

https://www.freepatentsonline.com/y2021/0097013.html Say hello to RDNA 3

dr1337 · Apr 4, 2021

Kepler_L2 said:
https://www.freepatentsonline.com/y2021/0097013.html Say hello to RDNA 3

seems very similar to this patent from earlier this year: https://www.freepatentsonline.com/y2020/0409859.html
which one is the real one? 😏

Kepler_L2 · Apr 4, 2021

dr1337 said:
seems very similar to this patent from earlier this year: https://www.freepatentsonline.com/y2020/0409859.html
which one is the real one? 😏

That patent is either for MI200/300 or an early design that was scrapped. The new patent with L3 cache on the interconnect die is RDNA3.

DisEnchantment · Apr 5, 2021

Kepler_L2 said:
That patent is either for MI200/300 or an early design that was scrapped. The new patent with L3 cache on the interconnect die is RDNA3.

Both these patents seem related to WGP and FF GFX blocks which are not present in CDNA. So very likely RDNA related architecture.
Also these statements are in general more valid for GFX than for compute

However, the geometry that a GPU processes includes not only sections of fully parallel work but also work that requires synchronous ordering between different sections. Accordingly, a GPU programming model that spreads sections of work across multiple GPUs are often inefficient, as it is difficult and expensive computationally to synchronize the memory contents of shared resources throughout the entire system to provide a coherent view of the memory to applications. Additionally, from a logical point of view, applications are written with the view that the system only has a single GPU. That is, even though a conventional GPU includes many GPU cores, applications are programmed as addressing a single device. For at least these reasons, it has been historically challenging to bring chiplet design methodology to GPU architectures.

In some embodiments, the application 112 utilizes a graphics application programming interface (API) 114 to invoke a user mode driver 116 (or a similar GPU driver). User mode driver 116 issues one or more commands to the array 104 of one or more GPU chiplets for rendering one or more graphics primitives into displayable graphics images. Based on the graphics instructions issued by application 112 to the user mode driver 116, the user mode driver 116 formulates one or more graphics commands that specify one or more operations for GPU chiplets to perform for rendering graphics. In some embodiments, the user mode driver 116 is a part of the application 112 running on the CPU 102. For example, in some embodiments the user mode driver 116 is part of a gaming application running on the CPU 102. Similarly, in some embodiments a kernel mode driver (not shown) is part of an operating system running on the CPU 102.

exquisitechar · Apr 5, 2021

DisEnchantment said:
Both these patents seem related to WGP and FF GFX blocks which are not present in CDNA. So very likely RDNA related architecture.
Also these statements are in general more valid for GFX than for compute

Can't wait for RDNA3. AMD is being bold, bringing advanced packaging technology to client like this.

Question Speculation: RDNA3 + CDNA2 Architectures Thread

Platinum Member

Golden Member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Platinum Member

Diamond Member

Golden Member

Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Junior Member

Diamond Member

Diamond Member

Diamond Member

Senior member

Senior member

Senior member

Golden Member

Senior member