Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,565
5,575
146

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
Really interesting stuff!

To me the coolest part was the possibility of having the interconnect bridges (118 on figure) containing the L3 on top of the GPU chiplets (106-X on figure):


I always wondered how they'd solve the heat issues with active interposers (containing caches) underneath multi-chiplet GPUs.

Putting the interconnects on top seems like a really nice way of doing that. The chips would obviously be designed in a way that the toasty parts of the GPU chiplets (CUs) and interconnects (L3) are not under each other.

On Figure 3 they have a more "classical" solution. I'm not really sure how big the difference between the two designs would actually be. One would have TSVs (Through-Silicon Vias) on the GPU chiplets instead of the bridge. To me it would seem that you can fit more L3 on the Fig. 4 design (as the L3 can potentially take up more area than just the gap between the chiplets), but I would be really interested if anyone with actual knowledge about these matters could comment.

Anyway, really exciting, and I do hope we see this in RDNA3. How capable are TSMC's current processes of handling such packaging at scale anyway?
 

moinmoin

Diamond Member
Jun 1, 2017
4,934
7,620
136
seems very similar to this patent from earlier this year: https://www.freepatentsonline.com/y2020/0409859.html
which one is the real one? 😏
For AMD the big advantage of using old-school package substrate for MCM so far has been twofold: packaging is a solved issue and very cheap, and the length of the connections is not limited, allowing chiplets to be spread apart and still be connected individually. Bandwidth made that approach not feasible for chiplet-based GPUs though.

Passive interposers, as discussed in the older patent above, seem more akin to Intel's EMIB, which as I see it brings two negatives: chiplets need to lie next to each other (so longer connections either need bigger interposers or need to be routed through several chiplets), and packaging is more costly with a lower yield.

The new patent is more exciting due to potentially decisively changing the balance: Infinity Cache could be moved to the interposer altogether, making the GPU chiplets smaller and directly scaling the IC with the number of chiplets used. Packaging may still be expensive by itself, but the dies may be much cheaper to produce as a result (interposer dies could even be done on a cheaper, older node, as is done with the IOD).
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
That just seems a bit too good to be true. I get that a lot of rumors about RDNA2 were treated similarly, but in reality they seemed unlikely only because AMD hadn't done well with the last few releases and was quite far behind Nvidia, particularly at the high end. All it really meant was that AMD was back on roughly equal footing, which when phrased that way doesn't seem quite so remarkable.

This just strays into the realm of too good to be true or only makes sense if it's one of those "up to" measures of performance where one tiny aspect of performance sees a major uplift that may have an almost negligible impact on the overall average.

I suppose I could watch it for myself, but I hate clickbait with a passion and this reeks of it through and through. Even in the heyday of GPUs, when you could see major generational uplifts, 50% was considered a lot, and I'm not sure we ever really saw much more than 70% in a general overall sense of improvement.

At least this makes me feel less bad about not being able to get a GPU right now.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,719
7,016
136
This just strays into the realm of too good to be true or only makes sense if it's one of those "up to" measures of performance where one tiny aspect of performance sees a major uplift that may have an almost negligible impact on the overall average.

- This right here.

I mean, is there even 2.5x untapped performance left for CPUs to feed a GPU?

The RTX 3000 series is already slamming into all kinds of card- and processor-side bottlenecks trying to feed its huge shader array, and it is nowhere near 2.5 times faster than RDNA2 in nearly any workload.
 

soresu

Platinum Member
Dec 19, 2014
2,617
1,812
136
I mean, is there even 2.5x untapped performance left for CPUs to feed a GPU?
I think that this particular problem has more to do with bad game engine design/threading and various other bottlenecks in the system than with the CPU itself.

Increasing ST perf seems more like treating a symptom than the cause; game resource virtualisation (textures, geometry), plus optimisation for higher-IO SSDs to complement that virtualisation, might finally address the cause with new engines like UE5.

This also includes new APIs like DirectStorage, removing the CPU middleman from the IO equation that keeps the GPU fed.

I think we may see more improvement in the next 2-3 years than we did in the last 10+ as a result of these long-awaited overhauls.

I just hope that UE5 isn't so far ahead that it discourages other devs from trying to improve their own engines - lack of competition just leads to stagnation given enough time so we need UE and others to have a driving need to keep going hard.
 

maddie

Diamond Member
Jul 18, 2010
4,723
4,628
136
That just seems a bit too good to be true. I get that a lot of rumors about RDNA2 were treated similarly, but in reality they seemed unlikely only because AMD hadn't done well with the last few releases and was quite far behind Nvidia, particularly at the high end. All it really meant was that AMD was back on roughly equal footing, which when phrased that way doesn't seem quite so remarkable.

This just strays into the realm of too good to be true or only makes sense if it's one of those "up to" measures of performance where one tiny aspect of performance sees a major uplift that may have an almost negligible impact on the overall average.

I suppose I could watch it for myself, but I hate clickbait with a passion and this reeks of it through and through. Even in the heyday of GPUs, when you could see major generational uplifts, 50% was considered a lot, and I'm not sure we ever really saw much more than 70% in a general overall sense of improvement.

At least this makes me feel less bad about not being able to get a GPU right now.

This post with voltage/frequency testing for RDNA2 https://forums.anandtech.com/thread...na-architectures-thread.2579999/post-40487064 by uzzi38 shows how it could possibly be accomplished.
 
  • Like
Reactions: Tlh97 and Gideon

Glo.

Diamond Member
Apr 25, 2015
5,661
4,419
136
That just seems a bit too good to be true. I get that a lot of rumors about RDNA2 were treated similarly, but in reality they seemed unlikely only because AMD hadn't done well with the last few releases and was quite far behind Nvidia, particularly at the high end. All it really meant was that AMD was back on roughly equal footing, which when phrased that way doesn't seem quite so remarkable.

This just strays into the realm of too good to be true or only makes sense if it's one of those "up to" measures of performance where one tiny aspect of performance sees a major uplift that may have an almost negligible impact on the overall average.

I suppose I could watch it for myself, but I hate clickbait with a passion and this reeks of it through and through. Even in the heyday of GPUs, when you could see major generational uplifts, 50% was considered a lot, and I'm not sure we ever really saw much more than 70% in a general overall sense of improvement.

At least this makes me feel less bad about not being able to get a GPU right now.
A 2.5x performance-per-watt uplift, and now they claim it's higher than that, but lower than 3x.

A 150W GPU having the performance of a 350W GPU.

Yeah, right.

I'm pretty sure the next thing we'll hear is that it comes with a personal pig in the box.
 

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,575
146
This post with voltage/frequency testing for RDNA2 https://forums.anandtech.com/thread...na-architectures-thread.2579999/post-40487064 by uzzi38 shows how it could possibly be accomplished.
I'll just link the spreadsheet for convenience because it has the actual numerical values as well.


But ultimately, applying RDNA2's V/f curve directly and aiming for, say, a 350W TBP (Nvidia went that far this gen; I wouldn't be surprised if AMD do the same next gen), you'd be fine provided core-only power lands somewhere in the 250-280W range, which for 160 CUs is doable at ~1800MHz (40 CUs at 1800MHz = 61W core-only power; multiply both by 4 and you get ~244W core-only power). Now obviously RDNA3 will come with its own improvements to power efficiency, coupled with N6/N5 gains (I'm not convinced the entire RDNA3 product stack is on N5 yet because of AMD's wording), so I think 1900MHz - 2000MHz is a safe bet. Realistically the node alone should be able to provide that much.

Anything beyond that would come from uArch gains. Even with just some basic theorycrafting we can already more or less see that RDNA3 clocking 160 CUs to a similar degree as RDNA2 does 80 CUs isn't really all that far-fetched. Or at least, I can see that. Perhaps I'm stretching a bit.

So that's why I don't see the RDNA3 rumours as far-fetched. Not yet, anyway. We'll have to see, but if I can so easily make sense of a 2x improvement over RDNA2, then heck, I don't see why I would completely rule out 2.5x just yet.
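A minimal sketch of the scaling estimate above. All inputs are the figures assumed in this post (from the quoted V/f testing), not official specs:

```python
# Back-of-the-envelope check of the estimate above.
# Assumed input: 40 CUs at ~1800 MHz drawing ~61 W of core-only power.
CUS_MEASURED = 40
CORE_POWER_W = 61.0

def scaled_core_power(target_cus: int) -> float:
    """Scale core-only power linearly with CU count at a fixed
    frequency/voltage point (ignores uncore, memory, interconnect)."""
    return CORE_POWER_W * target_cus / CUS_MEASURED

# 160 CUs at the same ~1800 MHz operating point:
print(scaled_core_power(160))  # -> 244.0, inside a 250-280 W core budget
```

Anything past that operating point would then have to come from node and uArch efficiency gains, as argued above.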
 

gorobei

Diamond Member
Jan 7, 2007
3,654
980
136
That just seems a bit too good to be true. I get that a lot of rumors about RDNA2 were treated similarly, but in reality they seemed unlikely only because AMD hadn't done well with the last few releases and was quite far behind Nvidia, particularly at the high end. All it really meant was that AMD was back on roughly equal footing, which when phrased that way doesn't seem quite so remarkable.

This just strays into the realm of too good to be true or only makes sense if it's one of those "up to" measures of performance where one tiny aspect of performance sees a major uplift that may have an almost negligible impact on the overall average.

I suppose I could watch it for myself, but I hate clickbait with a passion and this reeks of it through and through. Even in the heyday of GPUs, when you could see major generational uplifts, 50% was considered a lot, and I'm not sure we ever really saw much more than 70% in a general overall sense of improvement.

At least this makes me feel less bad about not being able to get a GPU right now.
The general buzz from the leaktubers is that AMD is delivering GPU chiplets with RDNA3 (though not sure if that means it comes with the 7000 series). With 2 chiplets and possibly an I/O die to reduce relative die sizes, AMD can increase total CUs, drop power usage, and possibly lower costs with the increase in dies per wafer.

Cramming all the CUs from a big Navi die into a chiplet and sticking a second chiplet on a single card is where most of the 2.5x gain is coming from. So long as they can make it look like a single GPU to the OS, the game devs won't care how they get the performance. With all of it being made in-house at TSMC they can reduce some of the transportation costs compared to Ryzen/Epyc, where they have to ship I/O dies from GlobalFoundries to TSMC for packaging.

Nvidia has seemingly delayed their consumer chiplet plans, probably because they can't abandon Samsung 8nm for TSMC 7nm just yet. There is no reason to attempt chiplets on consumer products if they can't beat RDNA3 on perf/power/price.
 

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
It depends on which area this "2.5x performance" could be delivered in. In rasterization, I see that as unlikely (also because CPU limitation would become an issue even at very high resolutions). Ray tracing? Definitely possible (even likely).

Nvidia has seemingly delayed their consumer chiplet plans, probably because they can't abandon Samsung 8nm for TSMC 7nm just yet. There is no reason to attempt chiplets on consumer products if they can't beat RDNA3 on perf/power/price.

The question is, it's not only a problem of making the chiplets: packaging for these "multi-chiplet GPUs" will be real hell and definitely not cheap. Until today we have seen interposers only in high-end products and in somewhat "limited" applications like HBM, and AMD's CPUs do not have the wide interconnects needed for GPUs. If RDNA3 resembles something like the AMD patent we saw some time ago (with cache on the interposer), it will be a real challenge, and not cheap for sure.
 
  • Like
Reactions: DiogoDX

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
If RDNA3 resembles something like the AMD patent we saw some time ago (with cache on the interposer), it will be a real challenge, and not cheap for sure.
Yeah, especially if they do deliver on the >2x perf metric (which I also find suspicious for all workloads).

That would be over 1000 mm² of silicon real estate on 6nm, with exotic packaging and at least a ~384-bit memory bus, HBM2, or GDDR6X, all of which make things more complex or cost more. If they use 5nm, the dies will be smaller, but their cost certainly won't be.

All in all, the highest-end card won't be in the price bracket of the 6800 - 6800 XT, for sure. If I had to guess, the "non-bitcoin-craze" MSRP would be at least $999 - $1499 for the models corresponding to the 6800 - 6900 XT.

And if the performance rumors are true, they could easily justify the price, possibly even more.
 
  • Like
Reactions: Tlh97

Timorous

Golden Member
Oct 27, 2008
1,538
2,537
136
I have no idea how AMD would achieve 2.5x more performance with 2x the CUs. The 6900 XT has just about 2x the performance of the 5700 XT at 4K, with a doubling of the cores and a ~17.4% clockspeed uplift.

I guess stripping the Infinity Cache and memory controllers out of the GPU chiplet might allow it to clock higher, which is required to hit that 2.5x target, but is it going to be 30+% higher? That is 3GHz+, and it just seems like a no, but I said the exact same thing about the PS5 GPU clock speed rumours and I was wrong there.

I don't think 2.5x is impossible, especially for a potentially new paradigm like chiplets, but I am not convinced it will get there outside of specific workloads. I was quite bullish on what I thought AMD could do with RDNA2 but until AMD release some glimpses of information on RDNA3 I am skeptical.
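As a rough sanity check, the scaling arithmetic in this post can be sketched; all numbers below are this post's own assumptions, not benchmarks:

```python
# RDNA1 -> RDNA2 data points quoted above (assumptions, not measurements).
rdna2_cu_ratio = 2.0        # 6900 XT has 2x the CUs of the 5700 XT
rdna2_clock_uplift = 1.174  # ~17.4% higher clocks
rdna2_perf_ratio = 2.0      # observed ~2x performance at 4K

# How much of the raw CU * clock increase became real performance.
efficiency = rdna2_perf_ratio / (rdna2_cu_ratio * rdna2_clock_uplift)

# Clock uplift a hypothetical 160 CU part would need for 2.5x overall,
# assuming the same imperfect scaling efficiency carries over.
target_perf_ratio = 2.5
cu_ratio = 2.0
needed_clock_uplift = target_perf_ratio / (cu_ratio * efficiency)

print(f"scaling efficiency: {efficiency:.2f}")             # ~0.85
print(f"needed clock uplift: {needed_clock_uplift:.2f}x")  # ~1.47x
```

Under these assumptions, even a 30% clock uplift falls short of 2.5x, which is why clocks alone look like an unlikely route.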
 
  • Like
Reactions: Tlh97 and scineram

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
I don't think 2.5x is impossible, especially for a potentially new paradigm like chiplets, but I am not convinced it will get there outside of specific workloads. I was quite bullish on what I thought AMD could do with RDNA2 but until AMD release some glimpses of information on RDNA3 I am skeptical.

They can't achieve it with clocks. The only way that would be possible at all would be a radical redesign of the CUs (they have some interesting patents for that), but I still find it doubtful, as they would also hit a memory (or cache) bandwidth wall, and they don't have much power to spare, already being at 300W.

I could see 2.5 - 3x the raytracing performance, maybe even slightly more, but rasterisation seems too good to be true. Even 2x the rasterization performance would be a huge achievement, especially as they can't possibly increase clocks with 160 CUs, unlike last time. If anything they will regress slightly.
 

Timorous

Golden Member
Oct 27, 2008
1,538
2,537
136
especially as they can't possibly increase clocks with 160CU's

I said they couldn't possibly clock the PS5 above 2GHz and hit a console TDP window, and I was way off the mark. I think "can't" is too strong, but it seems exceedingly unlikely that they do it like that.

I dunno, maybe they have another magic bullet, or maybe they are talking about a 2.5x performance increase with RT on, which is totally believable and probably a bit on the low end to be honest.
 
  • Like
Reactions: scineram

Glo.

Diamond Member
Apr 25, 2015
5,661
4,419
136
As I can see, I am not the only one who has a really hard time believing those numbers/rumors...

But, we've been here before...

The thing that makes me think AMD can achieve that higher-than-2.5x performance is "specific scenarios", like, I don't know, ray tracing, for example.

If it's in pure rasterization, then the front end of the GPU must have some magical capabilities as well, apart from really effective geometry culling.
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,573
136
I said they couldn't possibly clock the PS5 above 2GHz and hit a console TDP window, and I was way off the mark. I think "can't" is too strong, but it seems exceedingly unlikely that they do it like that.
I looked at the PS5's PSU and had no trouble believing they could hit ~2GHz at 300W with 50% perf/watt improvements. The thing is... a lot of that was due to Infinity Cache.

Going to 160 CUs and multiple chiplets, power will take a hit, as communicating between chiplets draws (if ever so slightly) more power than a monolithic chip. They certainly need to upgrade their memory bandwidth as well.

All in all, I wouldn't mind at all for this to be true; it would be a truly unprecedented gain (vs Nvidia's BS claims for Ampere), but I'm not holding my breath.
 
  • Like
Reactions: Tlh97 and scineram

uzzi38

Platinum Member
Oct 16, 2019
2,565
5,575
146
I looked at the PS5's PSU and had no trouble believing they could hit ~2GHz at 300W with 50% perf/watt improvements. The thing is... a lot of that was due to Infinity Cache.

Going to 160 CUs and multiple chiplets, power will take a hit, as communicating between chiplets draws (if ever so slightly) more power than a monolithic chip. They certainly need to upgrade their memory bandwidth as well.

All in all, I wouldn't mind at all for this to be true; it would be a truly unprecedented gain (vs Nvidia's BS claims for Ampere), but I'm not holding my breath.

Never mind I misunderstood.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Putting the interconnects on top seems like a really nice way of doing that. The chips would obviously be designed in a way that the toasty parts of the GPU chiplets (CUs) and interconnects (L3) are not under each other.
If we examine the patent from AMD and cross-reference it with the kinds of packaging TSMC is offering (we could even investigate the timeline), we can make a reasonable case for why the patent is the way it is and why the previous patent may have been superseded.
One main problem is indeed heat transfer. The main drawback of CoWoS and InFO (and all other BE-based stacking) is that there is a metallic layer as well, and the chips are stacked together using microbumps. The issue is that heat transfer from one die to another across the microbumps is poor, in addition to having to take care of the different thermal coefficients of the different materials.

Looking at the new patent for chiplet fabrication, there are no bumps/metallic layers between the two dies depicted, which makes me believe this is probably FE-based, like SoIC.
FE-based stacking uses the same material for the stacked dies, so they have the same thermal coefficient, and the bonding layer allows for better thermal transfer.


The only issue with this is that there are constraints on how the dies can be mixed and matched, but of course it reaps all the benefits of chiplets when co-designed properly.

In short, it seems to me the previous patent below uses CoWoS

Whereas the new patent uses SoIC

Read about the advantages of SoIC vs CoWoS here.
TL;DR: SoIC offers better thermal characteristics than CoWoS.

Also, I think FE (front end of line) is basically only the layers before any metal layers are done, not as described in the AT article.
 

biostud

Lifer
Feb 27, 2003
18,194
4,674
136
I think they will keep the 256-bit memory bus and 16GB of memory, maybe GDDR6X; otherwise they would have to go to 24GB. Maybe they will make a top-end card with 32GB. If anything, this crisis has shown that many consumers are willing to buy video cards at high prices.
 
  • Like
Reactions: Tlh97 and Elfear

moinmoin

Diamond Member
Jun 1, 2017
4,934
7,620
136
Is it mentioned anywhere that it's 2.5x perf with 2x the CUs? From RDNA to RDNA2 AMD simplified and doubled the logic, and imo put in preparations for a future chiplet-based design. If RDNA3 is chiplet based, I could imagine a lower-clocked, 4x-CU, 8-chiplet "Epyc"-style MCM RDNA3 reaching 2.5x perf without breaking the TBP bank.