Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

uzzi38 · May 17, 2022

JasonLD said:
Lot of people didn't take it that way for sure. Many will be disappointed when they see actual performance improvement being usual jump between 2 generations on new nodes. (around 70-80%)

You can thank the rumour mill making stuff up constantly for that.

Saylick · May 17, 2022

Speaking of yields... if AMD's approach lets them use 400mm2 of TSMC N5 to compete with Nvidia's 600mm2 of N5, the improved yields appear to be in the ~12% range. Not a big amount, but considering that N5 is like 1.8x the cost of N7 (and N6 is supposed to be slightly cheaper than N7), every bit counts.

22mm x 28mm (616 mm2):

vs.
17mm x 24mm (408 mm2):

maddie · May 17, 2022

Saylick said:
Speaking of yields... if AMD's approach lets them use 400mm2 of TSMC N5 to compete with Nvidia's 600mm2 of N5, the improved yields appear to be in the ~12% range. Not a big amount, but considering that N5 is like 1.8x the cost of N7 (and N6 is supposed to be slightly cheaper than N7), every bit counts.

22mm x 28mm (616 mm2):

vs.
17mm x 24mm (408 mm2):


More than that. Wafers being short, you get 80% more good die/wafer. Smart way to increase output in a time of rationing.
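For anyone who wants to sanity-check the two figures in this exchange (the ~12% yield gain and the ~80% more good dies per wafer), here is a rough sketch. It assumes a 300 mm wafer, the common edge-loss approximation for gross dies per wafer, a simple Poisson yield model, and a defect density of 0.07 defects/cm2. That defect density is purely an illustrative guess, not a TSMC figure, and the function names are made up for this example.

```python
import math

WAFER_DIAMETER_MM = 300.0      # assumption: standard 300 mm wafer
DEFECT_DENSITY_PER_CM2 = 0.07  # assumption: illustrative value only


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Common approximation: wafer area / die area minus an edge-loss term."""
    radius = wafer_diameter_mm / 2.0
    whole = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return math.floor(whole - edge_loss)


def poisson_yield(die_area_mm2, d0_per_cm2=DEFECT_DENSITY_PER_CM2):
    """Fraction of dies with zero defects under a Poisson defect model."""
    return math.exp(-(die_area_mm2 / 100.0) * d0_per_cm2)


def good_dies_per_wafer(die_area_mm2):
    """Gross dies scaled by the expected defect-free fraction."""
    return gross_dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2)


big, small = 22 * 28, 17 * 24  # 616 mm2 and 408 mm2, from the posts above
for area in (big, small):
    print(area, gross_dies_per_wafer(area),
          round(poisson_yield(area), 3), round(good_dies_per_wafer(area), 1))
print("good-die ratio:",
      round(good_dies_per_wafer(small) / good_dies_per_wafer(big), 2))
```

With these assumed inputs the results land in the same ballpark as both figures: the yield gap is modest, while the good-die count per wafer grows much more because the smaller die also packs better onto the wafer. The model ignores salvaged (cut-down) dies, so treat it as a rough guide only.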

Saylick · May 17, 2022

maddie said:
More than that. Wafers being short, you get 80% more good die/wafer. Smart way to increase output in a time of rationing.

Ah, good point. Better yields and more volume; what's not to love?

Frenetic Pony · May 17, 2022

Ajay said:
Exactly! This is a consumer graphics GPU, not an HPC compute ASIC. TFLOPS do not directly correspond to gaming performance.

Here they could be close-ish, with enough bandwidth and other re-architecting, especially as these things are just supposed to be going twice as wide+. Really, in RDNA2 terms, with 2x96 CUs (20% more than a 6900xt, then x2), a huge performance boost is within reach. And that's within reach of an air cooler as well. Cut clockspeed and take in arch and node improvements and you'd get down from 600 watts (6900xt + 20% x 2) to 480 watts.
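The arithmetic in this estimate can be laid out as a small sketch. The 600 W and 480 W figures and the 2x96 CU configuration are taken straight from the post. The 80 CU and 300 W baseline is the reference RX 6900 XT spec, and the 2x performance target is the post's own guess rather than a measured result.

```python
BASE_CUS = 80          # RX 6900 XT compute units (reference spec)
BASE_WATTS = 300.0     # RX 6900 XT reference board power
CUS_PER_DIE = 96       # from the post: 20% more than a 6900xt
DIE_COUNT = 2          # from the post: "then x2"
START_WATTS = 600.0    # the post's starting power estimate
TARGET_WATTS = 480.0   # the post's estimate after clock/arch/node savings
TARGET_PERF = 2.0      # the post's guess: twice a 6900xt


def width_scale(cus_per_die, die_count, base_cus):
    """How much wider the hypothetical part is than the baseline, in CUs."""
    return cus_per_die * die_count / base_cus


def required_power_cut(start_watts, target_watts):
    """Fraction of power that must be saved to get from start to target."""
    return 1.0 - target_watts / start_watts


def perf_per_watt_gain(perf_scale, target_watts, base_watts):
    """Implied perf/W improvement over the baseline card."""
    return perf_scale / (target_watts / base_watts)


print("CU width vs 6900xt:", round(width_scale(CUS_PER_DIE, DIE_COUNT, BASE_CUS), 2))
print("power cut needed:", round(required_power_cut(START_WATTS, TARGET_WATTS), 2))
print("implied perf/W gain:", round(perf_per_watt_gain(TARGET_PERF, TARGET_WATTS, BASE_WATTS), 2))
```

Read this way, the scenario needs a 20% power saving from clocks, architecture and node combined, and implies about a 1.25x perf/W gain over the 6900xt if the 2x performance guess holds.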

That's within reach if that configuration is what it ends up being. Twice the performance of a 6900xt, much more when it comes to RT, and faster than Nvidia's RTX 4090ti. But that assumes this config. No idea what AMD is really going for; but their better perf/watt over Nvidia other than RT in most cases certainly gives them more headroom.

Saylick · May 18, 2022

Alright, here's your daily dose of rumors:

https://twitter.com/x/status/1526833191188762625

https://twitter.com/x/status/1526819565417943040

Saylick · May 18, 2022

As for die size... Not sure how to best interpret this but 520mm2 (Navi 21) divided by 1.25 gives you 416mm2.
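Since the interpretation is uncertain, here is a quick sketch of two possible readings of a "1.25" factor applied to the 520mm2 figure. Both readings are my own assumptions about what the number might mean, not something the linked tweet confirms, and the function names are made up.

```python
NAVI21_AREA_MM2 = 520.0  # figure used in the post above


def area_if_old_is_factor_times_new(old_area, factor):
    """Reading 1: the old die is `factor` times the size of the new one."""
    return old_area / factor


def area_if_shrunk_by(old_area, fraction):
    """Reading 2: the new die is smaller than the old one by `fraction`."""
    return old_area * (1.0 - fraction)


def shrink_fraction(old_area, new_area):
    """How much smaller the new die is, as a fraction of the old one."""
    return 1.0 - new_area / old_area


reading_1 = area_if_old_is_factor_times_new(NAVI21_AREA_MM2, 1.25)
reading_2 = area_if_shrunk_by(NAVI21_AREA_MM2, 0.25)
print(round(reading_1), round(shrink_fraction(NAVI21_AREA_MM2, reading_1), 2))
print(round(reading_2), round(shrink_fraction(NAVI21_AREA_MM2, reading_2), 2))
```

Dividing by 1.25 gives 416mm2, which is a 20% reduction, while a flat 25% reduction would give 390mm2. The two readings differ by about 26mm2.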

https://twitter.com/x/status/1526829154783612928

Timorous · May 18, 2022

Saylick said:
As for die size... Not sure how to best interpret this but 520mm2 (Navi 21) divided by 1.25 gives you 416mm2.

https://twitter.com/x/status/1526829154783612928

416 is ballpark of what I would have expected.

I always estimated 250mm^2 for a 6-7k shader design when we thought it was 15k shaders split across 2 graphics dies (so ~500mm^2 of N5 silicon) so 416 for 12k shaders is a tad smaller than I was estimating.

What are the chances those 64MB cache tiles are just the same ones used for Milan-X and the 5800X3D? Do they need to be different to work for a GPU or is cache just cache? Also, if they are the same, does that mean AMD can start 3D stacking at some point?

Also, if the GPU GCD has links to talk to stuff off die, and AMD do get two GCDs acting like one die seamlessly, could AMD bring out an X2 refresh or something? Imagine a 7950XT-X2-3D with two 12k shader GCDs and stacked cache with 24 cache tiles (2 lots of 6 per GCD). To be honest such a part could easily be an 8000 series update with 9000 being RDNA 4.

Once AMD get this working and get the packaging infrastructure in place then the options really broaden quite significantly for what is possible. Pretty cool even if right now they are playing it safe.

gdansk · May 18, 2022

I think they're different. They're called MCDs because they include more than just the cache, unlike what is on Milan-X.

With more memory bus width I suspect N31 won't be as efficient relative to competition as N21 was.

maddie · May 18, 2022

Love how they say: "As @kopite7kimi pointed out, old info from last Oct is outdated 😉"

Outdated, what a joke.

It seems nobody can admit "we were wrong" or "that was BS by us".

coercitiv · May 18, 2022

maddie said:
It seems nobody can admit "we were wrong" or "that was BS by us".

The leaks are fine, the final product is wrong.

maddie · May 18, 2022

Saylick said:
As for die size... Not sure how to best interpret this but 520mm2 (Navi 21) divided by 1.25 gives you 416mm2.

https://twitter.com/x/status/1526829154783612928

This makes no sense. Are you all sure these guys really know anything?

That's a misestimation: the die should reduce in size by 25%, not 40%, due to moving the Infinity Cache off die and AMD's custom 5nm having 2x the density of 7nm.

Ajay · May 18, 2022

maddie said:
This makes no sense. Are you all sure these guys really know anything?

That's a misestimation: the die should reduce in size by 25%, not 40%, due to moving the Infinity Cache off die and AMD's custom 5nm having 2x the density of 7nm.

I think there is way too much guesswork being passed off as 'leaks'. Oh well, nature of the beast. YouTube, Twitter and rumor sites are just bouncing rumors around between themselves till things appear true, until they are not.

GodisanAtheist · May 18, 2022

Either guesswork or an active disinformation campaign by AMD/NV is going on here, or a combination of both, which is really turning everything into a slurry.

The rumor mill and hype train are fun, but they should never be taken too seriously.

gdansk · May 18, 2022

Some extrapolate too far from what little information they do receive. Others make it all up. Some receive anonymous bogus and run with it.

The Twitter rumor mill doesn't make money off views so they are a bit more trustworthy in my estimation. But still they are often wrong until right before the product announcement.

But personally, with no specific knowledge, the 1 big GCD seems much more reasonable to me than the previous 2 GCD speculation.

Mopetar · May 18, 2022

Saylick said:
Speaking of yields... if AMD's approach lets them use 400mm2 of TSMC N5 to compete with Nvidia's 600mm2 of N5, the improved yields appear to be in the ~12% range. Not a big amount, but considering that N5 is like 1.8x the cost of N7 (and N6 is supposed to be slightly cheaper than N7), every bit counts.

22mm x 28mm (616 mm2):

vs.
17mm x 24mm (408 mm2):


The yield rate almost doesn't matter for big GPUs. Since they're so massively parallel and full of duplicated hardware it's unlikely that a defect occurs in anything that completely junks the chip. Even functional hardware is often disabled to fit into a particular bin with these big dies.

The only advantage AMD can get is moving transistors to different chips that can be optimized around that purpose to gain an economic edge. A smaller die does little good if they need two of them to compete with a bigger NVidia die. On the other hand, the Zen 3D cache die was about 2x as dense as the cache on the CPU die. That's where AMD has the most potential.

Saylick · May 18, 2022

Mopetar said:
The yield rate almost doesn't matter for big GPUs. Since they're so massively parallel and full of duplicated hardware it's unlikely that a defect occurs in anything that completely junks the chip. Even functional hardware is often disabled to fit into a particular bin with these big dies.

The only advantage AMD can get is moving transistors to different chips that can be optimized around that purpose to gain an economic edge. A smaller die does little good if they need two of them to compete with a bigger NVidia die. On the other hand, the Zen 3D cache die was about 2x as dense as the cache on the CPU die. That's where AMD has the most potential.

Good points. There's also process variation across the wafer that can put a limit to clocks with increasing die size. The GPU is only as fast as its slowest SM/CU.

Frenetic Pony · May 18, 2022

Saylick said:
Good points. There's also process variation across the wafer that can put a limit to clocks with increasing die size. The GPU is only as fast as its slowest SM/CU.

Multi clock domains that work on an active GPU would be coooool. But until then that's another chalk up to chiplets being a good thing. Bin bin bin.

And yes, if it wasn't obvious, the "leakers" know shit all about AMD. I'd been operating on the assumption that they at least had some vague idea, RDNA2 was leaked before release. But at this point I'm just happy to wait till the press conference as far as leaks go.

As far as guesses: either the max bus size is 384-bit for a single chip with PHYs on it and max performance is about 2x a 6950xt, or, for two chiplets with PHYs, a 512-bit bus is possible, which would make over 2x possible or allow for more perf out of lower-speed RAM. Either way chiplets (either 2 symmetrical dies, or one 5nm and one 6nm die, plus SRAM either way, oh, and 192MB/128MB of SRAM for like all but 1 config) seem entirely plausible in some sort of configuration and should make AMD even more competitive on cost.
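To put rough numbers on the bus-width trade-off, here is a minimal sketch using the usual peak-bandwidth relation for GDDR (bus width in bits times per-pin data rate, divided by 8). The 18 Gbps data rate is an illustrative assumption, not a leaked spec, and the function names are made up.

```python
def memory_bandwidth_gb_s(bus_width_bits, data_rate_gbps_per_pin):
    """Peak bandwidth in GB/s: one pin per bus bit, 8 bits per byte."""
    return bus_width_bits * data_rate_gbps_per_pin / 8.0


def data_rate_needed_gbps(bus_width_bits, target_gb_s):
    """Per-pin data rate a given bus width needs to hit a target bandwidth."""
    return target_gb_s * 8.0 / bus_width_bits


ASSUMED_RATE_GBPS = 18.0  # assumption: illustrative GDDR6 speed

narrow = memory_bandwidth_gb_s(384, ASSUMED_RATE_GBPS)
print("384-bit @ 18 Gbps:", narrow, "GB/s")
print("512-bit rate for the same bandwidth:",
      data_rate_needed_gbps(512, narrow), "Gbps")
```

Under that assumption a 384-bit bus delivers 864 GB/s, and a 512-bit bus would match it with 13.5 Gbps memory, which is the "more perf out of lower-speed RAM" point.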

Aapje · May 18, 2022

My current theory is that Lisa Su felt so bad for the 'leakers' that she just called one of them up and told them straight up about the 1 compute die.

OK, not really.

eek2121 · May 19, 2022

Timmah! said:
Just found this was already posted a few days ago on a previous page, and I am late to the party, so never mind.

Even if it's still technically multi-chiplet, the "expectation" was multiple compute chiplets, Ryzen style. That's why it matters, because it's the reason why it was expected to potentially handily beat Lovelace with its standard monolithic design. While that may still happen, who knows at this point, it will be for different reasons.

The "2 large GCD chiplets" never made sense to me. A much more sensible option would be multiple chips similar in size (or slightly larger, but nowhere near 400mm2) to Ryzen. This allows you to cater to all markets and scale performance as much as you want. However, it is likely the tech isn't there yet.

JasonLD said:
If N31 is just getting one GCD, then 2.5x Navi21 performance claims can be safely dismissed.

Timmah! said:
I am not saying it's bad. And no doubt the way it is has massive benefits over what came before.
But it's still not what was expected (2 compute chiplets), and I presume, even with the additional space saved by moving the cache off the die, it still won't hit the expected / rumored performance, which was significantly more than Lovelace.
Not to mention the whole "multiple compute tiles" idea had a certain ring to it, as it was finally supposed to solve the "SLI" scaling issue. And it's not really happening.

I think that all the tech needed to go multichip is still there, they've just decided not to release it yet. No sense in needlessly halving the amount of chips (and therefore cards) you can produce. Maybe it needed a revision, and the tech wouldn't be done before launch? Who knows.

It's all going to come down to clocks. If they get the chip significantly above 3GHz, things are going to look pretty interesting. Remember, AMD is using a custom spin of N5. By moving to the MCD solution, heat is distributed across a wider area, and you don't have to risk overheating the cache, memory controllers, etc. I expect clocks will be much higher than N21.

Saylick said:
The GCD will change based on the product. So N31 gets a larger N5 GCD (with cutdown products), N32 gets a smaller N5 GCD (with cutdown products). N33 is completely N6 (with cutdown products). N31 and N32 use N6 MCDs which are the same between the two, but N31 simply uses more of them.

I previously thought that it was silly to have the MCDs also have the memory controller, but I guess it makes sense if you look at it from a scaling standpoint. AMD issues one N6 mask set for all MCDs, and they can slap on as many MCDs as it takes to scale up the memory performance. At least with a single GCD approach, there's no worry of having to resolve inter-GCD communication issues; it's just back to monolithic-esque performance but the difference is that the memory subsystem has been stripped out and put on a more economical node.

Basically, something like this:
View attachment 61640

I stated as much a few pages back.

The other thing an external MCD with memory controller allows them to do is upgrade to GDDR6X or even GDDR7 when it becomes available. Design a new MCD and leave the GCD alone. Want to increase cache? Pump out a new MCD.

We may end up seeing a 2 GCD + faster memory refresh sometime later on.

I'm actually excited to see the next gen cards. I will probably buy one provided they compete with NVIDIA's next gen stuff.

Aapje · May 19, 2022

eek2121 said:
The other thing an external MCD with memory controller allows them to do is upgrade to GDDR6X or even GDDR7 when it becomes available.

And then can make cards with the same compute die, but where one has GDDR6 and the other has GDDR6X. Segmentation is easier this way.

AtenRa · May 20, 2022

Can we conclude from the slide below that the top RDNA3 SKU (7950XT?) will have the same 335W TBP as the RX 6950XT?

Aapje · May 20, 2022

In reality, the 6950 uses more power than the 3090, so they are lying. I expect the higher clocks to pull up the power consumption for the next gen, although they should be below Lovelace.

The 6950 is a very inefficiently overclocked card and it's very feasible that the non-overclocked (or less overclocked) 7900 will consume the same. Comparing an overclocked card to a non-overclocked card is a good way to hide a consumption increase through a false comparison.

Karnak · May 20, 2022

Aapje said:
In reality, the 6950 uses more power than the 3090, so they are lying.

Nope, it's around 335-340W for the reference 6950XT. Of course AIB cards are more power hungry, but comparing AIB vs. reference cards doesn't make any sense.

And are we sure that is an official slide? Doesn't look like it tbh.

jpiniero · May 20, 2022

Karnak said:
And are we sure that is an official slide? Doesn't look like it tbh.

Yeah that's got to be a fake slide. They wouldn't mention Ada at this point.
