Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,552
5,527
146

dr1337

Senior member
May 25, 2020
293
488
106
It is interesting to see the 7000 series doing much better here than one might expect compared to RDNA2.
I mean, it is rated for 2x the FP16 TFLOPS, just like with FP32, so I guess it actually scales for AI/Stable Diffusion. However, something to note is that the nod.ai client is still very much a work in progress, and performance varies quite a bit depending on what version you use, let alone what model and image size. Tom's has the 6900 XT doing ~4 it/s, but mine does upwards of 7 it/s at the same settings and prompt.

Also, I'm not sure why that Tom's article says AMD can only do SD 2.1 when I run 1.4, 2.1, and other models just fine.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136
No, but at least they're getting it working somewhere. Hopefully they get it figured out before RDNA3 [RDNA4?] and don't decide to not enable whatever improvements they can get from it for RDNA3 cards.

AMD has a pretty good track record for things like this and RDNA3 sales being on the low end wouldn't hurt future generation sales prospects. On the other hand they certainly haven't minded acting like NVidia on the pricing front.
Well, since it's most likely that I'll be picking up a 7000 series AMD GPU at some point, I certainly do hope they improve it.
 

Joe NYC

Golden Member
Jun 26, 2021
1,864
2,147
106
-Therein lies the problem. I don't think AMD is getting 60 CUs to 3Ghz without blowing the power budget.

N31 will do it at 450W; the thing is, N32 needs to do it at 200-250W and I don't think that's happening. It will likely need 300W or more, and that's just not acceptable for second-tier silicon, especially when Nvidia will be offering that performance for 200W.
What is the power budget?

Intel is using a comical power budget for their CPUs, and getting away with it.

In theory, AMD can use up to the power budget of Navi 31 for Navi 32.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,555
6,765
136
What is the power budget?

Intel is using a comical power budget for their CPUs, and getting away with it.

In theory, AMD can use up to the power budget of Navi 31 for Navi 32.

-If AMD wouldn't do 450W on their top model of card to get a clear victory over the 4080, they're not gonna go to 300W on the power budget for the more "mainstream" part. That's a lot of heat to dissipate in mid-tower cases and starts getting into solid "con" territory when compared to the 4070.

My assumption anyway, we'll see what happens
 

Mopetar

Diamond Member
Jan 31, 2011
7,784
5,879
136
Well, since it's most likely that I'll be picking up a 7000 series AMD GPU at some point, I certainly do hope they improve it.

It's pretty unlikely that they're going with a radically different design that doesn't build off the changes they made in going from RDNA2 to RDNA3, so they'll have to sort it all out eventually.

I couldn't see them withholding driver improvements from 7xxx series cards just to make the next generation look better by comparison, but I've been wrong before.
 

Joe NYC

Golden Member
Jun 26, 2021
1,864
2,147
106
-If AMD wouldn't do 450W on their top model of card to get a clear victory over the 4080, they're not gonna go to 300W on the power budget for the more "mainstream" part. That's a lot of heat to dissipate in mid-tower cases and starts getting into solid "con" territory when compared to the 4070.

My assumption anyway, we'll see what happens
7900 XTX has power limit of 355W.

AMD could do 300 W for 7800 XT. Or call it 7800 XTX for extra crunchy. That might be able to get close to 3 GHz, without major revisions in silicon.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,787
136

GodisanAtheist

Diamond Member
Nov 16, 2006
6,555
6,765
136
Was there anyone who wasn't too wide of the mark on RDNA3?

-Don't think so. I just went through the first 20 or so pages and Jesus the hype monster around RDNA3 was just off the charts. Even those of us poking fun at the hype train got sucked into the nutty theory crafting and outrageous performance expectations.
 

RnR_au

Golden Member
Jun 6, 2021
1,641
3,973
106
"All models are wrong. Some are useful" - George Box

The model used was that of RDNA2. Two pieces of data were then inserted into the model: "well north of 3GHz clock speed" and "2x CU". From this combo many a tweet and YouTube clickbait video were created.

Both data points were sorta correct. Parts of the RDNA3 can indeed run at a very high clock speed, but not every part, and not under every work load. The "2 x CU" is sorta reflected in the 58B transistors of the N31 vs the 26.8B of the N21. Potential compiler issues were never discussed afaik.

I was one of those that enthusiastically posted rumour tweets with abandon. Meh.

But hype trains are always fun. It's just the after party that sucks :p
 

Timorous

Golden Member
Oct 27, 2008
1,514
2,453
136
-Don't think so. I just went through the first 20 or so pages and Jesus the hype monster around RDNA3 was just off the charts. Even those of us poking fun at the hype train got sucked into the nutty theory crafting and outrageous performance expectations.

Well, it was actually correct. The 6900XT at average sustained clocks is around 23 TFLOPS and the 7900XTX is around 65 TFLOPS. That is a 2.8x increase, which is what RGT and many others claimed, so from that perspective they were spot on.

Issue is that it was achieved through 20% more shaders and making them dual issue instead of the rumoured > 2x increase in shaders.
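For anyone who wants to check the arithmetic, here is a rough sketch of the peak FP32 math behind those two figures. The shader counts are the ones quoted elsewhere in this thread; the sustained clocks (2.25 GHz and 2.65 GHz) are assumptions chosen only to land near the ~23 and ~65 TFLOPS numbers above, and the function name is made up for illustration.

```python
def fp32_tflops(shaders, clock_ghz, dual_issue=False):
    """Peak FP32 TFLOPS: one FMA (2 FLOPs) per shader per clock, doubled if dual issue."""
    flops_per_clock = 2 * (2 if dual_issue else 1)
    return shaders * flops_per_clock * clock_ghz / 1000.0

# Assumed average sustained clocks, not measured values.
rx6900xt = fp32_tflops(5120, 2.25)
rx7900xtx = fp32_tflops(6144, 2.65, dual_issue=True)

shader_ratio = 6144 / 5120          # 20% more shaders
tflops_ratio = rx7900xtx / rx6900xt  # paper compute uplift

print(f"6900XT:  {rx6900xt:.1f} TFLOPS")
print(f"7900XTX: {rx7900xtx:.1f} TFLOPS")
print(f"shaders x{shader_ratio:.2f}, peak compute x{tflops_ratio:.2f}")
```

Most of the paper uplift comes from the dual-issue factor of 2, which only pays off when the compiler can actually pair instructions.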

The reason people thought that might even be possible at all is because AMD did similar when going from RV670 with 320 shaders on 55nm to RV770 with 800 shaders also on 55nm. They also doubled it from 800 shaders to 1600 shaders in the 5870. I remember the 4870 hype train and I remember the disbelief around the rumoured 800 shaders. When it launched with 800 shaders, ~2x the 3870 performance and a very aggressive price point it was pretty sweet. I wonder if all AMD hype trains since have tried to match that.

I.e., AMD had pulled it off before, and given that RDNA 2 -> RDNA 3 also included a die shrink (for the shaders at least), it did not seem impossible.

Ultimately, though, a 2x increase in transistors and a 2.8x increase in raw compute performance should have led to a greater uplift than 49% over the 6900XT at 4K. Something is broken with N31 at least, and possibly with RDNA 3 overall, because RDNA 2 managed to pretty much match Ampere when it comes to perf/transistor, whereas Ada far surpasses the perf/transistor of RDNA 3 so far.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
-Therein lies the problem. I don't think AMD is getting 60 CUs to 3Ghz without blowing the power budget.

N31 will do it at 450W; the thing is, N32 needs to do it at 200-250W and I don't think that's happening. It will likely need 300W or more, and that's just not acceptable for second-tier silicon, especially when Nvidia will be offering that performance for 200W.
Only RTX 4070 has 200W TBP and performs like RX 6800XT.
You don't need 3GHz for that level of performance.
RTX 4070Ti despite being a second(third?) tier silicon has 285W TBP.
If AMD wanted, they could set it to 300W. It would still be lower than the 315W of the RX 7900XT, but even this wouldn't allow 3GHz in every game.

The problem with RDNA3 is that it can't keep high clocks in everything.
In Tiny Tina's Wonderlands it can clock to 2715MHz, but in F1 22 + RT you are down to only 2380MHz. That's a 14% difference in clock speed!
Screenshot_3.png
I don't think N32 will do much better than N31 in this regard.
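A quick sketch of how that 14% figure falls out of the two clocks quoted above. The helper name is hypothetical; the only inputs are the 2715MHz and 2380MHz readings from the post.

```python
def pct_higher(high, low):
    """How much higher `high` is than `low`, in percent."""
    return (high / low - 1.0) * 100.0

wonderlands_mhz = 2715  # Tiny Tina's Wonderlands
f1_22_rt_mhz = 2380     # F1 22 with ray tracing

gap = pct_higher(wonderlands_mhz, f1_22_rt_mhz)
# Same gap expressed as a drop from the higher clock.
drop = (1.0 - f1_22_rt_mhz / wonderlands_mhz) * 100.0

print(f"Wonderlands clocks {gap:.1f}% higher than F1 22 + RT")
print(f"F1 22 + RT clocks {drop:.1f}% lower than Wonderlands")
```

Note the number depends on which clock you use as the base: measured from the higher clock, the same gap is a drop of roughly 12%.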
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
Ultimately, though, a 2x increase in transistors and a 2.8x increase in raw compute performance should have led to a greater uplift than 49% over the 6900XT at 4K. Something is broken with N31 at least, and possibly with RDNA 3 overall, because RDNA 2 managed to pretty much match Ampere when it comes to perf/transistor, whereas Ada far surpasses the perf/transistor of RDNA 3 so far.
The performance uplift is pretty good considering dual issue adds just a few % to performance and the specs weren't improved that much (+20% CUs, shaders and TMUs, and +50% ROPs).

What I don't understand is why it needed so many more transistors: the GCD alone has 70% more transistors than the whole of N21 (including the 128MB IC and 256-bit controller).
N33 needed ~20% more compared to N23, which is also very high for what you got in return.
I think those interconnects used a lot of transistors from the increased budget.
 

Timorous

Golden Member
Oct 27, 2008
1,514
2,453
136
The performance uplift is pretty good considering dual issue adds just a few % to performance and the specs weren't improved that much (+20% CUs, shaders and TMUs, and +50% ROPs).

What I don't understand is why it needed so many more transistors: the GCD alone has 70% more transistors than the whole of N21 (including the 128MB IC and 256-bit controller).
N33 needed ~20% more compared to N23, which is also very high for what you got in return.
I think those interconnects used a lot of transistors from the increased budget.

The obvious reason is they expected those transistors to bring more performance than they actually do.

Bare minimum a 2x increase in transistors should yield a 70% performance boost or you are making the die larger for no reason.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
The obvious reason is they expected those transistors to bring more performance than they actually do.

Bare minimum a 2x increase in transistors should yield a 70% performance boost or you are making the die larger for no reason.
AMD expected higher performance? From what? Dual-issue or frequency? You don't even know how many extra transistors were used for that, so you can't say for sure If It underperforms or not compared to the allocated transistor budget. Ok, maybe we can say that about dual-issue, but without any software(game) optimizations It's not surprising It helps very little.

3090Ti(GA102): 28.3 billion transistors
4090(AD102): 76.3 billion transistors
+169% in transistors for +44.8% in raster and 53.3% in RT, but that's a cutdown version.
Let's say 4090Ti will bring another 20% to performance.
The end result is +169% in transistors for +73.8% in raster and 84% in RT.
Let's not forget that a big chunk of that increase comes from higher clocks.

That is a lot worse than your bare minimum.:D N21 vs N10 was at least a lot closer to your claim, only ~15% extra performance was missing at 4K.
My point is that you can't expect let's say 50% increase in performance just because the transistor budget was increased by 100% for example. First, we would need to know what those transistors were used for.
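The percentages above can be reproduced with a few lines. This is only a sketch of the arithmetic in the post: the transistor counts and the +44.8% / +53.3% figures are the ones quoted, the extra 20% for a full-die 4090Ti is the poster's own guess, and the variable names are made up.

```python
def pct_gain(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0

ga102_transistors_b = 28.3  # 3090Ti
ad102_transistors_b = 76.3  # 4090

transistor_gain = pct_gain(ad102_transistors_b, ga102_transistors_b)

# Hypothetical full-die 4090Ti: the post assumes another 20% on top of the 4090.
raster_gain = (1.448 * 1.20 - 1.0) * 100.0
rt_gain = (1.533 * 1.20 - 1.0) * 100.0

# Raster performance per transistor relative to GA102 (1.0 means unchanged).
perf_per_transistor = (1.0 + raster_gain / 100.0) / (1.0 + transistor_gain / 100.0)

print(f"transistors: +{transistor_gain:.1f}%")
print(f"raster: +{raster_gain:.1f}%, RT: +{rt_gain:.1f}%")
print(f"raster perf per transistor vs GA102: {perf_per_transistor:.2f}x")
```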
 

Timorous

Golden Member
Oct 27, 2008
1,514
2,453
136
AMD expected higher performance? From what? Dual-issue or frequency? You don't even know how many extra transistors were used for that, so you can't say for sure If It underperforms or not compared to the allocated transistor budget. Ok, maybe we can say that about dual-issue, but without any software(game) optimizations It's not surprising It helps very little.

3090Ti(GA102): 28.3 billion transistors
4090(AD102): 76.3 billion transistors
+169% in transistors for +44.8% in raster and 53.3% in RT, but that's a cutdown version.
Let's say 4090Ti will bring another 20% to performance.
The end result is +169% in transistors for +73.8% in raster and 84% in RT.
Let's not forget that a big chunk of that increase comes from higher clocks.

That is a lot worse than your bare minimum.:D N21 vs N10 was at least a lot closer to your claim, only ~15% extra performance was missing at 4K.
My point is that you can't expect let's say 50% increase in performance just because the transistor budget was increased by 100% for example. First, we would need to know what those transistors were used for.

In general, AMD would not have used that many transistors if they didn't think it would bring performance via IPC or increased clocks. Something has meant that IPC has not really improved much and clocks have only moved on a little despite the massive transistor spend. The fact that N31 is actually capable of very high clock speeds also speaks to this. The design is capable; it just uses too much juice to get there, which might be fixable.

Fermi to Fermi 2.0 is a good example because there were not huge changes to the architecture. GF100 went from 3.1B transistors to 3B in GF110, yet NV improved performance by 24% and lowered power use. RDNA 3 feels a lot like Fermi.

NV are in a slightly different situation because their chips are sold in very expensive professional-grade parts at far higher volume than AMD's. That means NV can spend transistors to improve performance in workloads that don't help gaming because they have a market for it. AMD does not have that luxury, which is why features like half-rate FP64 have been reduced over time with the GCN and then RDNA cards: it takes up transistors for something that does not bring enough of a return. Also, the 4090 gets CPU limited even at 4K, so the gains over the 3090Ti are very game-suite dependent. In the 2023 ComputerBase.de game suite the 4090 is 58% faster than the 3090Ti because it uses a lot of newer games that are more GPU limited, so I don't think we have seen the 4090 fully stretch its legs vs the 3090Ti.

N10 to N21 was 2.6x the transistors for 2x performance, or 2.17x if you do 5700XT vs 6950XT reference design. If AMD had achieved that, then the 7900XTX would be 77% faster than the 6950XT, which would be a reasonable design goal IMO. That is also what they alluded to in their RDNA 3 presentation, which was a load of nonsense. Can't believe that presentation was allowed to go out like that, because it blew up so much of the goodwill they had rebuilt after the Vega fiasco. No idea why AMD decided to shoot themselves in the credibility foot that hard.
 

PJVol

Senior member
May 25, 2020
505
422
106
Since AMD GPUs have been split into "fps" and "compute" oriented products, does anyone have an idea why they spent so many transistors on things like doubled SIMD32 units, which they don't know (or don't care) how to utilize, GEMM accelerators, DPFPs and other AI-****? Isn't it a helluva lot of transistors?
Maybe I wasn't far off when I posted here on launch day that Vega64 déjà vu is coming...

And btw, if no one noticed, MI300 probably has a similar type of CUs, given there are just 228 of them in the APU version and 220 in MI200/250.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,555
6,765
136
Only RTX 4070 has 200W TBP and performs like RX 6800XT.
You don't need 3GHz for that level of performance.
RTX 4070Ti despite being a second(third?) tier silicon has 285W TBP.
If AMD wanted, they could set it to 300W. It would still be lower than the 315W of the RX 7900XT, but even this wouldn't allow 3GHz in every game.

The problem with RDNA3 is that it can't keep high clocks in everything.
In Tiny Tina's Wonderlands it can clock to 2715MHz, but in F1 22 + RT you are down to only 2380MHz. That's a 14% difference in clock speed!
I don't think N32 will do much better than N31 in this regard.

-N32 ain't gonna compete with the 4070ti, the 7900xt is already doing that and N32 just doesn't have the muscle. So it needs to stick around a 200-250w TDP.

In order to make up the 12 CU deficit to the 6800xt, N32 is going to need all the clock speed it can get, and it sounds like we're on the same page that it's unfortunately not going to get it.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
In general, AMD would not have used that many transistors if they didn't think it would bring performance via IPC or increased clocks. Something has meant that IPC has not really improved much and clocks have only moved on a little despite the massive transistor spend. The fact that N31 is actually capable of very high clock speeds also speaks to this. The design is capable; it just uses too much juice to get there, which might be fixable.
I think you should read this.:cool:

We know that N33 has 20% more transistors(13.3 vs 11.1) than N23 despite the same HW specs, but this includes every single improvement in that chip.
So basically only 20% more transistors would have been needed to have an RDNA3-based N21; let's call it N30. OK, a bit more, because N33 has only a 128KB register file per SIMD instead of the 192KB used in N31, so N33 is missing 4MB of registers. I will give 3% extra for that (~330 million transistors per 4MB).

You will end up with this:
GPU | CU (WGP) | Shaders (FP32) | TMUs | ROPs | Memory width | Infinity Cache | Memory | TBP | Transistors
N30 | 80 (40) | 5120 (10240) | 320 | 128 | 256-bit | 128MB | 16GB | ? | 33 billion (26.8*1.23)
RX 7900 XT | 84 (42) [+5%] | 5376 (10752) [+5%] | 336 [+5%] | 192 [+50%] | 320-bit [+25%] | 80MB [-37.5%] | 20GB [+25%] | 315W | 57.7 billion [+74.8%]
RX 7900 XTX | 96 (48) [+20%] | 6144 (12288) [+20%] | 384 [+20%] | 192 [+50%] | 384-bit [+50%] | 96MB [-25%] | 24GB [+50%] | 355W | 57.7 billion [+74.8%]

If I calculated a conservative 110 million transistors per mm2, then N30 would be 300mm2 on N5. If I kept the 335W TBP then It would clock higher than 7900 XTX.
I would dare to say It wouldn't have worse performance than 7900 XT, which is 25.5% faster in 4K raster and 34% in 4K RT than RX 6900 XT.
All this performance for only 23% more transistors, not bad.:)

Instead of asking about IPC or frequency and what went wrong with them, you should ask why N31 needs 24.7 billion(75%) more transistors compared to this imaginary N30, that difference is almost the amount of transistors in N21. :D
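The back-of-the-envelope numbers for this imaginary N30 can be laid out as a short sketch. Every input is taken from the post (N21 at 26.8 billion, N31 at 57.7 billion, the 20% + 3% overhead, and the assumed 110 million transistors per mm2); the variable names are made up for illustration, and N30 itself is a hypothetical chip.

```python
n23_b, n33_b = 11.1, 13.3  # billions of transistors
n21_b, n31_b = 26.8, 57.7

# RDNA3 overhead observed on N33 vs N23 at the same hardware specs.
rdna3_overhead = n33_b / n23_b - 1.0  # about 20%

# Post's estimate: 20% for RDNA3 plus 3% for the larger register files.
n30_b = n21_b * 1.23

# Die size at the assumed density of 110 million transistors per mm^2.
density_m_per_mm2 = 110
n30_area_mm2 = n30_b * 1000 / density_m_per_mm2

# What N31 spends beyond this hypothetical N30.
extra_b = n31_b - n30_b
extra_pct = extra_b / n30_b * 100.0

print(f"N33 vs N23: +{rdna3_overhead * 100:.0f}% transistors")
print(f"N30: {n30_b:.1f}B transistors, ~{n30_area_mm2:.0f} mm^2")
print(f"N31 extra: {extra_b:.1f}B (+{extra_pct:.0f}%)")
```

The table's +74.8% uses the rounded 33 billion as the base, which is why it differs slightly from the unrounded result here.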
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,312
2,798
106
-N32 ain't gonna compete with the 4070ti, the 7900xt is already doing that and N32 just doesn't have the muscle. So it needs to stick around a 200-250w TDP.

In order to make up the 12 CU deficit to the 6800xt, N32 is going to need all the clock speed it can get, and it sounds like we're on the same page that it's unfortunately not going to get it.
Maybe I misunderstood the part where you mentioned 60CU and 3GHz.

You are kinda overestimating RX 6800 XT or underestimating N32.:)
RX 6800XT is only 15% faster than RX 6800 despite having 20% more CUs (shaders, TMUs) and also having higher clocks (source: TPU).
N32 matches the RX 6800 in specs except for having half the Infinity Cache, but with likely higher-clocked memory.
It doesn't look like N32 needs clocks higher than 2600MHz to be on par with the 6800xt in raster and be faster in RT. For this clock speed, I don't think it needs more than 250W.
 

KompuKare

Golden Member
Jul 28, 2009
1,004
900
136
I fully expect N32 GPUs to be worse value than N21/N22 cards at first. Then at best we can hope for prices to drop to more attractive levels over time.
Aside from any inventory issues and AMD's misplaced price pride, AMD also craves poor reviews.
I think you should read this.:cool:

We know that N33 has 20% more transistors(13.3 vs 11.1) than N23 despite the same HW specs, but this includes every single improvement in that chip.
So basically only 20% more transistors would have been needed to have an RDNA3-based N21; let's call it N30. OK, a bit more, because N33 has only a 128KB register file per SIMD instead of the 192KB used in N31, so N33 is missing 4MB of registers. I will give 3% extra for that (~330 million transistors per 4MB).

You will end up with this:
GPU | CU (WGP) | Shaders (FP32) | TMUs | ROPs | Memory width | Infinity Cache | Memory | TBP | Transistors
N30 | 80 (40) | 5120 (10240) | 320 | 128 | 256-bit | 128MB | 16GB | ? | 33 billion (26.8*1.23)
RX 7900 XT | 84 (42) [+5%] | 5376 (10752) [+5%] | 336 [+5%] | 192 [+50%] | 320-bit [+25%] | 80MB [-37.5%] | 20GB [+25%] | 315W | 57.7 billion [+74.8%]
RX 7900 XTX | 96 (48) [+20%] | 6144 (12288) [+20%] | 384 [+20%] | 192 [+50%] | 384-bit [+50%] | 96MB [-25%] | 24GB [+50%] | 355W | 57.7 billion [+74.8%]

If I calculated a conservative 110 million transistors per mm2, then N30 would be 300mm2 on N5. If I kept the 335W TBP then It would clock higher than 7900 XTX.
I would dare to say It wouldn't have worse performance than 7900 XT, which is 25.5% faster in 4K raster and 34% in 4K RT than RX 6900 XT.
All this performance for only 23% more transistors, not bad.:)

Instead of asking about IPC or frequency and what went wrong with them, you should ask why N31 needs 24.7 billion(75%) more transistors compared to this imaginary N30, that difference is almost the amount of transistors in N21. :D
I still can't help but think that someone at AMD is super proud that they've cracked the GPU chiplet problem and, like the pride after AMD finally got HBM developed, they are not paying attention to the bigger picture.

That is, maybe the RDNA3 chips should have looked like this:
  1. N33 6nm monolith, 204mm² (i.e. as it is)
  2. N32 5nm monolith, 300mm²
  3. N31 5nm monolith, 450mm²
  4. N30 5nm chiplet, 550mm² GCD
In terms of risk management, 2. and 3. would have been far less risky.
For 4., making a chiplet design and not taking advantage of the possibility is crazy. Yes, we know AMD are now margins-obsessed (and for Radeon that seems to mean making things cheap because they lack the market share to spread the fixed design costs across volume sales), but a 500-600mm² GCD, even if running at far saner clocks closer to the process perf/watt sweet spot, should have outperformed AD102 easily.

In fact as a low volume halo part, they could even have experimented with 3D stacked cache on the MCDs.