Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,621
5,870
146

biostud

Lifer
Feb 27, 2003
18,236
4,755
136
I'm mostly managing my own expectations about being able to get a higher-end GPU at affordable prices. The answer is "No".
For me, living in Denmark where we don't use € but DKK, the rule of thumb is that you can multiply US prices by ten to get the prices here. So whether that is affordable or not, it has always been this way.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
Just look at the specs.

             7900XT     6950XT     Delta   7800XT (N32)   6800       Delta
CUs          84         80         +5%     60             60         0%
ROPs         192        128        +50%    96             96         0%
Bandwidth    800 GB/s   576 GB/s   +39%    640 GB/s       512 GB/s   +25%
Overall FPS delta per AMD slides: +30%

If we take AMD at face value that RDNA3 CUs are 17% faster clock for clock, then to hit 6800XT + 30% you need to be about 50% faster than the 6800 at 4K, which would require roughly 30% higher clocks. The 6800 manages to sustain between 2.1 and 2.2 GHz, so call it 2.15. That means overall clocks for N32 need to be around 2.8 GHz to hit the desired performance target of 6800XT + 30%.

Even with 30% higher clocks, though, it would have a lower pixel fillrate than the 6800XT, and given a minimum gen-on-gen gain of 30% you probably need more pixel fill (perhaps not 30% more, but some amount more). To do that you need more like +50% clocks, which is 3.2 GHz.

Provided it can clock that high (and AMD did say RDNA3 was architected to hit >3 GHz, and N32 is the ideal die for that to be true), it will work out fine. If it falls short, then the gen-on-gen gain will be lacklustre. Good pricing could make up for it, but I don't think AMD will want to charge less than $650.
If pixel fillrate is low, then texture fillrate is also a problem. Having only 3 SEs is also a problem: just to be on par you need 33% higher clocks.
Texture fillrate:
RX6800XT: 72 CU × 4 × 2250 MHz = 648 GTexel/s
RX6950XT: 80 CU × 4 × 2310 MHz = 739 GTexel/s
N32: 60 CU × 4 × 2700 MHz = 648 GTexel/s
N32: 60 CU × 4 × 3079 MHz = 739 GTexel/s
I don't have high expectations for performance.
To be faster than N21, not sure by how much, you would need >3 GHz, and I am a bit skeptical.
N33's specs look even worse.
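To make the arithmetic above easy to replay, here is a minimal Python sketch; the 4 TMUs per CU and the clock figures are the assumptions used in the posts above, not confirmed N32 specs.

```python
# Back-of-the-envelope fillrate check for the figures quoted above.
# Texture fillrate (GTexel/s) = CUs * TMUs per CU * clock (GHz)
# Pixel fillrate   (GPixel/s) = ROPs * clock (GHz)

def texture_fillrate(cus: int, clock_ghz: float, tmus_per_cu: int = 4) -> float:
    return cus * tmus_per_cu * clock_ghz

def pixel_fillrate(rops: int, clock_ghz: float) -> float:
    return rops * clock_ghz

def clock_to_match_texture(target: float, cus: int, tmus_per_cu: int = 4) -> float:
    """Clock (GHz) needed to reach a target texture fillrate."""
    return target / (cus * tmus_per_cu)

rx6800xt = texture_fillrate(72, 2.25)   # ~648 GTexel/s
rx6950xt = texture_fillrate(80, 2.31)   # ~739 GTexel/s

print(f"N32 clock to match 6800XT texture rate: {clock_to_match_texture(rx6800xt, 60):.2f} GHz")  # ~2.70
print(f"N32 clock to match 6950XT texture rate: {clock_to_match_texture(rx6950xt, 60):.2f} GHz")  # ~3.08
print(f"6800XT pixel rate: {pixel_fillrate(128, 2.25):.0f} GP/s vs N32 @ 2.8 GHz: {pixel_fillrate(96, 2.8):.0f} GP/s")
```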
 

leoneazzurro

Senior member
Jul 26, 2016
919
1,450
136
You are not considering other factors, like the increased L0/L1 caches and other architectural improvements. It also depends a lot on where the bottleneck is. The comparison above puts an RDNA2 CU against an RDNA3 CU as if they were equal, but the RDNA3 CU has dual issue, giving a performance increase of 17% per CU according to AMD (and that is probably calculated on final FPS, not simply on FLOPs). Also, clocks on the 7900XT are nominally lower than those of the 6950XT. So I don't see why a reasonably clocked (2.7-2.8 GHz) 7800XT cannot be on par with or slightly above the 6950XT. My bet is on around 5-10% more (20-25% below the 7900XT).
 
  • Like
Reactions: Tlh97 and Kaluan

Kaluan

Senior member
Jan 4, 2022
500
1,071
96
Clocks on Navi32 are bound to be very high. I think a 2.7-2.8 GHz boost clock is lowballing it. The clock delta between the N21 and N22 designs was pretty massive; if N32 follows a similar trend, a base clock over 2.4 GHz would not surprise me.

A boost clock more than 30% higher than N21's is realistic.


Side note: has anyone else read up on Samsung's new "GDDR6W" technology press release?
 
  • Like
Reactions: Tlh97 and Joe NYC

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
You are not considering other factors, like the increased L0/L1 caches and other architectural improvements. It also depends a lot on where the bottleneck is. The comparison above puts an RDNA2 CU against an RDNA3 CU as if they were equal, but the RDNA3 CU has dual issue, giving a performance increase of 17% per CU according to AMD (and that is probably calculated on final FPS, not simply on FLOPs). Also, clocks on the 7900XT are nominally lower than those of the 6950XT. So I don't see why a reasonably clocked (2.7-2.8 GHz) 7800XT cannot be on par with or slightly above the 6950XT. My bet is on around 5-10% more (20-25% below the 7900XT).
Those improvements you mentioned are part of the 17.4% increase, and they are likely talking about WGPs, not CUs, or maybe the WGP sees a higher increase, but they haven't specified yet.


I don't see a problem with adding that 17.4% increase to FLOPs. It looked about right for N31.

If you exclude the 17.4% architectural improvement and clocks, then N32 based on specs is ~1.5x N22.
The 6950XT is 69% faster than the 6700XT.
If you add specs and architecture together, 1.5 × 1.174 = 1.761, then N32 is 4% faster than the RX6950XT at only 2583 MHz.
The problem is that increasing specs by 50% doesn't translate into 50% higher performance, so the question is how much performance you lose. The performance lost to scaling needs to be compensated for by clocks.
If the loss is within 10%, then yes, N32 at 2.8 GHz could be a few % faster.
What I want to point out is that 2.8 GHz is just the shader clock speed; the front end will be clocked higher, and based on N31 it should be 3050 MHz.
I would like to see 3 GHz for the shaders and 3250-3300 MHz for the front end.
Then I can see it being 10% faster or a bit more.
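If it helps, the same reasoning can be written out as a small sketch; the 6700XT clock, the 1.69x gap and the scaling-loss figure are the assumptions from the post above, not measurements of N32.

```python
# Relative performance of N32 vs the 6950XT under the simple scaling model above.

N22_CUS, N22_CLOCK_GHZ = 40, 2.583       # RX 6700XT (clock as assumed above)
N32_CUS = 60
RDNA3_PER_CU_GAIN = 1.174                # AMD's claimed per-CU/WGP improvement
R6950XT_OVER_6700XT = 1.69               # measured gap quoted above

def n32_vs_6950xt(n32_clock_ghz: float, scaling_loss: float = 0.0) -> float:
    spec_ratio = (N32_CUS / N22_CUS) * (n32_clock_ghz / N22_CLOCK_GHZ)
    return spec_ratio * RDNA3_PER_CU_GAIN * (1.0 - scaling_loss) / R6950XT_OVER_6700XT

print(f"{n32_vs_6950xt(2.583):.2f}x")                    # ~1.04x with perfect scaling at 2583 MHz
print(f"{n32_vs_6950xt(2.8, scaling_loss=0.10):.2f}x")   # ~1.02x at 2.8 GHz with a 10% scaling loss
print(f"{n32_vs_6950xt(3.0, scaling_loss=0.10):.2f}x")   # ~1.09x at 3.0 GHz with a 10% scaling loss
```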
 
  • Like
Reactions: Tlh97 and Joe NYC

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
Side note: has anyone else read up on Samsung's new "GDDR6W" technology press release?
I saw it, and it looked very interesting until I saw that the I/O is widening from 32 to 64 bits.
For example, with a 64-bit bus on a GPU you will have the same capacity and bandwidth, but only one memory package instead of two chips. This just saves space and cost, but doesn't allow more memory for the same bus width. Maybe a clamshell config could be more acceptable this way, if this new memory isn't significantly costlier, but that's not possible for mobile.

For example, N31 wouldn't need 12 chips but only 6. Bandwidth and the amount of VRAM wouldn't change.
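A quick sketch of that chip-count arithmetic; the per-device figures (32-bit/2 GB for a GDDR6 chip, 64-bit/4 GB for a GDDR6W package) follow the description above and should be treated as assumptions.

```python
# How many memory devices a given total bus width needs, and the resulting capacity.

def devices_needed(total_bus_bits: int, bits_per_device: int) -> int:
    return total_bus_bits // bits_per_device

def capacity_gb(devices: int, gb_per_device: int) -> int:
    return devices * gb_per_device

for name, bits, gb in (("GDDR6", 32, 2), ("GDDR6W", 64, 4)):
    n = devices_needed(384, bits)              # N31-class 384-bit bus
    print(f"{name}: {n} devices, {capacity_gb(n, gb)} GB")
# GDDR6:  12 devices, 24 GB
# GDDR6W:  6 devices, 24 GB  -> same capacity and bus width, half the packages
```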
 

Glo.

Diamond Member
Apr 25, 2015
5,704
4,548
136
I saw it, and it looked very interesting until I saw that the I/O is widening from 32 to 64 bits.
For example, with a 64-bit bus on a GPU you will have the same capacity and bandwidth, but only one memory package instead of two chips. This just saves space and cost, but doesn't allow more memory for the same bus width. Maybe a clamshell config could be more acceptable this way, if this new memory isn't significantly costlier, but that's not possible for mobile.

For example, N31 wouldn't need 12 chips but only 6. Bandwidth and the amount of VRAM wouldn't change.
It also saves power. You have one chip moving data, instead of two.

The whole GDDR6W concept almost looks like it wasn't designed for GPUs, but for CPUs, SoCs and APUs.

P.S. 64-bit, 24 Gbps GDDR6W will have 192 GB/s of bandwidth with 4 GB of capacity. That is enough VRAM for smaller APU/SoC GPUs.
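That 192 GB/s figure falls straight out of the usual bus-width times data-rate formula; a one-line check (the 384-bit line is just an extra illustration):

```python
# Bandwidth (GB/s) = bus width (bits) * per-pin data rate (Gbps) / 8 bits per byte
def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    return bus_bits * gbps_per_pin / 8

print(bandwidth_gb_s(64, 24))    # 192.0 GB/s for one 64-bit, 24 Gbps GDDR6W package
print(bandwidth_gb_s(384, 24))   # 1152.0 GB/s for a full 384-bit bus at the same speed
```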
 
  • Like
Reactions: Tlh97 and Kaluan

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
It also saves power. You have one chip moving data, instead of two.

The whole GDDR6W concept almost looks like it wasn't designed for GPUs, but for CPUs, SoCs and APUs.
Are you sure it saves power? It looks like two chips stacked on top of each other, and the bus width confirms it.

Then they wouldn't call it GDDR6W, because that's for graphics, and the speed is also 22 Gbps.
Consoles, maybe.
 
  • Like
Reactions: Tlh97 and Kaluan

Glo.

Diamond Member
Apr 25, 2015
5,704
4,548
136
Then they wouldn't call it GDDR6W, because that's for graphics, and the speed is also 22 Gbps.
GDDR6 should do the same type of work as DDR; the 4700S/4800S have already proven it. The only problem with it is higher latency, but that can be solved with low-latency caches.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
GDDR6 should do the same type of work as DDR; the 4700S/4800S have already proven it. The only problem with it is higher latency, but that can be solved with low-latency caches.
The 4700S/4800S are console-based SoCs, and they use GDDR6 as unified memory for both CPU and GPU.
AMD and Intel have had more than enough time to start using it, yet they still use DDR4 or DDR5.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
P.S. 64-bit, 24 Gbps GDDR6W will have 192 GB/s of bandwidth with 4 GB of capacity. That is enough VRAM for smaller APU/SoC GPUs.
1.) We know all too well how the 6500XT fared with only 4 GB of VRAM. You yourself posted a link to the 8 GB version, which was so much better.
2.) With 4 GB of VRAM for the GPU, you still need additional RAM for the CPU.
24 Gbps memory won't be cheap either.
This wouldn't be much help to either RDNA3 or Phoenix.
It could help if the bus width were widened to 512 or 768 bits, because realistically a 384-bit bus and 12 chips is the limit.
 
  • Like
Reactions: Tlh97 and Joe NYC

Glo.

Diamond Member
Apr 25, 2015
5,704
4,548
136
The 4700S/4800S are console-based SoCs, and they use GDDR6 as unified memory for both CPU and GPU.
AMD and Intel have had more than enough time to start using it, yet they still use DDR4 or DDR5.
Second reason: vastly higher power draw for a GDDR6 memory subsystem than for DDR5, and especially LPDDR5/X.
 

Glo.

Diamond Member
Apr 25, 2015
5,704
4,548
136
1.) We know all too well how the 6500XT fared with only 4 GB of VRAM. You yourself posted a link to the 8 GB version, which was so much better.
2.) With 4 GB of VRAM for the GPU, you still need additional RAM for the CPU.
24 Gbps memory won't be cheap either.
This wouldn't be much help to either RDNA3 or Phoenix.
It could help if the bus width were widened to 512 or 768 bits, because realistically a 384-bit bus and 12 chips is the limit.
With GDDR6W you need 6 chips, not 12, to achieve the same bandwidth and capacity (384-bit, 24 GB of VRAM).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
Second reason: vastly higher power draw for a GDDR6 memory subsystem than for DDR5, and especially LPDDR5/X.
So of course we can't expect them to start using it as universal memory.
With GDDR6W you need 6 chips, not 12, to achieve the same bandwidth and capacity (384-bit, 24 GB of VRAM).
I wrote the same thing before, so what's your point? That it will save space, or that it will probably cost less?
If you meant my last sentence, then I was talking about a GDDR6 384-bit bus with 12 chips.

If this costs less than 2 chips, then there will be demand and it will be used for GPUs, but I don't see it happening for CPUs or APUs.
Console SoCs are a strong possibility.
Using it for mobile GPUs is also worthwhile, just for the space savings.
I would still prefer HBM. ;)
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,704
4,548
136
So of course we can't expect them to start using it as universal memory.

I wrote the same thing before, so what's your point? That it will save space, or that it will probably cost less?
If you meant my last sentence, then I was talking about a GDDR6 384-bit bus with 12 chips.
If this costs less than 2 chips, then there will be demand for GPUs.
Console SoCs are a strong possibility.
Using it for mobile GPUs is also worthwhile, just for the space savings.
I would still prefer HBM. ;)
I'm not arguing with you, I am just adding more to the topic of GDDR6W :p.

Well, on topic: GDDR6W should "cost" less energy in the first place, because it has a 64-bit bus per memory package(?), so it should be possible to use it even as system RAM. Of course, it's impossible to use it if it comes only in 4 GB packages, even if it delivers what... 192 GB/s of bandwidth? 8 GB is the absolute minimum, and even that is starting to be barely enough. A 128-bit bus usually costs 16-20 W of power with 4 chips; with 64-bit GDDR6W that should be halved.

So the last unknown here is the cost. It has to add up financially on top of all of its benefits.

One last thing: if those memory packages were not developed for integration onto CPU/SoC/APU packages, why make them smaller in height?

I could easily see, for example, next-generation Steam Decks using those memory chips.

What is the benefit of HBM over GDDR6W, apart from lower power? Lower latency?
 
  • Like
Reactions: Tlh97 and Kaluan

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
I'm not arguing with you, I am just adding more to the topic of GDDR6W :p.

Well, on topic: GDDR6W should "cost" less energy in the first place, because it has a 64-bit bus per memory package(?), so it should be possible to use it even as system RAM. Of course, it's impossible to use it if it comes only in 4 GB packages, even if it delivers what... 192 GB/s of bandwidth? 8 GB is the absolute minimum, and even that is starting to be barely enough. A 128-bit bus usually costs 16-20 W of power with 4 chips; with 64-bit GDDR6W that should be halved.
If you use 2 chips with 32 bits each, or stacked chips with 64 bits, how does it save power? Even if there were some power savings, I don't believe they're anywhere near 50%, but even 10%, for example, is better than nothing.

So the last unknown here is the cost. It has to add up financially on top of all of its benefits.

One last thing: if those memory packages were not developed for integration onto CPU/SoC/APU packages, why make them smaller in height?

I could easily see, for example, next-generation Steam Decks using those memory chips.
The Steam Deck uses 16 GB of LPDDR5 and needs power consumption to be as low as possible. For GDDR6W to provide 16 GB you need 4 packages on a 256-bit bus, so this memory is already out.
Standard APUs won't use it because of the added cost; console SoCs (PS6, the next Xbox) could use it.
It can still be used for GPUs, and for mobile GPUs in particular it will be useful.
Even a 10% lower price would be interesting for manufacturers.

What is the benefit of HBM over GDDR6W, apart from lower power? Lower latency?
Space savings, higher bandwidth and more memory. You can get 24 GB from just a single 12-Hi stack of HBM3, with 819 GB/s of bandwidth.
Actually, we should look at it the other way and ask what the benefit of GDDR6W over HBM is. The only thing I can think of is lower cost. Seriously, I want to know HBM's cost.
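For reference, the HBM3 numbers quoted here can be reproduced from the per-stack parameters; the 16 Gb die density and 6.4 Gbps pin speed are the commonly published launch figures and are assumptions here, not AMD plans.

```python
# HBM3 per-stack capacity and bandwidth.

def hbm3_capacity_gb(stack_height: int, die_gbit: int = 16) -> float:
    return stack_height * die_gbit / 8          # Gbit -> GB

def hbm3_bandwidth_gb_s(pin_gbps: float = 6.4, bus_bits: int = 1024) -> float:
    return bus_bits * pin_gbps / 8

print(hbm3_capacity_gb(12))      # 24.0 GB for a 12-Hi stack
print(hbm3_bandwidth_gb_s())     # 819.2 GB/s per stack
```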
 

Glo.

Diamond Member
Apr 25, 2015
5,704
4,548
136
Space savings, higher bandwidth and more memory. You can get 24 GB from just a single 12-Hi stack of HBM3, with 819 GB/s of bandwidth.
Actually, we should look at it the other way and ask what the benefit of GDDR6W over HBM is. The only thing I can think of is lower cost. Seriously, I want to know HBM's cost.
For HBM, we know that it's not the memory tech itself that is costly; it's the packaging technology for the entire package.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
For HBM, we know that it's not the memory tech itself that is costly; it's the packaging technology for the entire package.
Isn't the packaging technology similar to what RDNA3 uses? Then I don't understand why they didn't use HBM for N32 and N31.
With HBM3 you wouldn't need the big cache, and the basic GCD alone would be enough without increasing its size; considering how much space the interconnects take up in N31, maybe it would even be smaller.
 
  • Like
Reactions: Tlh97 and Joe NYC

amenx

Diamond Member
Dec 17, 2004
3,892
2,101
136
If the 7800XT uses the same amount of power as the 6800XT, and RDNA3 delivers 54% more performance/watt than RDNA2, then it will be 54% faster than the 6800XT ;)
I hope its price and performance relative to the 7900XT mirror how the 6800XT compared to the 6900XT. If so, they would have a killer card in their lineup. The problem is it would sell out pretty quickly, or demand would inflate its price for quite a while.
 

biostud

Lifer
Feb 27, 2003
18,236
4,755
136
It depends on what you mean by equal.
Is AMD saying, or confident, that the 7800XT will equal the 4080?

My guess is that it will be a bit slower, and a lot slower in RT, and that the difference will grow at higher resolutions. But the 4080 is also likely priced 60-80% higher than the 7800XT.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,783
7,112
136
There is no way the performance target for the 7800xt is not the 6950xt.

It will perform like a 6900XT (+/- 10%) one way or another.

If it can outperform the 6900 by a significant margin, then AMD will lock that crap down in hardware (like they did with the 6800's OC potential) and save it for the mid-gen refresh.