Question Speculation: RDNA2 + CDNA Architectures thread

uzzi38

Platinum Member
Oct 16, 2019
2,635
5,983
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even still though, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

maddie

Diamond Member
Jul 18, 2010
4,746
4,689
136

Uhhh, has anyone seen these benchmarks?

What the hell?
Validates my prediction from several days ago. The fall off in performance with infinity cache as resolution increases would be less than previous generations, thus more competitive at lower resolutions.

The patent on the elimination of cache data replication could explain the lowered bandwidth needed. If you don't need to load multiple instances of either shader code or data, then that would free up a lot of data movement.

Let's assume 60 fps; then you have 16.66 ms/frame to store all of the necessary code and data for that frame. If you can now drop that data volume by 50% due to the elimination of replicated data in the local caches, you can get by with a half-sized bus.

Maybe this is why AMD is concentrating on 4K in their messaging, as it seems that with higher framerates (lower rez), this sort of optimization will start to be overwhelmed as the time allowed for updating new data is reduced. Bandwidth x frametime (total data) falls below the amount needed for that new frame, even with no replication of data.

Nail waiting to be hammered prediction:
If this is close to being correct, we should see the RX 6xxx cards having less of a % reduction in framerate as resolution increases. This is opposite to what we've become accustomed to.
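
A rough back-of-the-envelope sketch of the arithmetic above, in Python; the bus bandwidth and the 50% replication saving are illustrative assumptions, not claims about any actual Navi21 figures:

```python
# Illustrative numbers only: assumed frame rate, assumed bus bandwidth.
frame_rate = 60.0                    # fps
frame_time = 1.0 / frame_rate        # ~16.66 ms per frame
bus_bw = 512e9                       # assumed 512 GB/s memory bus (placeholder)

data_per_frame = bus_bw * frame_time # bytes the bus can move in one frame
print(f"Frame budget: {frame_time * 1e3:.2f} ms, "
      f"{data_per_frame / 1e9:.2f} GB movable per frame")

# If eliminating replicated cache data halves the traffic that has to move,
# a half-width bus can deliver the same unique data per frame.
halved_bw = bus_bw * 0.5
print(f"Half-width bus still moves {halved_bw * frame_time / 1e9:.2f} GB "
      f"of unique data per frame if 50% of traffic was replication")

# At higher frame rates (lower resolution) the per-frame budget shrinks,
# which is the point above about the optimisation being overwhelmed:
for fps in (60, 120, 240):
    print(f"{fps} fps -> {halved_bw / fps / 1e9:.2f} GB movable per frame")
```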
 
  • Like
Reactions: kurosaki

Hitman928

Diamond Member
Apr 15, 2012
5,321
7,999
136
Validates my prediction from several days ago. The fall off in performance with infinity cache as resolution increases would be less than previous generations, thus more competitive at lower resolutions.

The patent on the elimination of cache data replication could explain the lowered bandwidth needed. If you don't need to load multiple instances of either shader code or data, then that would free up a lot of data movement.

Let's assume 60 fps; then you have 16.66 ms/frame to store all of the necessary code and data for that frame. If you can now drop that data volume by 50% due to the elimination of replicated data in the local caches, you can get by with a half-sized bus.

Maybe this is why AMD is concentrating on 4K in their messaging, as it seems that with higher framerates (lower rez), this sort of optimization will start to be overwhelmed as the time allowed for updating new data is reduced. Bandwidth x frametime (total data) falls below the amount needed for that new frame, even with no replication of data.

Nail waiting to be hammered prediction:
If this is close to being correct, we should see the RX 6xxx cards having less of a % reduction in framerate as resolution increases. This is opposite to what we've become accustomed to.

The benchmarks shown on the website are showing the 6800XT increasing its lead at 1440p versus 4K over the 3080. In other words, the 6800XT does better competitively than the 3080 at a lower resolution compared to the higher resolution. Am I missing something?
 
  • Like
Reactions: lightmanek

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
The benchmarks shown on the website are showing the 6800XT increasing its lead at 1440p versus 4K over the 3080. In other words, the 6800XT does better competitively than the 3080 at a lower resolution compared to the higher resolution. Am I missing something?
That it's the RTX 3080 that is bottlenecked by its design, and not any of the RDNA2 GPUs.

What Maddie says is that as the resolution increases, RDNA2 GPUs retain their performance scaling.
 
  • Like
Reactions: guachi and Tlh97

Mopetar

Diamond Member
Jan 31, 2011
7,848
6,001
136
RX5700XT Anniversary Edition
RX5700XT
RX5700
RX5600XT

I pretty much skipped over Navi 10 so I didn't realize it was split like that. Actually it was binned 5 ways though, not 4.

Navi 10 XE
Navi 10 XLE
Navi 10 XL
Navi 10 XT
Navi 10 XTX (though this was really just a clock speed bump and didn't have extra hardware)

I suppose that when you really only have 2 dies total and only one of them is a mainstream part you've got to stretch it a bit more.
 

maddie

Diamond Member
Jul 18, 2010
4,746
4,689
136
The benchmarks shown on the website are showing the 6800XT increasing its lead at 1440p versus 4K over the 3080. In other words, the 6800XT does better competitively than the 3080 at a lower resolution compared to the higher resolution. Am I missing something?
I mean that the % fps difference of the 6xxx cards would be smaller than normal between lower & higher resolutions, so whatever their ranking at 4K, they would compete better as resolution falls. Normally we expect the big guys to really show their power as resolution increases, but I thought that the IC would work in reverse, even though they are obviously still very competitive at 4K.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
I mean that the % fps difference of the 6xxx cards would be smaller than normal between lower & higher resolutions, so whatever their ranking at 4K, they would compete better as resolution falls. Normally we expect the big guys to really show their power as resolution increases, but I thought that the IC would work in reverse, even though they are obviously still very competitive at 4K.
Effectively - it's the best of both worlds ;).
 

Hitman928

Diamond Member
Apr 15, 2012
5,321
7,999
136
That it's the RTX 3080 that is bottlenecked by its design, and not any of the RDNA2 GPUs.

What Maddie says is that as the resolution increases, RDNA2 GPUs retain their performance scaling.

I don't think that's what he meant, but maybe I'm just confused.

Edit:

I mean that the % fps difference of the 6xxx cards would be smaller than normal between lower & higher resolutions, so whatever their ranking at 4K, they would compete better as resolution falls. Normally we expect the big guys to really show their power as resolution increases, but I thought that the IC would work in reverse, even though they are obviously still very competitive at 4K.

Ok, maybe I'm just confused by how you worded it.

Maybe this is why AMD is concentrating on 4K in their messaging, as it seems that with higher framerates (lower rez), this sort of optimization will start to be overwhelmed as the time allowed for updating new data is reduced.

This seems to be the opposite of what you are saying now? Why would they focus their messaging on 4K if they were more competitive at lower resolutions? What do you mean by the optimizations being overwhelmed at higher framerates but the cards being more competitive at lower resolutions? I'm all kinds of confused. The big cards usually don't scale in fps as resolution lowers because of CPU bottlenecks, not because of anything to do with the GPU.

Either way, the infinity cache should perform better at lower resolutions. The higher the resolution goes, the more likely it is that you will have cache misses and be forced to go out to VRAM. This doesn't look like it's a problem at 4K with today's games so it seems like AMD found a good balance in cache size (pending full reviews, of course).
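
To illustrate the cache-miss point, here's a toy model; the hit rates and bandwidth numbers are invented placeholders, not AMD's figures:

```python
# Toy model: effective bandwidth as a blend of cache hits and VRAM accesses.
# Hit rates and bandwidths below are invented placeholders, not AMD figures.
def effective_bandwidth(hit_rate, cache_bw=1900.0, vram_bw=512.0):
    """Blended bandwidth in GB/s for a given infinity-cache hit rate."""
    return hit_rate * cache_bw + (1.0 - hit_rate) * vram_bw

# Assume the working set grows with resolution, so the hit rate falls:
for res, hit_rate in [("1080p", 0.80), ("1440p", 0.70), ("4K", 0.55)]:
    print(f"{res}: assumed hit rate {hit_rate:.0%}, "
          f"~{effective_bandwidth(hit_rate):.0f} GB/s effective")
```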
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
I don't think that's what he meant, but maybe I'm just confused.

Edit:

Ok, maybe I'm just confused by how you worded it.

This seems to be the opposite of what you are saying now? Why would they focus their messaging on 4K if they were more competitive at lower resolutions? What do you mean by the optimizations being overwhelmed at higher framerates but the cards being more competitive at lower resolutions? I'm all kinds of confused.

Either way, the infinity cache should perform better at lower resolutions. The higher the resolution goes, the more likely it is that you will have cache misses and be forced to go out to VRAM. This doesn't look like it's a problem at 4K with today's games so it seems like AMD found a good balance in cache size (pending full reviews, of course).
:D

Maddie says that in previous years, the higher you go in resolution, the more performance is unleashed, because the GPUs are not getting bottlenecked by any part of the pipeline.

RDNA2 changes this, and at lower resolutions you still get the same performance scaling as at 4K.

Which is why the lower you go in resolution with RDNA2, the bigger the performance advantage you'll see over Ampere.
 

Hitman928

Diamond Member
Apr 15, 2012
5,321
7,999
136
:D

Maddie says that in previous years, the higher you go in resolution, the more performance is unleashed, because the GPUs are not getting bottlenecked by any part of the pipeline.

RDNA2 changes this, and at lower resolutions you still get the same performance scaling as at 4K.

Which is why the lower you go in resolution with RDNA2, the bigger the performance advantage you'll see over Ampere.

Again though, the issue with scaling between card tiers at lower resolutions had far more to do with CPU bottlenecks than anything in the GPU. So is he proposing that infinity cache fixes this somehow?
 

swilli89

Golden Member
Mar 23, 2010
1,558
1,181
136
If (it's a big IF) AMD can claim the absolute performance crown with Ryzen 5000 + Radeon 6000, tests should absolutely make a point to include SAM.

Another reason to base all future tests off Ryzen 5000 is if it captures the performance crown which, given the information we have at present, is likely to happen.

Those spending $700-$1000 on a GPU are the exact crowd that will also look to purchase the fastest available gaming CPUs. If those same CPUs, which by themselves are the best option for gamers, happen to give new Radeon cards a performance boost via SAM, then it's logical to include those results. It's hardware you can buy and put together for those results, at default.

Zen+ and Zen2 owners are looking at fairly large upgrades simply switching to Ryzen 5000. However, since B450 motherboards likely won't receive BIOS updates to support Zen3 there will be a lag of several months before they make the jump.
 

Hitman928

Diamond Member
Apr 15, 2012
5,321
7,999
136
If (it's a big IF) AMD can claim the absolute performance crown with Ryzen 5000 + Radeon 6000, tests should absolutely make a point to include SAM.

Another reason to base all future tests off Ryzen 5000 is if it captures the performance crown which, given the information we have at present, is likely to happen.

Those spending $700-$1000 on a GPU are the exact crowd that will also look to purchase the fastest available gaming CPUs. If those same CPUs, which by themselves are the best option for gamers, happen to give new Radeon cards a performance boost via SAM, then it's logical to include those results. It's hardware you can buy and put together for those results, at default.

Zen+ and Zen2 owners are looking at fairly large upgrades simply switching to Ryzen 5000. However, since B450 motherboards likely won't receive BIOS updates to support Zen3 there will be a lag of several months before they make the jump.

You need a Zen3 CPU and a B550 or X570 motherboard to enable SAM. If you have a Zen3 CPU on B450/X470, no SAM for you.
 

maddie

Diamond Member
Jul 18, 2010
4,746
4,689
136
Again though, the issue with scaling between card tiers at lower resolutions had far more to do with CPU bottlenecks than anything in the GPU. So is he proposing that infinity cache fixes this somehow?
This is the thinking, but we might have to reconsider, just as we thought that CPU IPC increases were stagnating due to fundamental limits.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
Again though, the issue with scaling between card tiers at lower resolutions had far more to do with CPU bottlenecks than anything in the GPU. So is he proposing that infinity cache fixes this somehow?
Well, architectural design can either diminish the CPU bottlenecks (which is exactly what Maddie points out) or magnify them.

Heavy compute designs appear to be doing the latter, as per pre-RDNA1 and Ampere results, while every pre-Ampere Nvidia GPU architecture and RDNA2 appear to be diminishing the CPU bottlenecks.

They really switched places from an architectural perspective :).
 
  • Like
Reactions: spursindonesia

zinfamous

No Lifer
Jul 12, 2006
110,597
29,231
146
Let's wait for benchmarks, then people can make informed decisions about all the possible compromises they can afford to make with their partly cutting edge gaming systems. Does that mean that the additional performance possible with SAM (that likely makes good use of PCIe 4) shouldn't be tested? I don't think so.

What is this poppycock???

:D
 
  • Haha
Reactions: soresu

Hitman928

Diamond Member
Apr 15, 2012
5,321
7,999
136
Well, architectural design can either diminish the CPU bottlenecks (which is exactly what Maddie points out) or magnify them.

Heavy compute designs appear to be doing the latter, as per pre-RDNA1 and Ampere results, while every pre-Ampere Nvidia GPU architecture and RDNA2 appear to be diminishing the CPU bottlenecks.

They really switched places from an architectural perspective :).

Having 'light' versus 'heavy' compute doesn't diminish CPU bottlenecks. For instance, if you take a 3090 and a 2080 Ti at 720p, they are both severely CPU bottlenecked and won't show much scaling, if any, depending on the game and settings. Move to 4K and now the CPU can feed both cards faster than they can render, so they are limited by their compute and rasterization. If I take the 3090, remove the double fp32, and give it more rasterization units, it's not all of a sudden going to perform better at 720p, because the 3090 was already maxing out the CPU.
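
A minimal sketch of that bottleneck argument; all the fps numbers below are made up purely to show the min(CPU, GPU) behaviour:

```python
# All fps numbers below are invented, just to show the min(CPU, GPU) behaviour.
def delivered_fps(cpu_fps, gpu_fps):
    """Delivered frame rate is roughly the slower of the two limits."""
    return min(cpu_fps, gpu_fps)

cpu_limit = 240  # what the CPU/driver can submit, roughly resolution-independent
gpu_limit = {
    "720p": {"2080 Ti": 400, "3090": 500},
    "4K":   {"2080 Ti": 60,  "3090": 85},
}

for res, cards in gpu_limit.items():
    for card, fps in cards.items():
        print(f"{res} {card}: {delivered_fps(cpu_limit, fps)} fps delivered")
# At 720p both cards sit at the 240 fps CPU ceiling (no visible scaling);
# at 4K the GPU limit is below the CPU limit, so the cards separate.
```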
 

Paul98

Diamond Member
Jan 31, 2010
3,732
199
106
Well, the 3080 does better vs the old architecture at 4K than at 1440p. Most just assumed this was because of being more CPU bound, which may be part of it, but it seemed more due to its design. It sort of reminded me of Vega, which was also relatively slower at lower resolutions but did better as resolution increased.
 

thigobr

Senior member
Sep 4, 2016
232
166
116
Smart Access Memory isn't new or exclusive to Zen3+500 chipsets... And it's an open spec.

Smart Access Technology works just fine on Linux. It is resizeable BAR support which Linux has supported for years (AMD actually added support for this), but which is relatively new on windows. You just need a platform with enough MMIO space. On older systems this is enabled via sbios options with names like ">4GB MMIO".
I don't know what windows does exactly, but on Linux at least, it will work on any platform with enough MMIO space. I suspect windows would behave the same way (although I think windows has stricter requirements about BAR resizing compared to Linux so you may need a sbios update for your platform to make windows happy).

Source:
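
For anyone who wants to poke at this on Linux, here's a minimal sketch under the assumptions above; the PCI address is a placeholder for your GPU, and treating a BAR larger than the classic 256 MiB aperture as a sign of resizable BAR being active is only a heuristic:

```python
# Linux-only sketch: list PCI BAR/region sizes from sysfs. The device address
# is a placeholder; substitute your GPU's (check lspci for the address).
from pathlib import Path

def region_sizes(pci_addr="0000:03:00.0"):
    """Return the sizes (in bytes) of the device's populated PCI regions."""
    text = Path(f"/sys/bus/pci/devices/{pci_addr}/resource").read_text()
    sizes = []
    for line in text.splitlines():
        start, end, _flags = (int(field, 16) for field in line.split())
        if end > start:              # unpopulated regions are all zeros
            sizes.append(end - start + 1)
    return sizes

if __name__ == "__main__":
    for i, size in enumerate(region_sizes()):
        print(f"region {i}: {size / 2**20:.0f} MiB")
    # Heuristic: a VRAM aperture much larger than the classic 256 MiB
    # suggests resizable BAR ("Smart Access Memory") is in effect.
```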
 

kurosaki

Senior member
Feb 7, 2019
258
250
86
You'd only get 63GB/s if you saturated both the up and downstream links with PCIe 4.0 x16, no? DDR4 3200 would already be 25.6GB/s for single channel one way and twice that in dual channel.
The 63 or 64 is all nitpicky depending on EC and so on, boo for that. As for the whole "Kurosaki, you forgot to multiply by a factor of two" - on that I commend you! How could I not have thought things through beyond single channel? But still, around PCIe 5, things start to get interesting I guess!
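
For reference, a quick sanity check of the numbers being thrown around here (theoretical peaks only, assuming 128b/130b encoding for PCIe 4.0 and ignoring other protocol overhead):

```python
# Theoretical peak figures only.
pcie4_per_lane = 16e9 * (128 / 130) / 8   # 16 GT/s, 128b/130b encoding, bits -> bytes
pcie4_x16_one_way = pcie4_per_lane * 16   # ~31.5 GB/s per direction
pcie4_x16_both = pcie4_x16_one_way * 2    # ~63 GB/s up + down combined

ddr4_3200_single = 3200e6 * 8             # 25.6 GB/s per 64-bit channel
ddr4_3200_dual = ddr4_3200_single * 2     # 51.2 GB/s dual channel

print(f"PCIe 4.0 x16: {pcie4_x16_one_way / 1e9:.1f} GB/s each way, "
      f"{pcie4_x16_both / 1e9:.1f} GB/s combined")
print(f"DDR4-3200: {ddr4_3200_single / 1e9:.1f} GB/s single channel, "
      f"{ddr4_3200_dual / 1e9:.1f} GB/s dual channel")
```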