Question Speculation: RDNA2 + CDNA Architectures thread

Page 101

uzzi38

Platinum Member
Oct 16, 2019
2,638
5,991
146
All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even still, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

Asterox

Golden Member
May 15, 2012
1,026
1,775
136
As far as I can tell the whole Infinity Cache thing started when the L1 patent got published and went around on Twitter. I'm not sure who started the 128MB cache rumor, but afaik the patents were only ever about L1 sharing.

Well, maybe cache size is not that important a detail. No doubt, IPC evaluation is a very important detail.

[Attached image: 2020-10-05_164322.jpg]
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
As far as I can tell the whole Infinity Cache thing started when the L1 patent got published and went around on Twitter. I'm not sure who started the 128MB cache rumor, but afaik the patents were only ever about L1 sharing.
The "Infinity cache" trademark links to the patent "adaptive cache"... The patent is exacly the same as found later the .pdf, a crossbar comunication
 

andermans

Member
Sep 11, 2020
151
153
76
The "Infinity cache" trademark links to the patent "adaptive cache"... The patent is exacly the same as found later the .pdf, a crossbar comunication

Maybe I'm blind, but I can't find where the trademark links to the patent? (And otherwise there is no confirmation that Infinity cache is really about the shared L1 vs e.g. the big cache that some techtubers talked about or something else entirely?)
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
Well, if this is the case, wouldn't this be a good position to use CrossFire? I understand the spec added for mGPU in DX12, but that is fairly difficult to implement.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
While I plan to get a Big Navi card, the performance of the 40CU card makes it a compelling offering from AMD.

EDIT: the 40CU card AMD is launching this year should have performance similar to the 2080ti. It will be competitive with the 3070, that is why NVIDIA is prepping a 3070ti.
 
Last edited:

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Maybe I'm blind, but I can't find where the trademark links to the patent? (And otherwise there is no confirmation that Infinity cache is really about the shared L1 vs e.g. the big cache that some techtubers talked about or something else entirely?)
Yeah... I misread it from Reddit, sorry
 

stebler

Member
Sep 10, 2020
25
75
61
As far as I can tell the whole Infinity Cache thing started when the L1 patent got published and went around on Twitter. I'm not sure who started the 128MB cache rumor, but afaik the patents were only ever about L1 sharing.
The term "Infinity Cache" was first mentioned in a RGT video which describes it as a 128 MiB cache used to compensate for the 256-bit bus of Navi 21.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,827
7,192
136
The term "Infinity Cache" was first mentioned in a RGT video which describes it as a 128 MiB cache used to compensate for the 256-bit bus of Navi 21.

-Must be something to it, I cannot imagine AMD would come up with a new caching protocol just to offset a tried and true large bus high bandwidth methodology.

I imagine the large cache must have a substantial positive offset per mm2 or per transistor over simply increasing bus width. Otherwise it's just reinventing the wheel...
 

Justinus

Diamond Member
Oct 10, 2005
3,175
1,518
136
-Must be something to it, I cannot imagine AMD would come up with a new caching protocol just to offset a tried and true large bus high bandwidth methodology.

I imagine the large cache must have a substantial positive offset per mm2 or per transistor over simply increasing bus width. Otherwise it's just reinventing the wheel...

I imagine the latency advantage of having substantially fewer cache misses is probably a big deal for frametime consistency, even if averages don't truly reflect it.
 

ModEl4

Member
Oct 14, 2019
71
33
61
While I plan to get a Big Navi card, the performance of the 40CU card makes it a compelling offering from AMD.

EDIT: the 40CU card AMD is launching this year should have performance similar to the 2080ti. It will be competitive with the 3070, that is why NVIDIA is prepping a 3070ti.
Can you clarify why you think a 2.5GHz 40CU design can be at that level? (I mean it would be great but is it likely?) Let's suppose that the 2.2GHz 80CU part is at 3090 level, 3090 is just +50% from a 2080Ti (at 4K, less in QHD), what aspects of the design will give it nearly +20%?
3090 =150
2.2GHz 80CU =150😍
2080Ti=100
2.5GHz 40CU = 85 with linear scaling (+20%=102)
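
For what it's worth, the linear-scaling arithmetic above can be sanity-checked with a quick sketch (the performance indices are this post's assumptions, not measured data):

```python
# Relative 4K performance indices assumed above (2080 Ti = 100).
perf_80cu_2p2ghz = 150          # hypothetical 2.2 GHz 80 CU part at 3090 level

# Naive linear scaling down to 40 CUs at 2.5 GHz.
cu_ratio = 40 / 80
clock_ratio = 2.5 / 2.2
perf_40cu_2p5ghz = perf_80cu_2p2ghz * cu_ratio * clock_ratio
print(round(perf_40cu_2p5ghz))        # 85 -> still short of the 2080 Ti's 100

# It would take roughly another +20% on top of that to reach ~102.
print(round(perf_40cu_2p5ghz * 1.2))  # 102
```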
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,560
136
Can you clarify why you think a 2.5GHz 40CU design can be at that level? (I mean it would be great but is it likely?) Let's suppose that the 2.2GHz 80CU part is at 3090 level, 3090 is just +50% from a 2080Ti (at 4K, less in QHD), what aspects of the design will give it nearly +20%?
3090 =150
2.2GHz 80CU =150😍
2080Ti=100
2.5GHz 40CU = 85 with linear scaling (+20%=102)
Let's see.

The 40 CU GPU was on par in performance with the 40 SM Turing GPU, plus or minus the differences in turbo boost clock speeds.

Also, RDNA1 was on par in IPC with Turing GPUs, already: https://www.computerbase.de/2019-07.../4/#diagramm-performancerating-navi-vs-turing

The difference is just a 1-2% IPC benefit for the RDNA1 architecture.

Now let's look at Ampere.

The 68 SM GPU with a 320-bit bus is 25% faster than the RTX 2080 Ti, which has exactly the same number of SMs but a 352-bit bus.

We can safely assume that Ampere brought 25% IPC increase, per SM.

The problem for Ampere is that it clocks somewhat similarly to Turing; it didn't bring clock speed increases.

Now, Navi 22 clocks at 2.5 GHz according to macOS data, and according to AMD themselves, RDNA2 GPUs have a 50% performance-per-watt increase, and that includes both IPC and clock speeds.

2.5 GHz is a 33% higher clock speed than the 1887 MHz of the 40 CU RX 5700 XT.

And that is without any IPC differences. If RDNA2 brings 10% uplift over RDNA1 in IPC? 40 RDNA2 CUs at 1.8 GHz will perform like 44 RDNA1 CUs at 1.8 GHz. Add to that 33% more performance and you get... RTX 2080 Ti performance levels.

The thing is, what if that 50% performance per watt increase is simply IPC and clock speeds, per CU, as Paul from RedGamingTech says?

What if it is actually a 20% IPC increase and 33% higher clock speeds at the same time?

That 40 CU GPU will be 15% faster than the RTX 2080 Ti. Just like Coreteks and AdoredTV suggested Big Navi GPUs would perform.

My personal expectation is that RDNA2 brings 10% IPC uplift over RDNA1. But I will absolutely not be surprised if the cache patents and discussion that was already done will come to fruition, and that RDNA2 actually has that 20% IPC uplift over RDNA1.

Because since we are on a hype train, and we have seen the clock speeds of Navi 22, what the hell could not be possible, at this point?
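
The chain of multipliers in this argument can be written out explicitly; a rough sketch, where the clock and IPC figures are the speculative inputs discussed above, not confirmed specs:

```python
# Speculative inputs from the post above.
rx5700xt_clock_mhz = 1887       # 40 RDNA1 CUs
navi22_clock_mhz = 2500         # rumored from macOS data
clock_uplift = navi22_clock_mhz / rx5700xt_clock_mhz   # ~1.33

for ipc_uplift in (1.10, 1.20):  # the two IPC scenarios discussed
    total = clock_uplift * ipc_uplift
    print(f"IPC x{ipc_uplift:.2f}: ~x{total:.2f} over the RX 5700 XT")
# x1.10 -> ~x1.46 (roughly 2080 Ti level), x1.20 -> ~x1.59
```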
 

Justinus

Diamond Member
Oct 10, 2005
3,175
1,518
136
Can you clarify why you think a 2.5GHz 40CU design can be at that level? (I mean it would be great but is it likely?) Let's suppose that the 2.2GHz 80CU part is at 3090 level, 3090 is just +50% from a 2080Ti (at 4K, less in QHD), what aspects of the design will give it nearly +20%?
3090 =150
2.2GHz 80CU =150😍
2080Ti=100
2.5GHz 40CU = 85 with linear scaling (+20%=102)

We can go the other direction, with TechPowerUp's relative GPU performance chart showing the 2080ti being 34% faster than the 5700XT, a 40 CU RDNA1 card.

The 5700XT maintains clock speeds around 1900 MHz. Simply increasing the sustained clocks to 2500 MHz could potentially grant a 31% performance increase. Couple that with a minor (9%, I've seen flying around?) IPC increase and you could well have the 40CU RDNA2 performing at 5700XT + 34% levels.
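
The same arithmetic from the other direction, using the relative numbers in this post (the 9% IPC figure is only the rumor mentioned, not a confirmed spec):

```python
# TechPowerUp-style relative performance, 5700 XT = 1.00.
perf_2080ti = 1.34              # 2080 Ti ≈ 5700 XT + 34%
clock_uplift = 2500 / 1900      # sustained clocks alone, ~1.32
ipc_uplift = 1.09               # rumored RDNA2 IPC gain
projected = clock_uplift * ipc_uplift
print(f"projected: x{projected:.2f}")   # ~x1.43, at or past the 2080 Ti mark
```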
 

maddie

Diamond Member
Jul 18, 2010
4,749
4,691
136
-Must be something to it, I cannot imagine AMD would come up with a new caching protocol just to offset a tried and true large bus high bandwidth methodology.

I imagine the large cache must have a substantial positive offset per mm2 or per transistor over simply increasing bus width. Otherwise it's just reinventing the wheel...
What about the fact that the energy cost of far data transfers is much higher than that of the actual computation? Even going across a large die is about 10-100x as expensive as an L1/L2 hit, and going off die is another order of magnitude increase.

I would think that keeping the (data x distance) product much lower is essential for lowering power. Lowering the aggregate off-die data rate (GB/s) then allows a lower-bandwidth bus to be sufficient.

Sure, both methods can lead to the same performance result, but one allows a big reduction in power consumed per computation (FLOPs).
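
To put rough numbers on the data-movement argument (the per-bit energies below are illustrative orders of magnitude from the architecture literature, not AMD figures):

```python
# Illustrative energy per bit moved, in picojoules (order of magnitude only).
ENERGY_PJ_PER_BIT = {
    "local cache hit": 0.1,
    "across large die": 1.0,    # ~10x a local hit
    "off-die DRAM": 10.0,       # another order of magnitude again
}

def transfer_power_watts(bandwidth_gb_s: float, path: str) -> float:
    """Power required to sustain a given bandwidth over a given path."""
    bits_per_second = bandwidth_gb_s * 1e9 * 8   # GB/s -> bits/s
    return bits_per_second * ENERGY_PJ_PER_BIT[path] * 1e-12

# Serving 400 GB/s from on-die cache vs. off-die memory:
print(transfer_power_watts(400, "local cache hit"))  # ~0.32 W
print(transfer_power_watts(400, "off-die DRAM"))     # ~32 W
```

The gap is why a big on-die cache can trade area for a big cut in memory-subsystem power, even at the same delivered bandwidth.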
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
-Must be something to it, I cannot imagine AMD would come up with a new caching protocol just to offset a tried and true large bus high bandwidth methodology.

I imagine the large cache must have a substantial positive offset per mm2 or per transistor over simply increasing bus width. Otherwise it's just reinventing the wheel...
The video and research say 0.09mm2 of silicon per CU... it also regresses performance in some cases...