Question Speculation: RDNA2 + CDNA Architectures thread

uzzi38 · Apr 28, 2020

All die sizes are within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to saying 505mm^2 for Navi21, which other people have backed up. Even still though, take the following with a pich of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html

andermans · Mar 18, 2021

Panino Manino said:
IPC comparison between 40CU: https://www.computerbase.de/2021-03/amd-radeon-rdna2-rdna-gcn-ipc-cu-vergleich/

Somehow, RDNA2 is behind RDNA1 in most cases.
It's just immature drivers?

I think the expectation is mostly that GPUs don't necessarily increase IPC. Consider that IPC here is not really instructions per clock for the whole device but instructions per clock per core. So suddenly what you call a core matters. On CPUs a prime way to increase IPC is to add execution units, but on GPUs it is often easier to just go to more cores so the IPC for a given core doesn't increase that much.

Changes that could increase performance (I'd rather not use instructions because with all the fixed function hardware it is a really bad indicator) per clock are:

1) New accelerators like raytracing. The problem with measuring this is that nobody seems to support the software feature you'd before hardware support.
2) Non shader features like variable rate shading or mesh shading
3) Improvements in the memory hierarchy
4) Maybe some improvements in say branch prediction.

I don't think AMD talked about any things like 4, and looks like 3 has been traded off against a smaller bus with so that is a gain some lose some situation. Leaves 1 and 2 which are kinda in a situation like AVX512 tends to be on the CPU side: if you have programs that make use of it it is great, otherwise kinda useless.

Of course sometimes there is a real performance per clock per core increase. RDNA1 was actually a great example because AMD was able to remove a lot of the bottlenecks to keep the shader units busy.

GodisanAtheist · Mar 18, 2021

andermans said:
I think the expectation is mostly that GPUs don't necessarily increase IPC. Consider that IPC here is not really instructions per clock for the whole device but instructions per clock per core. So suddenly what you call a core matters. On CPUs a prime way to increase IPC is to add execution units, but on GPUs it is often easier to just go to more cores so the IPC for a given core doesn't increase that much.

Changes that could increase performance (I'd rather not use instructions because with all the fixed function hardware it is a really bad indicator) per clock are:

1) New accelerators like raytracing. The problem with measuring this is that nobody seems to support the software feature you'd before hardware support.
2) Non shader features like variable rate shading or mesh shading
3) Improvements in the memory hierarchy
4) Maybe some improvements in say branch prediction.

I don't think AMD talked about any things like 4, and looks like 3 has been traded off against a smaller bus with so that is a gain some lose some situation. Leaves 1 and 2 which are kinda in a situation like AVX512 tends to be on the CPU side: if you have programs that make use of it it is great, otherwise kinda useless.

Of course sometimes there is a real performance per clock per core increase. RDNA1 was actually a great example because AMD was able to remove a lot of the bottlenecks to keep the shader units busy.

- Excellent points.

Also worth noting, that RDNA2 clocks *really high*. Its entirely possible that AMD engineers said "ok, we can give up 5% 'IPC', but the trade off will be cranking clocks up 35-40%, ultimately allowing much more real work to get done".

I'm no computer engineer but if life has taught me anything, its that everything is just a series of trade-offs and AMD engineers made several to ultimately get to their 30% performance increase over their prior 40 CU design on the same process at the same power...

Edit: Also worth noting that RDNA 1 basically did not see any real performance gains from core OCs (and limited gains even from mem OCs), so clearly there were some major changes under the hood to not only allow RDNA2 to clock as high as it does, but also realize performance gains from those high clocks.

Gideon · Mar 18, 2021

GodisanAtheist said:
Also worth noting, that RDNA2 clocks *really high*. Its entirely possible that AMD engineers said "ok, we can give up 5% 'IPC', but the trade off will be cranking clocks up 35-40%, ultimately allowing much more real work to get done".

Yeah IPC comparisons with vastly different clocks aren't the best idea. It's natural that performance does not scale linearly with clocks, if you compare 10700K to 6700 (non-k) skylake I'm pretty sure the latter will have "better ipc" despite it being the same microarchitecture.

Then again there are some differences in RDNA2 that hurt performance on 40CU designs that were made to make the 80CU design possible (with good die-area). One is that 6700XT has 2x less triangle discard capability than 5700XT

GodisanAtheist · Mar 18, 2021

Gideon said:
Yeah IPC comparisons with vastly different clocks aren't the best idea. It's natural that performance does not scale linearly with clocks, if you compare 10700K to 6700 (non-k) skylake I'm pretty sure the latter will have "better ipc" despite it being the same microarchitecture.

Then again there are some differences in RDNA2 that hurt performance on 40CU designs that were made to make the 80CU design possible (with good die-area). One is that 6700XT has 2x less triangle discard capability than 5700XT

- Indeed.

In that same link, the site starts off with RDNA/RDNA2/GCN alll normalized at 1000Mhz clock and while GCN is left in the dust, RDNA 1 squeeks ahead of RDNA 2.

Further down they normalize at 2000mhz, and RDNA 2 then takes the 'IPC' lead (GCN cannot clock that high and drops out).

Edit: Looks like the later tests are at 1440P as well.

Glo. · Mar 18, 2021

GodisanAtheist said:
- Indeed.

In that same link, the site starts off with RDNA/RDNA2/GCN alll normalized at 1000Mhz clock and while GCN is left in the dust, RDNA 1 squeeks ahead of RDNA 2.

Further down they normalize at 2000mhz, and RDNA 2 then takes the 'IPC' lead (GCN cannot clock that high and drops out).

Edit: Looks like the later tests are at 1440P as well.

And in 1440p it appears that RDNA2 is faster in "IPC" than RDNA1 was.

Unfathomable.

Mopetar · Mar 19, 2021

Kind of an odd result. I'm assuming that the publishers having noticed the same would have rerun the tests just to make sure it wasn't a fluke. Now the question is whether it's due to the resolution change or the clock speed increase and the latter doesn't make much sense so I'd assume it comes down to something that RDNA2 does better or improved upon.

gdansk · Mar 19, 2021

Maybe an over simplification on my part but I was under the impression IPC was far less important on a GPU. They work on problems for which "more cores" is a working strategy. In that environment, instructions per watt is the prime target in order to allow larger designs.

Did they show the power consumption of each device at the given clock rates? Regardless, it appears the cache arrangement allows them to get similar performance with lower memory bandwidth.

moinmoin · Mar 19, 2021

IPC becomes very important once two other parameters are exhausted: frequency and amount of cores (which for GPUs are essentially limitless thanks to embarrassingly parallel computation).

With RDNA the narrative set by AMD is that its Ryzen team looked at the Radeon chips and higher frequency was achieved through simplification of the CU cores. When one thinks about it compared to CPUs GPU frequencies have been very low. So frequency improvements achieved in the last several gens have been low hanging fruits (highest (boost) clock for each gen: RX200/300: 1050, RX400: 1266, Vega: 1677, Radeon VII: 1750, RDNA: 1980, RDNA2: 2581 MHz).

With RDNA2 AMD doubled the max possible amount of CUs from 40 to 80 and, if you look at the second page of the computerbase article, achieved a rather stable performance per core scalability of ~75% regardless of if going from 40 to 60, 72 or 80 CUs. This in particular is very promising for further CU core amount expansions using MCM.

lightmanek · Mar 19, 2021

It's very hard to measure IPC of a GPU at vastly different clocks as we have no control of memory timings and internal frequency dividers. Very likely that some components designed to scale with high clock are simply tanking RDNA 2 performance at much lower clocks, where GCN or RDNA 1 run these components at lower latencies or have more of them (see Gideon's post above about discard units). That's the reason tests were done at two fixed clock settings and that's mainly why we see lead changing from RDNA 1 to RDNA 2.

leoneazzurro · Mar 19, 2021

T

lightmanek said:
It's very hard to measure IPC of a GPU at vastly different clocks as we have no control of memory timings and internal frequency dividers. Very likely that some components designed to scale with high clock are simply tanking RDNA 2 performance at much lower clocks, where GCN or RDNA 1 run these components at lower latencies or have more of them (see Gideon's post above about discard units). That's the reason tests were done at two fixed clock settings and that's mainly why we see lead changing from RDNA 1 to RDNA 2.

A simple example: Infinity cache. Its efficiency it's likely to change a lot with GPU clocks.

uzzi38 · Mar 19, 2021

leoneazzurro said:
T

A simple example: Infinity cache. Its efficiency it's likely to change a lot with GPU clocks.

Infinity Cache has it's own clock domain, and is locked at 1.94GHz iirc.

leoneazzurro · Mar 19, 2021

uzzi38 said:
Infinity Cache has it's own clock domain, and is locked at 1.94GHz iirc.

Even when you clock down the chip to 1 Ghz? Unlikely.

Zepp · Mar 19, 2021

GodisanAtheist said:
Further down they normalize at 2000mhz, and RDNA 2 then takes the 'IPC' lead (GCN cannot clock that high and drops out).

The 7nm Vegas can.

Mopetar · Mar 19, 2021

Zepp said:
The 7nm Vegas can.

There weren't any 40 CU Vega parts though. The only one that lines up with a modern GPU is the Radeon VII which has the same 60 CU as the 6800.

I don't know about how it would stack up clock for clock, but the Radeon VII is regularly matched or even beaten by the 40 CU 5700 XT in benchmarks.

andermans · Mar 19, 2021

Mopetar said:
There weren't any 40 CU Vega parts though. The only one that lines up with a modern GPU is the Radeon VII which has the same 60 CU as the 6800.

I don't know about how it would stack up clock for clock, but the Radeon VII is regularly matched or even beaten by the 40 CU 5700 XT in benchmarks.

I do think the Radeon VII had some bottlenecks that made it really hard to use that many CUs though. Would be interesting to disable some CUs per shader engine to get 40 CUs total and then compare.

TESKATLIPOKA · Mar 19, 2021

Panino Manino said:
IPC comparison between 40CU: https://www.computerbase.de/2021-03/amd-radeon-rdna2-rdna-gcn-ipc-cu-vergleich/

Somehow, RDNA2 is behind RDNA1 in most cases.
It's just immature drivers?

RDNA2 clocks significantly higher than RDNA1, so It could be a simple trade off. You sacrifice a bit of IPC for a lot higher frequency.

Glo. · Mar 19, 2021

HUB appears to be confirming tests from Computerbase on IPC.

TESKATLIPOKA · Mar 19, 2021

Twitter

Dimgray Cavefish GPU
<For Premium 1080p Gaming>
~1440p < RTX 3060 < 1080p
~April
~CNY 2,499
~32 Compute Units
~About 236 mm2
~64MB Infinity Cache
~128-bit 16Gbps GDDR6 with 8GB VRAM

I am quite skeptical about having 64MB IC within 236mm2 when N22 with additional 8CU, 32MB IC and 64bit GDDR6 is ~100mm2 bigger.
Price is ~384 USD, which is a lot.

scineram · Mar 19, 2021

TESKATLIPOKA said:
Twitter

I am quite skeptical about having 64MB IC within 236mm2 when N22 with additional 8CU, 32MB IC and 64bit GDDR6 is ~100mm2 bigger.
Price is ~384 USD, which is a lot.

Well, we already know IC is 32MB.
June for Cézanne is just about time already.

PhoBoChai · Mar 19, 2021

uzzi38 said:
Infinity Cache has it's own clock domain, and is locked at 1.94GHz iirc.

There is nothing to suggest this. AMD's slides refer to the 1.94ghz as the fabric that ties the cache and GPU together, not the actual cache operating freq.

AMD claims they used Ryzen L3 SRAM libraries for RDNA2 on 7nm, and we know Ryzen's L3 has no problems running >4.5ghz. There should be no issues at all tying infinity cache clks to the engine clock. While the fabric clk is linked to the memory controller.

As for the IF clks, its not static, it has two states, 1.4ghz power saving and 1.94ghz boost.

DiogoDX · Mar 20, 2021

Glo. said:
HUB appears to be confirming tests from Computerbase on IPC.

The same result as Pascal vs Maxwell. A small IPC loss for a massive clock increase.

And seems to be worth as was with the Pascal arch.

Timorous · Mar 20, 2021

TESKATLIPOKA said:
Twitter

I am quite skeptical about having 64MB IC within 236mm2 when N22 with additional 8CU, 32MB IC and 64bit GDDR6 is ~100mm2 bigger.
Price is ~384 USD, which is a lot.

It is less than half of Navi 21 so 230-240 seems doable at a similar density.

KOMACHI_ENSAKA · Mar 20, 2021

Does anyone correctly understand the difference between Navi 22 XT (AIB/MBA) and Navi 22 XTLH (AIB)? I'm still ambiguous

Leeea · Mar 20, 2021

KOMACHI_ENSAKA said:
Does anyone correctly understand the difference between Navi 22 XT (AIB/MBA) and Navi 22 XTLH (AIB)? I'm still ambiguous

I have no clue. I will also eye anyone on this forum who claims to know suspiciously.

TESKATLIPOKA · Apr 11, 2021

It's almost half of April and still no news about N23's release date. Does anyone have any new info?

Question Speculation: RDNA2 + CDNA Architectures thread

Platinum Member

Member

Diamond Member

Platinum Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Senior member

Golden Member

Platinum Member

Golden Member

Senior member

Diamond Member

Member

Platinum Member

Diamond Member

Platinum Member

Senior member

Member

Senior member

Golden Member

Junior Member

Diamond Member

Platinum Member