
Speculation: RDNA2 + CDNA Architectures thread


Some Observations

Cache

The size of the L2 slice is the same as on Navi10. Well, that is a bummer. I was hoping to see big jumps there for BW amplification.
I was also hoping for more passthrough modes from L0 to L2.

Multi-core Command Processor
It is the same dual GFX pipe as in Sienna. If this is what's implemented in the XSX, RDNA2 could possibly schedule and keep track of multiple shader wavefronts in flight. In addition, the ACEs can already dispatch compute shaders without going through the Command Processor.
This is something.

Unified Geometry Engine
I think they finally got NGG into the shape they envisioned. I heard devs saying the GE doubled the number of primitives culled per clock compared to N10.
 
Power scales roughly as the cube of clock speed, if I'm not mistaken.

130W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 263W
140W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 283W
150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 303W

Also, assuming the PS5 is similar in architecture:

130W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 164W
140W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 177W
150W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 189W

EDIT: Adding in some more power ranges.
I believe power rises as a square, not a cube.

130W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 240W
140W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 258W
150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 277W

140W * (80 CUs / 52 CUs) * (2.2 GHz / 1.825 GHz)^2 = 312W

Now slap in some HBM and we could see some really impressive power figures.
The issue, though, could be 7nm heat density, which is troublesome on Zen 2 as well.
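For anyone who wants to play with these numbers, here's the scaling model from the posts above spelled out as a quick sketch: linear in CU count, clock ratio raised to a chosen exponent. The 130-150W baselines are the rumored Series X GPU figures, not confirmed specs.

```python
def scaled_power(base_w, base_cus, base_ghz, cus, ghz, exponent):
    """Naive scaling model: power grows linearly with CU count and
    as the clock ratio raised to `exponent` (2 = square, 3 = cube)."""
    return base_w * (cus / base_cus) * (ghz / base_ghz) ** exponent

# Cubic model: 80 CUs at 2 GHz from a 52 CU / 1.825 GHz baseline
for base in (130, 140, 150):
    print(round(scaled_power(base, 52, 1.825, 80, 2.0, 3)))  # ~263, 283, 304 W

# Square model gives noticeably lower figures
for base in (130, 140, 150):
    print(round(scaled_power(base, 52, 1.825, 80, 2.0, 2)))  # ~240, 259, 277 W
```

The gap between the two exponents is roughly 25W at these clocks, which is why the choice of model matters for guessing Big Navi's board power.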
 
So a 72 CU GPU running at 2 GHz would have 18.43 TFLOPS of compute power. The PS5 shows us that RDNA2 can support clocks of up to 2.23 GHz, though thermal constraints might limit that. A 2.2 GHz 72 CU card would have around 20 TFLOPS of FP32 performance. By comparison, the 5700 XT has 9.754 TFLOPS of FP32 performance.

EDIT: Based on available information, it looks like Big Navi will be around 30% faster than a 2080 Ti.
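Those TFLOPS figures come from the usual peak-FP32 formula, assuming 64 shaders per CU (as on RDNA1) and 2 ops per shader per clock (FMA):

```python
def fp32_tflops(cus, clock_ghz, shaders_per_cu=64):
    """Peak FP32 = CUs x shaders per CU x 2 ops per clock (FMA) x clock."""
    return cus * shaders_per_cu * 2 * clock_ghz / 1000.0

print(fp32_tflops(72, 2.0))    # 18.432
print(fp32_tflops(72, 2.2))    # ~20.3
print(fp32_tflops(40, 1.905))  # 5700 XT at its 1905 MHz boost clock: ~9.75
```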
 
P=CV²f
You need more voltage for higher frequencies, so if voltage rises linearly with frequency then you end up with a cubic relation for power; the square from increased voltage and power dissipated per cycle, and another from the increased frequency itself.
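To make that concrete, a quick numeric check under the simplifying assumption that voltage tracks frequency linearly (real V/f curves are steeper near the top of the range):

```python
def dynamic_power(c, v, f):
    # Classic CMOS dynamic-power model: P = C * V^2 * f
    return c * v * v * f

base = dynamic_power(1.0, 1.0, 1.0)

# +10% frequency, with voltage tracking frequency linearly:
scaled = dynamic_power(1.0, 1.1, 1.1)
print(scaled / base)  # ~1.331, i.e. the cube of 1.1

# The same bump with voltage held constant is only linear:
print(dynamic_power(1.0, 1.0, 1.1) / base)  # ~1.1
```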
 
For every 10% clock frequency increase, power rises by 23%.

Also, power does not scale linearly with CU count. A GPU with 50% more CUs, clocked at the same frequency, may draw only 30% more power.
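For what it's worth, 23% more power per 10% more clock implies an effective scaling exponent between the square and the cube, which is roughly what real V/f curves give you:

```python
import math

# Exponent n solving 1.10**n == 1.23:
n = math.log(1.23) / math.log(1.10)
print(round(n, 2))  # ~2.17, between square (2) and cube (3)
```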

Big Navi power targets are 250-275W at this very moment. And I'm confident about this last piece of information.
 
So a 72 CU GPU running at 2 GHz would have 18.43 TFLOPS of compute power. The PS5 shows us that RDNA2 can support clocks of up to 2.23 GHz, though thermal constraints might limit that. A 2.2 GHz 72 CU card would have around 20 TFLOPS of FP32 performance. By comparison, the 5700 XT has 9.754 TFLOPS of FP32 performance.

EDIT: Based on available information, it looks like Big Navi will be around 30% faster than a 2080 Ti.
Navi21 has 80 CUs, why are you using 72 CUs? Don’t tell me it’s because a YouTuber said it...
 
Yields? Also last I checked, AMD hasn’t released the official specs.

EDIT: It has nothing to do with YouTube. The only channel that isn’t complete garbage there is GN.
 

Also, power constraints. AMD could definitely sell an 80 CU part, but at lower clocks. 80 CUs at 2.2 GHz would still consume more than 400W, even assuming a 50% perf-per-watt improvement over RDNA1.
 

You can’t compare console power consumption. The console chip will always be more efficient than a GPU because some components are shared, such as memory.

EDIT: That was meant as a general comment for those who are attempting to estimate power consumption.
 

130 to 140W of power for the GPU portion.

Raghu78 was right 😉.

I had estimated the Series X GPU with 16GB GDDR6 at 140-150W. In reality it's slightly better: Series X GPU power draw is the same as the Xbox One X's. But since the Series X SoC has 8 Zen 2 cores at 3.66 GHz with SMT (3.8 GHz with SMT off), the CPU portion will draw roughly 55W. The entire SoC will draw around 200W.

Based on this data I am even more confident that Navi 21 with 80 CUs can deliver 21 TF at 275W. Nvidia is going to require >350W to deliver the same performance if the current rumours are true.
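A quick sanity check on those figures (80 CUs x 64 shaders x 2 ops per clock assumed; the 21 TF and 275W numbers are rumored targets, not confirmed specs):

```python
def fp32_tflops(cus, clock_ghz, shaders_per_cu=64):
    """Peak FP32 = CUs x shaders x 2 ops per clock x clock."""
    return cus * shaders_per_cu * 2 * clock_ghz / 1000.0

# Clock (GHz) needed for an 80 CU part to hit 21 TFLOPS:
clock = 21.0 / (80 * 64 * 2 / 1000.0)
print(round(clock, 2))  # ~2.05 GHz, well below the PS5's 2.23 GHz

# Implied efficiency at a 275 W board power:
gflops_per_watt = 21.0 * 1000 / 275
print(round(gflops_per_watt))  # ~76 GFLOPS per watt
```

So the rumored 21 TF at 275W only requires clocks the consoles have already demonstrated, which is why the claim is at least plausible.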
 

While P=CV²f is totally correct, V and f are not proportional. In essence, you cannot simply assume a cubic relation.
 
Hmmmm. RDNA2's implementation of RT shows that RT operations share hardware with texture operations, meaning you can do one or the other but not both at the same time. Won't that impact overall RT performance?
Why can't they be done at the same time?
 
Just to put in perspective the improvement in GPU perf from Xbox Series X vs Xbox One X.

Since RDNA2 is expected to deliver higher perf/clock, if we assume RDNA2 delivers 1.15x perf/clock vs RDNA:

12 RDNA2 TFLOPS = 13.8 RDNA TFLOPS

According to the latest pcgameshardware review, a 13.4 TF Radeon VII (avg clock of 1750 MHz) is 3% faster than a 9.2 TF Radeon RX 5700 XT (avg clock of 1800 MHz).



That's roughly 3% better perf for 45% more raw FLOPS for the Radeon VII. 1.45/1.03 = 1.41, so it would be reasonable to say 1 RDNA FLOP ≈ 1.4 Vega GCN FLOPs.

Multiplying by this 1.4x factor we get:

12 RDNA2 TFLOPS = 13.8 × 1.4 = 19.32 GCN TFLOPS

That's 3.22x the perf of the Xbox One X GPU in the same power envelope, delivered in 3 years (Xbox One X - Nov 2017, Xbox Series X - Nov 2020). BTW, the Xbox One X is a mid-gen console refresh; if we take the OG Xbox One at 1.3 TF, the improvement is roughly 15x. This next-gen console is going to challenge PC gaming like no previous console gen, as it has a desktop-class CPU with 8 Zen 2 cores at 3.66 GHz. With a mid-gen console refresh in 2023 or 2024, the mainstream PC GPU in the $350-$400 price range will continue to be challenged until at least 2025. PC gaming will never be the same again.
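Spelling out the chain of conversions above (the 1.15x and 1.4x factors are this post's assumptions, derived from the Radeon VII vs 5700 XT comparison):

```python
RDNA2_PER_RDNA = 1.15  # assumed RDNA2 perf/clock uplift over RDNA
RDNA_PER_GCN = 1.40    # derived above from Radeon VII vs 5700 XT

series_x_tf = 12.0  # Xbox Series X, RDNA2 TFLOPS
one_x_tf = 6.0      # Xbox One X, GCN TFLOPS
og_one_tf = 1.3     # original Xbox One, GCN TFLOPS

gcn_equiv = series_x_tf * RDNA2_PER_RDNA * RDNA_PER_GCN
print(round(gcn_equiv, 2))             # 19.32 GCN-equivalent TFLOPS
print(round(gcn_equiv / one_x_tf, 2))  # 3.22x the One X
print(round(gcn_equiv / og_one_tf))    # ~15x the OG Xbox One
```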
 
RDNA2's implementation of RT shows that RT operations share hardware with texture operations, meaning you can do one or the other but not both at the same time. Won't that impact overall RT performance?
Yes, it will impact it; RT on RDNA 2 will be slower than RT on Turing.
 
Based on?

You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.

I would suggest waiting until the final product before making claims on RTRT performance. It could be better, it could be worse. Who knows? What we do know is: it's a noticeably different method from Turing's, so we can't judge them just yet.
 
Nope, DirectML is just an API. AMD will need to provide a model that competes with DLSS 2 on top of it (a tall order), and then provide the hardware (tensor units) to let it run fast enough on their GPUs. That means AMD won't have any DLSS 2 competitor any time this gen, as they lack both the model and the tensor units with RDNA 2.

What makes you think they haven't been working on a DLSS competitor for a while?

As for fast enough hardware, again, you're assuming that tensor cores are 100% necessary, but ever since Navi14 (so Navi12 as well) rapid packed math has been extended to INT4 and INT8 as well (packing 8 and 4 at the same time respectively).

Depending on the model, this could be more than enough processing power to be able to perform the algorithm on RDNA2. Remember how DLSS1.9 worked on just shaders? Sure, the algorithm wasn't as potent as DLSS2.0, but to my knowledge Turing doesn't support INT4 or INT8 packing on shaders anyway.

It'll depend on the game and how computationally expensive DLSS2.0-tier algorithms are; they're a step in the middle of the pipeline, after all. The key will be how much upscaling they can do at minimal cost to the time to render the frame.
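To put rough numbers on the packed-math point: assuming INT8 runs at 4x and INT4 at 8x the FP32 FMA rate (assumed factors based on how RDNA's mixed-precision dot products are usually described, not confirmed RDNA2 specs), a hypothetical ~20 TFLOPS part would have plenty of integer throughput for inference-style upscaling work:

```python
def packed_throughput(fp32_tflops, dtype):
    """Theoretical TOPS, assuming packed INT8 runs at 4x and INT4 at
    8x the FP32 FMA rate (assumed factors, not confirmed specs)."""
    factors = {"fp32": 1, "fp16": 2, "int8": 4, "int4": 8}
    return fp32_tflops * factors[dtype]

# A hypothetical ~20 TFLOPS FP32 RDNA2 part:
print(packed_throughput(20.0, "int8"))  # 80.0 TOPS
print(packed_throughput(20.0, "int4"))  # 160.0 TOPS
```

Whether that raw throughput is actually enough depends entirely on the model's cost per frame, which is the open question.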
 
You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.
Your post is nothing more than damage control at this point. A shared unit can never beat a dedicated unit; worse yet, RDNA2 does BVH traversal on the shaders as well (Turing does it on the RT cores), so RT acceleration is shared on two levels with RDNA2, not one.

And no, texture units are not over-budgeted on modern GPUs; they are just the right amount for regular texturing, 16x AF filtering, and texture-heavy shaders and effects.




As for fast enough hardware, again, you're assuming that tensor cores are 100% necessary,
More damage control. Tensor cores are not necessary, but they are fast enough to offset any performance loss from using ML to upscale the image; without tensors the loss would be bigger.
 
Based on?

You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.

I would suggest waiting until the final product before making claims on RTRT performance. It could be better, it could be worse. Who knows? What we do know is: it's a noticeably different method from Turing's, so we can't judge them just yet.
Indeed, plus Turing's implementation has its own deficiencies that RDNA2's doesn't.
 