
Speculation: RDNA2 + CDNA Architectures thread


Some Observations

Cache

The size of the L2 slice is the same as on Navi10. Well, that is a bummer. I was hoping to see big jumps there for BW amplification.
I was also hoping for more passthrough modes from L0 to L2.

Multi-core Command Processor
It is the same dual GFX pipe as in Sienna. If this is what's implemented in the XSX, RDNA2 could possibly schedule and keep track of multiple shader wavefronts in flight. In addition, the ACEs can already dispatch compute shaders without going through the Command Processor.
This is something.

Unified Geometry Engine
I think they finally got NGG into the shape they envisioned. I heard devs saying the GE doubled the number of primitives culled per clock compared to N10.
 
Power scales roughly as the cube of clock speed, if I'm not mistaken.

130W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 263W
140W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 283W
150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^3 = 303W

Also, assuming the PS5 is similar in architecture:

130W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 164W
140W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 177W
150W * (36 CUs / 52 CUs) * (2.23 GHz / 1.825 GHz)^3 = 189W

EDIT: Adding in some more power ranges.
I believe power rises as a square, not a cube.

130W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 240W
140W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 258W
150W * (80 CUs / 52 CUs) * (2 GHz / 1.825 GHz)^2 = 277W

140W * (80 CUs / 52 CUs) * (2.2 GHz / 1.825 GHz)^2 = 312W

Now slap in some HBM and we could see some really impressive power figures.
The issue, though, could be 7nm heat density, which is troublesome on Zen 2 as well.
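For anyone who wants to play with these numbers, here's the scaling model from the posts above spelled out as a quick sketch: linear in CU count, clock ratio raised to a chosen exponent. The 130-150W baselines are the rumored Series X GPU figures, not confirmed specs.

```python
def scaled_power(base_w, base_cus, base_ghz, cus, ghz, exponent):
    """Naive scaling model: power grows linearly with CU count and
    as the clock ratio raised to `exponent` (2 = square, 3 = cube)."""
    return base_w * (cus / base_cus) * (ghz / base_ghz) ** exponent

# Cubic model: 80 CUs at 2 GHz from a 52 CU / 1.825 GHz baseline
for base in (130, 140, 150):
    print(round(scaled_power(base, 52, 1.825, 80, 2.0, 3)))  # ~263, 283, 304 W

# Square model gives noticeably lower figures
for base in (130, 140, 150):
    print(round(scaled_power(base, 52, 1.825, 80, 2.0, 2)))  # ~240, 259, 277 W
```

The gap between the two exponents is roughly 25W at these clocks, which is why the choice of model matters for guessing Big Navi's board power.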
 
So a 72 CU GPU running at 2 GHz would have 18.43 TFLOPS of compute power. The PS5 shows us that RDNA2 can support clocks of up to 2.23 GHz, though thermal constraints might limit that. A 2.2 GHz 72 CU card would have around 20 TFLOPS of FP32 performance. By comparison, the 5700 XT has 9.754 TFLOPS of FP32 performance.

EDIT: Based on available information, it looks like Big Navi will be around 30% faster than a 2080 Ti.
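Those TFLOPS figures come from the usual peak-FP32 formula, assuming 64 shaders per CU (as on RDNA1) and 2 ops per shader per clock (FMA):

```python
def fp32_tflops(cus, clock_ghz, shaders_per_cu=64):
    """Peak FP32 = CUs x shaders per CU x 2 ops per clock (FMA) x clock."""
    return cus * shaders_per_cu * 2 * clock_ghz / 1000.0

print(fp32_tflops(72, 2.0))    # 18.432
print(fp32_tflops(72, 2.2))    # ~20.3
print(fp32_tflops(40, 1.905))  # 5700 XT at its 1905 MHz boost clock: ~9.75
```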
 
P=CV²f
You need more voltage for higher frequencies, so if voltage rises linearly with frequency then you end up with a cubic relation for power; the square from increased voltage and power dissipated per cycle, and another from the increased frequency itself.
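To make that concrete, a quick numeric check under the simplifying assumption that voltage tracks frequency linearly (real V/f curves are steeper near the top of the range):

```python
def dynamic_power(c, v, f):
    # Classic CMOS dynamic-power model: P = C * V^2 * f
    return c * v * v * f

base = dynamic_power(1.0, 1.0, 1.0)

# +10% frequency, with voltage tracking frequency linearly:
scaled = dynamic_power(1.0, 1.1, 1.1)
print(scaled / base)  # ~1.331, i.e. the cube of 1.1

# The same bump with voltage held constant is only linear:
print(dynamic_power(1.0, 1.0, 1.1) / base)  # ~1.1
```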
 
For every 10% clock frequency increase, power rises by 23%.

Also, power does not scale linearly with CU count. A GPU with 50% more CUs, clocked at the same frequency, may draw only 30% more power.
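For what it's worth, 23% more power per 10% more clock implies an effective scaling exponent between the square and the cube, which is roughly what real V/f curves give you:

```python
import math

# Exponent n solving 1.10**n == 1.23:
n = math.log(1.23) / math.log(1.10)
print(round(n, 2))  # ~2.17, between square (2) and cube (3)
```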

Big Navi power targets are 250-275W at this very moment. And I'm confident about this last piece of information.
 
So a 72 CU GPU running at 2 GHz would have 18.43 TFLOPS of compute power. The PS5 shows us that RDNA2 can support clocks of up to 2.23 GHz, though thermal constraints might limit that. A 2.2 GHz 72 CU card would have around 20 TFLOPS of FP32 performance. By comparison, the 5700 XT has 9.754 TFLOPS of FP32 performance.

EDIT: Based on available information, it looks like Big Navi will be around 30% faster than a 2080 Ti.
Navi21 has 80 CUs, why are you using 72 CUs? Don’t tell me it’s because a YouTuber said it...
 
Yields? Also last I checked, AMD hasn’t released the official specs.

EDIT: It has nothing to do with YouTube. The only channel that isn’t complete garbage there is GN.
 

Also, power constraints. AMD could definitely sell an 80 CU part, but at lower clocks. 80 CUs at 2.2 GHz would still consume more than 400W, even assuming a 50% perf-per-watt improvement over RDNA1.
 

You can’t compare console power consumption. The console chip will always be more efficient than a GPU because some components are shared, such as memory.

EDIT: That was meant as a general comment for those who are attempting to estimate power consumption.
 

130 to 140W of power for the GPU portion.

Raghu78 was right 😉.

I had estimated the Series X GPU with 16GB GDDR6 at 140-150W. In reality it's slightly better: Series X GPU power draw is the same as the Xbox One X's. But since the Series X SoC has 8 Zen 2 cores at 3.66 GHz with SMT (3.8 GHz with SMT off), the CPU portion will draw roughly 55W. The entire SoC will draw around 200W.

Based on this data I am even more confident that Navi 21 with 80 CUs can deliver 21 TF at 275W. Nvidia is going to require >350W to deliver the same performance if the current rumours are true.
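A quick sanity check on those figures (80 CUs x 64 shaders x 2 ops per clock assumed; the 21 TF and 275W numbers are rumored targets, not confirmed specs):

```python
def fp32_tflops(cus, clock_ghz, shaders_per_cu=64):
    """Peak FP32 = CUs x shaders x 2 ops per clock x clock."""
    return cus * shaders_per_cu * 2 * clock_ghz / 1000.0

# Clock (GHz) needed for an 80 CU part to hit 21 TFLOPS:
clock = 21.0 / (80 * 64 * 2 / 1000.0)
print(round(clock, 2))  # ~2.05 GHz, well below the PS5's 2.23 GHz

# Implied efficiency at a 275 W board power:
gflops_per_watt = 21.0 * 1000 / 275
print(round(gflops_per_watt))  # ~76 GFLOPS per watt
```

So the rumored 21 TF at 275W only requires clocks the consoles have already demonstrated, which is why the claim is at least plausible.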
 

While P=CV²f is totally correct, V and f are not proportional. In essence, you cannot simply assume a cubic relation.
 
Hmmmm. RDNA2's implementation of RT shows that RT operations share hardware with texture operations, meaning you can do one or the other but not both at the same time. Won't that impact overall RT performance?
Why can't they be done at the same time?
 
Just to put in perspective the improvement in GPU perf from Xbox Series X vs Xbox One X.

Since RDNA2 is expected to deliver higher perf/clock, if we assume RDNA2 delivers 1.15x perf/clock vs RDNA:

12 RDNA2 TFLOPS = 13.8 RDNA TFLOPS

According to the latest pcgameshardware review, a 13.4 TF Radeon VII (avg clock of 1750 MHz) is 3% faster than a 9.2 TF Radeon RX 5700 XT (avg clock of 1800 MHz).



That's roughly 3% better perf for 45% more raw FLOPS for the Radeon VII. 1.45/1.03 = 1.41, so it would be reasonable to say 1 RDNA FLOP ≈ 1.4 Vega GCN FLOPs.

Multiplying by this 1.4x factor we get:

12 RDNA2 TFLOPS = 13.8 × 1.4 = 19.32 GCN TFLOPS

That's 3.22x the perf of the Xbox One X GPU in the same power envelope, delivered in 3 years (Xbox One X - Nov 2017, Xbox Series X - Nov 2020). BTW, the Xbox One X is a mid-gen console refresh; if we take the OG Xbox One at 1.3 TF, the improvement is roughly 15x. This next-gen console is going to challenge PC gaming like no previous console gen, as it has a desktop-class CPU with 8 Zen 2 cores at 3.66 GHz. With a mid-gen console refresh in 2023 or 2024, the mainstream PC GPU in the $350-$400 price range will continue to be challenged until at least 2025. PC gaming will never be the same again.
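Spelling out the chain of conversions above (the 1.15x and 1.4x factors are this post's assumptions, derived from the Radeon VII vs 5700 XT comparison):

```python
RDNA2_PER_RDNA = 1.15  # assumed RDNA2 perf/clock uplift over RDNA
RDNA_PER_GCN = 1.40    # derived above from Radeon VII vs 5700 XT

series_x_tf = 12.0  # Xbox Series X, RDNA2 TFLOPS
one_x_tf = 6.0      # Xbox One X, GCN TFLOPS
og_one_tf = 1.3     # original Xbox One, GCN TFLOPS

gcn_equiv = series_x_tf * RDNA2_PER_RDNA * RDNA_PER_GCN
print(round(gcn_equiv, 2))             # 19.32 GCN-equivalent TFLOPS
print(round(gcn_equiv / one_x_tf, 2))  # 3.22x the One X
print(round(gcn_equiv / og_one_tf))    # ~15x the OG Xbox One
```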
 
RDNA2's implementation of RT shows that RT operations share hardware with texture operations, meaning you can do one or the other but not both at the same time. Won't that impact overall RT performance?
Yes, it will impact it; RT on RDNA 2 will be slower than RT on Turing.
 
Based on?

You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.

I would suggest waiting until the final product before making claims on RTRT performance. It could be better, it could be worse. Who knows? What we do know is: it's a noticeably different method from Turing's, so we can't judge them just yet.
 
Nope, DirectML is just an API. AMD will need to provide a model that competes with DLSS 2 on top of it (a tall order), and then provide the hardware (tensor units) to let it run fast enough on their GPUs. That means AMD won't have any DLSS 2 competitor any time this gen, as they lack both the model and the tensor units with RDNA 2.

What makes you think they haven't been working on a DLSS competitor for a while?

As for fast enough hardware, again, you're assuming that tensor cores are 100% necessary, but ever since Navi14 (so Navi12 as well) rapid packed math has been extended to INT4 and INT8 as well (packing 8 and 4 at the same time respectively).

Depending on the model, this could be more than enough processing power to be able to perform the algorithm on RDNA2. Remember how DLSS1.9 worked on just shaders? Sure, the algorithm wasn't as potent as DLSS2.0, but to my knowledge Turing doesn't support INT4 or INT8 packing on shaders anyway.

It'll depend on the game and how computationally expensive DLSS2.0-tier algorithms are; they're a step in the middle of the pipeline, after all. The key will be how much upscaling they can do at minimal cost to the time to render the frame.
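To put rough numbers on the packed-math point: assuming INT8 runs at 4x and INT4 at 8x the FP32 FMA rate (assumed factors based on how RDNA's mixed-precision dot products are usually described, not confirmed RDNA2 specs), a hypothetical ~20 TFLOPS part would have plenty of integer throughput for inference-style upscaling work:

```python
def packed_throughput(fp32_tflops, dtype):
    """Theoretical TOPS, assuming packed INT8 runs at 4x and INT4 at
    8x the FP32 FMA rate (assumed factors, not confirmed specs)."""
    factors = {"fp32": 1, "fp16": 2, "int8": 4, "int4": 8}
    return fp32_tflops * factors[dtype]

# A hypothetical ~20 TFLOPS FP32 RDNA2 part:
print(packed_throughput(20.0, "int8"))  # 80.0 TOPS
print(packed_throughput(20.0, "int4"))  # 160.0 TOPS
```

Whether that raw throughput is actually enough depends entirely on the model's cost per frame, which is the open question.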
 
You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.
Your post is nothing more than damage control at this point. A shared unit can never beat a dedicated unit; worse yet, RDNA2 does BVH traversal on the shaders as well (Turing does it on the RT cores), so RT acceleration is shared on two levels with RDNA2, not one.

And no, texture units are not over-budgeted on modern GPUs; they are just the right amount for regular texturing, 16x AF filtering, and texture-heavy shaders and effects.




As for fast enough hardware, again, you're assuming that tensor cores are 100% necessary,
More damage control. Tensor cores are not necessary, but they are fast enough to offset any performance loss from using ML to upscale the image; without tensors the loss would be bigger.
 
Based on?

You do realise modern GPUs heavily over-budget on texture units, right? Re-allocating some of those TMUs to work on RTRT shouldn't cause a noticeable bottleneck in practice.

I would suggest waiting until the final product before making claims on RTRT performance. It could be better, it could be worse. Who knows? What we do know is: it's a noticeably different method from Turing's, so we can't judge them just yet.
Indeed, plus Turing's implementation has its own deficiencies that RDNA2's doesn't.
 