Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

Leeea · Jun 13, 2022

jpiniero said:
Ethereum still needs to fall further to get The Flood though.

Should not be too long though.

ETH hit $1200 today, and it is no longer cost effective for me to mine. Specifically, for me the USD cost of electricity exceeds the USD value mined.

Celebrations will be coming soon!

Ajay · Jun 13, 2022

Leeea said:
Should not be too long though.

ETH hit $1200 today, and it is no longer cost effective for me to mine. Specifically, for me the USD cost of electricity exceeds the USD value mined.

Celebrations will be coming soon!

Bulk miners have lower costs than most individuals, so it will need to go lower for the 'flood' to begin. Fingers crossed, I need a new GFX card.

maddie · Jun 13, 2022

Ajay said:
Bulk miners have lower costs than most individuals, so it will need to go lower for the 'flood' to begin. Fingers crossed, I need a new GFX card.

They might also have higher costs. Housing, labor, etc. The home miner assumes his time & space is worthless.

Ajay · Jun 14, 2022

maddie said:
They might also have higher costs. Housing, labor, etc. The home miner assumes his time & space is worthless.

The home miner only saves on labor, since they are 'donating' it to themselves. They are paying for everything else, even if they aren’t putting it on their books like a crypto-mining company. The home user has zero economies of scale.

beginner99 · Jun 14, 2022

maddie said:
They might also have higher costs. Housing, labor, etc. The home miner assumes his time & space is worthless.

True, and hence they actually need to sell the coins they mine. If you mine on the side with your gaming rig, you can always bet on prices going up again and wait until then to sell to make it profitable again. I mined some in 2015, I think. Was it profitable at that time? Not really. But boy, even with ETH down to $1k it paid for my next couple of GPUs.

maddie · Jun 14, 2022

Ajay said:
The home miner only saves on labor, since they are 'donating' it to themselves. They are paying for everything else, even if they aren’t putting it on their books like a crypto-mining company. The home user has zero economies of scale.

He doesn't save by "donating" labor. It's an exercise in self-delusion. The concept of opportunity cost is helpful here. I find that few enthusiasts consider this when calculating the true cost of investments.

Ajay · Jun 14, 2022

maddie said:
He doesn't save by "donating" labor. It's an exercise in self delusion. Opportunity cost is helpful here. I find that few enthusiasts consider this when calculating true cost of investments.

I'll take your word for it. Anyway, this is getting too OT.

Karnak · Jun 14, 2022

tl;dw regarding "new" information:

- he claims N33 is still on N6, although I doubt that after AMD's roadmap at their FAD, where RDNA3 was listed as "5nm"
- single GCD around 350mm²-400mm² (more towards 380mm²-400mm²)
- around 50% perf/watt improvement for N33
- around 60-70% perf/watt improvement for N31/N32

Saylick · Jun 15, 2022

Karnak said:
tl;dw regarding "new" information:

- he claims N33 is still on N6, although I doubt that after AMD's roadmap at their FAD, where RDNA3 was listed as "5nm"
- single GCD around 350mm²-400mm² (more towards 380mm²-400mm²)
- around 50% perf/watt improvement for N33
- around 60-70% perf/watt improvement for N31/N32

The biggest "take with a grain of salt" item there is whether AMD's claimed >50% perf/W applies to the 6nm version of RDNA3, which may or may not be the case. But if we assume that it's a >50% perf/W gain across the entire stack, irrespective of node, then the 60-70% perf/W gains for the N5 variants are in line with the +25% perf/W that N5 delivers over N7 (a la Zen 4). 1.5 x 1.25 = 1.875, so probably some loss in scaling too.

350-400mm² for the N31 GCD is in the ballpark of what I expect for a product that has 2.4x the shaders on an optimized node with 2x the density, knowing that roughly half of the N21 die (520mm²) was shader logic itself. That math alone gets you to the low 300mm² range, but add back in larger structures, PHYs for the fan-out bridges, AV1 decode, etc., and it's not hard to imagine it being in the upper half of the 300mm² range.
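The area estimate above can be sketched as a one-line scaling calculation. This is only a restatement of the figures quoted in the post (520mm² N21 die, roughly half of it shader logic, 2.4x shaders, 2x density); the function and constant names are made up for illustration, and nothing here is a confirmed die size.

```python
# Back-of-envelope shader area estimate, using only numbers quoted in the post.
N21_DIE_MM2 = 520.0     # Navi 21 die size as quoted
SHADER_FRACTION = 0.5   # "roughly half" of N21 is shader logic (the post's estimate)
SHADER_SCALE = 2.4      # rumoured shader count multiple versus N21
DENSITY_GAIN = 2.0      # assumed logic density gain of the newer node


def scaled_shader_area_mm2(die_mm2, shader_fraction, shader_scale, density_gain):
    """Grow the shader portion by the shader multiple, then shrink it by the density gain."""
    return die_mm2 * shader_fraction * shader_scale / density_gain


shader_only = scaled_shader_area_mm2(
    N21_DIE_MM2, SHADER_FRACTION, SHADER_SCALE, DENSITY_GAIN
)
print(f"shader logic alone: about {shader_only:.0f} mm2")
```

Everything that is not shader logic (PHYs, media blocks, larger caches) has to be added on top of that figure, which is why the post lands in the upper 300s.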

Assuming 65% perf/W increase, they could maybe squeeze out 2.5x 6900XT performance in a 450W TDP envelope. 1.65 x 1.5 = 2.475. This more or less lets them match a cut-down AD102 in the form of the RTX 4090, which is rumored to be 2.2x RTX 3090.
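For anyone who wants to play with the multipliers in this post, here is a minimal sketch. It assumes the 1.5 factor in "1.65 x 1.5" is the power ratio 450W / 300W, with 300W taken as the 6900 XT reference board power; that reading is an assumption, and the helper name is hypothetical.

```python
def combined_gain(*factors):
    """Multiply independent scaling factors together (assumes they compound cleanly)."""
    result = 1.0
    for factor in factors:
        result *= factor
    return result


# Architecture gain (+50% perf/W) stacked with the node gain (+25% perf/W).
ideal_n5_perf_per_watt = combined_gain(1.5, 1.25)

# Performance estimate: perf/W gain times the power increase.
# Assumption: 300 W reference for the 6900 XT, 450 W for the rumoured part.
POWER_RATIO = 450 / 300
perf_vs_6900xt = combined_gain(1.65, POWER_RATIO)

print(f"ideal N5 perf/W gain: {ideal_n5_perf_per_watt:.3f}x")
print(f"performance at 450 W: {perf_vs_6900xt:.3f}x")
```

The gap between the ideal 1.875x and the rumoured 1.6-1.7x is the "loss in scaling" mentioned above. Perf/W is also rarely constant across the power curve, so the 450W figure is best read as an upper bound.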

JasonLD · Jun 15, 2022

Saylick said:
Assuming 65% perf/W increase, they could maybe squeeze out 2.5x 6900XT performance in a 450W TDP envelope. 1.65 x 1.5 = 2.475. This more or less lets them match a cut-down AD102 in the form of the RTX 4090, which is rumored to be 2.2x RTX 3090.

In terms of FLOPS, maybe; in terms of gaming performance, I really doubt it.

dangerman1337 · Jun 15, 2022

About top RDNA 3 I think there's a possibility of a Navi "30"?

https://twitter.com/x/status/1537005854334910464

I mean, I wonder if those rumours of 2.5-2.8x were about this? This is a multiple of Navi 23's 2048 and Navi 33's 4096.

JasonLD said:
In terms of Flops maybe, gaming performance, I really doubt.

I think it all comes down to whether or not an RDNA 3 TF equals an RDNA 2 TF in real-world performance.

Frenetic Pony · Jun 15, 2022

Saylick said:
The biggest "take with a grain of salt" item there is whether AMD's claimed >50% perf/W applies to the 6nm version of RDNA3, which may or may not be the case. But if we assume that it's a >50% perf/W gain across the entire stack, irrespective of node, then the 60-70% perf/W gains for the N5 variants are in line with the +25% perf/W that N5 delivers over N7 (a la Zen 4). 1.5 x 1.25 = 1.875, so probably some loss in scaling too.

Yeah, if that's what he said we can 100% conclude this is fake. The newer node that gives a perf per watt uplift by itself isn't the one that gives the perf per watt uplift, and the giant blue thing hanging over everyone's head all day isn't the sky.

Maybe we should just stop posting rumors from these people. Or start posting rumors from a ouija board, for all the sense it would make in comparison.

Karnak · Jun 15, 2022

He was the one back in 2020 who leaked the Infinity Cache for the first time though. He definitely has his sources regarding AMD stuff, at least on the GPU side. Of course this doesn't mean everything is true, but still.

Benefit of the doubt, the IF$ wasn't just a random guess you could make.

DiogoDX · Jun 15, 2022

dangerman1337 said:
About top RDNA 3 I think there's a possibility of a Navi "30"?

https://twitter.com/x/status/1537005854334910464

I mean, I wonder if those rumours of 2.5-2.8x were about this? This is a multiple of Navi 23's 2048 and Navi 33's 4096.

I think it all comes down to whether or not an RDNA 3 TF equals an RDNA 2 TF in real-world performance.

Dual Navi32 for pro?

DisEnchantment · Jun 16, 2022

DisEnchantment said:
RDNA3 has dual issue wave32 vector op support.

⚙ D125261 [AMDGPU] gfx11 subtarget features & early tests

reviews.llvm.org

Legacy geometry engine is gone.

I found the corresponding patent for what seems like the dual wave32 support in RDNA3.

As per LLVM commits RDNA3 can issue dual wave32 instructions which looks like what is described in this patent.

https://www.freepatentsonline.com/y2022/0188076.html

Each SIMD unit can do 2x FP32 whenever the operand cache can gather all the operands from the VGPR bank.

So whenever operand gather is optimal each CU of RDNA3 can do 2x the FP32 of RDNA2 CU per cycle.
When needed, RDNA3 can do 1 cycle wave64.
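To get a feel for what "whenever operand gather is optimal" means for throughput, here is a toy model. It is purely illustrative and not taken from the patent or the LLVM commits: it assumes some fraction of FP32 instructions can be paired into a dual-issue slot, and that everything else issues one per cycle. The function name and the model itself are assumptions.

```python
def dual_issue_speedup(pairable_fraction):
    """Speedup over single issue when a given fraction of instructions can be paired.

    Paired instructions cost half a cycle each; the rest cost one cycle each.
    """
    if not 0.0 <= pairable_fraction <= 1.0:
        raise ValueError("pairable_fraction must be between 0 and 1")
    cycles_per_instruction = (1.0 - pairable_fraction) + pairable_fraction / 2.0
    return 1.0 / cycles_per_instruction


for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{fraction:.0%} pairable -> {dual_issue_speedup(fraction):.2f}x FP32 throughput")
```

The 2x figure is only reached when every instruction pairs, so the real-world gain depends heavily on how often the compiler and the operand cache can find valid pairs.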

def FeatureVOPD : SubtargetFeature<"vopd",
"HasVOPDInsts",
"true",
"Has VOPD dual issue wave32 instructions"
>;

From the latest bunch of commits, which seems to be the most I have seen to support a GPU architecture (much more than RDNA2), RDNA3 scatter-gather support in LLVM was thoroughly reworked.

Also found the commit indicating RDNA3 has 1/2 the DPFP rate of RDNA2 (i.e. 1/16 in RDNA2 vs 1/32 in RDNA3), which could support the idea of 2x FP32 throughput per CU.

rGd393538c7f85

reviews.llvm.org

This mostly just tests that DPFP is 1/32 rate on GFX11, instead of 1/16 rate as on GFX10.

Aapje · Jun 16, 2022

beginner99 said:
True, and hence they actually need to sell the coins they mine. If you mine on the side with your gaming rig, you can always bet on prices going up again and wait until then to sell to make it profitable again. I mined some in 2015, I think. Was it profitable at that time? Not really. But boy, even with ETH down to $1k it paid for my next couple of GPUs.

If you want to speculate with the price going up, but it costs more to mine than you can currently sell the coins for, then it is cheaper to just buy the coins rather than mine them.
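A quick sketch of that comparison, for anyone who wants to plug in their own numbers. All inputs in the example (rig power, electricity price, daily yield) are hypothetical placeholders, not measurements, and hardware wear and pool fees are ignored.

```python
def electricity_cost_per_coin(power_kw, price_per_kwh, coins_per_day):
    """Electricity cost in USD to mine one coin, ignoring hardware wear and fees."""
    daily_cost = power_kw * 24 * price_per_kwh
    return daily_cost / coins_per_day


def cheaper_to_buy(market_price, power_kw, price_per_kwh, coins_per_day):
    """True when buying a coin costs less than the electricity needed to mine it."""
    return electricity_cost_per_coin(power_kw, price_per_kwh, coins_per_day) > market_price


# Hypothetical rig: 0.3 kW at the wall, 0.0012 coins per day, coin at $1200.
print(cheaper_to_buy(1200, power_kw=0.3, price_per_kwh=0.25, coins_per_day=0.0012))
print(cheaper_to_buy(1200, power_kw=0.3, price_per_kwh=0.10, coins_per_day=0.0012))
```

With these made-up inputs, the first case costs about $1500 of electricity per coin, so buying wins; at the lower electricity price mining is still ahead.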

Frenetic Pony · Jun 16, 2022

Neat, and another blow to the old leakers and their rumors about getting rid of the double compute unit. Instead it's still there and has increased efficiency (one wave64 was RDNA2, yes? Or was that just shared memory and two cycles? But now, at the very least, two independent wave32s when conditions are met as well).

I wonder what else their "re-architectured compute unit" implies.

moinmoin · Jun 16, 2022

dangerman1337 said:
About top RDNA 3 I think there's a possibility of a Navi "30"?

If it's a new die it would be 34. The numbering for these so far has been purely chronological when each respective project started up without relation to its performance and size.

jpiniero · Jun 16, 2022

DiogoDX said:
Dual Navi32 for pro?

I had thought about that. N31/32 can take two chips, but the gaming ones are only one because of power or ASP concerns.

maddie · Jun 16, 2022

DisEnchantment said:
I found the corresponding patent for what seems like the dual wave32 support in RDNA3.

As per LLVM commits RDNA3 can issue dual wave32 instructions which looks like what is described in this patent.

https://www.freepatentsonline.com/y2022/0188076.html

Each SIMD unit can do 2x FP32 whenever the operand cache can gather all the operands from the VGPR bank.

So whenever operand gather is optimal each CU of RDNA3 can do 2x the FP32 of RDNA2 CU per cycle.
When needed, RDNA3 can do 1 cycle wave64.

From the latest bunch of commits, which seems to be the most I have seen to support a GPU architecture (much more than RDNA2), RDNA3 scatter-gather support in LLVM was thoroughly reworked.

Also found the commit indicating RDNA3 has 1/2 the DPFP rate of RDNA2 (i.e. 1/16 in RDNA2 vs 1/32 in RDNA3), which could support the idea of 2x FP32 throughput per CU.

rGd393538c7f85

reviews.llvm.org

Also found the commit indicating RDNA3 has 1/2 the DPFP rate of RDNA2 (i.e. 1/16 in RDNA2 vs 1/32 in RDNA3), which could support the idea of 2x FP32 throughput per CU.

How does this work? This makes no sense to me.

Olikan · Jun 16, 2022

I really can't remember another gpu that can do a wave64 in a single clock, very impressive.

Saylick · Jun 16, 2022

Frenetic Pony said:
Neat, and another blow to the old leakers and rumors, getting rid of the double compute unit indeed, instead it's there and has increased efficiency (1 wave64 was RDNA2 yes? But now two independent wave32s when conditions are met as well?)

I wonder what else their "re-architectured compute unit" implies.

It was mentioned in this thread before, but maybe there's the chance for "VLIW2"-esque execution:

https://www.reddit.com/r/Amd/comments/bu5mum

DisEnchantment · Jun 17, 2022

maddie said:
Also found the commit indicating RDNA3 has 1/2 the DPFP rate of RDNA2 (i.e. 1/16 in RDNA2 vs 1/32 in RDNA3), which could support the idea of 2x FP32 throughput per CU.

How does this work? This makes no sense to me.

What I meant is that they did not really change VGPR width or throughput; it is comparable to RDNA2.
But with the updated operand cache they can gather the operands from multiple VGPR banks to feed the dual VALUs. The operand cache likely does not hold 64-bit operands for obvious reasons, i.e. area and power.
So the DPFP pipeline is similar to RDNA2 and cannot make full use of the 2x VALU pipe, i.e. FP32 did gain 2x but DPFP didn't, so the ratio went from 1:16 to 1:32.
Of course this is just what I am surmising based on circumstantial evidence and not what is written in an ISA manual.
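The ratio argument in numbers, as a minimal sketch. The per-clock rates are normalised placeholders chosen only to reproduce the 1:16 starting point; they are not real per-CU figures.

```python
from fractions import Fraction


def dp_to_sp_ratio(fp32_per_clk, fp64_per_clk):
    """FP64 rate expressed as a fraction of the FP32 rate."""
    return Fraction(fp64_per_clk, fp32_per_clk)


# Normalised units: RDNA2 does 16 FP32 ops for every 1 FP64 op.
rdna2_fp32, rdna2_fp64 = 16, 1

# Surmised RDNA3 behaviour: FP32 doubles via dual issue, FP64 stays where it was.
rdna3_fp32, rdna3_fp64 = 2 * rdna2_fp32, rdna2_fp64

print("RDNA2 DP:SP =", dp_to_sp_ratio(rdna2_fp32, rdna2_fp64))
print("RDNA3 DP:SP =", dp_to_sp_ratio(rdna3_fp32, rdna3_fp64))
```

So the halved ratio does not have to mean FP64 got slower in absolute terms; the denominator simply doubled.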

Olikan · Jun 20, 2022

The VLIW compiler is back... but more advanced

https://www.freepatentsonline.com/y2022/0188120.html

GodisanAtheist · Jun 20, 2022

Can the smarty pants in this thread break down this VLIW news a bit? I recall AMD was on VLIW5/4 with their OG DX10/11 archs, but moved away from VLIW for GCN and up because of occupancy issues, the arch's weakness with compute workloads, etc.

What does VLIW"2" do for AMD in modern workloads and how does it overcome the issues that got AMD to move away from it in the first place?
