Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,565
5,573
146

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
Should not be too long though.

ETH hit $1,200 today, and it is no longer cost-effective for me to mine. Specifically, for me the USD cost of electricity exceeds the USD value mined.

Celebrations will be coming soon!
Bulk miners have lower costs than most individuals, so it will need to go lower for the 'flood' to begin. Fingers crossed, I need a new GFX card.
 
  • Like
Reactions: Leeea

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
They might also have higher costs. Housing, labor, etc. The home miner assumes his time & space is worthless.
The home miner only saves on labor, since they are 'donating' it to themselves. They are paying for everything else, even if they aren’t putting it on their books like a crypto-mining company would. The home user has zero perks of scale.
 

beginner99

Diamond Member
Jun 2, 2009
5,208
1,580
136
They might also have higher costs. Housing, labor, etc. The home miner assumes his time & space is worthless.

True, and hence they actually need to sell the coins they mine. If you mine on the side with your gaming rig, you can always bet on prices going up again and wait till then to sell to make it profitable again. I mined some, I think in 2015. Was it profitable at that time? Not really. But boy, even with ETH down to $1k it paid for my next couple of GPUs.
 

maddie

Diamond Member
Jul 18, 2010
4,722
4,627
136
The home miner only saves on labor, since they are 'donating' it to themselves. They are paying for everything else, even if they aren’t putting it on their books like a crypto-mining company would. The home user has zero perks of scale.
He doesn't save by "donating" labor. It's an exercise in self-delusion. Opportunity cost is helpful here. I find that few enthusiasts consider this when calculating the true cost of investments.
 

Ajay

Lifer
Jan 8, 2001
15,332
7,792
136
He doesn't save by "donating" labor. It's an exercise in self-delusion. Opportunity cost is helpful here. I find that few enthusiasts consider this when calculating the true cost of investments.
I'll take your word for it. Anyway, this is getting too OT.
 

Karnak

Senior member
Jan 5, 2017
399
767
136

tl;dw regarding "new" information:

- he claims N33 is still on N6, although I doubt that after AMD's roadmap at their FAD listed RDNA3 as "5nm"
- single GCD around 350mm²-400mm² (more towards 380mm²-400mm²)
- around 50% perf/watt improvement for N33
- around 60-70% perf/watt improvement for N31/N32
 
  • Like
Reactions: Mopetar and Saylick

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136

tl;dw regarding "new" information:

- he claims N33 is still on N6, although I doubt that after AMD's roadmap at their FAD listed RDNA3 as "5nm"
- single GCD around 350mm²-400mm² (more towards 380mm²-400mm²)
- around 50% perf/watt improvement for N33
- around 60-70% perf/watt improvement for N31/N32
The biggest "take with a grain of salt" item there is that AMD's claimed >50% perf/W applies to the 6nm version of RDNA3, which may or may not be the case. But, if we assume that it's a >50% perf/W gain across the entire stack, irrespective of node, then the 60-70% perf/W gains for the N5 variants is in-line with the +25% perf/W that N5 delivers over N7 (a la Zen 4). 1.5 x 1.25 = 1.875, so probably some loss in scaling too.

350-400mm2 for N31 GCD is in the ballpark of what I expect for a product that has 2.4x the shaders on an optimized node with 2x the density, and knowing that roughly half of the N21 die (520mm2) was shader logic itself. That math alone gets you to low 300mm2, but add back in larger structures, PHY for the fan out bridges, AV1 decode, etc, and it's not hard to imagine it being in the upper half of the 300mm2 range.

Assuming 65% perf/W increase, they could maybe squeeze out 2.5x 6900XT performance in a 450W TDP envelope. 1.65 x 1.5 = 2.475. This more or less lets them match a cut-down AD102 in the form of the RTX 4090, which is rumored to be 2.2x RTX 3090.
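For anyone who wants to sanity-check that napkin math, here is a minimal sketch; every input (shader fraction, density gain, perf/W midpoint, TDPs) is one of the speculative figures from the posts above, not an official number:

```python
# Back-of-the-envelope sketch of the numbers above; all inputs are
# speculative assumptions from this thread, not official AMD figures.

# Rough N31 GCD area: scale N21's shader logic by shader count and node density.
n21_die_mm2 = 520            # Navi 21 die size
shader_fraction = 0.5        # assume ~half of N21 is shader logic
shader_scale = 2.4           # rumored increase in shader count for N31
density_gain = 2.0           # assumed N5-vs-N7 logic density improvement

gcd_logic_mm2 = n21_die_mm2 * shader_fraction * shader_scale / density_gain
print(f"Shader logic alone: ~{gcd_logic_mm2:.0f} mm^2")   # ~312 mm^2, before uncore/PHY/etc.

# Rough performance: perf/W gain times the TDP increase over the 6900 XT.
perf_per_watt_gain = 1.65    # midpoint of the rumored 60-70% improvement
tdp_ratio = 450 / 300        # assumed 450 W part vs the 6900 XT's 300 W
print(f"Implied perf vs 6900 XT: ~{perf_per_watt_gain * tdp_ratio:.2f}x")   # roughly 2.5x
```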
 

JasonLD

Senior member
Aug 22, 2017
485
445
136
Assuming 65% perf/W increase, they could maybe squeeze out 2.5x 6900XT performance in a 450W TDP envelope. 1.65 x 1.5 = 2.475. This more or less lets them match a cut-down AD102 in the form of the RTX 4090, which is rumored to be 2.2x RTX 3090.

In terms of FLOPS, maybe; gaming performance, I really doubt it.
 
  • Like
Reactions: Leeea

dangerman1337

Senior member
Sep 16, 2010
333
5
81
Regarding top-end RDNA 3, I think there's a possibility of a Navi "30"?

I mean, I wonder if those rumours of 2.5-2.8x were about this? It would be a multiple of Navi 23's 2048 & Navi 33's 4096 shaders.
In terms of FLOPS, maybe; gaming performance, I really doubt it.
I think it all comes down to whether or not RDNA 3 TF = RDNA 2 TF in real-world performance.
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
The biggest "take with a grain of salt" item there is that AMD's claimed >50% perf/W applies to the 6nm version of RDNA3, which may or may not be the case. But, if we assume that it's a >50% perf/W gain across the entire stack, irrespective of node, then the 60-70% perf/W gains for the N5 variants is in-line with the +25% perf/W that N5 delivers over N7 (a la Zen 4). 1.5 x 1.25 = 1.875, so probably some loss in scaling too.

Yeah, if that's what he said, we can 100% conclude this is fake. The newer node that gives a perf-per-watt uplift all by itself isn't the one that gives the perf-per-watt uplift, and the giant blue thing hanging over everyone's head all day isn't the sky.

Maybe we should just stop posting rumors from these people. Or start posting rumors from a ouija board, for all the sense it would make in comparison.
 

Karnak

Senior member
Jan 5, 2017
399
767
136
He was the one back in 2020 leaking the Infinity Cache for the first time though. He definitely has his sources regarding AMD stuff, at least on the GPU side. Ofc this doesn't mean everything is true, but still.

Benefit of the doubt, the IF$ wasn't just a random guess you could make.
 
  • Like
Reactions: Leeea

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
RDNA3 has dual issue wave32 vector op support.
Legacy geometry engine is gone.
I found the corresponding patent for what seems like the dual wave32 support in RDNA3.

As per LLVM commits RDNA3 can issue dual wave32 instructions which looks like what is described in this patent.

[Attached image: figure from the patent]

Each SIMD unit can do 2x FP32 whenever the operand cache can gather all the operands from the VGPR bank.

So whenever operand gather is optimal each CU of RDNA3 can do 2x the FP32 of RDNA2 CU per cycle.
When needed, RDNA3 can do 1 cycle wave64.

def FeatureVOPD : SubtargetFeature<"vopd",
  "HasVOPDInsts",
  "true",
  "Has VOPD dual issue wave32 instructions"
>;
From the latest batch of commits, which seems to be the most I have seen to support a single GPU architecture (much more than RDNA2), RDNA3's scatter/gather support in LLVM was thoroughly reworked.

Also found the commit indicating RDNA3 has half the DPFP ratio of RDNA2 (i.e. 1/16 in RDNA2 vs 1/32 in RDNA3), which could support the idea of 2x FP throughput per CU:
This mostly just tests that DPFP is 1/32 rate on GFX11, instead of 1/16
rate as on GFX10.
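To put rough numbers on that, here is a minimal sketch of the per-CU FP32 arithmetic; the 2-SIMD32-per-CU layout and FMA-as-2-FLOPs counting are my own assumptions, and the doubling only applies in the best case where the operand cache can feed both VALUs every cycle:

```python
# Illustrative per-CU FP32 throughput, per clock. Assumes 2 SIMD32 units per
# CU and counts an FMA as 2 FLOPs; real throughput depends on how often the
# operand cache can actually gather operands for both VALUs (dual issue).

SIMDS_PER_CU = 2
LANES_PER_SIMD = 32
FLOPS_PER_FMA = 2

rdna2_flops_per_clk = SIMDS_PER_CU * LANES_PER_SIMD * FLOPS_PER_FMA  # 128
rdna3_flops_per_clk = 2 * rdna2_flops_per_clk                        # 256, best case with VOPD dual issue

print(rdna2_flops_per_clk, rdna3_flops_per_clk)
```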
 

Aapje

Golden Member
Mar 21, 2022
1,311
1,773
106
True, and hence they actually need to sell the coins they mine. If you mine on the side with your gaming rig, you can always bet on prices going up again and wait till then to sell to make it profitable again. I mined some, I think in 2015. Was it profitable at that time? Not really. But boy, even with ETH down to $1k it paid for my next couple of GPUs.

If you want to speculate with the price going up, but it costs more to mine than you can currently sell the coins for, then it is cheaper to just buy the coins rather than mine them.
 
  • Like
Reactions: Tlh97 and maddie

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Neat, and another blow to the old leakers and rumors about getting rid of the double compute unit: instead it's still there and has increased efficiency (1 wave64 was RDNA2, yes, or was that just shared memory and two cycles? But now two independent wave32s when conditions are met as well, at the very least).

I wonder what else their "re-architectured compute unit" implies.
 

maddie

Diamond Member
Jul 18, 2010
4,722
4,627
136
I found the corresponding patent for what seems like the dual wave32 support in RDNA3.

As per LLVM commits RDNA3 can issue dual wave32 instructions which looks like what is described in this patent.


Each SIMD unit can do 2x FP32 whenever the operand cache can gather all the operands from the VGPR bank.

So whenever operand gather is optimal each CU of RDNA3 can do 2x the FP32 of RDNA2 CU per cycle.
When needed, RDNA3 can do 1 cycle wave64.


From the latest batch of commits, which seems to be the most I have seen to support a single GPU architecture (much more than RDNA2), RDNA3's scatter/gather support in LLVM was thoroughly reworked.

Also found the commit indicating RDNA3 has half the DPFP ratio of RDNA2 (i.e. 1/16 in RDNA2 vs 1/32 in RDNA3), which could support the idea of 2x FP throughput per CU.
Also found the commit indicating RDNA3 has half the DPFP ratio of RDNA2 (i.e. 1/16 in RDNA2 vs 1/32 in RDNA3), which could support the idea of 2x FP throughput per CU.

How does this work? This makes no sense to me.
 

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
Neat, and another blow to the old leakers and rumors about getting rid of the double compute unit: instead it's still there and has increased efficiency (1 wave64 was RDNA2, yes? But now two independent wave32s when conditions are met as well?)

I wonder what else their "re-architectured compute unit" implies.
It was mentioned in this thread before, but maybe there's the chance for "VLIW2"-esque execution:
 

DisEnchantment

Golden Member
Mar 3, 2017
1,590
5,722
136
Also found the commit indicating RDNA3 has half the DPFP ratio of RDNA2 (i.e. 1/16 in RDNA2 vs 1/32 in RDNA3), which could support the idea of 2x FP throughput per CU.

How does this work? This makes no sense to me.
What I meant is that they did not really change VGPR width or throughput; it is comparable to RDNA2.
But with the updated operand cache they can gather the operands from multiple VGPR banks to feed the dual VALUs. The operand cache likely is not holding 64-bit operands for obvious reasons, i.e. area and power.
So the DPFP pipeline is similar to RDNA2's and cannot make full use of the 2x VALU pipe: FP32 indeed gained 2x but DPFP didn't, so the ratio went from 1:16 to 1:32.
Of course this is just what I am surmising based on circumstantial evidence and not what is written in an ISA manual.
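A tiny worked example of that reading, with purely illustrative numbers following the surmise above rather than any ISA manual:

```python
# Illustrative only: shows why the FP64:FP32 ratio halves if FP32 throughput
# doubles while the DPFP pipe stays the same. Numbers are per CU per clock
# and purely hypothetical.

rdna2_fp32 = 128               # baseline FP32 ops per CU per clock (assumed)
rdna2_fp64 = rdna2_fp32 / 16   # 1:16 ratio -> 8

rdna3_fp32 = 2 * rdna2_fp32    # dual issue doubles FP32 -> 256
rdna3_fp64 = rdna2_fp64        # DPFP pipeline unchanged -> still 8

print(rdna3_fp64 / rdna3_fp32) # 0.03125 == 1/32
```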
 
  • Like
Reactions: maddie

GodisanAtheist

Diamond Member
Nov 16, 2006
6,716
7,006
136
Can the smarty pants in this thread break down this VLIW news a bit? I recall AMD was on VLIW5/4 with their OG DX10/11 archs, but moved away from VLIW for GCN and up because of occupancy issues and the arch's weakness with compute workloads, etc.

What does VLIW"2" do for AMD in modern workloads and how does it overcome the issues that got AMD to move away from it in the first place?
 
  • Like
Reactions: Thibsie and RnR_au