Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,632
5,959
146

Joe NYC

Golden Member
Jun 26, 2021
1,948
2,288
106
I think the chances of a respin are inversely proportional to how close RDNA 4 is. I believe these generational teams work mostly independently, so if RDNA 4 is going well and can maybe even be pushed up a bit, why respin? Maybe it's a dream, but if it's 18 months off…

I think the chances of a re-spin are proportional to whether the fix is successfully applied to Navi 32 and Navi 32 hits 3 GHz+ frequencies with decent power efficiency.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,605
5,795
136
AMD told the press the 7900 XTX design will go above 3 GHz, claimed as an industry first, so why are folks doubling down on the opposite?
AMD has consistently been conservative and has largely delivered on its performance claims lately, be it CPUs or GPUs.
It would be weird to say it can go to 3+ GHz if they knew it won't. Give it time; we are 1+ month away from launch.

And a "problem fixed on N32" contradicts what was being said about all current RDNA3 chips having the problem.
On top of that, N33 is not even on a new process node, at least if you believe the rumor that this is a process-related problem à la R520.
AMD knows N6/N7 inside and out; think of how many chips they have industrialized on this process.

To me it seems like a simpler problem, but of course a bigger problem for the corporation. How could they fail to execute? They have plastered +50% perf/W gains all over their corporate presentations, and running it at 3+ GHz would make it fall short of that. Can't let that happen.

So let the AIBs deal with the efficiency problem and provide the performance, while the reference cards run at efficient levels to look good on the corporate slides.

Anyway, it is quite difficult to discern what is true, what is disinformation, and what is just plain fiction from click farmers.
 
Last edited:

Rekluse

Junior Member
Sep 16, 2022
19
18
41
I remember Charlie from SemiAccurate giving a pretty comprehensive timetable for how long a re-spin takes.

I cannot remember it now.

Curious whether 9-12 months is enough, since RDNA4 is due 18-24 months from now.
 

biostud

Lifer
Feb 27, 2003
18,251
4,764
136
I have not really started seriously planning. Just in general:

B650E motherboard
hopefully a PCIe Gen5 M.2 drive
8-core Zen 4 V-Cache CPU
Navi 32 based card
Probably pedestrian-speed DDR5, 32 GB
I will need a new PSU, and may as well get a new case and leave the old PC intact as a backup.
So we are in the same boat.

As I have a dedicated sound card, I'm going for the most basic B650E board with a PCIe 5.0 x16 slot and a PCIe 5.0 NVMe slot. I'm going for pure air cooling.

So far:

ASRock B650E PG Riptide WiFi
DeepCool AK620 CPU cooler (has better reviews than DarkRock and Noctua)
Thermal Grizzly Kryonaut
Fractal Design Torrent (a bit on the expensive side, so I might find a cheaper alternative)

And otherwise same as you :)
 

majord

Senior member
Jul 26, 2015
433
523
136
While it will compete with

It is simply that you would expect x900 vs x090, x800 vs x080, etc. So when that does not match, it looks like something is wrong.

It's not like AMD hasn't known performance and clocks for some time. There is a chance of some minor shipping-clock adjustments after nailing down names, but nothing drastic.

They don't seem to be following that nomenclature 'rule' anyway; both SKUs are 7900s, yet there aren't two levels of the 4090 tier (yet).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
When AMD prices too close to Nvidia based on raster performance, everyone screams they need to lower prices because the RT performance and feature set can't compete... 'they'll never gain market share', 'disrupt the market', etc., etc.

AMD comes in undercutting Nvidia significantly:

"Something's wrong"
"it must be even slower than the 4080 in rasterization"
"it should be renamed and dropped to $949"

You guys are funny.
.......
Bit of a reality check, people: raster perf and perf/watt are looking fine. Not amazing, not matching the random rumors started by morons, sure, but all things in the real world considered... fine.
Price is great for that raster performance, and raster is also good. I don't think there is a need for an additional price cut.
If AMD doesn't plan any stronger chip, which they most likely don't, then I don't think it matters if N31 stays as the RX 79** or RX 78** series.
As for the chip/architecture itself, the only thing that's "wrong" is the RT performance. Yet everyone's fixated on the clock speeds not being through the roof and not beating the s**t out of the 4090 (even at a mere 355 W), and therefore it must have been botched. It's a Fermi, it's an R520...

Hello? Since when is a 50% increase in perf/watt and a 60% increase in performance vs a predecessor "botched"?

It's still a huge uplift over RDNA2 at the end of the day. It's also the first-gen chiplet architecture, which no doubt has presented a host of challenges and wouldn't come without some compromise.

Comparing to Nvidia's gen-on-gen: they've gone from an inferior Samsung 8nm process to a superior custom '4nm' process, so you can't even draw any parallels there either. It was always going to be a challenge to maintain the status quo with Nvidia this gen because of that.
That 54% perf/W claim was comparing the 6900XT vs the 7900XTX, both at 300 W.
If they compared 300 W vs 355 W, then it would be less, but to be fair, AMD could have compared the 6950XT (335 W) vs the 7900XTX (355 W) and it would probably still come out similar to 54%.
Even if the increase were only 40% at ISO power, I couldn't say it was botched.
OK, I expected a lot more based on the 54% increase from RDNA -> RDNA2 and Zen 4's >15% perf improvement, but that's my fault.
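To put rough numbers on that point, here is a back-of-the-envelope normalization, purely a sketch: the ~1.6x raw performance figure is an assumption taken from the "60% increase" mentioned above, not a measurement.

```python
# Rough iso-power vs. full-power perf/W comparison (illustrative numbers only).
# Assumption: the 7900 XTX lands ~1.6x the 6900 XT in raster at full board power.
perf_uplift = 1.60        # assumed raw performance ratio, 7900 XTX vs 6900 XT
power_6900xt = 300        # W, 6900 XT board power
power_7900xtx = 355       # W, 7900 XTX board power

# Comparing at full board power instead of capping both cards at 300 W:
perf_per_watt_gain = perf_uplift / (power_7900xtx / power_6900xt)
print(f"perf/W gain at full board power: {perf_per_watt_gain:.2f}x")  # ~1.35x
```

So the same card that shows ~1.5x perf/W in a 300 W vs 300 W comparison only shows ~1.35x when each card runs at its own board power, which is why the framing of the comparison matters.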
RT is weak, no surprise there, but that's also due to the relatively low number of CUs (WGPs) and the clocks.
Based on leaks, not sure how accurate, there really is a problem affecting the clock speed, and whether the final clock speed ends up at 2.5 GHz or 3 GHz would mean up to 20% higher performance.

I am still a bit sad they didn't make a bigger GCD.

Based on Locuza's table posted by DisEnchantment, I made a table for bigger GCDs. Keep in mind it still assumes only 384-bit GDDR6 + 96 MB IC, so the interconnects in the GCD, which use up a lot of space, stay the same.
GCD size | CUs (WGPs) | Shaders | TMUs | ROPs
300 mm²  | 96 (48)    | 12288   | 384  | 192
360 mm²  | 128 (64)   | 16384   | 512  | 256
410 mm²  | 160 (80)   | 20480   | 640  | 256
I personally like the middle one the best.
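For transparency, the shader and TMU columns above follow directly from the WGP count; here is a minimal sketch, assuming RDNA3's 2 CUs per WGP, 128 ALUs ("shaders") and 4 TMUs per CU. ROPs and die size do not scale this simply, since they track shader-engine count and fixed interconnect/analog area.

```python
# Sketch: derive the shader/TMU columns above from the WGP counts.
# Assumptions: 2 CUs per WGP, 128 ALUs ("shaders") and 4 TMUs per CU (RDNA3-style).
def scale_gcd(wgps: int) -> dict:
    cus = wgps * 2
    return {"WGPs": wgps, "CUs": cus, "Shaders": cus * 128, "TMUs": cus * 4}

for wgps in (48, 64, 80):  # the three hypothetical configs in the table
    print(scale_gcd(wgps))
# 48 WGPs ->  96 CUs, 12288 shaders, 384 TMUs
# 64 WGPs -> 128 CUs, 16384 shaders, 512 TMUs
# 80 WGPs -> 160 CUs, 20480 shaders, 640 TMUs
```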
 
Last edited:
  • Like
Reactions: Tlh97 and Kaluan

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
Seems like OC potential is limited by the silicon bug:
....
Some potential good news for those waiting for N32:
A little bit more info. Take it with a grain of salt since it's not coming from the horse's mouth:
That's interesting. There is not just the problem of low achievable clocks, but also the problem of too-high power consumption.
Achieving 2.8 GHz at 450 W provides only 5% better performance? The performance/frequency scaling is pretty bad, and even clocks >3 GHz won't provide that much more performance.
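To put a number on how poor that scaling would be, a quick hedged estimate, assuming the leak is accurate and taking the 7900 XTX's ~2.5 GHz boost as the baseline (an assumption, not a measured figure):

```python
# Sketch: implied clock-to-performance scaling from the rumored OC result.
base_clock = 2.5   # GHz, assumed 7900 XTX boost baseline
oc_clock = 2.8     # GHz, rumored OC at 450 W
perf_gain = 0.05   # rumored +5% performance at that OC

clock_gain = oc_clock / base_clock - 1      # +12% clock
scaling = perf_gain / clock_gain            # ~0.4x of ideal linear scaling
print(f"clock +{clock_gain:.0%}, perf +{perf_gain:.0%}, scaling ~{scaling:.2f}")

# If that factor held, even 3.2 GHz (+28% clock) would only be ~ +12% performance.
print(f"3.2 GHz estimate: +{(3.2 / base_clock - 1) * scaling:.0%} perf")
```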
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
I had an interesting idea. Instead of having 6 MCDs + 24 GB of GDDR6 VRAM, what if AMD used HBM instead? Would HBM still be more expensive than what AMD used in RDNA3?
Would it have a worse bandwidth/W ratio than what RDNA3 currently has?

Link to Hynix's HBM development history.
Card/Standard | Memory size                       | Memory width            | Speed    | Bandwidth
RX 7900 XTX   | 24 GB                             | 384-bit                 | 20 GT/s  | 960 GB/s
Radeon VII    | 16 GB (4 × 4 GB stacks)           | 4096-bit (4 × 1024-bit) | 2 GT/s   | 1024 GB/s
HBM2E JEDEC   | up to 48 GB (4 × 12 GB per stack) | 4096-bit (4 × 1024-bit) | 2.4 GT/s | 1228 GB/s
HBM2E Hynix   | up to 64 GB (4 × 16 GB per stack) | 4096-bit (4 × 1024-bit) | 3.6 GT/s | 1843 GB/s
HBM3 Hynix    | up to 96 GB (4 × 24 GB per stack) | 4096-bit (4 × 1024-bit) | 6.4 GT/s | 3277 GB/s

OK, the 96 MB 2nd-gen Infinity Cache alone provides 4340 GB/s, and that's more than even a 4-stack HBM3 setup can provide, but there is still the hit-rate penalty. N22 with 96 MB of IC had a ~53% hit rate at 4K; on a miss, you have only 960 GB/s.
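As a rough illustration of why the hit rate matters, here is a hedged sketch of "effective" bandwidth using the figures above. The weighted blend is a simplification, assuming hits are served by the cache and misses fall through to VRAM; real behaviour is workload-dependent.

```python
# Sketch: raw VRAM bandwidth and a simple cache-hit-weighted "effective" bandwidth.
def vram_bw(bus_bits: int, gtps: float) -> float:
    # bandwidth in GB/s = bus width (bits) / 8 * transfer rate (GT/s)
    return bus_bits / 8 * gtps

gddr6_bw = vram_bw(384, 20)    # 960 GB/s, 7900 XTX-style 384-bit GDDR6
hbm2e_bw = vram_bw(4096, 3.6)  # ~1843 GB/s, 4-stack HBM2E
ic_bw = 4340                   # GB/s, quoted 2nd-gen Infinity Cache bandwidth
hit_rate = 0.53                # ~4K hit rate quoted above for a 96 MB cache

def effective_bw(cache_bw: float, mem_bw: float, hit: float) -> float:
    # Naive blend: hits served from cache, misses fall through to VRAM.
    return hit * cache_bw + (1 - hit) * mem_bw

print(f"GDDR6 + IC:   ~{effective_bw(ic_bw, gddr6_bw, hit_rate):.0f} GB/s")  # ~2751
print(f"HBM2E, no IC: ~{hbm2e_bw:.0f} GB/s")                                 # ~1843
```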
[Image: Navi-2-1536x864.jpg]
You would have only 5 chips (1 GCD + 4 HBM stacks) instead of 19 chips (1 GCD + 6 MCDs + 12 GDDR6 chips) on a card, and the GCD could be even smaller thanks to fewer interconnects.
So I wonder if even 32 GB of HBM2E VRAM (4 × 8 GB stacks) with 1843 GB/s wouldn't be good enough to replace it. Would it cost more? Would it use more of the power budget?
What do you think?
 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,605
5,795
136
RT is weak, no surprise there, but that's also due to the relatively low number of CUs (WGPs) and the clocks.
I am still a bit sad they didn't make a bigger GCD.

I personally like the middle one the best.
You can see that if they went for 128 CUs, they would get 128 ray units like the 4090 for a 360 mm² GCD, for a combined die size of 582 mm², which is still less than AD102.
If you consider that roughly 50 mm² of that is IF interconnect area, the die area for the actual GPU is even smaller.

Well, with 128 ray units, and with the 3 GHz boost working as shown in that slide, for RT perf...
1.5x RT perf/CU * 1.6x CUs * 1.25x clock = 3x gain over the 6900XT, which puts it in 4090-tier RT.
So overall it is not so bad from an architecture perspective; the N31 product is simply not scaled high enough to compete in the same segment.
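For readers following along, here is a minimal sketch of where those factors come from; the 1.5x per-CU RT figure is the claim used in the post above, and the CU and clock ratios assume a hypothetical 128 CU / 3 GHz part against the 80 CU / ~2.4 GHz 6900 XT.

```python
# Sketch: multiplicative RT scaling estimate for a hypothetical 128 CU N31 at 3 GHz.
rt_per_cu = 1.5          # claimed RDNA3 per-CU RT improvement over RDNA2
cu_ratio = 128 / 80      # hypothetical 128 CUs vs the 6900 XT's 80 CUs -> 1.6x
clock_ratio = 3.0 / 2.4  # assumed 3 GHz vs ~2.4 GHz boost -> 1.25x
print(f"~{rt_per_cu * cu_ratio * clock_ratio:.1f}x RT gain over the 6900 XT")  # ~3.0x
```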

But if you are not shopping in the 1500 USD range, does it matter? So why feel sad about it?

That's interesting. There is not just the problem of low achievable clocks, but also the problem of too-high power consumption.
Achieving 2.8 GHz at 450 W provides only 5% better performance? The performance/frequency scaling is pretty bad, and even clocks >3 GHz won't provide that much more performance.
All of this makes no sense.

Let's say the 7900XT is gimped to a max 2.4 GHz boost, but N32 is not and is 'fixed' according to that rumor.
Then a high-boosting N32 at 3.2 GHz+ will match it in performance while costing less. Not sure if bandwidth will be an issue for these chips; they have plenty of it.
Will AMD launch a gimped 84 CU 7900XT at 2.4 GHz only to be beaten by a 7700XT/7800XT at 3.2 GHz+?
How will the 7900XT buyer receive that?
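A back-of-the-envelope FP32 check of that scenario, purely illustrative: the 60 CU N32 config is the rumored 3 SE × 10 WGP layout discussed in this thread, and the 128 dual-issue FP32 ALUs per CU is an RDNA3 assumption.

```python
# Sketch: would a 3.2 GHz N32 roughly match a 2.4 GHz, 84 CU 7900 XT in raw FP32?
# Assumptions: N32 = 60 CUs (3 SEs x 10 WGPs x 2 CUs/WGP), RDNA3 CU = 128 FP32
# ALUs (dual-issue), FLOPS = ALUs * 2 (FMA) * clock.
def tflops(cus: int, ghz: float) -> float:
    return cus * 128 * 2 * ghz / 1000

print(f"N32 (60 CU) @ 3.2 GHz:     {tflops(60, 3.2):.1f} TFLOPS")  # ~49.2
print(f"7900 XT (84 CU) @ 2.4 GHz: {tflops(84, 2.4):.1f} TFLOPS")  # ~51.6
```

Under those assumptions the two end up within a few percent of each other in raw compute, which is why the product-stack question above is awkward.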
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
You can see that if they went for 128 CUs, they would get 128 ray units like the 4090 for a 360 mm² GCD, for a combined die size of 582 mm², which is still less than AD102.
If you consider that roughly 50 mm² of that is IF interconnect area, the die area for the actual GPU is even smaller.

Well, with 128 ray units, and with the 3 GHz boost working as shown in that slide,
1.5x RT perf/CU * 1.6x CUs * 1.25x clock = 3x gain over the 6900XT, which puts it in 4090-tier RT.
So overall it is not so bad from an architecture perspective; the N31 product is simply not scaled high enough to compete in the same segment.

But if you are not shopping in the 1500 USD range, does it matter? So why feel sad about it?


All of this makes no sense.

Let's say the 7900XT is gimped to a max 2.4 GHz boost, but N32 is not and is 'fixed' according to that rumor.
Then a high-boosting N32 at 3.2 GHz+ will match it in performance while costing less. Not sure if bandwidth will be an issue for these chips; they have plenty of it.
Will AMD launch a gimped 84 CU 7900XT at 2.4 GHz only to be beaten by a 7700XT/7800XT at 3.2 GHz+?
How will the 7900XT buyer receive that?
The architecture is good, just the released product is a bit underwhelming, but the price is very good. AMD was too conservative with specs, in my opinion.

You are right, this price bracket is too high for what I am willing to invest in my casual gaming hobby, not to mention I need a portable machine due to my work.
I am sad because I am a die-hard hardware fan, and I am a bit partial to (i.e., prefer) AMD. :D

Let's say N32 really is capable of 3.2 GHz clocks and it scales almost linearly, which is pretty questionable based on leaks.
AMD can artificially set the clocks lower so N32 won't be too close to the 7900XT, or they can set a higher price; the amount of VRAM is 16 GB, so that won't be criticized either.
I am more interested in performance scaling, because N32 should have only 3 shader engines with 10 WGPs per shader engine, unlike N31 with 6 shader engines and 8 WGPs per SE.
 
Last edited:
  • Like
Reactions: scineram

DisEnchantment

Golden Member
Mar 3, 2017
1,605
5,795
136
Yeah, that looks strange. Is it really possible to identify the issue and fix it in N32 within a reasonable timeframe?
Spinning out new GPU revisions does not take as long as it does for CPUs. (For background, we also make some ASICs using Arm cores on the N7 family.)
I have seen two quarters mentioned for a tapeout; a respin takes less than that.
Because there is not as much V&V behind it as with CPUs, it also takes far less time to industrialize once it comes out of the fab. Compare that to CPUs, which can take more than two years for server processors.
You can ship GPUs with several HW bugs and fix them with "driver updates". Just look at the 5700XT.

Let's say N32 really is capable of 3.2 GHz clocks and it scales almost linearly, which is pretty questionable.
AMD can artificially set the clocks lower so N32 won't be too close to the 7900XT, or they can set a higher price; the amount of VRAM is 16 GB, so that won't be criticized either.
I am more interested in performance scaling when N32 should have only 3 shader engines with 10 WGPs per shader engine.
This clock limiting also makes no sense, because it prevents AMD from matching a cheaper NV offering with a smaller die.
So something does not add up in all of this.

I am inclined to believe AMD that N31 can scale to 3 GHz+, at least until the product launches and we discover it is a dud.
And knowing Scott Herkelman's history, he likes to play jebait games.
 
Last edited:

Timorous

Golden Member
Oct 27, 2008
1,615
2,772
136
You can see that if they went for 128 CUs, they would get 128 ray units like the 4090 for a 360 mm² GCD, for a combined die size of 582 mm², which is still less than AD102.
If you consider that roughly 50 mm² of that is IF interconnect area, the die area for the actual GPU is even smaller.

Well, with 128 ray units, and with the 3 GHz boost working as shown in that slide, for RT perf...
1.5x RT perf/CU * 1.6x CUs * 1.25x clock = 3x gain over the 6900XT, which puts it in 4090-tier RT.
So overall it is not so bad from an architecture perspective; the N31 product is simply not scaled high enough to compete in the same segment.

But if you are not shopping in the 1500 USD range, does it matter? So why feel sad about it?


All of this makes no sense.

Let's say the 7900XT is gimped to a max 2.4 GHz boost, but N32 is not and is 'fixed' according to that rumor.
Then a high-boosting N32 at 3.2 GHz+ will match it in performance while costing less. Not sure if bandwidth will be an issue for these chips; they have plenty of it.
Will AMD launch a gimped 84 CU 7900XT at 2.4 GHz only to be beaten by a 7700XT/7800XT at 3.2 GHz+?
How will the 7900XT buyer receive that?

A decently clocked N32 would have more flops but less bandwidth, so it could be the 5600 XT vs 5700 situation over again: the former has more flops, but at 4K the bandwidth difference matters and creates differentiation.
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
This clock limiting also makes no sense, because it prevents AMD from matching a cheaper NV offering with a smaller die.
So something does not add up in all of this.

I am inclined to believe AMD that N31 can scale to 3 GHz+, at least until the product launches and we discover it is a dud.
Didn't they claim up to 25% power savings from the decoupled clocks? It looks like it will scale terribly with more power, which has been corroborated by admittedly questionable sources. Even if it does reach 3 GHz, it will do so at ridiculous power consumption, and I think we can forget about it exceeding that. Maybe the boost clocks in the specifications are basically fake/conservative, it's already not as far from 3 GHz as they imply, and the performance is simply underwhelming.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
This clock limiting also makes no sense, because it prevents AMD from matching a cheaper NV offering with a smaller die.
So something does not add up in all of this.
If this is all true, then it looks like they can't fix N31 without a new revision, so what else can they do? Either limit clocks and OC, or set a higher price for N32.

I am inclined to believe AMD that N31 can scale to 3 GHz+, at least until the product launches and we discover it is a dud.
And knowing Scott Herkelman's history, he likes to play jebait games.
AMD likes to save on silicon cost and use higher frequencies if possible, so I am also inclined to believe RDNA3 should have clocked higher.
 
Last edited:
  • Like
Reactions: Kaluan

SteinFG

Senior member
Dec 29, 2021
420
472
106
If it's actually a bug, it would be funny to see how AMD words it after ~6 months, or however long it takes to tape out another design.
Introduce new 7950 XT and XTX cards at higher prices, then discontinue the cheaper 7900 XT/XTX? Hahahah.
And if N32 gets fixed and reaches 3 GHz, it will probably perform on par with the 7900XT at its ~2 GHz.
But pricing the 7800XT on par with the 7900XT would be a PR nightmare. What a mess for them. XD
 

biostud

Lifer
Feb 27, 2003
18,251
4,764
136
If it's actually a bug, it would be funny to see how AMD words it after ~6 months, or however long it takes to tape out another design.
Introduce new 7950 XT and XTX cards at higher prices, then discontinue the cheaper 7900 XT/XTX? Hahahah.
And if N32 gets fixed and reaches 3 GHz, it will probably perform on par with the 7900XT at its ~2 GHz.
But pricing the 7800XT on par with the 7900XT would be a PR nightmare. What a mess for them. XD

They can just do a fall '23 refresh with a 7950XTX/XT clocked 20% higher at a 10% price increase (if 3+ GHz is really a possibility).
 
  • Like
Reactions: Tlh97 and Joe NYC

JayMX

Member
Oct 18, 2022
31
73
51
@DisEnchantment
Isn't this where he speaks about the 3 GHz capability of the architecture and says he does not think the chiplet design is holding back clocks? It is the same as in the leaked slide where the "Architected to exceed 3 GHz - Industry 1st" reference was shown.

Or am I missing something? :)
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
AMD was not as strong in the crypto market this time. Even miners preferred Nvidia; GCN was much better for crypto than RDNA. But yeah, AMD cards also benefited from the crypto boom, just nowhere near as much as Nvidia's. I doubt they were taking it into consideration when designing the chips and prioritizing high-end cards again (and even that isn't established fact).

Although not all mining algorithms are equal, the most popular, like Ethereum's, were designed to be ASIC-resistant by shifting the bottleneck to the memory system instead of the raw compute power that earlier algorithms stressed.

This made Nvidia cards, with their wider memory buses and faster GDDR6X memory, preferable, since extra cache does very little to improve mining performance for those coins.

This time around it's actually AMD who'd probably be the target of miners. The 4080 only has a 256-bit bus, so it still has less bandwidth than the 7900 XT even though that card has slower GDDR6 memory. The 7900 XTX has about as much bandwidth as a 4090 at about 60% of the price.
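For reference, a quick sketch of the raw bandwidths behind that comparison; the memory speeds are the commonly cited launch figures and should be treated as assumptions worth double-checking.

```python
# Sketch: raw VRAM bandwidth = bus width (bits) / 8 * data rate (Gbps).
# Memory speeds are the commonly cited launch figures (assumed, worth verifying).
cards = {
    "RTX 4080":    (256, 22.4),  # GDDR6X
    "RX 7900 XT":  (320, 20.0),  # GDDR6
    "RX 7900 XTX": (384, 20.0),  # GDDR6
    "RTX 4090":    (384, 21.0),  # GDDR6X
}
for name, (bus_bits, gbps) in cards.items():
    print(f"{name:12s} {bus_bits / 8 * gbps:6.1f} GB/s")
# ~717, ~800, ~960, ~1008 GB/s respectively
```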

3000-series cards would probably be the main beneficiary, as they'd be just as good as (or better than) this generation of cards in most cases. Nvidia could probably mop up by removing LHR limiters and producing more Ampere on Samsung 8nm, since it doesn't interfere with Ada production for the most part.