Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,632
5,959
146

Joe NYC

Golden Member
Jun 26, 2021
1,948
2,288
106
I think the chances of a respin are inversely proportional to how close RDNA 4 is. I believe these generational teams work mostly independently, so if RDNA 4 is going well and can maybe even be pushed up a bit, why respin? Maybe it's a dream, but if it's 18 months off…

I think the chances of a re-spin are proportional to whether the fix is successfully applied to Navi 32 and Navi 32 hits 3 GHz+ frequencies with decent power efficiency.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,605
5,795
136
AMD told the press the 7900 XTX design will go above 3 GHz, claimed as an industry first, so why are folks doubling down on the opposite?
AMD has consistently been conservative and has largely delivered on its performance claims lately, be it CPUs or GPUs.
It would be weird to say it can go to 3+ GHz if they knew it won't. Give it time; we are 1+ month away from launch.

And a "problem fixed on N32" contradicts what was being said about all current RDNA3 chips having the problem.
On top of that, N33 is not even on a new process node, at least if you believe the rumor that this is a process-related problem à la R520.
AMD knows N6/N7 inside and out; think of how many chips they have industrialized on this process.

To me it seems like a simpler problem, but of course a bigger problem for the corporation. How could they fail to execute? They have plastered +50% perf/W gains all over their corporate presentations, and running it at 3+ GHz would make it fall short of that. Can't let that happen.

So let the AIBs deal with the efficiency problem and provide the performance, while the reference cards run at efficient levels to look good on the corporate slides.

Anyway, it is quite difficult to discern what is true, what is disinformation, and what is just plain fiction from click farmers.
 
Last edited:

Rekluse

Junior Member
Sep 16, 2022
19
18
41
I remember Charlie from SemiAccurate giving a pretty comprehensive timetable for how long a re-spin takes.

I cannot remember it now.

Curious whether 9-12 months is enough, since RDNA4 is due 18-24 months from now.
 

biostud

Lifer
Feb 27, 2003
18,251
4,764
136
I have not really started seriously planning. Just in general:

B650E motherboard
hopefully a PCIe Gen5 M.2 drive
8-core Zen 4 V-Cache CPU
Navi 32 based card
Probably pedestrian-speed DDR5, 32 GB
I will need a new PSU, and may as well get a new case and leave the old PC intact as a backup.
So we are in the same boat.

As I have a dedicated sound card, I'm going for the most basic B650E board with a PCIe 5.0 x16 slot and a PCIe 5.0 NVMe slot. I'm going for pure air cooling.

So far:

ASRock B650E PG Riptide WiFi
DeepCool AK620 CPU cooler (has better reviews than DarkRock and Noctua)
Thermal Grizzly Kryonaut
Fractal Design Torrent (a bit on the expensive side, so I might find a cheaper alternative)

And otherwise same as you :)
 

majord

Senior member
Jul 26, 2015
433
523
136
While it will compete with

It is simply that you would expect x900 vs x090, x800 vs x080, etc. So when that does not match, it looks like something is wrong.

It's not like AMD hasn't known performance and clocks for some time. There is a chance of some minor shipping-clock adjustments after nailing down names, but nothing drastic.

They don't seem to be following that nomenclature 'rule' anyway; both SKUs are 7900s, yet there aren't two levels of the 4090 tier (yet).
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
When AMD prices too close to Nvidia based on raster performance, everyone screams they need to lower prices because the RT performance and feature set can't compete... 'they'll never gain market share', 'disrupt the market', etc., etc.

AMD comes in undercutting Nvidia significantly:

"Something's wrong"
"it must be even slower than the 4080 in rasterization"
"it should be renamed and dropped to $949"

You guys are funny.
.......
Bit of a reality check, people: raster perf and perf/watt are looking fine. Not amazing, not matching the random rumors started by morons, sure, but all things in the real world considered... fine.
Price is great for that raster performance, and raster is also good. I don't think there is a need for an additional price cut.
If AMD doesn't plan any stronger chip, which they most likely don't, then I don't think it matters if N31 stays as the RX 79** or RX 78** series.
As for the chip/architecture itself, the only thing that's "wrong" is the RT performance. Yet everyone's fixated on the clock speeds not being through the roof and not beating the s**t out of the 4090 (even at a mere 355 W), and therefore it must have been botched. It's a Fermi, it's an R520...

Hello? Since when is a 50% increase in perf/watt and a 60% increase in performance vs a predecessor "botched"?

It's still a huge uplift over RDNA2 at the end of the day. It's also the first-gen chiplet architecture, which no doubt has presented a host of challenges and wouldn't come without some compromise.

Comparing to Nvidia's gen-on-gen: they've gone from an inferior Samsung 8nm process to a superior custom '4nm' process, so you can't even draw any parallels there either. It was always going to be a challenge to maintain the status quo with Nvidia this gen because of that.
That 54% perf/W claim was comparing the 6900XT vs the 7900XTX, both at 300 W.
If they compared 300 W vs 355 W, then it would be less, but to be fair, AMD could have compared the 6950XT (335 W) vs the 7900XTX (355 W) and it would probably still come out similar to 54%.
Even if the increase were only 40% at ISO power, I couldn't say it was botched.
OK, I expected a lot more based on the 54% increase from RDNA -> RDNA2 and Zen 4's >15% perf improvement, but that's my fault.
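To put rough numbers on that point, here is a back-of-the-envelope normalization, purely a sketch: the ~1.6x raw performance figure is an assumption taken from the "60% increase" mentioned above, not a measurement.

```python
# Rough iso-power vs. full-power perf/W comparison (illustrative numbers only).
# Assumption: the 7900 XTX lands ~1.6x the 6900 XT in raster at full board power.
perf_uplift = 1.60        # assumed raw performance ratio, 7900 XTX vs 6900 XT
power_6900xt = 300        # W, 6900 XT board power
power_7900xtx = 355       # W, 7900 XTX board power

# Comparing at full board power instead of capping both cards at 300 W:
perf_per_watt_gain = perf_uplift / (power_7900xtx / power_6900xt)
print(f"perf/W gain at full board power: {perf_per_watt_gain:.2f}x")  # ~1.35x
```

So the same card that shows ~1.5x perf/W in a 300 W vs 300 W comparison only shows ~1.35x when each card runs at its own board power, which is why the framing of the comparison matters.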
RT is weak, no surprise there, but that's also due to the relatively low number of CUs (WGPs) and the clocks.
Based on leaks, not sure how accurate, there really is a problem affecting the clock speed, and whether the final clock speed ends up at 2.5 GHz or 3 GHz would mean up to 20% higher performance.

I am still a bit sad they didn't make a bigger GCD.

Based on Locuza's table posted by DisEnchantment, I made a table for bigger GCDs. Keep in mind it still assumes only 384-bit GDDR6 + 96 MB IC, so the interconnects in the GCD, which use up a lot of space, stay the same.
GCD size | CUs (WGPs) | Shaders | TMUs | ROPs
300 mm²  | 96 (48)    | 12288   | 384  | 192
360 mm²  | 128 (64)   | 16384   | 512  | 256
410 mm²  | 160 (80)   | 20480   | 640  | 256
I personally like the middle one the best.
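For transparency, the shader and TMU columns above follow directly from the WGP count; here is a minimal sketch, assuming RDNA3's 2 CUs per WGP, 128 ALUs ("shaders") and 4 TMUs per CU. ROPs and die size do not scale this simply, since they track shader-engine count and fixed interconnect/analog area.

```python
# Sketch: derive the shader/TMU columns above from the WGP counts.
# Assumptions: 2 CUs per WGP, 128 ALUs ("shaders") and 4 TMUs per CU (RDNA3-style).
def scale_gcd(wgps: int) -> dict:
    cus = wgps * 2
    return {"WGPs": wgps, "CUs": cus, "Shaders": cus * 128, "TMUs": cus * 4}

for wgps in (48, 64, 80):  # the three hypothetical configs in the table
    print(scale_gcd(wgps))
# 48 WGPs ->  96 CUs, 12288 shaders, 384 TMUs
# 64 WGPs -> 128 CUs, 16384 shaders, 512 TMUs
# 80 WGPs -> 160 CUs, 20480 shaders, 640 TMUs
```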
 
Last edited:
  • Like
Reactions: Tlh97 and Kaluan

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
Seems like OC potential is limited by the silicon bug:
....
Some potential good news for those waiting for N32:
A little bit more info. Take it with a grain of salt since it's not coming from the horse's mouth:
That's interesting. There is not just the problem of low achievable clocks, but also the problem of too-high power consumption.
Achieving 2.8 GHz at 450 W provides only 5% better performance? The performance/frequency scaling is pretty bad, and even clocks >3 GHz won't provide that much more performance.
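To put a number on how poor that scaling would be, a quick hedged estimate, assuming the leak is accurate and taking the 7900 XTX's ~2.5 GHz boost as the baseline (an assumption, not a measured figure):

```python
# Sketch: implied clock-to-performance scaling from the rumored OC result.
base_clock = 2.5   # GHz, assumed 7900 XTX boost baseline
oc_clock = 2.8     # GHz, rumored OC at 450 W
perf_gain = 0.05   # rumored +5% performance at that OC

clock_gain = oc_clock / base_clock - 1      # +12% clock
scaling = perf_gain / clock_gain            # ~0.4x of ideal linear scaling
print(f"clock +{clock_gain:.0%}, perf +{perf_gain:.0%}, scaling ~{scaling:.2f}")

# If that factor held, even 3.2 GHz (+28% clock) would only be ~ +12% performance.
print(f"3.2 GHz estimate: +{(3.2 / base_clock - 1) * scaling:.0%} perf")
```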
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
I had an interesting idea. Instead of having 6 MCDs + 24 GB of GDDR6 VRAM, what if AMD used HBM instead? Would HBM still be more expensive than what AMD used in RDNA3?
Would it have a worse bandwidth/W ratio than what RDNA3 currently has?

Link to Hynix's HBM development history.
Card/Standard | Memory size                       | Memory width            | Speed    | Bandwidth
RX 7900 XTX   | 24 GB                             | 384-bit                 | 20 GT/s  | 960 GB/s
Radeon VII    | 16 GB (4 × 4 GB stacks)           | 4096-bit (4 × 1024-bit) | 2 GT/s   | 1024 GB/s
HBM2E JEDEC   | up to 48 GB (4 × 12 GB per stack) | 4096-bit (4 × 1024-bit) | 2.4 GT/s | 1228 GB/s
HBM2E Hynix   | up to 64 GB (4 × 16 GB per stack) | 4096-bit (4 × 1024-bit) | 3.6 GT/s | 1843 GB/s
HBM3 Hynix    | up to 96 GB (4 × 24 GB per stack) | 4096-bit (4 × 1024-bit) | 6.4 GT/s | 3277 GB/s

OK, the 96 MB 2nd-gen Infinity Cache alone provides 4340 GB/s, and that's more than even a 4-stack HBM3 setup can provide, but there is still the hit-rate penalty. N22 with 96 MB of IC had a ~53% hit rate at 4K; on a miss, you have only 960 GB/s.
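As a rough illustration of why the hit rate matters, here is a hedged sketch of "effective" bandwidth using the figures above. The weighted blend is a simplification, assuming hits are served by the cache and misses fall through to VRAM; real behaviour is workload-dependent.

```python
# Sketch: raw VRAM bandwidth and a simple cache-hit-weighted "effective" bandwidth.
def vram_bw(bus_bits: int, gtps: float) -> float:
    # bandwidth in GB/s = bus width (bits) / 8 * transfer rate (GT/s)
    return bus_bits / 8 * gtps

gddr6_bw = vram_bw(384, 20)    # 960 GB/s, 7900 XTX-style 384-bit GDDR6
hbm2e_bw = vram_bw(4096, 3.6)  # ~1843 GB/s, 4-stack HBM2E
ic_bw = 4340                   # GB/s, quoted 2nd-gen Infinity Cache bandwidth
hit_rate = 0.53                # ~4K hit rate quoted above for a 96 MB cache

def effective_bw(cache_bw: float, mem_bw: float, hit: float) -> float:
    # Naive blend: hits served from cache, misses fall through to VRAM.
    return hit * cache_bw + (1 - hit) * mem_bw

print(f"GDDR6 + IC:   ~{effective_bw(ic_bw, gddr6_bw, hit_rate):.0f} GB/s")  # ~2751
print(f"HBM2E, no IC: ~{hbm2e_bw:.0f} GB/s")                                 # ~1843
```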
[Image: Navi-2-1536x864.jpg]
You would have only 5 chips (1 GCD + 4 HBM stacks) instead of 19 chips (1 GCD + 6 MCDs + 12 GDDR6 chips) on a card, and the GCD could be even smaller thanks to fewer interconnects.
So I wonder if even 32 GB of HBM2E VRAM (4 × 8 GB stacks) with 1843 GB/s wouldn't be good enough to replace it. Would it cost more? Would it use more of the power budget?
What do you think?
 
Last edited:

DisEnchantment

Golden Member
Mar 3, 2017
1,605
5,795
136
RT is weak, no surprise there, but that's also due to the relatively low number of CUs (WGPs) and the clocks.
I am still a bit sad they didn't make a bigger GCD.

I personally like the middle one the best.
You can see that if they went for 128 CUs, they would get 128 ray units like the 4090 for a 360 mm² GCD, for a combined die size of 582 mm², which is still less than AD102.
If you consider that roughly 50 mm² of that is IF interconnect area, the die area for the actual GPU is even smaller.

Well, with 128 ray units, and with the 3 GHz boost working as shown in that slide, for RT perf...
1.5x RT perf/CU * 1.6x CUs * 1.25x clock = 3x gain over the 6900XT, which puts it in 4090-tier RT.
So overall it is not so bad from an architecture perspective; the N31 product is simply not scaled high enough to compete in the same segment.
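For readers following along, here is a minimal sketch of where those factors come from; the 1.5x per-CU RT figure is the claim used in the post above, and the CU and clock ratios assume a hypothetical 128 CU / 3 GHz part against the 80 CU / ~2.4 GHz 6900 XT.

```python
# Sketch: multiplicative RT scaling estimate for a hypothetical 128 CU N31 at 3 GHz.
rt_per_cu = 1.5          # claimed RDNA3 per-CU RT improvement over RDNA2
cu_ratio = 128 / 80      # hypothetical 128 CUs vs the 6900 XT's 80 CUs -> 1.6x
clock_ratio = 3.0 / 2.4  # assumed 3 GHz vs ~2.4 GHz boost -> 1.25x
print(f"~{rt_per_cu * cu_ratio * clock_ratio:.1f}x RT gain over the 6900 XT")  # ~3.0x
```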

But if you are not shopping in the 1500 USD range, does it matter? So why feel sad about it?

That's interesting. There is not just the problem of low achievable clocks, but also the problem of too-high power consumption.
Achieving 2.8 GHz at 450 W provides only 5% better performance? The performance/frequency scaling is pretty bad, and even clocks >3 GHz won't provide that much more performance.
All of this makes no sense.

Let's say the 7900XT is gimped to a max 2.4 GHz boost, but N32 is not and is 'fixed' according to that rumor.
Then a high-boosting N32 at 3.2 GHz+ will match it in performance while costing less. Not sure if bandwidth will be an issue for these chips; they have plenty of it.
Will AMD launch a gimped 84 CU 7900XT at 2.4 GHz only to be beaten by a 7700XT/7800XT at 3.2 GHz+?
How will the 7900XT buyer receive that?
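A back-of-the-envelope FP32 check of that scenario, purely illustrative: the 60 CU N32 config is the rumored 3 SE × 10 WGP layout discussed in this thread, and the 128 dual-issue FP32 ALUs per CU is an RDNA3 assumption.

```python
# Sketch: would a 3.2 GHz N32 roughly match a 2.4 GHz, 84 CU 7900 XT in raw FP32?
# Assumptions: N32 = 60 CUs (3 SEs x 10 WGPs x 2 CUs/WGP), RDNA3 CU = 128 FP32
# ALUs (dual-issue), FLOPS = ALUs * 2 (FMA) * clock.
def tflops(cus: int, ghz: float) -> float:
    return cus * 128 * 2 * ghz / 1000

print(f"N32 (60 CU) @ 3.2 GHz:     {tflops(60, 3.2):.1f} TFLOPS")  # ~49.2
print(f"7900 XT (84 CU) @ 2.4 GHz: {tflops(84, 2.4):.1f} TFLOPS")  # ~51.6
```

Under those assumptions the two end up within a few percent of each other in raw compute, which is why the product-stack question above is awkward.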
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
You can see that if they went for 128 CUs, they would get 128 ray units like the 4090 for a 360 mm² GCD, for a combined die size of 582 mm², which is still less than AD102.
If you consider that roughly 50 mm² of that is IF interconnect area, the die area for the actual GPU is even smaller.

Well, with 128 ray units, and with the 3 GHz boost working as shown in that slide,
1.5x RT perf/CU * 1.6x CUs * 1.25x clock = 3x gain over the 6900XT, which puts it in 4090-tier RT.
So overall it is not so bad from an architecture perspective; the N31 product is simply not scaled high enough to compete in the same segment.

But if you are not shopping in the 1500 USD range, does it matter? So why feel sad about it?


All of this makes no sense.

Let's say the 7900XT is gimped to a max 2.4 GHz boost, but N32 is not and is 'fixed' according to that rumor.
Then a high-boosting N32 at 3.2 GHz+ will match it in performance while costing less. Not sure if bandwidth will be an issue for these chips; they have plenty of it.
Will AMD launch a gimped 84 CU 7900XT at 2.4 GHz only to be beaten by a 7700XT/7800XT at 3.2 GHz+?
How will the 7900XT buyer receive that?
The architecture is good, just the released product is a bit underwhelming, but the price is very good. AMD was too conservative with specs, in my opinion.

You are right, this price bracket is too high for what I am willing to invest in my casual gaming hobby, not to mention I need a portable machine due to my work.
I am sad because I am a die-hard hardware fan, and I am a bit partial to (i.e., prefer) AMD. :D

Let's say N32 really is capable of 3.2 GHz clocks and it scales almost linearly, which is pretty questionable based on leaks.
AMD can artificially set the clocks lower so N32 won't be too close to the 7900XT, or they can set a higher price; the amount of VRAM is 16 GB, so that won't be criticized either.
I am more interested in performance scaling, because N32 should have only 3 shader engines with 10 WGPs per shader engine, unlike N31 with 6 shader engines and 8 WGPs per SE.
 
Last edited:
  • Like
Reactions: scineram

DisEnchantment

Golden Member
Mar 3, 2017
1,605
5,795
136
Yeah, that looks strange. Is it really possible to identify the issue and fix it in N32 within a reasonable timeframe?
Spinning out new GPU revisions does not take as long as it does for CPUs. (For background, we also make some ASICs using Arm cores on the N7 family.)
I have seen two quarters mentioned for a tapeout; a respin takes less than that.
Because there is not as much V&V behind it as with CPUs, it also takes far less time to industrialize once it comes out of the fab. Compare that to CPUs, which can take more than two years for server processors.
You can ship GPUs with several HW bugs and fix them with "driver updates". Just look at the 5700XT.

Let's say N32 really is capable of 3.2 GHz clocks and it scales almost linearly, which is pretty questionable.
AMD can artificially set the clocks lower so N32 won't be too close to the 7900XT, or they can set a higher price; the amount of VRAM is 16 GB, so that won't be criticized either.
I am more interested in performance scaling when N32 should have only 3 shader engines with 10 WGPs per shader engine.
This clock limiting also makes no sense, because it prevents AMD from matching a cheaper NV offering with a smaller die.
So something does not add up in all of this.

I am inclined to believe AMD that N31 can scale to 3 GHz+, at least until the product launches and we discover it is a dud.
And knowing Scott Herkelman's history, he likes to play jebait games.
 
Last edited:

Timorous

Golden Member
Oct 27, 2008
1,615
2,772
136
You can see that if they went for 128 CUs, they would get 128 ray units like the 4090 for a 360 mm² GCD, for a combined die size of 582 mm², which is still less than AD102.
If you consider that roughly 50 mm² of that is IF interconnect area, the die area for the actual GPU is even smaller.

Well, with 128 ray units, and with the 3 GHz boost working as shown in that slide, for RT perf...
1.5x RT perf/CU * 1.6x CUs * 1.25x clock = 3x gain over the 6900XT, which puts it in 4090-tier RT.
So overall it is not so bad from an architecture perspective; the N31 product is simply not scaled high enough to compete in the same segment.

But if you are not shopping in the 1500 USD range, does it matter? So why feel sad about it?


All of this makes no sense.

Let's say the 7900XT is gimped to a max 2.4 GHz boost, but N32 is not and is 'fixed' according to that rumor.
Then a high-boosting N32 at 3.2 GHz+ will match it in performance while costing less. Not sure if bandwidth will be an issue for these chips; they have plenty of it.
Will AMD launch a gimped 84 CU 7900XT at 2.4 GHz only to be beaten by a 7700XT/7800XT at 3.2 GHz+?
How will the 7900XT buyer receive that?

A decently clocked N32 would have more flops but less bandwidth, so it could be the 5600 XT vs 5700 situation over again: the former has more flops, but at 4K the bandwidth difference matters and creates differentiation.
 

exquisitechar

Senior member
Apr 18, 2017
657
871
136
This clock limiting also makes no sense, because it prevents AMD from matching a cheaper NV offering with a smaller die.
So something does not add up in all of this.

I am inclined to believe AMD that N31 can scale to 3 GHz+, at least until the product launches and we discover it is a dud.
Didn't they claim up to 25% power savings from the decoupled clocks? It looks like it will scale terribly with more power, which has been corroborated by admittedly questionable sources. Even if it does reach 3 GHz, it will do so at ridiculous power consumption, and I think we can forget about it exceeding that. Maybe the boost clocks in the specifications are basically fake/conservative, it's already not as far from 3 GHz as they imply, and the performance is simply underwhelming.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
This clock limiting also makes no sense, because it prevents AMD from matching a cheaper NV offering with a smaller die.
So something does not add up in all of this.
If this is all true, then it looks like they can't fix N31 without a new revision, so what else can they do? Either limit clocks and OC, or set a higher price for N32.

I am inclined to believe AMD that N31 can scale to 3 GHz+, at least until the product launches and we discover it is a dud.
And knowing Scott Herkelman's history, he likes to play jebait games.
AMD likes to save on silicon cost and use higher frequencies if possible, so I am also inclined to believe RDNA3 should have clocked higher.
 
Last edited:
  • Like
Reactions: Kaluan

SteinFG

Senior member
Dec 29, 2021
420
472
106
If it's actually a bug, it would be funny to see how AMD words it after ~6 months, or however long it takes to tape out another design.
Introduce new 7950 XT and XTX cards at higher prices, then discontinue the cheaper 7900 XT/XTX? Hahahah.
And if N32 gets fixed and reaches 3 GHz, it will probably perform on par with the 7900XT at its ~2 GHz.
But pricing the 7800XT on par with the 7900XT would be a PR nightmare. What a mess for them. XD
 

biostud

Lifer
Feb 27, 2003
18,251
4,764
136
If it's actually a bug, it would be funny to see how AMD words it after ~6 months, or however long it takes to tape out another design.
Introduce new 7950 XT and XTX cards at higher prices, then discontinue the cheaper 7900 XT/XTX? Hahahah.
And if N32 gets fixed and reaches 3 GHz, it will probably perform on par with the 7900XT at its ~2 GHz.
But pricing the 7800XT on par with the 7900XT would be a PR nightmare. What a mess for them. XD

They can just do a fall '23 refresh with a 7950XTX/XT clocked 20% higher at a 10% price increase (if 3+ GHz is really a possibility).
 
  • Like
Reactions: Tlh97 and Joe NYC

JayMX

Member
Oct 18, 2022
31
73
51
@DisEnchantment
Isn't this where he speaks about the 3 GHz capability of the architecture and says he does not think the chiplet design is holding back clocks? It is the same as in the leaked slide where the "Architected to exceed 3 GHz - Industry 1st" reference was shown.

Or am I missing something? :)
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
AMD was not as strong in the crypto market this time. Even miners preferred Nvidia; GCN was much better for crypto than RDNA. But yeah, AMD cards also benefited from the crypto boom, just nowhere near as much as Nvidia's. I doubt they were taking it into consideration when designing the chips and prioritizing high-end cards again (and even that isn't established fact).

Although not all mining algorithms are equal, the most popular, like Ethereum's, were designed to be ASIC-resistant by shifting the bottleneck to the memory system instead of the raw compute power that earlier algorithms stressed.

This made Nvidia cards, with their wider memory buses and faster GDDR6X memory, preferable, since extra cache does very little to improve mining performance for those coins.

This time around it's actually AMD who'd probably be the target of miners. The 4080 only has a 256-bit bus, so it still has less bandwidth than the 7900 XT even though that card has slower GDDR6 memory. The 7900 XTX has about as much bandwidth as a 4090 at about 60% of the price.
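For reference, a quick sketch of the raw bandwidths behind that comparison; the memory speeds are the commonly cited launch figures and should be treated as assumptions worth double-checking.

```python
# Sketch: raw VRAM bandwidth = bus width (bits) / 8 * data rate (Gbps).
# Memory speeds are the commonly cited launch figures (assumed, worth verifying).
cards = {
    "RTX 4080":    (256, 22.4),  # GDDR6X
    "RX 7900 XT":  (320, 20.0),  # GDDR6
    "RX 7900 XTX": (384, 20.0),  # GDDR6
    "RTX 4090":    (384, 21.0),  # GDDR6X
}
for name, (bus_bits, gbps) in cards.items():
    print(f"{name:12s} {bus_bits / 8 * gbps:6.1f} GB/s")
# ~717, ~800, ~960, ~1008 GB/s respectively
```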

3000-series cards would probably be the main beneficiary, as they'd be just as good as (or better than) this generation of cards in most cases. Nvidia could probably mop up by removing LHR limiters and producing more Ampere on Samsung 8nm, since it doesn't interfere with Ada production for the most part.