Question Speculation: RDNA3 + CDNA2 Architectures Thread



Timorous

Golden Member
Oct 27, 2008
1,607
2,747
136
Just to get a rough idea of the RT performance of the RX7900XTX:

I'll use the AMD slide and the Kitguru RTX 4080 review, since I find it to have similar fps on the RX 6950XT to AMD's own slides.

Unfortunately the review only has CP77 and Resident Evil.


Cyberpunk 77 4K RayTracing (no DLSS/FSR)


4080 = 28.1fps

7900XTX = 21fps

6950XT = 13fps



Resident Evil : Village 4K RayTracing (no DLSS/FSR)

4080 = 138.4fps

7900XTX = 135fps

6950XT = 92fps

RADEON-RX-7900-2.jpg


Cyber-DXR3-768x768.png


REVDXR3-768x768.png

This is always tricky to judge, but I think if we go by the deltas between the 6950XT and the 7900XTX in the slides AMD showed we can get some ballpark info.

Using these numbers the XTX gets 134 fps in RE:V and 19.9 fps in CP2077.

At Techspot we see Dying Light hit 36 fps vs the 4080's 39 fps, and CP2077 there gets 21 fps vs the 4080's 31 fps.

At TPU the same scaling would get a 7900XTX to 20.6 fps vs the 4080's 29 fps. RE:V is 120.6 vs 120.6.

So CP2077 will probably be a big win for the 4080 in RT, but RE:V and Dying Light look pretty close, with the edge towards the 4080 but by far less than the price increase.

At 4K the 4080 does seem to lose a bit more performance when turning RT on than the 4090 does.
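Back-of-napkin version in Python, if anyone wants to poke at the numbers. The slide fps values are the ones above; the review_6950xt values are just placeholders I picked so the output lines up with the estimates quoted in this post, not numbers pulled from the review itself.

```python
# Sketch of the estimate above: scale a review's measured 6950XT result by the
# 6950XT -> 7900XTX uplift implied by AMD's own slides. Not measured data.

def project_7900xtx(slide_6950xt, slide_7900xtx, review_6950xt):
    """Apply AMD's slide uplift to a third-party 6950XT number."""
    return review_6950xt * (slide_7900xtx / slide_6950xt)

# Cyberpunk 2077, 4K RT, no upscaling: AMD slide shows 13 -> 21 fps (~1.6x)
print(project_7900xtx(13, 21, review_6950xt=12.3))   # placeholder review value -> ~19.9 fps

# RE: Village, 4K RT: AMD slide shows 92 -> 135 fps (~1.47x)
print(project_7900xtx(92, 135, review_6950xt=91.0))  # placeholder review value -> ~133.5 fps
```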
 
  • Like
Reactions: Kaluan

GodisanAtheist

Diamond Member
Nov 16, 2006
6,783
7,115
136
With a glut of N6 capacity, AMD can crank these out and sell them at competitive prices.

- Ok, looks like we're back to N33 having 2048 Dual Pumped SPs, so take a 6600xt with a 17.5% IPC bump and that's essentially what we're looking at.

No way N33 hits 6900xt or even 6800 levels, I think. It may be closer to a 6750xt at the end of the day, but it will go for peanuts thanks to a basic design and a cheap manufacturing process.

Edit: let's say AMD gets it to clock to 2.8-3.0 GHz, in which case we're looking at a solid 6800 competitor for likely $399. Not shabby.
 
Last edited:

Joe NYC

Golden Member
Jun 26, 2021
1,934
2,272
106
- Ok, looks like we're back to N33 having 2048 Dual Pumped SPs, so take a 6600xt with a 17.5% IPC bump and that's essentially what we're looking at.

No way N33 hits 6900xt or even 6800 levels, I think. It may be closer to a 6750xt at the end of the day, but it will go for peanuts thanks to a basic design and a cheap manufacturing process.

Edit: let's say AMD gets it to clock to 2.8-3.0 GHz, in which case we're looking at a solid 6800 competitor for likely $399. Not shabby.

N33 does not have to beat N21. That was a fantasy scenario that was hard to believe; if for no other reason, the best-case scenario would be 1/2 of the memory bandwidth of N21.

With a die size of ~N23 and the cost of N23, AMD can sell these in the $300 range.
 

Kaluan

Senior member
Jan 4, 2022
500
1,071
96
This is always tricky to judge, but I think if we go by the deltas between the 6950XT and the 7900XTX in the slides AMD showed we can get some ballpark info.

Using these numbers the XTX gets 134 fps in RE:V and 19.9 fps in CP2077.

At Techspot we see Dying Light hit 36 fps vs the 4080's 39 fps, and CP2077 there gets 21 fps vs the 4080's 31 fps.

At TPU the same scaling would get a 7900XTX to 20.6 fps vs the 4080's 29 fps. RE:V is 120.6 vs 120.6.

So CP2077 will probably be a big win for the 4080 in RT, but RE:V and Dying Light look pretty close, with the edge towards the 4080 but by far less than the price increase.

At 4K the 4080 does seem to lose a bit more performance when turning RT on than the 4090 does.
Native 4K RT is a relevant metric, but realistically no one would run that; no one buys $1K+ GPUs to play at ~30 FPS.

Almost everyone who really wants RT at 4K would use upscaling tech to achieve good framerates (where applicable).

This is why how N31 (and AD103) performs at native 1080p-1440p RT would be more relevant; just add an approximation of the FSR/DLSS overhead. But we have zero sub-4K numbers from AMD, so eh.
N33 does not have to beat N21. That was a fantasy scenario that was hard to believe; if for no other reason, the best-case scenario would be 1/2 of the memory bandwidth of N21.

With a die size of ~N23 and the cost of N23, AMD can sell these in the $300 range.
The N33* die itself should be quite a bit cheaper than N33, it's <86% of N23 and on N6 (which is allegedly cheaper to produce than N7/N7+/N7P).

IDK the GDDR clocks, but even if they go 20Gbps instead of 18 (I don't think they'll use 16 again), the costs likely won't be obscene for an 8GB setup.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
The N33 die itself should be quite a bit cheaper than N23, it's <86% of N23 and on N6 (which is allegedly cheaper to produce than N7/N7+/N7P).
Fixed It for you.
- Ok, looks like we're back to N33 having 2048 Dual Pumped SPs, so take a 6600xt with a 17.5% IPC bump and that's essentially what we're looking at.

No way N33 hits 6900xt or even 6800 levels, I think. It may be closer to a 6750xt at the end of the day, but it will go for peanuts thanks to a basic design and a cheap manufacturing process.

Edit: let's say AMD gets it to clock to 2.8-3.0 GHz, in which case we're looking at a solid 6800 competitor for likely $399. Not shabby.
RX 6600 XT has a 2359 MHz game frequency.
N33 -> 2800 MHz gaming frequency.
The IPC gain should be lower than 17.5%, let's say 15%.
A gaming frequency of 2.8GHz would mean 100 * 1.15 * (2800/2359) = ~137%.

TPU   | RX 6600 XT | RX 6700 XT | RX 6800
1080p | 100%       | 122%       | 141%
1440p | 100%       | 127%       | 155%
2160p | 100%       | 141%       | 181%

Hopefully they won't go over $399; then it will be a good card.
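Same napkin math as a quick Python sketch, for both ends of the clock range discussed above. The 15% IPC figure and the clocks are the assumptions from this post, nothing official.

```python
# Relative performance vs the RX 6600 XT from an assumed IPC gain and game clock.
def relative_perf(new_clock_mhz, ipc_gain, base_clock_mhz=2359):
    return (1.0 + ipc_gain) * (new_clock_mhz / base_clock_mhz)

print(f"{relative_perf(2800, 0.15):.1%}")  # ~136.5% of a 6600 XT at 2.8 GHz
print(f"{relative_perf(3000, 0.15):.1%}")  # ~146% at 3.0 GHz
```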
 

uzzi38

Platinum Member
Oct 16, 2019
2,622
5,880
146
The N33 die itself should be quite a bit cheaper than N23, it's <86% of N23 and on N6 (which is allegedly cheaper to produce than N7/N7+/N7P).
PS: N6 is actually ever so slightly more expensive for a given die area, but not by much. Like, cost/transistor is basically flat from N7.

Either way, your point is still correct, N33 should be cheaper to produce than N23.
 
  • Like
Reactions: Tlh97 and Kaluan

Timorous

Golden Member
Oct 27, 2008
1,607
2,747
136
PS: N6 is actually ever so slightly more expensive for a given die area, but not by much. Like, cost/transistor is basically flat from N7.

Either way, your point is still correct, N33 should be cheaper to produce than N23.

TSMC were pushing companies to use N6 because it has fewer steps, so higher WPM. They were offering discounts to move to that node vs N7, so unless TSMC have jacked up prices now that more people have moved, I am not sure this is actually correct.
 
  • Like
Reactions: Tlh97 and KompuKare

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
TSMC were pushing companies to use N6 because it has fewer steps, so higher WPM. They were offering discounts to move to that node vs N7, so unless TSMC have jacked up prices now that more people have moved, I am not sure this is actually correct.
Jacked-up prices of N6 forcing people to stay on N7, and then N7/N6 utilization at 50%? Some things are not adding up.
Capacity utilization rates for TSMC's 7nm process platform and its process variants N6, N7/N6 have fallen below 50%, according to industry sources.
DigiTimes usually downplays pretty much everyone else other than TSMC, but their TSMC info is usually very good.

BTW we also have ASICs on N7, but it's meaningless to disclose anything (even if I wanted to) because everyone pays a different rate for wafers. It is not a commodity item you buy from the fish market; you enter into a business contract for wafer supply. You take a penalty if you don't accept delivery of the wafers, and vice versa.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Jacked-up prices of N6 forcing people to stay on N7, and then N7/N6 utilization at 50%? Some things are not adding up.

DigiTimes usually downplays pretty much everyone else other than TSMC, but their TSMC info is usually very good.

BTW we also have ASICs on N7, but it's meaningless to disclose anything (even if I wanted to) because everyone pays a different rate for wafers. It is not a commodity item you buy from the fish market; you enter into a business contract for wafer supply. You take a penalty if you don't accept delivery of the wafers, and vice versa.
Can you at least say if it's going up or down? ;)
 

Kaluan

Senior member
Jan 4, 2022
500
1,071
96
Fixed It for you.

RX 6600 XT has a 2359 MHz game frequency.
N33 -> 2800 MHz gaming frequency.
The IPC gain should be lower than 17.5%, let's say 15%.
A gaming frequency of 2.8GHz would mean 100 * 1.15 * (2800/2359) = ~137%.

TPU   | RX 6600 XT | RX 6700 XT | RX 6800
1080p | 100%       | 122%       | 141%
1440p | 100%       | 127%       | 155%
2160p | 100%       | 141%       | 181%

Hopefully they won't go over $399; then it will be a good card.
Thanks, was incredibly tired and sleepy when I wrote that.

RDNA3 CU scaling is also a factor that's hard to account for (so far, from 1st party data, N31 5376 SP to N31 6144 SP seems to scale very well). Then there's the reduced VGPR size on N33, which may or may not affect performance on lower CU count designs like it.

RX 6800 perf @ 1080p to 1440p @ ~170W is probably a safe bet, for now. If they price it a bit more aggressively than the N23 launch MSRP (and considering inflation), I keep thinking they could finally have a true Polaris successor on the market.
 

PJVol

Senior member
May 25, 2020
533
446
106
Is it me, or do the recent slides clearly indicate the N31 raster performance (unless the math below is flawed somewhere)?
80CU -> 96CU - 1.20x
IPC - 1.174x (RX-810 endnote, "minimal uplift" based on tests in ~20 games and 5 synthetic benchmarks)
Clocks - 1.1x (~10%; I added 1% to the 9% boost clock increase from 6950XT -> 7900XTX)
--------------------------
Total: +55% performance relative to the 6950XT
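Spelled out, the same factors multiply to the +55% figure:

```python
cu_scaling = 96 / 80   # 1.20x more CUs
ipc        = 1.174     # from the RX-810 endnote
clocks     = 1.10      # ~10% higher clocks
print(cu_scaling * ipc * clocks)  # ~1.55 -> roughly +55% over the 6950XT
```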
 
Last edited:

SteinFG

Senior member
Dec 29, 2021
410
464
106
Is it me, or do the recent slides clearly indicate the N31 raster performance (unless the math below is flawed somewhere)?
80CU -> 96CU - 1.20x
IPC - 1.174x (RX-810 endnote, "minimal uplift" based on tests in ~20 games and 5 synthetic benchmarks)
Clocks - 1.1x (~10%; I added 1% to the 9% boost clock increase from 6950XT -> 7900XTX)
--------------------------
Total: +55% performance relative to the 6950XT
Well yes, AMD already released slides with 4K game FPS. It's basically +55%.
 

Mopetar

Diamond Member
Jan 31, 2011
7,831
5,980
136
Not gonna lie, the so-called 7"9"00XT looks like poor value, on its own but also compared to the XTX.

Naming doesn't fit IMO. Should've been 7800XT and 7900XT, although I think AMD can't go below that $899 price tag without sacrificing margins, and that's why it's called 7900XT and not 7800XT.

The chiplet based design does give them more flexibility in terms of how many of those they produce. I think that AMD took a page out of Apple's book and are using the 7900XT as the entry level model that is just there to make the next level (only $100 more) look so much better by comparison.

Depending on what Navi 32 winds up looking like, the 7900XT might have a hard time justifying itself against a 16 GB card if it clocks a lot higher. Especially if that card tops out at $600.
 

Saylick

Diamond Member
Sep 10, 2012
3,125
6,296
136
Saw this Tweet where WildC calculates the energy usage for the fan out for the MCDs:

Now, 17W divided by 5% implies a GPU power of 340W, which is rather high?

1668619785028.png

If we assume that the effective bandwidth is more appropriate, rather than the 5.3TB/s of peak bandwidth, then the fan out costs 11W effective rather than 17W. This would mean a GPU power of 221W minimum, which is much more in line with my expectations. Ultimately, 11W is really a small price to pay to enable MCM/chiplets for GPUs. There's going to be a latency hit of course, but GPUs have historically been good at hiding latency. Besides, the latency of a cache hit is still far lower than going out to VRAM.

1668620114290.png
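For anyone wanting to sanity-check that arithmetic, here it is as a sketch. The ~0.4 pJ/bit link energy is inferred from the 17 W at 5.3 TB/s pairing in the tweet; it's not a figure AMD has stated.

```python
# Fan-out power from bandwidth, and the GPU power implied by AMD's "<5%" claim.
PJ_PER_BIT = 0.4e-12   # assumed link energy (J/bit), inferred from 17 W @ 5.3 TB/s

def fanout_watts(bandwidth_tb_s):
    return bandwidth_tb_s * 1e12 * 8 * PJ_PER_BIT

peak_w = fanout_watts(5.3)          # ~17 W at peak bandwidth
eff_w  = fanout_watts(3.5)          # ~11 W at the quoted effective bandwidth
print(peak_w / 0.05, eff_w / 0.05)  # ~340 W vs ~225 W implied "GPU power"
```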
 

SteinFG

Senior member
Dec 29, 2021
410
464
106
Now, 17W divided by 5% implies a GPU power of 340W, which is rather high?
No. The 7900 XTX TDP is 355W, and it just seems like they were basing the 5% claim on 355W.
If we assume that the effective bandwidth is more appropriate, rather than the 5.3TB/s of peak bandwidth, then the fan out costs 11W effective rather than 17W.
AMD page on the XTX GPU states: "Effective Memory Bandwidth: Up to 3500 GB/s"
So 3.5TB/s is tops, not 5.3.
The 5.3 figure is probably max link speed, but not max data speed. Like with DDR memory, you can't reach the theoretical 3200*128/8 = 51.2 GB/s on DDR4-3200, even if the link speed is equal to that.
This would mean a GPU power of 221W minimum, which is much more in line with my expectations
How is a 221W power consumption more in line than 340W, if the subject is a 355W+ GPU?
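Quick illustration of peak vs quoted numbers. The DDR4 case is the one above; the GDDR6 line assumes the XTX's rated 20Gbps modules on a 384-bit bus, which is how you get a raw figure well under AMD's "effective" one.

```python
def peak_gb_s(transfers_mt_s, bus_bits):
    """Theoretical peak bandwidth: transfer rate x bus width in bytes."""
    return transfers_mt_s * bus_bits / 8 / 1000

print(peak_gb_s(3200, 128))   # 51.2 GB/s for DDR4-3200 on a 128-bit bus
print(peak_gb_s(20000, 384))  # 960 GB/s raw GDDR6 at 20 Gbps on a 384-bit bus
# AMD's "up to 3500 GB/s effective" then comes from cache hits on top of the
# DRAM link, which is why it can exceed the raw GDDR6 number.
```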
 

Kaluan

Senior member
Jan 4, 2022
500
1,071
96
Is it me, or do the recent slides clearly indicate the N31 raster performance (unless the math below is flawed somewhere)?
80CU -> 96CU - 1.20x
IPC - 1.174x (RX-810 endnote, "minimal uplift" based on tests in ~20 games and 5 synthetic benchmarks)
Clocks - 1.1x (~10%; I added 1% to the 9% boost clock increase from 6950XT -> 7900XTX)
--------------------------
Total: +55% performance relative to the 6950XT

Here's the rest of the data extrapolated from the performance uplift numbers they've shown:

6950XT to 7900XTX
~55% (4K raster)
~68% (4K RT)
~62% (4K combined)

6950XT to 7900XT
~30% (4K raster)
~46% (4K RT)
~38% (4K combined)

7900XT to 7900XTX
~19% (4K raster)
~15% (4K RT)
~17% (4K combined)
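The XT-to-XTX deltas are just the ratio of the two uplifts over the 6950XT, as a quick consistency check:

```python
xtx = {"raster": 1.55, "rt": 1.68, "combined": 1.62}  # 6950XT -> 7900XTX
xt  = {"raster": 1.30, "rt": 1.46, "combined": 1.38}  # 6950XT -> 7900XT

for k in xtx:
    print(k, f"{xtx[k] / xt[k] - 1:.0%}")  # raster ~19%, rt ~15%, combined ~17%
```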


BTW, there are some more numbers (some new) they have up on their RX 7900 series master page, but I don't care for comparing them to 3rd party data; it might be messy/misleading:

rdna3 5.jpg
 
  • Like
Reactions: Tlh97 and Elfear

Saylick

Diamond Member
Sep 10, 2012
3,125
6,296
136
No. The 7900 XTX TDP is 355W, and it just seems like they were basing the 5% claim on 355W.

AMD page on the XTX GPU states: "Effective Memory Bandwidth: Up to 3500 GB/s"
So 3.5TB/s is tops, not 5.3.
The 5.3 figure is probably max link speed, but not max data speed. Like with DDR memory, you can't reach the theoretical 3200*128/8 = 51.2 GB/s on DDR4-3200, even if the link speed is equal to that.

How is a 221W power consumption more in line than 340W, if the subject is a 355W+ GPU?
First off, great questions. Same ones I asked myself when I tried to make sense of it.

The 7900XTX has a TDP/TBP of 355W but that includes EVERYTHING on the board. When I said GPU power, I meant the power from the N31 MCM package itself. At least that's my interpretation of the term when AMD uses it.

Igor's Lab had an article where they estimated the GPU power for the 3090 and they figured the die itself consumes roughly 230W, even though the total card is rated for 350W.

1668626310356.png

AMD said that the fan out costs <5% of the GPU power. I think we can agree that the GPU power means only the GPU die itself, and does not include the memory modules, VRM losses, etc. Taking 11W and dividing by 5% gives us 221W, which is closer to what Igor got for his estimate.

1668626511110.png

Secondly, regarding your question about the transfer rate for the MCDs. AMD reports that the peak bandwidth that can be transferred is 5.3 TB/s, but everyone knows that peak does not equal effective because it won't be delivering the peak bandwidth 100% of the time. I interpret that 5.3 TB/s as the measured transfer rate, not the link speed, so if you don't want to believe AMD on their choice of words, that's fine by me.

Going off of Wild_C's tweet, the peak power demand for the fan out should be 17W but if we assume the effective bandwidth is representative of the average or typical bandwidth, then it corresponds to an average or typical power usage of ~11W for the fan out.

1668626353194.png
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,355
2,848
106
Igor mentions 60W for GDDR6X, yet he also wrote 2.5W/module, and that would mean 2.5 * 12 = 30W. o_O
I think the memory controller is not included, and that is probably ~10-20W more, so 70-80W in total.

As a continuation of my older post. Link
I still question if using HBM wouldn't be better than MCD+GDDR6.
It would consume less power than GDDR6.
It was estimated at 20W for 16GB HBM2 + 10W for the controller, or 30W in total. Link
With RDNA3 you also have to include the fan out in the power consumption, so I think ~80-90W power consumption is not unreasonable for MCD+GDDR6.
You would save 50-60W, and that is a lot, especially for mobile.

Cost for 8GB HBM2 was $175 ($150 memory + $25 interposer) vs $52-68 ($6.5-8.5 per module) for GDDR5 at the time. Link
A huge difference certainly, but this was in 2017.
In 2019 you could supposedly get 16GB HBM2 (4 stacks) for $120, so $145 with the interposer, if that didn't get cheaper. Link
Cost of GDDR6 in 2019 was $10.79-11.69 for 12-14Gbps 1GB modules if you bought 2,000 units; that would put it at $173-187 for 16GB, but big customers get up to a 40% discount, so only $104-112. Link

Currently we already have 2GB modules, but at digikey the price is $26.22 per unit, so $210 for 16GB, or $105 for 16GB with a 50% discount. Link
The RX 7900XTX has 24GB of VRAM, and that would mean 12 * $26.22 for VRAM + 6 * $6.2 for the MCDs, so we are already at $352 just for the chips. If the discount is 40-50%, then $157-189 for VRAM + $37.2 for MCDs, for a total of $194-226.

For HBM you could either choose HBM2E with 4 stacks (32GB) for a total of 1843 GB/s (+92% over N31),
or HBM3 with 2 stacks (32GB) for a total of 1639 GB/s (+71% over N31).
That should be enough to feed N31.

My conclusion is that HBM would consume less power and cost less to make. If 2 stacks (32GB) of HBM3 cost more than what 4 stacks (16GB) cost in 2019, I would be surprised.

P.S. Does someone have a subscription to Techinsights? They do analyses of GPUs' Bills of Materials.

edit: I wrongly calculated the cost of 24GB of VRAM, so I fixed it, and it's bolded out.
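The corrected 24GB math spelled out, in case anyone wants to rerun it. The digikey list price and the $6.2 per MCD figure are the ones quoted/assumed above; the 40-50% volume discount is this post's assumption, not a known AMD rate.

```python
module_price = 26.22   # USD per 2GB GDDR6 module (digikey list price, as quoted)
mcd_price    = 6.2     # USD per MCD (assumed above)

vram_list = 12 * module_price        # 24GB -> ~$315 at list prices
mcds      = 6 * mcd_price            # ~$37
print(vram_list + mcds)              # ~$352 just for the chips

for discount in (0.40, 0.50):
    print(vram_list * (1 - discount) + mcds)   # ~$226 and ~$195 with the VRAM discounted
```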
 
Last edited:
  • Like
Reactions: Tlh97 and Vattila

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
I have the feeling RX7900XT (MSRP $899) will be 2-3% slower at 4K vs RTX 4080 (MSRP $1199) in raster performance.
And it could be 2-3% faster or at least equal at 1440p vs RTX4080.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
PS: N6 is actually ever so slightly more expensive for a given die area, but not by much. Like, cost/transistor is basically flat from N7.

Either way, your point is still correct, N33 should be cheaper to produce than N23.

Incorrect. Ask anyone in the industry. TSMC charges less for N6 because they can output more wafers per month thanks to decreased machine time (thanks to EUV). That means they actually make more money, despite charging less.
 

Timorous

Golden Member
Oct 27, 2008
1,607
2,747
136
I have the feeling RX7900XT (MSRP $899) will be 2-3% slower at 4K vs RTX 4080 (MSRP $1199) in raster performance.
And it could be 2-3% faster or at least equal at 1440p vs RTX4080.

Taking the 30% scaling from a 6950XT in 4K raster and applying it to the TPU charts you get 99% with the 4080 at 100%, and in the Techspot charts you get 111 fps vs the 4080's 111 fps.

So I think it will be trading blows in raster with the 4080.
 

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
My conclusion is that HBM would consume less power and cost less to make. If 2 stacks (32GB) of HBM3 cost more than what 4 stacks (16GB) cost in 2019, I would be surprised.
The big difference is that GDDR memory is added by the card manufacturers, so it's a cost of manufacturing the card. HBM on the other hand adds to the cost of the GPU package AMD creates and has to sell to the manufacturers. As such, the latter needs more investment and risk-taking by AMD, and as a result looks significantly more expensive than comparable GPU packages using standard GDDR memory.

(It's also the reason why only Apple adds all the memory on the same package as CPU/GPU on its A and M series chips: They don't need to sell the resulting package to anybody but themselves.)
 
  • Like
Reactions: Tlh97 and Vattila