Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,632
5,959
146

Frenetic Pony

Senior member
May 1, 2012
218
179
116
I still insist that both N33 SKUs will be more than the 6750 XT's MSRP. Maybe if they keep the size down it won't be that much more.

That's the benefit of a process shrink and chiplets (probably just one, for SRAM, but still), though. Shrink the equivalent die by about 80%, cut another 25% off of that by making a cheaper SRAM chiplet for the big cache, and you get about double the chips per wafer versus the 67XX. Even with packaging costs, the BOM should be similar between the two.
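Rough math behind the "double the chips per wafer" claim, in a quick Python sketch. Every number here is an assumption for illustration (a ~335mm^2 N7 die for the 6700 XT class, and the shrink factors straight from the paragraph above), not a confirmed die size or cost:

import math

# Naive chips-per-wafer estimate; real calculators account for die aspect
# ratio, edge loss and defect density, but this is enough for BOM hand-waving.
WAFER_AREA = math.pi * (300 / 2) ** 2    # 300mm wafer, area in mm^2

n22_like_area = 335                      # assumed ~335mm^2 N7 die (6700 XT class)
shrunk = n22_like_area * 0.8             # "shrink the equivalent die by about 80%"
gcd_only = shrunk * 0.75                 # "cut another 25%" by moving SRAM to a cheap chiplet

def dies_per_wafer(area_mm2: float) -> int:
    return int(WAFER_AREA / area_mm2)

print(dies_per_wafer(n22_like_area))     # ~211
print(dies_per_wafer(gcd_only))          # ~351, i.e. roughly 1.7x, "about double" loosely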

Guess it'll all depend on how good Alchemist is and how available, same with Ada, and same with zombie crypto coming back, though.

Also, the new RDNA3 arrangement fits with the lowest-end (current) chip just kinda looking like a 6nm shrink of the 66XX. Wonder if they'll change the arch to RDNA3 or just shrink the current chip and call it good.
 

Timorous

Golden Member
Oct 27, 2008
1,615
2,772
136
I still insist that both N33 SKUs will be more than the 6750 XT's MSRP. Maybe if they keep the size down it won't be that much more.

On the one hand, die size is going to be similar-ish between N33 and N22* because 4k shaders + 128-bit bus + 8 (assumed) PCIe 5 lanes + 128MB IC on N6 is probably similar in size to 2.5k shaders + 192-bit bus + 16 PCIe 4 lanes + 96MB IC on N7. N33 may even be a bit larger.

OTOH, 8GB of GDDR6 vs 12GB does reduce the memory cost, lower TBP means the PCB and cooling can be a bit cheaper as well, and N6 is cheaper than N7 too, so overall BOM cost is probably about even at worst and at best is an advantage for the N33-based part. Given that the MSRP of the 6700XT was $480, a $450 7600XT is not out of the question with similar or better margin.
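To put hypothetical numbers on that BOM argument, here's a tiny sketch. The $/GB figure is a placeholder assumption (actual GDDR6 contract pricing isn't public and moves around), so treat this as illustrating the direction of the saving, not its size:

# Hypothetical memory BOM comparison; the price is an assumption.
GDDR6_PER_GB = 4.0               # assumed $/GB

n33_memory = 8 * GDDR6_PER_GB    # 8GB on a 128-bit bus
n22_memory = 12 * GDDR6_PER_GB   # 12GB on a 192-bit bus

print(f"Memory BOM saving for the N33 part: ~${n22_memory - n33_memory:.0f}")
# Add a cheaper PCB/cooler from the lower TBP, plus N6 being cheaper than
# N7, and "about even at worst" for overall BOM looks plausible.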

If performance is faster than a 6900XT at 1080p and similar at 1440p (outside of the cases where 8GB is limiting), then $450 would not be terrible value, even if it is expensive for a 600-series card historically. I see no way AMD can sell N33 as the 7700XT with just 8GB of VRAM. Performance-wise it is there, but since the 700 class usually targets 1440p or entry-level 4K, 8GB just will not cut it for the upcoming gen of GPUs, so that won't fly. Giving it 16GB of VRAM is an option, but that adds expense and reduces max volume, so that looks doubtful to me as well.

*Thank you uzzi for pointing that out.
 

uzzi38

Platinum Member
Oct 16, 2019
2,632
5,959
146
On the one hand, die size is going to be similar-ish between N33 and N23 because 4k shaders + 128-bit bus + 8 (assumed) PCIe 5 lanes + 128MB IC on N6 is probably similar in size to 2.5k shaders + 192-bit bus + 16 PCIe 4 lanes + 96MB IC on N7. N33 may even be a bit larger.

OTOH, 8GB of GDDR6 vs 12GB does reduce the memory cost, lower TBP means the PCB and cooling can be a bit cheaper as well, and N6 is cheaper than N7 too, so overall BOM cost is probably about even at worst and at best is an advantage for the N33-based part. Given that the MSRP of the 6700XT was $480, a $450 7600XT is not out of the question with similar or better margin.

If performance is faster than a 6900XT at 1080p and similar at 1440p (outside of the cases where 8GB is limiting), then $450 would not be terrible value, even if it is expensive for a 600-series card historically. I see no way AMD can sell N33 as the 7700XT with just 8GB of VRAM. Performance-wise it is there, but since the 700 class usually targets 1440p or entry-level 4K, 8GB just will not cut it for the upcoming gen of GPUs, so that won't fly. Giving it 16GB of VRAM is an option, but that adds expense and reduces max volume, so that looks doubtful to me as well.
*N22

But yeah, I'm expecting N33 to be in the 350-400mm^2 range. A little ahead of N22 on die area, but also lower memory and board costs.
 

Ajay

Lifer
Jan 8, 2001
15,454
7,862
136
OTOH, 8GB of GDDR6 vs 12GB does reduce the memory cost, lower TBP means the PCB and cooling can be a bit cheaper as well, and N6 is cheaper than N7 too, so overall BOM cost is probably about even at worst and at best is an advantage for the N33-based part. Given that the MSRP of the 6700XT was $480, a $450 7600XT is not out of the question with similar or better margin.

I seem to recall that AMD is looking to boost its gross margins. Assuming that includes GPUs, prices will be going up. Hopefully, the increases will tend towards the incremental (<=10%).
 

Timorous

Golden Member
Oct 27, 2008
1,615
2,772
136
I seem to recall that AMD is looking to boost its gross margins. Assuming that includes GPUs, prices will be going up. Hopefully, the increases will tend towards the incremental (<=10%).

Make N33 attractively priced on N6, then just don't make many N31 and N32 parts, and use most N5 capacity for Zen4 CCDs to use in Genoa / Ryzen etc. That might be a way to boost GMs while still providing enough stock of a part that will probably sell like hot cakes.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
OTOH, 8GB of GDDR6 vs 12GB does reduce the memory cost, lower TBP means the PCB and cooling can be a bit cheaper as well, and N6 is cheaper than N7 too, so overall BOM cost is probably about even at worst and at best is an advantage for the N33-based part. Given that the MSRP of the 6700XT was $480, a $450 7600XT is not out of the question with similar or better margin.

There are shortages of everything. Prices of everything have gone up (raw materials, components, labor). The 7600XT is going to be a $500+ card. There is nothing to suggest a price cut over the current card.
 

Bigos

Member
Jun 2, 2019
129
287
136
gfx11 (a.k.a. RDNA3) support in mesa (mostly OpenGL stuff with some VCN as well; the compiler backend lives in LLVM though).


Not sure if there was any doubt, but AV1 decode seems to be supported at the very least (could not find any info about encode though).

 

Saylick

Diamond Member
Sep 10, 2012
3,162
6,385
136
Perhaps someone can explain it better than me, but it appears that RDNA1 and RDNA2 had to decompress data at certain stages of the memory subsystem before it could be used, whereas in RDNA3 the data does not need to be decompressed at all, regardless of where it exists, hence @Kepler_L2's comment about "end-to-end DCC".

Edit: To help others better understand what's going on here, here's AMD's own article on Delta Color Compression (DCC). Looks like RDNA3 completely does away with the need to decompress, as all the structures are rebuilt to work on compressed data, which helps AMD get away with a smaller memory bus.
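For anyone unfamiliar with the idea, here's a toy sketch of delta compression in Python. This is only the concept; AMD's actual DCC block formats and modes are hardware-specific and not what's shown here:

import numpy as np

def encode_tile(tile: np.ndarray):
    """Toy DCC-style encode: one anchor pixel plus per-pixel deltas.
    Render targets are often locally flat, so deltas stay tiny and
    pack into far fewer bits than raw pixel values."""
    anchor = tile[0, 0]
    deltas = tile.astype(np.int16) - anchor
    return anchor, deltas

def decode_tile(anchor, deltas) -> np.ndarray:
    return (deltas + anchor).astype(np.uint8)

tile = np.full((8, 8), 200, dtype=np.uint8)   # mostly-flat 8x8 tile
tile[4:, :] += 3
anchor, deltas = encode_tile(tile)
assert (decode_tile(anchor, deltas) == tile).all()
# If every unit (shaders, ROPs, display, etc.) can do the equivalent of
# decode_tile on the fly, nothing ever needs a full decompression pass:
# that's the "end-to-end DCC" being discussed.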

 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136

Just watch it yourselves guys.

At this point I don't even know what to say.
Just watch it yourselves... dude, it's supposed to be a written forum and not a freaking videoconference. I can read through 2 pages of interesting arguments here instead of listening to Paul (whom I otherwise quite like) trying and failing to finish a sentence in under 5 minutes.

Nothing personal, you're not the only one doing this, but my cup has run over just now 😂
 

Bigos

Member
Jun 2, 2019
129
287
136
Can anyone here help explain what might be going on here?

Exactly what the comments say.

NGG is the new hardware shader stage introduced in RDNA that replaces the post-tessellation, pre-rasterization stages (vertex and geometry). It also directly implements mesh shaders. RDNA1/2 supports both the legacy and the new stage, but RDNA3 removes the former. This means that AMD had to make NGG work well with all of the vertex-processing features, like geometry shading and transform feedback (stream-out in DX nomenclature). Notably, though, transform feedback is currently unimplemented for gfx11 in mesa (they probably left it for later).

More details here: https://gitlab.freedesktop.org/mesa...h-software-stage-runs-on-which-hardware-stage

Another notable change is more versatile DCC support. Now all hardware blocks understand DCC, so decompression is usually not required (though not always: imagine an Intel iGPU + AMD dGPU situation where the display is driven by the iGPU; you need to decompress the frame before sending it to the display, since Intel does not understand DCC).
 

DisEnchantment

Golden Member
Mar 3, 2017
1,605
5,795
136
Quite an interesting commit: advanced PSR.

For additional power savings, PSR SU (also referred to as PSR2) can be
enabled on eDP panels with PSR SU support.

PSR2 saves more power compared to PSR1 by allowing more opportunities
for the display hardware to be shut down. In comparison to PSR1,
shutdown can now occur in between frames, as well as in display regions
where there is no visible update. In other words, it allows some
display hw components to be enabled only for a **selectively updated**
region of the visible display. Hence PSR SU.
 

Mopetar

Diamond Member
Jan 31, 2011
7,837
5,992
136
I can read through 2 pages of interesting arguments here instead of listening to Paul (whom I otherwise quite like) trying and failing to finish a sentence in under 5 minutes.

Paul is really like the kid who realized that he needed a 10-page paper and is desperately trying to stretch the 4 pages he has to fit.

It's okay though because he has a soothing voice, so I can always put one of his videos on to help me drift off to sleep before it's finished. If it's important information someone here will summarize it the next day.
 

Saylick

Diamond Member
Sep 10, 2012
3,162
6,385
136
Paul is really like the kid who realized that he needed a 10-page paper and is desperately trying to stretch the 4 pages he has to fit.

It's okay though because he has a soothing voice, so I can always put one of his videos on to help me drift off to sleep before it's finished. If it's important information someone here will summarize it the next day.
lol, especially when he is compelled to always say "Nice :cool:" whenever he mentions the 6900XT.
 

Aapje

Golden Member
Mar 21, 2022
1,382
1,864
106
Just watch it yourselves... dude, it's supposed to be a written forum and not a freaking videoconference. I can read through 2 pages of interesting arguments here instead of listening to Paul (whom I otherwise quite like) trying and failing to finish a sentence in under 5 minutes.

Thank God for video speedup.
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Is that a tacit affirmation of the doubled shader logic per WGP?
Nope, a wavefront will use as many shaders as it can...

I can see 2 uses for it:
1- Similar to hyperthreading in CPUs, a way to use all available resources when possible. The drivers would need to be aware of it, and it will have diminishing returns if devs fine-tune for it;

2- To double shader performance. Remember Fermi? This time not by increasing clocks, but with a lot of silicon... AMD would need to double the resources to feed them. Register pressure would be insane.
 

Saylick

Diamond Member
Sep 10, 2012
3,162
6,385
136
Looks like the rumors for N31 are coalescing around the new theory that it's a single N5 GCD, with all the Infinity Cache and memory controllers on N6 MCDs (6 total). Still an MCM architecture, probably using fan-out bridges, but not 3D-stacked. Still, with all of the logic on N5 and all of the IO and cache on N6, it appears to be very cost-optimized. The N5 GCD is likely only around 400mm2, knowing that N21 was 520mm2 and around half of that was shaders: 520mm2 / 2 * 2.4x shader count / 2x node shrink gives you around 312mm2. Add on the PHYs for the fan-out bridges and extra die space for better RT and architectural improvements, and 400mm2 seems to be within the ballpark.
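Spelling that back-of-envelope math out (the 50% shader fraction and the clean 2x N7-to-N5 logic shrink are assumptions taken from the post, not measured numbers):

# Back-of-envelope N31 GCD estimate from the figures above.
n21_area = 520         # N21 on N7, mm^2
shader_fraction = 0.5  # assume ~half of N21 is the shader array
shader_scale = 2.4     # rumored shader count increase for N31
node_shrink = 2.0      # assumed N7 -> N5 logic density gain

shader_area = n21_area * shader_fraction * shader_scale / node_shrink
print(f"~{shader_area:.0f} mm^2 of shaders")   # ~312 mm^2
# Add bridge PHYs, beefier RT hardware and general arch growth,
# and ~400mm^2 is within the ballpark.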

Edit: Just adding more flavor from various leakers:

- RedGamingTech: (attached image)

- KittyYYuko:

Translation:
Many people see the Navi 31's performance goal as three times the Navi 21's, but that's a mistake, and some say it's four times the Navi 21's.
I would like to reiterate that Navi31 has always been designed to achieve 6 times the performance of Navi10.


- Kopite7Kimi: (attached image)
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Looks like the rumors for N31 are coalescing around the new theory that it's a single N5 GCD, with all the Infinity Cache and memory controllers on N6 MCDs (6 total). Still an MCM architecture, probably using fan-out bridges, but not 3D-stacked. Still, with all of the logic on N5 and all of the IO and cache on N6, it appears to be very cost-optimized. The N5 GCD is likely only around 400mm2, knowing that N21 was 520mm2 and around half of that was shaders: 520mm2 / 2 * 2.4x shader count / 2x node shrink gives you around 312mm2. Add on the PHYs for the fan-out bridges and extra die space for better RT and architectural improvements, and 400mm2 seems to be within the ballpark.

There are probably other shrinks elsewhere: dropping backwards compat for deprecated features like the old geometry pipeline, more SIMD32 versus work dispatch, etc. Arch improvements so far seem to be more about fixing old problems, like needing to decompress at certain stages, and the improved RT pipe that was patented a while ago used the texturing units, meaning any expansion there is probably minimal. The bridge is also pretty minimal afaik, so I could see the chip ending up in the mid-300mm2 range.

With the price hike from TSMC and the extra dies, a single 128-SIMD32 compute chip alongside the bus etc. probably costs a bit more to produce than the 6700xt. Thus my guess of $500 for a full chip and $400 for a partially disabled/binned (non-"xt") version. Of course, if things go bad those prices could rise by $50-60 or so :(

Hopefully though it'll be a bit smaller, and that price increase won't happen. It'd be real nice to see 6800 (non-XT) performance or a bit better at $400.

Edit: Really though, I don't understand the rumored specs entirely. The most logical compute tile should be something like 96 SIMD32s (48 RDNA2 compute units). The cost would be tiny, probably as low as a Navi 23 (6600) board for each compute die. You could pair it with a 128-bit/256-bit/etc. bus and SRAM as you scale up to 1/2/3/4 compute dies. A single one would theoretically be at least 20% faster than a 6750 XT, and more for ray tracing. That should be the sweet spot for cost/performance, right? E.g. 12 teraflops/8GB for $329; 15TF/16GB $400; 24TF/16GB $700; 30TF/16GB $1000; 36TF/12GB $1250; 44TF/24GB $1600; 58TF/16GB $2500.
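For what it's worth, here's the peak-throughput math I assume sits behind those tiers. The 96-SIMD32 tile is the post's hypothetical, and the ~2.0GHz clock is my assumption to make the numbers line up, not a leaked spec:

# Peak FP32 for the hypothetical compute-tile stack above.
def fp32_tflops(simd32_units: int, clock_ghz: float = 2.0) -> float:
    # SIMD32 units x 32 lanes x 2 ops/clock (FMA) x clock
    return simd32_units * 32 * 2 * clock_ghz / 1000

for dies in (1, 2, 3, 4):
    print(f"{dies} tile(s): ~{fp32_tflops(96 * dies):.0f} TFLOPS")
# 1 tile ~12 TF, 2 ~25 TF, 3 ~37 TF, 4 ~49 TF; the in-between tiers in
# the post presumably come from binning and clock differences.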
 

Aapje

Golden Member
Mar 21, 2022
1,382
1,864
106
Yeah, RDNA3 seems to be heavily optimized for cost across the range, which also makes sense if it is to be the basis for console refreshes like the PS5 Pro. In general, AMD may create an extremely compelling product with relatively low power consumption, relatively low cost, and excellent performance.