Speculation: RDNA3 + CDNA2 Architectures Thread



eek2121

Platinum Member
Aug 2, 2005
There are probably other shrinks elsewhere: dropping backwards compatibility for deprecated features like the old geometry pipeline, more SIMD32 relative to work dispatch, etc. Arch improvements so far seem more about fixing old problems, like needing to decompress in certain stages; and the improved RT pipe that was patented a while ago reuses the texturing units, meaning any expansion there is probably minimal. The bridge is also pretty minimal afaik, so the chip could end up in the mid-300mm² range.

With the price hike from TSMC and the extra dies, a single 128-SIMD32 compute chip plus the bus and so on probably costs a bit more to produce than the 6700 XT. Hence my guess of $500 for a full chip and $400 for a partially disabled/binned (non-XT) version. Of course, if things go badly, those prices could rise by $50-60 or so :(

Hopefully, though, it'll be a bit smaller and that price increase won't happen. It'd be real nice to see 6800 (non-XT) performance or a bit better at $400.

Edit: Really though, I don't entirely understand the rumored specs. The most logical compute tile would be something like 96 SIMD32s (the equivalent of 48 RDNA2 compute units). The cost would be tiny, probably as low as a Navi 23 (6600) die for each compute die. You could pair it with a 128-bit/256-bit/etc. bus and SRAM as you scale up to 1/2/3/4 compute dies. A single one would theoretically be at least 20% faster than a 6750 XT, and more in ray tracing. That should be the sweet spot for cost/performance, right? E.g. 12 TF/8 GB for $329; 15 TF/16 GB, $400; 24 TF/16 GB, $700; 30 TF/16 GB, $1,000; 36 TF/12 GB, $1,250; 44 TF/24 GB, $1,600; 58 TF/16 GB, $2,500.
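
A quick sanity check of the full-die tiers in that list (a sketch; 32 lanes per SIMD32 and 2 FLOPs/lane/clock for FMA are the standard RDNA figures, but the ~2.5 GHz clock is purely my assumption, and the in-between tiers would be partially disabled bins):

Code:
# Rough FP32 throughput for N compute dies of 96 SIMD32s each.
# 32 lanes per SIMD32, 2 FLOPs/lane/clock (FMA); clock is a guess.
def tflops(simd32_count: int, clock_ghz: float = 2.5) -> float:
    return simd32_count * 32 * 2 * clock_ghz / 1000

for dies in (1, 2, 3, 4):
    print(f"{dies} die(s): {tflops(96 * dies):.1f} TF")
# 1 die ~15.4 TF, 2 ~30.7 TF, 3 ~46.1 TF, 4 ~61.4 TF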

Multi-chip designs are limited to the upper tiers for this gen. Think $500+.
 

beginner99

Diamond Member
Jun 2, 2009
Looks like the rumors for N31 are converging on the new theory that it's a single N5 GCD with all the Infinity Cache and memory controllers on N6 MCDs (6 total).

Makes sense if we're realistic. Separating everything in one step is probably too much R&D effort, and I would think "syncing" between compute dies is much more difficult than just feeding a die data from memory/cache.

I would also imagine that GPU tiles would need a direct link to each other; otherwise, transfers through an IO die would need too much bandwidth and probably add too much latency.
 

DisEnchantment

Golden Member
Mar 3, 2017
Looks like the rumors for N31 are converging on the new theory that it's a single N5 GCD with all the Infinity Cache and memory controllers on N6 MCDs (6 total). Still an MCM architecture, probably using fan-out bridges, but not 3D stacked. Still, with all of the logic on N5 and all of the IO and cache on N6, it appears to be very cost optimized. The N5 GCD is likely only around 400mm², given that N21 was 520mm² and around half of that was shaders: 520mm² / 2 × 2.4× the shader count / 2× the node shrink gives you around 312mm². Add the PHYs for the fan-out bridges plus extra die space for better RT and architectural improvements, and 400mm² seems to be within the ballpark.
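
Spelling that estimate out (a sketch; every input is an assumption from the paragraph above, not a confirmed number):

Code:
# Back-of-the-envelope N31 GCD area estimate.
n21_area     = 520.0  # mm^2, Navi 21 on N7
shader_frac  = 0.5    # assume roughly half of N21 was shaders
shader_scale = 2.4    # rumored shader count increase
node_shrink  = 2.0    # assumed N7 -> N5 logic density gain

shader_area = n21_area * shader_frac * shader_scale / node_shrink
print(f"~{shader_area:.0f} mm^2 of shaders")  # ~312 mm^2
# + fan-out bridge PHYs + extra RT/arch logic -> ~400 mm^2 ballpark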

Edit: Just adding more flavor from various leakers:

- RedGamingTech: [attachment 61442]
- KittyYYuko (translation): [attachment 61445]
- Kopite7Kimi: [attachment 61446]

Ehhh... seems like everybody is just guessing; the usual leakers seem lost. I've lost count of how many times they've changed their minds.

Nevertheless, it does not seem to me that AMD will attempt another Polaris or RDNA1 lineup, i.e. only "cost effective with decent performance".
They learned the hard way that they need to aim for the ultimate high end with some SKUs.
Irrespective of whether folks can afford the high end or not, they can only be convinced to buy AMD's midrange if they see that AMD's high-end products are competitive. Mindshare is important when selling to end users, unlike in B2B.
 

Olikan

Platinum Member
Sep 23, 2011
Ehhh... seems like everybody is just guessing; the usual leakers seem lost. I've lost count of how many times they've changed their minds.
What's your theory on the double wavefront?

Just saw an interesting speculation on Twitter: the GPU launches two wave32s at the same time, emulating one wave64... this would allow the GPU to launch one wave64 per clock.
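
A toy throughput model of that idea (my sketch, not anything from actual patches): on current hardware a wave64 executes as two wave32 halves back to back, so co-issuing both halves would double the wave64 launch rate.

Code:
# Effective wave64 issue rate for a given number of wave32
# launches per clock (toy model, not real hardware behavior).
def wave64_per_clock(wave32_per_clock: int) -> float:
    return wave32_per_clock / 2  # a wave64 is two wave32 halves

print(wave64_per_clock(1))  # 0.5 -> wave64 takes two clocks to launch
print(wave64_per_clock(2))  # 1.0 -> the speculated dual wave32 launch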
 

DisEnchantment

Golden Member
Mar 3, 2017
What's your theory on the double wavefront?
Nothing conclusive that I could surmise; it could be one of several:
1. Dual vector ops (VOP) packed into one instruction, statically scheduled VLIW-style (but unlike packed math, where only ops of the same type are supported)
2. Dual SIMD32 resources to handle two VOPs at once, like you mentioned above (this would put a lot of strain on the register file)
3. A pipelined engine that can concurrently execute various stages of instructions, e.g. a SIMD32 taking more than one cycle, with execute, gather, dispatch, etc. running concurrently. This may be a bit of a stretch though.

#1 seems very likely.
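
To make #1 concrete, a toy model (entirely hypothetical, not the real ISA): two vector ops statically packed into one issue slot, with a pairing rule that keeps the two destination writes in different register banks so one slot doesn't oversubscribe the register file.

Code:
from dataclasses import dataclass

@dataclass
class VOp:
    opcode: str
    dst: int      # destination vector register
    srcs: tuple   # source registers

def can_pair(a: VOp, b: VOp) -> bool:
    """Hypothetical pairing rule for a packed dual-op: destinations
    must sit in different (even/odd) register banks, otherwise one
    issue slot would need too many register file write ports."""
    return (a.dst % 2) != (b.dst % 2)

print(can_pair(VOp("fma", 0, (1, 2, 3)), VOp("add", 5, (6, 7))))  # True
print(can_pair(VOp("fma", 0, (1, 2, 3)), VOp("add", 4, (6, 7))))  # False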

Information is spread across many repositories: the kernel driver, the Mesa user-space libs, and the LLVM compiler library. I don't know what is real and what is not, but wherever there is consensus across all of these, it should be correct. In general LLVM has the best info, then Mesa, then the kernel driver.
E.g. Mesa indicates only 4 SEs available for GFX11, while the kernel driver indicates 6 and can support all the way up to 8.
The kernel driver also indicates no AV1 encode support, but there are some machine headers indicating there is. Then again, Mesa indicates no AV1 encode for now.
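
If anyone wants to do the same cross-checking, a minimal sketch (the checkout paths are assumptions about where you cloned the repos; the subdirectories are where each project keeps its AMD GPU code):

Code:
import re
from pathlib import Path

# Local checkouts of the three sources (adjust paths to your setup).
repos = {
    "llvm":  Path("~/src/llvm-project/llvm/lib/Target/AMDGPU"),
    "mesa":  Path("~/src/mesa/src/amd"),
    "linux": Path("~/src/linux/drivers/gpu/drm/amd"),
}

pattern = re.compile(r"gfx11", re.IGNORECASE)
for name, root in repos.items():
    hits = sum(
        1
        for f in root.expanduser().rglob("*")
        if f.is_file()
        and f.suffix in {".c", ".h", ".cpp", ".td"}
        and pattern.search(f.read_text(errors="ignore"))
    )
    print(f"{name}: {hits} files mention GFX11")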
 

Justinus

Diamond Member
Oct 10, 2005
So no RDNA3 multi-chiplet solution after all it seems. Well, unless you count cache tiles on top as chiplets.

.... No? The new rumor is a single GCD chiplet with 6 MCD chiplets, which are PHYs + stacked Infinity Cache. It's absolutely a multi-chiplet solution; it's just not what everyone thought/expected.

This newly rumored layout also solves a lot of the problems I was trying to figure out how they would approach with the previously rumored layout of 2 GCDs + an I/O MCD.
 

Timmah!

Golden Member
Jul 24, 2010
.... No? The new rumor is a single GCD chiplet with 6 MCD chiplets, which are PHYs + stacked Infinity Cache. It's absolutely a multi-chiplet solution; it's just not what everyone thought/expected.

This newly rumored layout also solves a lot of the problems I was trying to figure out how they would approach with the previously rumored layout of 2 GCDs + an I/O MCD.

Just found this was already posted a few days ago on a previous page, and I am late to the party, so never mind.

Even if it's still technically multi-chiplet, the "expectation" was multiple compute chiplets, Ryzen-style. That's why it matters: it's the reason this design was expected to potentially beat Lovelace, with its standard monolithic design, handily. While that may still happen, who knows at this point, it will be for different reasons.
 

Justinus

Diamond Member
Oct 10, 2005
Just found this was already posted a few days ago on a previous page, and I am late to the party, so never mind.

Even if it's still technically multi-chiplet, the "expectation" was multiple compute chiplets, Ryzen-style. That's why it matters: it's the reason this design was expected to potentially beat Lovelace, with its standard monolithic design, handily. While that may still happen, who knows at this point, it will be for different reasons.

By stripping the Infinity Cache and PHYs off the GCD, they are saving a massive amount of die space, which is going to let them get way more dies per wafer with much better yields, even with the more-than-doubled rumored CU count. Increasing the memory bus to 384-bit and slapping on more cache also doesn't affect the GCD die size at all, since it's all on the MCDs.
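
The yield half of that argument in numbers, using a simple Poisson yield model (a sketch; the defect density and MCD size are placeholder assumptions, and the ~400mm² GCD is the estimate floated earlier in the thread):

Code:
from math import exp

def die_yield(area_mm2: float, d0_per_cm2: float = 0.1) -> float:
    """Poisson yield model: Y = exp(-D0 * A), area in cm^2."""
    return exp(-d0_per_cm2 * area_mm2 / 100)

print(f"monolithic ~520 mm^2: {die_yield(520):.0%}")  # ~59%
print(f"GCD ~400 mm^2 (N5):   {die_yield(400):.0%}")  # ~67%
print(f"MCD ~40 mm^2 (N6):    {die_yield(40):.0%}")   # ~96%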

I think it's a great first-gen design and a good stepping stone to more complex graphics chiplet designs in the future.
 

JasonLD

Senior member
Aug 22, 2017
By stripping the Infinity Cache and PHYs off the GCD, they are saving a massive amount of die space, which is going to let them get way more dies per wafer with much better yields, even with the more-than-doubled rumored CU count. Increasing the memory bus to 384-bit and slapping on more cache also doesn't affect the GCD die size at all, since it's all on the MCDs.

I think it's a great first-gen design and a good stepping stone to more complex graphics chiplet designs in the future.

If N31 is only getting one GCD, then the 2.5x Navi 21 performance claims can be safely dismissed.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
IMO if it brings an 800mm² RDNA chip down to 500mm² + 300mm² of cheaper, older silicon, then it'll be a win for everyone.

AMD will be able to be much more aggressive on price and volume than it was able to be this gen. If they're neck and neck with NV, or even slightly behind in terms of halo performance, but way ahead on $/FPS, then all of us sitting back here on our 9xx/10xx-series cards might have a not-absurdly-priced upgrade path.
 

Aapje

Golden Member
Mar 21, 2022
Even if it's still technically multi-chiplet, the "expectation" was multiple compute chiplets, Ryzen-style.

I heard rumors about the IO + cache being separated a long time ago.

Increasing the memory bus to 384-bit and slapping on more cache also doesn't affect the GCD die size at all, since it's all on the MCDs.

Indeed. It also makes it much easier to introduce variants, like a Navi 21 with a 384-bit bus and one with a 256-bit bus (and thus also allows for more memory configurations), by combining the same compute die with a different IO die.
 

Timmah!

Golden Member
Jul 24, 2010
By stripping the Infinity Cache and PHYs off the GCD, they are saving a massive amount of die space, which is going to let them get way more dies per wafer with much better yields, even with the more-than-doubled rumored CU count. Increasing the memory bus to 384-bit and slapping on more cache also doesn't affect the GCD die size at all, since it's all on the MCDs.

I think it's a great first-gen design and a good stepping stone to more complex graphics chiplet designs in the future.

I am not saying it's bad, and no doubt it has massive benefits over what came before.
But it's still not what was expected (two compute chiplets), and I presume that even with the additional space saved by moving the cache off the die, it still won't hit the expected/rumored performance, which was significantly above Lovelace.
Not to mention the whole "multiple compute tiles" idea had a certain ring to it, as it was finally supposed to solve the "SLI" scaling issue. And that's not really happening.
 

JasonLD

Senior member
Aug 22, 2017
It's still 3x the TF and I think most other structures are 2x+. Plus higher clocks.
Why? Surely shader count matters. 1 GCD tells us nothing about this.

We still don't know that yet. Even the shader counts, we don't know if they can be compared 1:1 with Navi 21's.
The way the rumors are going, I expect the leaked specs to come down to more reasonable numbers closer to launch. I am not convinced we will see top-end cards with more than 2x the performance of the previous top-end card, from either Nvidia or AMD.
 

Timorous

Golden Member
Oct 27, 2008
If 1 GCD is correct, then what is product segmentation going to look like?

An N31 with a 384-bit bus and an N33 with a 128-bit bus seems like a large gap.

I am not sure AMD will want to serve both x900 and x800 parts with the same GCD, because yields are not that bad, and why sell good silicon at a lower price unless they can saturate the market with top-tier SKUs?

Will be interesting to see what AMD decides to do.
 

Saylick

Diamond Member
Sep 10, 2012
If 1 GCD is correct, then what is product segmentation going to look like?

An N31 with a 384-bit bus and an N33 with a 128-bit bus seems like a large gap.

I am not sure AMD will want to serve both x900 and x800 parts with the same GCD, because yields are not that bad, and why sell good silicon at a lower price unless they can saturate the market with top-tier SKUs?

Will be interesting to see what AMD decides to do.
The GCD will change based on the product: N31 gets a larger N5 GCD (with cut-down products), N32 gets a smaller N5 GCD (with cut-down products), and N33 is completely N6 (with cut-down products). N31 and N32 use the same N6 MCDs; N31 simply uses more of them.

I previously thought it was silly to have the MCDs also carry the memory controllers, but I guess it makes sense from a scaling standpoint: AMD issues one N6 mask set for all MCDs and can slap on as many MCDs as it takes to scale up memory performance. At least with a single-GCD approach there's no worry about resolving inter-GCD communication issues; it's back to monolithic-esque performance, except that the memory subsystem has been stripped out and put on a more economical node.
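
What that scaling looks like in numbers (a sketch; the 64-bit GDDR6 PHY and 16 MB of Infinity Cache per MCD are my assumptions, chosen so six MCDs line up with the rumored 384-bit bus):

Code:
def memory_config(n_mcds: int, gbps_per_pin: float = 18.0):
    bus_bits  = n_mcds * 64                  # assumed 64-bit PHY per MCD
    bandwidth = bus_bits * gbps_per_pin / 8  # GB/s
    cache_mb  = n_mcds * 16                  # assumed 16 MB cache per MCD
    return bus_bits, bandwidth, cache_mb

for n in (4, 5, 6):
    bus, bw, ic = memory_config(n)
    print(f"{n} MCDs: {bus}-bit, {bw:.0f} GB/s, {ic} MB Infinity Cache")
# 6 MCDs -> 384-bit, 864 GB/s, 96 MB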

Basically, something like this: [attachment: layout diagram]
 

JasonLD

Senior member
Aug 22, 2017
The 2.5x+ rumour was always TFLOPs. Which is still on the table even now.

A lot of people didn't take it that way, for sure. Many will be disappointed when the actual performance improvement turns out to be the usual jump between two generations on a new node (around 70-80%).
 

Aapje

Golden Member
Mar 21, 2022
I am not sure AMD will want to serve both x900 and x800 parts with the same GCD, because yields are not that bad, and why sell good silicon at a lower price unless they can saturate the market with top-tier SKUs?

In a more normal market the top tier will saturate, and they need a range of tiers to capture various groups of customers. They can't just count on the market staying as it is, so at the very least they need to have the products, even if they may not want to supply as many of a certain tier while demand greatly exceeds supply.

Historically, it's been quite common to downgrade perfectly good silicon, in the past even just in software, where you could upgrade by flashing a different BIOS or some other trick (although they've become quite good at preventing that, since people no longer buying the top tier does cost them money). It's not like selling a perfectly good Navi 21 chip as a 6800 XT loses them money; the profit margin is still very good, just not as good as on a 6900 XT. Making a chip specifically for the 6800 XT would be more expensive, partly because it would make it impossible to 'secretly' salvage partially broken chips by selling them as a cheaper product.

Yields also change over time, so a lower tier can start off as mostly broken top-tier chips, and as yields improve, more and more of those chips become perfectly good. The customer just sees the same 6800 XT cards on the shelf, but behind the scenes they increasingly contain perfectly good silicon rather than salvage.

And when yields are good enough, they can also introduce a better tier with binned chips, more fully unlocked chips, or both, like the 3090 Ti, which is binned.