Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

eek2121 · Sep 22, 2022

Timorous said:
So I was thinking about what we can glean from AMD official information re top N31.

All we have is the greater than 50% perf per watt increase but we don't know the baseline and we don't know the wattage so here are some potential numbers.

Baseline 6900XT (reference)
At 375W we have 1.5x perf / watt x 1.25x more watts which is 1.875x more performance than baseline. A direct match for the 4090 in raster performance and probably a low end estimate given AMD sandbagging on claims recently.

If AMD push to 450W and performance scaling is still decent then we see 1.5x X 1.5x for a 2.25x performance gain over the 6900XT which is in the middle of the rumoured gains.

If perf/watt is closer to 1.6x then at 375W we get 2x performance and at 450W we get 2.4x performance.

6950XT baseline (reference 335W) according to TPU this is 1.07x 6900XT and the 3090Ti is 1.1x ahead of the 6950XT. That puts the 4090 1.76x faster than the 6950XT.

A 375W N31 here with that 1.5x perf/watt would be 1.68x faster than the 6950XT and about 0.95x 4090 performance. 450W would be 2x faster than the 6950XT.

1.6x perf/watt would be 1.79x for 375W and 2.15x for 450W.

Also worth noting that in measured power draw the reference 6950XT is actually very frugal and is more efficient than the 6900XT.

Ultimately I think we have reasonable lower to upper bounds here and my estimate is that at worst top N31 will match the 4090 and at best it will be a performance tier ahead depending on final TDP and the exact perf/watt scaling.

Bottom line is I think AMD have levers to pull here and plenty of choices on how to approach the SKU list.

The 4090 is only 40-50% faster than the 3090. Given what we know so far about N31, 4090 performance numbers are an easy target.

What I suspect we will get is a very power efficient card that comes close to (or matches) the 4090 with a > 20% lower TGP. Next year AMD will drop an even faster card.
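
For anyone who wants to plug in their own numbers, the perf-per-watt arithmetic in the quoted post can be sketched as below. This is only an illustration, not anything from AMD: the helper name `relative_performance` and the constant names are made up here, the 300 W baseline for the 6900XT is the figure implied by the "1.25x more watts" at 375 W, and the whole thing assumes performance scales linearly with board power, which is optimistic near the top of the voltage/frequency curve.

```cpp
// Hypothetical helper: relative performance = (perf/watt gain) x (power ratio).
// Assumes perfectly linear scaling with board power, as the estimate above does.
constexpr double relative_performance(double perf_per_watt_gain,
                                      double new_watts,
                                      double baseline_watts) {
    return perf_per_watt_gain * (new_watts / baseline_watts);
}

// Reference board powers used as baselines in the quoted post.
// 300 W is implied by "1.25x more watts" at 375 W; 335 W is stated directly.
constexpr double kRx6900xtWatts = 300.0;
constexpr double kRx6950xtWatts = 335.0;
```

With the 6900XT baseline, `relative_performance(1.5, 375.0, kRx6900xtWatts)` gives 1.875 and the 450 W case gives 2.25. Against the 6950XT baseline the same calls land at roughly 1.68 and 2.01, and at roughly 1.79 and 2.15 with a 1.6x perf/watt gain.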

eek2121 · Sep 22, 2022

fleshconsumed said:
Nice, but it's just a $20-50 price cut for the 6800/6800XT. Doesn't bode well for upcoming 7000 series pricing.

I bet they will cost less than NVIDIA's offerings. The 4090 die size is almost twice the size of N31's GCD. While packaging will cost more, AMD's card will be much cheaper to produce thanks to the use of N6 for the MCDs. The only question is if AMD will pass the savings on to consumers.

EDIT: The 6900xt is $300 cheaper, not $50 cheaper btw. The 6800XT is $100 cheaper.

The 6900xt would be a steal if we weren't getting new GPUs soon.

fleshconsumed · Sep 22, 2022

eek2121 said:
I bet they will cost less than NVIDIA's offerings. The 4090 die size is almost twice the size of N31's GCD. While packaging will cost more, AMD's card will be much cheaper to produce thanks to the use of N6 for the MCDs. The only question is if AMD will pass the savings on to consumers.

EDIT: The 6900xt is $300 cheaper, not $50 cheaper btw. The 6800XT is $100 cheaper.

The 6900xt would be a steal if we weren't getting new GPUs soon.

My bad, I was off by $10. 6800 went from 579 to 549, 6800xt went from 649 to 599. That's a $30-50 cut for the most commonly purchased AMD cards.

Yes, 6900xt got a larger price cut, but it was always a poor value proposition, most gamers went for the sweet spot 6800xt/6800/6700xt, and those barely got any price cut.

I'm sure AMD 7000 will undercut nvidia as they typically do, but the question is by how much. Given very minor price cuts to the 6000 sweet spot cards there is a good chance AMD will follow suit and raise MSRP. If they keep MSRP the same I'll probably be first in line at microcenter trying to get one (although who am I kidding, there will probably be a huge line of people camping out overnight just like for 6800 release).

Stuka87 · Sep 22, 2022

This has all the before and after prices:

AMD Radeon RX 6000 "RDNA 2" GPUs Get Official Price Cuts Prior To Radeon RX 7000 "RDNA 3" Launch

AMD has dropped prices across its entire range of Radeon RX 6000 "RDNA 2'" lineup right before the launch of Radeon RX 7000 "RDNA 3" GPUs.

wccftech.com

SteveGrabowski · Sep 22, 2022

Stuka87 said:
This has all the before and after prices:

AMD Radeon RX 6000 "RDNA 2" GPUs Get Official Price Cuts Prior To Radeon RX 7000 "RDNA 3" Launch

AMD has dropped prices across its entire range of Radeon RX 6000 "RDNA 2'" lineup right before the launch of Radeon RX 7000 "RDNA 3" GPUs.

wccftech.com

View attachment 67953

Looks like a misprint, probably meant 6600 at $239 and not 6600 XT at $239. 6600 XT at $239 would be my buy price.

morihom5 · Sep 22, 2022

useful information

Paul98 · Sep 22, 2022

I assume this is a response to the NVidia launch, and in certain cases a way to clear inventory where needed before the RDNA3 launch, rather than being reflective of where we will see RDNA3 pricing.

I had already been looking forward to seeing RDNA3 as I am expecting something quite interesting, I am even more interested now with how disappointing the 40 series is.

Mopetar · Sep 22, 2022

NVidia launch is only the 4090 until November, so that doesn't do much to anything besides the 6900XT and up. If there's a lot of leftover Ampere stock, a price cut now might help them move more of their old cards now. The 6600/XT at $240 looks really good after the last few years. Even better if the 3050 is still hanging out at $300.

SteveGrabowski · Sep 23, 2022

Mopetar said:
NVidia launch is only the 4090 until November, so that doesn't do much to anything besides the 6900XT and up. If there's a lot of leftover Ampere stock, a price cut now might help them move more of their old cards now. The 6600/XT at $240 looks really good after the last few years. Even better if the 3050 is still hanging out at $300.

It's strange, 6600 XT is still $350 everywhere except one AsRock model that's $330. Two models of 6650 XT can be found for $300 though.

Joe NYC · Sep 23, 2022

Saylick said:
CoWoS-R doesn't have a silicon interposer, right? It uses a high pitch RDL layer right in the organic substrate itself, so it eliminates much of the cost of traditional 2.5D methods due to the lack of extra silicon, i.e. cheaper than a full interposer and cheaper than using embedded bridges. The bandwidth isn't going to be as high as using silicon due to the lower interconnect density but it's likely enough for the Infinity Cache to the GCD.

It does not have an interposer, but it has an interposer-like wafer on top of which the individual dies are placed, and then additional layers on the bottom can be added.

Then, the whole unit can be strengthened and made into something that resembles a larger chip, that is then placed on top of the organic substrate.

So I think this method can likely deliver quite high bandwidth, though some additional power will be needed to cross between the dies and the carrier wafer with RDL (vs. a monolithic die), likely using micro bumps. But we will see if AMD somehow pulls a rabbit out of a hat on this connection...

I think this approach offers high bandwidth and can save layers of organic substrate - which (organic substrate) according to the latest AMD presentation is still a bottleneck (not wafers from TSMC anymore).

Stuka87 · Sep 23, 2022

SteveGrabowski said:
It's strange, 6600 XT is still $350 everywhere except one AsRock model that's $330. Two models of 6650 XT can be found for $300 though.

I think this is partly due to those being purchased by the reseller when prices were high, and them not wanting to lose money on them.

That or they have lost touch with reality and still think they can con people out of money

jpiniero · Sep 23, 2022

fleshconsumed said:
Yes, 6900xt got a larger price cut, but it was always a poor value proposition, most gamers went for the sweet spot 6800xt/6800/6700xt

That's why I am expecting AMD to 'correct' that with RDNA 3.

I suspect N31 is going to be much faster than the 4090 in raster and maybe even faster in RT at 4K (ignoring the frame projection). So I don't see why they should charge much less than the $1599 that the 4090 is for any N31. People are gonna be mad, but let's face it, unless it was $100 people are going to be mad regardless.

Kaluan · Sep 23, 2022

fleshconsumed said:
My bad, I was off by $10. 6800 went from 579 to 549, 6800xt went from 649 to 599. That's a $30-50 cut for the most commonly purchased AMD cards.

Yes, 6900xt got a larger price cut, but it was always a poor value proposition, most gamers went for the sweet spot 6800xt/6800/6700xt, and those barely got any price cut.

I'm sure AMD 7000 will undercut nvidia as they typically do, but the question is by how much. Given very minor price cuts to the 6000 sweet spot cards there is a good chance AMD will follow suit and raise MSRP. If they keep MSRP the same I'll probably be first in line at microcenter trying to get one (although who am I kidding, there will probably be a huge line of people camping out overnight just like for 6800 release).

Wacha mean? RX 6600-6700 are what the bulk of gamers get. And those got pretty sizeable cuts. And it's not like (if you're from the US at least 😡) we won't be seeing even lower prices in some places soon after (like how 5800X dropped to $300 but you can currently find it for $260 or even less).

SteveGrabowski said:
It's strange, 6600 XT is still $350 everywhere except one AsRock model that's $330. Two models of 6650 XT can be found for $300 though.

That's likely because RX 6600 XT got silently phased out, 6650 XT permanently took its place.

Still, it's a shame RX 6700 never became an official SKU. $300-330 MSRP would've been pretty good.

Interesting that they specify 4GB for the 6500 XT (and also not the RX 6400), official 8GB SKU finally coming? Sapphire's custom 8GB one still hasn't been reviewed yet sadly, but I expect an 8GB N24 to smooth out most of its limitations.

BTW $700 RX 6900 XT already looks like a better (upper price range) deal than nVidia's $900 '4070 rebranded as a 4080'. Another big oof for nVidia lmao. Hope AMD continues this trend with RX 7000.

jpiniero · Sep 23, 2022

Kaluan said:
Sapphire's custom 8GB one still hasn't been reviewed yet sadly, but I expect an 8GB N24 to smooth out most of its limitations.

That's because I suspect it was intended to be a mining card and a mining card only. I did see them on Newegg but I imagine they stopped production once mining slid and the # of units they did produce wasn't much.

DisEnchantment · Sep 23, 2022

DisEnchantment said:
Lel AMD devs leaking stuffs themselves inadvertently

[PATCH] drm/amdkfd: Match GC 11.0.1 cache info to yellow carp

Now I am wondering if the L0 and VGPR sizes are precise as mentioned by @Kepler_L2

https://twitter.com/x/status/1565627241937108992

Seems @Kepler_L2 was right on the money about VGPRs

C++:

unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
    return 512;
  if (!isGFX10Plus(*STI))
    return 256;
  bool IsWave32 = STI->getFeatureBits().test(FeatureWavefrontSize32);
  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
    return IsWave32 ? 1536 : 768;
  return IsWave32 ? 1024 : 512;
}

This code shows that the wave32 VGPR file per SIMD is 1536 * 32 lanes * 4 bytes = 192KiB (+50% over the previous 1024 * 32 * 4 = 128KiB), or equivalently 256 * 6 banks * 32 * 4 = 192KiB.
Since the number of VGPRs per bank has not changed (i.e. 256), this means a full 6 VGPR banks for full-blown dual x32 ALUs in one SIMD.

It also hints at a 1-cycle wave64 mode, because it seems they can now pair two adjacent VGPR banks (note the VGPR count is halved when not in wave32) to form 3 banks of wave64 operands for full 1-cycle wave64.
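
A quick way to sanity-check the register file sizes is to redo the multiplication in code. This is just a sketch: `vgpr_file_bytes` and `kKiB` are names invented for this example and are not part of LLVM, and the only inputs are the register counts returned by the snippet above plus the 4-byte width of a 32-bit lane.

```cpp
#include <cstdint>

// Bytes of VGPR storage per SIMD:
// registers per wave slot x lanes per wave x 4 bytes per 32-bit lane.
constexpr std::uint32_t vgpr_file_bytes(std::uint32_t total_vgprs,
                                        std::uint32_t lanes_per_wave) {
    return total_vgprs * lanes_per_wave * 4u;
}

// Bytes per KiB, used only to print or compare the result in KiB.
constexpr std::uint32_t kKiB = 1024u;
```

`vgpr_file_bytes(1536, 32) / kKiB` comes out at 192 and `vgpr_file_bytes(1024, 32) / kKiB` at 128, which is the +50%. The wave64 case, `vgpr_file_bytes(768, 64)`, is the same 192 KiB, which is why the register count is halved when not in wave32.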

C++:

unsigned getVGPRAllocGranule(const MCSubtargetInfo *STI,
                             Optional<bool> EnableWavefrontSize32) {
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
    return 8;

  bool IsWave32 = EnableWavefrontSize32 ?
      *EnableWavefrontSize32 :
      STI->getFeatureBits().test(FeatureWavefrontSize32);

  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
    return IsWave32 ? 24 : 12;

  // ... remaining cases are cut off in the original post
}

Allocation granule is also 24 (2*2*6)

Looks like N31 and N32 will be compute monsters. As expected, 11.0.2 and 11.0.3 (N33) don't have this feature and they go the VOPD route.

⚙ D134522 [AMDGPU] Add GFX11 feature for subtargets with more VGPRs

reviews.llvm.org

Another unique thing about N31 is native fp16 ops: vector registers 0-127 contain Lo and Hi 16-bit floats. In theory they can do 4x native fp16 ops (not matrix) per cycle per SIMD. This will be great for FSR-type workloads.
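
To see what the allocation granule means in practice, here is a rough sketch of how a shader's VGPR request would be rounded up and how many waves then fit in the register file. Everything here is hypothetical illustration rather than LLVM or driver code: the function names are invented, and the `max_waves` cap is an assumed hardware limit that the patch does not state.

```cpp
// Round a per-wave VGPR request up to the allocation granule
// (24 in wave32 on the FeatureGFX11FullVGPRs parts, per the quoted patch).
constexpr unsigned round_up_to_granule(unsigned requested, unsigned granule) {
    return ((requested + granule - 1) / granule) * granule;
}

// Small helper so the code stays a single-expression constexpr.
constexpr unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

// Waves that fit in the register file for a given per-wave request.
// requested must be at least 1. max_waves is an ASSUMED hardware cap;
// it is not taken from the patch.
constexpr unsigned waves_that_fit(unsigned total_vgprs, unsigned requested,
                                  unsigned granule, unsigned max_waves) {
    return min_u(total_vgprs / round_up_to_granule(requested, granule),
                 max_waves);
}
```

For a shader asking for 100 VGPRs, `round_up_to_granule(100, 24)` is 120, so `waves_that_fit(1536, 100, 24, 16)` is 12. Running the same request against a 1024-register file with a 16-register granule (an assumption, since the quoted snippet is cut off before that branch) gives 9, so the bigger register file helps occupancy for register-heavy shaders.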

Saylick · Sep 23, 2022

DisEnchantment said:
Seems @Kepler_L2 was right on the money about VGPRs

C++:

unsigned getTotalNumVGPRs(const MCSubtargetInfo *STI) {
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
    return 512;
  if (!isGFX10Plus(*STI))
    return 256;
  bool IsWave32 = STI->getFeatureBits().test(FeatureWavefrontSize32);
  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
    return IsWave32 ? 1536 : 768;
  return IsWave32 ? 1024 : 512;
}

This code shows that the wave32 VGPR file per SIMD is 1536 * 32 lanes * 4 bytes = 192KiB (+50%).
Since the number of VGPRs per bank has not changed (i.e. 256), this means a full 6 VGPR banks for full-blown dual x32 ALUs in one SIMD.

It also hints at a 1-cycle wave64 mode, because it seems they can now pair two adjacent VGPR banks (note the VGPR count is halved when not in wave32) to form 3 banks of wave64 operands for full 1-cycle wave64.

C++:

unsigned getVGPRAllocGranule(const MCSubtargetInfo *STI,
                             Optional<bool> EnableWavefrontSize32) {
  if (STI->getFeatureBits().test(FeatureGFX90AInsts))
    return 8;

  bool IsWave32 = EnableWavefrontSize32 ?
      *EnableWavefrontSize32 :
      STI->getFeatureBits().test(FeatureWavefrontSize32);

  if (STI->getFeatureBits().test(FeatureGFX11FullVGPRs))
    return IsWave32 ? 24 : 12;

Allocation granule is also 24 (2*2*6)

Looks like N31 and N32 will be compute monsters. As expected, 11.0.2 and 11.0.3 (N33) don't have this feature and they go the VOPD route.

⚙ D134522 [AMDGPU] Add GFX11 feature for subtargets with more VGPRs

reviews.llvm.org

Another unique thing about N31 is native fp16 ops: vector registers 0-127 contain Lo and Hi 16-bit floats. In theory they can do 4x native fp16 ops (not matrix) per cycle per SIMD. This will be great for FSR-type workloads.

Fully dual pumped FP32...

So 12288 shaders for N31 are actually a true 12288 shaders... That's a true 75 TFLOPS in terms of gaming performance, not some sort of Ampere-like dual pumping that results in only 1.33x increase in FPS when the TFLOPS are doubled.
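
The 75 TFLOPS figure can be checked with the usual peak-throughput formula, sketched below. The helper names are made up for this example, the formula counts one FMA (2 FLOPs) per shader per clock, and the clock speed is not stated anywhere in the post, so it is derived here rather than quoted.

```cpp
// Peak FP32 throughput in TFLOPS, counting one FMA (2 FLOPs)
// per shader per clock. clock_ghz is an input, not a known spec.
constexpr double peak_fp32_tflops(unsigned shaders, double clock_ghz) {
    return shaders * 2.0 * clock_ghz / 1000.0;
}

// Clock in GHz that a given shader count needs to hit a TFLOPS target.
constexpr double clock_for_tflops(unsigned shaders, double tflops) {
    return tflops * 1000.0 / (shaders * 2.0);
}
```

`clock_for_tflops(12288, 75.0)` works out to a little over 3.05 GHz, so the 75 TFLOPS number only holds if N31 actually clocks that high. At an even 3.0 GHz, `peak_fp32_tflops(12288, 3.0)` is about 73.7.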

TESKATLIPOKA · Sep 23, 2022

Saylick said:
Fully dual pumped FP32...
View attachment 67994

So 12288 shaders for N31 are actually a true 12288 shaders... That's a true 75 TFLOPS in terms of gaming performance, not some sort of Ampere-like dual pumping that results in only 1.33x increase in FPS when the TFLOPS are doubled.

That's nice and all, but N33 or Phoenix won't have It. I am disappointed.

Saylick · Sep 23, 2022

TESKATLIPOKA said:
That's nice and all, but N33 or Phoenix won't have It. I am disappointed.

Yeah, so some IPC loss there on a WGP basis if both SIMD banks cannot be fully fed, but not as bad as Ampere due to VOPD instructions being able to extract some ILP.

DisEnchantment · Sep 23, 2022

TESKATLIPOKA said:
That's nice and all, but N33 or Phoenix won't have It. I am disappointed.

I suppose Phoenix is already constrained by memory BW, so it would not make much of a difference. It would seem wiser to not add it and save some transistors and power.

Saylick · Sep 23, 2022

DisEnchantment said:
I suppose Phoenix is already constrained by memory BW, so it would not make much of a difference. It would seem wiser to not add it and save some transistors and power.

At least not until the move onto Strix Point, which presumably adds the 2 banks back in via RDNA3+.

GodisanAtheist · Sep 23, 2022

Words because of forum rules

Glo. · Sep 23, 2022

So, solely based on the sheer throughput increase we should see not only the higher utilization, but higher throughput of those ALUs/Shaders.

2.5 Times more shaders, 30-50% higher clock speeds(?), 50% higher memory bandwidth.

I think it(full fat N31) will be faster than 4090. But how much more performance is there?

DiogoDX · Sep 23, 2022

Glo. said:
So, solely based on the sheer throughput increase we should see not only the higher utilization, but higher throughput of those ALUs/Shaders.

2.5 Times more shaders, 30-50% higher clock speeds(?), 50% higher memory bandwidth.

I think it(full fat N31) will be faster than 4090. But how much more performance is there?

I think with all this shader power maybe the rumored 384-bit bus and 96MB cache are not enough. Scaling should be better than Ampere but by how much?

Looking at the Nvidia performance slides, AD102 with >2x TFLOPS is only about 50-60% faster than GA102 in native 4K, even with the huge increase in L2 cache but the same 384-bit memory bus.

Saylick · Sep 23, 2022

Glo. said:
So, solely based on the sheer throughput increase we should see not only the higher utilization, but higher throughput of those ALUs/Shaders.

2.5 Times more shaders, 30-50% higher clock speeds(?), 50% higher memory bandwidth.

I think it(full fat N31) will be faster than 4090. But how much more performance is there?

The memory bandwidth question, i.e. can RDNA 3 be fully fed, is the million dollar question (2 million dollars now, adjusted for inflation).

Goal = 3x more effective bandwidth over N21

We have, thus far....
- 50% wider memory bus
- 11% higher memory clocks
- Higher bandwidth on the Infinity Cache
- Better caching algorithms so that the Infinity Cache is better utilized
- End-to-end data compression

Might be an effective 2x at the end of the day, hence why we should only expect a doubling over N21.
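
To put rough numbers on that list: only the first two bullets are quantified, so the sketch below multiplies them and shows how much the remaining three (cache bandwidth, caching algorithms, compression) would have to contribute. The function names are invented for this example, and treating the gains as simple multipliers is an assumption, not how a real memory subsystem behaves.

```cpp
// Raw DRAM bandwidth gain from the first two bullets:
// bus width gain x memory clock gain.
constexpr double raw_bandwidth_gain(double bus_gain, double clock_gain) {
    return bus_gain * clock_gain;
}

// Extra multiplier the cache and compression improvements must supply
// to reach a target effective bandwidth gain (assumes gains multiply).
constexpr double required_cache_gain(double target_gain, double raw_gain) {
    return target_gain / raw_gain;
}
```

`raw_bandwidth_gain(1.5, 1.11)` is about 1.67x. Reaching an effective 2x then needs roughly another 1.2x from the Infinity Cache and compression changes, while the 3x goal would need about 1.8x from them.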

biostud · Sep 23, 2022

Saylick said:
The memory bandwidth question, i.e. can RDNA 3 be fully fed, is the million dollar question (2 million dollars now, adjusted for inflation).

Goal = 3x more effective bandwidth over N21

We have, thus far....
- 50% wider memory bus
- 11% higher memory clocks
- Higher bandwidth on the Infinity Cache
- Better caching algorithms so that the Infinity Cache is better utilized
- End-to-end data compression

Might be an effective 2x at the end of the day, hence why we should only expect a doubling over N21.

And maybe a 3D cache version with the double amount of cache? Or is that rumor buried again?
