Discussion RDNA4 + CDNA3 Architectures Thread


DisEnchantment

Golden Member
Mar 3, 2017
1,567
5,553
136
With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around three quarters to get the support into LLVM and amdgpu, but since RDNA2 the window in which they push support for new devices has been much shorter, to prevent leaks.
Still, looking at the flurry of code in LLVM, it is a lot of commits. Maybe the US Govt is starting to prepare the SW environment for El Capitan early (perhaps to avoid a slow bring-up like Frontier's).

See here for the GFX940-specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before the code gets merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of not having a host CPU capable of PCIe 5.0 available in the near term, so it might have gotten pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it :grimacing:

This is nuts, the MI100/200/300 cadence is impressive.


Previous thread on CDNA2 and RDNA3 here

 
Last edited:

Ajay

Lifer
Jan 8, 2001
14,817
7,433
136
It looks like a much higher clock was intended for RDNA3 (3-3.4 GHz, per TweakTown), so it wouldn't be surprising if RDNA4 aimed even higher.
[Slide: AMD Radeon RX 7000 RDNA 3 Infinity Links - 9.2 Gb/s, 10x higher than Ryzen/EPYC]


If the architecture is designed to allow those clocks, then power consumption doesn't need to be murderous. OK, in the case of that cancelled chiplet GPU N4X the power consumption would have been very high, but so would the performance.

@adroc_thurston said the size is 200-250 mm². Don't know if it's true or not.
Well, yeah - RDNA4 was supposed to clock a lot higher, but it couldn't because the implementation used too much power. So, whether AMD are able to fix it - and by how much - remains to be seen.
It is nice to have one data point for the potential die size for the 'top' RDNA4 GPU. Seems like good odds that it will match the NV 4070Ti. If things go really well, maybe a bit higher. Not awesome, but not totally useless, if the price is right.
 

TESKATLIPOKA

Golden Member
May 1, 2020
1,997
2,393
106
Well, yeah - RDNA3 was supposed to clock a lot higher, but it couldn't because the implementation used too much power. So, whether AMD are able to fix it - and by how much - remains to be seen.
It is nice to have one data point for the potential die size for the 'top' RDNA4 GPU. Seems like good odds that it will match the NV 4070Ti. If things go really well, maybe a bit higher. Not awesome, but not totally useless, if the price is right.
Fixed it for you.
I think they should have found the problem by now; I think RDNA3.5 will show us the truth.

Even 4070 Ti-level performance wouldn't be bad, if the price is set at a more sane level.
On the other hand, we don't know what Nvidia will show us next year. The best case for AMD would be if Nvidia released its next gen in 2025.

I just wonder what specs can fit in a 200-250 mm² monolithic GPU.
 
  • Like
Reactions: Tlh97 and Joe NYC

maddie

Diamond Member
Jul 18, 2010
4,661
4,498
136
It looks like a much higher clock was intended for RDNA3 (3-3.4 GHz, per TweakTown), so it wouldn't be surprising if RDNA4 aimed even higher.
[Slide: AMD Radeon RX 7000 RDNA 3 Infinity Links - 9.2 Gb/s, 10x higher than Ryzen/EPYC]


If the architecture is designed to allow those clocks, then power consumption doesn't need to be murderous. OK, in the case of that cancelled chiplet GPU N4X the power consumption would have been very high, but so would the performance.

@adroc_thurston said the size is 200-250 mm². Don't know if it's true or not.
How could that slide have been produced for the launch if they had no clue up to that date? Sounds crazy.
 

Mopetar

Diamond Member
Jan 31, 2011
7,738
5,793
136
Yeah. AMD needs to improve dual-issue; it brings very little extra performance. Nvidia's approach brought them ~25% more performance, which helped a lot in fighting off RDNA2's higher clocks.

NVidia already sort of had dual-issue prior to Ampere. However they could only issue an FP/INT pair instead of two floating point instructions. I think they had a better grasp on how to get the drivers to leverage the hardware changes they made in Ampere to allow for the added theoretical performance.

I don't know where AMD is at in their progression towards the same goal, but it's pretty clear that they aren't able to extract much performance right now. If they do get it ironed out hopefully it gives RDNA3 users a nice performance bump down the road. The architecture could use a little fine wine.
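
To make the dual-issue point concrete, here's a toy CUDA sketch (my own illustration, not from any NVidia doc): the same amount of FP32 work written as one dependent chain and as two independent accumulators. Only the second version hands the scheduler back-to-back independent FP32 instructions that the extra FP32 pipe (or RDNA3's VOPD packing, on the AMD side) can actually exploit.

[CODE]
#include <cstdio>
#include <cuda_runtime.h>

// Toy kernels, purely illustrative. Same total number of FP32 FMAs in both.
__global__ void dependent_chain(float *out, int iters) {
    float a = 1.0f;
    for (int i = 0; i < iters; ++i)
        a = a * 1.000001f + 0.5f;          // each FMA waits on the previous one
    out[blockIdx.x * blockDim.x + threadIdx.x] = a;
}

__global__ void independent_pair(float *out, int iters) {
    float a = 1.0f, b = 2.0f;
    for (int i = 0; i < iters / 2; ++i) {
        a = a * 1.000001f + 0.5f;          // these two FMAs don't depend on
        b = b * 0.999999f + 0.25f;         // each other, so they can co-issue
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = a + b;
}

int main() {
    float *d_out = nullptr;
    cudaMalloc(&d_out, 256 * sizeof(float));
    dependent_chain<<<1, 256>>>(d_out, 1 << 20);
    independent_pair<<<1, 256>>>(d_out, 1 << 20);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    printf("done\n");
    return 0;
}
[/CODE]

A HIP version would look nearly identical; the point is the instruction-level parallelism, not the API.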
 

Saylick

Platinum Member
Sep 10, 2012
2,861
5,649
136
NVidia already sort of had dual-issue prior to Ampere. However they could only issue an FP/INT pair instead of two floating point instructions. I think they had a better grasp on how to get the drivers to leverage the hardware changes they made in Ampere to allow for the added theoretical performance.

I don't know where AMD is at in their progression towards the same goal, but it's pretty clear that they aren't able to extract much performance right now. If they do get it ironed out hopefully it gives RDNA3 users a nice performance bump down the road. The architecture could use a little fine wine.
Nvidia is a software-first company, so it's natural that the drivers and software optimizations needed to fully leverage the dual-issue in Ampere and beyond were already ready to go when Ampere launched... On the other hand, AMD is a hardware-first company, so we end up with this scenario where RDNA 3 pretty much has minimal IPC gains over RDNA 2, even with 50% more registers. I suspect a lot of software isn't taking advantage of that execution path. I hope by RDNA 4 they do something about it.
 
  • Like
Reactions: Tlh97 and soresu

Ajay

Lifer
Jan 8, 2001
14,817
7,433
136
NVidia already sort of had dual-issue prior to Ampere. However they could only issue an FP/INT pair instead of two floating point instructions. I think they had a better grasp on how to get the drivers to leverage the hardware changes they made in Ampere to allow for the added theoretical performance.
Uh, that was already working correctly in Turing. Not sure where you got this from.
 

TESKATLIPOKA

Golden Member
May 1, 2020
1,997
2,393
106
NVidia already sort of had dual-issue prior to Ampere. However they could only issue an FP/INT pair instead of two floating point instructions. I think they had a better grasp on how to get the drivers to leverage the hardware changes they made in Ampere to allow for the added theoretical performance.

I don't know where AMD is at in their progression towards the same goal, but it's pretty clear that they aren't able to extract much performance right now. If they do get it ironed out hopefully it gives RDNA3 users a nice performance bump down the road. The architecture could use a little fine wine.
Isn't this also true about RDNA?
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Isn't this also true about RDNA?
RDNA3 can issue either one wave64 instruction or two wave32 ones, but it's very dependent on instruction type, so it's rather different.

It is a great idea if implemented properly. What Nvidia does do is re-partition the L1/shared memory split dynamically (if I remember right). Regardless, dynamic instruction wave size and dynamic memory partitioning are both good targets to move towards for better utilization. Part of Intel's excellent ray tracing performance is that they can process native 16-wide waves, and since RT is friendlier to branching/small waves, they can complete/retire waves earlier. AMD moving towards the same goal, with say a 16/32/64-wave architecture that can split shared/L1 memory, seems the logical path forward.
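
For reference, this is the Nvidia-side knob I was thinking of: a per-kernel hint for how the unified L1/shared array gets split. Minimal sketch, my own example; as far as I know AMD doesn't expose an equivalent today.

[CODE]
#include <cstdio>
#include <cuda_runtime.h>

__global__ void k(float *out) {
    extern __shared__ float tile[];                 // dynamically sized shared memory
    tile[threadIdx.x] = (float)threadIdx.x;
    __syncthreads();
    out[threadIdx.x] = tile[(threadIdx.x + 1) % blockDim.x];
}

int main() {
    // Hint the driver to bias the unified L1/shared array toward shared memory
    // for this kernel. It's a preference, not a guarantee, but it's the
    // per-kernel repartitioning mentioned above.
    cudaFuncSetAttribute(k, cudaFuncAttributePreferredSharedMemoryCarveout,
                         cudaSharedmemCarveoutMaxShared);

    float *d = nullptr;
    cudaMalloc(&d, 256 * sizeof(float));
    k<<<1, 256, 256 * sizeof(float)>>>(d);          // third arg = dynamic shared bytes
    cudaDeviceSynchronize();
    cudaFree(d);
    printf("done\n");
    return 0;
}
[/CODE]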
 

TESKATLIPOKA

Golden Member
May 1, 2020
1,997
2,393
106
No, RDNA is 1*SIMD32 which chews 32 int or FP components at once.
I see my mistake.
A Turing SM can do 64x FP32 and 64x INT32 calculations per cycle.
An Ampere (Ada) SM can do 128x FP32 calculations, or 64x FP32 and 64x INT32, per cycle.
An RDNA2 CU can do 64x FP32 or 64x INT32 calculations per cycle, or 32x FP32 and 32x INT32 calculations per cycle, because it has 2 SIMD32 units.

An RDNA3 CU looks like it still has 2 SIMD32 units, but each SIMD32 is now capable of 64x FP32 calculations in some rare cases (dual issue). As for performance gain, it does very little.

Not sure why they didn't add a third SIMD32 unit instead; maybe making 2 of the 3 FP32-only would have simplified them and saved some space.
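
Putting rough numbers on those per-cycle figures (my own back-of-the-envelope; 60 CUs/SMs at 2.5 GHz are just placeholder assumptions, and an FMA counts as 2 FLOPs):

[CODE]
#include <cstdio>

int main() {
    // Placeholder assumptions (mine, not specs): 60 CUs/SMs, 2.5 GHz, FMA = 2 FLOPs.
    const double units    = 60;
    const double clock_hz = 2.5e9;
    const double fma      = 2.0;

    const double rdna2_lanes  = 64;   // 2x SIMD32 per CU
    const double rdna3_lanes  = 128;  // if every FP32 op dual-issued perfectly
    const double ampere_lanes = 128;  // pure FP32 mix, no INT32 in the stream

    printf("RDNA2-like : %.1f TFLOPS\n", units * rdna2_lanes  * fma * clock_hz / 1e12);
    printf("RDNA3-like : %.1f TFLOPS (ideal dual-issue)\n", units * rdna3_lanes * fma * clock_hz / 1e12);
    printf("Ampere-like: %.1f TFLOPS (no INT work)\n", units * ampere_lanes * fma * clock_hz / 1e12);
    return 0;
}
[/CODE]

The gap between that ideal dual-issue number and what RDNA3 actually delivers in games is exactly why I said it does very little.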
 
Last edited:
  • Like
Reactions: Tlh97

eek2121

Platinum Member
Aug 2, 2005
2,756
3,680
136
I can see RDNA4 being 40 and 60 CUs. Based on comments made today by AMD, I wouldn't expect any revolution in efficiency, mostly improvements in ray tracing and maybe a minor bump in clock speed thanks to tweaks in the scheduler/node/cache (2.8 GHz boost, maybe 3?).
RDNA3 can hit 3 GHz today, just not in gaming workloads. There are users who have been able to run compute workloads at 3.2-3.5 GHz.

I imagine we will see a healthy total performance uplift from RDNA3 to RDNA4. I think they will land higher in the stack than most people expect, despite the halo parts getting canned. I think we will see a Radeon 8700 XT at the very least. If AMD fixes the various issues with, erm, dual-issue, fixes some other annoyances, and manages to get to a 3 GHz clock, you are looking at a healthy ~30% (20% from clocks, 10% from IPC) total uplift.

If they fail at one or the other, it won’t be the end of the world. I suspect execution will be quite different next time around. They know they received a ton of bad press for RDNA3 and will want to make up for it.

Total speculation on my part, of course.

I just hope AMD gets serious about ray tracing. Hardware RT has a ton of potential, but we are still in the early days.
 
  • Like
Reactions: Tlh97

Mopetar

Diamond Member
Jan 31, 2011
7,738
5,793
136
Uh, that was already working correctly in Turing. Not sure where you got this from.

Maybe it was Turing that introduced it for the first time. I didn't go back to check. I do recall reading about it either on AT or one of the other sites that covered the architectural changes NVidia had made.
 

Ajay

Lifer
Jan 8, 2001
14,817
7,433
136
Maybe it was Turing that introduced it for the first time. I didn't go back to check. I do recall reading about it either on AT or one of the other sites that covered the architectural changes NVidia had made.
My bad, it was Ampere. My memory failed me and a dev on the Nvidia professional forums wrote something misleading. Apologies.
 
  • Like
Reactions: Mopetar

Mopetar

Diamond Member
Jan 31, 2011
7,738
5,793
136
My bad, it was Ampere. My memory failed me and a dev on the Nvidia professional forums wrote something misleading. Apologies.

Eh, no big deal. The point was more about how NVidia had been building towards the technology for a while and AMD is hardly a stranger to introducing hardware that the drivers/software can't utilize.
 

Ajay

Lifer
Jan 8, 2001
14,817
7,433
136
Eh, no big deal. The point was more about how NVidia had been building towards the technology for a while and AMD is hardly a stranger to introducing hardware that the drivers/software can't utilize.
It's funny, John Carmack said something like that about ATi 15-20 years ago. ATi had better hardware specs, but NV had higher framerates.
 

TESKATLIPOKA

Golden Member
May 1, 2020
1,997
2,393
106
It's funny, John Carmack said something like that about ATi 15-20 years ago. ATi had better hardware specs, but NV had higher framerates.
Didn't you hear about AMD fine wine?
It's not that AMD GPUs get better over time, it's just that it takes so much time for AMD to unlock their real capabilities.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
6,252
6,261
136
Didn't you hear about AMD fine wine?
It's not that AMD GPUs get better over time, it's just that it takes so much time for AMD to unlock their real capabilities.

-IMO the best time to buy an AMD GPU is at the end of a generational cycle, at a deep discount.

I've done this a number of times and have literally never been disappointed.