
Question Speculation: RDNA2 + CDNA Architectures thread

Page 14

uzzi38

Platinum Member
Oct 16, 2019
All die sizes are accurate to within 5mm^2. The poster here has been right on some things in the past afaik, and to his credit was the first to say 505mm^2 for Navi21, which other people have backed up. Even so, take the following with a pinch of salt.

Navi21 - 505mm^2

Navi22 - 340mm^2

Navi23 - 240mm^2

Source is the following post: https://www.ptt.cc/bbs/PC_Shopping/M.1588075782.A.C1E.html
 

DisEnchantment

Golden Member
Mar 3, 2017
Going forward we might see RDNA divide into a 'legacy' branch, as it were, for console compatibility, and a more CDNA-inspired branch as and when such features become beneficial with increased use, if nVidia decides to push them.
Cannot is a big and very final word in such a changeable industry.
You are suggesting RDNA could become legacy if not for the consoles. On the other hand, you are suggesting that CDNA, which has no way of running a typical graphics pipeline, could be an inspiration for future AMD graphics development. CDNA does not even have rasterization hardware, among other things. Those are David Wang's words, not mine.
CDNA is tailored for DPFP (double-precision floating point), networked load sharing, increased RAS, matrix FMA, mixed-precision ops and so on. There are so few players in high-performance graphics because it is not all about scheduling compute kernels.

I don't suppose anyone in their right mind would think that RDNA won't evolve to handle not just ML workloads but compute loads in general. Navi1X itself has already evolved: Navi12 supports all the DL instructions that MI60 has, which Navi10 lacks.
One of the jobs of the ACE (Asynchronous Compute Engine) is to bypass the command processor and schedule compute shaders during different stages of the graphics pipeline, and it is good at it. That covers work beyond graphics, like FEM, material, physics, TrueAudio and the like.
I would imagine the ACE in RDNA will continue to evolve to handle these loads, among others.
I have my doubts CDNA would be the base of any future graphics at AMD, but we shall see.
 

soresu

Diamond Member
Dec 19, 2014
I have my doubts CDNA would be the base of any future graphics at AMD, but we shall see.
I didn't mean CDNA as a literal base, only as a source of inspiration: a place to take pre-implemented, proven compute ideas from and transplant them into RDNA.

Sort of like how Mozilla's Servo is acting as a proving ground for new pieces of code being integrated into Firefox over time.

Obviously CDNA is far more than just a compute uArch proving ground for AMD, but it's not a giant leap of imagination for tech implemented there to eventually come downstream to the more change-shy RDNA, which is tied to previous generations of console game code.

Also, I would not take anything David Wang says as gospel on AMD's plans in any given area.

Not that I'm implying he isn't aware of said plans; as head of GPU development he obviously is. It's simply that, unlike Koduri, he doesn't shoot his mouth off in public far too early, before a tech is ready for market (ie GPU chiplets).

I've only seen a little of David Wang so far, but even from that he seems like a very cool and understated customer by comparison to Koduri's public appearances under AMD.

If Raja Koduri was a hype machine, David Wang would be an anti-hype machine. He was completely downplaying GPU chiplets when asked about them, which is exactly what he should be doing unless they are literally weeks to a few months from a product release.

Edit: On the subject of DP FLOPS, Unigine actually has a use for them believe it or not.

Obviously Unigine is a drop in the bucket compared to the oceans of Unreal and Unity projects, but Unigine do tend to be front-runners for certain gfx engine tech, so I would not consign the use of DP FLOPS in game engines to the realm of insanity just yet.
 

maddie

Diamond Member
Jul 18, 2010
Cannot is a big and very final word in such a changeable industry.

Just because it looks that way now does not mean that it will always be that way.

Future RDNA need not take all of CDNA; it may just take pieces, à la the ML-optimised tensor silicon, in smaller doses.

ML compute is the next big thing - quite possibly even bigger than fixed RT acceleration, and while it is not there in real time graphics yet, there are already indications of it becoming important in the future, even if just to reduce the considerable load that RT brings to graphics.

This is even more important in the mobile arena, where power consumption is everything and regular RDNA CUs will not be as efficient as the tensor silicon they are cooking up for CDNA.

People are so distracted by the more obvious denoising/DLSS angle that nVidia are taking with ML - to the point that they don't realise there is further potential for ML to become a far more important part of graphics and gaming compute in the future.

This extends to areas like fluid procedurally generated animation, NPC AI, and player avatar generation (ie accurate personal photo-based avatars, not the kludgy parameter-based avatars provided by the game developer).

This is something I've wondered about. Is this a deterministic operation with exactly repeatable output, or will we get approximations to an outcome?
 

Glo.

Diamond Member
Apr 25, 2015
People are so distracted by the more obvious denoising/DLSS angle that nVidia are taking with ML - to the point that they don't realise there is further potential for ML to become a far more important part of graphics and gaming compute in the future.

This extends to areas like fluid procedurally generated animation, NPC AI, and player avatar generation (ie accurate personal photo-based avatars, not the kludgy parameter-based avatars provided by the game developer).


This is something I've wondered about. Is this a deterministic operation with exactly repeatable output, or will we get approximations to an outcome?
Posts like this could spawn very interesting discussion/speculation.

So please, elaborate guys. I'm not technical enough to discuss this with you, but I would be glad to read a discussion on the topic :).
 

DisEnchantment

Golden Member
Mar 3, 2017
Huh, do we know if any Navi2X chips utilise these instructions yet?
Sienna Cichlid has them, but they are not Tensor-like matrix ops, just packed low-precision ops. At the same clocks as Navi10, Navi12 would be able to do about 80 TOPS of INT4.
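A rough way to sanity-check the ~80 TOPS figure: peak throughput is shader count × ops per lane per clock × clock. This sketch assumes 40 CUs × 64 lanes = 2560 shaders, a ~1.905 GHz boost clock (the RX 5700 XT's), an FMA counting as 2 ops, and a packed INT4 dot8 instruction retiring 8 multiply-adds (16 ops) per lane per clock. Those rates are my assumptions, not figures from the post.

```python
def peak_tops(shaders: int, clock_ghz: float, ops_per_lane_per_clock: int) -> float:
    """Peak throughput in tera-operations per second (GHz * ops -> Gops, /1000 -> Tops)."""
    return shaders * ops_per_lane_per_clock * clock_ghz / 1000.0

# Assumptions (not from the post): 40 CUs x 64 lanes, ~1.905 GHz boost clock.
SHADERS = 40 * 64
BOOST_GHZ = 1.905

fp32_tflops = peak_tops(SHADERS, BOOST_GHZ, 2)   # one FMA = 2 ops per lane per clock
int4_tops = peak_tops(SHADERS, BOOST_GHZ, 16)    # assumed dot8: 8 multiply-adds = 16 ops

print(f"FP32: {fp32_tflops:.2f} TFLOPS, INT4: {int4_tops:.1f} TOPS")
```

Under these assumptions INT4 lands at 8x the FP32 rate, a little under 80 TOPS.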

Matrix FMA ops are CDNA-specific, which makes sense since CDNA is more training-oriented.
For RDNA's use case of inferencing, optimizing the NN layers and the number of neurons per layer can produce good-enough results until AMD reaches a point where it can cram in enough transistors to do everything.
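To illustrate why trimming layers and neurons matters so much for inference cost: in a dense (fully connected) network each layer costs roughly fan_in × fan_out multiply-accumulates, so the cost grows with the product of adjacent layer widths. The layer sizes below are purely hypothetical.

```python
def dense_macs(layer_sizes):
    """Multiply-accumulates for one forward pass through a dense MLP.

    layer_sizes = [inputs, hidden..., outputs]. Each layer costs
    fan_in * fan_out MACs; biases and activations are ignored.
    """
    return sum(fan_in * fan_out for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical networks, for illustration only.
wide = [256, 1024, 1024, 64]
narrow = [256, 256, 256, 64]

print(dense_macs(wide), dense_macs(narrow), dense_macs(wide) / dense_macs(narrow))
```

In this toy case, shrinking the hidden layers from 1024 to 256 neurons cuts the work by roughly 9x.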
 

soresu

Diamond Member
Dec 19, 2014
This is something I've wondered about. Is this a deterministic operation with exactly repeatable output, or will we get approximations to an outcome?
From what I've read and witnessed from academic papers and general explanations, I think exactly repeatable output is not likely with current methods - but it will be close enough that you might need to look hard at the results (ie frame by frame) to see the difference.
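One well-known reason GPU-run networks are not always bit-for-bit repeatable (separate from any approximation in the model itself) is that floating-point addition is not associative: a parallel reduction that sums in a different order can differ in the last bits. A minimal illustration, with a tree-shaped sum standing in for a parallel reduction:

```python
def sequential_sum(xs):
    """Left-to-right accumulation, like a single thread looping over values."""
    total = 0.0
    for x in xs:
        total += x
    return total

def pairwise_sum(xs):
    """Tree-shaped reduction, the shape a parallel GPU reduction typically has."""
    if not xs:
        return 0.0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

xs = [0.1, 0.2, 0.3]
print(sequential_sum(xs), pairwise_sum(xs), sequential_sum(xs) == pairwise_sum(xs))
```

The two results differ only in the last bit, which is the "close enough that you have to look hard" kind of difference.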

I've seen more than a few animation-related papers explained on the YouTube channel "Two Minute Papers" where successive iterations/generations of the neural nets produce higher and higher quality output that varies less and less.

There was one paper I referenced for my master's dissertation about generating quadruped movement from mocap input data; it was pretty amazing. There was also a follow-up paper that concentrated on biped movement interacting with an environment.

YT links here and here.
 

soresu

Diamond Member
Dec 19, 2014
Posts like this could spawn very interesting discussion/speculation.

So please, elaborate guys. I'm not technical enough to discuss this with you, but I would be glad to read a discussion on the topic :).
It's to do with using machine learning (artificial neural network or AI) techniques to reduce developer time crafting various parts of a game like character animation, models and textures, and, in the context of this discussion, using optimised silicon on GPUs to accelerate such compute.

ML will also make some new graphics techniques possible. I recently read a paper describing ambient occlusion of much greater quality than conventional SSAO techniques without using ray tracing (and therefore potentially available on GPUs that lack RT acceleration).

The most recent version of Adobe Substance Alchemist has replaced its previous photo/image to PBR material texture engine (called B2M/Bitmap2Material, a more conventional algorithmic design) with a new engine called Materia, which uses ML techniques to get better results.

This supposedly gets you better, more accurate normal maps from the same images and likely better PBR (metal/roughness) values also.
 

Qwertilot

Golden Member
Nov 28, 2013
It's to do with using machine learning (artificial neural network or AI) techniques to reduce developer time crafting various parts of a game like character animation, models and textures, and in the context of this discussion using optimised silicon on GPU's to accelerate such compute.

That seems potentially very useful. Are we sure it will need ML hardware in end-user graphics cards as well?

It would seem a priori logical to extract models/textures etc into a conventional, fixed set of assets rather than running the NN in 'real time' during the game. But maybe it works out sensible to keep it in the NN. Character animation I can see being varied enough to be unclear.
 

soresu

Diamond Member
Dec 19, 2014
It would seem a priori logical to extract models/textures etc into a conventional, fixed set of assets rather than running the NN in 'real time' during the game. But maybe it works out sensible to keep it in the NN. Character animation I can see being varied enough to be unclear.
Yes the textures would be generated in a program like Substance Painter and exported as standard image formats.

Models, though (like personalised avatars), I'm not so sure about, especially if they are generated in the game engine itself. Facebook are currently working on such a project, and though its fine details go over my head somewhat, it doesn't seem to use a standardised, exported mesh and blendshape mix for the end product.

Likewise, the quadruped animation work not only uses the NN to generate the animation in situ, it also uses it to blend from one gait or movement type to another (ie running to jumping to stopping). This could probably be produced or exported in clips to more conventional vertex and skeleton animation formats, but that might require significantly more time and effort from the developers.

If the point of the exercise is to reduce developer time/effort (and therefore time/costs to develop total per game) then it is worth investing in hardware that can reduce the load on the creators for the sake of the industry, as with ray tracing.
 

moinmoin

Diamond Member
Jun 1, 2017
Generating assets procedurally is underrated. Adding ML to the mix may make it more widely used again. The biggest hindrance as always is that the usual artists involved don't have the technical background necessary to push for that direction.
 

Mopetar

Diamond Member
Jan 31, 2011
I think most people would be fine with a realistic copy of their face in a character. No need to model my bulbous behind or anything else when the standard Johnny McBadass model is suitable.

ML is going to be far more useful for developing and training good AIs than it is for using them in real time. The same goes for other developer tools such as AI being able to generate content. The learning doesn't need to occur on the device running the game and can be conducted by far more powerful hardware.

I think the biggest benefit will be engines and frameworks that can handle a lot of that work for smaller game development teams that otherwise couldn't afford to spend a lot of time building and testing their own AI.

ML will have a big effect on the industry, but I don't think it will be in quite the way many people envision.
 

soresu

Diamond Member
Dec 19, 2014
ML is going to be far more useful for developing and training good AIs than it is for using them in real time.
It's far too early in the day to be saying that about ML in real time or otherwise.

That's like saying RT acceleration is good for nothing but those crystal reflections in BF5 and such because it's all we had seen at the time.

RT isn't quite so mind-blowing from the player's real-time standpoint (at least compared to a well-made, state-of-the-art raster-based game); from what I have seen, apart from simplifying gfx code, it seems to have far less potential to change games than ML does.

There may be further uses for RT hardware paths down the road, but ML hardware has really great potential for future development in games - though I don't believe this to be any reason nVidia pushed their tensor cores originally, that was almost certainly about locking up the AI/ML training market on the PC as early as possible.
 

GodisanAtheist

Diamond Member
Nov 16, 2006
Generating assets procedurally is underrated. Adding ML to the mix may make it more widely used again. The biggest hindrance as always is that the usual artists involved don't have the technical background necessary to push for that direction.

To go even a step beyond this, you could have an AI "Director" for a game that adjusts enemy difficulty, game flow or even the narrative in a way that's beyond the "radiant" garbage that pollutes games nowadays.

It would be the ultimate end goal: basically a Game Master for your game that tailors it to your taste and skill level on the fly.

You're right that the trick would be to give it enough assets and rules so it doesn't just dump out a bunch of samey garbage.
 

DiogoDX

Senior member
Oct 11, 2012
I don't know this guy's track record, but he is just repeating the same rumors we already know.

AMD said better IPC and faster clocks for RDNA2 than RDNA1 (but how much???)
80CUs is the rumored amount and about the same as 3080Ti (probably not the full chip like 2080Ti)
It will have higher clocks than Nvidia (maybe TSMC 7nm vs Samsung 8nm????)

But the problem is that we don't know which architecture is more efficient. Shader counts and clock numbers alone don't tell the full story.
 

soresu

Diamond Member
Dec 19, 2014
4,135
3,604
136
AMD said better IPC and faster clocks for RDNA2 than RDNA1 (but how much???)
For clocks I have no idea - but for IPC I would warrant the same improvement seen in Renoir Vega.

They claimed 1.59x performance per CU over Picasso, and gave us a 1.25x improvement in clocks, so about a 1.27x IPC improvement I would reckon.

That's assuming the Renoir Vega improvements are the only ones, that is.
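The arithmetic behind that estimate is simply IPC gain = performance gain / clock gain. Note that it attributes everything not explained by clocks to IPC, which is exactly the assumption disputed in the next reply.

```python
def ipc_gain(perf_gain: float, clock_gain: float) -> float:
    """Per-clock gain implied by a total performance gain and a clock gain."""
    return perf_gain / clock_gain

renoir_ipc = ipc_gain(1.59, 1.25)   # AMD's per-CU claim divided by the clock uplift
print(f"{renoir_ipc:.3f}x")
```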

Of course new features like VRS and Mesh Shaders may offer more performance in supporting games also on top of base IPC/clock improvements.

All told, if XSX's purported intersection performance is any indication of RT performance in PC RDNA2, then it should be a very interesting upgrade, even from RDNA1, let alone from Radeon VII or earlier cards.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
For clocks I have no idea - but for IPC I would warrant the same improvement seen in Renoir Vega.

They claimed 1.59x performance per CU over Picasso, and gave us a 1.25x improvement in clocks, so about a 1.27x IPC improvement I would reckon.
There is no additional IPC increase in Renoir Vega.
Renoir performs better in games because of:
1. Better CPU cores
2. More CPU cores
3. Better memory support: DDR4-3200 (+33%) or LPDDR4-4266 (+78%)
4. Higher iGPU clocks (+25%), giving it the same GFLOPS as the Vega 10 in the 3700U
5. The 7nm process, which allows higher sustained clocks
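Points 3 and 4 can be checked with a little arithmetic. The sketch assumes the Picasso baseline is DDR4-2400, that the Vega 10 in the 3700U runs at 1400 MHz, and that the Renoir part in question is a Vega 8 at 1750 MHz (e.g. the 4800U); a Vega CU has 64 lanes doing 2 FLOPs (one FMA) per clock.

```python
def vega_gflops(cus: int, clock_mhz: int) -> float:
    """Peak FP32 GFLOPS: 64 lanes per CU, 2 FLOPs (one FMA) per lane per clock."""
    return cus * 64 * 2 * clock_mhz / 1000

def pct_gain(new: float, old: float) -> float:
    """Percentage increase of new over old."""
    return (new / old - 1) * 100

picasso_igpu = vega_gflops(10, 1400)  # Vega 10 in Ryzen 7 3700U
renoir_igpu = vega_gflops(8, 1750)    # Vega 8 at 1750 MHz (assumed, e.g. 4800U)

print(picasso_igpu, renoir_igpu)
print(f"clock {pct_gain(1750, 1400):.0f}%, DDR4-3200 {pct_gain(3200, 2400):.0f}%, "
      f"LPDDR4-4266 {pct_gain(4266, 2400):.0f}%")
```

Two fewer CUs at 25% higher clocks land on exactly the same peak GFLOPS.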
 

uzzi38

Platinum Member
Oct 16, 2019

It carries over several features from RDNA actually. The ROPs and rasterizer are the two of interest when it comes to any sorts of per-clock improvements though.

But in any case, for per-clock improvements with RDNA2 over RDNA, I'm expecting more like 10-15% at most tbh. There's WAY less low hanging fruit to take advantage of this time around.
 

DisEnchantment

Golden Member
Mar 3, 2017
It carries over several features from RDNA actually. The ROPs and rasterizer are the two of interest when it comes to any sorts of per-clock improvements though.

But in any case, for per-clock improvements with RDNA2 over RDNA, I'm expecting more like 10-15% at most tbh. There's WAY less low hanging fruit to take advantage of this time around.
I might be a bit more optimistic than you are ... :D
Somehow I see there are lots of knobs available to tweak to get more than 15% IPC gain over Navi1x

The rasterization and pixel shading are towards the latter part of the pipeline. If the geometry engine that lots of devs were talking about is as good as they said, the culling capabilities of RDNA2 could be greatly enhanced. This alone would make a difference in performance.
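For readers wondering what "culling" means here: the geometry front end throws away triangles that cannot contribute pixels before they reach the rasterizer. Below is a generic sketch of two classic tests, back-face and zero-area culling, using the signed area of the projected triangle. This is textbook geometry, not AMD's actual hardware logic, and it assumes counter-clockwise front faces in a y-up coordinate system.

```python
def signed_area(p0, p1, p2):
    """Signed area of a 2D triangle; positive when the vertices wind counter-clockwise."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def should_cull(p0, p1, p2, eps=1e-12):
    """Return True for triangles that can never produce visible pixels."""
    area = signed_area(p0, p1, p2)
    if abs(area) <= eps:
        return True   # degenerate: covers no area
    return area < 0   # clockwise winding = back-facing under the CCW-front convention

triangles = {
    "front-facing": ((0, 0), (1, 0), (0, 1)),
    "back-facing": ((0, 0), (0, 1), (1, 0)),
    "degenerate": ((0, 0), (1, 1), (2, 2)),
}
for name, tri in triangles.items():
    print(name, "culled" if should_cull(*tri) else "kept")
```

The cheaper these rejections are, the less work reaches the rasterizer and pixel shaders.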
I snapshotted an interesting comment from Matt...

[Attached image: screenshot of Matt's comment]

VRS + Upgraded Geometry Engine + Bugs from N10 squashed
These alone would bring decent improvements, though they might not make as much of a difference in older games.
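A back-of-the-envelope view of what VRS can save: shading a region at a 2x2 coarse rate runs one pixel-shader invocation per 4 pixels. The 4K resolution and the 50% coarse-shaded fraction below are hypothetical, and the model counts only pixel-shader invocations, not total frame time.

```python
def ps_invocations(width, height, coarse_fraction, coarse_block=(2, 2)):
    """Pixel-shader invocations when a fraction of the screen uses a coarse shading rate."""
    pixels = width * height
    fine = pixels * (1.0 - coarse_fraction)
    coarse = pixels * coarse_fraction / (coarse_block[0] * coarse_block[1])
    return fine + coarse

full_rate = ps_invocations(3840, 2160, 0.0)
with_vrs = ps_invocations(3840, 2160, 0.5)   # hypothetical: half the screen at 2x2
print(f"{with_vrs / full_rate:.3f} of the full-rate shading work")
```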
If you recall, N10 had to jump through some extra hoops when working with the LDS because of a few bugs; you can see how many extra instructions get generated to work around them. I also suppose they removed the caveat of having no cache coherency between the CUs in a WGP when working in WGP mode, which makes a sync necessary and thereby loses some cycles too.
At the CU level, there is an updated wavefront scheduler in the SIMD and a new cache optimization (also evident in the LLVM VGPR allocation).
Then slap in some more cache, more compression and higher BW. Add to that the command processor changes.
Then the ROPs, like you said. The ROPs feed off L1, and I imagine they could benefit from more capacity, BW and compression.

Based on the above points, and others I have missed, I believe we could get more than 10-15% IPC over N10.
If you recall, RDNA1's 1.5x perf gain over Vega was achieved with 1.25x IPC and 1.2x clocks. I suppose there will be a similar picture. Since we can't realistically expect Navi2x to hit ~2.4 GHz clocks (1.25x the clocks of N10), the perf has to come from IPC. (Side note: I have seen measurements showing RDNA1 actually delivers much more than 1.5x the perf per watt of Vega.)
(Unless, if that 2.7+ GHz leak is true then there is no IPC gain needed at all :blush:)
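The decomposition above is multiplicative: perf gain = IPC gain × clock gain. This sketch assumes Navi10's ~1.905 GHz boost clock as the reference; the 1.5x target in the loop is only an illustration, not a known RDNA2 figure.

```python
NAVI10_BOOST_GHZ = 1.905  # assumed reference clock (RX 5700 XT boost)

def perf_gain(ipc_gain: float, clock_gain: float) -> float:
    return ipc_gain * clock_gain

def ipc_needed(target_perf_gain: float, clock_gain: float) -> float:
    """IPC gain required to reach a target perf gain at a given clock gain."""
    return target_perf_gain / clock_gain

print(f"RDNA1 vs Vega: {perf_gain(1.25, 1.2):.2f}x")
print(f"1.25x clocks over N10: {NAVI10_BOOST_GHZ * 1.25:.2f} GHz")
for clock_gain in (1.1, 1.15, 1.2):
    # Illustrative 1.5x target only.
    print(f"clocks x{clock_gain}: IPC needs x{ipc_needed(1.5, clock_gain):.2f}")
```

The less the clocks can rise, the more of any target gain has to come from IPC.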

 

soresu

Diamond Member
Dec 19, 2014
Side note: I have seen measurements showing RDNA1 actually delivers much more than 1.5x the perf per watt of Vega.
Of course, you would never know it from how set AMD seems to be on completely negating perf/watt gains to hit max performance per chip per generation in discrete graphics SKUs.

I mean seriously, would it kill them to make just one halfway decent performance SKU that was not set at perf/watt killing clocks off the shelf?

I can't be the only consumer that wants a halfway decent card which doesn't mandate a chunky, loud HS/F and power supply.
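Why do high factory clocks hurt perf/watt so much? Dynamic power scales roughly with C·V²·f, and near the top of the v/f curve the voltage has to rise with frequency. The toy model below uses made-up voltage numbers (a linear slope above a hypothetical 1.5 GHz knee), ignores leakage, and assumes performance scales linearly with clock; it only shows the shape of the trade-off, not any real AMD SKU.

```python
F_REF_GHZ = 1.5    # hypothetical "efficient" clock
V_REF = 0.80       # hypothetical voltage at F_REF_GHZ
DV_PER_GHZ = 0.25  # hypothetical voltage slope above F_REF_GHZ

def voltage(f_ghz: float) -> float:
    return V_REF + DV_PER_GHZ * max(0.0, f_ghz - F_REF_GHZ)

def relative_power(f_ghz: float) -> float:
    """Dynamic power P ~ C * V^2 * f, normalised to the reference point."""
    return (voltage(f_ghz) ** 2 * f_ghz) / (V_REF ** 2 * F_REF_GHZ)

def relative_perf_per_watt(f_ghz: float) -> float:
    """Assumes performance scales linearly with clock (an optimistic simplification)."""
    return (f_ghz / F_REF_GHZ) / relative_power(f_ghz)

for f in (1.5, 1.8, 2.1):
    print(f"{f:.1f} GHz: power x{relative_power(f):.2f}, perf/W x{relative_perf_per_watt(f):.2f}")
```

Even in this simple model, each extra step of clock buys less performance than it costs in power.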
 

raghu78

Diamond Member
Aug 23, 2012
(Quoting DisEnchantment's post above.)

[Attached image: AMD RDNA2 efficiency improvements slide]

This slide from AMD FAD 2020 seems to hint at a similar perf/clock improvement for RDNA2 vs RDNA as for Zen 2 vs Zen. My expectation is 15% higher perf/clock and 25% higher clocks for Navi 2x. BTW, I think AMD is sandbagging and the actual perf/watt improvement for RDNA2 will be >> 1.5x. My calculation based on Xbox Series X die size and power has Navi 21 delivering 22-24 TF at roughly 270-280 W. If AMD deliver this, it would give them a legitimate chance of claiming the GPU crown from Nvidia for traditional rasterization perf (something they have not been able to do for a decade or more). The rumoured 350 W power numbers for Nvidia's Ampere GA102 seem to hint at Nvidia pushing GA102 to its limits to keep the GPU crown.
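Working through those numbers: 15% perf/clock and 25% clocks compound to about 1.44x. Since an RDNA CU has 64 lanes each doing an FMA (2 FLOPs) per clock, one can also ask what clock a 22-24 TF Navi 21 would need. The 80 and 96 CU counts are the rumoured configurations from this thread, not confirmed specs.

```python
LANES_PER_CU = 64
FLOPS_PER_LANE = 2  # one FMA per lane per clock

def combined_gain(per_clock_gain: float, clock_gain: float) -> float:
    return per_clock_gain * clock_gain

def clock_for_tflops(tflops: float, cus: int) -> float:
    """Clock (GHz) needed for a given peak FP32 TFLOPS at a given CU count."""
    return tflops * 1000 / (cus * LANES_PER_CU * FLOPS_PER_LANE)

print(f"Overall: {combined_gain(1.15, 1.25):.3f}x")
for tf in (22, 24):
    for cus in (80, 96):
        print(f"{tf} TF with {cus} CUs needs ~{clock_for_tflops(tf, cus):.2f} GHz")
```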

For ray tracing it's tough to say without knowing the details of each GPU vendor's implementation, but I would expect Nvidia to keep the lead given that Ampere will be their second-generation ray tracing implementation.
 

raghu78

Diamond Member
Aug 23, 2012
Could be, but those clocks would be very high indeed. I based my sub-2.4 GHz estimate for Navi2x on a TSMC shmoo plot for a GPU, which I cannot find right now.

I used the 1755 MHz game clock for Navi 10. So if we assume a 1.75-1.8 GHz game clock for Navi 10, the Navi 2x game clock should be 2.2-2.25 GHz (for a 1.25x clock speed increase). My expectation is that Navi 23 will hit those clocks while Navi 21 will have a 2 GHz game clock, given the 96 CU config I expect for this 505 sq mm GPU. We know PS5 is going to have a variable clock of up to 2.23 GHz. Given that PS5 has to work in a power- and form-factor-constrained console across ambient temperatures from cold (25 °C) to warm (35+ °C), my understanding is that 2.23 GHz is still in the reasonably efficient range of the v/f curve. I do think 2.4-2.5 GHz might be Fmax for RDNA2.
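The same 1.25x clock scaling applied to the Navi 10 game clock, plus the peak TFLOPS a 96 CU part would deliver at a 2 GHz game clock. The sketch assumes 64 lanes per CU and 2 FLOPs (one FMA) per lane per clock; the 96 CU figure is the rumoured configuration above, not a confirmed spec.

```python
def project_clock(base_ghz: float, gain: float = 1.25) -> float:
    """Scale a base clock by an assumed generational clock gain."""
    return base_ghz * gain

def peak_tflops(cus: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS assuming 64 lanes per CU and 2 FLOPs (FMA) per lane per clock."""
    return cus * 64 * 2 * clock_ghz / 1000

for base in (1.755, 1.8):
    print(f"{base} GHz game clock x1.25 -> {project_clock(base):.2f} GHz")
print(f"96 CUs at 2.0 GHz: {peak_tflops(96, 2.0):.1f} TF")
```

That works out to roughly 24.6 TF, just above the top of the 22-24 TF range from the previous post.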