Discussion RDNA4 + CDNA3 Architectures Thread

DisEnchantment · Mar 23, 2022

With the GFX940 patches in full swing since first week of March, it is looking like MI300 is not far in the distant future!
Usually AMD takes around 3Qs to get the support in LLVM and amdgpu. Lately, since RDNA2 the window they push to add support for new devices is much reduced to prevent leaks.
But looking at the flurry of code in LLVM, it is a lot of commits. Maybe because US Govt is starting to prepare the SW environment for El Capitan (Maybe to avoid slow bring up situation like Frontier for example)

See here for the GFX940 specific commits

History for llvm/lib/Target/AMDGPU - llvm/llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. - History for llvm/lib/Target/AMDGPU - llvm/llvm-project

github.com

Or Phoronix

More AMD "GFX940" Enablement Work Landing In LLVM - Phoronix

www.phoronix.com

There is a lot more if you know whom to follow in LLVM review chains (before getting merged to github), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time like Hopper probably only a couple of months later!
Although I believe Hopper had problems not having a host CPU capable of doing PCIe 5 in the very near future therefore it might have gotten pushed back a bit until SPR and Genoa arrives later in 2022.
If PVC slips again I believe MI300 could launch before it

This is nuts, MI100/200/300 cadence is impressive.

Previous thread on CDNA2 and RDNA3 here

Question - Speculation: RDNA3 + CDNA2 Architectures Thread

Man I have been dying to make this one for a while now. First rumours for RDNA3 are here so new thread time! Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3...

forums.anandtech.com

SolidQ · Sep 25, 2023

adroc_thurston said:
We can, just that the IQ needle moved further with RTRT.

200-300$ cards can do it easily? When cards for 200-300$ easily going handle 4k@60+ like fullhd, then we can talk about "Can"

dr1337 · Sep 25, 2023

Ajay said:
Our brains, when say hunting, just ignore allot of unimportant stuff so we can focus on killing our next meal.

And say when reading, your brain and eyes are working on resolving the image in front of you to quite a high degree.

Honestly the only thing I can think is that you're farsighted or something because what you're saying doesn't make sense to me. The current paradigm is PPI matters more the closer a display is to you and frankly that's a fact. Nobody looks at a magazine and wishes it had less DPI.

adroc_thurston · Sep 25, 2023

SolidQ said:
When cards for 200-300$ easily going handle 4k@60+ like fullhd, then we can talk about "Can"

Well that's the thing, xtor cost scaling died out and GPUs are all hungry hippos.

soresu · Sep 26, 2023

Ajay said:
Unfortunately. 8K is a fools errand on small(ish) monitors. It's really only useful for xtra large TVs (85" and above)

Exactly, even so you need to be so close to the screen to pick out individual pixels that you will get serious neck problems just turning it to cover the whole screen 😅

soresu · Sep 26, 2023

Mopetar said:
Maybe if ray tracing takes off enough someone will build a card that's designed for ray tracing first and has fall back options for taster games

To a large extent they already are.

The raster specific hardware is minimal compared to the shader/general compute part of the µArch for both AMD and nVidia, which is why you can do RT entirely on GPU software compute as a fallback on many older generations.

soresu · Sep 26, 2023

adroc_thurston said:
We can, just that the IQ needle moved further with RTRT.

IMHO while true it's well into the arena of diminishing returns of extra compute to achieve that higher IQ, even with techniques like ReSTIR to neutralise the compute impact.

It seems to be more of a benefit for game devs than actual gamers from where I'm standing.

By comparison UE5 Nanite and techniques inspired by it brought far more potential for higher IQ from the huge potential poly count increase limited more by available memory and IO to store and feed hipoly geometry to the GPU.

Mopetar · Sep 26, 2023

soresu said:
To a large extent they already are.

The raster specific hardware is minimal compared to the shader/general compute part of the µArch for both AMD and nVidia, which is why you can do RT entirely on GPU software compute as a fallback on many older generations.

Doing RT on shaders isn't a good way to achieve fully ray traced games. They can be used for RT workloads, but the performance will be abysmal compared to building hardware that's specifically designed for it.

It would be interesting to see someone build (or just design) an architecture that's really built for RT workloads. You'd still need some hardware to convert that to a 2D image containing pixels that can be displayed on the screen so you'll always have a little bit of rasterization hardware at some point, but (pixel) shaders are not the best approach to being able to handled a fully ray traced game.

adroc_thurston · Sep 26, 2023

Mopetar said:
isn't a good way to achieve fully ray traced games

You're never getting those unless CMOS gets replaced with something fancy and we break the memory barrier (somehow)

Mopetar · Sep 26, 2023

adroc_thurston said:
You're never getting those unless CMOS gets replaced with something fancy and we break the memory barrier (somehow)

I'm not sure what you mean. We've already seen fully RT titles re-released because the hardware can handle doing it for a ~20 year old game.

I think that you could do a hell of a lot better throwing out shaders entirely and starting from scratch. Of course that card sucks at 99.9% of games to be able have amazing results in the 0.1% which means no one will buy it.

blackangus · Sep 26, 2023

adroc_thurston said:
We can, just that the IQ needle moved further with RTRT.

I respect what you say here, but honestly I can hardly tell any difference with most RT titles with RT on or off. Does it look different sure does it look better by much ... not really.
I mean CP2077 with PT seems to be the best example but performance is so terrible its obviously not worth it, and even then it doesnt seem to look alot different.
The fortenite screens I saw seem to be pretty good from a GI perspective, but I dont play it.
RT performance needs magnitudes better performance before it really gives a global visual leap.

blackangus · Sep 26, 2023

Mopetar said:
I'm not sure what you mean. We've already seen fully RT titles re-released because the hardware can handle doing it for a ~20 year old game.

I think that you could do a hell of a lot better throwing out shaders entirely and starting from scratch. Of course that card sucks at 99.9% of games to be able have amazing results in the 0.1% which means no one will buy it.

So to get modern game with fully ray traced quality the you need what 10-13x the RT performance we have today? (conservatively)
I doubt using a clean sheet is going to get that type of performance anytime soon, even if you throw out "legacy" stuff even it money is no object. (And it is)
I agree with what @adroc_thurston said - We are going to need some kind of breakthrough to get real RT at current game quality.

adroc_thurston · Sep 26, 2023

Mopetar said:
We've already seen fully RT titles re-released because the hardware can handle doing it for a ~20 year old game.

Horribly undersampled garbage with insane GI lags.

Saylick · Sep 26, 2023

soresu said:
IMHO while true it's well into the arena of diminishing returns of extra compute to achieve that higher IQ, even with techniques like ReSTIR to neutralise the compute impact.

It seems to be more of a benefit for game devs than actual gamers from where I'm standing.

By comparison UE5 Nanite and techniques inspired by it brought far more potential for higher IQ from the huge potential poly count increase limited more by available memory and IO to store and feed hipoly geometry to the GPU.

Yeah, I'd sooner want a GPU architecture that accelerates the things UE5 does well, specifically Nanite and their software-based global illumination model, rather than a GPU architecture that is dedicated to pure RTRT. Perhaps AMD can engineer an APU that offers the massive bandwidth that you're describing to really let game developers leverage UE5 to its full potential. Many developers are getting on-board the UE5 bandwagon, e.g. CDPR, so it seems like a good bet to focus ones efforts there.

igor_kavinski · Sep 26, 2023

Saylick said:
Perhaps AMD can engineer an APU that offers the massive bandwidth that you're describing to really let game developers leverage UE5 to its full potential. Many developers are getting on-board the UE5 bandwagon, e.g. CDPR, so it seems like a good bet to focus ones efforts there.

Don't think that's gonna happen unless UE5 is open sourced. Tying your silicon future to proprietary software is prone to have a disastrous outcome sooner or later.

GodisanAtheist · Sep 26, 2023

Aren't we stuck in sort of a classic chicken and egg scenario?

Dev's are hesitant to go all in on RTRT because the hardware isn't there, and the hardware side is hesitant to go all in on RTRT thanks to the massive backlog of raster games and the dev's are just painting RT over their raster effects.

Hardware would have to move first IMO, dedicate a larger proportion die space to RTRT and shrink the share of raster only resources aggressively from gen to gen.

Saylick · Sep 26, 2023

igor_kavinski said:
Don't think that's gonna happen unless UE5 is open sourced. Tying your silicon future to proprietary software is prone to have a disastrous outcome sooner or later.

I wasn't implying that they tie the architecture to the game engine specifically, but to design the architecture to benefit the most from the techniques that are implemented in UE5. If those techniques are bandwidth heavy, then design the architecture to offer massive amounts of bandwidth such that game devs can take advantage of it.

igor_kavinski · Sep 26, 2023

Saylick said:
If those techniques are bandwidth heavy, then design the architecture to offer massive amounts of bandwidth such that game devs can take advantage of it.

I know that's just an example but Raja Koduri tried to do that with HBM2. I bet he wishes he could laser kill the brain cells involved in that idea and any other brain cells that retain the memory of that misadventure. Lisa won't let anyone try something like that again, at least as long as she is at the helm.

igor_kavinski · Sep 26, 2023

GodisanAtheist said:
Hardware would have to move first IMO, dedicate a larger proportion die space to RTRT and shrink the share of raster only resources aggressively from gen to gen.

An alternative to RTRT is to generate perfect lighting for a game's environment through distributed computing. Why bother with RTRT if static RT gives the same or better results with almost no performance hit? Thousands, if not millions, of gamers wouldn't hesitate to donate their idle GPU cycles to such a DC effort.

adroc_thurston · Sep 26, 2023

GodisanAtheist said:
dedicate a larger proportion die space to RTRT

It's not a die area issue, it's a memory wall issue.

PJVol · Sep 26, 2023

Well... n43 is dead, long live n48? )

soresu · Sep 26, 2023

Mopetar said:
Doing RT on shaders isn't a good way to achieve fully ray traced games. They can be used for RT workloads, but the performance will be abysmal compared to building hardware that's specifically designed for it.

They are needed on RT workloads regardless even when using the HW RT core functionality.

They are still needed to do the BRDF surface shading math that combines the various textures into a visible PBR material after the ray math has determined the correct light (and environment reflectance) actually hitting that surface.

This same process is more or less identical in shader materials used in offline RT/PT renderers like Arnold, Cycles and Renderman.

The RT cores / RT units are designed for that tracing specific compute, and while they will undoubtedly increase in number over time in the compute unit equivalents of various µArchs, they will not displace the general shader hardware because it is still needed not just in the shading of traced surface materials, but all the various things shaders have come to be used for in games to offload from the CPU since programmable GPGPU became a thing with nV Tesla and AMD Terascale.

soresu · Sep 26, 2023

Saylick said:
Yeah, I'd sooner want a GPU architecture that accelerates the things UE5 does well, specifically Nanite and their software-based global illumination model, rather than a GPU architecture that is dedicated to pure RTRT. Perhaps AMD can engineer an APU that offers the massive bandwidth that you're describing to really let game developers leverage UE5 to its full potential. Many developers are getting on-board the UE5 bandwagon, e.g. CDPR, so it seems like a good bet to focus ones efforts there.

General shader µArchs are already covering the Nanite part.

I believe that a large part of Nanite is compute shaders, though my memory on the explanation of how it works is fuzzy at this part - there has to be a ton of heuristics involved to be sure.

As far as massive bandwidth goes the DirectStorage idea takes care of a significant part of that, especially with compression applied to reduce necessary bandwidth.

For now they are using external SSDs, but I think it's only a matter of time before GFX cards start using onboard high capacity NVMe or soldered flash chips to ehlp combat this issue - like the Radeon SSG line of pro SKUs that has not been renewed since the Vega generation.

adroc_thurston · Sep 26, 2023

soresu said:
DirectStorage idea takes care of a significant part of that

Ughhh no?
It's just a less silly way to stream assets in and out.

soresu said:
but I think it's only a matter of time before GFX cards start using onboard high capacity NVMe or soldered flash chips to ehlp combat this issue - like the Radeon SSG line of pro SKUs that has not been renewed since the Vega generation.

Literally no benefit.

Joe NYC · Sep 27, 2023

soresu said:
As far as massive bandwidth goes the DirectStorage idea takes care of a significant part of that, especially with compression applied to reduce necessary bandwidth.

For now they are using external SSDs, but I think it's only a matter of time before GFX cards start using onboard high capacity NVMe or soldered flash chips to ehlp combat this issue - like the Radeon SSG line of pro SKUs that has not been renewed since the Vega generation.

How much memory are you talking about, that would be needed? It seems increasing memory would have more of the benefit.

Also, even the best SSD can't saturate 4x PCIe Gen 5. Which is about half of the bandwidth of 16x PCIe Gen 4. So, this is not the bottleneck at this time.

eek2121 · Sep 27, 2023

igor_kavinski said:
An alternative to RTRT is to generate perfect lighting for a game's environment through distributed computing. Why bother with RTRT if static RT gives the same or better results with almost no performance hit? Thousands, if not millions, of gamers wouldn't hesitate to donate their idle GPU cycles to such a DC effort.

There was a project I saw on the internet that used an AI model to generate realistic looking “ray-traced” lighting. The model ran on his GPU. I found that to be curious…and maybe just a little odd.

Discussion RDNA4 + CDNA3 Architectures Thread

Golden Member

Senior member

Senior member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Member

Member

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Lifer

Diamond Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Platinum Member

Diamond Member