Question Speculation: RDNA3 + CDNA2 Architectures Thread

uzzi38 · Jan 23, 2021

Man I have been dying to make this one for a while now.

First rumours for RDNA3 are here so new thread time!

Just going to start off with this one for now: kopite7kimi on Twitter: "@VideoCardz Ah, I mean a simple mcm design with 10240 cores is not enough. Because the lift from RDNA2 to RDNA3 is much bigger than from RDNA1 to RDNA2. We should expect many big improvements in GFX11. 🤔" / Twitter

DisEnchantment · Nov 6, 2022

Kepler_L2 said:
Wild speculation but I believe the de-coupling of FE/SE clocks is due to future generations further splitting them up. Instead of multi-GCD (which is hard) we'll likely see a FED (Front-End Die) containing GCP, HWS, DCN, VCN and PCIe, while the GCD is literally just Shader Engines.

A GCD with just shader engines still has to be scalable, because you cannot have a single GCD with 8 SEs, for example. So they still need to link multiple GCDs, and patents also indicate consistent investigation in this direction.
Something like multiple smaller GCDs linked around a CP die with bridges all around, or a multithreaded CP on two independent GCDs.

maddie said:
Is there that much % traffic between the shader engines? If not, as you wrote but with multiple SE block chiplets, the unit size corresponding to the low end model.

Shader Processor/Rasterizers/ROP/L1/etc. are all in the SE.
The biggest data exchange between shaders and the fixed function HW is at GL1 and GDS. It is where the shaders export the result of their operations and from where they pick up the data for the next stage of operation.
And there is a lot of synchronization happening here due to the sequential nature of the rendering pipeline.
Once the shaders are done and have exported their result, the rasterizer will take over using the data exported in GDS.
Similarly output of geometry in the SE goes into GDS and could be picked up by the shaders next if needed.

So there has to be a big crossbar between the shader engines here and link up with the CP.
For Multi GCD to work transparently, they need to link up the CP/L1/L2/GDS. That is why there is a big blob of interconnects around the CP in all the die shots.

Yosar · Nov 6, 2022

Kepler_L2 said:
Wild speculation but I believe the de-coupling of FE/SE clocks is due to future generations further splitting them up. Instead of multi-GCD (which is hard) we'll likely see a FED (Front-End Die) containing GCP, HWS, DCN, VCN and PCIe, while the GCD is literally just Shader Engines.

The best hypothesis I've read. I don't think that there are any serious bugs in silicon preventing RDNA3 from clocking high. It's not like they produced a million chips and it turned out they don't clock high. There were many samples/revisions before.
RDNA3 is what it is. The perceived low clocks may be exactly due to decoupling clocks for different parts and for example a need to synchronize them. It could be just a first necessary step to effectively clock any future chiplets with their clocks.
And maybe we all are wrong and it's not about multiplication of chiplets for GCD, but only some parts of them.
It could have a huge influence on the efficiency of those parts (no need to clock something high if lower clocks will feed other chiplets well enough). I could believe that for now there are some unknown penalties for doing this. But surely they will try to minimize them with every new step.

I'm even a little surprised that not many noticed this new feature in RDNA3. Or rather everybody noticed, but for the wrong reason (low clocks).

PJVol · Nov 6, 2022

Yosar said:
The perceived low clocks may be exactly due to decoupling clocks for different parts and for example a need to synchronize them

Why perceived? It was clearly stated as 2300 MHz for the SE, which contributes most to power and performance. The frontend has way less switching capacity, so it could be clocked higher without tradeoffs. As for "sync", afaik AMD has used asymmetric IFOPs for the short routes since Zen 2, where any uncore clock domain can be "decoupled".

DisEnchantment said:
GFX11 has a new block IMU which is managing power for all GFX blocks
I think they are design issues if we believe there are any, not process issues

Not sure what the IMU is, but yes, I did mean possible design issues (if any), and I'm pretty sure that RDNA3 has the AVFS model described in US20220091822A1 implemented.
I'm less inclined to think of a "clock wall" than of an undue increase in power approaching the design target frequency.
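
To put rough numbers on that last point, here is a minimal sketch using the textbook approximation for dynamic power, P ≈ α·C·V²·f. The 10% clock and 5% voltage figures are made-up illustrations, not RDNA3 measurements, and `relative_dynamic_power` is a hypothetical helper name.

```python
def relative_dynamic_power(freq_scale: float, volt_scale: float) -> float:
    """Relative dynamic power from the approximation P ~ alpha * C * V^2 * f.

    Activity factor (alpha) and capacitance (C) are assumed unchanged,
    so only the frequency and voltage scale factors matter.
    """
    return freq_scale * volt_scale ** 2


if __name__ == "__main__":
    # Hypothetical case: 10% more clock that needs 5% more voltage.
    ratio = relative_dynamic_power(1.10, 1.05)
    print(f"{ratio:.3f}x dynamic power for 1.10x clock")
```

If the voltage needed climbs faster near the target frequency, power grows faster than clock speed, which is the kind of behaviour I mean rather than a hard wall.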

Timmah! · Nov 7, 2022

Panino Manino said:
RT is really this low?

https://twitter.com/x/status/1588537489299763205

Uh, oh, is this true? If yes, AMD is dropping the ball really hard here. They really can't be bothered with potential uses of their GPUs outside of gaming, it seems.

leoneazzurro · Nov 7, 2022

Not that real world RT performance will be limited only by theoretical Intersection rate... There are a lot of factors involved.

H T C · Nov 7, 2022

Timmah! said:
Uh, oh, is this true? If yes, AMD is dropping the ball really hard here. They really can't be bothered with potential uses of their GPUs outside of gaming, it seems.

If true, i actually don't see this as bad because, though the 4090 has close to 6 times more Ray-Triangle Intersections per second, 7900 XTX's RT performance is nowhere near 6 times less than the 4090's, meaning AMD's implementation is actually better: just much smaller in size, and hence the lower performance.

It means however AMD is focusing A LOT MORE on raster relative to RT than nVidia which, in my view, is actually a good thing.

Or is my reasoning flawed?
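
The reasoning can be framed as simple arithmetic; this is a sketch only. The roughly 6x intersection-rate ratio is the figure discussed above, while the 2x RT performance gap is a placeholder made up for illustration, since there are no independent reviews yet. `perf_per_intersection_advantage` is a hypothetical helper, and it treats peak intersection rate as the only input, which it is not in real workloads.

```python
def perf_per_intersection_advantage(rate_ratio: float, perf_ratio: float) -> float:
    """RT performance per unit of peak intersection rate, slower card vs faster card.

    rate_ratio: faster card's peak intersection rate / slower card's.
    perf_ratio: faster card's RT performance / slower card's.
    A result above 1.0 means the slower card extracts more performance
    from each unit of peak intersection throughput.
    """
    if rate_ratio <= 0 or perf_ratio <= 0:
        raise ValueError("ratios must be positive")
    return rate_ratio / perf_ratio


if __name__ == "__main__":
    # 6.0 is the ratio from the thread; 2.0 is an assumed placeholder.
    print(perf_per_intersection_advantage(6.0, 2.0))
```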

poke01 · Nov 7, 2022

RnR_au said:
I have zero interest in RT. It holds as much interest to me as fake frames.

Delete

Timmah! · Nov 7, 2022

H T C said:
If true, i actually don't see this as bad because, though the 4090 has close to 6 times more Ray-Triangle Intersections per second, 7900 XTX's RT performance is nowhere near 6 times less than the 4090's, meaning AMD's implementation is actually better: just much smaller in size, and hence the lower performance.

It means however AMD is focusing A LOT MORE on raster relative to RT than nVidia which, in my view, is actually a good thing.

Or is my reasoning flawed?

Look at the Cycles rendering speed in Blender, not games.

Nvidia GeForce RTX 4090 FE review: Brutal performance that CPUs can't keep up with

Nvidia launches the GeForce RTX 4090 today, and with it a new generation of GeForce. For the most demanding users, the top model RTX 4090 arrives first. The performance increase over the previous generation is extreme; the price for it is power consumption and cost. Surprisingly, though, the operating characteristics do not suffer much from the high power draw.

pctuning.cz

This is a comparison of the 6950 XT to Nvidia cards, but based on the info above, I don't expect the 7900 XT(X) to come anywhere close to the 4090; it will probably match the 3090 Ti at best.

Concentrating on raster is borderline OK if you are concentrating strictly on games. Which is not a good thing in my book, since this leaves Nvidia with a certain niche and, as a result, an option to price their stuff into the stratosphere, justifying it with the notion that they can do so much more than AMD's cards, because AMD won't even try to compete with them there.

H T C · Nov 7, 2022

Timmah! said:
Look at the Cycles rendering speed in Blender, not games.

I thought "the good part" about RT was displayed in games, where it's "more visible to the masses".

Timmah! said:
This is a comparison of the 6950 XT to Nvidia cards, but based on the info above, I don't expect the 7900 XT(X) to come anywhere close to the 4090; it will probably match the 3090 Ti at best.

Without independent reviews, it's too early to tell.

If it matches the 3090 Ti, considering it has a bit more than half the RTIs it's still a better but smaller implementation, though to a much smaller extent than with games VS the 4090.

Timorous · Nov 7, 2022

The reality with RT is that until consoles have beefy enough hardware the vast majority of games are going to be hybrid so really we are looking at PS6 / Next Gen Xbox. I assume they will go with AMD again and I expect they will want good RT hardware so it is practically a given AMD are working on it.

I still think even the 4090 needs a bit more performance in RT to really make it a default on feature but it is pretty close. Maybe the 4090Ti can tip it over the line but even if not I do expect the 5090 will.

So really AMD need to focus on it for RDNA4 because it is the future. I expect with RDNA3 the focus was on chiplets and getting those interconnects working fast enough and low power enough to make it actually viable which they seem to have managed.

eek2121 · Nov 7, 2022

You will note that AMD did not exclude the possibility of dropping a 4090 competitor next year.

I don’t know if they will or not, but given the awkwardness of the presentation combined with multiple claims that AMD had cards hitting 3.4 GHz (with a vBIOS limit of 3.72 GHz), also combined with info we have seen from AIBs, it does look like something happened. Either an unexpected bug or sync issues at higher clocks.

I don’t think they wanted to delay the launch. That would have cost them market share and revenue. Having something out to compete with the 4080 will be great for them.

I do hope we get a 7950XT, if nothing else, we get to see what potential RDNA3 really has.

H T C said:
If true, i actually don't see this as bad because, though the 4090 has close to 6 times more Ray-Triangle Intersections per second, 7900 XTX's RT performance is nowhere near 6 times less than the 4090's, meaning AMD's implementation is actually better: just much smaller in size, and hence the lower performance.

It means however AMD is focusing A LOT MORE on raster relative to RT than nVidia which, in my view, is actually a good thing.

Or is my reasoning flawed?

RT is the future of gaming. It is going to take time to get there, but eventually we will see most games use RT natively rather than the hybrid approach you see now.

The issue is the amount of silicon required to get there, hence why AMD is not focusing on it.

moinmoin · Nov 7, 2022

Timorous said:
I still think even the 4090 needs a bit more performance in RT to really make it a default on feature but it is pretty close.

RT is the future and will have to be the focus. The current implementation however won't be the solution, unless people seriously expect 4090 like implementations and performance in consoles etc. anytime soon.

poke01 · Nov 7, 2022

I agree next gen consoles in 2026/27 will have RT that is Great.

So RDNA5?

KompuKare · Nov 7, 2022

H T C said:
If true, i actually don't see this as bad because, though the 4090 has close to 6 times more Ray-Triangle Intersections per second, 7900 XTX's RT performance is nowhere near 6 times less than the 4090's, meaning AMD's implementation is actually better: just much smaller in size, and hence the lower performance.

While the RT thing is disappointing, being super efficient in terms of transistor usage isn't just a thing bean counters can rejoice in.

If the next consoles are to have good RT, then the implementation has to be as efficient as possible in terms of transistors/area/costs.

eek2121 said:
The issue is the amount of silicon required to get there, hence why AMD is not focusing on it.

That's why I think doing so in the most efficient way possible is (long term) a good thing.

Let's take a hypothetical: imagine that even 20% of AD102's 76 billion transistors is dedicated to RT.
Now the 4090 is the most powerful RT card out there, but if using max RT settings it would crawl to a halt. By max I don't mean Cyberpunk's new ultra-ultra mode thing, but something closer to a POV-Ray scene. And by crawl I really mean crawl: maybe 1-2 FPS or less.

Point being that going full RT in silicon is next to impossible without going into maybe 500 billion transistors and 5-10 kW.

In the meantime, lots of cheating and hybrid RT will have to be used.
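
For scale, the hypothetical above can be worked through as arithmetic. This is a rough sketch: the 76 billion, 20% and 500 billion figures are the ones from this post (the last two being guesses, not measurements), and the function names are made up for illustration.

```python
TOTAL_TRANSISTORS = 76e9      # transistor count quoted in the post
RT_SHARE = 0.20               # the post's hypothetical share spent on RT
FULL_RT_TRANSISTORS = 500e9   # the post's rough guess for "full RT"


def rt_budget(total: float, share: float) -> float:
    """Transistors spent on RT under the hypothetical share."""
    return total * share


def scale_factor(target: float, base: float) -> float:
    """How many times larger the target is than the base."""
    return target / base


if __name__ == "__main__":
    budget = rt_budget(TOTAL_TRANSISTORS, RT_SHARE)
    print(f"RT budget: {budget / 1e9:.1f} billion transistors")
    print(f"vs whole chip: {scale_factor(FULL_RT_TRANSISTORS, TOTAL_TRANSISTORS):.1f}x")
    print(f"vs RT budget: {scale_factor(FULL_RT_TRANSISTORS, budget):.1f}x")
```

So under these guesses, "full RT" would mean several times the whole current chip, and dozens of times the assumed RT portion.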

eek2121 · Nov 7, 2022

KompuKare said:
While the RT thing is disappointing, being super efficient in terms of transistor usage isn't just a thing bean counters can rejoice in.

If the next consoles are to have good RT, then the implementation has to be as efficient as possible in terms of transistors/area/costs.

That's why I think doing so in the most efficient way possible is (long term) a good thing.

Let's take a hypothetical: imagine that even 20% of AD102's 76 billion transistors is dedicated to RT.
Now the 4090 is the most powerful RT card out there, but if using max RT settings it would crawl to a halt. By max I don't mean Cyberpunk's new ultra-ultra mode thing, but something closer to a POV-Ray scene. And by crawl I really mean crawl: maybe 1-2 FPS or less.

Point being that going full RT in silicon is next to impossible without going into maybe 500 billion transistors and 5-10 kW.

In the meantime, lots of cheating and hybrid RT will have to be used.

Metro Exodus has a version of its engine that replaces its global illumination engine with a new engine that does everything with ray tracing. While we are likely years away from photorealism, native ray tracing for modern games is quite a bit closer.

beginner99 · Nov 7, 2022

Before we get super good RT with high performance I would actually prefer proper game physics. There is nothing worse than suicide by a hand grenade that bounces around like rubber under the gravity of the moon.

Dribble · Nov 7, 2022

beginner99 said:
Before we get super good RT with high performance I would actually prefer proper game physics. There is nothing worse than suicide by a hand grenade that bounces around like rubber under the gravity of the moon.

Realistic physics = blowing stuff up more realistically and having a more realistic world that allows stuff to be blown up and the changes show permanently. You can't really do that because it makes the world too complex to store and for online too heavy to transmit the changes around all the clients. RT is just a visuals thing, it isn't complex to store (a bit of extra info for each material) and it doesn't need anything transmitting around all the clients for online play.

Better AI is a more achievable aim - efficient AI processing is now built into GPUs, so that could be used to make smarter AI in games. That wouldn't be too complex to store or transmit around in multiplayer (the AI soldier still has the same things it can do, it's just smarter about what it does).

KompuKare · Nov 7, 2022

beginner99 said:
Before we get super good RT with high performance I would actually prefer proper game physics. There is nothing worse than suicide by a hand grenade that bounces around like rubber under the gravity of the moon.

Okay physics, someone somewhere mentioned better audio.

The thing I think Sony/Microsoft missed in the age of neural/AI co-processors in most phones is some of that and an AI framework. No wonder multiplayer is such big business as single player AI hasn't advanced much in decades.

Timmah! · Nov 7, 2022

H T C said:
I thought "the good part" about RT was displayed in games, where it's "more visible to the masses".

Nope, true benefit of RT hardware is with professional workloads. RT in games is just a gimmick for the moment, to justify existence of RT parts within gaming hardware, which is basically rebadged Quadro.

H T C · Nov 7, 2022

Timmah! said:
Nope, true benefit of RT hardware is with professional workloads.

But professional workloads are USUALLY done with professional cards: NOT with gaming cards, which the 4090 clearly is, no?

KompuKare · Nov 7, 2022

Timmah! said:
Nope, true benefit of RT hardware is with professional workloads. RT in games is just a gimmick for the moment, to justify existence of RT parts within gaming hardware, which is basically rebadged Quadro.

A solution looking for a problem, like tensor cores being used for DLSS?

As I already said, AMD spending lots of R&D trying to make their shaders smaller / more efficient in terms of transistors might be aimed at future consoles or APUs (and bean counters), but it is not something Nvidia can really ignore long term. Maybe tensor cores will eventually go from GeForce too?

KompuKare · Nov 7, 2022

H T C said:
But professional workloads are USUALLY done with professional cards: NOT with gaming cards, which the 4090 clearly is, no?

Yes and no.

Nvidia's efforts at re-use / dual use have mostly been that though. AD102 is a bit too new, but look at GA102:

15 cards, 7 are gaming, 8 are not.

So while fusing things off or locking drivers is one thing, the silicon is designed to do both.

Stuka87 · Nov 7, 2022

Dribble said:
Realistic physics = blowing stuff up more realistically and having a more realistic world that allows stuff to be blown up and the changes show permanently. You can't really do that because it makes the world too complex to store and for online too heavy to transmit the changes around all the clients.

Apparently you never played Red Faction where almost the entire game was destructible. It most certainly can be done. Also, Battlefield has had destructible environments for over a decade, going way back to Bad Company 2.

beginner99 · Nov 7, 2022

Dribble said:
Realistic physics = blowing stuff up more realistically and having a more realistic world that allows stuff to be blown up and the changes show permanently. You can't really do that because it makes the world too complex to store and for online too heavy to transmit the changes around all the clients. RT is just a visuals thing, it isn't complex to store (a bit of extra info for each material) and it doesn't need anything transmitting around all the clients for online play.

Better AI is a more achievable aim - efficient AI processing is now built into GPUs, so that could be used to make smarter AI in games. That wouldn't be too complex to store or transmit around in multiplayer (the AI soldier still has the same things it can do, it's just smarter about what it does).

Yeah, I didn't say it was easy, but RT is about realism and so is proper physics.

Better AI, I'm skeptical. In a shooter it's trivial. Just give them a gradually better "aimbot". In strategy games, be it real-time or turn-based? Yes, there is some really good StarCraft AI out there. Don't know about Civilization, which I think is more complex than SC. Anyway, I'm sure these AIs, even if super-human, will all have some weird bugs that can be exploited, which would make the games very unfun because you lose unless you use cheese.

KompuKare said:
Okay physics, someone somewhere mentioned better audio.

Yeah, in a multiplayer shooter correct audio would be very cool. Like hearing them breathing around the corner, which makes so much sense. Much more than everything being a mirror. The only downside is that it potentially gives too much advantage to people with a good audio setup (i.e. good headphones).

biostud · Nov 7, 2022

Dribble said:
Realistic physics = blowing stuff up more realistically and having a more realistic world that allows stuff to be blown up and the changes show permanently.

I wouldn't say destructible environment is a requirement for good physics.

How different materials interact with regard to friction, gravity and collision can be achieved without being able to blow everything to gravel.
