Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,649
6,083
146

DisEnchantment

Golden Member
Mar 3, 2017
1,615
5,870
136
Wild speculation but I believe the de-coupling of FE/SE clocks is due to future generations further splitting them up. Instead of multi-GCD (which is hard) we'll likely see a FED (Front-End Die) containing GCP, HWS, DCN, VCN and PCIe, while the GCD is literally just Shader Engines.
A GCD with just shader engines still has to be scalable, because you can't have a single GCD with 8 SEs, for example. So they still need to link multiple GCDs, and the patents also indicate consistent investigation in this direction.
Something like multiple smaller GCDs linked around a CP die with bridges all around, or a multithreaded CP spanning two independent GCDs.
Is there really that much traffic between the shader engines? If not, it could be done as you wrote, but with multiple SE-block chiplets, with the chiplet size corresponding to the low-end model.
Shader Processor/Rasterizers/ROP/L1/etc. are all in the SE.
The biggest data exchange between shaders and the fixed function HW is at GL1 and GDS. It is where the shaders export the result of their operations and from where they pick up the data for the next stage of operation.
And there is a lot of synchronization happening here due to the sequential nature of the rendering pipeline.
Once the shaders are done and have exported their result, the rasterizer will take over using the data exported in GDS.
Similarly, the geometry output in the SE goes into GDS and can be picked up by the shaders for the next stage if needed.

So there has to be a big crossbar between the shader engines here, which also links up with the CP.
For multi-GCD to work transparently, they need to link up the CP/L1/L2/GDS. That is why there is a big blob of interconnects around the CP in all the die shots.
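To make that export/pickup pattern a bit more concrete, here is a toy sketch in Python. It is purely an analogy: the SE count, the queue standing in for GDS, and the barrier are my own illustrative assumptions, not how the actual hardware synchronizes.

```python
# Toy analogy of the handoff described above: shader engines export results into
# a shared buffer (standing in for GDS/GL1), and the next fixed-function stage
# only proceeds once every SE has signalled completion. Names and counts are
# illustrative assumptions, not RDNA3 internals.
from queue import Queue
from threading import Barrier, Thread

NUM_SE = 4                          # assumed number of shader engines
gds = Queue()                       # stand-in for the shared GDS export buffer
export_done = Barrier(NUM_SE + 1)   # the "rasterizer" waits until all SEs have exported

def shader_engine(se_id: int) -> None:
    gds.put(f"geometry exported by SE{se_id}")   # export the stage's results
    export_done.wait()                           # signal completion to the consumer

def rasterizer() -> None:
    export_done.wait()              # sequential pipeline: wait for every SE's export
    while not gds.empty():
        print("rasterizing:", gds.get())         # pick up data for the next stage

threads = [Thread(target=shader_engine, args=(i,)) for i in range(NUM_SE)]
threads.append(Thread(target=rasterizer))
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The point of the toy is only that every producer has to rendezvous with the consumer; spread the SEs across multiple GCDs and that rendezvous becomes cross-die traffic, which is why the CP/GDS/L1 linkage matters.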

[attached die shot]
 

Yosar

Member
Mar 28, 2019
28
136
76
Wild speculation but I believe the de-coupling of FE/SE clocks is due to future generations further splitting them up. Instead of multi-GCD (which is hard) we'll likely see a FED (Front-End Die) containing GCP, HWS, DCN, VCN and PCIe, while the GCD is literally just Shader Engines.

The best hypothesis I've read. I don't think there are any serious silicon bugs preventing RDNA3 from clocking high. It's not like they produced a million chips and only then found out it doesn't clock high; there were many samples/revisions before that.
RDNA3 is what it is. The perceived low clocks may be exactly due to decoupling the clocks of different parts and, for example, the need to synchronize them. It could just be a first necessary step towards effectively clocking future chiplets independently.
And maybe we are all wrong and it's not about multiplying GCD chiplets, but only some parts of them.
It could have a huge influence on the efficiency of those parts (no need to clock something high if lower clocks feed the other chiplets well enough). I can believe there are currently some unknown penalties for doing this, but surely they will try to minimize them with every new step.

I'm even a little surprised that not many people noticed this new feature in RDNA3. Or rather, everybody noticed it, but for the wrong reason (low clocks).
 
  • Like
Reactions: Tlh97

PJVol

Senior member
May 25, 2020
539
451
106
The perceived low clocks may be exactly due to decoupling the clocks of different parts and, for example, the need to synchronize them
Why perceived? It was clearly stated as 2300 MHz for the SE, which contributes most of the power and performance. The front end has way less switching capacitance, so it could be clocked higher without trade-offs. As for "sync", AFAIK AMD has used asymmetric IFOPs for the short routes since Zen 2, where any uncore clock domain can be "decoupled".

GFX11 has a new block, the IMU, which manages power for all GFX blocks
I think they are design issues, if we believe there are any, not process issues
Not sure what the IMU is, but yes, I did mean possible design issues (if any), and I'm pretty sure that RDNA 3 has the AVFS model described in US20220091822A1 implemented.
I'm less inclined to think of a "clock wall", rather an undue increase in power when approaching the design target frequency.
 

H T C

Senior member
Nov 7, 2018
561
400
136
Uh oh, is this true? If so, AMD is dropping the ball really hard here. They really can't be bothered with potential usage of their GPUs outside of gaming, it seems.

If true, I actually don't see this as bad because, though the 4090 has close to 6 times more ray-triangle intersections per second, the 7900 XTX's RT performance is nowhere near 6 times lower than the 4090's, meaning AMD's implementation is actually better per unit of hardware: it's just much smaller, hence the lower overall performance.

It means, however, that AMD is focusing A LOT MORE on raster relative to RT than Nvidia, which, in my view, is actually a good thing.

Or is my reasoning flawed?
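Putting rough numbers on that reasoning (the frame rates below are invented placeholders; only the ~6x RTI figure comes from the discussion above):

```python
# Back-of-envelope check of the "better per unit of RT hardware" argument.
# Frame rates are made-up placeholders; only the ~6x RTI ratio is from the post.
rti_ratio = 6.0        # 4090 said to have ~6x the ray-triangle intersections/s
fps_4090 = 60.0        # hypothetical RT frame rate for the 4090
fps_7900xtx = 35.0     # hypothetical RT frame rate for the 7900 XTX

perf_ratio = fps_4090 / fps_7900xtx       # how much faster the 4090 actually is
perf_per_rti = perf_ratio / rti_ratio     # performance delivered per unit of RTI rate

print(f"4090: {perf_ratio:.2f}x faster with {rti_ratio:.0f}x the RTI throughput")
print(f"=> only {perf_per_rti:.2f}x the performance per unit of RTI hardware,")
print("   i.e. the smaller implementation extracts more per intersection unit")
```

If the real performance gap is anywhere under 6x, the same conclusion follows whatever placeholder frame rates you plug in.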
 

Timmah!

Golden Member
Jul 24, 2010
1,429
657
136
If true, I actually don't see this as bad because, though the 4090 has close to 6 times more ray-triangle intersections per second, the 7900 XTX's RT performance is nowhere near 6 times lower than the 4090's, meaning AMD's implementation is actually better per unit of hardware: it's just much smaller, hence the lower overall performance.

It means, however, that AMD is focusing A LOT MORE on raster relative to RT than Nvidia, which, in my view, is actually a good thing.

Or is my reasoning flawed?

Look at the Cycles rendering speed in Blender, not games.


This is a comparison of the 6950 XT to Nvidia cards, but based on the info above, I don't expect the 7900 XT(X) to come anywhere close to the 4090; it will probably match the 3090 Ti at best.

Concentrating on raster is borderline OK if you are concentrating strictly on games. Which is not a good thing in my book, since it leaves Nvidia with a certain niche and, as a result, an option to price their stuff into the stratosphere, justifying it with the notion that their cards can do so much more than AMD's, because AMD won't even try to compete with them there.
 
  • Like
Reactions: Tlh97 and Joe NYC

H T C

Senior member
Nov 7, 2018
561
400
136
Look at the Cycles rendering speed in Blender, not games.

I thought "the good part" about RT was displayed in games, where it's "more visible to the masses".

This is a comparison of the 6950 XT to Nvidia cards, but based on the info above, I don't expect the 7900 XT(X) to come anywhere close to the 4090; it will probably match the 3090 Ti at best.

Without independent reviews, it's too early to tell.

If it matches the 3090 Ti, then considering it has a bit more than half the RTIs, it's still a better-but-smaller implementation, though to a much smaller extent than in games vs the 4090.
 
  • Like
Reactions: Tlh97

Timorous

Golden Member
Oct 27, 2008
1,631
2,820
136
The reality with RT is that until consoles have beefy enough hardware, the vast majority of games are going to be hybrid, so really we are looking at the PS6 / next-gen Xbox. I assume they will go with AMD again, and I expect they will want good RT hardware, so it is practically a given that AMD are working on it.

I still think even the 4090 needs a bit more RT performance to really make it a default-on feature, but it is pretty close. Maybe the 4090 Ti can tip it over the line, but even if not, I do expect the 5090 will.

So really AMD need to focus on it for RDNA4, because it is the future. I expect that with RDNA3 the focus was on chiplets and on getting those interconnects fast enough and low-power enough to actually be viable, which they seem to have managed.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,027
136
You will note that AMD did not exclude the possibility of dropping a 4090 competitor next year.

I don't know if they will or not, but given the awkwardness of the presentation, combined with multiple claims that AMD had cards hitting 3.4 GHz (with a vBIOS limit of 3.72 GHz) and with info we have seen from AIBs, it does look like something happened: either an unexpected bug or sync issues at higher clocks.

I don't think they wanted to delay the launch. That would have cost them market share and revenue. Having something out to compete with the 4080 will be great for them.

I do hope we get a 7950 XT; if nothing else, we'd get to see what potential RDNA3 really has.

If true, I actually don't see this as bad because, though the 4090 has close to 6 times more ray-triangle intersections per second, the 7900 XTX's RT performance is nowhere near 6 times lower than the 4090's, meaning AMD's implementation is actually better per unit of hardware: it's just much smaller, hence the lower overall performance.

It means, however, that AMD is focusing A LOT MORE on raster relative to RT than Nvidia, which, in my view, is actually a good thing.

Or is my reasoning flawed?

RT is the future of gaming. It is going to take time to get there, but eventually we will see most games use RT natively rather than the hybrid approach you see now.

The issue is the amount of silicon required to get there, which is why AMD is not focusing on it.
 

moinmoin

Diamond Member
Jun 1, 2017
4,961
7,697
136
I still think even the 4090 needs a bit more RT performance to really make it a default-on feature, but it is pretty close.
RT is the future and will have to be the focus. The current implementation, however, won't be the solution, unless people seriously expect 4090-like implementations and performance in consoles etc. anytime soon.
 

KompuKare

Golden Member
Jul 28, 2009
1,028
972
136
If true, I actually don't see this as bad because, though the 4090 has close to 6 times more ray-triangle intersections per second, the 7900 XTX's RT performance is nowhere near 6 times lower than the 4090's, meaning AMD's implementation is actually better per unit of hardware: it's just much smaller, hence the lower overall performance.
While the RT thing is disappointing, being super efficient in terms of transistor usage isn't just a thing bean counters can rejoice in.

If the next consoles are to have good RT, then the implementation has to be as efficient as possible in terms of transistors/area/costs.

The issue is the amount of silicon required to get there, which is why AMD is not focusing on it.
That's why I think doing so in the most efficient way possible is (long term) a good thing.

Let's take a hypothetical: imagine that even 20% of AD102's 76 billion transistors is dedicated to RT.
Now, the 4090 is the most powerful RT card out there, but with truly maxed-out RT settings it would crawl to a halt. By max I don't mean Cyberpunk's new ultra-ultra mode, but something closer to a POV-Ray scene. And by crawl I really mean crawl: maybe 1-2 FPS or less.

The point being that going full RT in silicon is next to impossible without getting into maybe 500 billion transistors and 5-10 kW.

In the meantime, lots of cheating and hybrid RT will have to be used.
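For what it's worth, the numbers roughly hang together under naive linear scaling; everything below is either taken from the paragraph above or an explicitly labelled assumption, nothing is measured:

```python
# Naive linear-scaling sketch of the "full RT needs ~500 billion transistors" point.
# 76e9 transistors, the 20% RT share and ~450 W board power are from/around the
# post above; the 33x speed-up needed for a fully path-traced, POV-Ray-like scene
# is an assumption chosen purely for illustration.
ad102_transistors = 76e9
rt_share = 0.20
rt_transistors = ad102_transistors * rt_share        # ~15B hypothetically spent on RT

needed_speedup = 33.0        # assumed gap between hybrid RT today and full path tracing
board_power_w = 450.0        # 4090-class board power, for scale

full_rt_transistors = rt_transistors * needed_speedup
full_rt_power_w = board_power_w * rt_share * needed_speedup   # the RT slice scaled linearly

print(f"hypothetical RT transistors today: {rt_transistors/1e9:.0f}B")
print(f"scaled 'full RT' block: {full_rt_transistors/1e9:.0f}B transistors, "
      f"~{full_rt_power_w/1e3:.1f} kW for the RT slice alone")
```

Even with generous linear scaling you land in the hundreds of billions of transistors and the kilowatt range before counting the rest of the chip, which is the point.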
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,027
136
While the RT thing is disappointing, being super efficient in terms of transistor usage isn't just a thing bean counters can rejoice in.

If the next consoles are to have good RT, then the implementation has to be as efficient as possible in terms of transistors/area/costs.


That's why I think doing so in the most efficient way possible is (long term) a good thing.

Let's take a hypothetical: imagine that even 20% of AD102's 76 billion transistors is dedicated to RT.
Now, the 4090 is the most powerful RT card out there, but with truly maxed-out RT settings it would crawl to a halt. By max I don't mean Cyberpunk's new ultra-ultra mode, but something closer to a POV-Ray scene. And by crawl I really mean crawl: maybe 1-2 FPS or less.

The point being that going full RT in silicon is next to impossible without getting into maybe 500 billion transistors and 5-10 kW.

In the meantime, lots of cheating and hybrid RT will have to be used.

Metro Exodus has a version of its engine that replaces its global illumination system with one that does everything with ray tracing. While we are likely years away from photorealism, native ray tracing for modern games is quite a bit closer.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Before we get super good RT with high performance, I would actually prefer proper game physics. There is nothing worse than committing suicide with a hand grenade that bounces around like a rubber ball under the Moon's gravity.
 
  • Like
Reactions: Lodix and Kaluan

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
Before we get super good RT with high performance, I would actually prefer proper game physics. There is nothing worse than committing suicide with a hand grenade that bounces around like a rubber ball under the Moon's gravity.
Realistic physics = blowing stuff up more realistically and having a more realistic world that allows stuff to be blown up with the changes persisting. You can't really do that, because it makes the world too complex to store and, for online play, too heavy to transmit the changes to all the clients. RT is just a visuals thing: it isn't complex to store (a bit of extra info for each material) and nothing needs transmitting to all the clients for online play.

Better AI is a more achievable aim: efficient AI processing is now built into GPUs, so that could be used to make smarter AI in games. That wouldn't be too complex to store or transmit in multiplayer (the AI soldier can still do the same things, it's just smarter about what it does).
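To put a rough number on that storage/network asymmetry (every per-event size and rate below is invented for the example, not taken from any real engine):

```python
# Rough illustration of why persistent destruction is heavier than RT for online play.
# All values are assumptions made up for the example, not engine data.
num_clients = 63         # assumed other players in a large multiplayer match
events_per_sec = 200     # assumed destruction events: debris, holes, collapsing geometry
bytes_per_event = 48     # assumed payload: entity id + position + orientation + damage state

world_sync_bps = num_clients * events_per_sec * bytes_per_event   # bytes/s fanned out by the server
rt_runtime_sync = 0      # RT material/lighting info ships with the assets, nothing to replicate

print(f"persistent destruction: ~{world_sync_bps/1e3:.0f} kB/s to clients, plus ever-growing world state to store")
print(f"ray tracing:            {rt_runtime_sync} B/s of extra runtime sync")
```

Whatever numbers you pick, the destruction state has to be both replicated and persisted, whereas the RT data is static content, which is the distinction being drawn above.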
 
  • Like
Reactions: Kaluan

KompuKare

Golden Member
Jul 28, 2009
1,028
972
136
Before we get super good RT with high performance, I would actually prefer proper game physics. There is nothing worse than committing suicide with a hand grenade that bounces around like a rubber ball under the Moon's gravity.
Okay, physics; someone somewhere also mentioned better audio.

The thing I think Sony/Microsoft missed, in an age when most phones have neural/AI co-processors, is some of that plus an AI framework. No wonder multiplayer is such big business, as single-player AI hasn't advanced much in decades.
 
  • Like
Reactions: Kaluan and arcsign

Timmah!

Golden Member
Jul 24, 2010
1,429
657
136
I thought "the good part" about RT was displayed in games, where it's "more visible to the masses".

Nope, the true benefit of RT hardware is in professional workloads. RT in games is just a gimmick for the moment, to justify the existence of RT parts within gaming hardware, which is basically a rebadged Quadro.
 
  • Like
Reactions: Zepp

KompuKare

Golden Member
Jul 28, 2009
1,028
972
136
Nope, the true benefit of RT hardware is in professional workloads. RT in games is just a gimmick for the moment, to justify the existence of RT parts within gaming hardware, which is basically a rebadged Quadro.
A solution looking for a problem, like tensor cores being used for DLSS?

As I already said, AMD spending lots of R&D on making their shaders smaller / more transistor-efficient might be aimed at future consoles or APUs (and bean counters), but it is not something Nvidia can really ignore long term. Maybe tensor cores will eventually disappear from GeForce too?
 

KompuKare

Golden Member
Jul 28, 2009
1,028
972
136
But professional workloads are USUALLY done with professional cards: NOT with gaming cards, which the 4090 clearly is, no?
Yes and no.

Nvidia's efforts at re-use / dual use have mostly been exactly that, though. AD102 is a bit too new, but look at GA102:
[attached table: GA102-based cards]

15 cards, 7 are gaming, 8 are not.

So while fusing things off or locking features behind drivers is one thing, the silicon is designed to do both.
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
Realistic physics = blowing stuff up more realistically and having a more realistic world that allows stuff to be blown up with the changes persisting. You can't really do that, because it makes the world too complex to store and, for online play, too heavy to transmit the changes to all the clients.

Apparently you never played Red Faction, where almost the entire game world was destructible. It most certainly can be done. Also, Battlefield has had destructible environments for over a decade, going back to Bad Company 2.
 

beginner99

Diamond Member
Jun 2, 2009
5,210
1,580
136
Realistic physics = blowing stuff up more realistically and having a more realistic world that allows stuff to be blown up with the changes persisting. You can't really do that, because it makes the world too complex to store and, for online play, too heavy to transmit the changes to all the clients. RT is just a visuals thing: it isn't complex to store (a bit of extra info for each material) and nothing needs transmitting to all the clients for online play.

Better AI is a more achievable aim: efficient AI processing is now built into GPUs, so that could be used to make smarter AI in games. That wouldn't be too complex to store or transmit in multiplayer (the AI soldier can still do the same things, it's just smarter about what it does).

Yeah, I didn't say it was easy, but RT is about realism and so is proper physics.

Better AI, I'm skeptical about. In a shooter it's trivial: just give the bots a gradually better "aimbot". In strategy games, be it real-time or turn-based? Yes, there is some really good StarCraft AI out there. I don't know about Civilization, which I think is more complex than SC. Anyway, I'm sure these AIs, even if superhuman, will all have some weird, exploitable bugs that would make the games very unfun, because you lose unless you resort to cheese.

Okay, physics; someone somewhere also mentioned better audio.
Yeah, in a multiplayer shooter correct audio would be very cool. Like hearing them breathing around the corner, which makes so much sense. Much more than everything being a mirror. The only downside is it potentially gives too much advantage to people with a good audio setup (read: good headphones).
 
  • Like
Reactions: Kaluan and arcsign

biostud

Lifer
Feb 27, 2003
18,253
4,771
136
Realistic physics = blowing stuff up more realistically and having a more realistic world that allows stuff to be blown up with the changes persisting.

I wouldn't say a destructible environment is a requirement for good physics.

How different materials interact with regard to friction, gravity and collisions can be modelled without being able to blow everything to gravel.