Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,565
5,572
146

blckgrffn

Diamond Member
May 1, 2003
9,110
3,028
136
www.teamjuchems.com
Interesting. Plus you have to figure there is no chance anyone optimized for ARC.

Intel has a giant die for a reason 😂 If the raster performance isn’t good enough, then it seems like a miscalculation.

That said, maybe in a few iterations it will matter more and it will be a genuine feather in their cap.

I just can’t care about it when it’s going from too-slow-to-play fps to still-too-slow-to-play fps.
 
  • Like
Reactions: Tlh97 and Kaluan

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
Was looking at the Hogwarts Legacy benchmarks and was curious: why is RT so bad on RDNA3? There's barely any improvement here over RDNA2, and the minimums can actually be lower.


Meanwhile ARC is proving to be an RT champ.
Can it improve with drivers? If not, I fear that second-generation ARC may become a real threat to AMD.

There's something wrong with TPU's AMD tests. HWUB got very different results, and there are now multiple users with RDNA3 cards posting results online that are in line with HWUB's results, not TPU's. TPU's reviewer even said in their forums that he thinks there's some kind of bug affecting their AMD GPU test results, but he doesn't mention as much in the actual review.

[attached benchmark screenshots]
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,323
4,904
136
There's something wrong with TPU's AMD tests. HWUB got very different results, and there are now multiple users with RDNA3 cards posting results online that are in line with HWUB's results, not TPU's. TPU's reviewer even said in their forums that he thinks there's some kind of bug affecting their AMD GPU test results, but he doesn't mention as much in the actual review.

The bug is that for RTX 4000 series cards frame-generation is stealth-enabled by default:

OMG! I solved the issue, it's not a Ryzen bug but rather a menu bug. Although DLSS and all forms of upscaling were disabled & greyed out in the menu, frame generation was on for just the 40 series. I had to enable DLSS, then enable FG, then disable FG and disable DLSS to fix it!

Edit: Specifically on Intel CPU-based systems, which is what TPU tested with (a 13th-gen Intel 13900K).

Edit 2: Never mind, there's additional weirdness in TPU's results, especially min FPS with RT enabled. Something's broken with their configuration...
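For context on why the min-FPS column is so sensitive to this kind of glitch, here's a minimal sketch (made-up frame times, not TPU's data or methodology) of how average FPS and a "1% low" figure are typically derived from frame times: a handful of long frames barely moves the average but craters the low.

```python
# Minimal sketch: derive average FPS and a "1% low" figure from frame times.
# The frame-time lists below are invented purely for illustration.

def fps_stats(frame_times_ms):
    avg_fps = 1000.0 / (sum(frame_times_ms) / len(frame_times_ms))
    # "1% low": average FPS over the slowest 1% of frames
    slowest = sorted(frame_times_ms, reverse=True)
    n = max(1, len(slowest) // 100)
    one_pct_low = 1000.0 / (sum(slowest[:n]) / n)
    return round(avg_fps, 1), round(one_pct_low, 1)

clean    = [10.0] * 1000                  # steady 100 FPS
glitched = [10.0] * 990 + [50.0] * 10     # ten 50 ms stutters mixed in

print(fps_stats(clean))     # (100.0, 100.0)
print(fps_stats(glitched))  # (96.2, 20.0): the average barely moves, the 1% low collapses
```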
 
Last edited:

Panino Manino

Senior member
Jan 28, 2017
813
1,010
136
I get the feeling that RDNA3 is still not fully accelerating RT, maybe because of some missing or inactive instructions, or a plain inability to execute them on the current hardware. Intel has followed Nvidia more closely on RT (hardware-wise) while AMD is still catching up.
Interesting. Plus you have to figure there is no chance anyone optimized for ARC.


Interesting, Hardware Unboxed results are VERY different:

 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
The bug is that for RTX 4000 series cards frame-generation is stealth-enabled by default:



Edit: Specifically on Intel CPU-based systems, which is what TPU tested with (a 13th-gen Intel 13900K).

I thought about that, but it doesn’t explain the numbers. Neither ARC nor the RTX 3000 series supports frame generation, and both are inexplicably faster than the RDNA cards in TPU’s results.

TPU’s results are also inconsistent when comparing the VRAM usage they report against the performance of cards with less VRAM.
 
  • Like
Reactions: Tlh97 and Kaluan

IEC

Elite Member
Super Moderator
Jun 10, 2004
14,323
4,904
136
I thought about that, but it doesn’t explain the numbers. Neither ARC nor the RTX 3000 series supports frame generation, and both are inexplicably faster than the RDNA cards in TPU’s results.

TPU’s results are also inconsistent when comparing the VRAM usage they report against the performance of cards with less VRAM.

You are right. I'm seeing additional inconsistencies in their numbers, especially min FPS with RT enabled, versus other reviews. Something's broken in their configuration...
 
  • Like
Reactions: Kaluan

GodisanAtheist

Diamond Member
Nov 16, 2006
6,715
7,004
136
As an aside, I wonder how AMD could modify their arch to be more performant in RT workloads.

They currently don't use "dedicated" RT hardware the way NV does, since AMD's whole philosophy is to do more with less even at the expense of performance: they have one RT-capable unit within a CU that can perform standard raster ops when not being used for RT.

It's a great transitional arch, but I think AMD might have played it too safe this round, thinking RT wouldn't be adopted as quickly or broadly (or as badly in many cases, just sort of tacked on to games as a checkbox feature) as it has. Sort of an inversion of their Bulldozer "everyone is going to go multithreaded, I know it" arch.

I suppose the obvious answer is to simply increase the number of RT accelerators within the CU; they can remain multipurpose, but being able to dedicate more of them to RT when it's required may shrink the gap between raster performance and RT performance.

Alternatively, AMD could reduce the size of each CU and increase the number of them, so that the ratio of plain-Jane SPs to RT accelerators goes down as well.
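To put rough numbers on both options above, here's a back-of-the-envelope sketch. The baseline figures (96 CUs, 64 SPs and 1 ray accelerator per CU, roughly Navi 31-like) are my assumptions for illustration, not an exact accounting of the hardware:

```python
# Back-of-the-envelope comparison of the two options above.
# Baseline numbers are illustrative assumptions (roughly Navi 31-like).

def config(cus, sps_per_cu, rt_per_cu):
    total_sps = cus * sps_per_cu
    total_rt = cus * rt_per_cu
    return f"{total_sps} SPs, {total_rt} RT units ({total_sps // total_rt} SPs per RT unit)"

print("baseline           :", config(cus=96,  sps_per_cu=64, rt_per_cu=1))   # 64 SPs per RT unit
print("option A, 2 RT/CU  :", config(cus=96,  sps_per_cu=64, rt_per_cu=2))   # 32 SPs per RT unit
print("option B, half CUs :", config(cus=192, sps_per_cu=32, rt_per_cu=1))   # also 32, same SP count
```

Either route halves the number of SPs each RT unit has to keep fed, which is the gap-shrinking effect described above.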

Not sure what their next step is here, but it's becoming increasingly obvious that even if the performance just isn't there for RT without upscaling and frame-generation tricks, it's an important metric for both developers and buyers.
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I wrote it before already, but I'm actually glad that AMD is riding out the current RT wave the way they are. Eventually RT tech needs to scale down far enough to reach the mainstream, i.e. consoles and handhelds. Currently RT is only feasible with cards whose size and power draw won't fit either of those within the next decade, the way node progress and pricing are going. That can't stay the way it is, unless the majority of consumers and game developers agree that current RT remains a premium feature at a premium cost for a long time to come while still being worth bothering about. Which I highly doubt.
 

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
I think that we'll eventually see a push for 8K gaming. Maybe not the next console cycle, but definitely the one after that. RT might still be around, but it'll be relegated to an extra, much like motion controls have become.

Even with steady progress, ray tracing isn't going to be feasible for mainstream performance expectations, and when 8K gaming becomes the new trendy buzzword, expect to see more effort directed towards that.
 
  • Like
Reactions: Kaluan

jpiniero

Lifer
Oct 1, 2010
14,509
5,159
136
RT might still be around, but it'll be relegated to an extra much like motion controls have become.

Definitely disagree, but for this console gen it'll have to be hybrid rendering. FSR2/upscaling seems to be getting rather popular with console devs, so maybe they will take that route to improve fidelity while maintaining frame rate.
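For a sense of how much headroom upscaling buys, here's a quick sketch of the internal render resolutions behind a 4K output. The per-axis scale factors are the commonly published FSR 2 ones (Quality 1.5x, Balanced 1.7x, Performance 2.0x); treat the exact numbers as approximate:

```python
# Rough sketch: internal render resolutions FSR 2 uses for a 3840x2160 output.
# Scale factors are the commonly published per-axis ones; treat as approximate.

OUTPUT_W, OUTPUT_H = 3840, 2160
MODES = {"Quality": 1.5, "Balanced": 1.7, "Performance": 2.0}

for mode, scale in MODES.items():
    w, h = int(OUTPUT_W / scale), int(OUTPUT_H / scale)
    shaded_fraction = (w * h) / (OUTPUT_W * OUTPUT_H)
    print(f"{mode:12s} renders at {w}x{h} (~{shaded_fraction:.0%} of the output pixels)")
```

Quality mode at 4K shades roughly 44% of the output pixels, which is where most of the reclaimed frame time comes from.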
 
  • Like
Reactions: Tlh97 and Kaluan

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
I wouldn't be surprised if the consoles end up using something like DLSS3, but pushed even further so that it generates fake frames in order to reach a target FPS. Basically upscale and insert frames to offer a "120 FPS 4K" gaming experience.
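As a rough illustration of how little would be natively rendered in that scenario, here's a quick arithmetic sketch. The 2x-per-axis upscale and the every-other-frame generation ratio are assumptions for the sake of the math, not a claim about any specific implementation:

```python
# Rough arithmetic for a hypothetical "120 FPS 4K" output built from upscaling
# plus frame generation. All ratios below are assumptions for illustration.

presented_fps   = 120
output_pixels   = 3840 * 2160
rendered_fps    = presented_fps / 2            # every other frame is generated
internal_pixels = (3840 // 2) * (2160 // 2)    # 2x-per-axis upscale, so 1080p internal

presented_rate = presented_fps * output_pixels   # pixels shown per second
rendered_rate  = rendered_fps * internal_pixels  # pixels actually shaded per second
print(f"{rendered_rate / presented_rate:.1%} of the presented pixels are natively rendered")
# -> 12.5%, i.e. the GPU shades about 1/8 of what native 4K120 would require
```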

RT is still going to be too demanding for what'll amount to a midrange GPU at the time the next generation of consoles launches. It takes a 4090 to deliver 60+ FPS at 4K in most current titles that use it only to a limited extent. Anything like Portal with RTX, where it's used for everything, will still cripple even the best card available.

Next generation we might see every game have RT, but most of it will be used in such minuscule amounts, or for some minimal aspect, because game-engine support makes adding it essentially free for developers. That means a lot of bad games will have RT as well, and it won't be special. It will still exist, but it won't be the main selling point or what's used to generate the buzz behind the new console.
 
  • Like
Reactions: Tlh97 and RnR_au

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
I wouldn't be surprised if the consoles end up using something like DLSS3, but pushed even further so that it generates fake frames in order to reach a target FPS. Basically upscale and insert frames to offer a "120 FPS 4K" gaming experience.
I highly doubt that will happen. Before that, we will sooner see "console" thin clients for game streaming, where everything is rendered on servers and the "console" only does the upscaling and interpolation. That would be a perfect match, considering how hard both game streaming and frame interpolation hit input latency.
 
  • Like
Reactions: Tlh97 and psolord

RnR_au

Golden Member
Jun 6, 2021
1,675
4,079
106
I think a company like AMD will look at the current game engines and see where they can usefully accelerate the pipelines. Epic's Lumen potentially points the way to the future, showing how you can get great-looking global illumination without needing to dedicate massive amounts of hardware to RT.

How fast would Lumen run with help from a Xilinx FPGA on a future AMD CPU?
 

Aapje

Golden Member
Mar 21, 2022
1,311
1,772
106
I think a company like AMD will look at the current game engines and see where they can usefully accelerate the pipelines. Epic's Lumen potentially points the way to the future, showing how you can get great-looking global illumination without needing to dedicate massive amounts of hardware to RT.

Nvidia might be setting themselves up for failure. They currently earn a lot of money from that hardware by selling to AI farms, but I see that as a temporary situation. In the long term, dedicated hardware that doesn't include all this costly rasterization hardware should beat it for AI calculations. Nvidia's goal seems to be to push RT really hard for gaming, but I see a risk of Nvidia ending up in no-man's-land, where their GPUs are not good enough for AI farms but have way too much RT hardware. It will then be hard for them to walk back this choice.
 

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
Nvidia might be setting themselves up for failure. They currently earn a lot of money from that hardware by selling to AI farms, but I see that as a temporary situation. In the long term, dedicated hardware that doesn't include all this costly rasterization hardware should beat it for AI calculations. Nvidia's goal seems to be to push RT really hard for gaming, but I see a risk of Nvidia ending up in no-man's-land, where their GPUs are not good enough for AI farms but have way too much RT hardware. It will then be hard for them to walk back this choice.
Eh? I thought those doing the real AI work run it on Nvidia's server lineup of GPUs, which don't have raster or RT capabilities. Pro-vis and rendering applications use their Quadro lineup, which shares the same dies as the GeForce consumer graphics lineup but with different drivers. Or am I mistaken?
 

Aapje

Golden Member
Mar 21, 2022
1,311
1,772
106
Eh? I thought those doing the real AI work run it on Nvidia's server lineup of GPUs, which don't have raster or RT capabilities. Pro-vis and rendering applications use their Quadro lineup, which shares the same dies as the GeForce consumer graphics lineup but with different drivers. Or am I mistaken?

As far as I can tell, their data center cards (formerly known as Tesla) use the same chips as the consumer GPUs. For example, compare these specs:


Note the identical number of shading units on both (which are INT32/FP32 cores and thus rasterization cores, not what you need for AI).

In general, you need to keep in mind that these companies like to pretend they make many more chips than they actually do. The chips aren't any different just because they are marketed differently.
 

Kepler_L2

Senior member
Sep 6, 2020
308
977
106
As far as I can tell, their data center cards (formerly known as Tesla) use the same chips as the consumer GPUs. For example, compare these specs:


Note the identical number of shading units on both (which are INT32/FP32 cores and thus rasterization cores, not what you need for AI).

In general, you need to keep in mind that these companies like to pretend they make many more chips than they actually do. The chips aren't any different just because they are marketed differently.
AD102 and GH100 are not the same chip at all.
 

Saylick

Diamond Member
Sep 10, 2012
3,084
6,184
136
You seem to be correct: https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/

However, my actual argument still stands, as the thing has a lot of FP32 cores. To build a farm for ChatGPT or the like, those seem useless and you'd want a ton of matrix multipliers (which are called RT cores by Nvidia).
Yeah, but these server GPUs still have to serve the wider HPC community as well as the AI market, which is why they aren't purely tensor cores. Speaking of which, it's the tensor core that is their matrix-multiply unit, not the RT unit.

A pure ASIC dedicated to AI workloads looks more like Google's TPU or even Tenstorrent's chip, but those aren't available to the broader market. Because Nvidia has to serve a ton of different customers, they can't quite make a chip that only does one thing really well. In the meantime, they've been slowly adjusting the dial with each GPU generation, proportioning the execution units based on where the market is heading. If more and more customers use lower-precision math, they'll simply skew the next-gen architecture accordingly.
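To make the tensor-core point concrete, here's a minimal sketch (assuming a CUDA-capable Nvidia GPU and PyTorch installed) that times a large matrix multiply in FP32 against FP16, the latter of which recent Nvidia GPUs route through the tensor cores:

```python
import time
import torch

# Minimal sketch: compare matmul throughput in FP32 vs FP16. On recent Nvidia
# GPUs the FP16 path runs on the tensor cores (the matrix-multiply units),
# which is where the big AI speedups come from. Assumes CUDA + PyTorch.

def measured_tflops(dtype, n=4096, iters=50):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        torch.matmul(a, b)
    torch.cuda.synchronize()
    flops = 2 * n**3 * iters          # 2*n^3 floating-point ops per matmul
    return flops / (time.time() - start) / 1e12

print(f"FP32: {measured_tflops(torch.float32):.1f} TFLOPS")
print(f"FP16: {measured_tflops(torch.float16):.1f} TFLOPS (tensor cores)")
```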
 

Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
I highly doubt that will happen. Before that, we will sooner see "console" thin clients for game streaming, where everything is rendered on servers and the "console" only does the upscaling and interpolation. That would be a perfect match, considering how hard both game streaming and frame interpolation hit input latency.

We've tried that (several times over the years) and it never catches on with the masses. Google just recently shut down Stadia because most gamers don't want to add network latency to their games when it's single player. The graphics aren't going to be as good either, because there will be compression to cut down on bandwidth. They aren't going to send out 120 frames either, because of the bandwidth costs.

The idea that you don't actually own any of your games, or that access is lost when the service shuts down or has issues on their side, is also unappealing to many. I think Nvidia gets around this with Steam integration (I haven't really looked into their service at all, so I may be wrong), but even with today's elevated card prices it wouldn't take long before I could afford a used previous-generation card that would give better-looking and more responsive performance than renting time on a GPU from them.

Then there's the logistics for the company providing the service. How much hardware do they buy, and what do they do with it when it's not being used for games? How do they pay the developers for the games when there aren't any sales? Ask music artists how much they like Spotify and that model. Even though both Sony and Microsoft have online subscription models that give limited-time access to titles on a rotating basis, they tend to rely on a lot of first- or second-party titles and haven't replaced the traditional model. Digital copies are enough to kill the used market, but people still buy physical copies in many cases.
 

Aapje

Golden Member
Mar 21, 2022
1,311
1,772
106
A pure ASIC dedicated to AI workloads looks more like Google's TPU or even Tenstorrent's chip, but those aren't available to the broader market. Because Nvidia has to serve a ton of different customers, they can't quite make a chip that only does one thing really well.

The availability will improve over time. Tenstorrent is only just now releasing their product, for example.

So they might lose a lot of that specific market. Tenstorrent is going to sell PCIe cards that apparently can just be dropped into a Linux workstation. And if you run an AI farm, why wouldn't you get a dedicated system for much less money?