Discussion RDNA4 + CDNA3 Architectures Thread

Page 464 - AnandTech community

DisEnchantment

Golden Member
1655034287489.png
1655034259690.png

1655034485504.png

With the GFX940 patches in full swing since the first week of March, it looks like MI300 is not far off!
Usually AMD takes around 3 quarters to get support into LLVM and amdgpu. Lately, since RDNA2, the window in which they push support for new devices has been much shorter, to prevent leaks.
But looking at the flurry of code in LLVM, there are a lot of commits. Maybe that is because the US Govt is starting to prepare the SW environment for El Capitan (perhaps to avoid a slow bring-up like Frontier's, for example).

See here for the GFX940 specific commits
Or Phoronix

There is a lot more if you know whom to follow in the LLVM review chains (before changes get merged to GitHub), but I am not going to link AMD employees.

I am starting to think MI300 will launch around the same time as Hopper, probably only a couple of months later!
Although I believe Hopper had the problem of having no host CPU capable of PCIe 5 in the very near future, so it might have been pushed back a bit until SPR and Genoa arrive later in 2022.
If PVC slips again, I believe MI300 could launch before it :grimacing:

This is nuts, MI100/200/300 cadence is impressive.

1655034362046.png

Previous thread on CDNA2 and RDNA3 here

 
Is there a toggle to switch to lower res than 4k?
The 9070XT isn't a 4k GPU, and in such GPU-heavy situations the differences are minimal ...
You can switch down resolutions within games. Or use upscaling (FSR3/4), which essentially renders at a lower resolution and then upscales to the 4k target, giving a performance boost while maintaining fidelity.
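To make the "renders at lower resolution" part concrete, here is a rough Python sketch of the internal render resolution for a 4k target. The per-axis scale factors are the ones published for the FSR 2/3 quality modes; that FSR4 uses the same presets is an assumption on my part, and the function names are just for illustration.

```python
# Per-axis scale factors published for FSR 2/3 quality modes.
# Assumption: FSR4 keeps the same presets (not confirmed in this thread).
FSR_SCALE = {
    "quality": 1.5,
    "balanced": 1.7,
    "performance": 2.0,
    "ultra_performance": 3.0,
}


def render_resolution(target_w, target_h, mode):
    """Internal render resolution before upscaling to the target."""
    scale = FSR_SCALE[mode]
    return round(target_w / scale), round(target_h / scale)


def pixel_fraction(target_w, target_h, mode):
    """Share of the target's pixels that the GPU actually shades."""
    w, h = render_resolution(target_w, target_h, mode)
    return (w * h) / (target_w * target_h)


if __name__ == "__main__":
    for mode in FSR_SCALE:
        w, h = render_resolution(3840, 2160, mode)
        share = pixel_fraction(3840, 2160, mode)
        print(f"{mode:>17}: {w}x{h} ({share:.0%} of 4k pixels)")
```

So 4k in Quality mode is shaded at roughly 1440p, and Performance mode at 1080p, which is where the frame rate gain comes from.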
 

Comparing averages at native res and with upscaling, it turns out the bigger gains are at 4k rather than at lower resolutions/FSR.
 
FSR4 is gonna be available in C2077
2ee782c1c6f68c23c0c8bd8567d7fa8f.jpg


So that's why Azor was meeting with the CD Projekt team
 
Yeah, it's a big discount to previous 32GB workstation Radeons.
I guess AMD is finally trying harder to get people to adopt their GPUs for AI workloads.

And it's not like putting another 16GB GDDR6 on the back of the PCB is costing them too much. It's only ~$40 at the moment, so with a $500 premium over the regular 9070XT they're getting more than a 1000% markup on the memory upgrade.
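The markup figure is easy to check. A quick Python sketch using only the two numbers from the post (~$40 for the extra memory, $500 premium); the helper name is made up for illustration.

```python
def markup_percent(cost, price):
    """Markup over cost, in percent: (price - cost) / cost * 100."""
    return (price - cost) / cost * 100


memory_cost_usd = 40     # "~$40" estimate from the post for 16GB of GDDR6
price_premium_usd = 500  # premium over the regular 9070XT, from the post

if __name__ == "__main__":
    pct = markup_percent(memory_cost_usd, price_premium_usd)
    print(f"Markup on the memory upgrade: {pct:.0f}%")
```

That works out to 1150%, so "more than 1000%" holds, assuming the ~$40 estimate is right and the premium pays for nothing but the memory.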
 
This is actually pretty decent for AI hobbyists.
I am curious, how does it fare in AI against the 7900XTX, with its 24GB of memory? Obviously the 7900XTX is RDNA3 and not a pro card, and has a bit less memory, but it is a bit cheaper.
 

1752897185079.png
1752897214049.png

Would be roughly 25% slower in brute speed unless significant bottlenecks have been removed in the RDNA4 drivers.
 
FP16 is usually the training territory.
On the inference side of things, it may be faster than the XTX, since RDNA4 now has dedicated hardware for computing FP8 and 4 times the throughput for INT4/8.
The majority of out-of-the-box inference models are served in INT8 or INT4 quantizations to reduce VRAM cost and get faster tokens/sec.
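As a rough illustration of the VRAM side, here is a Python sketch of weight memory per format. The 32B parameter count is a hypothetical example, not a specific model, and the estimate covers weights only: no KV cache, no activations, no quantization scale overhead.

```python
# Bits per weight for each storage format.
BITS_PER_WEIGHT = {"fp16": 16, "fp8": 8, "int8": 8, "int4": 4}


def weight_gb(n_params, fmt):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


if __name__ == "__main__":
    n_params = 32e9  # hypothetical 32B-parameter model
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: ~{weight_gb(n_params, fmt):.0f} GB of weights")
```

Under those assumptions a 32B model needs about 64 GB in FP16, 32 GB in INT8, and 16 GB in INT4, so only the INT4 version leaves real headroom on a 24GB or 32GB card.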
 

Attachment: 1752899212003.png


Would be roughly 25% slower in brute speed unless significant bottlenecks have been removed in the RDNA4 drivers.
Compute is not that important in inference. Memory bandwidth is king and the 7900XTX has nearly 50% more membw over the 9700. So the 7900XTX should be a fair bit faster.
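A back-of-the-envelope Python sketch of why bandwidth dominates: in single-stream decoding every generated token has to read all the weights once, so tokens/sec cannot exceed bandwidth divided by model size. The bandwidth figures (960 GB/s for the 7900XTX, 640 GB/s for the R9700) are spec-sheet numbers I am assuming, not values from this thread, and the 16 GB model size is hypothetical.

```python
def decode_tokens_per_sec_ceiling(bandwidth_gb_s, model_gb):
    """Upper bound on single-stream decode speed when memory bound.

    Assumes each token reads every weight exactly once and ignores
    compute time, KV cache traffic, and any caching effects.
    """
    return bandwidth_gb_s / model_gb


# Assumed spec-sheet memory bandwidth in GB/s (not from this thread).
BANDWIDTH_GB_S = {"7900XTX": 960, "R9700": 640}

if __name__ == "__main__":
    model_gb = 16  # hypothetical quantized model size
    for card, bw in BANDWIDTH_GB_S.items():
        ceiling = decode_tokens_per_sec_ceiling(bw, model_gb)
        print(f"{card}: at most ~{ceiling:.0f} tokens/sec")
```

Real numbers land below these ceilings, but the ratio between the two cards should stay close to the bandwidth ratio as long as decoding is memory bound.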
 
Huh, so the 7900XTX is actually faster with AI? Interesting. How would it compare to a 3090 or 4090?
 
Compute is not that important in inference. Memory bandwidth is king and the 7900XTX has nearly 50% more membw over the 9700. So the 7900XTX should be a fair bit faster.
Indeed. The R9700 will be a mix of perf stats: the upside of newer compute accelerators, alongside being capped by memory bandwidth. It's gonna be interesting to see initial 3rd-party numbers after the embargo.
 