Discussion RDNA 5 / UDNA (CDNA Next) speculation


adroc_thurston

Diamond Member
Jul 2, 2023
That is the exact reason I didn't believe the cache rumours for RDNA2. It seemed like going with a wider bus would cost less die area to achieve similar performance goals, so if AMD were willing to spend 500mm² of die area, then going with a wider bus and adding more shader cores would be better balanced overall than a huge chunk of MALL.
Memory speed scaling looked pretty bad when MALL was introduced.

Basically we ran out of shrinks to spam shader cores with before we ran out of bandwidth.
Oh the irony.
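
A rough way to see why a big MALL can stand in for bus width: the cache acts as a bandwidth amplifier, since DRAM only has to service the misses. A minimal sketch in Python, with purely illustrative GDDR6 speeds and hit rates (not AMD's actual figures):

```python
# Toy model: a last-level cache (MALL) amplifies effective bandwidth because
# DRAM traffic only has to cover cache misses. All numbers are illustrative.

def gddr6_bandwidth(bus_bits: int, gbps_per_pin: float = 16.0) -> float:
    """Raw GDDR6 bandwidth in GB/s for a given bus width."""
    return bus_bits * gbps_per_pin / 8

def effective_bandwidth(dram_gbs: float, hit_rate: float) -> float:
    """Effective bandwidth if a fraction `hit_rate` of requests never touch DRAM
    (assumes the cache itself is never the bottleneck)."""
    return dram_gbs / (1.0 - hit_rate)

wide_bus   = gddr6_bandwidth(256)                             # ~512 GB/s, no MALL
narrow_bus = gddr6_bandwidth(192)                             # ~384 GB/s
with_mall  = effective_bandwidth(narrow_bus, hit_rate=0.50)   # ~768 GB/s effective

print(f"256-bit, no cache:        {wide_bus:.0f} GB/s")
print(f"192-bit + MALL @50% hit:  {with_mall:.0f} GB/s effective")
```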
 

Timorous

Golden Member
Oct 27, 2008
Memory speed scaling looked pretty bad when MALL was introduced.

Basically we ran out of shrinks to spam shader cores with before we ran out of bandwidth.
Oh the irony.

Just compare N22 to N10. Same 40 CU shader count, but a 192-bit bus plus 96MB of MALL vs a 256-bit bus. The ~35% die size increase led to a roughly 40% performance bump, but I can't help but feel that sticking with a 256-bit bus and just adding more shaders within the same ~335mm² that N22 used would have given at least as much performance as going with the big L3 cache.

N23 too: it performs like the 5700 XT with half the bus thanks to the MALL, but the MALL takes up about the same area that another 128 bits' worth of GDDR6 PHYs would. Would a 256-bit N23 without it have performed any worse?
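
Putting the thread's own rough figures into a quick back-of-the-envelope (251mm² for N10, 335mm² for N22, and the ~40% uplift cited above, all treated as approximations):

```python
# Back-of-the-envelope for the N10 -> N22 comparison, using the rough figures
# quoted in this thread (areas in mm^2, performance normalised to N10 = 1.0).

n10_area, n22_area = 251.0, 335.0
n10_perf, n22_perf = 1.0, 1.40          # "roughly 40% performance bump"

area_increase = n22_area / n10_area - 1.0                      # ~ +33%
perf_per_area = (n22_perf / n22_area) / (n10_perf / n10_area)  # ~ 1.05x

print(f"Die area increase:     {area_increase:+.0%}")
print(f"Perf per mm^2 vs N10:  {perf_per_area:.2f}x")
```

On those rough numbers, perf per mm² moves only a few percent, which is essentially what the question above is probing.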
 

ToTTenTranz

Senior member
Feb 4, 2021
The ~35% die size increase led to a roughly 40% performance bump, but I can't help but feel that sticking with a 256-bit bus and just adding more shaders within the same ~335mm² that N22 used would have given at least as much performance as going with the big L3 cache.

The problem with a 256-bit GDDR6 bus is that it would either be lacking, with only 8GB of VRAM in a >$400 card, or you'd have to pay all the way up to 16GB in a clamshell config.
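
The capacity options follow directly from the bus width: GDDR6 uses one 32-bit channel per device, and clamshell puts two devices on each channel. A small sketch, assuming the standard 1GB and 2GB GDDR6 densities of that era:

```python
# VRAM capacity options for a GDDR6 bus: one 32-bit device per channel,
# and clamshell mode doubles the device count (and therefore capacity).

def vram_options(bus_bits: int, densities_gb=(1, 2)) -> dict:
    devices = bus_bits // 32
    options = {}
    for density in densities_gb:
        options[f"{density}GB chips"] = devices * density
        options[f"{density}GB chips, clamshell"] = devices * density * 2
    return options

for bus in (128, 192, 256):
    print(f"{bus}-bit:", vram_options(bus))
# 192-bit lands on 6 or 12GB, while 256-bit only offers 8 or 16GB (32GB clamshell).
```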
 


ToTTenTranz

Senior member
Feb 4, 2021
One thing I don't really get.

When implemented in a SoC/APU configuration, what do AT3 and AT4 get paired with, exactly? Is it that Medusa Point IOD that already has 4xZen6 + 4xZen6c in it?
If so, doesn't that IOD already have 128bit LP5X/LP6 memory controllers and PHYs?

What happens when you pair it with, e.g., an AT4 that also has 64-bit LP5X/LP6? Is the AT4 treated as a dGPU with only 64-bit LP5X (which is probably too narrow without lots of cache)?

Or do the memory controllers now work in parallel in a UMA fashion, so that both the CPU and the AT4 GPU get access to 192-bit?
Is it like Apple's A5X where the GPU gets access to all the channels but the CPU can't? In that case, would the CPU only be able to access the 128bit in the IOD while the AT4 accesses its own 64bit + the IOD's 128bit?

Also, can the Medusa Point IOD pair with an AT3/4 and a 12 core Zen6 CCD at the same time?



If AT3 can be a client of the IOD's memory controllers as well as its own, it could result in a weird case where the dGPU version has less memory bandwidth than the APU version, possibly resulting in lower comparable performance for the dGPU.
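
To put rough numbers on the scenarios in this question, here is a sketch assuming LPDDR5X-8533 on both the IOD's 128-bit and the AT4's 64-bit interfaces; the speed, and whether the controllers can aggregate at all, are pure assumptions:

```python
# Rough bandwidth for the (speculative) pairings discussed above.
# LPDDR bandwidth in GB/s = bus width in bits / 8 * data rate in GT/s.

def lpddr_bw(bus_bits: int, gts: float = 8.533) -> float:
    return bus_bits / 8 * gts

at4_alone = lpddr_bw(64)        # AT4 as a dGPU on its own 64-bit: ~68 GB/s
iod_alone = lpddr_bw(128)       # Medusa Point IOD's 128-bit:      ~137 GB/s
uma_total = lpddr_bw(64 + 128)  # if everything aggregates (UMA):  ~205 GB/s

print(f"AT4 on its own 64-bit:   {at4_alone:.0f} GB/s")
print(f"IOD's 128-bit alone:     {iod_alone:.0f} GB/s")
print(f"192-bit UMA aggregate:   {uma_total:.0f} GB/s")
```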
 

Timorous

Golden Member
Oct 27, 2008
The problem with a 256-bit GDDR6 bus is that it would either be lacking, with only 8GB of VRAM in a >$400 card, or you'd have to pay all the way up to 16GB in a clamshell config.

That is for a 128 bit bus. 256bit with 2GB chips supports 16GB as standard or 32GB with clamshell.

Oh nyo it's smaller.
N33 even moreso.

251mm² for N10, 237mm² for N23. It is not that much of a saving.
 

Josh128

Golden Member
Oct 14, 2022
Lots of SoCs. Interesting that Medusa Halo and Medusa Premium have separate dies for what are not exactly high-volume products. It probably only makes sense (from a volume perspective) if these are shared with consoles.

Medusa Halo should also have an optional CCD, which would mean a 3-chip CPU.

MLID said all of the CCDs are V-Cache compatible, which would leave Medusa Premium as the one SKU without V-Cache capability.
It would make a lot of sense if the different GPU dies can also stand alone as discrete low-end graphics cards... it's possible that they somehow might. UDNA = Unified DNA, but weren't we just told Medusa Point was going to use RDNA 3.5 and not UDNA? Now it's all UDNA?
 

branch_suggestion

Senior member
Aug 4, 2023
When implemented in a SoC/APU configuration, what do AT3 and AT4 get paired with, exactly? Is it that Medusa Point IOD that already has 4xZen6 + 4xZen6c in it?
If so, doesn't that IOD already have 128bit LP5X/LP6 memory controllers and PHYs?
Presumably Medusa Premium/Halo have bespoke SoC dies. Medusa Point is not compatible with AT4 for the reasons listed.
Remember that the ATx nomenclature includes a new term, GMD: Graphics Memory Die. It hosts the GPU and the overall memory for the SoC.
When used as a dGPU, it is paired with an MID, Multimedia I/O Die.
When used as an APU, it is paired with a SoC die such as Magnus with AT2.
Also, can the Medusa Point IOD pair with an AT3/4 and a 12 core Zen6 CCD at the same time?
No. Wouldn't fit in FP10 for starters.
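
None of this is confirmed, but the pairing rules as described in this post can be captured in a few lines; the die names below are just the rumoured ones already mentioned in the thread:

```python
# Toy model of the rumoured ATx packaging scheme described above (speculative):
# a GMD (Graphics Memory Die) carries the GPU plus the memory interface, and the
# product type depends on what it is paired with.

from dataclasses import dataclass

@dataclass
class Package:
    gmd: str        # e.g. "AT2", "AT3", "AT4"
    companion: str  # "MID" (Multimedia I/O Die) or an SoC die such as "Magnus"

    @property
    def product_type(self) -> str:
        return "dGPU" if self.companion == "MID" else "APU"

print(Package("AT2", "MID").product_type)      # dGPU
print(Package("AT2", "Magnus").product_type)   # APU, per the Magnus + AT2 example
```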
 

ToTTenTranz

Senior member
Feb 4, 2021
That is for a 128 bit bus. 256bit with 2GB chips supports 16GB as standard or 32GB with clamshell.

Nvidia always put only 8GB on all their 256-bit GDDR6 (non-X) GeForce models:

TU106 - RTX 2060 Super, RTX 2070
TU104 - RTX 2070 Super, RTX 2080, RTX 2080 Super
GA104 - RTX 3060 Ti, RTX 3070


In this particular case, you were talking about Navi 22, which is contemporary with the RTX 3060 Ti and RTX 3070 8GB, both of which competed directly with the RX 6700XT 12GB.

My point was that one of the arguments in favor of AMD going with 192-bit GDDR6 + Infinity Cache instead of 256-bit GDDR6 was getting an adequate amount of VRAM without having to go all the way up to 16GB.
A 256-bit Navi 22 / 6700XT like you suggested would get either 8 or 16GB of GDDR6. 8GB is too little, 16GB is too much, and 12GB ended up being adequate in the long term.


Krackan Point can easily be cranked down to sub 10W if they chose to.
I wish some OEM would make a Krackan Point handheld. Disable 2 of the big Zen5 cores (ending up with 2x Zen5 + 4x Zen5c), pair it with 128-bit LPDDR5X-8000, bring its power down to 12W, and it should shine compared to the Steam Deck.
This example shows a Krackan Point at 18W beating the Rembrandt Z2 Go at 40W, despite using slower memory:
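
The linked example itself isn't reproduced here, but the memory side of that pitch is easy to sanity-check. A quick sketch, assuming 128-bit LPDDR5X-8000 for the hypothetical Krackan handheld and the Steam Deck's commonly quoted 128-bit LPDDR5-5500:

```python
# Memory bandwidth for the hypothetical handheld above vs the Steam Deck.
# LPDDR bandwidth in GB/s = bus width in bits / 8 * data rate in GT/s.

def lpddr_bw(bus_bits: int, gts: float) -> float:
    return bus_bits / 8 * gts

krackan_handheld = lpddr_bw(128, 8.0)  # 128-bit LPDDR5X-8000 -> 128 GB/s
steam_deck       = lpddr_bw(128, 5.5)  # 128-bit LPDDR5-5500  ->  88 GB/s

print(f"Krackan handheld: {krackan_handheld:.0f} GB/s "
      f"(~{krackan_handheld / steam_deck:.2f}x the Deck)")
print(f"Steam Deck:       {steam_deck:.0f} GB/s")
```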