Question Speculation: RDNA3 + CDNA2 Architectures Thread


uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146

Panino Manino

Senior member
Jan 28, 2017
869
1,119
136
Area matters, not transistors. And tell me about RDNA3+; I don't know the specs.

RDNA3's performance increase over RDNA2 didn't meet expectations.
On a Deck the GPU will not be clocked close to 3GHz anyway.
The new "dual issue" uses a lot of transistors.
Really, it's better to stick with RDNA2 and get more CUs than would be possible with RDNA3.
But again, this is what "I" would do.
But seeing how conservative Valve was with just 4 Zen 2 cores, maybe...
 

moinmoin

Diamond Member
Jun 1, 2017
5,064
8,032
136
Any chance that Valve keeps RDNA2 for the next Deck?
IMHO RDNA3+ is a waste of transistors.
I would go with a custom 6 core Zen 3 CPU and 16-18CU RDNA2.
Really, I see no reason to choose RDNA3; instead, use the extra transistor budget on a bit of IF.
Dragon Crest was supposed to be the successor to Van Gogh; no idea what became of it or if there ever was a tangible difference between the two. It may well have been a de facto rebadge like Lucienne, Barcelo, etc., which in Dragon Crest's case saw no use since the Steam Deck is taking all the chips anyway.

For a Steam Deck hardware upgrade I'd expect the focus to be on efficiency, so I have a hard time seeing a significant increase in screen resolution and number of CUs. Let's assume 50% more: that would be 6 cores, 12 CUs, and a screen resolution of about 1568 x 980 px, and the result would need to have the same or better battery life than the current unit. I feel something below those specs is more likely.

Maybe a more efficient shrink of Van Gogh on N4 with exactly the same specs first, and an actual new gen with a more polished RDNA4 much later.
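
The "50% more" figures above can be sanity-checked with a quick sketch. It assumes the current Deck's 4 cores, 8 CUs and 1280 x 800 panel, and that "50% more" for the screen means 50% more pixels at the same aspect ratio; the function name is made up for illustration.

```python
import math


def scale_resolution(width, height, pixel_factor):
    """Scale a resolution so the total pixel count grows by pixel_factor
    while the aspect ratio stays the same."""
    side = math.sqrt(pixel_factor)  # each dimension grows by the square root
    return round(width * side), round(height * side)


# Assumed baseline: the current Steam Deck (4 cores, 8 CUs, 1280 x 800).
cores = round(4 * 1.5)
cus = round(8 * 1.5)
new_w, new_h = scale_resolution(1280, 800, 1.5)

print(cores, cus, new_w, new_h)
```

Each screen dimension only grows by about 22%, which is why the resolution bump looks smaller than the core and CU bump.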
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
A 24 CU GPU is feasible if there's something that functions like infinity cache to alleviate the memory bandwidth bottleneck from using system memory instead of VRAM. I've been hoping that AMD would develop a shared last level cache between the CPU and GPU that could be utilized by either depending on the workload.

The Infinity cache reduces the need for BW only to a certain degree. The 24 CU APU of the rumor is a 9 TF part.

That is the same as the RX 6600. The RX 6600 has 224 GB/s of BW and 32 MB of Infinity Cache. Without IC it would need more than that.

So even with IC, it seems like AM5 memory bandwidth would be very inadequate.
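
A rough way to put numbers on that gap is to compare peak bandwidth per TFLOP. The sketch below is only illustrative: it uses the RX 6600's 28 CUs, 2491 MHz boost clock and 224 GB/s, and it assumes the APU sits on dual-channel (128-bit) DDR5-6400, which is my assumption and not part of the rumor. The helper names are hypothetical.

```python
def fp32_tflops(cus, clock_ghz, lanes_per_cu=64, flops_per_lane=2):
    """Paper FP32 throughput: CUs x lanes x 2 FLOPs per FMA x clock.
    RDNA3 dual issue is deliberately not counted here."""
    return cus * lanes_per_cu * flops_per_lane * clock_ghz / 1000.0


def peak_bandwidth_gbs(bus_bits, mtps):
    """Peak memory bandwidth in GB/s from bus width and transfer rate."""
    return bus_bits / 8 * mtps / 1000.0


rx6600_tf = fp32_tflops(28, 2.491)        # roughly 8.9 TF
rx6600_ratio = 224.0 / rx6600_tf          # GB/s per TFLOP

apu_bw = peak_bandwidth_gbs(128, 6400)    # assumed DDR5-6400, 128-bit
apu_ratio = apu_bw / 9.0                  # the rumored 9 TF part

print(round(rx6600_ratio, 1), round(apu_ratio, 1))
```

Under those assumptions the APU gets less than half the bandwidth per TFLOP of the RX 6600, and it has to share that bandwidth with the CPU.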

I don't necessarily believe we'll see that in the next generation of products, but it makes sense for them to head in that direction. Although not every user needs or even wants a beefy GPU, there's a segment of the market that wants a decent CPU and an entry-level GPU and AMD being able to offer that all in one package makes it less expensive for the end user and allows AMD to charge somewhere between the cost of the less capable APUs they have now and that CPU+GPU combination that people will purchase instead.

AMD could always make a niche part with more capable GPU, but they only do it on request. I don't see what changes that stance. I just see incremental evolution along the lines they have been doing all along.

The move to EUV is supposed to reduce the number of mask layers, which should reduce the upfront cost of having a separate product. Theoretically, having this beefier APU means that AMD can cut a lesser APU product down even more which decreases their cost per chip. All they'd be doing is realizing that a single chip can't span the entire market, in much the same way that there are several GPU dies that address different market segments.

EUV reduced the number of layers at 7nm, but once they started shrinking below that, the required layers go up again.

There was an excellent Semi Analysis article about 3nm published about a month ago.

Particularly challenging at 3nm is that SRAM (cache) didn't scale at all from 5nm. Zero SRAM scaling and increased process costs make adding a big cache challenging (AKA expensive).

I think the chances of APUs with different GPU sizes are very low. That is precisely where dGPUs come in.
 

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
The Infinity cache reduces the need for BW only to a certain degree. The 24 CU APU of the rumor is a 9 TF part.

That is the same as the RX 6600. The RX 6600 has 224 GB/s of BW and 32 MB of Infinity Cache. Without IC it would need more than that.

So even with IC, it seems like AM5 memory bandwidth would be very inadequate.



AMD could always make a niche part with more capable GPU, but they only do it on request. I don't see what changes that stance. I just see incremental evolution along the lines they have been doing all along.



EUV reduced the number of layers at 7nm, but once they started shrinking below that, the required layers go up again.

There was an excellent Semi Analysis article about 3nm published about a month ago.

Particularly challenging at 3nm is that SRAM (cache) didn't scale at all from 5nm. Zero SRAM scaling and increased process costs make adding a big cache challenging (AKA expensive).

I think the chances of APUs with different GPU sizes are very low. That is precisely where dGPUs come in.
Logic scaling still works. The answer seems to be no more monolithic dies, to keep costs from rising. Even on 6nm you can double the amount of cache for a given cost relative to a 6nm monolithic die by doing what AMD did with the Zen 3D V-Cache chiplet and using optimized libraries. So no, complete pessimism is not yet justified and, per Twain, rumors of Moore's Law's death appear to be greatly exaggerated.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
Logic scaling still works. The answer seems to be no more monolithic dies, to keep costs from rising. Even on 6nm you can double the amount of cache for a given cost relative to a 6nm monolithic die by doing what AMD did with the Zen 3D V-Cache chiplet and using optimized libraries. So no, complete pessimism is not yet justified and, per Twain, rumors of Moore's Law's death appear to be greatly exaggerated.

If Moore's law (the cost side) isn't dead, it's in an ICU hooked up to a breathing machine. Yes, logic scaling may still reduce costs there (not nearly as much as it did in the past), but SRAM and I/O have basically stopped scaling and there's only so much of them that can be offloaded to other dies. Zen 4 CCDs still have lots of L3 on the CCD themselves, to say nothing of the L0/L1/L2 caches, which can't be disaggregated from CPU/GPU logic.

Offloading things into chiplets also brings appreciable tradeoffs in packaging cost and power (a consequence of having data run around longer paths off-chip), so it's not itself a panacea to the problem.
 
  • Like
Reactions: Heartbreaker

TESKATLIPOKA

Platinum Member
May 1, 2020
2,523
3,037
136
Any chance that Valve keeps RDNA2 for the next Deck?
IMHO RDNA3+ is a waste of transistors.
I would go with a custom 6 core Zen 3 CPU and 16-18CU RDNA2.
Really, I see no reason to choose RDNA3; instead, use the extra transistor budget on a bit of IF.
RDNA3's performance increase over RDNA2 didn't meet expectations.
On a Deck the GPU will not be clocked close to 3GHz anyway.
The new "dual issue" uses a lot of transistors.
Really, it's better to stick with RDNA2 and get more CUs than would be possible with RDNA3.
But again, this is what "I" would do.
But seeing how conservative Valve was with just 4 Zen 2 cores, maybe...
The number of transistors is not that important; what's important is the actual size on the same process.

RDNA3 CU is supposedly a bit smaller than RDNA2 CU on the same process according to SkyJuice from Angstronomics.
As a result of this focus, PPA is significantly increased. In fact, at the same node, an RDNA 3 WGP is slightly smaller in area than an RDNA 2 WGP, despite packing double the ALUs.

Then this is from Locuza. If N5 provides 70% better scaling, then without it, it would still be 50% denser.
[Image: Fjza5VjWACMxCpy.jpg]
There is no reason to use RDNA2 when RDNA3(+) has comparable size on the same node and is still faster.

The Steam Deck has only a 15W power budget for its SoC, and that SoC is a 4C/8T Zen 2 with a 1-1.6GHz 8CU RDNA2 IGP on the N7 process.
16-18 CUs of RDNA2(3) is too much for the N4 or N5 process at 15W.
Phoenix uses N4 instead of N7 and still kept a 12CU IGP; only the boost clock increased, by 25%.
They can either keep 8 CUs and significantly increase frequency to 2-2.4GHz (+50%), depending on the V/F curve, or keep the frequency but increase the CU count to 12 (+50%).
Phoenix reviews will show us how high it can clock at a limited TDP with 12 CUs.
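
As a back-of-the-envelope check on those two options, here is a minimal sketch of paper FP32 throughput. It assumes 64 lanes per CU and 2 FLOPs per lane per clock, and it ignores RDNA3 dual issue and power entirely; `fp32_tflops` is just an illustrative helper.

```python
import math


def fp32_tflops(cus, clock_ghz, lanes_per_cu=64, flops_per_lane=2):
    """Paper FP32 throughput in TFLOPS (no dual issue, no power model)."""
    return cus * lanes_per_cu * flops_per_lane * clock_ghz / 1000.0


current = fp32_tflops(8, 1.6)       # today's Deck at its top GPU clock
more_clock = fp32_tflops(8, 2.4)    # option 1: same CUs, +50% frequency
more_cus = fp32_tflops(12, 1.6)     # option 2: +50% CUs, same frequency

print(round(current, 2), round(more_clock, 2), round(more_cus, 2))
print(math.isclose(more_clock, more_cus))
```

Both options land on the same paper throughput, so which one actually fits in 15W comes down to the V/F curve, which this sketch does not model.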
 

leoneazzurro

Golden Member
Jul 26, 2016
1,052
1,716
136
Navi 33 will be smaller than Navi 23 and will use N6 rather than N7. AMD is likely going to get N6 dies at a lower price than equivalent N7 dies.

So, 2 areas of cost saving.

And it offers better features such as improved video decoder/encoder, and improved RT (for what it's worth in this class of GPUs). Transistor density improved almost 40% going from N23 to N33.
 
  • Like
Reactions: Tlh97 and Joe NYC

Kronos1996

Junior Member
Dec 28, 2022
15
17
41
That's just the desktop part with a different name, made for high-power laptops with a dGPU.

Note that as a desktop part, it has a MUCH smaller GPU (only 2 CU) than the real laptop parts.

As always, there is strong pressure to make everything as small as possible.




Those aren't my guesses. It's Semi Analysis' detailed work vs. your guesses.

Of course they will move on to new processes. But that doesn't mean they are going to pay to do a large increase in transistors, when transistor costs are flat. Your faulty assumption is that they were getting a big increase in transistor budget for free, which they aren't.
Yes, I’m aware; the point is that MCM packaging is ready for mobile applications. In fact, Intel’s entire product stack will be MCM starting with Meteor Lake, and I’m sure AMD will follow suit. Modular APUs with different CPU/GPU/cache configurations will be the norm very shortly.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
Yes, I’m aware; the point is that MCM packaging is ready for mobile applications. In fact, Intel’s entire product stack will be MCM starting with Meteor Lake, and I’m sure AMD will follow suit. Modular APUs with different CPU/GPU/cache configurations will be the norm very shortly.

A desktop part used in a high-power laptop doesn't indicate MCM is ready for mobile. They have been putting high-power desktop chips in laptops for as long as there have been laptops.

Monolithic is still more power efficient, and is the preferred option where mobile efficiency is needed, which is the bulk of the mainstream mobile market. That is why Phoenix will still be monolithic. No indication on Strix Point yet.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Yes, I’m aware; the point is that MCM packaging is ready for mobile applications. In fact, Intel’s entire product stack will be MCM starting with Meteor Lake, and I’m sure AMD will follow suit. Modular APUs with different CPU/GPU/cache configurations will be the norm very shortly.
For AMD, the powerful iGPU + powerful CPU combo will be monolithic. What will be on chiplets is the cache + memory controllers.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
The number of transistors is not that important; what's important is the actual size on the same process.

RDNA3 CU is supposedly a bit smaller than RDNA2 CU on the same process according to SkyJuice from Angstronomics.


Then this is from Locuza. If N5 provides 70% better scaling, then without it, it would still be 50% denser.
[Attachment 75410]
There is no reason to use RDNA2 when RDNA3(+) has comparable size on the same node and is still faster.

The Steam Deck has only a 15W power budget for its SoC, and that SoC is a 4C/8T Zen 2 with a 1-1.6GHz 8CU RDNA2 IGP on the N7 process.
16-18 CUs of RDNA2(3) is too much for the N4 or N5 process at 15W.
Phoenix uses N4 instead of N7 and still kept a 12CU IGP; only the boost clock increased, by 25%.
They can either keep 8 CUs and significantly increase frequency to 2-2.4GHz (+50%), depending on the V/F curve, or keep the frequency but increase the CU count to 12 (+50%).
Phoenix reviews will show us how high it can clock at a limited TDP with 12 CUs.
For Steam Deck 2 and 15W TDP you'll have completely standard Strix Point, smallest die.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
LPDDR5T is a 9600 MT/s memory standard with 64 bits per memory chip. So two of these on a 128-bit bus give you about 153 GB/s, and four of these on a 256-bit bus give about 307 GB/s.

That would be plenty enough for feeding even more than 24 CUs.

N24 has 1024 ALUs, 16 MB of Infinity Cache, a 64-bit bus, and 144 GB/s of memory bandwidth in the highest possible SKU.
Strix Point is rumored to have 1536 ALUs, 32 MB of L4 cache, and a 128-bit DDR5 memory controller, most likely running at 6400 MT/s for 102 GB/s.

Strix Point will have 50% more ALUs, 100% more IC, and around 29% less memory bandwidth, but with far more VRAM capacity available.

I think Strix Point will be fine; even with 6400 MT/s and only 102 GB/s of memory bandwidth, it will be fast enough to deliver 6000-6500 pts in 3DMark Time Spy.

And 6000 pts is RTX 2060-RX 5600 XT mobile levels of performance.
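
The bandwidth figures in this post follow from bus width times transfer rate. Here is a minimal sketch of that arithmetic; the function name is made up, the 18 Gbps GDDR6 rate for N24 is what its 144 GB/s implies on a 64-bit bus, and these are theoretical peaks, not measured numbers.

```python
def peak_bandwidth_gbs(bus_bits, mtps):
    """Theoretical peak bandwidth in GB/s: bytes per transfer x MT/s."""
    return bus_bits / 8 * mtps / 1000


lpddr5t_128 = peak_bandwidth_gbs(128, 9600)   # two 64-bit LPDDR5T chips
lpddr5t_256 = peak_bandwidth_gbs(256, 9600)   # four chips, 256-bit bus
ddr5_128 = peak_bandwidth_gbs(128, 6400)      # rumored Strix Point setup
n24 = peak_bandwidth_gbs(64, 18000)           # N24 top SKU, 18 Gbps GDDR6

# How much less bandwidth the 128-bit DDR5-6400 setup has than N24.
deficit = 1 - ddr5_128 / n24

print(lpddr5t_128, lpddr5t_256, ddr5_128, n24, round(deficit * 100))
```

The last value is the "around 29% less" figure: 102.4 GB/s against 144 GB/s.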
 
  • Like
Reactions: Tlh97 and Joe NYC

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
LPDDR5T is a 9600 MT/s memory standard with 64 bits per memory chip. So two of these on a 128-bit bus give you about 153 GB/s, and four of these on a 256-bit bus give about 307 GB/s.

There is no 256-bit bus AMD design for APUs. They are meant to run on the standard 128-bit AM5 socket, with a similar pinout when soldered into a laptop.

~150 GB/s is theoretically possible on 128-bit designs, if they support LPDDR5T. Do they even support LPDDR5X yet?
 

MrTeal

Diamond Member
Dec 7, 2003
3,614
1,816
136
I'm not sure I'd say 153GB/s is plenty for a 24CU GPU and the CPU, but it's certainly better. It would definitely help if Strix Point was used in something like a Steam Deck or console. For mobile, it's harder to say. Most big gaming laptops ship with DDR5 SODIMMs instead of LPDDR5. You see LPDDR5 in high-margin ultralights like the Carbon, but sizing RAM is probably a bit of a nightmare if you're talking about a shared memory system with soldered RAM.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
There is no 256-bit bus AMD design for APUs. They are meant to run on the standard 128-bit AM5 socket, with a similar pinout when soldered into a laptop.

~150 GB/s is theoretically possible on 128-bit designs, if they support LPDDR5T. Do they even support LPDDR5X yet?
What makes you think they are meant to run in standard AM5 boards?

What if they will come soldered to MoBos with soldered RAM, or with CAMM sockets?
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
What makes you think they are meant to run in standard AM5 boards?

What if they will come soldered to MoBos with soldered RAM, or with CAMM sockets?

Same reason, ALL their APUs in history fit their current socket.

Because it makes business sense, instead of being based on wishful thinking.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Same reason, ALL their APUs in history fit their current socket.

Because it makes business sense, instead of being based on wishful thinking.
It's not wishful thinking, it's necessary.

Intel's large SoCs are going to be soldered as well. They are mobile, BGA-only parts. Those SoCs will not land on DIY platforms.

So now, let me ask you a question. Knowing that DIY is a dying platform, that it's not financially feasible to maintain low-end products, that OEMs want different solutions, and that next-generation memory will require changing the tech and, well, soldering the RAM, what will happen with desktop DIY PCs, hmmm?

Why are Intel and AMD focusing on the development of mini-PCs that have an external PCIe connection, like the Compute Element from Intel, or a full mini-PC with an external PCIe port that connects to a docking station which then connects to a dGPU?

DIY is dying and it will become only the highest end of high-end solutions. That's why I have said that 90% of the market in the very near future is going to be APUs/SoCs.

And not being bound by DIY platform limitations will allow companies like Intel or AMD to innovate on the Unified Memory Architecture front.

Apple is their biggest competition, and OEMs want to compete with Apple. And Apple sells only Unified Memory Architecture solutions.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
It's not wishful thinking, it's necessary.

Necessary, to fulfill your wish for a big integrated GPU. :rolleyes:

AMD's competition is Intel, and there is no sign Intel is even going to catch Rembrandt's GPU performance with its iGPU, so there is no real pressure on AMD to increase its iGPU.

When AMD needs to grow its iGPU, a half step would make more sense, so a 16-18 CU part.

24 CU is just the typical clickbait to excite people and get clicks. Rumors said Phoenix would be 24 CU as well. It's more exciting so it gets more clicks.

We have an APU thread, and it's probably best to continue those discussions there:
 
  • Like
Reactions: insertcarehere

MrTeal

Diamond Member
Dec 7, 2003
3,614
1,816
136
This is getting really far afield and off topic for this thread. Phoenix and Strix Point were being discussed because they are RDNA3(+) chips, and lately specifically the rumors that SP would have a 24CU iGPU. Unannounced possible future APUs with >2 channel memory architectures and powerful GPUs to compete with the Apple M chips might be an interesting thread, but it doesn't have anything to do with RDNA3.
 

Mopetar

Diamond Member
Jan 31, 2011
8,114
6,770
136
The Infinity cache reduces the need for BW only to a certain degree. The 24 CU APU of the rumor is a 9 TF part.

Sure that seems like a lot, but eventually it won't be. It's roughly 50% more than top-end Polaris GPUs which are almost 7 years old at this point or essentially what a 3050 will get you today. That probably represents a good performance tier to aim for at the entry level. Considering even just the MSRP of a 3050, it makes a tempting pie to take a bite of.

Particularly challenging at 3nm is that SRAM (cache) didn't scale at all from 5nm. Zero SRAM scaling and increased process costs make adding a big cache challenging (AKA expensive).

All the more reason to make a shared last-level cache that both the CPU and GPU cores can utilize. Having two separate dies that both need some cache just means extra silicon for each.

Maybe some company does need to work with AMD for them to want to make a product like this, but as the market changes and evolves I think we'll see stronger APUs that creep upwards in capabilities to capture the eroding low end of the GPU market.

Perhaps consider it as an entry-level GPU with a CPU attached rather than the other way around. Given how expensive even low-end GPUs have become, there's money to be had in that market segment given it's already so price sensitive.
 
  • Like
Reactions: Tlh97