Speculation: RDNA3 + CDNA2 Architectures Thread



leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
A future memory standard that, even combined with an Infinity Cache, would still be inadequate for the previously discussed increase in the APU.

If you are going to theorycraft, it at least needs to stand up to rudimentary analysis.

Unless you can get 200 GB/s of BW on top of a sizeable Infinity Cache, the proposed increase in the APU makes little sense, simply because it would be bottlenecked by BW limitations.



It's really not valid, and never was, which is why AMD never does this: it's a false comparison. This is not a niche part meant to compete with more powerful dGPUs. This is the generic mass-market part that must compete on cost against the Intel parts going into the majority of laptops, where buyers don't care about a more powerful GPU.

If you make a big-GPU part for a majority of the market that doesn't care about having a big GPU, then you've made an overpriced part that can't compete in that market.




More like you don't understand the full implications of what you are saying.

Most people just buy basic laptops and don't care about the GPU at all. AMD's part must compete economically with Intel's, so the cost must be contained. Going for a large GPU that most people don't care about just drives up the cost and makes it less competitive.

You make the mistake that many on forums do: assuming that what you want is what everyone wants. Most people aren't looking for more powerful GPUs in their laptops, so this is not a case of a big-APU laptop competing against a more expensive dGPU laptop; it's a case of the more expensive big-APU laptop ending up competing against a laptop with a less expensive Intel chip.

You think it's about competing against a dGPU because you want dGPU performance.

Think about it. It's pretty much always been the case that AMD could build a more powerful APU to challenge dGPUs, but they NEVER do, because a big-GPU APU is a niche part, not a mainstream part, and the APU needs to be a mainstream part.

Again. You. Are. Putting. In. My. Mouth. Words. I. Did. Not. Say.
I spoke about the possibility of having 24 CUs on Strix Point being OK, and said that I expect iGPUs to continue to improve over time anyway, as will memory technologies.
Have you tried to estimate the area of 12 RDNA3 WGPs on an N3 process? It's well under 100 mm^2 even assuming quite bad scaling.
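For reference, a rough back-of-the-envelope version of that estimate; the per-WGP area and the N5-to-N3 shrink factor below are illustrative assumptions, not measured die data:

```python
# Illustrative sanity check of the "well under 100 mm^2" claim.
# Both inputs are assumptions, not die measurements.
wgp_area_n5_mm2 = 4.0   # assumed area of one RDNA3 WGP on N5
n3_shrink = 0.8         # assumed (pessimistic) N5 -> N3 logic scaling factor
wgps = 12

print(f"{wgps * wgp_area_n5_mm2 * n3_shrink:.1f} mm^2")  # -> 38.4 mm^2
```

Even doubling the assumed per-WGP area keeps the total well under 100 mm^2.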
I said that there are market cases where a reasonably powerful iGPU could kill lower-range dGPUs like the MX450, which is already possible TODAY. In the future, they could kill the likes of an RX 6400.
You are telling me I said that I want big APUs with maybe 400 mm^2 dies competing with midrange GPUs. Which I never did.
Either you have comprehension issues or you are a troll.
Please stop here.
 
Last edited:

Heartbreaker

Diamond Member
Apr 3, 2006
4,222
5,224
136
I spoke about the possibility of having 24 CUs on Strix Point being OK, and said that I expect iGPUs to continue to improve over time anyway, as will memory technologies.

24 CUs would at least double the transistor budget relative to the Phoenix 12 CU GPU.

Double is a LOT when transistor costs are flat. Don't make an area argument when this is about transistor costs. You keep trying to pretend this is an inconsequential change, when it isn't.

A 24 CU part is a niche part, when the APU must be a mainstream part.
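To put rough numbers on that (the transistor count and the iGPU's share of the die below are assumptions for illustration, not AMD figures):

```python
# Flat cost-per-transistor model: chip cost scales with transistor count.
# Both inputs are illustrative assumptions.
phoenix_mtr = 25_000   # assumed total Phoenix transistors, in millions (~25B)
gpu_share = 0.30       # assumed fraction of those spent on the 12 CU iGPU

with_24cu = phoenix_mtr * (1 + gpu_share)  # doubling the GPU portion for 24 CUs
print(f"whole-chip cost increase: {with_24cu / phoenix_mtr - 1:.0%}")  # -> 30%
```

Under those assumptions, doubling the iGPU alone adds roughly 30% to the cost of the entire die.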

You are telling me I said that I want big APUs with maybe 400 mm^2 dies competing with midrange GPUs. Which I never did.

No, now you are actually putting words in my mouth.

I'm pointing out that you are defending an uneconomical niche part like a 24 CU APU, which I just quoted you defending again in this post.

This whole massive thread has been about those that just see the APU continuing to evolve incrementally, to fit within AM5 memory bandwidth and mainstream costs.

Versus those that believe there will be a sudden dramatic jump to 24 CUs, ignoring the inadequate bandwidth and the costs that push it out of the mainstream. You have clearly landed on that side of the argument.
 

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
What you are ignoring is that to sell new CPUs and APUs, semiconductor companies must offer something incrementally more powerful and/or with more features over time. Otherwise everyone will stick with older technology.
If all that mattered was die cost, then Phoenix too would be a useless part, because its die will cost way more than Rembrandt's, even with the same number of CUs (which anyway use more transistors than RDNA2's). And why should Rembrandt have used 12 RDNA2 CUs? Vega 8 would have been fine, if all that mattered was die cost and saving transistors.

So Phoenix's iGPU must be more powerful than Rembrandt's, even if only by a little; it does not matter that it's not a huge lead. And the same is true for Strix Point: it must be more powerful on the CPU side and on the GPU side. And how do you do that? You add more: more bandwidth, more compute power. That's it. You cannot get more performance out of nothing, and to sell, new CPUs, APUs and GPUs must offer more performance to justify the purchase.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,222
5,224
136
What you are ignoring is that to sell new CPUs and APUs, semiconductor companies must offer something incrementally more powerful and/or with more features over time.

Reading comprehension problems? I just said "APU continuing to evolve incrementally" in the post directly above yours (and before that as well). It's like you don't even read the posts you react to. Again, the two sides of this argument are those that believe in incremental evolution (like me), and those that believe in doubling GPU performance in a generation (like you).

If all that mattered was die cost, then Phoenix too would be a useless part, because its die will cost way more than Rembrandt's

No, I already covered the different economics of Phoenix in a previous response to you here: https://forums.anandtech.com/thread...a2-architectures-thread.2589999/post-40938664

Again, it's like you just don't read...
 

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
Reading comprehension problems? I just said "APU continuing to evolve incrementally" in the post directly above yours (and before that as well). It's like you don't even read the posts you react to. Again, the two sides of this argument are those that believe in incremental evolution (like me), and those that believe in doubling GPU performance in a generation (like you).



No, I already covered the different economics of Phoenix in a previous response to you here: https://forums.anandtech.com/thread...a2-architectures-thread.2589999/post-40938664

Again, it's like you just don't read...

Lol, you are the one who does not even read. When did I say that doubling the CUs would lead to doubled performance? I said only that the increment must be enough to justify the increase in cost. Again, you are a troll who keeps moving the topic (in fact, the post you linked, where you "covered the economics of Phoenix", does not contain any analysis of Phoenix, let alone a comparison with its predecessor, especially when you omit that N4 has high costs, like N5, so it costs way more than N6). I was really stupid to even try discussing with such a person.
 
Last edited:

insertcarehere

Senior member
Jan 17, 2013
639
607
136
Phoenix and Strix Point don't need better iGPUs than Rembrandt to be incrementally more powerful; that's covered by having Zen 4 and Zen 5 CPU cores respectively.

I sure didn't see much gnashing of teeth when Cezanne basically reused Renoir's iGPU, nor did it sell particularly poorly because of it.
 

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
Lol, just look at the AT review of Cezanne to see whether there was any "gnashing of teeth".
In any case, there is not only AMD; both Intel and Apple have substantially increased, or are going to increase, their iGPUs. But evidently they are idiots.
 
  • Like
Reactions: Tlh97

Heartbreaker

Diamond Member
Apr 3, 2006
4,222
5,224
136
I don't discuss with trolls who can't read or even understand that the two sentences are perfectly compatible. Good trolling.

You have been vociferously arguing in favor of a substantially larger GPU in general, and a 24 CU GPU specifically, for pages. Using incorrect assumptions about bandwidth, and incorrect assumptions about transistor costs.

Whenever I point out the actual facts, you whine about trolling... :rolleyes:

Correcting you, and pointing out your inconsistencies is not trolling.
 

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
You have been vociferously arguing in favor of a substantially larger GPU in general, and a 24 CU GPU specifically, for pages. Using incorrect assumptions about bandwidth, and incorrect assumptions about transistor costs.

Whenever I point out the actual facts, you whine about trolling... :rolleyes:

Correcting you, and pointing out your inconsistencies is not trolling.

[redacted]. I pointed out that there are new memory standards TODAY which provide 50% more BW than is available to current APUs in the same form factor, and you still say it's me who was inconsistent about bandwidth. I said that a 24 CU APU on N3 is technically possible within the same die size as today's APUs, and you started whining about costs, which I had not even mentioned. Always moving the topic. So yes, you are a troll.



Profanity is not allowed in the tech forums.


esquared
Anandtech Forum Director
 
Last edited by a moderator:

Heartbreaker

Diamond Member
Apr 3, 2006
4,222
5,224
136
I pointed out that there are new memory standards TODAY which provide 50% more BW than is available to current APUs in the same form factor, and you still say it's me who was inconsistent about bandwidth.

No, I pointed out that it isn't enough BW for the larger GPU under discussion (24 CU).

I said that a 24 CU APU on N3 is technically possible within the same die size as today's APUs, and you started whining about costs, which I had not even mentioned.

Costs are what drives everything. Again, you just keep viewing facts as a personal attack, instead of learning from them.
 
Last edited by a moderator:

leoneazzurro

Senior member
Jul 26, 2016
905
1,430
136
No, I pointed out that it isn't enough BW for the larger GPU under discussion (24 CU).

No, there is not enough bandwidth for doubling the performance (today). That does not mean the performance would not increase. And once the limiting factor is BW alone, a 50% increase in BW (available with today's tech) would mean +50% performance vs. today's APUs.

Costs are what drives everything. Again, you just keep viewing facts as a personal attack, instead of learning from them.

Costs are what drives everything? So basically there is no market for high-end CPUs or GPUs, because they cost more to produce? Why don't AMD, Intel, or Apple produce only APUs with 4 cores and 2 CUs/EUs/whatever, if only production costs matter? Or maybe someone will pay more for a more powerful APU and someone else will not? Just taking a look at what AMD did with its naming scheme for mobile parts this year says a lot about their strategy: they are keeping old technology on the market for budget/mainstream options, and the new tech will be available to those who want to, and can, pay more. Just that.
 
Last edited:

Mopetar

Diamond Member
Jan 31, 2011
7,797
5,899
136
A 24 CU GPU is feasible if there's something that functions like infinity cache to alleviate the memory bandwidth bottleneck from using system memory instead of VRAM. I've been hoping that AMD would develop a shared last level cache between the CPU and GPU that could be utilized by either depending on the workload.
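As a rough illustration of how such a cache stretches limited system-memory bandwidth (the hit rates and the AM5 memory speed are assumptions, not measurements):

```python
# First-order model: a last-level cache only goes to DRAM on misses, so the
# bandwidth the GPU effectively sees is DRAM BW amplified by 1 / (1 - hit_rate).
# All numbers are illustrative assumptions.
dram_bw = 6000e6 * 8 * 2 / 1e9   # dual-channel DDR5-6000: ~96 GB/s

for hit_rate in (0.0, 0.3, 0.5, 0.7):
    print(f"hit rate {hit_rate:.0%}: effective BW ~ {dram_bw / (1 - hit_rate):.0f} GB/s")
```

A 50% hit rate already doubles the effective bandwidth, though real hit rates depend heavily on cache size and rendering resolution.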

I don't necessarily believe we'll see that in the next generation of products, but it makes sense for them to head in that direction. Although not every user needs or even wants a beefy GPU, there's a segment of the market that wants a decent CPU and an entry-level GPU and AMD being able to offer that all in one package makes it less expensive for the end user and allows AMD to charge somewhere between the cost of the less capable APUs they have now and that CPU+GPU combination that people will purchase instead.

The move to EUV is supposed to reduce the number of mask layers, which should reduce the upfront cost of having a separate product. Theoretically, having this beefier APU means that AMD can cut a lesser APU product down even more which decreases their cost per chip. All they'd be doing is realizing that a single chip can't span the entire market, in much the same way that there are several GPU dies that address different market segments.
 
  • Like
Reactions: Tlh97 and Joe NYC

maddie

Diamond Member
Jul 18, 2010
4,722
4,627
136
A 24 CU GPU is feasible if there's something that functions like infinity cache to alleviate the memory bandwidth bottleneck from using system memory instead of VRAM. I've been hoping that AMD would develop a shared last level cache between the CPU and GPU that could be utilized by either depending on the workload.

I don't necessarily believe we'll see that in the next generation of products, but it makes sense for them to head in that direction. Although not every user needs or even wants a beefy GPU, there's a segment of the market that wants a decent CPU and an entry-level GPU and AMD being able to offer that all in one package makes it less expensive for the end user and allows AMD to charge somewhere between the cost of the less capable APUs they have now and that CPU+GPU combination that people will purchase instead.

The move to EUV is supposed to reduce the number of mask layers, which should reduce the upfront cost of having a separate product. Theoretically, having this beefier APU means that AMD can cut a lesser APU product down even more which decreases their cost per chip. All they'd be doing is realizing that a single chip can't span the entire market, in much the same way that there are several GPU dies that address different market segments.
First-generation EUV has already reached its limit, with N3 using double-patterning for the earliest layers, and that's why the less dense N3E was developed. High-NA EUV is needed, but it will not last too long for single patterning either.
 
  • Like
Reactions: Tlh97 and Joe NYC

Panino Manino

Senior member
Jan 28, 2017
813
1,010
136
Any chance that Valve keeps RDNA2 for the next Deck?
IMHO RDNA3+ is a waste of transistors.
I would go with a custom 6-core Zen 3 CPU and a 16-18 CU RDNA2 GPU.
Really, I see no reason to choose RDNA3; instead, use the extra transistor budget on a bit of IF.
 

Panino Manino

Senior member
Jan 28, 2017
813
1,010
136
Area, not transistors, matters & tell me about RDNA3+, I don't know about the specs.

The performance increase over RDNA3 didn't meet expectations.
On a Deck the GPU will not be clocked close to 3 GHz anyway.
The new "dual issue" uses a lot of transistors.
Really, it's better to stick with RDNA2 and get more CUs than would be possible with RDNA3.
But again, this is what "I" would do.
But seeing how conservative Valve was with just 4 Zen 2 cores, maybe...
 

moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
Any chance that Valve keeps RDNA2 for the next Deck?
IMHO RDNA3+ is a waste of transistors.
I would go with a custom 6-core Zen 3 CPU and a 16-18 CU RDNA2 GPU.
Really, I see no reason to choose RDNA3; instead, use the extra transistor budget on a bit of IF.
Dragon Crest was supposed to be the successor to Van Gogh; no idea what became of it, or whether there was ever a tangible difference between the two. It may well have been a case like Lucienne, Barcelo etc., de facto rebadges, which in Dragon Crest's case saw no use since the Steam Deck is taking all the chips anyway.

For a Steam Deck hardware upgrade I'd expect the focus to be on efficiency, so I have a hard time seeing a significant increase in screen resolution and CU count. Let's assume 50% more across the board: that would be 6 cores, 12 CUs, and a screen resolution of about 1568 x 980 px (see the quick check below), and the result would need the same or better battery life as the current unit. I feel something below those specs is more likely.
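Quick check that those numbers line up with the current 1280 x 800 panel (pure arithmetic, nothing assumed beyond the 1.5x target):

```python
# Scale a 1280x800 panel so the total pixel count grows 1.5x, keeping 16:10.
k = 1.5 ** 0.5             # each axis grows by sqrt(1.5)
w, h = 1280, 800           # current Steam Deck resolution
print(round(w * k), round(h * k))   # -> 1568 980
```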

Maybe a more efficient shrink of Van Gogh on N4 with exactly the same specs first, and an actual new gen with a more polished RDNA4 much later.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,222
5,224
136
A 24 CU GPU is feasible if there's something that functions like infinity cache to alleviate the memory bandwidth bottleneck from using system memory instead of VRAM. I've been hoping that AMD would develop a shared last level cache between the CPU and GPU that could be utilized by either depending on the workload.

The Infinity Cache reduces the need for BW only to a certain degree. The rumored 24 CU APU is a 9 TF part.

That is the same as the RX 6600. The RX 6600 has 224 GB/s of BW, and 32 MB of Infinity Cache; without the IC it would need even more than that.

So even with IC, it seems like AM5 memory bandwidth would be very inadequate.
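Putting rough numbers on that comparison (the APU clock and the AM5 memory speed are assumptions; the RX 6600 figures are its stock specs):

```python
# Rumored 24 CU part: FP32 TFLOPS = shaders * 2 ops/clock (FMA) * clock.
cu, sp_per_cu, clock_ghz = 24, 64, 2.9   # clock is assumed
tflops = cu * sp_per_cu * 2 * clock_ghz / 1000
print(f"~{tflops:.1f} TF")               # -> ~8.9 TF, i.e. RX 6600 class

# Feeding it: AM5 dual-channel DDR5 vs the RX 6600's GDDR6.
am5_bw = 6000e6 * 8 * 2 / 1e9            # DDR5-6000, 2 channels: ~96 GB/s
rx6600_bw = 224.0                        # GB/s, 128-bit GDDR6 @ 14 Gbps
print(f"AM5 ~{am5_bw:.0f} GB/s, {am5_bw / rx6600_bw:.0%} of the RX 6600's")
```

And the APU has to share that ~96 GB/s with the CPU cores, which the dGPU does not.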

I don't necessarily believe we'll see that in the next generation of products, but it makes sense for them to head in that direction. Although not every user needs or even wants a beefy GPU, there's a segment of the market that wants a decent CPU and an entry-level GPU and AMD being able to offer that all in one package makes it less expensive for the end user and allows AMD to charge somewhere between the cost of the less capable APUs they have now and that CPU+GPU combination that people will purchase instead.

AMD could always make a niche part with a more capable GPU, but they only do it on request. I don't see what changes that stance. I just see incremental evolution along the lines they have been following all along.

The move to EUV is supposed to reduce the number of mask layers, which should reduce the upfront cost of having a separate product. Theoretically, having this beefier APU means that AMD can cut a lesser APU product down even more which decreases their cost per chip. All they'd be doing is realizing that a single chip can't span the entire market, in much the same way that there are several GPU dies that address different market segments.

EUV reduced the number of layers at 7nm, but as they shrink below that, the required layer count goes up again.

SemiAnalysis published a very good piece about 3nm about a month ago.

Particularly challenging at 3nm is that SRAM (cache) didn't scale at all from 5nm. Zero SRAM scaling plus increased process costs makes adding a big cache challenging (AKA expensive).

I think the chances of different GPU-sized APUs are very low. That is precisely where dGPUs come in.
 

maddie

Diamond Member
Jul 18, 2010
4,722
4,627
136
The Infinity Cache reduces the need for BW only to a certain degree. The rumored 24 CU APU is a 9 TF part.

That is the same as the RX 6600. The RX 6600 has 224 GB/s of BW, and 32 MB of Infinity Cache; without the IC it would need even more than that.

So even with IC, it seems like AM5 memory bandwidth would be very inadequate.



AMD could always make a niche part with a more capable GPU, but they only do it on request. I don't see what changes that stance. I just see incremental evolution along the lines they have been following all along.



EUV reduced the number of layers at 7nm, but as they shrink below that, the required layer count goes up again.

SemiAnalysis published a very good piece about 3nm about a month ago.

Particularly challenging at 3nm is that SRAM (cache) didn't scale at all from 5nm. Zero SRAM scaling plus increased process costs makes adding a big cache challenging (AKA expensive).

I think the chances of different GPU-sized APUs are very low. That is precisely where dGPUs come in.
Logic scaling still works. The answer seems to be no more monolithic dies, to keep costs from rising. Even on 6nm you can roughly double the amount of cache for a given cost, relative to a monolithic 6nm die, by doing what AMD did with the Zen 3D V-Cache chiplet and using optimized libraries. So no, complete pessimism is not yet justified, and per Twain, rumors of Moore's Law's death appear to be greatly exaggerated.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
Logic scaling still works. The answer seems to be no more monolithic dies, to keep costs from rising. Even on 6nm you can roughly double the amount of cache for a given cost, relative to a monolithic 6nm die, by doing what AMD did with the Zen 3D V-Cache chiplet and using optimized libraries. So no, complete pessimism is not yet justified, and per Twain, rumors of Moore's Law's death appear to be greatly exaggerated.

If Moore's Law (the cost side) isn't dead, it's in an ICU with a breathing machine hooked up to it. Yes, logic scaling may still reduce costs (though not nearly as much as it did in the past), but SRAM and I/O have basically stopped scaling, and there's only so much of them that can be offloaded to other dies. Zen 4 CCDs still have lots of L3 on the CCD itself, to say nothing of the L0/L1/L2 caches, which can't be disaggregated from the CPU/GPU logic.
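A toy version of that cost arithmetic (the area split, shrink factors, and wafer-cost premium are all assumptions for illustration):

```python
# Why flat SRAM plus pricier wafers hurt: logic shrinks, SRAM/I-O area doesn't,
# and cost per mm^2 goes up. All numbers are illustrative assumptions.
logic_mm2, sram_io_mm2 = 60.0, 40.0   # assumed split of a 100 mm^2 die
logic_shrink = 0.6                    # logic still scales
sram_io_shrink = 1.0                  # SRAM and I/O basically don't
cost_per_mm2_ratio = 1.6              # assumed new-node wafer-cost premium

old_cost = logic_mm2 + sram_io_mm2
new_cost = (logic_mm2 * logic_shrink + sram_io_mm2 * sram_io_shrink) * cost_per_mm2_ratio
print(f"same design on the new node: {new_cost / old_cost:.0%} of the old cost")  # -> ~122%
```

The shrunk die can end up costing more than the old one, which is exactly the "cost side" problem.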

Offloading things onto chiplets also brings appreciable tradeoffs in packaging cost and power (a consequence of data traveling longer paths off-chip), so it's not in itself a panacea.
 
  • Like
Reactions: Heartbreaker

TESKATLIPOKA

Platinum Member
May 1, 2020
2,329
2,811
106
Any chance that Valve keeps RDNA2 for the next Deck?
IMHO RDNA3+ is a waste of transistors.
I would go with a custom 6-core Zen 3 CPU and a 16-18 CU RDNA2 GPU.
Really, I see no reason to choose RDNA3; instead, use the extra transistor budget on a bit of IF.
The performance increase over RDNA3 didn't meet expectations.
On a Deck the GPU will not be clocked close to 3 GHz anyway.
The new "dual issue" uses a lot of transistors.
Really, it's better to stick with RDNA2 and get more CUs than would be possible with RDNA3.
But again, this is what "I" would do.
But seeing how conservative Valve was with just 4 Zen 2 cores, maybe...
The number of transistors is not that important; what's important is the actual size on the same process.

An RDNA3 CU is supposedly a bit smaller than an RDNA2 CU on the same process, according to SkyJuice from Angstronomics:
As a result of this focus, PPA is significantly increased. In fact, at the same node, an RDNA 3 WGP is slightly smaller in area than an RDNA 2 WGP, despite packing double the ALUs.

Then this is from Locuza. If N5 provides 70% better scaling, then without it, it would still be 50% denser.
[Image: Locuza's RDNA2 vs RDNA3 die-area comparison]
There is no reason to use RDNA2 when RDNA3(+) has comparable size on the same node and is still faster.

The Steam Deck has only a 15W power budget for the SoC, and that SoC is a 4C8T Zen 2 part with an 8 CU, 1-1.6 GHz RDNA2 iGPU on the N7 process.
16-18 RDNA2(3) CUs is too much for an N4 or N5 process at 15W.
Phoenix uses N4 instead of N7 and still kept a 12 CU iGPU; only the boost clock increased, by 25%.
They can either keep 8 CUs and significantly increase the frequency to 2-2.4 GHz (+50%), depending on the V/F curve, or keep the frequency but increase the CU count to 12 (+50%); a rough power sketch of that tradeoff is below.
Phoenix reviews will show us how high it can clock at a limited TDP with 12 CUs.
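Both of those options give the same +50% raw throughput (CU count x clock), but the power cost differs; here is a crude first-order sketch, where the voltage bump needed for the higher clock is an assumed figure:

```python
# First-order dynamic power model: P ~ CU_count * frequency * voltage^2.
# The +15% voltage for a +50% clock is an assumption for illustration.

def rel_power(cu_scale, f_scale, v_scale):
    return cu_scale * f_scale * v_scale ** 2

wider = rel_power(1.5, 1.0, 1.0)    # 12 CUs at the same clock and voltage
faster = rel_power(1.0, 1.5, 1.15)  # 8 CUs at +50% clock, assumed +15% voltage
print(f"12 CU, same clock: {wider:.2f}x power")   # -> 1.50x
print(f"8 CU, +50% clock:  {faster:.2f}x power")  # -> ~1.98x
```

This is why going wider at lower clocks is usually the more efficient route inside a 15W envelope.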
 
Last edited: