Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 168

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146

Hail The Brain Slug

Diamond Member
Oct 10, 2005
3,513
2,464
136
Except even 128-bit LPDDR5X-9600 is only gonna be ~150GB/s of bandwidth. For reference, 28CU RDNA3 on 6nm (7600M) is listed as 128-bit GDDR6 with 256 GB/s, and that also needs 32MB of Infinity Cache on top. To actually use 24CU RDNA3 at the sort of clock speeds that N3 should permit would require a 32MB IC at the very, very least. Caches don't shrink well with process node, so a decent IC to mitigate low memory bandwidth would take up a substantial proportion of the precious die space within such an APU by itself.

Just gonna mention here that my 6900HS with 4x32-bit LPDDR5-6400 only actually measures ~50GB/s in AIDA64.

Theoretical bandwidth is a useless figure for CPUs and always has been.
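For what it's worth, the arithmetic behind those theoretical figures is trivial; a quick sketch (peak numbers only, and as the AIDA64 run shows, what you actually measure can come in far lower):

```python
# Peak DRAM bandwidth is just bus width times transfer rate. These are the
# theoretical figures quoted above; measured copy bandwidth (e.g. AIDA64)
# lands well below them.

def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mtps: int) -> float:
    """Peak bandwidth in GB/s = (bus width in bytes) * (MT/s) / 1000."""
    return bus_width_bits / 8 * transfer_rate_mtps / 1000

print(peak_bandwidth_gbs(128, 9600))  # 128-bit LPDDR5X-9600 -> 153.6 GB/s (~150)
print(peak_bandwidth_gbs(128, 6400))  # 4x32-bit LPDDR5-6400 -> 102.4 GB/s theoretical,
                                      # vs the ~50 GB/s AIDA64 reads on the 6900HS
```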
 
  • Like
Reactions: Tlh97

leoneazzurro

Golden Member
Jul 26, 2016
1,052
1,716
136
Except even 128-bit LPDDR5X-9600 is only gonna be ~150GB/s of bandwidth. For reference, 28CU RDNA3 on 6nm (7600M) is listed as 128-bit GDDR6 with 256 GB/s, and that also needs 32MB of Infinity Cache on top. To actually use 24CU RDNA3 at the sort of clock speeds that N3 should permit would require a 32MB IC at the very, very least. Caches don't shrink well with process node, so a decent IC to mitigate low memory bandwidth would take up a substantial proportion of the precious die space within such an APU by itself.

And who told you that the target for Strix Point is N33 (which measures a bit more than 200mm^2 on its own)? I spoke about low-end dGPUs (GTX1650/2050/6400 class). Having a greater number of WGPs also means they can be clocked lower, giving lower power consumption as well. Also, the CUs can help with some GPU compute where available.
 
  • Like
Reactions: Tlh97

Kronos1996

Junior Member
Dec 28, 2022
15
17
41
The Steam Deck 2 would be a prime candidate for an 8-core CPU / 24 CU GPU (responding to comments I read over the past few pages; I haven't had much free time).

When unplugged, the GPU can simply run at lower clocks to save power. When plugged in, clocks can boost to allow higher resolutions and refresh rates. One of the few complaints about the Steam Deck was that it doesn’t scale performance when plugged in.

Just a thought.



You are overpricing by several thousand dollars. N6 was significantly cheaper than N5 as of 6 months ago. A customer like AMD would pay somewhere around $8,000-$9,000 for N7 (note this was $7,000-$8,000 in 2019) based on the numbers I have seen. TSMC made N6 cheaper because there is less machine time involved, which leads to higher volume. The early numbers I heard for N6 were around $4,000-$5,000, but that was before supply chains blew up. The real (post-supply-chain-issues) number is likely somewhere between $5,000 and $7,000. TSMC really wants everyone to transition from N7 to N6 because they can output more wafers per month, which leads to more revenue.

With the economy struggling, those prices will possibly even drop a bit.

Note that most of the numbers I have referenced above came from various leaks in 2018/2019 and a few from last year. I don’t have access to a price sheet or anything, but the sources that provided the numbers were reliable ones.
I used $6,000 per 6nm wafer for my estimate, so I guess that was pretty damn close. Not that it was super important for the point I was trying to make, but it helps.
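To show how that wafer number cashes out per chip, here's a rough sketch; the ~200 mm^2 die size is the N33-class figure mentioned earlier in the thread, and the 80% yield is purely an assumption:

```python
import math

# Back-of-envelope cost per die from a wafer price. Uses the standard
# dies-per-wafer approximation on a 300 mm wafer; the die size and yield
# below are assumptions for illustration, not leaked figures.

def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    radius = wafer_diameter_mm / 2
    return int(math.pi * radius**2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

wafer_cost = 6000.0   # $ per N6 wafer, the estimate above
die_area   = 200.0    # mm^2, assumed N33-class die
yield_rate = 0.80     # assumed

dies = gross_dies_per_wafer(die_area)      # ~306 gross die candidates
print(wafer_cost / (dies * yield_rate))    # ~$25 per good die under these assumptions
```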
 

simplyspeculating

Junior Member
May 16, 2022
6
6
41
Instead of varying or heavily cutting (like in the 6600U vs 6800U) the iGPU CU count on a chip to differentiate products, would it not make more sense now to just have a very small L3 cache for base models and use cache chiplets for the premium models?
2-4MB of L3 for the office PC APU and a 64MB chiplet + 2-4MB L3 for the premium model, for example. This would allow for much bigger iGPUs while still keeping the chips destined for office PCs and notebooks cheap.

On another note, the proportional area taken up by the CPU side of the chip is sure to decline thanks to Amdahl's law (8 big CPU cores + 4 energy-efficient cores will probably be the maximum for even a premium APU for some time), so it only makes sense that ever larger portions of the silicon budget will be taken up by the iGPU; more CPU isn't really needed and cache just doesn't scale well enough. I guess I'm in the big-iGPU camp in this argument, if not for Zen 5 then surely for some of its successors.
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
5,064
8,032
136
Steam Deck is actually a great counterexample against the big GPU APU meme. Steam Deck has a custom part, so Valve could have ordered any size GPU in their APU that they wanted, but the Steam Deck has only half the GPU size vs AMD's standard 6800U GPU.

That's a dedicated handheld game machine, and they chose only half the GPU of AMD's standard APU.
It also has only half the number of CPU cores. I don't think Van Gogh as used in the Steam Deck fits into this discussion at all.

Steam Deck is actually a pretty well-balanced system, rather portable, and its screen resolution/GPU/CPU performance ratio is comparable to that of current-gen consoles at 4K, which should ensure that many games running well on consoles also run reasonably well on Steam Deck (potential compatibility issues aside).
 
  • Like
Reactions: Tlh97

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
It also has only half the number of CPU cores. I don't think Van Gogh as used in the Steam Deck fits into this discussion at all.

Steam Deck definitely comes into it when answering someone who argues we need a bigger GPU for handheld consoles, and yet the Steam Deck ships with about half the GPU of the generic laptop APU.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
And who told you that the target for Strix Point is N33 (which measures a bit more than 200mm^2 on its own)? I spoke about low-end dGPUs (GTX1650/2050/6400 class). Having a greater number of WGPs also means they can be clocked lower, giving lower power consumption as well. Also, the CUs can help with some GPU compute where available.
That implies AMD will go with the "wide and slow" approach and trade off die area for power efficiency, or, in other words, make the chips comparatively more expensive for a similar level of performance.

It's not impossible per se (Apple arguably does this with their SoCs), but AMD is not Apple, and they don't have a track record of taking such an approach for their integrated products, especially when their customers are OEMs who'd just take the cheaper chip any day.

Anyway, it seems this new AMD leaker has been willing to put more ballpark performance estimates on N33 and Phoenix:
 

moinmoin

Diamond Member
Jun 1, 2017
5,064
8,032
136
Steam Deck definitely comes into it when answering someone who argues we need a bigger GPU for handheld consoles, and yet the Steam Deck ships with about half the GPU of the generic laptop APU.
Maybe I read @leoneazzurro's post wrong, but to me it sounded more like emphasizing the importance of a "reasonably powerful iGPU", which is the case with Van Gogh combined with Steam Deck's low screen resolution.

I honestly don't see the point in the whole "big APU" discussion anyway. As somebody already noted earlier, Subor launched a custom big APU before, so whoever sees a big market in doing the same should be able to follow suit. As long as nobody sees a market for repeating that, why should AMD itself take such a risk (especially since AMD won't create the end product and as such needs to rely on the reluctant OEMs anyway)?
 

leoneazzurro

Golden Member
Jul 26, 2016
1,052
1,716
136
This All the Watts guy seems like a copy of Greymon but even less accurate. His previous claims about N3x were called out as BS, and I will not be surprised if he's comparing mobile N33 SKUs to desktop N22/N23 cards...
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,523
3,037
136
Anyway, it seems this new AMD leaker has been willing to put more ballpark performance estimates on N33 and Phoenix:
I think with Phoenix he is wrong.
Dual issue still brings some performance improvement, although it depends on the game.
On the other hand, even if we don't know game clocks, the boost clock is still 25% higher.
Just this should put it higher than Rembrandt.
From what he wrote it's just a bit better, but it's true that BW is a bottleneck.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
I think with Phoenix he is wrong.
Dual issue still brings some performance improvement, although it depends on the game.
On the other hand, even if we don't know game clocks, the boost clock is still 25% higher.
Just this should put it higher than Rembrandt.
From what he wrote it's just a bit better, but it's true that BW is a bottleneck.

To be frank, I don't think we can conclude that dual issue brings a performance improvement by itself. Yes, the ComputerBase test comparing the 7900XT vs the 6900XT shows some improvement (9% on average at 4K), but those are not equal GPUs even if you normalize for CUs/clocks.
- The 7900XT has 800 GB/s of VRAM bandwidth vs 512 GB/s for the 6900XT; this is counteracted somewhat by the larger Infinity Cache the latter has, but at 4K the 7900XT should still have more usable bandwidth.
- The 7900XT has 192 ROPs vs 128 ROPs for the 6900XT, so it has a 50% higher pixel rate even at the same clocks.

How much of the observed improvement is due to 2x FP32, as opposed to the 7900XT just being better endowed in bandwidth and GPU front end?

In a heavily BW-constrained environment with no more front end (which Phoenix is going to be), I could see the gains not being that significant vs Rembrandt, even taking clock speeds into account.
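To put rough numbers on just how unequal those two cards are, here is a quick ratio check; the clocks used are AMD's listed boost clocks (actual game clocks differ), everything else is the published spec:

```python
# Published specs for the two cards; boost clocks are AMD's listed figures
# and only approximate real sustained game clocks.
specs = {
    "7900XT": {"cus": 84, "rops": 192, "boost_ghz": 2.40, "bw_gbs": 800},
    "6900XT": {"cus": 80, "rops": 128, "boost_ghz": 2.25, "bw_gbs": 512},
}

new, old = specs["7900XT"], specs["6900XT"]
print("CU ratio:        %.2fx" % (new["cus"] / old["cus"]))                  # ~1.05x
print("Pixel rate:      %.2fx" % ((new["rops"] * new["boost_ghz"])
                                  / (old["rops"] * old["boost_ghz"])))       # ~1.60x
print("VRAM bandwidth:  %.2fx" % (new["bw_gbs"] / old["bw_gbs"]))            # ~1.56x
# The ~9% average gain at 4K comes against a card with ~56-60% more
# bandwidth and pixel throughput, which is the point being made above.
```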
 
Last edited:

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
I put the math there, feel free to use it. It seems that, by your reasoning, Phoenix should not have existed at all.

You put your made-up guesses there.

SemiAnalysis has a massive analysis of 3nm, and the bottom line is that it's a more complex, more expensive node that will struggle to deliver any improvement in cost per transistor:

Moreover, with a standard monolithic chip (50% Logic + 30% SRAM + 20% Analog), density only increases by 1.3x. This is effectively flat on cost per transistor for the typical monolithic chip designs, with higher development costs.

During IEDM, TSMC revealed that N3E had a bit-cell size of 0.021 μm2, precisely the same as N5. This is a devastating blow to SRAM.

N3 designs with significantly more transistors will cost significantly more. Even worse if you attempt to add a giant "Infinity Cache" to compensate for poor memory BW.
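For anyone wondering where that "only 1.3x" figure comes from, it falls straight out of the weighted mix; a small sketch, where the ~1.7x logic scaling factor is my assumption (roughly TSMC's headline logic density claim for N3-class nodes) and SRAM/analog are taken as ~1.0x per the N3E bit-cell news quoted above:

```python
# Weighted density gain for a "standard" monolithic chip mix of
# 50% logic / 30% SRAM / 20% analog. Logic scaling is assumed ~1.7x;
# SRAM and analog are assumed to barely scale at all on N3E.

def chip_density_gain(logic_share=0.5, sram_share=0.3, analog_share=0.2,
                      logic_scale=1.7, sram_scale=1.0, analog_scale=1.0):
    """Overall density gain = 1 / (relative area left after each block scales)."""
    new_area = (logic_share / logic_scale
                + sram_share / sram_scale
                + analog_share / analog_scale)
    return 1.0 / new_area

print(round(chip_density_gain(), 2))  # ~1.26x, i.e. roughly the quoted 1.3x
```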
 

Timorous

Golden Member
Oct 27, 2008
1,748
3,240
136
You put your made-up guesses there.

SemiAnalysis has a massive analysis of 3nm, and the bottom line is that it's a more complex, more expensive node that will struggle to deliver any improvement in cost per transistor:



N3 designs with significantly more transistors will cost significantly more. Even worse if you attempt to add a giant "Infinity Cache" to compensate for poor memory BW.

The cache won't be on N3, just like the cache in the RDNA3 MCM is not on N5.
 
  • Like
Reactions: Tlh97 and Joe NYC

leoneazzurro

Golden Member
Jul 26, 2016
1,052
1,716
136
You put your made-up guesses there.

SemiAnalysis has a massive analysis of 3nm, and the bottom line is that it's a more complex, more expensive node that will struggle to deliver any improvement in cost per transistor:



N3 designs with significantly more transistors will cost significantly more. Even worse if you attempt to add a giant "Infinity Cache" to compensate for poor memory BW.

Your guesses are not much better. (Which version of N3 will be used? The "standard" one, the "density optimized" one, or, as has already happened with current products, customer-oriented variants? Because even within N5 there are variants where density varies wildly.) Also, we don't even know if there is an Infinity Cache, but you assume that there will be. Chip stacking may play a role in reducing the costs. In any case, what you say about the costs doesn't change anything about the target market scenario, or do you think that magically the new CPUs and dGPUs will keep using the existing processes forever?
 
Last edited: