Speculation: RDNA3 + CDNA2 Architectures Thread



Glo.

Diamond Member
Apr 25, 2015
5,707
4,552
136
No one is saying get rid of APUs. AMD's current APUs are powerful enough for that role, and their continued evolution will keep them serving it.

The pushback is merely against the wishful thinking that expects a sudden giant GPU in the APU. This is simply unrealistic.




Again, no one is against APUs, just against the Big GPU APU wishful thinking.

The Steam Deck is actually a great counterexample to the Big GPU APU meme. The Steam Deck has a custom part, so Valve could have ordered any GPU size in their APU that they wanted, but the Steam Deck has only half the GPU size of AMD's standard 6800U GPU.

That's a dedicated handheld game machine, and they chose only half the GPU of AMD's standard APU.

Given that, it seems unlikely that there is much OEM demand for a standard APU with a much bigger GPU section.
Strix Point with 24 CUs is NOT a big APU.

It's a completely and utterly standard die, just like Rembrandt and Phoenix Point are.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,227
5,228
136
Strix Point with 24 CUs is NOT a big APU.

It's a completely and utterly standard die, just like Rembrandt and Phoenix Point are.

That's a sudden doubling of GPU size, so it is a much bigger GPU.

Where is any evidence that it will actually be 24 CUs, or is this just more wishful thinking?

Given AMD's current GPU BW needs, 24 CUs (between Navi 24 and Navi 23) would need over 200 GB/s of BW. Where would that come from? Note that is AFTER already having an Infinity Cache, so you can't pretend a cache would eliminate that need.
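For what it's worth, the bandwidth gap being argued here is simple arithmetic. A quick sketch (the 6400 MT/s socket figure and the >200 GB/s requirement are the numbers quoted in this thread, not official specs):

```python
# Peak DRAM bandwidth = (bus width in bits / 8) bytes per transfer * transfer rate.
def peak_bw_gbs(bus_bits: int, mts: int) -> float:
    """Peak bandwidth in GB/s for a given bus width and transfer rate (MT/s)."""
    return bus_bits / 8 * mts / 1000

socket_bw = peak_bw_gbs(128, 6400)   # standard 128-bit LPDDR5-6400 socket
print(f"{socket_bw:.1f} GB/s")       # 102.4 GB/s
print(socket_bw >= 200)              # False: about half of what ~24 CUs would want
```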
 

Heartbreaker

RedGamingTech posted it, so it's basically money in the bank. That guy is never off on AMD stuff.

Sure, people swear the same about MLID. It's all BS clickbait to me.

Did mister "never off" explain where the BW was coming from? It's on the same socket, so the same DDR5 memory, which has about 100 GB/s of BW.

Edit: I looked up the rumor. He's talking about a 9 TFLOPS GPU; that is the same as an RX 6600.

The RX 6600 has 32 MB of "Infinity Cache" to compensate for lower memory BW, AND 224 GB/s of memory BW. Best case, it seems the fictional part will have less than half the required memory BW.
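A quick sanity check on the 9 TFLOPS figure, assuming the usual RDNA arithmetic (64 FP32 lanes per CU, 2 ops per clock via FMA); the 28 CU / ~2.49 GHz inputs are the RX 6600's public specs:

```python
def fp32_tflops(cus: int, clock_ghz: float) -> float:
    # FP32 throughput = CUs * 64 lanes * 2 ops/clock (FMA) * clock (GHz) / 1000
    return cus * 64 * 2 * clock_ghz / 1000

# RX 6600: 28 CUs at ~2.49 GHz boost
print(round(fp32_tflops(28, 2.49), 2))  # ~8.92, i.e. the "9 TFLOPS" class
```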
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
Widening memory controller buses and/or adding dedicated on-chip caches (neither of which scale well with die shrinks, by the way) for performance which may not be valued by buyers is stupid. Law firms and consultants are not. going. to. pay. more. for the next ThinkPad Carbon just because the chip has a big iGPU + 64 MB SLC to feed said GPU.
That applies as long as the GPU is only there for games and professional visualisation. If we start using GPU compute more, it starts to make more sense to have big GPUs. If Nvidia had an x86 license, you know that's what they would be doing: adding it to normal CPUs and then helping and incentivizing key software to use the GPU compute. They aren't because they lack the x86 license, but that doesn't mean AMD shouldn't do it; in fact, there's a chance Intel goes down that route now that they are back making serious GPUs. If AMD had any sense they'd be proactively going for this market, not waiting to get beaten to the punch and then reacting late.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
Sure, people swear the same about MLID. It's all BS clickbait to me.

Did mister "never off" explain where the BW was coming from? It's on the same socket, so the same DDR5 memory, which has about 100 GB/s of BW.

Edit: I looked up the rumor. He's talking about a 9 TFLOPS GPU; that is the same as an RX 6600.

The RX 6600 has 32 MB of "Infinity Cache" to compensate for lower memory BW, AND 224 GB/s of memory BW. Best case, it seems the fictional part will have less than half the required memory BW.

Hopes and dreams, that's where the bandwidth would come from.

Still waiting for Mr "never off's" N33 & N31 btw:
[two image attachments]
 

leoneazzurro

Senior member
Jul 26, 2016
928
1,453
136
No one is saying get rid of APUs. AMD's current APUs are powerful enough for that role, and their continued evolution will keep them serving it.

The pushback is merely against the wishful thinking that expects a sudden giant GPU in the APU. This is simply unrealistic.

It depends on what you think a "big APU" is. The iGPU in Rembrandt is the equivalent of a midrange dGPU from a few years ago, and more powerful than the PS4-generation consoles, but in terms of actual area it is not huge, for sure. Of course you will not see 400+ mm^2 APU dies in a market that spans from low-end to high-end laptops (which are the primary target). But with 200 to 300 mm^2 at disposal, new processes becoming available, and MCM as the future, who knows if we couldn't see, in a matter of a couple of years, what you would today consider a "big" APU.


The Steam Deck is actually a great counterexample to the Big GPU APU meme. The Steam Deck has a custom part, so Valve could have ordered any GPU size in their APU that they wanted, but the Steam Deck has only half the GPU size of AMD's standard 6800U GPU.

That's a dedicated handheld game machine, and they chose only half the GPU of AMD's standard APU.

Given that, it seems unlikely that there is much OEM demand for a standard APU with a much bigger GPU section.

That was because Valve gave more importance to battery life and cost than to pure performance, and also because they control the software layer. But there are other handhelds (even smaller than the Deck) already on the market using a 6800U, and soon 7040U-class APUs, for example. And there are several XSFF PCs (the X stands for Extra) used as multimedia and gaming stations, which can benefit from a beefier iGPU.

This does not mean that we will immediately see huge dies with iGPUs measuring 200+ mm^2 alone (well, we will in the HPC market, see MI300). But with the proper balance between the CPU and iGPU sides, there are clear advantages to such solutions. In the high-end corporate market, for example, we often saw solutions based on an APU alone, or on a CPU plus a quite small dGPU (MX450/550 class). A Rembrandt/Phoenix would already kill all dGPU solutions of that class by offering similar performance at a lower cost (or a higher margin for the OEM), and even in the mainstream notebook segment smaller dGPUs could be substituted by slightly beefier APUs than we see today. The main problem here is availability more than technical/cost issues.
 

Heartbreaker

It depends on what you think a "big APU" is.

I've been pretty clear on that. The reasonable expectation is a continuation of the current evolution: the GPU section's performance keeps evolving along with the evolution of memory BW on the standard socket.

It's unreasonable to expect that the GPU section will suddenly double while memory BW is stagnant. That would just be a waste of silicon, bottlenecked by too-weak memory BW.
 

leoneazzurro

I've been pretty clear on that. The reasonable expectation is a continuation of the current evolution: the GPU section's performance keeps evolving along with the evolution of memory BW on the standard socket.

It's unreasonable to expect that the GPU section will suddenly double while memory BW is stagnant. That would just be a waste of silicon, bottlenecked by too-weak memory BW.

There are technical solutions for improving effective bandwidth (cache stacking, new memory standards). Today a new LPDDR5 speed grade (LPDDR5T, where the T stands for Turbo) was launched by SK hynix, with an effective transfer rate of up to 9.6 Gbps per pin. Yes, there are costs involved, and it remains to be seen how and where those costs can be justified, but there could be even more segmentation in the future, based on iGPU performance.
 

Heartbreaker

There are technical solutions for improving effective bandwidth (cache stacking, new memory standards). Today a new LPDDR5 speed grade (LPDDR5T, where the T stands for Turbo) was launched by SK hynix, with an effective transfer rate of up to 9.6 Gbps per pin. Yes, there are costs involved, and it remains to be seen how and where those costs can be justified, but there could be even more segmentation in the future, based on iGPU performance.

See above with the Strix Point/RX 6000 example. You need more than double the high-speed DDR5 BW, and that's already after you have a large Infinity Cache mitigating the lower memory speed.
 

leoneazzurro

See above with the Strix Point/RX 6000 example. You need more than double the high-speed DDR5 BW, and that's already after you have a large Infinity Cache mitigating the lower memory speed.

You need double the BW only if you need to double the performance in all departments, and only if BW is the only limiting factor. Maybe the target is different. And the new LPDDR5T standard is already giving +50% BW in comparison to today's LPDDR5-6400. Also, if Strix Point is on N3, the area used by a 24CU/12WGP iGPU on RDNA3+ may not be much different from the area used by the Rembrandt or Phoenix iGPUs. I'm not saying that we will really see 12 WGPs (after the usual hype debacle, I will believe only what is effectively delivered), but the possibility exists.
 

Heartbreaker

You need double the BW only if you need to double the performance in all departments, and only if BW is the only limiting factor. Maybe the target is different. And the new LPDDR5T standard is already giving +50% BW in comparison to today's LPDDR5-6400. Also, if Strix Point is on N3, the area used by a 24CU/12WGP iGPU on RDNA3+ may not be much different from the area used by the Rembrandt or Phoenix iGPUs. I'm not saying that we will really see 12 WGPs (after the usual hype debacle, I will believe only what is effectively delivered), but the possibility exists.

Why would you think area will stay constant when wafers are getting more expensive with each process shrink?
 

insertcarehere

You need double the BW only if you need to double the performance in all departments, and only if BW is the only limiting factor. Maybe the target is different. And the new LPDDR5T standard is already giving +50% BW in comparison to today's LPDDR5-6400. Also, if Strix Point is on N3, the area used by a 24CU/12WGP iGPU on RDNA3+ may not be much different from the area used by the Rembrandt or Phoenix iGPUs.

Except even 128-bit LPDDR5-9600 is only gonna be ~150 GB/s of bandwidth. For reference, the 28 CU RDNA3 part on 6nm (7600M) is listed as 128-bit GDDR6 with 256 GB/s, and it also needs a 32 MB IC on top. To actually feed 24 RDNA3 CUs at the sort of clock speeds N3 should permit would require a 32 MB IC at the very, very least. Caches don't shrink well with process node, so a decent IC to mitigate low memory bandwidth would take up a substantial proportion of the precious die space within such an APU by itself.
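Running the memory options from this exchange through the same peak-bandwidth arithmetic (bus bits / 8 × MT/s; configurations and rates are the ones quoted in-thread):

```python
# Peak bandwidth in GB/s = bus width (bits) / 8 * transfer rate (MT/s) / 1000.
configs = {
    "128-bit LPDDR5-6400":     (128, 6400),
    "128-bit LPDDR5T-9600":    (128, 9600),
    "128-bit GDDR6 @ 16 Gbps": (128, 16000),  # RX 7600M class
}
for name, (bits, mts) in configs.items():
    print(f"{name}: {bits / 8 * mts / 1000:.1f} GB/s")
# LPDDR5T-9600 lands at 153.6 GB/s (the "~150 GB/s" above), still well
# short of the 256 GB/s the 7600M gets even with its 32 MB cache on top.
```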
 

Dribble

You mean so long as we are talking about reality. So yeah, it applies.
The reality is any modern PC is capable of some level of GPU compute, and the fact that it's not being used by a lot of software is due to a lack of standardisation and of someone big willing to champion it. It's a great potential market for AMD, as they have the x86 CPUs and the knowledge to make integrated GPUs that could do compute well. Like most things, the market moves fast; if you don't take advantage, someone else will.
 

Justinus

Diamond Member
Oct 10, 2005
3,174
1,516
136
Except even 128-bit LPDDR5-9600 is only gonna be ~150 GB/s of bandwidth. For reference, the 28 CU RDNA3 part on 6nm (7600M) is listed as 128-bit GDDR6 with 256 GB/s, and it also needs a 32 MB IC on top. To actually feed 24 RDNA3 CUs at the sort of clock speeds N3 should permit would require a 32 MB IC at the very, very least. Caches don't shrink well with process node, so a decent IC to mitigate low memory bandwidth would take up a substantial proportion of the precious die space within such an APU by itself.

Just gonna mention here that my 6900HS with 4x32-bit LPDDR5-6400 only actually measures ~50 GB/s in AIDA64.

Theoretical bandwidth is a useless figure for CPUs and always has been.
 

leoneazzurro

Except even 128-bit LPDDR5-9600 is only gonna be ~150 GB/s of bandwidth. For reference, the 28 CU RDNA3 part on 6nm (7600M) is listed as 128-bit GDDR6 with 256 GB/s, and it also needs a 32 MB IC on top. To actually feed 24 RDNA3 CUs at the sort of clock speeds N3 should permit would require a 32 MB IC at the very, very least. Caches don't shrink well with process node, so a decent IC to mitigate low memory bandwidth would take up a substantial proportion of the precious die space within such an APU by itself.

And who told you that the target for Strix Point is N33 (which alone measures a bit more than 200 mm^2)? I spoke about low-end dGPUs (GTX 1650/2050/6400 class). Having a greater number of WGPs can also mean they can be clocked lower, for lower power consumption as well. Also, the CUs can help with some GPU compute when available.
 

Kronos1996

Junior Member
Dec 28, 2022
15
17
41
The Steam Deck 2 would be a prime candidate for an 8-core CPU/24 CU GPU (responding to comments I read over the past few pages; I have not had much free time).

When unplugged, the GPU can simply run at lower clocks to save power. When plugged in, clocks can boost to allow higher resolutions and refresh rates. One of the few complaints about the Steam Deck was that it doesn't scale performance when plugged in.

Just a thought.



You are overpricing by several thousand dollars. N6 was significantly cheaper than N5 as of 6 months ago. A customer like AMD would pay somewhere around $8,000-$9,000 for N7 (note this was $7,000-$8,000 in 2019) based on the numbers I have seen. TSMC made N6 cheaper because of less machine time involved which leads to higher volume. The early numbers I heard for N6 were around $4,000-$5,000, but that was before supply chains blew up. The real (post supply chain issues) number is likely somewhere between $5,000-$7,000. TSMC really wants everyone to transition from N7 to N6 because they can output more wafers per month, which leads to more revenue.

With the economy struggling, those prices will possibly even drop a bit.

Note that most of the numbers I have referenced above came from various leaks in 2018/2019 and a few from last year. I don’t have access to a price sheet or anything, but the sources that provided the numbers were reliable ones.
I used $6,000 per 6nm wafer for my estimate, so I guess that was pretty damn close. Not that it was super important for the point I was trying to make, but it helps.
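As a rough illustration of how a wafer price turns into a per-die figure, here's the classic dies-per-wafer approximation (all numbers illustrative: the $6,000/wafer figure is the estimate above, and the 200 mm^2 die size is just an example, not a leaked spec):

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300) -> int:
    """Classic approximation: wafer area / die area, minus an edge-loss term."""
    r = wafer_diameter_mm / 2
    return int(math.pi * r * r / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

wafer_cost = 6000                 # $/wafer, per the estimate above
dies = dies_per_wafer(200)        # candidate dies for a ~200 mm^2 design
print(dies, f"-> ${wafer_cost / dies:.0f} per die before yield losses")
```

Real yields push the per-die cost up from there, but it shows why wafer price matters more for big dice than small ones.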
 

simplyspeculating

Junior Member
May 16, 2022
6
6
41
Instead of varying or heavily cutting the iGPU CU count on a chip (like the 6600U vs the 6800U) to differentiate products, would it not make more sense now to just have a very small L3 cache for base models and use cache chiplets for the premium models?
2-4 MB of L3 for the office PC APU, and a 64 MB chiplet + 2-4 MB L3 for the premium model, for example. This would allow for much bigger iGPUs while still keeping the chips destined for office PCs and notebooks cheap.

On another note, the proportional area taken up by the CPU logic side of the APU is sure to decline thanks to Amdahl's law (8 big CPU cores + 4 energy-efficient cores will probably be the maximum even for a premium APU for some time), so it only makes sense that ever larger portions of the silicon budget will be taken up by the iGPU; more CPU isn't really needed, and cache just doesn't scale well enough. I guess I'm in the big iGPU camp in this argument, if not for Zen 5 then surely for some of its successors.
 