Question Speculation: RDNA3 + CDNA2 Architectures Thread

Page 164

uzzi38

Platinum Member
Oct 16, 2019
2,705
6,427
146

Kepler_L2

Senior member
Sep 6, 2020
537
2,199
136
In the discussion about Strix Point and its 24 CUs:

IF the rumors are true, and if Kepler is correct about RDNA3 architecture being fixed for Strix Point - it would mean that Dual Issue is working properly, and we should expect the fabled 256 ALUs/WGP.

So if SP has 24 CUs/12 WGPs, then it also has 3072 vALUs/1536 ALUs.
That's not what's fixed in RDNA3+
 
  • Like
Reactions: Tlh97 and Glo.

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Yes, Nostradamus, please tell me how a Steam Deck that can empty its battery in 90 minutes as-is would benefit from a 40-50W APU, which is what 24 CUs will need for decent performance. Or about the sci-fi batteries it would take to power a laptop-class GPU in a wearable, of all things.
There is no need for that.

At 15W the iGPU should clock to around 2 GHz, which still would bring 6 TFLOPs of compute power.
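A quick sanity check on that TFLOPs figure, using the usual RDNA FP32 formula (CUs × 64 stream processors × 2 FLOPs per clock for FMA); dual-issue, if it works, would double this:

```python
# Rough FP32 throughput estimate for a hypothetical 24 CU Strix Point iGPU.
# Assumptions: 64 stream processors per CU, 2 FLOPs per clock per ALU (FMA),
# no dual-issue counted.
def fp32_tflops(cus, clock_ghz, alus_per_cu=64, flops_per_clock=2):
    return cus * alus_per_cu * flops_per_clock * clock_ghz / 1000

print(fp32_tflops(24, 2.0))  # 6.144, matching the ~6 TFLOPs claimed above
```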
And no, Intel is not doing exactly the same thing, given that they're breaking the GPU out into dedicated chiplets instead of staying with a monolithic design.

AMD evidently thought it was worth it to break out their CPUs and GPUs into small modular components (CCDs, IODs...etc) and incur additional design costs there. I struggle to see why that suddenly stops being the case with APUs.



Indeed, I think GPU chiplets are increasingly the way to go for iGPUs that aspire to be more than "boot up the computer" and "basic media acceleration". A single die with everything included will inevitably have some parts that not every customer values, a cost that a more modular chiplet solution can mitigate.
For that, I can only quote myself in post https://forums.anandtech.com/thread...ectures-thread.2589999/page-166#post-40935434

It is a full answer and explanation.

On a 3 nm process, it would cost AMD twice the amount it would take to design a single monolithic APU. Intel is a different story on this front.

Intel is doing EXACTLY the same thing as AMD: they are building large iGPUs, for powerful, desktop-class graphics performance in a tiny thermal envelope, integrated into CPU packages. Why they execute it differently, I already told you: because they have their own fabs for CPUs. For Intel it will still be more beneficial to break the designs apart. For AMD it won't, since they have both CPU and GPU designs on TSMC process nodes.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
That's not what's fixed in RDNA3+
Very interesting. Thanks Kepler.
It wouldn't look bad as a refresh for Xbox Series S, but this would cost more to make than what's currently inside the Xbox, so I am sceptical. If Microsoft or Sony wants to make a refresh, then I don't see it happening without a price increase.

About that SLC cache, 32MB doesn't look like much if it's shared. That's why I calculated with 64MB SLC.
If someone has an RX 6600 XT, they could test it by downclocking the 16 Gbps VRAM to see what happens at 1080p in some benchmark.
The only way I can see the SLC being 64 MB is 3D V-Cache.
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
I'm not sure you can say it won't be bandwidth starved with a shared 32MB L4 without knowing more details on how the cache works. Even in the absolute best case where it's 32MB of dedicated IC, that's still a 50-60% hit rate at 1080p, and then you're going out to the shared 90GB/s bus (at DDR5-5600) to main memory. That's a big if, and even then it's still a huge chunk of CUs with limited bandwidth. That's still 62% of the total bandwidth of the top Navi 24 part, and that has only 16 CUs.

It'd be an interesting part as a 8800G or something, but it still feels like an answer in search of a problem. Something with that kind of silicon distribution feels a lot more like it'd be in a console or other custom solution, with a higher bandwidth memory interface.
32 MB of L4/IC for 1536 ALUs is more than Navi 23 and 33 have, for 2048 ALUs.

It should be fine (enough).

P.S. For Strix Point, since it's rumored to have 24 CUs and an L4/IC cache, I'm willing to increase the perf target for the highest-end SKU (24 CUs clocked at 3 GHz) to at least 8000 pts in 3DMark TS Graphics :).
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
I'm not sure you can say it won't be bandwidth starved with a shared 32MB L4 without knowing more details on how the cache works. Even in the absolute best case where it's 32MB of dedicated IC, that's still a 50-60% hit rate at 1080p, and then you're going out to the shared 90GB/s bus (at DDR5-5600) to main memory. That's a big if, and even then it's still a huge chunk of CUs with limited bandwidth. That's still 62% of the total bandwidth of the top Navi 24 part, and that has only 16 CUs.

It'd be an interesting part as a 8800G or something, but it still feels like an answer in search of a problem. Something with that kind of silicon distribution feels a lot more like it'd be in a console or other custom solution, with a higher bandwidth memory interface.
I agree.

Cache size, hit rate and external memory bandwidth are all vital interrelated parts of a single solution, but it seems cache size is now the sole silver bullet solution.
 
  • Like
Reactions: Tlh97

TESKATLIPOKA

Platinum Member
May 1, 2020
2,523
3,037
136
At 15W the iGPU should clock to around 2 GHz, which still would bring 6 TFLOPs of compute power.
15W is for the whole SoC.
Phoenix with only 12 CUs needs 45W for the 3 GHz boost, and that's for both CPU+iGPU. Let's say during gaming it's a 1:2 ratio; then the 12 CU iGPU consumes 30W at <3 GHz. Yet you expect a 3nm 24 CU iGPU at 2 GHz to consume only 15W.
I find that too optimistic, but Phoenix limited to 25W will tell us more.
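The back-of-the-envelope split above can be written out; the 1:2 CPU:GPU ratio and the naively linear per-CU scaling are assumptions from this exchange, not measured data:

```python
# Back-of-the-envelope power budget split during gaming.
# Assumptions: 45W total SoC power (Phoenix boost), CPU:GPU split of 1:2,
# and linear power scaling with CU count at fixed clocks (naive).
soc_power_w = 45
igp_power_w = soc_power_w * 2 / 3   # GPU gets 2 of every 3 watts
print(igp_power_w)                  # 30.0 W for 12 CUs at <3 GHz
watts_per_cu = igp_power_w / 12     # 2.5 W per CU
print(watts_per_cu * 24)            # 60.0 W for 24 CUs, before any node/clock savings
```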
32 MB of L4/IC for 1536 ALUs is more than Navi 23 and 33 have, for 2048 ALUs.

It should be fine (enough).

P.S. For Strix Point, since it's rumored to have 24 CUs and an L4/IC cache, I'm willing to increase the perf target for the highest-end SKU (24 CUs clocked at 3 GHz) to at least 8000 pts in 3DMark TS Graphics :).
If you clock 24 CUs to 3 GHz sustained, then it will sit exactly between the 7700S and the 7600M XT.
The other thing about IC is that it's used as a buffer for the GPU, so it doesn't need to move data from VRAM as often.
32MB of IC allows only a ~55% hit rate at 1080p; if the 32MB is shared, then even less.
Every time there is a miss, you have to go out to system memory, which is much slower than what N33 has.
This is the reason why I wanted a bigger LLC: to increase the hit rate.
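One way to frame why the hit rate matters so much: every miss goes out over the slow system-memory bus, so in a simplified model the effective bandwidth the GPU sees scales as dram_bw / (1 - hit_rate). A sketch with the numbers from this exchange (90 GB/s DDR5-5600, ~55% hit rate at 1080p; the 70% figure for a bigger cache is my assumption):

```python
# Simplified effective-bandwidth model for an on-die cache: hits are serviced
# on-die, misses go to DRAM. Ignores write traffic and access-pattern effects.
def effective_bw(dram_bw_gbs, hit_rate):
    return dram_bw_gbs / (1 - hit_rate)

print(effective_bw(90, 0.55))  # ~200 GB/s seen by the GPU at a 55% hit rate
print(effective_bw(90, 0.70))  # ~300 GB/s if a larger cache lifted hits to 70% (assumed)
```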
 
  • Like
Reactions: Tlh97

insertcarehere

Senior member
Jan 17, 2013
639
607
136
There is no need for that.

At 15W the iGPU should clock to around 2 GHz, which still would bring 6 TFLOPs of compute power.
Phoenix APU needs 45W to clock 12 RDNA3 CUs at "up to" 3 GHz, but apparently all it takes is a die shrink and twice the CUs can be run at one-third the power - while leaving the CPU with enough to live on.

Just as a reference, the Steam Deck's 8 CU RDNA 2 APU runs at ~1.5 GHz in a 15W power envelope.

It is a full answer and explanation.

On a 3 nm process, it would cost AMD twice the amount it would take to design a single monolithic APU. Intel is a different story on this front.

Intel is doing EXACTLY the same thing as AMD: they are building large iGPUs, for powerful, desktop-class graphics performance in a tiny thermal envelope, integrated into CPU packages. Why they execute it differently, I already told you: because they have their own fabs for CPUs. For Intel it will still be more beneficial to break the designs apart. For AMD it won't, since they have both CPU and GPU designs on TSMC process nodes.

Intel is absolutely not doing the same thing. They're breaking out the GPU and CPU bits out in Meteor lake to be able to pick and match parts according to specific customer requirements - which a monolithic APU cannot achieve.

Building a monolithic APU with a big iGPU means accepting that the big iGPU costs money in terms of extra silicon, but that extra silicon is something many OEMs simply won't pay a corresponding premium for. Dell/Lenovo will not pay more for a laptop CPU with a big iGPU in a thin-and-light business notebook, because the Big 4s/McKinseys of the world will not pay more for a thin-and-light business notebook that can also game well. These sorts of considerations are probably at least some of the reason Intel chose the approach it did for MTL.
 
Last edited:

TESKATLIPOKA

Platinum Member
May 1, 2020
2,523
3,037
136
I agree.

Cache size, hit rate and external memory bandwidth are all vital interrelated parts of a single solution, but it seems cache size is now the sole silver bullet solution.
It's not a silver-bullet solution, but it's probably the cheapest or easiest one to compensate for low BW.
What would you do? Double the bus width, or use GDDR6 as system memory?
If it were me, then I would use a single HBM stack. :)
 

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Intel is absolutely not doing the same thing. They're breaking out the GPU and CPU bits out in Meteor lake to be able to pick and match parts according to specific customer requirements - which a monolithic APU cannot achieve.

Building a monolithic APU with a big iGPU means accepting that the big iGPU costs money in terms of extra silicon, but that extra silicon is something many OEMs simply won't pay a corresponding premium for. Dell/Lenovo will not pay more for a laptop CPU with a big iGPU in a thin-and-light business notebook, because the Big 4s/McKinseys of the world will not pay more for a thin-and-light business notebook that can also game well. These sorts of considerations are probably at least some of the reason Intel chose the approach it did for MTL.
As has been explained to you already, the ONLY reason Intel went for tiles for their mobile MTL and ARL SoCs is that the CPU portion will be manufactured on Intel nodes, and the GPU on TSMC's.

And you are partially correct that Intel is not doing the same thing as AMD. It's the other way around: it's AMD who has to compete with Intel's volume and wide availability, which is why they have to build overkill products to sell. If Intel is going to release ARL-P with 384 EUs/3072 ALUs, AMD has to respond.

I agree.

Cache size, hit rate and external memory bandwidth are all vital interrelated parts of a single solution, but it seems cache size is now the sole silver bullet solution.
It isn't a silver bullet. But for a GPU as small as SP's, it's enough to feed the ALUs.
 

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
It's not a silver-bullet solution, but it's probably the cheapest or easiest one to compensate for low BW.
What would you do? Double the bus width, or use GDDR6 as system memory?
If it were me, then I would use a single HBM stack. :)

IMO, this discussion about an APU, with a big GPU, has been happening forever, and is really just pointless wishful thinking.

I'd like one too, but it isn't going to happen. These are laptop chips aimed at millions of generic laptops, there is substantial pressure to make these chips as inexpensive as possible (small) while remaining competitive.

Putting in a large GPU (this time about double the size of what standard bandwidth can supply) turns it into a more expensive niche part.

IMO the best we can hope for is the slow evolution we have been getting, staying within generic memory bandwidth.

AMD would happily build a Big GPU part for anyone that wants to pay the costs (just like they do for consoles) but no one believes in this enough to commission it on the PC side, and neither does AMD.
 

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
As has been explained to you already, the ONLY reason Intel went for tiles for their mobile MTL and ARL SoCs is that the CPU portion will be manufactured on Intel nodes, and the GPU on TSMC's.

And you are partially correct that Intel is not doing the same thing as AMD. It's the other way around: it's AMD who has to compete with Intel's volume and wide availability, which is why they have to build overkill products to sell. If Intel is going to release ARL-P with 384 EUs/3072 ALUs, AMD has to respond.


It isn't a silver bullet. But for a GPU as small as SP's, it's enough to feed the ALUs.
Not if the low external bandwidth introduces a large penalty on misses. There must be balance; witness the 6500 XT. Same principle, just with the bottleneck farther away from the L1.
 

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
It's not a silver-bullet solution, but it's probably the cheapest or easiest one to compensate for low BW.
What would you do? Double the bus width, or use GDDR6 as system memory?
If it were me, then I would use a single HBM stack. :)

As has been explained to you already, the ONLY reason Intel went for tiles for their mobile MTL and ARL SoCs is that the CPU portion will be manufactured on Intel nodes, and the GPU on TSMC's.

And you are partially correct that Intel is not doing the same thing as AMD. It's the other way around: it's AMD who has to compete with Intel's volume and wide availability, which is why they have to build overkill products to sell. If Intel is going to release ARL-P with 384 EUs/3072 ALUs, AMD has to respond.


It isn't a silver bullet. But for a GPU as small as SP's, it's enough to feed the ALUs.
Not if the low external bandwidth introduces a large penalty on misses. There must be balance, workable ratios. Witness the 6500 XT: same principle, just bottlenecked farther away from the L1. Cache size alone, in isolation, is simplistic.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,523
3,037
136
Phoenix APU needs 45W to clock 12 RDNA3 CUs at "up to" 3 GHz, but apparently all it takes is a die shrink and twice the CUs can be run at one-third the power - while leaving the CPU with enough to live on.

Just as a reference, the Steam Deck's 8 CU RDNA 2 APU runs at ~1.5 GHz.
I am also pretty sceptical, but the Steam Deck has a 7nm APU with only a 15W TDP, so 8 CUs clocking at 1.5 GHz is actually pretty good. That doesn't mean a 24 CU 2 GHz 3nm iGPU would consume only 15W.
Not if the low external bandwidth introduces a large penalty on misses. There must be balance, workable ratios. Witness the 6500XT., same principle, just bottlenecked farther away from the L1. Cache size alone in isolation is simplistic.
It's not like I don't know about this.
That's why I wanted someone with an N23 to downclock the VRAM to see what happens to performance.
Then we will know exactly what to expect.

Does anybody know someone with this GPU who would be willing to test it?
 
Last edited:

maddie

Diamond Member
Jul 18, 2010
4,881
4,951
136
I am also pretty sceptical, but the Steam Deck has a 7nm APU with only a 15W TDP, so 8 CUs clocking at 1.5 GHz is actually pretty good. That doesn't mean a 24 CU 2 GHz 3nm iGPU would consume only 15W.

It's not like I don't know about this.
That's why I wanted someone with an N23 to downclock the VRAM to see what happens to performance.
Then we will know exactly what to expect.

Does anybody know someone with this GPU who would be willing to test it?
That's a good test. Remember the CPU shares memory also.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
And you are partially correct that Intel is not doing the same thing as AMD. It's the other way around: it's AMD who has to compete with Intel's volume and wide availability, which is why they have to build overkill products to sell. If Intel is going to release ARL-P with 384 EUs/3072 ALUs, AMD has to respond.

Yes, so AMD should overbuild all their APUs with overkill iGPUs that the majority of its customers do not need or want...in case Intel makes an overbuilt tiled product with an overkill iGPU that the majority of its customers do not need or want. Great business strategy.

The last time Intel and AMD made an APU with a ton of hardware dedicated to the iGPU was Kaby Lake-G. Notice that neither AMD nor Intel has been anxious to produce a spiritual successor to that product, for good reason.
 

Kronos1996

Junior Member
Dec 28, 2022
15
17
41
If you are sure my 150mm2 design priced at $239 would be too close to Navi 33, then you can share with us how much N33 will be sold for.
My design would certainly have much better performance/price than N24. I can't tell how it would fare against N33, because I don't know its price.


Mobile N24 is RX 6300M, RX 6450M, RX 6500M, RX 6550M and for mobile workstations Pro W6300M and Pro W6400M

There is such a huge demand for N24 in laptops (consumer + business) that I could find only 3 different laptops with Navi 24.
Laptop models with the 6500M: HP Victus, ThinkPad Z16 G1, and Bravo 15 B5E.
Nothing else exists as far as I know.
This doesn't say anything positive about N24's sales.

It would be best if you provided some data to back up what you said.

Data? Every argument made here is based on estimates and numbers pulled out of our collective asses. There’s little public data to base any of this on. But since you asked:


Navi 33
  • $6,000 wafer / 248 dies = $24.20 per die
  • 8 GB × $6/GB = $48
  • Silicon + Memory = $72.20

Navi 14
  • $6,000 wafer / 384 dies = $15.63 per die
  • 8 GB × $6/GB = $48
  • Silicon + Memory = $63.63
A difference of $9 for something that should be ~50% faster. Similar power class, so the PCB/cooler costs are likely the same. Your ~150mm2 Navi 24 with 6GB could subtract $12, so now the difference is $21. Congratulations, you just made the RX 480 for the 4th time at a higher MSRP. The IO doesn't shrink much, so it'd perform pretty similar to the 5500 XT/580/480 again, which is why I used it. Your product is a very expensive refresh of something that already exists.

The only way to get more performance was to make a slightly bigger die, which is what they did for Navi 33, while making a barebones Navi 24 for budget laptops and especially business PCs, where AMD lacked an iGPU and needed something competitive against Intel - and for the business customers who need validated GPU drivers.
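The per-die arithmetic above works out as follows; note the $6,000 wafer price, the dies-per-wafer counts, and the $6/GB GDDR6 are this post's estimates, not published figures:

```python
# Rough silicon + memory BOM using the poster's estimates (assumed $6,000
# wafer, quoted dies-per-wafer counts, $6/GB GDDR6 - none of these official).
WAFER_COST_USD = 6000

def silicon_plus_memory(dies_per_wafer, vram_gb, usd_per_gb=6):
    return WAFER_COST_USD / dies_per_wafer + vram_gb * usd_per_gb

print(silicon_plus_memory(248, 8))  # ~72.19 USD for Navi 33 (post rounds to $72.20)
print(silicon_plus_memory(384, 8))  # ~63.63 USD for Navi 14
```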

It serves a purpose; your design does not. It was never meant to be a DIY product in the first place. Navi 33 represents what a next-gen low-end GPU will be, given the cost increases and scaling issues with IO. A small die is already mostly IO percentage-wise compared to larger ones. If you include AMD's 50% margin, then it'll only cost $31-32 more than your design and should pack a meaningful performance improvement of around ~50%. That's a product worth making, and even at $300 for a cut-down model it would have way higher perf/$.

Business products are sold directly to customers in bulk so of course you won’t find them for sale at Microcenter.
 
Last edited:

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
Yes, so AMD should overbuild all their APUs with overkill iGPUs that the majority of its customers do not need or want...in case Intel makes an overbuilt tiled product with an overkill iGPU that the majority of its customers do not need or want. Great business strategy.

The last time Intel and AMD made an APU with a ton of hardware dedicated to the iGPU was Kaby Lake-G. Notice that neither AMD nor Intel has been anxious to produce a spiritual successor to that product, for good reason.
Strix Point will not be overbuilt. It will be in line with what we have had up to this point: sub-200 mm2 die sizes.

Genuinely, this argument is completely and utterly ridiculous considering that leaked driver data suggests Intel is indeed bringing "overbuilt" iGPUs with MTL-P and ARL-P.
 

Mopetar

Diamond Member
Jan 31, 2011
8,114
6,770
136
With the other rumors that NVidia won't have MX parts going forward, iGPUs could certainly step it up a bit more to fill in that gap. Not every consumer needs a beefy GPU, just like not everyone needs an 8-core CPU. Having such parts available does allow the market niches which demand them to be captured.
 
  • Like
Reactions: Kaluan

Heartbreaker

Diamond Member
Apr 3, 2006
4,340
5,464
136
With the other rumors that NVidia won't have MX parts going forward, iGPUs could certainly step it up a bit more to fill in that gap. Not every consumer needs a beefy GPU, just like not everyone needs an 8-core CPU. Having such parts available does allow the market niches which demand them to be captured.

More like the other way around: NVidia is killing parts that are worse than new APUs like the 680M.

What OEM is going to buy a discrete GPU when the iGPU/APU already present offers similar performance?
 

jpiniero

Lifer
Oct 1, 2010
15,223
5,768
136
With the other rumors that NVidia won't have MX parts going forward, iGPUs could certainly step it up a bit more to fill in that gap. Not every consumer needs a beefy GPU, just like not everyone needs an 8-core CPU. Having such parts available does allow the market niches which demand them to be captured.

Problem is that wafer prices are why nVidia is killing MX. AMD's APUs have the same problem.
 

Mopetar

Diamond Member
Jan 31, 2011
8,114
6,770
136
Problem is that wafer prices are why nVidia is killing MX. AMD's APUs have the same problem.

APU avoids the cost of needing an extra board, VRAM, and cooling solution. It also removes any AIB profit margin from the consideration as well.

The economics aren't as bad as you might think. Sure costs are higher, but that means GPUs are more expensive as well. Having a beefier iGPU that can substitute for a discrete card for people who want to game occasionally adds a lot of value.

As long as they're not designing anything that will be horribly bottlenecked by the available memory bandwidth, it has a purpose.
 
  • Like
Reactions: Tlh97

Glo.

Diamond Member
Apr 25, 2015
5,803
4,777
136
APU avoids the cost of needing an extra board, VRAM, and cooling solution. It also removes any AIB profit margin from the consideration as well.

The economics aren't as bad as you might think. Sure costs are higher, but that means GPUs are more expensive as well. Having a beefier iGPU that can substitute for a discrete card for people who want to game occasionally adds a lot of value.

As long as they're not designing anything that will be horribly bottlenecked by the available memory bandwidth, it has a purpose.
Costs are actually lower for designing and manufacturing a monolithic APU than a separate CPU and dGPU, adding everything up (boards, VRAM, general platform complexity). It's also much more efficient.
 

jpiniero

Lifer
Oct 1, 2010
15,223
5,768
136
Costs are actually lower for designing and manufacturing a monolithic APU than a separate CPU and dGPU, adding everything up (boards, VRAM, general platform complexity). It's also much more efficient.

Not in the era where Moore's Law is dead. That's why AMD doesn't get much in the way of iGPU-only, gaming-focused OEM sales: Cezanne+3050 is cheaper.