Speculation: RDNA3 + CDNA2 Architectures Thread


moinmoin

Diamond Member
Jun 1, 2017
5,240
8,454
136
Steam Deck is actually a great counterexample against the Big GPU APU meme. Steam Deck has a custom part, so Valve could have ordered any size GPU in their APU that they wanted, but the Steam Deck has only half the GPU size of AMD's standard 6800U GPU.

That's a dedicated handheld game machine, and they chose only half the GPU of AMD's standard APU.
It also has only half the number of CPU cores. I don't think Van Gogh as used in the Steam Deck fits into this discussion at all.

Steam Deck is actually a pretty well-balanced system, rather portable, and its screen resolution/GPU/CPU performance ratio is comparable to that of current-gen consoles at 4K, which should ensure that many games running well on consoles also run reasonably well on Steam Deck (potential compatibility issues aside).
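(A quick back-of-the-envelope check of that ratio claim, using commonly cited figures rather than anything from this thread: the Deck's 1280x800 panel and ~1.6 TFLOPS GPU against the PS5's 4K target and ~10.3 TFLOPS. Treat the numbers as approximate.)

```python
# Rough check: does Steam Deck's performance-per-pixel land near a PS5 at 4K?
# All figures are commonly cited specs, not from this thread; treat as approximate.

deck_pixels = 1280 * 800          # Steam Deck native panel
ps5_pixels = 3840 * 2160          # current-gen 4K target

deck_tflops = 1.6                 # Van Gogh GPU, FP32
ps5_tflops = 10.3                 # PS5 GPU, FP32

print(f"Pixel ratio:  {ps5_pixels / deck_pixels:.1f}x")   # ~8.1x more pixels at 4K
print(f"TFLOPS ratio: {ps5_tflops / deck_tflops:.1f}x")   # ~6.4x more compute

# FLOPS per pixel land in the same ballpark, which is the point being made:
print(f"Deck FLOPS/pixel vs PS5: {(deck_tflops / deck_pixels) / (ps5_tflops / ps5_pixels):.2f}x")
```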
 

Heartbreaker

Diamond Member
Apr 3, 2006
5,062
6,636
136
It also has only half the number of CPU cores. I don't think Van Gogh as used in the Steam Deck fits into this discussion at all.

Steam Deck definitely comes into play when answering someone who argues we need a bigger GPU for handheld consoles, and Steam Deck ships with about half the GPU of the generic APU for laptops.
 

insertcarehere

Senior member
Jan 17, 2013
712
701
136
And who told you that the target for Strix Point is N33 (which measures a bit more than 200 mm² on its own)? I spoke about low-end dGPUs (GTX 1650/RTX 2050/RX 6400 class). Having a greater number of WGPs can also mean they can be clocked lower, lowering power consumption as well. Also, the CUs can help with some GPU compute if available.

That implies AMD will go for a "wide and slow" approach and trade off die area for power efficiency, or, in other words, make the chips comparatively more expensive for a similar level of performance.

It's not impossible per se (Apple arguably does this with their SoCs), but AMD is not Apple, and they don't have a track record of taking such an approach for their integrated products, especially when their customers are OEMs who'd just take the cheaper chip any day.
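For what it's worth, here is a minimal sketch of the tradeoff being argued over, assuming the textbook dynamic-power model (P ≈ C·V²·f, with voltage scaling roughly linearly with clock near the efficiency knee). Neither poster spelled this out, so it's an illustration, not anyone's actual math:

```python
# "Wide and slow" back-of-the-envelope: same throughput, more area, less power.
# Assumption: dynamic power ~ C * V^2 * f, with V scaling roughly linearly
# with f, so power per unit scales roughly with f^3.

def relative_power(units: float, clock: float) -> float:
    """Dynamic power relative to a 1.0-unit, 1.0-clock baseline."""
    return units * clock**3

def relative_throughput(units: float, clock: float) -> float:
    """Throughput scales with units * clock for embarrassingly parallel work."""
    return units * clock

baseline = relative_power(1.0, 1.0)    # narrow and fast
wide_slow = relative_power(2.0, 0.5)   # 2x the WGPs at half the clock

print(relative_throughput(1.0, 1.0), relative_throughput(2.0, 0.5))  # 1.0 vs 1.0: same throughput
print(baseline, wide_slow)             # 1.0 vs 0.25: ~4x less dynamic power, 2x the area
```

Under those assumptions, doubling the units and halving the clock holds throughput while cutting dynamic power to roughly a quarter, which is exactly the area-for-efficiency trade being described.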

Anyway, it seems this new AMD leaker has been willing to put more ballpark performance estimates on N33 and Phoenix:
 

moinmoin

Diamond Member
Jun 1, 2017
5,240
8,454
136
Steam Deck definitely comes into play when answering someone who argues we need a bigger GPU for handheld consoles, and Steam Deck ships with about half the GPU of the generic APU for laptops.

Maybe I read @leoneazzurro's post wrong, but to me it sounded more like emphasizing the importance of a "reasonably powerful iGPU", which is the case with Van Gogh combined with Steam Deck's low screen resolution.

I honestly don't see the point of the whole "big APU" discussion anyway. As somebody noted earlier, Subor launched a custom big APU before, so whoever sees a big market in doing the same should be able to follow suit. As long as nobody sees a market for repeating that, why should AMD itself take such a risk (especially since AMD won't create the end product and as such needs to rely on the reluctant OEMs anyway)?
 

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,867
136
So you are expecting a big transistor increase for minimal cost increase.

The 2010s would like to inform you that ship has sailed.

I put the math there, feel free to use it. By your reasoning, Phoenix should not have existed at all.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,867
136
This All the Watts guy seems like a copy of Greymon, but even less accurate. His previous claims about N3x were called out as BS, and I will not be surprised if he's comparing mobile N33 SKUs to desktop N22/N23 cards...
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,696
3,260
136
Anyway, it seems this new AMD leaker has been willing to put more ballpark performance estimates on N33 and Phoenix:
I think he is wrong about Phoenix.
Dual issue still brings some performance improvement, although it depends on the game.
On top of that, even if we don't know game clocks, boost is still 25% higher.
Just this should put it above Rembrandt.
From what he wrote it's just a bit better, but it's true that BW is a bottleneck.
 

insertcarehere

Senior member
Jan 17, 2013
712
701
136
I think he is wrong about Phoenix.
Dual issue still brings some performance improvement, although it depends on the game.
On top of that, even if we don't know game clocks, boost is still 25% higher.
Just this should put it above Rembrandt.
From what he wrote it's just a bit better, but it's true that BW is a bottleneck.

To be frank, I don't think we can conclude that dual issue brings a performance improvement by itself. Yes, the ComputerBase test comparing the 7900 XT vs the 6900 XT shows some improvement (9% average at 4K), but those are not equal GPUs even if you normalize CUs/clocks.
- The 7900 XT has 800 GB/s of VRAM bandwidth vs 512 GB/s for the 6900 XT; this is counteracted somewhat by the larger Infinity Cache the latter has, but at 4K the 7900 XT should still have more usable bandwidth.
- The 7900 XT has 192 ROPs vs 128 ROPs for the 6900 XT, so it has a 50% higher pixel rate even at the same clocks.

How much of the observed improvement is due to 2x FP32, as opposed to the 7900 XT simply being better endowed in bandwidth and GPU front end?

In a heavily BW-constrained environment with no extra front end (which Phoenix is going to be), I could see the gains not being that significant vs Rembrandt, even taking clock speeds into account.
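To make the normalization concrete, here is a rough sketch using only the spec figures cited above, with clocks assumed equalized as in the test being discussed:

```python
# Rough normalization of the ComputerBase 7900 XT vs 6900 XT comparison.
# Spec figures are the ones cited in the post (ROPs, VRAM bandwidth).

specs = {
    "7900 XT": {"rops": 192, "vram_gbps": 800},
    "6900 XT": {"rops": 128, "vram_gbps": 512},
}

rop_ratio = specs["7900 XT"]["rops"] / specs["6900 XT"]["rops"]
bw_ratio = specs["7900 XT"]["vram_gbps"] / specs["6900 XT"]["vram_gbps"]

print(f"Pixel rate advantage at equal clocks: {rop_ratio:.2f}x")  # 1.50x
print(f"Raw VRAM bandwidth advantage:         {bw_ratio:.2f}x")   # 1.56x
# The observed gain was ~1.09x at 4K, so dual issue cannot be isolated:
# the 7900 XT's bandwidth and front-end edge could plausibly account for it.
```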
 

Heartbreaker

Diamond Member
Apr 3, 2006
5,062
6,636
136
I put the math there, feel free to use it. By your reasoning, Phoenix should not have existed at all.

You put your made-up guesses there.

Semi Analysis has a massive analysis of 3nm, and the bottom line is that it's a more complex, more expensive node that will struggle to deliver any improvement in cost per transistor:

Moreover, with a standard monolithic chip (50% Logic + 30% SRAM + 20% Analog), density only increases by 1.3x. This is effectively flat on cost per transistor for the typical monolithic chip designs, with higher development costs.

During IEDM, TSMC revealed that N3E had a bit-cell size of 0.021 μm², precisely the same as N5. This is a devastating blow to SRAM.

N3 designs with significantly more transistors will cost significantly more. Even worse if you attempt to add a giant "Infinity Cache" to compensate for poor memory BW.
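The quoted 1.3x figure can be reproduced from the area mix above. A minimal sketch, assuming the ~1.7x logic scaling TSMC cites for N3 and essentially no scaling for SRAM and analog (the scaling factors are my assumptions, consistent with the quote):

```python
# Reconstructing the blended density estimate for a "standard" monolithic
# die of 50% logic, 30% SRAM, 20% analog (the mix quoted above).
# Assumptions: logic ~1.7x denser (TSMC's N3 figure), SRAM ~1.0x
# (bit-cell unchanged vs N5 per IEDM), analog ~1.0x.

mix = {"logic": 0.50, "sram": 0.30, "analog": 0.20}        # area fractions on N5
density_gain = {"logic": 1.7, "sram": 1.0, "analog": 1.0}

# New relative area: each block's old area shrinks by its density gain.
new_area = sum(frac / density_gain[block] for block, frac in mix.items())
print(f"Effective full-chip density gain: {1 / new_area:.2f}x")  # ~1.26x, i.e. roughly the quoted 1.3x
```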
 

Timorous

Golden Member
Oct 27, 2008
1,976
3,861
136
You put your made-up guesses there.

Semi Analysis has a massive analysis of 3nm, and the bottom line is that it's a more complex, more expensive node that will struggle to deliver any improvement in cost per transistor:

N3 designs with significantly more transistors will cost significantly more. Even worse if you attempt to add a giant "Infinity Cache" to compensate for poor memory BW.

The cache won't be on N3, just like the cache in the RDNA3 MCM parts is not on N5.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,867
136
You put your made-up guesses there.

Semi Analysis has a massive analysis of 3nm, and the bottom line is that it's a more complex, more expensive node that will struggle to deliver any improvement in cost per transistor:

N3 designs with significantly more transistors will cost significantly more. Even worse if you attempt to add a giant "Infinity Cache" to compensate for poor memory BW.

Your guesses are not much better. Which version of N3 will be used? The "standard" one, the density-optimized one, or, as has already happened with current products, customer-oriented variants? Even within N5 there are variants where density varies wildly. Also, we don't even know whether there is an Infinity Cache, yet you assume there will be one. Chip stacking may play a role in reducing costs. In any case, what you say about costs changes nothing about the target market scenario, or do you think the new CPUs and dGPUs will magically stay on existing processes forever?
 

Heartbreaker

Diamond Member
Apr 3, 2006
5,062
6,636
136
Dragon Range? They can make MCM laptop chips just fine now.

That's just the desktop part with a different name, made for high-power laptops with a dGPU.

Note that as a desktop part, it has a MUCH smaller GPU (only 2 CUs) than the real laptop parts.

As always, there is strong pressure to make everything as small as possible.


Your guesses are not much better. Which version of N3 will be used? The "standard" one, the density-optimized one, or, as has already happened with current products, customer-oriented variants? Also, we don't even know whether there is an Infinity Cache, yet you assume there will be one. Chip stacking may play a role in reducing costs. In any case, what you say about costs changes nothing about the target market scenario, or do you think the new CPUs and dGPUs will magically stay on existing processes forever?

Those aren't my guesses. It's Semi Analysis' detailed work vs your guesses.

Of course they will move on to new processes. But that doesn't mean they are going to pay for a large increase in transistor count when transistor costs are flat. Your faulty assumption is that they were getting a big increase in transistor budget for free, which they aren't.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,867
136
That's just the desktop part with a different name, made for high-power laptops with a dGPU.

Note that as a desktop part, it has a MUCH smaller GPU (only 2 CUs) than the real laptop parts.

As always, there is strong pressure to make everything as small as possible.


Those aren't my guesses. It's Semi Analysis' detailed work vs your guesses.

Of course they will move on to new processes. But that doesn't mean they are going to pay for a large increase in transistor count when transistor costs are flat. Your faulty assumption is that they were getting a big increase in transistor budget for free, which they aren't.

Semi Analysis details many N3 variants, and there are also many N5 variants, so area density and transistor cost must be evaluated on the final design. Without knowing it, everything is a guess. Otherwise we could not have a 39% increase in transistor density going from N23 to N33 when the theoretical N7-to-N6 increase is 18% for logic alone.

And I never said it was for free. Please quote me where I said that. I said that the area dedicated to the GPU was kept constant and is likely to stay flat. That also means costs will go higher, but we have already seen this with N5.
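That 39% figure checks out against commonly cited die data. A quick sketch (the transistor counts and die sizes below are TechPowerUp-style figures, not from this thread, so treat them as approximate):

```python
# Checking the ~39% density-jump claim for Navi 23 (N7) vs Navi 33 (N6).
# Commonly cited figures, treated as approximate:
#   Navi 23: ~11.06B transistors in ~237 mm^2
#   Navi 33: ~13.3B transistors in ~204 mm^2

n23_density = 11.06e9 / 237   # transistors per mm^2
n33_density = 13.3e9 / 204

print(f"N23: {n23_density / 1e6:.1f} MTr/mm^2")              # ~46.7
print(f"N33: {n33_density / 1e6:.1f} MTr/mm^2")              # ~65.2
print(f"Density gain: {n33_density / n23_density - 1:.0%}")  # ~40%, vs ~18% theoretical N7->N6 logic scaling
```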
 

Heartbreaker

Diamond Member
Apr 3, 2006
5,062
6,636
136
And I never said it was for free. Please quote me where I said that. I said that the area dedicated to the GPU was kept constant and is likely to stay flat. That also means costs will go higher, but we have already seen this with N5.

You listed your guess of a minimal increase in price per area and a BIG increase in transistor density; that equals a big increase in free transistor budget.

When I pointed this out, you just answered with "I put the math there, feel free to use it."

You can't pretend you now meant something completely different.
 

Timorous

Golden Member
Oct 27, 2008
1,976
3,861
136
We are talking about the APUs for laptops, which have been monolithic so far.

There will come a point where it is cheaper to 3D-stack or tile two or more smaller dies than to make one larger monolithic die on an advanced node. Trying to predict what N3 products will look like is not easy. Look at MI300: 3D-stacked and tiled, with cache under the cores/shaders.

The tech is there; it just needs scaling up.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,867
136
You listed your guess of a minimal increase in price per area and a BIG increase in transistor density; that equals a big increase in free transistor budget.

When I pointed this out, you just answered with "I put the math there, feel free to use it."

You can't pretend you now meant something completely different.

Frankly, I never said it was free; that was your assumption alone. The simple fact that I assumed the same area on a new process with higher wafer costs means that the die/GPU area cost will go up. All die costs will go up. As for the transistor increase, yes, it could be big depending on design choices, as demonstrated by actual examples (N23 vs N33) even on a similar node. If you have comprehension issues, please don't push them onto others.
 

Heartbreaker

Diamond Member
Apr 3, 2006
5,062
6,636
136
Frankly, I never said it was free. As for the transistor increase, yes, it could be big depending on design choices. If you have comprehension issues, please don't push them onto others.

You don't understand the implications of your own math? Based on faulty assumptions as it was, it amounted to a large increase in transistor budget at the same cost.

If you are going to say something like "I put the math there", you should understand the implications of that math.

To spell it out for you: removing your faulty assumptions, just keeping the same area will increase costs significantly, so they won't do that.

And NO, the same does not apply to Phoenix.

Note that 4nm is actually an economical node, with improved transistor economics, unlike 3nm, where Semi Analysis says: "Shrinking finally costs more, Moore's Law is now dead in economic terms."

3nm is a particularly uneconomic node.

Even given the more favorable economics of 4nm, AMD still stayed with a 12 CU design and shrank the APU by 18% in area vs the previous generation.

Given the worse transistor economics at 3nm, expect an even greater shrink to contain costs.
 

leoneazzurro

Golden Member
Jul 26, 2016
1,114
1,867
136
You don't understand the implications of your own math? Based on faulty assumptions as it was, it amounted to a large increase in transistor budget at the same cost.

If you are going to say something like "I put the math there", you should understand the implications of that math.

To spell it out for you: removing your faulty assumptions, just keeping the same area will increase costs significantly, so they won't do that.

And NO, the same does not apply to Phoenix.

Note that 4nm is actually an economical node, with improved transistor economics, unlike 3nm, where Semi Analysis says: "Shrinking finally costs more, Moore's Law is now dead in economic terms."

3nm is a particularly uneconomic node.

Even given the more favorable economics of 4nm, AMD still stayed with a 12 CU design and shrank the APU by 18% in area vs the previous generation.

Given the worse transistor economics at 3nm, expect an even greater shrink to contain costs.


My argument started from the point that there are markets where APUs with a powerful iGPU side make sense, because an APU will generally cost less than a CPU plus a comparable dGPU plus accessory costs (comparable dGPU meaning the lower mainstream class), so a Strix Point with 12 WGPs at a similar size to Phoenix (which means up to around 200 mm²) could be entirely within the realm of possibility.

You started by denying it with considerations about BW (and I pointed out that even without using IC there are already new memory standards offering far higher bandwidth than current solutions), then started attacking the costs, not understanding that the original point (an APU of similar size to current ones costing less than a discrete GPU plus a separate CPU plus all the PCB and accessory costs) was still valid, since considerations about density and transistor costs (which can vary greatly even on the same process, but you deny that) apply to the discrete-component solution as well, or even more so, as some parts with poor scaling are replicated on both the CPU and the GPU (i.e. memory controllers). And this holds not only for AMD but for other players as well (Apple being APU-only in the portable market should be a hint). Not to mention other advantages (the possibility of very small form factors, which usually come at a hefty price premium). It seems you don't understand, or don't want to understand, what I have been saying since the beginning, and I will stop here because at this point it is practically trolling.
 

Heartbreaker

Diamond Member
Apr 3, 2006
5,062
6,636
136
You started by denying it with considerations about BW (and I pointed out that even without using IC there are already new memory standards offering far higher bandwidth than current solutions)

A future memory standard that, even combined with an Infinity Cache, would still be inadequate for the previously discussed increase in the APU.

If you are going to theorycraft, it at least needs to stand up to rudimentary analysis.

Unless you can get 200 GB/s of BW on top of a sizeable Infinity Cache, the proposed increase in the APU makes little sense, simply because it would be bottlenecked by BW limitations.
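For scale, a small sketch of what standard LPDDR configurations actually deliver. The configurations are illustrative assumptions, not confirmed specs for any particular product:

```python
# Peak theoretical DRAM bandwidth: transfer rate (MT/s) * bus width (bits) / 8 / 1000.

def mem_bandwidth_gbps(mts: int, bus_bits: int) -> float:
    """Peak theoretical bandwidth in GB/s for a given DRAM config."""
    return mts * bus_bits / 8 / 1000

print(mem_bandwidth_gbps(6400, 128))   # LPDDR5-6400, 128-bit:  102.4 GB/s (Rembrandt-class)
print(mem_bandwidth_gbps(8533, 128))   # LPDDR5X-8533, 128-bit: 136.5 GB/s
print(mem_bandwidth_gbps(8533, 256))   # a 256-bit bus would be needed to clear ~200 GB/s: 273.1 GB/s
```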

then started attacking the costs, not understanding that the original point (an APU of similar size to current ones costing less than a discrete GPU plus a separate CPU plus all the PCB and accessory costs) was still valid,

It really isn't valid, and never was, which is why AMD never does this; it's a false comparison. This is not a niche part meant to compete with more powerful dGPUs. This is the generic mass-market part, and it must compete on cost against the Intel laptop parts going into the majority of laptops, whose buyers don't care about a more powerful GPU.

If you make a big-GPU part for a majority market that doesn't care about having a big GPU, then you make an overpriced part that can't compete in that market.


It seems you don't understand, or don't want to understand, what I have been saying since the beginning...

More like you don't understand the full implications of what you are saying.

Most people just buy basic laptops and don't care about the GPU at all. AMD's part must compete economically with Intel's, so the cost must be contained. Going for a large GPU that most people don't care about just drives up the cost and makes the part less competitive.

You make the mistake that many on forums do: assuming that what you want is what everyone wants. Most people aren't looking for more powerful GPUs in their laptops, so this is not a case where a big-APU laptop competes against a more expensive dGPU laptop; it's a case where the more expensive big-APU laptop ends up competing against a laptop with a less expensive Intel chip.

You think it's about competing against a dGPU because you want dGPU performance.

Think about it: it has pretty much always been the case that AMD could build a more powerful APU to challenge dGPUs, but they NEVER do, because a big-GPU APU is a niche part, not a mainstream part, and the APU needs to be a mainstream part.
 

maddie

Diamond Member
Jul 18, 2010
5,152
5,540
136
Compared to 5nm:

N3: 25-30% less power, 10-15% more performance, 1.7x logic density
N3E: 34% less power, 18% more performance, 1.6x logic density

"Ho says that TSMC's original N3 features up to 25 EUV layers and can apply multi-patterning for some of them for additional density. By contrast, N3E supports up to 19 EUV layers and only uses single-patterning EUV, which reduces complexity, but also means lower density."

This means that an N3E wafer would have to cost 60% more than a 5nm wafer for the logic transistor cost to merely stay equal.

Rough wafer prices: N3 ~$20K, N5 ~$16K, N7 ~$10K.

For logic, cost per transistor is still falling.
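Putting those numbers together, a minimal cost-per-logic-transistor sketch. The N7-to-N5 logic scaling of ~1.8x is my assumption; the wafer prices are the rough figures above:

```python
# Relative cost per logic transistor, normalized to N5 = 1.0.
# Wafer prices are the rough figures quoted above; density figures are
# logic-only (SRAM/analog barely shrink, which is why blended cost looks flat).

nodes = {
    #        wafer $      logic density relative to N5
    "N7": {"wafer": 10_000, "density": 1 / 1.8},  # assumed ~1.8x N7->N5 logic scaling
    "N5": {"wafer": 16_000, "density": 1.0},
    "N3": {"wafer": 20_000, "density": 1.6},      # N3E logic density figure
}

for name, n in nodes.items():
    # Cost per logic transistor ~ wafer cost / (density * wafer area).
    rel_cost = (n["wafer"] / nodes["N5"]["wafer"]) / n["density"]
    print(f"{name}: {rel_cost:.2f}x N5 cost per logic transistor")

# N3 at $20K is only ~25% dearer than N5's $16K while logic is ~1.6x denser,
# so logic-only cost/transistor still falls (~0.78x). The break-even quoted
# above is a 1.6x (i.e. +60%) wafer price for flat logic transistor cost.
```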