Question AMD Phoenix/Zen 4 APU Speculation and Discussion


Mopetar

Diamond Member
Jan 31, 2011
6,743
3,813
136
(typing this on my Radeon VII-equipped machine for extra lulz)
I'm surprised anyone has one of those. They were okay for a semi-professional card on release, but they turned out to be so good at mining that you could easily sell one for triple what you paid and essentially trade up to a 6900XT or 3080 for free.

I suppose if you mined with it yourself you could have recouped the cost, and even turned a profit.
 

NTMBK

Diamond Member
Nov 14, 2011
9,776
3,787
136
L3 was indeed smaller for desktop/mobile APUs, however L2 was the same (512 KB per core) for all segments with Zen 3/Zen 2.

https://en.wikichip.org/wiki/amd/microarchitectures/zen_3
https://en.wikichip.org/wiki/amd/microarchitectures/zen_2
Sure, but an extra 4-8MB of cache is a pretty big addition to die area. Seems like an obvious thing to omit on a more cost sensitive consumer oriented part- you could use the die area for e.g. extra GPU compute units instead.
 

ahimsa42

Member
Jul 16, 2016
170
153
116
i think the big question is what is phoenix going to cost? from what i have read it will be considered a premium product so will likely come at a premium price. assuming it will be possible to get a 6800U laptop for around $800 by december, it may be wiser for many of us to go with RMB & skip the 7000 series APU's- which may not be out for another 6-12 months after that and at a significantly higher cost.
 

mikk

Diamond Member
May 15, 2012
3,514
1,269
136
i think the big question is what is phoenix going to cost? from what i have read it will be considered a premium product so will likely come at a premium price. assuming it will be possible to get a 6800U laptop for around $800 by december, it may be wiser for many of us to go with RMB & skip the 7000 series APU's- which may not be out for another 6-12 months after that and at a significantly higher cost.

There won't be much below 1000€, since Rembrandt prices are 200-300€ higher than the previous Cezanne generation because of DDR5 and other reasons. Even for Cezanne you can hardly find 5800U devices for around 800 USD. Rembrandt is not as cheap, and Phoenix will apparently raise prices significantly over Rembrandt. Note that Rembrandt-U is not yet available and this could repeat with Phoenix-U; volume is a big issue for AMD.
 

LightningZ71

Golden Member
Mar 10, 2017
1,273
1,292
136
I can’t see a world where a full specs phoenix ever goes for under $1000. In that regard, there are 3050ti systems out there now for sub $800 that will beat it in every non-memory-size-limited situation for gaming.
 

simplyspeculating

Junior Member
May 16, 2022
4
6
36
It's a real shame prices are going up from a combination of inflation and increasing design/wafer/process costs, because the Zen 5/8000 series Strix Point APUs have the capability to finally overcome bandwidth limitations and push up the base level of computing and graphics processing power in the standard laptop and prebuilt/office desktop while being cheaper from a power and cost perspective than an equivalent CPU/dGPU combo.

DDR5 giving 2x+ bandwidth is great for Rembrandt and Phoenix, but the 16-24 RDNA3 CUs of Phoenix are most likely going to be bandwidth starved, even with higher DDR5 clocks and better compression algorithms. And while Phoenix will be very powerful for an APU, it won't displace much more than the x500 and x050 class cards in its own generation at best.
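
For a rough sense of scale, peak theoretical bandwidth is just bus width times transfer rate. The sketch below assumes a 128-bit bus and three memory speeds (DDR4-3200, DDR5-4800, LPDDR5-6400) that are illustrative picks, not figures from this thread; the function name is made up for the example.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Peak theoretical bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mts / 1000


# Assumed memory speeds in MT/s (not stated in the thread), all on a 128-bit bus.
configs = {
    "DDR4-3200": 3200,
    "DDR5-4800": 4800,
    "LPDDR5-6400": 6400,
}

bandwidth = {name: peak_bandwidth_gbs(128, rate) for name, rate in configs.items()}
baseline = bandwidth["DDR4-3200"]

for name, gbs in bandwidth.items():
    # Show each configuration relative to the DDR4-3200 baseline.
    print(f"{name}: {gbs:.1f} GB/s ({gbs / baseline:.2f}x DDR4-3200)")
```

Under those assumptions only the LPDDR5-6400 case reaches a full 2x over DDR4-3200, so how close a given laptop gets depends on the memory it ships with.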

But if speculation about zen5 at this early date is accurate strix point will have several new design features that could herald an end to the entry level and lower-mid level discrete gpu, at least for most use cases. This is all just my speculation below and I'd like to know what people think

-Stacked cache will be much more common or even standard on Zen 5, and if it is standard then adding it to Strix Point will result in only a small cost increase. IIRC the cache on the 5800X3D costs about $20 to add, and that's including wafer costs for the cache and packaging.

-Given that Zen 5 will most likely be on 3nm, and the 64MB cache chiplets on 7nm are already tiny, it's likely the standard cache chiplet size will be increased, probably to 128MB or even 256MB. It might even be fabricated on 5nm, but I'm not sure if it's possible yet to have logic and cache fabricated on completely different nodes (they are already using differently optimized libraries).

-This would already provide a huge bandwidth boost with an L3 accessible by both CPU and GPU, and if AMD succeeds in creating a unified L3 cache that would magnify the benefits.

-Assuming a 128MB cache chiplet, it would be possible to get rid of all on-die L3 cache. Normally that would cause heat transfer problems with whatever is below the cache chiplet, but if the rumored hybrid CPU architecture is implemented in Strix Point, the cache could be layered on top of the efficiency Zen 4d cores, which aren't expected to clock high anyway. This removal of L3 plus the node jump would allow for denser, logic-optimized libraries and room for a lot more graphics CUs that aren't bandwidth starved (especially at 1080p, which is going to be standard for laptops for a while) while still keeping a small overall chip size.

My wild guess is that we will see an 8 Zen 5 + 4 Zen 4d CPU core configuration, with 40 RDNA4 (or RDNA3 or 3+) CUs, and with 128MB of shared or unified L3 cache stacked on top and no L3 on the chip itself. Fabbed on 3nm, given what we know about the process so far, such a chip would be smaller than 200mm^2 and could cost less than $300 to produce, meaning it would be viable for sub-$1000 ultrabooks and general laptops from 10-45W depending on the form factor, with higher-wattage G versions on the desktop. The reason such a chip would be desirable for AMD to manufacture is that costs for masks, design and validation of chips are skyrocketing with each new node, and such a powerful APU would mean that AMD would only have to produce at most one monolithic GPU SKU (the x700 and/or x800 chip, aka the next-gen Navi 33 equivalent) along with their new multi-GPU designs while still having competitive products up and down the product stack. It would also reuse cache chiplets and packaging that will most likely be in use in other product lines.

Is this just wishful thinking on my part or are we about to see the age of the super APU dawn?
 

uzzi38

Platinum Member
Oct 16, 2019
2,221
4,556
116
DDR5 giving 2x+ bandwidth is great for Rembrandt and Phoenix, but the 16-24 RDNA3 CUs of Phoenix are most likely going to be bandwidth starved, even with higher DDR5 clocks and better compression algorithms. And while Phoenix will be very powerful for an APU, it won't displace much more than the x500 and x050 class cards in its own generation at best.
For cost reasons I wouldn't expect any upcoming APUs to do any more than that for the time being. An iGPU that overtakes the -60 tier would be rather large physically speaking, and it wouldn't be desirable to ship one of those on every single laptop APU. Chiplets will have to play a part in making them cheaper, but that's assuming we see the iGPU disaggregated from I/O or other components. (Not saying it will or won't - I don't know - I'm just saying this would need to be done is all).

But if speculation about zen5 at this early date is accurate strix point will have several new design features that could herald an end to the entry level and lower-mid level discrete gpu, at least for most use cases. This is all just my speculation below and I'd like to know what people think
As I mentioned before, many of the old rumours are worthless now.

-Stacked cache will be much more common or even standard on Zen 5
Okay tbf this part I actually agree with wrt server and desktop. For mobile though... I'd rather wait and see still.

and the 64MB cache chiplets on 7nm are already tiny, it's likely the standard cache chiplet size will be increased, probably to 128MB or even 256MB. It might even be fabricated on 5nm, but I'm not sure if it's possible yet to have logic and cache fabricated on completely different nodes (they are already using differently optimized libraries)
Cache chiplets on newer nodes don't make sense. It's cheaper to have them on N7-based nodes due to poor SRAM scaling on more modern ones.

It is possible to have them fabricated on different nodes.

-This would already provide a huge bandwidth boost with an L3 accessible by both CPU and GPU, and if AMD succeeds in creating a unified L3 cache that would magnify the benefits
Well, we'll see about that.

-Assuming a 128MB cache chiplet
Okay, stop, no. That's way too large for a mobile-focused product. Too cost prohibitive to be usable. 128MB doesn't make sense unless targeting 1440p/4K, which no APU is going to do. You're already well into diminishing returns territory by 64MB, much less 128MB.

[Attached image 1652777855900.png: Infinity Cache hit rates]

I could maybe see a 64MB cache chiplet happening, but 32MB may be more in line.
 

DrMrLordX

Lifer
Apr 27, 2000
19,382
8,171
136
I'm surprised anyone has one of those. They were okay for a semi-professional card on release, but they turned out to be so good at mining that you could easily sell one for triple what you paid and essentially trade up to a 6900XT or 3080 for free.

I suppose if you mined with it yourself you could have recouped the cost, and even turned a profit.
I was tempted, but it's my only dGPU in the house right now (that works) and it's in a water loop that I don't feel like draining and refilling with a new block and card just to take advantage of the ridiculous price inflation on Radeon VII.

(Radeon VII is still selling for $855 or more on eBay, and that's for a card without a block)
 

leoneazzurro

Senior member
Jul 26, 2016
622
893
136
APUs at the moment are focused on 720p, at most 1080p gaming (with limitations). So 16-32MB would be more than enough in this regard, for the time being.
 

simplyspeculating

Junior Member
May 16, 2022
4
6
36
Chiplets will have to play a part in making them cheaper, but that's assuming we see the iGPU disaggregated from I/O or other components. (Not saying it will or won't - I don't know - I'm just saying this would need to be done is all).
The problem with chiplets is that their power draw is higher, particularly idle power draw. It's only a few watts extra and doesn't matter too much for desktop applications, but it is not really viable yet for mobile. In a way, a cache chiplet frees up space much like a regular chiplet design does, but without the extra power usage of an IO die and Infinity Fabric.

Cache chiplets on newer nodes don't make sense. It's cheaper to have them on N7-based nodes due to poor SRAM scaling on more modern ones.

It is possible to have them fabricated on different nodes.

Okay, stop, no. That's way too large for a mobile-focused product. Too cost prohibitive to be usable. 128MB doesn't make sense unless targeting 1440p/4K, which no APU is going to do. You're already well into diminishing returns territory by 64MB, much less 128MB.

[Attached image: Infinity Cache hit rates]

I could maybe see a 64MB cache chiplet happening, but 32MB may be more in line.
Your picture shows only Infinity Cache hit rates, not the needs of both CPU and GPU, and 32MB would be too small for a cache that would need to feed a CPU and a large iGPU. At that point you would need an on-die L3, and the packaging costs for the stacked cache would be much higher than the chiplet die costs, so it wouldn't make sense. A 64MB cache fabbed cheaply on a 7nm-class node would make good sense though, given that they're already in production and TSMC/AMD already have experience with packaging those onto the 5800X3D.

With a 64MB cache my speculated design is still possible, just less performant and cheaper. I think it really comes down to which cache chiplet AMD will decide on for Zen 5, because it seems logical they would standardize around one design and one cache chiplet size (maybe for their GPUs as well). Regardless, I think this is definitely where we're headed in the future. HBM was always way too expensive for cheap mobile products, and no one in their right mind is going to put a huge bus on a mobile APU. The Apple M series is probably as big as it's going to get in the near future at 128 bits (32x4), and that is overkill if you aren't designing your own custom silicon. One way or another, the bandwidth limitation has to be overcome for APUs to live up to the potential and hype.
 

uzzi38

Platinum Member
Oct 16, 2019
2,221
4,556
116
The problem with chiplets is that their power draw is higher, particularly idle power draw. It's only a few watts extra and doesn't matter too much for desktop applications, but it is not really viable yet for mobile. In a way, a cache chiplet frees up space much like a regular chiplet design does, but without the extra power usage of an IO die and Infinity Fabric.
Don't take VMR/MTS as examples for what the extra power draw would look like. Combined with better packaging techniques and better power gating on the actual I/O die itself, the extra power consumption - whilst noticeable - shouldn't be anything like the ~18W we see on desktop platforms and certainly not the ~90W on EPYC.

Your picture shows only Infinity Cache hit rates, not the needs of both CPU and GPU, and 32MB would be too small for a cache that would need to feed a CPU and a large iGPU. At that point you would need an on-die L3, and the packaging costs for the stacked cache would be much higher than the chiplet die costs, so it wouldn't make sense. A 64MB cache fabbed cheaply on a 7nm-class node would make good sense though, given that they're already in production and TSMC/AMD already have experience with packaging those onto the 5800X3D.

With a 64MB cache my speculated design is still possible, just less performant and cheaper. I think it really comes down to which cache chiplet AMD will decide on for Zen 5, because it seems logical they would standardize around one design and one cache chiplet size (maybe for their GPUs as well). Regardless, I think this is definitely where we're headed in the future. HBM was always way too expensive for cheap mobile products, and no one in their right mind is going to put a huge bus on a mobile APU. The Apple M series is probably as big as it's going to get in the near future at 128 bits (32x4), and that is overkill if you aren't designing your own custom silicon. One way or another, the bandwidth limitation has to be overcome for APUs to live up to the potential and hype.
AMD seem quite happy to go with 16MB for CPU L3 in mobile so... yeah, 32MB total is still enough probably. The 6500XT does fine with 16MB. That die has many problems, but surprisingly memory bandwidth is not one of them
 

TESKATLIPOKA

Senior member
May 1, 2020
582
613
106
AMD seem quite happy to go with 16MB for CPU L3 in mobile so... yeah, 32MB total is still enough probably. The 6500XT does fine with 16MB. That die has many problems, but surprisingly memory bandwidth is not one of them
I wouldn't be so sure.
RX 6400 has 3.6TFLOPs, 128GB/s GDDR6 + 16MB IF.
RX 6500XT has 5.8TFLOPs, 144GB/s GDDR6 + 16MB IF.
It has 61% more TFLOPs and 12.5% more BW, yet it performs only 28% better (Link) in Full HD.
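
The percentages above can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted in the post; the helper name and the "scaling" ratio are illustrative, and the ratio alone does not prove that bandwidth is the limiting factor.

```python
def percent_gain(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1) * 100


# Figures as quoted in the post.
rx6400 = {"tflops": 3.6, "bandwidth_gbs": 128}
rx6500xt = {"tflops": 5.8, "bandwidth_gbs": 144}
observed_gain = 28.0  # % faster in Full HD, per the post

compute_gain = percent_gain(rx6500xt["tflops"], rx6400["tflops"])
bandwidth_gain = percent_gain(rx6500xt["bandwidth_gbs"], rx6400["bandwidth_gbs"])

# Share of the extra compute that shows up as measured performance.
scaling = observed_gain / compute_gain

print(f"Compute gain:   {compute_gain:.1f}%")
print(f"Bandwidth gain: {bandwidth_gain:.1f}%")
print(f"Performance gain is {scaling:.0%} of the compute gain")
```

Less than half of the added compute translates into frame rate here, which is consistent with (though not proof of) something other than shader throughput holding the card back.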
 

simplyspeculating

Junior Member
May 16, 2022
4
6
36
Not to mention that the Phoenix APU will have at least as many, and up to 50% more, CUs than the 6500XT which, as the replies above pointed out, is bandwidth constrained even with 4GB of GDDR6 and 16MB of cache used exclusively by the Navi 24 die, and Strix Point will surely have significantly more CUs than Phoenix. I really think 64MB of cache is the bare minimum to not be severely bandwidth constrained for APUs after Phoenix, especially when you have to take a CPU into account.
 

tomatosummit

Member
Mar 21, 2019
155
141
86
Not to mention that the Phoenix APU will have at least as many, and up to 50% more, CUs than the 6500XT which, as the replies above pointed out, is bandwidth constrained even with 4GB of GDDR6 and 16MB of cache used exclusively by the Navi 24 die, and Strix Point will surely have significantly more CUs than Phoenix. I really think 64MB of cache is the bare minimum to not be severely bandwidth constrained for APUs after Phoenix, especially when you have to take a CPU into account.
Do you mean a stacked cache chip?
Because I was thinking about how DDR5 really doesn't have the legs, and considered that the cache chips from Navi 31/32 might be a possibility. They're 64MB of cache and a GDDR6 controller in one. If it's going to be a premium product then they may as well give a compelling reason for the price tag it's going to have. Low-end or low-power SKUs can just not include the chiplet.
 

jamescox

Senior member
Nov 11, 2009
536
902
136
Do you mean a stacked cache chip?
Because I was thinking about how DDR5 really doesn't have the legs, and considered that the cache chips from Navi 31/32 might be a possibility. They're 64MB of cache and a GDDR6 controller in one. If it's going to be a premium product then they may as well give a compelling reason for the price tag it's going to have. Low-end or low-power SKUs can just not include the chiplet.
I wonder if it would be plausible to have some cache on the IO die and stack a gpu chiplet on top. If the IO die is made on a process closer to what is used for the current v-cache die, then 64 MB would be quite small.
 

jamescox

Senior member
Nov 11, 2009
536
902
136
Sure, but an extra 4-8MB of cache is a pretty big addition to die area. Seems like an obvious thing to omit on a more cost sensitive consumer oriented part- you could use the die area for e.g. extra GPU compute units instead.
The L2 size hasn’t varied before. It may be different in Zen 4c though, so they could start having more than one variant. If there is extra cache on the IO die, then having a smaller cache on the cpu die would still perform quite well.
 

jamescox

Senior member
Nov 11, 2009
536
902
136
The problem with chiplets is that their power draw is higher, particularly idle power draw. It's only a few watts extra and doesn't matter too much for desktop applications, but it is not really viable yet for mobile. In a way, a cache chiplet frees up space much like a regular chiplet design does, but without the extra power usage of an IO die and Infinity Fabric.

Your picture shows only Infinity Cache hit rates, not the needs of both CPU and GPU, and 32MB would be too small for a cache that would need to feed a CPU and a large iGPU. At that point you would need an on-die L3, and the packaging costs for the stacked cache would be much higher than the chiplet die costs, so it wouldn't make sense. A 64MB cache fabbed cheaply on a 7nm-class node would make good sense though, given that they're already in production and TSMC/AMD already have experience with packaging those onto the 5800X3D.

With a 64MB cache my speculated design is still possible, just less performant and cheaper. I think it really comes down to which cache chiplet AMD will decide on for Zen 5, because it seems logical they would standardize around one design and one cache chiplet size (maybe for their GPUs as well). Regardless, I think this is definitely where we're headed in the future. HBM was always way too expensive for cheap mobile products, and no one in their right mind is going to put a huge bus on a mobile APU. The Apple M series is probably as big as it's going to get in the near future at 128 bits (32x4), and that is overkill if you aren't designing your own custom silicon. One way or another, the bandwidth limitation has to be overcome for APUs to live up to the potential and hype.
HBM was expensive due to needing a silicon interposer under the whole chip and the HBM chips. That isn’t needed with the new stacking tech that just uses a small bridge die. HBM will likely be used more widely by switching to much more economical EFB or other stacking technology. EFB could also be used to connect an IO die and a CPU chiplet. This would provide significantly lower power since it would be an HBM-like connection, possibly thousands of bits wide at low clock. Due to the power savings and presumably slightly higher cost, bringing this tech to mobile first makes sense. Next-gen GPUs are looking like they may use at least two GPU dies connected together with EFB plus one HBM stack for each GPU component, also connected with EFB or similar.
 

jamescox

Senior member
Nov 11, 2009
536
902
136
I'm inclined to think that CU nomenclature will simply cease to exist with RDNA3 in favor of only counting WGP's, much as CCX's have with Zen3 in favour of CCD's.

Simplification of PR always seems the better road to travel, even if it introduces some short term confusion as it is likely to for RDNA2 -> RDNA3.

It's confusing enough as it is when you have ALU's, CU's, WGP's, SIMD's and Shader Groups to account for, to say nothing of TDP, mm2, the number of transistors and who knows what else I'm probably missing

🤣
The CCX/CCD thing is just due to Zen3 only having 1 CCX per CCD. That may not be the case with upcoming products, so CCX may not be going away. Zen 4c may have two 8-core CCX per CCD; not sure what the current rumor is. It does mostly make sense that all cores on one die are part of the same cluster. The naming of such internal bits isn’t really a marketing thing. Most customers have no idea what this stuff refers to and it has generally never been comparable across different companies, so the raw TFLOP numbers get used instead.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,188
3,047
136
It's a real shame prices are going up from a combination of inflation and increasing design/wafer/process costs, because the Zen 5/8000 series Strix Point APUs have the capability to finally overcome bandwidth limitations and push up the base level of computing and graphics processing power in the standard laptop and prebuilt/office desktop while being cheaper from a power and cost perspective than an equivalent CPU/dGPU combo.

Is this just wishful thinking on my part or are we about to see the age of the super APU dawn?
We're going to see super APUs, but don't expect them to be cheap. Actually even "APUs" are much more expensive than chipset IGPs of old - they truly added only $5, while "APUs" you have to go up a tier to get them, so you are really paying $50+.

If they are going to be x50 dGPU class, then expect them to be priced like one. Actually if they have advantages like lower power and smaller form factor without the performance loss, I wouldn't be surprised to see them even higher priced.

Why would you sell a product that's superior in every category but price them like they are not?
 

simplyspeculating

Junior Member
May 16, 2022
4
6
36
As long as there is excess wafer supply and lowering the profit margin would be recouped or better with higher sales the price will be lower. As long as an APU is cheaper to produce than its equivalent discrete cpu and gpu counterparts(which is often the case) it will be sold on average for less.

I expect wafer supply will still be somewhat constrained when phoenix is released but by the strix point release supply could be better and we might even have a glut depending on fab capacity growth and economic conditions.
 

Glo.

Diamond Member
Apr 25, 2015
4,979
3,592
136
We're going to see super APUs, but don't expect them to be cheap. Actually even "APUs" are much more expensive than chipset IGPs of old - they truly added only $5, while "APUs" you have to go up a tier to get them, so you are really paying $50+.

If they are going to be x50 dGPU class, then expect them to be priced like one. Actually if they have advantages like lower power and smaller form factor without the performance loss, I wouldn't be surprised to see them even higher priced.

Why would you sell a product that's superior in every category but price them like they are not?
Phoenix can cost around 50$, per die, on 5 nm node.

It's not cheap, but it will be much, much cheaper than two separate designs, one of which needs a PCB, VRM and VRAM chips added before you ship it.

If you combine both, you move the PCB, VRAM, VRM and shipping costs onto the RAM and mobo costs, and you save a lot of money.

Secondly, it's still a CPU, and has to be priced like a CPU. The graphics accelerator is supposedly what you get for "free".

I think AMD's lineup will consist of three different SKU tiers, based on two different dies, on AM5.

RMB will be used for entry level products, even up to Core i5, like Core i5 6500G, 6600G..
Phoenix without 3dVcache - a tier above it like Core i7 6700G.
Phoenix with Vcache - a tier above that, potentially coming soldered to the mobo with DDR5 memory also soldered onto the motherboard, at a steep premium, but offering efficiency and a small form factor, like Ryzen 7 6800GX3D.

That's my guess at how AMD can play the competitive game.
 

TESKATLIPOKA

Senior member
May 1, 2020
582
613
106
Phoenix can cost around 50$, per die, on 5 nm node.
....
A bit low in my opinion.
If a 5nm wafer costs 17000$ then a 225mm2 chip will cost 66$ per chip including the faulty ones. Link
For the chip to cost ~50$ it would need to be only 169mm2, and that's a bit low for Phoenix in my opinion.
Still pretty cheap to produce, and I expect some form of external cache to be present.
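
A rough way to sanity-check these figures is the common gross-dies-per-wafer approximation. The sketch below assumes a 300mm wafer, ignores scribe lines and edge exclusion, and applies no yield model (matching "including the faulty ones"). The linked calculator may use different assumptions, so expect results a few dollars off rather than an exact match; the function names are made up for the example.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Approximate die candidates per wafer: area ratio minus a term for partial dies at the edge."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


def cost_per_die(wafer_cost: float, die_area_mm2: float) -> float:
    """Wafer cost spread over every die candidate, good or faulty (no yield model)."""
    return wafer_cost / gross_dies_per_wafer(die_area_mm2)


WAFER_COST = 17000  # $ per 5nm wafer, the figure assumed in the post

for area in (225, 169):
    dies = gross_dies_per_wafer(area)
    print(f"{area} mm2: {dies} dies per wafer, ${cost_per_die(WAFER_COST, area):.0f} per die")
```

With these simplifications both die sizes come out slightly cheaper than the numbers in the post, but in the same ballpark, so the 225mm2 versus 169mm2 comparison holds up.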
 
