Question AMD Phoenix/Zen 4 APU Speculation and Discussion

uzzi38 · Apr 28, 2022

I can finally make this thread.

https://twitter.com/x/status/1519669375283957760

Phoenix is indeed RDNA3. My advice to everyone: treat the old APU rumours as being out of date.

Mopetar · May 16, 2022

DrMrLordX said:
(typing this on my Radeon VII-equipped machine for extra lulz)

I'm surprised anyone has one of those. They were okay for a semi-professional card on release, but they turned out to be so good at mining that you could easily sell one for triple what you paid and essentially trade up to a 6900XT or 3080 for free.

I suppose if you mined with it yourself you could have recouped the cost, and even turned a profit.

NTMBK · May 16, 2022

rainy said:
L3 was indeed smaller for desktop/mobile APUs, however L2 was the same (512 KB per core) for all segments with Zen 3/Zen 2.

https://en.wikichip.org/wiki/amd/microarchitectures/zen_3
https://en.wikichip.org/wiki/amd/microarchitectures/zen_2

Sure, but an extra 4-8MB of cache is a pretty big addition to die area. Seems like an obvious thing to omit on a more cost sensitive consumer oriented part- you could use the die area for e.g. extra GPU compute units instead.

ahimsa42 · May 16, 2022

i think the big question is what is phoenix going to cost? from what i have read it will be considered a premium product so will likely come at a premium price. assuming it will be possible to get a 6800U laptop for around $800 by december, it may be wiser for many of us to go with RMB & skip the 7000 series APU's- which may not be out for another 6-12 months after that and at a significantly higher cost.

mikk · May 16, 2022

ahimsa42 said:
i think the big question is what is phoenix going to cost? from what i have read it will be considered a premium product so will likely come at a premium price. assuming it will be possible to get a 6800U laptop for around $800 by december, it may be wiser for many of us to go with RMB & skip the 7000 series APU's- which may not be out for another 6-12 months after that and at a significantly higher cost.

There won't be much below 1000€, since Rembrandt prices are 200-300€ higher than the previous Cezanne generation because of DDR5 and other reasons. Even for Cezanne you can hardly find 5800U devices for around 800 USD. Rembrandt is not as cheap, and Phoenix will apparently increase prices over Rembrandt significantly. Note that Rembrandt-U is not yet available, and this could repeat with Phoenix-U; volume is a big issue for AMD.

LightningZ71 · May 16, 2022

I can’t see a world where a full specs phoenix ever goes for under $1000. In that regard, there are 3050ti systems out there now for sub $800 that will beat it in every non-memory-size-limited situation for gaming.

simplyspeculating · May 16, 2022

It's a real shame prices are going up from a combination of inflation and increasing design/wafer/process costs, because the Zen5/8000 series strix point APUs have the capability to finally overcome bandwidth limitations and push up the base level of computing and graphics processing power in the standard laptop and prebuilt/office desktop while being cheaper from a power and cost perspective than an equivalent cpu/dgpu combo.

DDR5 giving 2x+ bandwidth is great for Rembrandt and phoenix, but the 16-24 rdna3 CUs of phoenix are most likely going to be bandwidth starved, even with higher ddr5 clocks and better compression algorithms. And while phoenix will be very powerful for an APU, it won't displace much more than the x500 and x050 class cards in its own generation at best.

But if speculation about zen5 at this early date is accurate strix point will have several new design features that could herald an end to the entry level and lower-mid level discrete gpu, at least for most use cases. This is all just my speculation below and I'd like to know what people think

-stacked cache will be much more common or even standard on zen 5, and if it is standard then adding it to strix point will result in a small cost increase. IIRC the cache on the 5800x3d costs about $20 to add; that's including wafer costs for the cache and packaging.

-given that zen 5 will most likely be on 3nm, and the 64mb cache chiplets on 7nm are already tiny, it's likely the standard cache chiplet size will be increased to probably 128mb or even 256mb. It might even be fabricated on 5nm, but I'm not sure if it's possible yet to have logic and cache fabricated on completely different nodes (they are already using differently optimized libraries)

-this would already provide a huge bandwidth boost with an L3 accessible by both cpu and gpu, and if AMD succeeds in creating a unified L3 cache that would magnify the benefits

-Assuming a 128mb cache chiplet, it would be possible to get rid of all on-die L3 cache. Normally that would cause heat transfer problems with whatever is below the cache chiplet, but if the rumored hybrid cpu architecture is implemented in strix point, the cache could be layered on top of the efficiency zen4d cores, which aren't expected to clock high anyway. This removal of L3 plus the node jump would allow for denser, logic-optimized libraries and room for a lot more graphics CUs that aren't bandwidth starved (especially at 1080p, which is going to be standard for laptops for a while) while still keeping a small overall chip size.

My wild guess is that we will see an 8 Zen 5 + 4 Zen 4d cpu core configuration, with 40 RDNA4 (or rdna3 or 3+) CUs, and with 128mb of shared or unified L3 cache stacked on top and no L3 on the chip itself. Fabbed on 3nm, given what we know about the process so far, such a chip would be smaller than 200mm^2 and could cost less than $300 to produce, meaning it would be viable for sub-$1000 ultrabooks and general laptops from 10-45w depending on the form factor, with higher watt G versions on the desktop. The reason such a chip would be desirable for AMD to manufacture is that costs for masks, design and validation of chips are skyrocketing with each new node, and such a powerful APU would mean that AMD would only have to produce at most one monolithic GPU SKU (the x700 and/or x800 chip, aka the next gen navi33 equivalent) along with their new multi-gpu designs while still having competitive products up and down the product stack. It would also reuse cache chiplets and packaging that will most likely be in use in other product lines.

Is this just wishful thinking on my part or are we about to see the age of the super APU dawn?

uzzi38 · May 17, 2022

simplyspeculating said:
DDR5 giving 2x+ bandwidth is great for Rembrandt and phoenix, but the 16-24 rdna3 CUs of phoenix are most likely going to be bandwidth starved, even with higher ddr5 clocks and better compression algorithms. And while phoenix will be very powerful for an APU, it won't displace much more than the x500 and x050 class cards in its own generation at best.

For cost reasons I wouldn't expect any upcoming APUs to do any more than that for the time being. An iGPU that overtakes the -60 tier would be rather large physically speaking, and it wouldn't be desirable to ship one of those on every single laptop APU. Chiplets will have to play a part in making them cheaper, but that's assuming we see the iGPU disaggregated from I/O or other components. (Not saying it will or won't - I don't know - I'm just saying this would need to be done is all).

simplyspeculating said:
But if speculation about zen5 at this early date is accurate strix point will have several new design features that could herald an end to the entry level and lower-mid level discrete gpu, at least for most use cases. This is all just my speculation below and I'd like to know what people think

As I mentioned before, many of the old rumours are worthless now.

simplyspeculating said:
-stacked cache will be much more common or even standard on zen 5

Okay tbf this part I actually agree with wrt server and desktop. For mobile though... I'd rather wait and see still.

simplyspeculating said:
and the 64mb cache chiplets on 7nm are already tiny, it's likely the standard cache chiplet size will be increased to probably 128mb or even 256mb. It might even be fabricated on 5nm, but I'm not sure if it's possible yet to have logic and cache fabricated on completely different nodes (they are already using differently optimized libraries)

Cache chiplets on newer nodes don't make sense. It's cheaper to have them on N7 based nodes due to poor SRAM scaling on more modern ones.

It is possible to have them fabricated on different nodes.

simplyspeculating said:
-this would already provide a huge bandwidth boost with an L3 accessible by both cpu and gpu, and if AMD succeeds in creating a unified L3 cache that would magnify the benefits

Well, we'll see about that.

simplyspeculating said:
-Assuming a 128mb cache chiplet

Okay, stop, no. That's way too large for a mobile focused product. Too cost prohibitive to be usable. 128MB doesn't make sense unless targeting 1440p/4K, which no APU is going to do. You're already well in diminishing returns territory by 64MB, much less 128MB.

I could maybe see a 64MB cache chiplet happening, but 32MB may be more in line.

DrMrLordX · May 17, 2022

Mopetar said:
I'm surprised anyone has one of those. They were okay for a semi-professional card on release, but they turned out to be so good at mining that you could easily sell one for triple what you paid and essentially trade up to a 6900XT or 3080 for free.

I suppose if you mined with it yourself you could have recouped the cost, and even turned a profit.

I was tempted, but it's my only dGPU in the house right now (that works) and it's in a water loop that I don't feel like draining and refilling with a new block and card just to take advantage of the ridiculous price inflation on Radeon VII.

(Radeon VII is still selling for $855 or more on eBay, and that's for a card without a block)

leoneazzurro · May 17, 2022

APUs at the moment are focused on 720p, at most 1080p gaming (with limitations). So 16-32MB would be more than enough in this regard, for the time being.

simplyspeculating · May 17, 2022

uzzi38 said:
Chiplets will have to play a part in making them cheaper, but that's assuming we see the iGPU disaggregated from I/O or other components. (Not saying it will or won't - I don't know - I'm just saying this would need to be done is all).

The problem with chiplets is their power draw is higher, particularly idle power draw. It's only a few watts extra and doesn't matter too much for desktop applications, but it is not really viable yet for mobile. In a way, a cache chiplet frees up space much like a regular chiplet design does, but without the need for the extra power usage of an IO die and infinity fabric.

uzzi38 said:
Cache chiplets on newer nodes don't make sense. It's cheaper to have them on N7 based nodes due to poor SRAM scaling on more modern ones.

It is possible to have them fabricated on different nodes.

Okay, stop, no. That's way too large for a mobile focused product. Too cost prohibitive to be usable. 128MB doesn't make sense unless targeting 1440p/4K, which no APU is going to do. You're already well in diminishing returns territory by 64MB, much less 128MB.

View attachment 61600

I could maybe see a 64MB cache chiplet happening, but 32MB may be more in line.

Your picture is for only infinity cache hit rates, not the needs of both CPU and GPU, and 32MB would be too small for a cache that would need to feed a CPU and large iGPU. At that point you would need an on-die L3, and the packaging costs for the stacked cache would be much higher than the chiplet die costs, so it wouldn't make sense. A 64MB cache fabbed cheaply on a 7nm-class node would make good sense though, given that they're already in production and TSMC/AMD already have experience with packaging those onto the 5800X3D.

With a 64MB cache my speculated design is still possible, just less performant and cheaper. I think it really comes down to which cache chiplet AMD will decide on for Zen5, because it seems logical they would standardize around one design and one cache chiplet size (maybe for their GPUs as well). Regardless, I think this is definitely where we're headed in the future. HBM was always way too expensive for cheap mobile products and no one in their right mind is going to put a huge bus on a mobile APU; the Apple M series is probably as big as it's going to get in the near future at 128 bits (32x4), and that is overkill if you aren't designing your own custom silicon. One way or another the bandwidth limitation has to be overcome for APUs to live up to the potential and hype.

uzzi38 · May 17, 2022

simplyspeculating said:
The problem with chiplets is their power draw is higher, particularly idle power draw. It's only a few watts extra and doesn't matter too much for desktop applications, but it is not really viable yet for mobile. In a way, a cache chiplet frees up space much like a regular chiplet design does, but without the need for the extra power usage of an IO die and infinity fabric.

Don't take VMR/MTS as examples for what the extra power draw would look like. Combined with better packaging techniques and better power gating on the actual I/O die itself, the extra power consumption - whilst noticeable - shouldn't be anything like the ~18W we see on desktop platforms and certainly not the ~90W on EPYC.

simplyspeculating said:
Your picture is for only infinity cache hit rates, not the needs of both CPU and GPU, and 32MB would be too small for a cache that would need to feed a CPU and large iGPU. At that point you would need an on-die L3, and the packaging costs for the stacked cache would be much higher than the chiplet die costs, so it wouldn't make sense. A 64MB cache fabbed cheaply on a 7nm-class node would make good sense though, given that they're already in production and TSMC/AMD already have experience with packaging those onto the 5800X3D.

With a 64MB cache my speculated design is still possible, just less performant and cheaper. I think it really comes down to which cache chiplet AMD will decide on for Zen5, because it seems logical they would standardize around one design and one cache chiplet size (maybe for their GPUs as well). Regardless, I think this is definitely where we're headed in the future. HBM was always way too expensive for cheap mobile products and no one in their right mind is going to put a huge bus on a mobile APU; the Apple M series is probably as big as it's going to get in the near future at 128 bits (32x4), and that is overkill if you aren't designing your own custom silicon. One way or another the bandwidth limitation has to be overcome for APUs to live up to the potential and hype.

AMD seem quite happy to go with 16MB for CPU L3 in mobile so... yeah, 32MB total is still enough probably. The 6500XT does fine with 16MB. That die has many problems, but surprisingly memory bandwidth is not one of them.

TESKATLIPOKA · May 17, 2022

uzzi38 said:
AMD seem quite happy to go with 16MB for CPU L3 in mobile so... yeah, 32MB total is still enough probably. The 6500XT does fine with 16MB. That die has many problems, but surprisingly memory bandwidth is not one of them.

I wouldn't be so sure.
RX 6400 has 3.6TFLOPs, 128GB/s GDDR6 + 16MB IF.
RX 6500XT has 5.8TFLOPs, 144GB/s GDDR6 + 16MB IF.
It has 61% more TFLOPs and 12.5% more BW, yet it performs only 28% better (Link) in Full HD.
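
The percentages above can be reproduced with a few lines of Python. This is only a sketch using the figures quoted in the post; the helper name `percent_more` and the dictionary layout are made up for illustration.

```python
# Figures as quoted in the post; the names here are illustrative only.
rx6400 = {"tflops": 3.6, "bandwidth_gbs": 128}
rx6500xt = {"tflops": 5.8, "bandwidth_gbs": 144}


def percent_more(new, old):
    """Return how much larger `new` is than `old`, in percent."""
    return (new / old - 1) * 100


compute_gain = percent_more(rx6500xt["tflops"], rx6400["tflops"])
bandwidth_gain = percent_more(rx6500xt["bandwidth_gbs"], rx6400["bandwidth_gbs"])
observed_gain = 28.0  # Full HD uplift quoted in the post, not measured here

print(f"compute:   +{compute_gain:.1f}%")
print(f"bandwidth: +{bandwidth_gain:.1f}%")
print(f"observed:  +{observed_gain:.1f}%")
```

The observed uplift sits between the bandwidth gain and the compute gain, which is consistent with (but does not by itself prove) something other than raw compute holding the 6500XT back.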

LightningZ71 · May 17, 2022

From the tech writeups that were published, the hit rate on 16MB of Infinity Cache is really low. Doubling it would make a big difference.

simplyspeculating · May 17, 2022

Not to mention that the phoenix APU will have at least as many, and up to 50% more CU's than the 6500xt which, like the replies above pointed out, is bandwidth constrained even with 4GB of gddr6 and 16mb cache exclusively used by the navi 24 die, and strix point will surely have significantly more CU's than phoenix. I really think 64mb of cache is the bare minimum to not be severely bandwidth constrained for APU's after phoenix especially when you have to take a CPU into account.

tomatosummit · May 17, 2022

simplyspeculating said:
Not to mention that the phoenix APU will have at least as many, and up to 50% more CU's than the 6500xt which, like the replies above pointed out, is bandwidth constrained even with 4GB of gddr6 and 16mb cache exclusively used by the navi 24 die, and strix point will surely have significantly more CU's than phoenix. I really think 64mb of cache is the bare minimum to not be severely bandwidth constrained for APU's after phoenix especially when you have to take a CPU into account.

Do you mean a stacked cache chip?
Because I was thinking about how ddr5 really doesn't have the legs and considered that the cache chips navi31/32 might be a possibility. They're the 64MB of cache and a gddr6 controller in one. If it's going to be a premium product then they may as well give a compelling reason for the price tag it's going to have. Low end or low power skus can just not include the chiplet.

jamescox · May 17, 2022

tomatosummit said:
Do you mean a stacked cache chip?
Because I was thinking about how ddr5 really doesn't have the legs and considered that the cache chips navi31/32 might be a possibility. They're the 64MB of cache and a gddr6 controller in one. If it's going to be a premium product then they may as well give a compelling reason for the price tag it's going to have. Low end or low power skus can just not include the chiplet.

I wonder if it would be plausible to have some cache on the IO die and stack a gpu chiplet on top. If the IO die is made on a process closer to what is used for the current v-cache die, then 64 MB would be quite small.

jamescox · May 17, 2022

NTMBK said:
Sure, but an extra 4-8MB of cache is a pretty big addition to die area. Seems like an obvious thing to omit on a more cost sensitive consumer oriented part- you could use the die area for e.g. extra GPU compute units instead.

The L2 size hasn’t varied before. It may be different in Zen 4c though, so they could start having more than one variant. If there is extra cache on the IO die, then having a smaller cache on the cpu die would still perform quite well.

jamescox · May 17, 2022

simplyspeculating said:
The problem with chiplets is their power draw is higher, particularly idle power draw. It's only a few watts extra and doesn't matter too much for desktop applications, but it is not really viable yet for mobile. In a way, a cache chiplet frees up space much like a regular chiplet design does, but without the need for the extra power usage of an IO die and infinity fabric.

Your picture is for only infinity cache hit rates, not the needs of both CPU and GPU, and 32MB would be too small for a cache that would need to feed a CPU and large iGPU. At that point you would need an on-die L3, and the packaging costs for the stacked cache would be much higher than the chiplet die costs, so it wouldn't make sense. A 64MB cache fabbed cheaply on a 7nm-class node would make good sense though, given that they're already in production and TSMC/AMD already have experience with packaging those onto the 5800X3D.

With a 64MB cache my speculated design is still possible, just less performant and cheaper. I think it really comes down to which cache chiplet AMD will decide on for Zen5, because it seems logical they would standardize around one design and one cache chiplet size (maybe for their GPUs as well). Regardless, I think this is definitely where we're headed in the future. HBM was always way too expensive for cheap mobile products and no one in their right mind is going to put a huge bus on a mobile APU; the Apple M series is probably as big as it's going to get in the near future at 128 bits (32x4), and that is overkill if you aren't designing your own custom silicon. One way or another the bandwidth limitation has to be overcome for APUs to live up to the potential and hype.

HBM was expensive due to needing a silicon interposer under the whole chip and the HBM chips. That isn’t needed with the new stacking tech that just uses a small bridge die. HBM will likely be used more widely by switching to much more economical EFB or other stacking technology. EFB could also be used to connect an IO die and a CPU chiplet. This would provide significantly lower power since it would be an HBM-like connection; possibly thousands of bits wide at low clock. Due to the power savings and presumably slightly higher cost, bringing this tech to mobile first makes sense. Next gen GPUs are looking like they may use at least two gpu dies connected together with EFB plus one HBM stack for each gpu component, also connected with EFB or similar.

jamescox · May 17, 2022

soresu said:
I'm inclined to think that CU nomenclature will simply cease to exist with RDNA3 in favor of only counting WGP's, much as CCX's have with Zen3 in favour of CCD's.

Simplification of PR always seems the better road to travel, even if it introduces some short term confusion as it is likely to for RDNA2 -> RDNA3.

It's confusing enough as it is when you have ALU's, CU's, WGP's, SIMD's and Shader Groups to account for, to say nothing of TDP, mm2, the number of transistors and who knows what else I'm probably missing

🤣

The CCX/CCD thing is just due to Zen3 only having 1 CCX per CCD. That may not be the case with upcoming products, so CCX may not be going away. Zen 4c may have two 8-core CCX per CCD; not sure what the current rumor is. It does mostly make sense that all cores on one die are part of the same cluster. The naming of such internal bits isn’t really a marketing thing. Most customers have no idea what this stuff refers to and it has generally never been comparable across different companies, so the raw TFLOP numbers get used instead.

IntelUser2000 · May 17, 2022

simplyspeculating said:
It's a real shame prices are going up from a combination of inflation and increasing design/wafer/process costs, because the Zen5/8000 series strix point APUs have the capability to finally overcome bandwidth limitations and push up the base level of computing and graphics processing power in the standard laptop and prebuilt/office desktop while being cheaper from a power and cost perspective than an equivalent cpu/dgpu combo.

Is this just wishful thinking on my part or are we about to see the age of the super APU dawn?

We're going to see super APUs, but don't expect them to be cheap. Actually even "APUs" are much more expensive than the chipset IGPs of old: those truly added only $5, while with "APUs" you have to go up a tier to get them, so you are really paying $50+.

If they are going to be x50 dGPU class, then expect them to be priced like one. Actually if they have advantages like lower power and smaller form factor without the performance loss, I wouldn't be surprised to see them even higher priced.

Why would you sell a product that's superior in every category but price them like they are not?

simplyspeculating · May 18, 2022

As long as there is excess wafer supply and a lower profit margin would be recouped (or better) through higher sales, the price will be lower. As long as an APU is cheaper to produce than its equivalent discrete cpu and gpu counterparts (which is often the case), it will be sold on average for less.

I expect wafer supply will still be somewhat constrained when phoenix is released but by the strix point release supply could be better and we might even have a glut depending on fab capacity growth and economic conditions.

Glo. · May 18, 2022

IntelUser2000 said:
We're going to see super APUs, but don't expect them to be cheap. Actually even "APUs" are much more expensive than the chipset IGPs of old: those truly added only $5, while with "APUs" you have to go up a tier to get them, so you are really paying $50+.

If they are going to be x50 dGPU class, then expect them to be priced like one. Actually if they have advantages like lower power and smaller form factor without the performance loss, I wouldn't be surprised to see them even higher priced.

Why would you sell a product that's superior in every category but price them like they are not?

Phoenix can cost around $50 per die on a 5 nm node.

It's not cheap, but it will be much, much cheaper than two separate designs, one of which needs a PCB, VRM and VRAM chips added before you ship it.

Combine both and you move the PCB, VRAM, VRM and shipping costs onto the RAM and mobo costs, and you save a lot of money.

Secondly, it's still a CPU, and has to be priced like a CPU. The graphics accelerator is supposedly what you would get for "free".

I think AMD's lineup will consist of three different SKU tiers, based on two different dies, on AM5.

RMB will be used for entry level products, even up to Core i5, like Core i5 6500G, 6600G..
Phoenix without 3dVcache - a tier above it like Core i7 6700G.
Phoenix with Vcache - tier above it, potentially coming soldered to the mobo and with DDR5 memory soldered onto the motherboard, with a steep premium, but offering efficiency and a small form factor, like Ryzen 7 6800GX3D.

That's my guess at how AMD can play the competitive game.

exquisitechar · May 18, 2022

I don't think we'll see Phoenix with V-Cache.

Glo. · May 18, 2022

exquisitechar said:
I don't think we'll see Phoenix with V-Cache.

https://twitter.com/x/status/1524984061277503488

Information from the packing plant indicates that PHX may have 3D caching, which may explain why the PHX increase is so large, but note that this is not certain and needs further verification.

TESKATLIPOKA · May 18, 2022

Glo. said:
Phoenix can cost around $50 per die on a 5 nm node.
....

A bit low in my opinion.
If a 5nm wafer costs $17000 then a 225mm2 chip will cost $66 per chip including the faulty ones. Link
For the chip to cost ~$50 it would need to be only 169mm2, and that's a bit low for Phoenix in my opinion.
Still pretty cheap to produce, and I expect some form of external cache to be present.
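
A rough way to sanity-check numbers like these is the usual gross-dies-per-wafer approximation (wafer area divided by die area, minus an edge-loss term). The sketch below assumes a 300 mm wafer and ignores scribe lines, edge exclusion and yield, so it is not the linked calculator and will not reproduce its output exactly; the function names are made up for illustration.

```python
import math


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate gross die count: wafer area / die area minus an edge-loss term.

    Assumption: no scribe lines, no edge exclusion, no defect yield.
    """
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(wafer_area / die_area_mm2 - edge_loss)


def cost_per_die(wafer_cost_usd, die_area_mm2):
    """Wafer cost spread over every gross die, good or faulty."""
    return wafer_cost_usd / gross_dies_per_wafer(die_area_mm2)


WAFER_COST_USD = 17000  # wafer price assumed in the post

for area in (225, 169):
    dies = gross_dies_per_wafer(area)
    print(f"{area} mm2: {dies} dies, ${cost_per_die(WAFER_COST_USD, area):.2f} per die")
```

With these simplifications the per-die figures come out a few dollars below the $66 and ~$50 quoted above, presumably because a real calculator also subtracts scribe lines and edge exclusion, but the ratio between the two die sizes is similar.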
