Question AMD Phoenix/Zen 4 APU Speculation and Discussion

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,994
136
It's not getting 3 GHz and ~9 TFLOPs on a 35W TDP. Look at the 6400 which has around a 50W TDP. Granted that some of that is the memory itself, but compare it to the 6500 which has a similar core count, but actually pushes its clocks.

Perhaps a desktop part that's allowed to use that full 170W TDP that AM5 specifies, but some of you are expecting far too much out of it.

Frankly it doesn't even need to be half as good as it's being hyped up to in order to make for a great APU. At 6 TFLOPs it already would beat a 6500 XT.
 

Timorous

Golden Member
Oct 27, 2008
1,616
2,780
136
I said the same thing about a 2.23 GHz PS5 APU in a console power and thermal envelope and I was utterly wrong. AMD have found a way to have rather stupid clock speeds without going full Nvidia on power consumption.
 

Saylick

Diamond Member
Sep 10, 2012
3,170
6,402
136
Lol @ "without going full Nvidia".

There were some doubters that the PS5 could sustain its boost clocks of 2.23 GHz and now we know that 2.23 GHz is actually super low for RDNA 2. Idk what AMD did but integrating their CPU designers into the GPU team is paying off in spades.
 

Mopetar

Diamond Member
Jan 31, 2011
7,842
5,994
136
I said the same thing about a 2.23 GHz PS5 APU in a console power and thermal envelope and I was utterly wrong. AMD have found a way to have rather stupid clock speeds without going full Nvidia on power consumption.

Navi 24 (6400 and 6500XT) is on N6. Going to N5 isn't going to let them hit 3 GHz while lowering the power to fit into a 35W TDP shared with the CPU.

Like I said, certainly possible if you've got 170W to play with. The 6500 XT already boosts to 2.8 GHz on ~100W. It doesn't even take an extra 10% to reach 3 GHz. It just won't be done with only 35W. The 6400 can only boost to around 2.2 GHz with 50W. Again, that counts the memory, which an APU won't, but going from 2.2 GHz to 3 GHz requires roughly a 36% clock increase at the same power usage.
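
A quick sanity check of the clock arithmetic above, as a minimal Python sketch. It uses only the boost clocks quoted in this post; the helper name `pct_increase` is hypothetical, and the sketch says nothing about how power scales with clock speed.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase needed to go from `old` to `new`."""
    return (new / old - 1.0) * 100.0


# Boost clocks in GHz, as quoted in the post (not measured values).
RX_6500XT_BOOST = 2.8
RX_6400_BOOST = 2.2
TARGET = 3.0

from_6500xt = pct_increase(RX_6500XT_BOOST, TARGET)
from_6400 = pct_increase(RX_6400_BOOST, TARGET)

print(f"2.8 GHz -> 3.0 GHz: +{from_6500xt:.1f}%")  # +7.1%
print(f"2.2 GHz -> 3.0 GHz: +{from_6400:.1f}%")    # +36.4%
```

So the 6500 XT is within about 7% of 3 GHz, while the 6400 would need a bit over a third more clock speed.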
 

Frenetic Pony

Senior member
May 1, 2012
218
179
116
Wonder if they'll eventually make another drop-in GPU chiplet. Double the resources should still ramp up power/performance on laptops or desktops regardless of TDP, and it'd be a closer match to Apple's M1 Max chip.

Perhaps not till the end of next year or even 2024's refresh though.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
It's not getting 3 GHz and ~9 TFLOPs on a 35W TDP. Look at the 6400 which has around a 50W TDP. Granted that some of that is the memory itself, but compare it to the 6500 which has a similar core count, but actually pushes its clocks.

Perhaps a desktop part that's allowed to use that full 170W TDP that AM5 specifies, but some of you are expecting far too much out of it.

Frankly it doesn't even need to be half as good as it's being hyped up to in order to make for a great APU. At 6 TFLOPs it already would beat a 6500 XT.
I think 2.5 GHz on this iGPU will be possible in a 35W TDP thermal envelope.

Which would still be mind-blowing.
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
It's funny, but Paul has leaked the SKU names for mobile Phoenix, and it appears it's a straight-up replacement for Rembrandt chips, with Raphael mobile being above Phoenix.


From 14:29. The SKUs of those APUs are at the end of the APU section.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
That's what I mean, that would be so bandwidth limited it would be a waste of space.
AMD generally would not put hardware on the device without the bandwidth to back it up. DDR5 is a significant increase in bandwidth, with two 32-bit channels per module. That, and other features, will allow it to deliver more bandwidth than the clock speed increase would indicate, but is that enough to keep such a GPU fed?

I doubt that it would have Infinity Cache unless it is a base die / top die stacking arrangement with an Infinity Cache die rather than a bridge chip. I was expecting Infinity Cache to be a bridge chip used with EFB, but some rumors are saying a base die with the compute or graphics die on top. They could possibly put V-Cache on such a device, although there isn’t much area on a lower-power APU. It seems like it would have to sit on top of GPU or CPU units, although an APU might have enough area in cache plus IO for a V-Cache die. You don’t have any IO on a regular CPU chiplet.

It would be great if they pulled something like Apple and just put DDR5 chips on the package. One stack of HBM would also be massive. It might not be too expensive if EFB is used. I would hope that laptop makers are asking for something that will be more competitive with Apple chips, so it is plausible that AMD will make something that is different from a standard APU. If a lot of laptops have such a powerful AMD APU, then there isn’t as much need for dedicated graphics. This may be a big reason behind Nvidia going heavily into ARM.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
Let's not get ahead of ourselves here. AMD has improved for the better and outpaced Intel in some areas, but this reads like typical AMD marketing tripe.
It's not getting 3 GHz and ~9 TFLOPs on a 35W TDP. Look at the 6400 which has around a 50W TDP. Granted that some of that is the memory itself, but compare it to the 6500 which has a similar core count, but actually pushes its clocks.

Perhaps a desktop part that's allowed to use that full 170W TDP that AM5 specifies, but some of you are expecting far too much out of it.

Frankly it doesn't even need to be half as good as it's being hyped up to in order to make for a great APU. At 6 TFLOPs it already would beat a 6500 XT.
I think 2.5 GHz on this iGPU will be possible in a 35W TDP thermal envelope.

Which would still be mind-blowing.

Incorrect. Said mobile part is on a far smaller process than mobile parts that have been compared.

Memory bandwidth will not be an issue.

Phoenix is not the only "mobile" Zen 4 part either.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
It's funny, but Paul has leaked the SKU names for mobile Phoenix, and it appears it's a straight-up replacement for Rembrandt chips, with Raphael mobile being above Phoenix.


From 14:29. The SKUs of those APUs are at the end of the APU section.

So he reads these forums. Good on him. Some of what you've stated is slightly inaccurate; the rest, however, literally came from these forums.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,361
2,848
106
We all know 128-bit DDR5 at 4800-6400 MT/s (76.8-102.4 GB/s) won't be enough.
For comparison:
RX 6400 has 3.6 TFLOPs, 128 GB/s GDDR6 + 16 MB Infinity Cache.
RX 6500 XT has 5.8 TFLOPs, 144 GB/s GDDR6 + 16 MB Infinity Cache.
It has 61% more TFLOPs, yet it performs only 28% better (Link) at Full HD.
A 6 WGP IGP at only 2 GHz already has 6.1 TFLOPs.
Phoenix will need something more than just DDR5.
We already know 64 MB of 3D V-Cache at 7nm is only 41 mm² (Link).
RDNA3 will use Infinity Cache chiplets, so adding a 64 MB chiplet for Phoenix shouldn't be a problem. If the size is 49 mm², for example, then you get 1143 good chiplets per wafer, and at $8000-10000 per wafer each would cost only ~$7-9. Add packaging costs and I think it shouldn't be more than $20.
That's pretty cheap in my opinion.
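
The figures above can be reproduced with a short Python sketch. The function names are hypothetical. The 256 FP32 lanes per WGP is an assumption (a rumored RDNA3 figure, chosen because it reproduces the 6.1 TFLOPs quoted above), not a confirmed spec. The count of 1143 good chiplets per wafer is taken from the post as given.

```python
def ddr5_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int = 128) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * bytes_per_transfer / 1000  # MB/s -> GB/s


def fp32_tflops(wgps: int, clock_ghz: float, lanes_per_wgp: int = 256) -> float:
    """Peak FP32 TFLOPs, counting one FMA (2 FLOPs) per lane per clock.

    lanes_per_wgp=256 is an ASSUMPTION (rumored RDNA3 value), not a spec.
    """
    return wgps * lanes_per_wgp * 2 * clock_ghz / 1000


def cost_per_chiplet(wafer_cost_usd: float, good_chiplets: int) -> float:
    """Silicon cost per good chiplet, ignoring packaging and test."""
    return wafer_cost_usd / good_chiplets


print(f"DDR5-4800, 128-bit: {ddr5_bandwidth_gbs(4800):.1f} GB/s")   # 76.8 GB/s
print(f"DDR5-6400, 128-bit: {ddr5_bandwidth_gbs(6400):.1f} GB/s")   # 102.4 GB/s
print(f"6 WGP at 2 GHz: {fp32_tflops(6, 2.0):.1f} TFLOPs")          # 6.1 TFLOPs
print(f"6500 XT vs 6400 TFLOPs: +{(5.8 / 3.6 - 1) * 100:.0f}%")     # +61%
print(f"Chiplet at $8000/wafer: ${cost_per_chiplet(8000, 1143):.2f}")    # $7.00
print(f"Chiplet at $10000/wafer: ${cost_per_chiplet(10000, 1143):.2f}")  # $8.75
```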
 

maddie

Diamond Member
Jul 18, 2010
4,744
4,683
136
Good work on the numbers. We keep reading about the magical DDR5 solution, but as you show, it does not work.

The issue with Infinity Cache is where you place it. One aspect of the cache which I think might be significant, but have not seen mentioned, is its placement vis-a-vis the shaders. The cache is NOT in a single block or even a few compact blocks, but surrounds the WGPs on the outside: a blanket of cache surrounding the processing cores. I think this is important for the ability to clock high and also for power efficiency. It might not be possible to get the same benefits by using a Zen-like 3D cache chiplet in a single block.

Then we have the issue of not placing it over active logic. Where do you place the chiplet?

My best bet is cache on die, sold at a premium.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,361
2,848
106
I am not sure about the placement. I don't think it will be stacked over the APU, but then it would have to be on package. Using an interposer like HBM did would be expensive. Maybe something like the eDRAM Intel had in package, but a lot faster. In a way, that would be a regression compared to RDNA2, which had Infinity Cache on die, but you need to save die space and cache doesn't scale well on smaller processes.
 

jamescox

Senior member
Nov 11, 2009
637
1,103
136
The rumors seem to be saying Infinity Cache in a base die with the graphics die on top for GPUs. Not sure what they will do for an APU. It would be nice to get an HBM stack or just some DDR5 dies integrated into the package, like what Apple did, for mobile parts. Many people don’t even own a desktop anymore, so making much more powerful mobile devices is needed.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
So, flip things around. It’s been said that stacking the CPU die on top of the cache die doesn’t work because it’s VERY difficult to get 100+ watts of power to the CPU die through TSVs on the cache die…

but what if you were restricted to just 45 watts? You could certainly shrink the APU's CPU die by moving the L3 cache to an external die, with the increased latency hidden by the larger L2. Then you have room for larger WGPs of the RDNA3 type, maybe even a few more, and also 16MB of Infinity Cache next to them. At this point, the GPU side literally dwarfs the CCX.

The L3 cache doesn’t have to be big, 32 MB being plenty, and it could be on a cheaper, cache-optimized N6 process. You do have an issue with TSVs for all the IO, but that’s not the end of the world.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,361
2,848
106
Infinity Cache in RDNA3 according to leaks will be on an external die, so I don't see why it should be on die in Phoenix, but L3 would be separate.

Size [MB] | Bandwidth [GB/s] | Hit rate at Full HD [%] | Actual bandwidth [GB/s]
128       | 1987             | 81                      | 1609.5
64        | 993.5            | 72                      | 715
32        | 497              | 55                      | 273
16        | 248              | 37                      | 92
Only 16 MB of Infinity Cache for a 6 WGP (24 CU) IGP would be small and slow.

The best thing would be a shared external cache of 64 MB for both CPU and IGP. The size would still be small, it is only one additional die, and you could use a cheaper process for it. The main die would save a lot of space, which could be used for more WGPs, CPU cores and so on.
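
The "Actual bandwidth" column in the table above appears to be the peak cache bandwidth multiplied by the hit rate. That is an inference from the numbers, not something stated in the post. Under that assumption it can be sketched in Python as below. The name `effective_bandwidth_gbs` is hypothetical, and the sketch ignores whatever DRAM bandwidth serves the misses.

```python
def effective_bandwidth_gbs(peak_gbs: float, hit_rate_pct: float) -> float:
    """Cache bandwidth actually delivered: peak bandwidth scaled by hit rate."""
    return peak_gbs * hit_rate_pct / 100.0


# (size in MB, peak bandwidth in GB/s, hit rate at Full HD in %), from the table.
rows = [
    (128, 1987.0, 81),
    (64, 993.5, 72),
    (32, 497.0, 55),
    (16, 248.0, 37),
]

effective = {size: effective_bandwidth_gbs(bw, hit) for size, bw, hit in rows}

for size, value in effective.items():
    print(f"{size:>3} MB: {value:.1f} GB/s")
```

The results match the table's last column to within rounding, which suggests the 16 MB case delivers well under 100 GB/s from the cache itself.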
 

randomhero

Member
Apr 28, 2020
181
247
86
7
I quite like what you wrote.
If they do share L3, they would share 80 MB of cache which would be more than enough for 1080p, somewhere around 400 GB/s on average.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,608
5,816
136
Infinity Cache in RDNA3 according to leaks will be on an external die, so I don't see why it should be on die in Phoenix, but L3 would be separate.
N33 allegedly is a monolithic chip without stacking but using regular Infinity Cache like RDNA2.

The best thing would be a shared external cache of 64 MB for both CPU and IGP. The size would still be small, it is only one additional die, and you could use a cheaper process for it. The main die would save a lot of space, which could be used for more WGPs, CPU cores and so on.
I don't think on Zen4 the GPU can access the CPU L3. (It would need to be part of CCX)
The CCM/L3 (Cache Coherent Master) connects to the CCS/UMC (Cache Coherent Slave) via IF SDP and BW is quite low (BW depends on FCLK), but low latency and low power.
IGP connects to IOH/NB which connects to the CCS via IF.

Infinity Cache is the marketing name for Memory Attached Last Level Cache. In this case, as the name suggests, it would need to be inserted before the connection to the IOH, after the GPU L2. Otherwise it would still suffer from low SDP bandwidth, which is similar to DDR5 bandwidth (not coincidentally, the SDP bandwidth is over-provisioned around that as well).

If any cache gets inserted at the UMC (in which case it becomes SLC, not CPU L3) the issue still exists that they need to raise the SDP BW in order to get to it.
There are leaks of dual SDP ports but the BW would still be nowhere near the advertised BW numbers by AMD for IF.

Future generations would probably use SLC, but not sure how this would look. But it is an interesting topic to see how it gets addressed.

Anyway, they cannot put so much cache on Phoenix, simply because the target market is very diverse and a big part of that market has no need for the huge L3 but needs more efficiency instead.
I suppose they could however add some TSVs on the base die and stack the cache on either/both CPU/GPU for some SKUs?
 

Ajay

Lifer
Jan 8, 2001
15,458
7,862
136
I suppose they could however add some TSVs on the base die and stack the cache on either/both CPU/GPU for some SKUs?
Whatever rumors are out there, I just don’t see this happening on a mass market mobile SoC. It’s one thing for servers, or a Hail Mary for gaming desktops, but given the solid boost in performance for this series, I can’t see any user demand.