Question AMD Phoenix/Zen 4 APU Speculation and Discussion

Page 9

Glo.

Diamond Member
Apr 25, 2015
5,711
4,558
136
A bit low in my opinion.
If a 5nm wafer costs $17,000, then a 225 mm2 chip will cost $66 per chip, including the faulty ones. (link)
For the chip to cost ~$50, it would need to be only 169 mm2, and that's a bit low for Phoenix in my opinion.
Still pretty cheap to produce, and I expect some form of external cache to be present.
PHX is smaller than RMB, which was 208 mm2.

So yeah, it should be around $50.

But that depends on the wafer price.
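A minimal sketch of the arithmetic behind the quoted per-chip figures, assuming a 300 mm wafer and the common first-order dies-per-wafer approximation (wafer area divided by die area, minus an edge-loss term). It ignores scribe lines, edge exclusion and yield, so it lands a little below the $66 quoted for 225 mm2. The $17,000 wafer price is the quoted assumption, not a confirmed figure, and the function names are illustrative.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """First-order estimate: wafer area / die area, minus dies lost at the round edge."""
    radius = wafer_diameter_mm / 2
    wafer_area = math.pi * radius ** 2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)


def cost_per_die(wafer_cost_usd: float, die_area_mm2: float) -> float:
    """Wafer cost spread over every candidate die, faulty ones included (no yield model)."""
    return wafer_cost_usd / gross_dies_per_wafer(die_area_mm2)


WAFER_COST_USD = 17_000  # the quoted 5nm wafer price (assumption)
for area in (169, 208, 225):
    dies = gross_dies_per_wafer(area)
    print(f"{area} mm2: {dies} dies/wafer, ~${cost_per_die(WAFER_COST_USD, area):.0f} per die")
```

Under these assumptions, 225 mm2 comes out around $63, 208 mm2 (Rembrandt's size) around $58 and 169 mm2 around $46, which is why the result swings so much with the wafer price and the final die size.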
 

jpiniero

Lifer
Oct 1, 2010
14,605
5,223
136
Where would the smaller size come from? The Zen 4 core is roughly the same size, maybe a tad bigger. The GPU shouldn't be any smaller either. IO probably isn't going to scale very well, so that should be bigger.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
I don't know if PHX is smaller than RMB, but at 5nm RMB would be close to 130-140 mm2.
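The disagreement here comes down to how much of the die actually shrinks. A minimal sketch, assuming a purely hypothetical split of Rembrandt's 208 mm2 into logic, SRAM and IO/analog, plus hypothetical per-category density gains for a move to 5nm (neither is published data): each block's area is divided by its own scaling factor. For reference, the 130-140 mm2 estimate implies a blended factor of roughly 1.5-1.6x (208/140 ≈ 1.49, 208/130 = 1.6).

```python
# Hypothetical area split for a 208 mm2 die (illustrative only, not a die-shot measurement).
REMBRANDT_BLOCKS_MM2 = {"logic": 110.0, "sram": 40.0, "io_analog": 58.0}

# Hypothetical density gains for a 6nm -> 5nm port; IO/analog assumed not to shrink at all.
SCALING = {"logic": 1.6, "sram": 1.2, "io_analog": 1.0}


def shrunk_area(blocks: dict[str, float], scaling: dict[str, float]) -> float:
    """Sum each block's area divided by the density gain for its category."""
    return sum(area / scaling[kind] for kind, area in blocks.items())


def implied_blended_factor(old_area: float, new_area: float) -> float:
    """Single density factor that would turn old_area into new_area."""
    return old_area / new_area


print(f"straight port: {shrunk_area(REMBRANDT_BLOCKS_MM2, SCALING):.0f} mm2")
print(f"130 mm2 implies {implied_blended_factor(208, 130):.2f}x, "
      f"140 mm2 implies {implied_blended_factor(208, 140):.2f}x")
```

With this made-up split, a straight port lands near 160 mm2 because the IO block doesn't shrink; a larger logic share or a better SRAM factor pulls the result toward the 130-140 mm2 range.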
 

SpudLobby

Senior member
May 18, 2022
574
359
96
I assume they will use the logic shrink to implement a larger microarchitecture and/or more SRAM in the cache hierarchy, plus, as we've now heard, more WGPs in the RDNA3 iGPU. Sure, they could lazily port Rembrandt and still net more energy efficiency and chips per wafer, but... why? It doesn't make any sense. I'm sure there could still be some shrinkage, but I don't expect them to bank the die-shrink gains entirely.
 

SpudLobby

Senior member
May 18, 2022
574
359
96
I think the big question is what Phoenix is going to cost. From what I have read, it will be considered a premium product, so it will likely come at a premium price. Assuming it will be possible to get a 6800U laptop for around $800 by December, it may be wiser for many of us to go with RMB and skip the 7000 series APUs, which may not be out for another 6-12 months after that and at a significantly higher cost.
As far as mobile goes, it does seem like the actual volume releases have lagged the hype and announcement by a solid 6-12 months. Thin and light Rembrandt laptops should come from Lenovo this month or the next, but I've seen "Q3" way too often.


Great laptop in principle, and exactly what AMD needs more of, but it's a Q3 release after many in the rumor mills assured us Rembrandt would be out even before January.

I basically assume the same will be true of Phoenix at this point, so I suspect it will be Q2, if not Q3 or Q4 2023, before volume ultrabook-esque designs are out from Lenovo, Asus, HP, etc., despite the inevitable crowds cheering as if the release were synchronized with the announcement in January 2023. The initial prices will almost certainly be higher than for Rembrandt, but I think it will be justified.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
It was only recently that the 5800U and 5600U reached general retail availability outside of business-targeted higher-end laptops. I don't expect to see a lot of Rembrandt-U availability outside of select OEM business-targeted SKUs until much closer to Christmas. Phoenix might as well be vaporware for the next year, in my opinion.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
You are only taking component-level costs into account, while most of the extra pricing you see in products is opportunity cost. If the "iGPU" performs like their mid-range dGPU, then it'll be priced like one. AMD is playing that game with Rembrandt, where only the top 6800U chip has the maximum iGPU configuration.

When Intel had the eDRAM-bundled Iris products, they were available as a +$300 option in already halo-tier products.
 

ahimsa42

Senior member
Jul 16, 2016
225
194
116
There must be solid business reasons for doing this, but I don't understand the logic of announcing chips in Feb '22 that won't be widely available until Dec '22. Perhaps the plan is to frustrate consumers enough that they buy 4000/5000-series laptops instead?
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Uhhh... no, they were like $70-90 above the equivalent versions with HD 530.

And that would be a naive way of thinking. I'm talking about actual laptops here. I took an interest in high-end ultrabooks for years and I even have one. You couldn't get Core i3 setups with decent RAM and SSD configurations; you had to get a Core i5 or Core i7! They know such forced upgrades work on people.

90% of the Iris versions were only available in the absolute top-tier XPS/Spectre/Surface Pro-esque laptops, at $200-300 on top of that. Intel and the manufacturers knew from the beginning that the Iris parts were always for charging extra.

ARK pricing is a placeholder for chips, like laptop and server parts, that don't go directly to end users. If you look at desktop chips all day, you'd be fooled into thinking it exactly reflects reality. It doesn't, because desktop is the only segment that's sold directly to end users.

Never believe that they'll do high-performance iGPUs cheaply. You guys should know by now that prices to end users are based on market positioning and have little to do with BOM (bill of materials).
 

mikk

Diamond Member
May 15, 2012
4,141
2,154
136
For mass-market iGPU performance, Intel is the bigger hope. The market was flooded with TGL-U devices last year, most of them i5s with 80 EUs and i7s with 96 EUs. AMD didn't add anything for the mass market: their theoretically faster 6800U is more of a premium SKU, similar to Intel's i7-1280P (6+8), and Cezanne was a mild upgrade over Renoir. Whether AMD will be forced to bring a bigger iGPU improvement to the cheaper mass market next year, be it Phoenix or a refreshed, cheaper Rembrandt-U, will depend on MTL-M/P. This year there is no pressure on AMD, because Intel reused the same iGPU in ADL-P/ADL-U.
 

uzzi38

Platinum Member
Oct 16, 2019
2,635
5,976
146
Not to mention that the Phoenix APU will have at least as many, and up to 50% more, CUs than the 6500 XT, which, as the replies above pointed out, is bandwidth constrained even with 4GB of GDDR6 and 16MB of cache used exclusively by the Navi 24 die, and Strix Point will surely have significantly more CUs than Phoenix. I really think 64MB of cache is the bare minimum to avoid being severely bandwidth constrained for APUs after Phoenix, especially when you have to take a CPU into account.
Stop thinking about RDNA3 as 4 CUs per WGP. It's more accurate to say that it's still 2 CUs per WGP, but each CU is capable of full wave64 throughput, with the ability to potentially run 2 wave32 per cycle instead. We don't yet know for certain when VOPD can be used to do that; all we know currently is that it will allow for dual-issue wave32.

But it's probably safe to say that VOPD instructions might not always be applicable, so I'd advise caution in calling it 24 CUs per se.
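To see why a doubled CU count overstates things, a toy model helps: if only a fraction of wave32 VALU instructions can be paired into VOPD dual issue, the speedup over single issue stays well below 2x. This sketch ignores dependencies, register-bank conflicts and memory stalls, and the 12 physical CUs are an assumption (what a doubled "24 CUs" figure would imply), not a confirmed Phoenix spec.

```python
def dual_issue_speedup(paired_fraction: float) -> float:
    """Issue-rate speedup when `paired_fraction` of wave32 instructions dual-issue.

    Paired instructions cost half a cycle each, the rest a full cycle.
    """
    if not 0.0 <= paired_fraction <= 1.0:
        raise ValueError("paired_fraction must be between 0 and 1")
    cycles_per_instruction = (1.0 - paired_fraction) + paired_fraction / 2.0
    return 1.0 / cycles_per_instruction


def cu_equivalents(physical_cus: int, paired_fraction: float) -> float:
    """Physical CUs scaled by the dual-issue speedup."""
    return physical_cus * dual_issue_speedup(paired_fraction)


for fraction in (0.0, 0.25, 0.5, 1.0):
    print(f"{fraction:.0%} paired -> {cu_equivalents(12, fraction):.1f} CU-equivalents")
```

In this model, even with half of the instructions paired, 12 CUs behave like 16 rather than 24; only perfect pairing reaches the doubled figure.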
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,558
136
And that would be a naive way of thinking. I'm talking about actual laptops here. I took an interest in high end ultrabooks for years and I even have one. You couldn't get Core i3 setups with decent RAM and SSD configuration, you had to get Core i5 or Core i7! They know such forced upgrades work on people.
That's the point. A powerful iGPU was the premium product.

And next gen SOCs/APUs are going to "democratize" powerful iGPUs everywhere.

I don't mean that you will get them everywhere, but there will be 4P/8E/192 EU configs of MTL SoCs, and they should be relatively cheap. The point of those products is to simplify products and bring value, not premiums. Heck, expect NUCs to have those chips in them(!).

For premium AMD and Intel will have dGPUs.

:p
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
What AMD brought to the masses was the 6- and 7-CU Vega iGPUs in their 6-core parts. Those were always plentiful and typically more affordable than anything else that was competent. You can still often find 7 CU iGPU parts in laptops that are under $550, and often sub-$500 USD. Those game within a few percent of the higher-end parts in most cases.

The comparable 80 EU i5 products are almost $100 or more expensive. Add to that the long-term driver problems they have, and it's not really been much of a contest.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Yeah, in general AMD has the better-value product. You can also tell that with iGPUs the response to driver issues is much more muted. The problems they are having with iGPUs wouldn't fly with an RTX 3080-class product.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,356
2,848
106
If I understand you correctly, then:
RDNA3: 2 CUs per WGP => 2 SIMDs per CU => 64 ALUs per SIMD, right?
Or 2x TFLOPS per CU (WGP) compared to RDNA2.
In comparison:
RDNA(2): 2 CUs per WGP => 2 SIMDs per CU => 32 ALUs per SIMD
GCN: no WGP, only CU => 4 SIMDs per CU => 16 ALUs per SIMD

One GCN SIMD was capable of a full wavefront (wave64) with 64 items every 4 cycles.
An RDNA(2) SIMD is capable of a full wavefront (wave32) with 32 items every cycle.
RDNA3 will be capable of a full wavefront (wave64) with 64 items every cycle, or in some cases 2 wavefronts (wave32) with 32 items each, right?
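The per-architecture breakdown above maps directly onto a few lines of arithmetic. A minimal sketch, taking the SIMD counts and widths exactly as listed in this thread (the RDNA3 row is the thread's reading of the Linux driver commits, not a confirmed spec) and counting an FMA as 2 FLOPs per ALU per clock; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeUnit:
    name: str
    simds_per_cu: int
    alus_per_simd: int

    @property
    def alus_per_cu(self) -> int:
        return self.simds_per_cu * self.alus_per_simd

    def cycles_per_wave(self, wave_size: int) -> int:
        """Cycles one SIMD needs to issue one instruction for a whole wavefront."""
        return -(-wave_size // self.alus_per_simd)  # ceiling division

    def fp32_flops_per_clock(self) -> int:
        """Peak FP32 FLOPs per CU per clock, counting an FMA as 2 FLOPs."""
        return 2 * self.alus_per_cu


GCN = ComputeUnit("GCN", simds_per_cu=4, alus_per_simd=16)
RDNA2 = ComputeUnit("RDNA(2)", simds_per_cu=2, alus_per_simd=32)
RDNA3 = ComputeUnit("RDNA3 (as read from driver commits)", simds_per_cu=2, alus_per_simd=64)

for cu in (GCN, RDNA2, RDNA3):
    print(
        f"{cu.name}: {cu.alus_per_cu} ALUs/CU, wave64 in {cu.cycles_per_wave(64)} cycle(s), "
        f"wave32 in {cu.cycles_per_wave(32)} cycle(s), {cu.fp32_flops_per_clock()} FLOPs/clk/CU"
    )
```

The RDNA3 row shows the 2x FLOPs per CU. A wave32 still takes one cycle per SIMD here; a second wave32 in the same cycle would come only from VOPD dual issue, which this simple model does not capture.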
 

uzzi38

Platinum Member
Oct 16, 2019
2,635
5,976
146
Yeah, you've understood correctly. At least, that's what the Linux driver commits seem to suggest.
 

Saylick

Diamond Member
Sep 10, 2012
3,170
6,397
136
For my own knowledge's sake, what's the use case of being able to run a whole wave64 in one cycle vs. executing it over two cycles? I thought that games were already optimized for RDNA at this point by using wave32 almost exclusively. Is there an inherent advantage to double up SIMD resources per CU to allow it to execute one wave64/cycle vs. taking that single wave64 and breaking it up to two wave32 and then executing both wave32 across the two SIMDs (also in one cycle via dual issue)?
 

Olikan

Platinum Member
Sep 23, 2011
2,023
275
126
Not a pro at this, but I'll do my best.

It alleviates one of the most important GPU bottlenecks: the register file (a major bottleneck of GCN GPUs).
Doing a wave64 in 2 clocks means it needs to hold its data for twice as long. The problem shows up when the GPU needs to execute a critical wavefront but the register file is full: the GPU then deletes the register file of a less important wavefront just to relaunch it later, at the cost of bandwidth and power.

The interesting part is the dual wave32: GPUs need 4 register ports (x, y, z, t) for a 3D vector operation, but only 2 ports for simple math, like an add. So launching 2 waves to fill up the 4 ports is a nice way of using already existing resources.
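The "twice the time it needs to hold data" point can be put in numbers with a toy model: a wave's register footprint multiplied by the cycles it spends issuing. It assumes the wave issues back to back with no stalls or memory latency (real GPUs interleave many waves, so actual lifetimes are much longer), and the function names and shader figures are hypothetical.

```python
def issue_cycles(instructions: int, wave_size: int, lanes_per_clock: int) -> int:
    """Cycles to issue `instructions` VALU ops for one wave, one pass per lanes_per_clock lanes."""
    passes_per_instruction = -(-wave_size // lanes_per_clock)  # ceiling division
    return instructions * passes_per_instruction


def register_cycles(vgprs: int, instructions: int, wave_size: int, lanes_per_clock: int) -> int:
    """Register-file occupancy cost: registers allocated x cycles they stay allocated."""
    return vgprs * issue_cycles(instructions, wave_size, lanes_per_clock)


# Hypothetical shader: 64 VGPRs, 100 VALU instructions, wave64.
two_pass = register_cycles(64, 100, wave_size=64, lanes_per_clock=32)  # RDNA(2)-style
one_pass = register_cycles(64, 100, wave_size=64, lanes_per_clock=64)  # single-cycle wave64
print(two_pass, one_pass, two_pass / one_pass)
```

In this model, issuing a wave64 in one pass halves the register-cycles, so the same register file can be handed to the next wave sooner.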
 

SpudLobby

Senior member
May 18, 2022
574
359
96
Is there any word (read: rumor) on architectural improvements in power or energy efficiency with Zen 4, independent of what is granted by the process improvement^1?

I imagine the best of such features will be in Phoenix, not Raphael, so we may have to wait to hear more, but I'm eager to know if any such leaks or rumors exist, particularly now that we know the PPC gain is probably only ~5% baseline and ~10% at the upper bound, depending on the workload and frequency.

1: Should be a 25-30% power reduction at iso-frequency at some point on the curve, but with the amount of DTCO going on for HPC, it may differ from the standard gain, and the point on this efficiency curve may now differ depending on what they've done with the libraries.
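The footnote's numbers combine multiplicatively. A minimal sketch of the resulting perf/W gain at iso-frequency, assuming the 25-30% figure means lower power at iso-frequency and that the PPC gain comes at no extra power (both are assumptions; a wider core usually costs some power):

```python
def perf_per_watt_gain(ppc_gain: float, power_reduction: float) -> float:
    """Relative perf/W at iso-frequency: (1 + PPC gain) / (1 - power reduction)."""
    if not 0.0 <= power_reduction < 1.0:
        raise ValueError("power_reduction must be in [0, 1)")
    return (1.0 + ppc_gain) / (1.0 - power_reduction)


for ppc, power in ((0.05, 0.25), (0.05, 0.30), (0.10, 0.30)):
    print(f"PPC +{ppc:.0%}, power -{power:.0%}: {perf_per_watt_gain(ppc, power):.2f}x perf/W")
```

Under these assumptions the range works out to roughly 1.40x to 1.57x, most of it from the process rather than the architecture.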
 

moinmoin

Diamond Member
Jun 1, 2017
4,952
7,665
136
Power gating is always being improved on. The mobile APUs are obviously the first recipients of such improvements, but Raphael should be the first desktop-oriented package to see a much improved IOD in that regard. Outside of ever more aggressive power gating (which ideally allows any computation to fire up only the silicon area absolutely necessary for the given calculation), process improvement is indeed the main lever to improve general power efficiency.

Some more optimization would be possible by targeting and optimizing the core design only for the process node's most efficient frequencies (e.g. the frequencies the cache hierarchy runs at). But AMD and Intel instead opt to enable the highest possible frequency.
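Why the node's most efficient frequencies matter can be shown with the standard dynamic-power relation P = C·V²·f, so energy per cycle scales with V². The voltage/frequency points and the 1 nF effective capacitance below are made up for illustration, not measurements of any Zen part, and leakage (what power gating attacks) is not modeled.

```python
# Hypothetical voltage/frequency curve: (frequency in GHz, core voltage in V).
VF_CURVE = [(2.0, 0.70), (3.0, 0.80), (4.0, 0.95), (5.0, 1.20)]
EFFECTIVE_CAP_NF = 1.0  # hypothetical switched capacitance


def dynamic_power_w(cap_nf: float, volts: float, freq_ghz: float) -> float:
    """P = C * V^2 * f; nF * V^2 * GHz works out to watts."""
    return cap_nf * volts ** 2 * freq_ghz


def energy_per_cycle_nj(cap_nf: float, volts: float) -> float:
    """E = C * V^2 per cycle; nF * V^2 works out to nanojoules."""
    return cap_nf * volts ** 2


base_energy = energy_per_cycle_nj(EFFECTIVE_CAP_NF, VF_CURVE[0][1])
for freq, volts in VF_CURVE:
    power = dynamic_power_w(EFFECTIVE_CAP_NF, volts, freq)
    rel = energy_per_cycle_nj(EFFECTIVE_CAP_NF, volts) / base_energy
    print(f"{freq:.1f} GHz @ {volts:.2f} V: {power:.2f} W, {rel:.2f}x energy per cycle")
```

On this made-up curve, going from 4 to 5 GHz adds 25% clock speed for roughly twice the dynamic power, which is the trade-off behind chasing the highest possible frequency.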
 

SpudLobby

Senior member
May 18, 2022
574
359
96
Right. Well, we’ll have to wait I suppose.