Question AMD Phoenix/Zen 4 APU Speculation and Discussion

uzzi38

Platinum Member
Oct 16, 2019
2,635
5,984
146
Current mobile Alder Lake tops out at 14C/20T, Raptor Lake will likely exceed that, and AMD knows that. So how are they supposed to have the highest core/thread/cache count if not with 16C/32T?

Raptor Lake and Meteor Lake-P are both 6+8. The HX parts (if they exist) are likely to exceed 16c, but not the regular parts.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,362
2,854
106
So back on APU topic.
In part due to rumor:
And if you consider that Intel's Meteor Lake, the main competitor for Zen 4, will feature an iGPU with 192 EUs, which puts it within striking range of the RTX 3060M,
then AMD would be incompetent and stupid not to see ahead and follow suit.
A 192 EU IGP in Meteor Lake will be nowhere near an RTX 3060 Max-Q (60W).
Just check out the review you posted.
Screenshot_1.png
Just by doubling the EU count you won't even outperform a GTX 1650.
Looking up the Nvidia GeForce RTX 3060M on TechPowerUp:
115W version @1702 MHz has 10.94 TFLOPs FP32
=> 60W version @1282 MHz gives ~8.24 TFLOPs FP32
RDNA 3:
12 CU@2GHz ~6.144 TFLOPs FP32
16 CU@2.2GHz ~9 TFLOPs FP32
24 CU@2.2GHz ~13.52 TFLOPs FP32
As can be seen, to reach parity the iGPU with 24 CUs only needs to be clocked at 1.34GHz, while the 16 CU one needs 2.01GHz.
Based on the leaks:
RDNA2: 1WGP = 2CU
RDNA3: 1WGP = 4CU
Phoenix certainly won't have a 12WGP (48CU) IGP with 3072 shaders in total, but only half of that.
RDNA3 6WGP(24CU RDNA2) at 2GHz would produce 6.144 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 2.6GHz would produce 7.987 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 3GHz would produce 9.216 TFLOPs.

RTX 3060 Max-Q has 3840 CUDA cores and a 1283 MHz official boost clock, which means 9.853 TFLOPs, but that doesn't represent the actual gaming performance.
RX 6700 XT 12GB (13.215 TFLOPs) has the same performance as RTX 3060 Ti 8GB (16.197 TFLOPs), Link.
Ampere needs ~23% more TFLOPs for similar performance.
An RDNA3 IGP would need ~8 TFLOPs to be equal in performance to the RTX 3060 Max-Q (I don't take into account any architectural improvements of RDNA3 or bandwidth bottlenecks).
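For anyone who wants to re-run the arithmetic in this post, here is a minimal Python sketch. It assumes the usual peak-FP32 formula (shaders x 2 FLOPs per FMA x clock) and takes the shader counts and the 6700 XT / 3060 Ti TFLOPs figures straight from the post above; the function and variable names are made up for illustration, and none of this models bandwidth or architectural differences.

```python
def fp32_tflops(shaders: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPs, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_ghz / 1000.0


# Figures taken from the post (assumptions of this sketch, not measurements).
RDNA3_IGP_SHADERS = 1536       # half of the 3072 shaders mentioned in the post
RTX_3060_MAXQ_SHADERS = 3840   # CUDA core count quoted in the post

# Peak TFLOPs of the hypothetical IGP at the three clocks from the post.
igp_tflops = {clk: fp32_tflops(RDNA3_IGP_SHADERS, clk) for clk in (2.0, 2.6, 3.0)}

# RTX 3060 Max-Q at its official 1283 MHz boost clock.
maxq_tflops = fp32_tflops(RTX_3060_MAXQ_SHADERS, 1.283)

# Ampere vs RDNA2 TFLOPs at equal gaming performance (3060 Ti vs 6700 XT).
ampere_ratio = 16.197 / 13.215

# RDNA-equivalent TFLOPs needed to match the Max-Q, and the clock that implies.
rdna_equivalent_tflops = maxq_tflops / ampere_ratio
required_clock_ghz = rdna_equivalent_tflops * 1000.0 / (RDNA3_IGP_SHADERS * 2)

if __name__ == "__main__":
    for clk, tf in igp_tflops.items():
        print(f"IGP @ {clk:.1f} GHz: {tf:.3f} TFLOPs")
    print(f"RTX 3060 Max-Q: {maxq_tflops:.3f} TFLOPs")
    print(f"Ampere needs {100 * (ampere_ratio - 1):.0f}% more TFLOPs")
    print(f"IGP needs ~{rdna_equivalent_tflops:.2f} TFLOPs, "
          f"i.e. ~{required_clock_ghz:.2f} GHz")
```

Under these assumptions the IGP would need to clock at roughly 2.6GHz to hit the ~8 TFLOPs target.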
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
If this is the case then there is a severe TDP bottleneck; with 12 CUs it should perform AT LEAST like an RX 6400 if there are no bottlenecks and the GPU clock is around 2300MHz. And the RX 6400 trades blows with the GTX 1650.

Considering it is RDNA3, this should already be matching an RX 570 at the very minimum, so no idea what is going wrong here.

Regardless, I trust that YouTuber; he has no reason to lie, and unless something is really, really wrong there is no way RDR2 is running at 20fps at 1080p low. Even an RX 560 can get you a consistent 40+ fps at 1080p low. Hell, even a 5700G can get you around 30.

Any FPS reviews that aren't willing to do direct comparisons with competitors are pretty pointless, given we have no idea of the setup behind such framerates.

It's 6 months after RDNA3 launched and there still seems to be a focus on what RDNA3 could be instead of what it is at this point.

Even with AMD's own slides:
AMD Ryzen 7040U Slide Deck 6.PNG

Notice the lack of comparison with its immediate predecessor (680M in 6800U). Either that means:
- AMD of all parties couldn't source a 6800U laptop as comparison (which raises serious questions about their competence); or more likely
- AMD did test the 6800U here and found that the resulting comparison would not reflect well on their new product, the 7840U.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
Where is the 4WGP part? I thought such a configuration wasn't supported.
So did we...

Lenovo apparently has two "exclusive" SKUs, the 6860Z, which is a custom bin of the U series chip line, and a "Ryzen 7 pro with 8 CUs" which is probably 4 WGPs enabled of the 6 arranged in two clusters of two. I haven't seen the exact SKU name for it, but it's in the advertising.

Here's their product page still listing it...

https://www.lenovo.com/us/en/thinkpadz/?orgRef=https%3A%2F%2Fwww.google.com%2F

So, apparently it does exist...
 

uzzi38

Platinum Member
Oct 16, 2019
2,635
5,984
146
Not to mention that the Phoenix APU will have at least as many, and up to 50% more, CUs than the 6500 XT, which, as the replies above pointed out, is bandwidth constrained even with 4GB of GDDR6 and 16MB of cache used exclusively by the Navi 24 die, and Strix Point will surely have significantly more CUs than Phoenix. I really think 64MB of cache is the bare minimum to not be severely bandwidth constrained for APUs after Phoenix, especially when you have to take a CPU into account.
Stop thinking about RDNA3 as 4CUs per WGP. It's more accurate to say that it's still 2CUs per WGP, but each CU is capable of full wave64 throughput, with the ability to potentially run 2 wave32 per cycle instead. We don't yet know for certain when VOPD can be used to do that; all we know currently is that it will allow for dual-issue wave32.

But it's probably safe to say that VOPD instructions might not always be applicable, and so I'd advise caution in calling it 24 CUs per se.
 

Abwx

Lifer
Apr 2, 2011
10,962
3,482
136
I'm not sure about that. The rest of the system at idle or in use? My Renoir laptop's lowest idle power draw in normal usage (so with screen and Wi-Fi enabled) is slightly below 3W. But use of NVMe does visibly raise that value, though usually only for a very short time. Also, best performance-per-watt is usually at the efficiency inflection point most CPUs have in the frequency/voltage curve, with the addition of the system power use moving that point slightly above that. This gives the best possible performance return for the available battery capacity, which imo should always be the goal for all mobile use cases.

It can be proved by computing the energy/time first and second order derivatives: the optimum point is at CPU power = rest of system power.

To give a numerical example, let's assume that the rest of system power is 1W and that the task is executed in 1 second when the CPU uses 1W; that amounts to 2 joules for the full system.

If you increase frequency by 1.1x, time will be reduced by 1.1x and CPU power will increase by 1.3x; total energy consumed by the CPU will be 1.18 joules while the rest of the system will consume 0.91 joules, which makes a grand total of 2.1 joules.

If you increase frequency even further, the efficiency will decrease accordingly, but as said, efficiency and usability can be contradictory when it comes to the required power level to get a responsive enough system.
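The numerical example above can be reproduced with a few lines of Python. This is only a sketch of the post's own arithmetic: the 1W/1W/1s baseline and the 1.1x frequency / 1.3x CPU power step are the post's numbers, the function name is hypothetical, and the rest-of-system power is assumed constant.

```python
def task_energy_joules(freq_scale: float, cpu_power_scale: float,
                       base_cpu_w: float = 1.0, rest_w: float = 1.0,
                       base_time_s: float = 1.0):
    """Return (cpu_j, rest_j, total_j) for one task at a scaled frequency.

    Assumes run time shrinks in proportion to frequency and that the rest
    of the system draws constant power while the task runs.
    """
    time_s = base_time_s / freq_scale
    cpu_j = base_cpu_w * cpu_power_scale * time_s
    rest_j = rest_w * time_s
    return cpu_j, rest_j, cpu_j + rest_j


baseline = task_energy_joules(1.0, 1.0)   # CPU 1W, rest 1W, 1 second
faster = task_energy_joules(1.1, 1.3)     # 1.1x frequency, 1.3x CPU power

if __name__ == "__main__":
    print("baseline: cpu=%.2f J rest=%.2f J total=%.2f J" % baseline)
    print("1.1x freq: cpu=%.2f J rest=%.2f J total=%.2f J" % faster)
```

With these inputs the total goes from 2 J to about 2.09 J, which matches the rounded 2.1 J in the post.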
 

deasd

Senior member
Dec 31, 2013
520
763
136

Interesting article for future APU with L4 cache. If AMD manage to make it work with Phoenix Point, I am going to be impressed:

Zen 4 CPU with L1 Cache x 8
1MB L2 Cache x 8
16MB L3 Cache on CCD
32MB L4 Cache on MCD
LPDDR5 memory controller on MCD

Considering this was an 18-month-old rumor like 'Zen4 29% IPC uplift', I would take it with a truckload of salt. Especially when it came with the silly 'Zen5+Zen4D' hybrid rumor. It seems everybody overestimated something when Zen4 was still on paper.
 

BorisTheBlade82

Senior member
May 1, 2020
664
1,014
106
Is PHX2 going to be released for mainstream laptops (yes, I realize it will cost more; by mainstream I just mean the same as the other Phoenix SKUs would be, albeit lower cost/perf) or is this almost entirely for Steam Deck/handhelds?
As much as I would like to see AMD introduce big.LITTLE already in this generation, so far there has only been one tweet by a person with next to no followers. Not even rumour mills like WCCFT have picked this up so far, so a truckload of salt is needed.
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,686
1,221
136
You didn't provide any reason why Steam couldn't use PHX2 instead of a custom chip. BTW the custom one Steam used is pretty much Van Gogh, and that one was meant for Microsoft originally if I remember correctly, so it wasn't like they commissioned AMD to make them one; they used an existing one.
It isn't about why they couldn't use it; it's about why they would use it. If Valve got the custom Steamdeck processor without spending the silicon development cost, good on them. However, if they want to do a Steamdeck 2, it would not be PHX2, but rather a custom solution with newer Zen_x/RDNA_y IP instead, since Valve is covered by the S3 Group at AMD. Every dollar saved on the first custom Steamdeck processor can be put toward the second custom Steamdeck2 processor, since they can use customer feedback to make a better SoC, leading to a better handheld or whatever else.
4nm PHX2 based machine with more and costlier DDR5 RAM, bigger SSD would naturally cost more, so $299 for base without controllers is already pretty questionable and who would buy this console for that price without a controller?
People who are going to use it as a personal computer?

Atari VCS is currently $200 (Official Atari store) and to get the PC mode USB thing $40 (Also, Official Atari store). It is probably one of the current cheapest upgrade-able minipc's for Raven2. This is a lot cheaper than getting something from System76, purism, Think Penguin, other MiniPC+Linux vendors. As they can use game revenue to offset PC net profit loss.

It also gives them the benefit of just fitting under Apple, if they went MiniPC as well:
https://browser.geekbench.com/v5/cpu/18302986 <== Atari VCS
https://browser.geekbench.com/v5/cpu/19087977 <== Apple A16, (pointing towards Apple TV products but if they were actually aimed at the MiniPC market: TV has access to Apple Arcade. Which of course has a blend of phone-esque to console-esque games. Also, have to bring your own controller for Apple TV:Apple Arcade)

Atari upgrading to PHX2 and doing Streaming, Gaming, PC, etc stuff at least puts them on-par with Intel's cheapo minipcs and Apple's cheapo minipcs.
You also didn't explain why Atari would even need to use PHX2 in Atari VCS2, when the games they produce simply don't need so much horsepower and can be played on Windows. Atari would first need those visually better games, and only after them could there be a reason for this console to exist or for buyers to want it.
Games come after hardware releases. If there is going to be an improvement in figures of complexity with better 3D/2D framerates. A Phoenix2(RDNA3) VCS2 optimization would scale up or down from <-> Steamdeck(RDNA2) <-> XSS(RDNA2) <-> PS5(RDNA+) <-> XSX(RDNA2) <-> Gaming PC(RDNA/RDNA2/RDNA3), a lot easier than a VCS(Vega14-GCN) Raven2 optimization.

Atari VCS 1 (plagued by controversies, prior CEO) => Original launch window was Q4-2019 through Q1-2020, didn't become generally available till Q2-2021.
following industry <-- time between consoles -->
Atari VCS 2 (straight launch, sweeping VCS 1 under the rug, new CEO) => Launch window given +4 years = Q4-2023 through Q1-2024, bonus time till Q2-2025(avoiding this hopefully).
//New CEO -> ??? -> New Director of VCS* -> New Lead Architect of VCS*
* Not in known capacity, probably won't be till software is finished on VCS1. => No Crowdfunding.

The change away from mobile is to focus more on:
Nightdive's Remake/Remasters like System Shock Remake
Ziggurat's Remake/Remasters like Deadly Dozen Reloaded
But, for Atari games...

Convert an old game (that might not have been in 3D) into 3D. Then, the first port is on at least one console and PC (before VCS2), then it is ported down to Atari VCS2 (when it comes out). Visually beyond the scope of Haunted Houses in the Reimagined Series. It will take a while before we see the actual games the PC/console focus will lead to, as well as whether Atari does use the PHX2 to extend their renewed VCS (translated: game console)/ST (translated: personal computer) ambitions.

Steamdeck: February 2022 +4 => February 2026, if Valve is following the industry. If not, they are more likely to follow Sony's rapid refresh cycles: 2020(7nm) -> 2021(7nm/lighter) -> 2022(6nm/lighter). In this, it is highly unlikely for Valve to swap out the Steamdeck processor for PHX2. Since, they would be going for a refresh with same exact specs and reduced power/heat like Oberon+: https://www.angstronomics.com/p/ps5-refresh-oberon-plus
With a February 2026 target it at least gives a window for porting down(Nintendo Sw1:20nm/Sw1+:16nm, same console different process type) or developing new(true SD2) on GAA-nodes.

Atari VCS has significant decay where the phone ports of 3D console games can run at 4K/1080p on Apple TV 4K, while it struggles at 720p~900p. Any extraneous work to make a port work would be moot if the hardware (aka using PHX2) was much closer to more modern hardware (2p+4e+>mid-speed iGPU).

Weighing benefits:
Steamdeck => Basically already at Orin-Switch spec. Valve/Nintendo are already paired, so there is no reason to switch to another chip of a different spec than 4c/4wgp. There is no reason to downgrade to keep costs down when Valve highlights "great on deck" titles. Thus, anything lost on the hardware is given back by the cut of the Steam store. Which gives the opposite target: they will want an EVEN larger SoC than Phoenix1/Strix, which will make them wait for mature GAA nodes later in 2025+.
Atari VCS => Not paired with any current gen console, so an upgrade is definitely desired. The minipc aspect is already way behind; Jasper Lake/Alder Lake-N/Alder Lake-M are pulling their cost-effective added-value weight. Thus, PHX2 has the highest benefit for Atari, not Valve. They also don't have billions in excess like Valve for a custom chip.

Distribution of sales:
Steamdeck1 + 64GB eMMC | $399/$400 | Extreme minority of sales / Sold as loss
Steamdeck1 + 256GB NVMe | $529/$530 | Minority of sales / Sold as profit
Steamdeck1 + 512GB NVMe | $649/$650 | Majority of sales / Sold as profit
Given the above, the desire isn't to sell a worse model or a lite model, since the volume of sales reported by Gabe indicates a preference for the best one.

A better SoC is preferred for big chunks of AAA:
Early millions $70 => $21 for every game license bought
Late millions $70 => $14 for every game license bought
It is up to developers/publishers selling on steam to have a SteamDeck mode(quality mode), since Valve isn't funding that. So a stronger, better SoC means less work for the developers/publishers who do decide to integrate a SteamDeck mode.

Steamdeck2 instead probably would prefer:
8c Zen4 (8p, 2p+6e, 4p+4e)
>8wgp RDNA3
$599 w/ 512 GB
$799 w/ 1 TB
$999 w/ 2 TB

Atari VCS Base (+ PC USB) | $200 (+$40), less on sales | Majority of sales / Sold as loss
Atari VCS All-in (+ PC USB) | $300 (+$40), less on sales | Minority of sales / Sold as loss
Given the table above, the desire appears more for a preference of a minipc rather than a game console. This is probably a symptom of launching the hardware with a barebone operating system, etc stuff.

Steamdeck was built off the SteamMachines/SteamOS. Atari VCS was a race track to get the hardware out. The current VCS plan is software(Games and getting the OS basic features: Dashboard, AtariLive(XboxLive clone), etc. features) with a shift to newer hardware(More likely to use PHX2).

With this timeline:
January 2022 => Atari new hardware was referenced
July 2022 => Atari has new hardware set in action
Time continues -> Atari finishes their development of the PC OS(usb stick) and Game OS(eMMC).
Time continues -> PHX2 AMD launch or reveal
Mid-2023/Q3-2023+ => Atari announces PHX2-based VCS and it is to use the hybrid-variant PC/Game OS on NVMe. Straight to release, no crowdfunding shenanigans/controversies/etc.

It is much more believable that Atari would use the PHX2 over Valve's Steamdeck successor using it.
 

Shivansps

Diamond Member
Sep 11, 2013
3,855
1,518
136
I agree that Polaris is very inefficient. It is also the case that the RX550 is quite undersized for the amount of bandwidth available. The RX560 is specced for the exact same total bandwidth and is markedly faster at roughly 40% in most cases, depending on which model you are testing.

The RX 570 is slightly faster than the GTX 1650 and slightly slower than the 1650 Super. If Rembrandt was as fast as the desktop 1650 today, then I would believe that PHX will blow past the RX 570. Unfortunately, the desktop 1650 is, on average, 30+% faster than the 680M, with the RX 570 being a bit faster. 30+% is a lot to overcome when you start getting to those levels. Is it possible? I suppose, but it's going to take more than bigger caches and 20% more memory bandwidth with a slightly higher iGPU clock to overcome that deficit.

Remember there is a 64-bit version of the RX 550 that is actually slower, so it is using that extra bandwidth, maybe not to its full extent. The RX 560 is pretty much a doubled RX 550, including the L2 cache.

I still think there is something wrong with RMB that resulted in lower than expected performance, and I'm not sure if memory bandwidth is the issue here. I think RDNA2 is built around IC, and not having it here is creating some issues.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,821
3,643
136
It's not memory bandwidth that's the issue, it's that the FCLK drops extremely low when the iGPU is at full tilt (a power saving feature) which badly hurts CPU performance.

It's the reason why games that are more CPU heavy are the ones that struggle the most, e.g. God Of War struggles to maintain over 40fps at all resolutions.
God of War ain't CPU-heavy. Scaling stops beyond 4C/8T and it is mostly GPU bound. APUs, especially in laptops, will always be power-limited, which is why people need to dial back on their expectations with Phoenix. We saw that with Rembrandt. 680M in the 6800U was only as fast as an MX450 GDDR6, and the latter was faster in most cases unless the game in question had issues with the 2 GB framebuffer. I expect a GTX 1650 to be the upper limit of the performance that Phoenix APUs can deliver in thin laptops.
 

SpudLobby

Senior member
May 18, 2022
578
366
96
Cezanne has 8CU Vega IGP at 2.1GHz.
Rembrandt has 12CU RDNA2 IGP at 2.4GHz.
Rembrandt IGP improved everything -> specs, clocks and architecture.

Phoenix has the same specs as Rembrandt. Improvement in architecture is so far only ~9% in N31 although AMD claimed 17.4%.
Phoenix with fewer bugs could do better, but I don't expect a bigger improvement than RDNA2 had over Vega, which was 25-40% just from architecture.

If frequency is not a lot higher, then I don't see how Phoenix IGP could manage to perform comparably to a 63% faster RX 570.
Just to be on par you would need for example 20% improvement from architecture and 3300MHz clockspeed.
From leaks Phoenix ES IGP is only at ~2.6GHz, so you would need 50% improvement from architecture!

BTW even thinking about comparable improvement as Rembrandt had over Vega is totally unrealistic. Rembrandt is 2x faster than Cezanne!
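The "3300MHz" and "50% from architecture" figures above can be checked with a quick Python sketch. It assumes performance scales linearly with clock speed times architectural gain (the same simplification the post makes), uses Rembrandt's 2.4GHz as the baseline and the 63% RX 570 gap from the post; the function names are hypothetical.

```python
REMBRANDT_CLOCK_MHZ = 2400   # 12CU RDNA2 IGP clock quoted in the post
RX570_SPEEDUP = 1.63         # RX 570 is "63% faster" per the post


def required_clock_mhz(base_clock_mhz: float, target_speedup: float,
                       arch_gain: float) -> float:
    """Clock needed to hit target_speedup, given a per-clock gain."""
    return base_clock_mhz * target_speedup / arch_gain


def required_arch_gain(base_clock_mhz: float, new_clock_mhz: float,
                       target_speedup: float) -> float:
    """Per-clock gain needed to hit target_speedup at a given clock."""
    return target_speedup / (new_clock_mhz / base_clock_mhz)


# Case 1: 20% architectural improvement, solve for the clock.
clock_needed = required_clock_mhz(REMBRANDT_CLOCK_MHZ, RX570_SPEEDUP, 1.20)

# Case 2: the leaked ~2.6GHz ES clock, solve for the architectural gain.
arch_needed = required_arch_gain(REMBRANDT_CLOCK_MHZ, 2600, RX570_SPEEDUP)

if __name__ == "__main__":
    print(f"Clock needed with +20% arch: {clock_needed:.0f} MHz")
    print(f"Arch gain needed at 2600 MHz: +{100 * (arch_needed - 1):.0f}%")
```

That gives about 3260MHz in the first case and about +50% in the second, in line with the rounded numbers in the post.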
Well, shit.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
Got a source for N33 being botched?

Bondrewd @ B3D seems to be of the impression that all the Navi chips have some bugs, that 31 is by far the most buggy but fixed N31 is coming, that at least the clockspeed bug is fixed in N32, while N33 is quote "mostly un-(word I can't say here)ed" and should be pretty impressive.

Bondrewd also had this to say about Navi 31 a month ago:
Bondrewd said:
Good news, it won't.
AD102 barely edges ahead at 33% more SMs so...

He does not warrant any more publicity.

Based on my understanding, 4060 Ti should perform around 3070 Ti levels.

And that is within performance levels of Navi 33, easily, but much more efficient.

The 3070 is ~40% faster than a 6650XT at QHD, or about/slightly above the performance uplift from the 6950XT to the 7900XTX.
Capture6.JPG
If 96CU Navi 31 on 5nm with over double the transistor count barely manages ~40% uplift vs 80CU Navi 21, I am extremely doubtful 32 CU Navi 33 with a similar transistor count to Navi 23 (200mm^2 N6) can get close to the same vs 32 CU Navi 23, especially given how fast the latter clocks already.
 

TESKATLIPOKA

Platinum Member
May 1, 2020
2,362
2,854
106
Phoenix-U will be limited to 15-28W.
Phoenix-HS will be 35W and up to 54W maybe?
What will be interesting is Phoenix-HS versus 8C Dragon Range.

BTW All The Watts!! updated the Phoenix SKUs. According to him it should look like this:
Ryzen 9 7940HS 8c 12 CU.
Ryzen 7 7840HS 8c 12 CU.
Ryzen 5 7640HS 6c 6 CU.

Ryzen 7 7840U 8c 12 CU
Ryzen 5 7640U 6c 6 CU
PHX2.
Ryzen 7 7740U 6c 4 CU.
Ryzen 5 7540U 6c 4 CU.
Ryzen 3 7340U 4c 2 CU.
I don't think a 6C PHX2 is faster than 6C PHX, yet naming says otherwise.
I hope the marketing department didn't receive any Christmas bonuses, I certainly wouldn't give them any after this BS naming scheme.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Why can't AMD simply put a larger memory controller in Phoenix Point? What is stopping them?
Cost. Let me tell you a heuristic I once heard about platform cost. $0.01 is noise, $0.10 gets grumbling, $1.00 gets meetings between the companies, and $10 gets meetings between high-level executives.

More memory channels is certainly one way to provide the bandwidth needed to feed a big GPU, but it's an incredibly delicate balance. PCs aren't limited to iGPUs like Macs are, so if you increase costs too much, OEMs might just go for a dGPU anyway, especially with AMD's current gaps vs Nvidia. There's a very narrow band of high price, high (but not too high) performance, and constrained form factor that can support such a product. I wouldn't be surprised if AMD or even Intel eventually makes one, but it's no surprise they're not quick to make those tradeoffs for a mainstream product.
Pair that with LPDDR5 (which i am sure is cheaper than standard DDR5
LPDDR is moderately more expensive than DDR.
 

Abwx

Lifer
Apr 2, 2011
10,962
3,482
136
As usual, you make nonsensical, irrelevant arguments.

The nonsense is to say that there's no gain while there's a 12% gain. Frequency is just as relevant as IPC since they use a smaller node and don't increase the thermals; at this rate you should compare a Zen 1 to a Zen 4 and point to the IPC improvement as the only perf improvement.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,821
3,643
136
That RAM speed is only 4800 MHz, that's not enough even for Rembrandt!
The highest gain is in Assassin's Creed Valhalla +23%.
The lowest gain is in Shadow of the Tomb Raider +9%.
Give them 6400MHz LPDDR5 with 33% higher BW, then we can talk how good or bad 780M is.

BTW, I wonder if this is really a proper comparison.
We know memory is slow and likely with bad timings, but 680M has only 2200MHz and 780M has 2800MHz clock speed.
I don't think they have the same TDP limit.
Doesn't matter how fast the RAM is. The pre-announcement Chinese review had LPDDR5-6400 on both systems, and the 780M was 12% faster.

TDP is irrelevant. Both the -H and -HS in laptops like the Asus TUF will happily boost to 60-70 W in their highest power mode.

This is a real-world comparison using similar types of systems that you can buy right now.
 

Brunnis

Senior member
Nov 15, 2004
506
71
91
Here are some benchmarks I've run myself on my new Asus Zephyrus G14 with Ryzen 9 7940HS, 32GB DDR5-4800 and RTX 4070. Several of these are older benchmarks that I've used for benchmarking my laptops over many years. The Thinkpad T14s G3 was returned after a few days due to hardware instability, while I still have the Dell in my possession. Please note that the Dell benchmarks were run on an old but well-kept Windows 11 installation, while the others are completely fresh. No instability or weirdness so far with the Asus.

1685427773277.png