Question AMD Phoenix/Zen 4 APU Speculation and Discussion


TESKATLIPOKA

Senior member
May 1, 2020
568
603
106
So back on APU topic.
In part due to rumor:
And if you consider that Intel's Meteor Lake, the main competitor for Zen 4, will feature an iGPU with 192 EUs, which puts it in striking range of the RTX 3060M,
then AMD would be incompetent and stupid not to see ahead and pull along likewise.
A 192 EU IGP in Meteor Lake will be nowhere near a RTX 3060 Max-Q(60W).
Just check out the review you posted.
Screenshot_1.png
Just by doubling the EU you won't even outperform GTX 1650.
Looking up the Nvidia GeForce RTX 3060M on TechPowerUp:
115W version @1702 MHz has 10.94 TFLOPs FP32
=> 60W version @1282 MHz gives ~8.24 TFLOPs FP32
RDNA 3:
12 CU@2GHz ~6.144 TFLOPs FP32
16 CU@2.2GHz ~9 TFLOPs FP32
24 CU@2.2GHz ~13.52 TFLOPs FP32
As can be seen, to reach parity the 24 CU iGPU only needs to be clocked at 1.34 GHz, while the 16 CU one needs 2.01 GHz.
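The figures quoted above follow the usual peak-FP32 rule of thumb: shaders × 2 operations per clock (one fused multiply-add) × clock speed. A minimal Python sketch of that arithmetic, assuming 128 shaders per CU (the count implied by the 12 CU = 6.144 TFLOPs figure); the function names are illustrative only:

```python
# Peak FP32 throughput: shaders * 2 ops per clock (FMA) * clock in GHz / 1000.
# Assumption: 128 shaders per CU, as implied by the quoted 12 CU figure.
SHADERS_PER_CU = 128


def peak_tflops(shaders: int, clock_ghz: float) -> float:
    """Theoretical FP32 TFLOPs for a given shader count and clock."""
    return shaders * 2 * clock_ghz / 1000


def parity_clock_ghz(shaders: int, target_tflops: float) -> float:
    """Clock (GHz) a GPU with `shaders` needs to reach `target_tflops`."""
    return target_tflops * 1000 / (shaders * 2)


# Scale the 115 W RTX 3060M figure (10.94 TFLOPs @ 1702 MHz) to the 60 W clock.
target = 10.94 * 1282 / 1702

for cus, clock in [(12, 2.0), (16, 2.2), (24, 2.2)]:
    tflops = peak_tflops(cus * SHADERS_PER_CU, clock)
    print(f"{cus} CU @ {clock} GHz: {tflops:.3f} TFLOPs")

for cus in (16, 24):
    clock = parity_clock_ghz(cus * SHADERS_PER_CU, target)
    print(f"{cus} CU needs {clock:.2f} GHz for {target:.2f} TFLOPs")
```

This is paper throughput only; it says nothing about memory bandwidth or how well either architecture turns TFLOPs into frames.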
Based on the leaks:
RDNA2: 1WGP = 2CU
RDNA3: 1WGP = 4CU
Phoenix certainly won't have a 12 WGP (48 CU) IGP with 3072 shaders in total, but only half of it.
RDNA3 6WGP(24CU RDNA2) at 2GHz would produce 6.144 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 2.6GHz would produce 7.987 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 3GHz would produce 9.216 TFLOPs.

RTX 3060 Max-Q has 3840 CUDA cores at 1283 MHz (official boost), which means 9.853 TFLOPs, but that doesn't represent the actual gaming performance.
RX 6700 XT 12GB (13.215 TFLOPs) has the same performance as RTX 3060 Ti 8GB (16.197 TFLOPs), Link.
Ampere needs ~23% more TFLOPs for similar performance.
An RDNA3 IGP would need ~8 TFLOPs to be equal in performance to the RTX 3060 Max-Q (I don't take into account any architectural improvements of RDNA3 or bandwidth bottlenecks).
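For anyone who wants to check the ~23% and ~8 TFLOPs figures, here is a rough Python sketch of the same reasoning. It assumes, as the post does, that the RX 6700 XT and RTX 3060 Ti perform equally and that this ratio carries over to the mobile part; the variable names are made up for illustration:

```python
# Assumption: peak FP32 = cores * 2 ops per clock * clock.
def peak_tflops(cores: int, clock_mhz: float) -> float:
    """Theoretical FP32 TFLOPs from core count and clock in MHz."""
    return cores * 2 * clock_mhz / 1e6


# RTX 3060 Max-Q at its official boost clock.
rtx_3060_maxq = peak_tflops(3840, 1283)

# RTX 3060 Ti vs RX 6700 XT: treated as equal in games, per the post.
ampere_per_rdna2 = 16.197 / 13.215

# TFLOPs an RDNA part would need, if the same ratio held for the mobile GPU.
rdna_equivalent = rtx_3060_maxq / ampere_per_rdna2

print(f"RTX 3060 Max-Q: {rtx_3060_maxq:.3f} TFLOPs")
print(f"Ampere needs {ampere_per_rdna2 - 1:.0%} more TFLOPs")
print(f"RDNA-equivalent target: {rdna_equivalent:.2f} TFLOPs")
```

The result is only as good as the single desktop comparison it is built on.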
 

Glo.

Diamond Member
Apr 25, 2015
4,973
3,588
136
A 192 EU IGP in Meteor Lake will be nowhere near a RTX 3060 Max-Q(60W).
Just check out the review you posted.
View attachment 61431
Just by doubling the EU you won't even outperform GTX 1650.
The thing is: Iris Xe has a different front end than DG2, which already changes the throughput of those ALUs. Secondly, Xe is heavily, like HEAVILY, memory bandwidth starved. Thirdly, Meteor Lake will use Battlemage architecture tiles for the iGPU.

So it will be a completely different architecture.

I personally do not expect 3060 performance levels. But I would not be surprised by GTX 1660 Ti performance from 192 EU MTL-S SoCs.

That's plenty for an entry-level GPU.
 
  • Like
Reactions: ryan20fun and Tup3x

Mopetar

Diamond Member
Jan 31, 2011
6,677
3,720
136

Anhiel

Junior Member
May 12, 2022
10
0
6
Just by doubling the EU you won't even outperform GTX 1650.


Based on the leaks:
RDNA2: 1WGP = 2CU
RDNA3: 1WGP = 4CU

Phoenix certainly won't have a 12 WGP (48 CU) IGP with 3072 shaders in total, but only half of it.
RDNA3 6WGP(24CU RDNA2) at 2GHz would produce 6.144 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 2.6GHz would produce 7.987 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 3GHz would produce 9.216 TFLOPs.
Wrong.
Core i7 12700H uses UHD Graphics 770, which is Xe-LP.
MTL uses Xe-HPG. Check the details yourself if you don't believe it.


Wrong.
RDNA2: 1WGP = 2CU = 4 SIMD32 = 128 shaders
RDNA3: 1WGP = 2CU = 8 SIMD32 = 256 shaders

See my references at the bottom of the post. You literally confirmed the calculations.
 

mikk

Diamond Member
May 15, 2012
3,461
1,202
136
A 192 EU IGP in Meteor Lake will be nowhere near a RTX 3060 Max-Q(60W).
Just check out the review you posted.
View attachment 61431
Just by doubling the EU you won't even outperform GTX 1650.

It's not only an EU doubling, it's also a major GPU clock speed improvement over the first Xe version on Intel's 10nm/Intel 7. The Xe LP design was limited to ~1.3-1.5 GHz (12700H boosts up to 1.40 GHz). Xe HPG on TSMC 6nm can do 2.4+ GHz on DG2-128 or DG2-512.
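To put a rough number on "EU doubling plus clock improvement", here is a back-of-the-envelope Python sketch. It assumes a 96 EU baseline at 1.40 GHz (implied by the talk of doubling to 192 EUs) and, optimistically, that a mobile iGPU could hold the 2.4 GHz seen on desktop DG2. It ignores per-EU throughput changes and memory bandwidth, and the function name is hypothetical:

```python
def throughput_scale(eus_old: int, clock_old_ghz: float,
                     eus_new: int, clock_new_ghz: float) -> float:
    """Theoretical throughput ratio from EU count and clock alone."""
    return (eus_new / eus_old) * (clock_new_ghz / clock_old_ghz)


# Assumed baseline: 96 EUs at 1.40 GHz; assumed target: 192 EUs at 2.4 GHz.
scale = throughput_scale(96, 1.40, 192, 2.4)
print(f"Paper throughput scaling: {scale:.2f}x")
```

Whether an iGPU inside a laptop power budget gets anywhere near that clock is an open question.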

Xe LP performs surprisingly well, by the way, if it's using either LPDDR4x-4266 or DDR5-4800. In several reviews Rembrandt-H only performs about 50-60% better and Vega 8 looks hopeless. Xe LP was heavily DDR4 bandwidth starved, and Xe HPG will be more efficient (64 EUs instead of 96 EUs per slice helps feed the EUs better).

The thing is: Iris Xe has a different front end than DG2, which already changes the throughput of those ALUs. Secondly, Xe is heavily, like HEAVILY, memory bandwidth starved. Thirdly, Meteor Lake will use Battlemage architecture tiles for the iGPU.
MTL is using Xe HPG Core, same as DG2. Battlemage graphics comes in Lunar Lake.
 
  • Like
Reactions: Glo.

jamescox

Senior member
Nov 11, 2009
506
821
136
It's still true today. Yes, AMD is more flexible using its different multi chips approaches. But those designs still needed to be planned well beforehand to reach that point.


I'm starting to think Raphael-H alias Dragon Range is the APU gone chiplet which will see further mobile oriented optimizations in later gens. I still doubt it will replace lower cost monolithic APUs though.
Yeah, they still need to be planned, but that flexibility can allow for a lot of different parts that they didn’t plan for 3 to 5 years ago. I suspect that the v-cache chiplet was designed almost completely for Milan-x, but it is very simple to make a desktop product based on it, if needed. AM4 was supposed to be EOL in like 2020, but with Covid, really expensive DDR5, and whatever else, we got the 5800X3D.

The low end will still be a monolithic die, but with the power consumption improvements of 5 nm and improved design, a chiplet based mobile part is doable. A stacked mobile part would be better and a stacked mobile part with HBM or other on package memory would be the best. There have been some rumors of an APU with really high clocked memory. I don’t think you would see memory modules clocked that high in such a system, so I am thinking that might be on package memory. I am not sure how they would do a chiplet based solution though. It seems like the desktop IO die will have a small GPU, not suitable to a high-end mobile APU. Perhaps a stacked GPU chiplet on top of the IO die?
 

TESKATLIPOKA

Senior member
May 1, 2020
568
603
106
Wrong.
Core i7 12700H uses UHD Graphics 770, which is Xe-LP.
MTL uses Xe-HPG. Check the details yourself if you don't believe it.
You wrote that a 192 EU Meteor Lake IGP will be in striking range of the RTX 3060M.
I just wrote that by doubling UHD Graphics 770 you won't even get to the level of a desktop GTX 1650, and I didn't expand on it, so I will do it now.
Screenshot_3.png
RTX 3060M is 4.5x faster than UHD 770.
Meteor Lake IGP would need to be ~4x faster than UHD 770 to be close enough to this Ampere.

Wrong.
RDNA2: 1WGP = 2CU = 4 SIMD32 = 128 shaders
RDNA3: 1WGP = 2CU = 8 SIMD32 = 256 shaders
RDNA2: 1WGP = 2CU = 128 shaders
RDNA3: 1WGP = 4CU = 256 shaders
I honestly don't care if an RDNA3 WGP will have 2x more CUs or one CU will have 2x more shaders. From a TFLOPs perspective, it's the same thing for me.
See my references at the bottom of the post. You literally confirmed the calculations.
I saw it the first time, and my point was that Phoenix won't have an IGP with 8-12 WGPs (2048-3072 shaders), just one with 6 WGPs (1536 shaders), based on leaks.
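Whichever way the CUs end up being counted, the shader totals come out the same. A small Python sketch, assuming the leaked 256 shaders per RDNA3 WGP and the known 64 shaders per RDNA2 CU; the helper names are illustrative:

```python
# Shaders per WGP; the RDNA3 value comes from the leaks discussed in this thread.
SHADERS_PER_WGP = {"RDNA2": 128, "RDNA3": 256}
SHADERS_PER_RDNA2_CU = 64


def shaders(arch: str, wgps: int) -> int:
    """Total shader count for a given architecture and WGP count."""
    return SHADERS_PER_WGP[arch] * wgps


def rdna2_cu_equivalent(arch: str, wgps: int) -> int:
    """Same shader count expressed in RDNA2-style CUs (64 shaders each)."""
    return shaders(arch, wgps) // SHADERS_PER_RDNA2_CU


for wgps in (6, 8, 12):
    total = shaders("RDNA3", wgps)
    cus = rdna2_cu_equivalent("RDNA3", wgps)
    print(f"{wgps} WGP RDNA3: {total} shaders (= {cus} RDNA2 CUs)")
```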
 
  • Like
Reactions: Kaluan

soresu

Golden Member
Dec 19, 2014
1,805
961
136
I honestly don't care if an RDNA3 WGP will have 2x more CUs or one CU will have 2x more shaders. From a TFLOPs perspective, it's the same thing for me.
I'm inclined to think that CU nomenclature will simply cease to exist with RDNA3 in favor of only counting WGPs, much as CCXs have with Zen 3 in favour of CCDs.

Simplification of PR always seems the better road to travel, even if it introduces some short term confusion as it is likely to for RDNA2 -> RDNA3.

It's confusing enough as it is when you have ALUs, CUs, WGPs, SIMDs and Shader Groups to account for, to say nothing of TDP, mm², the number of transistors and who knows what else I'm probably missing.

🤣
 

Ajay

Lifer
Jan 8, 2001
11,218
5,033
136
Raja never had a good history with drivers.
He’s not in charge of the driver team. He has a direct report who is. The problem is getting Intel's driver team on level with the more experienced teams at AMD and NV. AMD managed the change when they dumped most of their US team and built up a team in China. AMD did start with a better game oriented driver stack from the get go, however.
 

Mopetar

Diamond Member
Jan 31, 2011
6,677
3,720
136
Was more of a comment that during his time at AMD during the release of Vega a lot of the performance shortcoming was said to be due to drivers that would eventually materialize.

I'm sure that the driver team for Intel will be back from their trip to the corner store for cigarettes any minute now. Maybe they ran into the Vega team there.
 

soresu

Golden Member
Dec 19, 2014
1,805
961
136
Was more of a comment that during his time at AMD during the release of Vega a lot of the performance shortcoming was said to be due to drivers that would eventually materialize.
Pretty sure that the µArch itself was b0rked in the 16nm implementation - even the 7nm spin still didn't have NGG working properly, tho exactly how much of that was µArch and how much was drivers I have no idea.
 
  • Like
Reactions: ryan20fun

izaic3

Member
Nov 19, 2019
45
45
61
So to sum up the leaks so far, Phoenix will have a ~20% IPC increase and a ~100% GPU capability increase over Rembrandt?
 

Gideon

Golden Member
Nov 27, 2007
1,481
3,015
136
So to sum up the leaks so far, Phoenix will have a ~20% IPC increase and a ~100% GPU capability increase over Rembrandt?
100% more GPU units. Considering that the memory subsystem and bandwidth remain mostly the same, I don't believe this will correspond to anywhere near a 100% uplift in performance. That could only be a general thing if it also had Infinity Cache (which I doubt, for die-area reasons). It would still be a hefty increase for sure, and ALU-bound stuff will get the 2x perf uplift.

And speaking of the +20% IPC, it sure would be nice, but I'm a bit sceptical considering how little the architecture has changed (that we know of). It's easily doable when we also count AVX-512, but I'd like to see more evidence to believe it will be so in most general workloads.
 

leoneazzurro

Senior member
Jul 26, 2016
608
853
136
So far the rumor for Phoenix is +100% GPU units, a revised architecture and higher clocks. So on the math side the GPU should have more than a 100% uplift, which will of course not translate in every case into a 100% FPS uplift.
 

LightningZ71

Golden Member
Mar 10, 2017
1,236
1,250
136
While it won't double, we do expect that the memory bandwidth will improve with higher clocked DRAM. In addition, the doubled L2 caches, coupled with continuing the policy of exclusive caches, should help slightly reduce memory bus contention. RDNA3 is also supposed to bring larger internal GPU caches outside of any Infinity Cache present. All of that should translate to a substantial GPU performance uplift.
 

NTMBK

Diamond Member
Nov 14, 2011
9,695
3,535
136
While it won't double, we do expect that the memory bandwidth will improve with higher clocked DRAM. In addition, the doubled L2 caches, coupled with continuing the policy of exclusive caches, should help slightly reduce memory bus contention. RDNA3 is also supposed to bring larger internal GPU caches outside of any Infinity Cache present. All of that should translate to a substantial GPU performance uplift.
Sorry, is that the CPU or GPU L2 that is getting doubled?
 

Shivansps

Diamond Member
Sep 11, 2013
3,608
1,282
136
RDNA3 may be more memory efficient. I would find it very hard to believe they would do a 100% IGP increase without any changes to the memory subsystem.
 

LightningZ71

Golden Member
Mar 10, 2017
1,236
1,250
136
I also suspect that the RDNA3 iGPU will last for a couple of generations, meaning that it may be “forward looking” with respect to DDR5 memory bandwidth. In other words, it may be moderately memory starved in Phoenix, but may be more performant in later generations with even higher clocks and faster bins of memory.
 

NTMBK

Diamond Member
Nov 14, 2011
9,695
3,535
136
Zen 4 core will have 1 MB of L2.

Btw, RDNA 3 would have bigger L2 as well.
And the CPU L2 increase is definitely coming to APUs? We've seen different cache sizes between server and APU designs before.
 

maddie

Diamond Member
Jul 18, 2010
3,791
2,863
136
RDNA3 may be more memory efficient. I would find it very hard to believe they would do a 100% IGP increase without any changes to the memory subsystem.
It would have to be if the Navi 3 rumors are true. I also agree with you that wasting area (surplus shaders) is unbelievable.
 

Glo.

Diamond Member
Apr 25, 2015
4,973
3,588
136
RDNA3 may be more memory efficient. I would find it very hard to believe they would do a 100% IGP increase without any changes to the memory subsystem.
GCN to RDNA1 resulted in better memory bandwidth efficiency. RDNA2 amplified it by using Infinity Cache. Based on speculation on the web, RDNA3 is also supposed to do more with less, similarly to the jump from GCN to RDNA1.

So yeah, people expecting only a 60% performance increase from a 100% ALU increase, a 100% L2 cache increase, a 33% core clock increase, and potentially a 20-25% "IPC" increase of the ALUs are a bit pessimistic.
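As a rough illustration of how those rumored gains stack on paper, here is a minimal Python sketch. It assumes the factors simply multiply and leaves out the L2 cache increase, since cache size is not a direct compute multiplier; the helper name is hypothetical:

```python
from math import prod


def compound_uplift(*fractional_gains: float) -> float:
    """Multiply independent gains, e.g. 1.00 (+100%) and 0.33 (+33%)."""
    return prod(1 + gain for gain in fractional_gains)


# Rumored figures from the post: +100% ALUs, +33% clock, +20-25% per-ALU "IPC".
low = compound_uplift(1.00, 0.33, 0.20)
high = compound_uplift(1.00, 0.33, 0.25)
print(f"Paper compute uplift: {low:.2f}x to {high:.2f}x")
```

That is a ceiling for ALU-bound work, not a frame rate prediction; bandwidth limits decide how much of it shows up in games.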

We should see AT LEAST GTX 1660 Ti desktop performance from Phoenix in the highest-end SKU, because of the maths that we know about or can make educated guesses on. What we don't know is how a 25% per-ALU performance increase on Navi 33 would be achieved. What changes has RDNA3 undergone compared to RDNA1 and 2?

And if it's inherently part of the throughput of the ALUs, Phoenix's iGPU should be around RTX 2060 desktop in performance, even without the Infinity Cache.
 
