Question AMD Phoenix/Zen 4 APU Speculation and Discussion

uzzi38 · Apr 28, 2022

I can finally make this thread.

https://twitter.com/x/status/1519669375283957760

Phoenix is indeed RDNA3. My advice to everyone: treat the old APU rumours as being out of date.

TESKATLIPOKA · May 13, 2022

Anhiel said:
So back on APU topic.
In part due to rumor:

AMD Phoenix RDNA3 iGPU could be as fast as the slowest GeForce RTX 3060 mobile GPU - VideoCardz.com

AMD Phoenix APU with mid-range discrete GPU performance Greymon55 claims that the upcoming integrated graphics into the AMD Phoenix APU, could show comparable performance to the most power-restricted version of the RTX 3060 mobile. AMD Ryzen 7000 series will officially include two mobile series...

videocardz.com

And if you consider that Intel's Meteor Lake, the main competitor for Zen 4, will feature an iGPU with 192 EUs, which puts it within striking range of the RTX 3060M,
then AMD would be incompetent and stupid not to look ahead and keep pace.

A 192 EU IGP in Meteor Lake will be nowhere near an RTX 3060 Max-Q (60W).
Just check out the review you posted.

AMD Radeon 680M: Die Ryzen-6000-iGPU im Test: Benchmarks der Radeon 680M in Spielen

AMD Radeon 680M im Test: Benchmarks der Radeon 680M in Spielen / Leistung in FHD mit mittleren Details / Leistung in FHD mit hohen Details

www.computerbase.de

Just by doubling the EUs you won't even outperform a GTX 1650.

Anhiel said:
looking up techpowerup for Nvidia GeForce RTX 3060M
115W version @1702 MHz has 10.94 TFLOPs FP32
=> 60W version @1282 MHz gives ~8.24 TFLOPs FP32
RDNA 3:
12 CU@2GHz ~6.144 TFLOPs FP32
16 CU@2.2GHz ~9 TFLOPs FP32
24 CU@2.2GHz ~13.52 TFLOPs FP32
As can be seen, to reach parity the 24CU iGPU only needs to be clocked at 1.34GHz, while the 16CU one needs 2.01GHz.

Based on the leaks:
RDNA2: 1WGP = 2CU
RDNA3: 1WGP = 4CU
Phoenix certainly won't have a 12WGP (48CU) IGP with 3072 shaders in total, but only half of that.
RDNA3 6WGP(24CU RDNA2) at 2GHz would produce 6.144 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 2.6GHz would produce 7.987 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 3GHz would produce 9.216 TFLOPs.

The RTX 3060 Max-Q has 3840 CUDA cores at 1283 MHz (official boost), which means 9.853 TFLOPs, but that doesn't represent the actual gaming performance.
The RX 6700 XT 12GB (13.215 TFLOPs) has the same performance as the RTX 3060 Ti 8GB (16.197 TFLOPs), Link.
Ampere needs ~23% more TFLOPs for similar performance.
An RDNA3 IGP would need ~8 TFLOPs to equal the RTX 3060 Max-Q (I'm not taking into account any architectural improvements of RDNA3 or bandwidth bottlenecks).
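
The arithmetic above can be reproduced with a short sketch. It assumes the usual peak-FP32 rule of thumb (shaders x 2 FLOPs per clock x clock speed) and the leaked, unconfirmed figure of 256 shaders per RDNA3 WGP; the function and variable names are only illustrative.

```python
def fp32_tflops(shaders: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPs, assuming one FMA (2 FLOPs) per shader per clock."""
    return shaders * 2 * clock_ghz / 1000.0


# Assumption from the leaks quoted in the thread, not a confirmed spec.
SHADERS_PER_RDNA3_WGP = 256
PHOENIX_WGPS = 6

rtx_3060_max_q = fp32_tflops(3840, 1.283)

# Ampere vs RDNA2 at equal gaming performance (RTX 3060 Ti vs RX 6700 XT).
ampere_ratio = 16.197 / 13.215

# TFLOPs an RDNA part would need to match the Max-Q under that ratio.
rdna_equivalent = rtx_3060_max_q / ampere_ratio

for clock in (2.0, 2.6, 3.0):
    tflops = fp32_tflops(PHOENIX_WGPS * SHADERS_PER_RDNA3_WGP, clock)
    print(f"6 WGP @ {clock} GHz: {tflops:.3f} TFLOPs")

print(f"RTX 3060 Max-Q: {rtx_3060_max_q:.3f} TFLOPs")
print(f"Ampere needs {100 * (ampere_ratio - 1):.0f}% more TFLOPs")
print(f"RDNA TFLOPs needed for parity: {rdna_equivalent:.2f}")
```

Like the post itself, this ignores architectural changes and memory bandwidth, so it only says something about raw shader throughput.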

Glo. · May 13, 2022

TESKATLIPOKA said:
A 192 EU IGP in Meteor Lake will be nowhere near an RTX 3060 Max-Q (60W).
Just check out the review you posted.

AMD Radeon 680M: Die Ryzen-6000-iGPU im Test: Benchmarks der Radeon 680M in Spielen

AMD Radeon 680M im Test: Benchmarks der Radeon 680M in Spielen / Leistung in FHD mit mittleren Details / Leistung in FHD mit hohen Details

www.computerbase.de

Just by doubling the EUs you won't even outperform a GTX 1650.

The thing is: Iris Xe has a different front end than DG2, which already changes the throughput of those ALUs; secondly, Xe is heavily, like HEAVILY, memory bandwidth starved; and thirdly, Meteor Lake will use Battlemage architecture tiles for the iGPU.

So it will be a completely different architecture.

I personally do not expect 3060 performance levels. But I would not be surprised by GTX 1660 Ti performance from 192 EU MTL-S SOCs.

That's plenty for an entry-level GPU.

Mopetar · May 13, 2022

nicalandia said:
Yeah, good luck to Intel beating AMD RDNA2 Level performance. Intel can't even get their Software/Drivers right.

Intel Arc A-Series Graphics Cards Delayed: Company Confirms Citing Software Readiness, Launch Rescheduled For Late Summer 2022

Intel's Lisa Pearce has confirmed in a blog post that their Arc A-Series Graphics Cards have been delayed due to poor software readiness.

wccftech.com

Raja never had a good history with drivers.

Anhiel · May 13, 2022

TESKATLIPOKA said:
Just by doubling the EUs you won't even outperform a GTX 1650.

Wrong.
The Core i7 12700H uses UHD Graphics 770, which is Xe-LP.
MTL uses Xe-HPG. Check the details yourself if you don't believe me.

TESKATLIPOKA said:
Based on the leaks:
RDNA2: 1WGP = 2CU
RDNA3: 1WGP = 4CU

Wrong.
RDNA2: 1WGP = 2CU = 4 SIMD32 = 128 shaders
RDNA3: 1WGP = 2CU = 8 SIMD32 = 256 shaders

TESKATLIPOKA said:
Phoenix certainly won't have a 12WGP (48CU) IGP with 3072 shaders in total, but only half of that.
RDNA3 6WGP(24CU RDNA2) at 2GHz would produce 6.144 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 2.6GHz would produce 7.987 TFLOPs.
RDNA3 6WGP(24CU RDNA2) at 3GHz would produce 9.216 TFLOPs.

See my references at the bottom of the post. You literally confirmed the calculations.

mikk · May 13, 2022

TESKATLIPOKA said:
A 192 EU IGP in Meteor Lake will be nowhere near an RTX 3060 Max-Q (60W).
Just check out the review you posted.

AMD Radeon 680M: Die Ryzen-6000-iGPU im Test: Benchmarks der Radeon 680M in Spielen

AMD Radeon 680M im Test: Benchmarks der Radeon 680M in Spielen / Leistung in FHD mit mittleren Details / Leistung in FHD mit hohen Details

www.computerbase.de

Just by doubling the EUs you won't even outperform a GTX 1650.

It's not only an EU doubling, it's also a major GPU clock speed improvement over the first Xe version on Intel's 10nm/7. The Xe LP design was limited to ~1.3-1.5 GHz (12700H boosts up to 1.40 GHz). Xe HPG on TSMC 6nm can do 2.4+ GHz on DG2-128 or DG2-512.
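
To put a rough number on that, here is a minimal sketch that scales theoretical throughput by EU count times clock. It assumes a 96 EU Xe LP baseline at the 12700H's 1.40 GHz boost and a hypothetical 192 EU part reaching the 2.4 GHz seen on DG2; neither the Meteor Lake clock nor linear scaling is confirmed, and memory bandwidth is ignored entirely.

```python
def relative_throughput(eus: int, clock_ghz: float,
                        base_eus: int, base_clock_ghz: float) -> float:
    """Theoretical throughput ratio, assuming it scales with EUs x clock."""
    return (eus * clock_ghz) / (base_eus * base_clock_ghz)


XE_LP_EUS = 96           # baseline Xe LP configuration
XE_LP_CLOCK_GHZ = 1.40   # 12700H boost clock quoted in the post
MTL_EUS = 192            # rumoured EU count, not confirmed
MTL_CLOCK_GHZ = 2.4      # assumption: same clock as DG2 on TSMC 6nm

eu_only = relative_throughput(MTL_EUS, XE_LP_CLOCK_GHZ,
                              XE_LP_EUS, XE_LP_CLOCK_GHZ)
eu_and_clock = relative_throughput(MTL_EUS, MTL_CLOCK_GHZ,
                                   XE_LP_EUS, XE_LP_CLOCK_GHZ)

print(f"EU doubling alone:       {eu_only:.2f}x")
print(f"EU doubling plus clocks: {eu_and_clock:.2f}x")
```

Real games will land below the second figure whenever the iGPU is bandwidth limited.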

Xe LP performs surprisingly well, by the way, if it's using either LPDDR4x-4266 or DDR5-4800. In several reviews Rembrandt-H only performs about 50-60% better, and Vega 8 looks hopeless. Xe LP was heavily DDR4 bandwidth starved, and Xe HPG will be more efficient (64 EUs instead of 96 EUs per slice helps feed the EUs better).

Glo. said:
The thing is: Iris Xe has a different front end than DG2, which already changes the throughput of those ALUs; secondly, Xe is heavily, like HEAVILY, memory bandwidth starved; and thirdly, Meteor Lake will use Battlemage architecture tiles for the iGPU.

MTL is using Xe HPG Core, same as DG2. Battlemage graphics comes in Lunar Lake.

jamescox · May 13, 2022

moinmoin said:
It's still true today. Yes, AMD is more flexible using its different multi chips approaches. But those designs still needed to be planned well beforehand to reach that point.

I'm starting to think Raphael-H alias Dragon Range is the APU gone chiplet which will see further mobile oriented optimizations in later gens. I still doubt it will replace lower cost monolithic APUs though.

Yeah, they still need to be planned, but that flexibility can allow for a lot of different parts that they didn’t plan for 3 to 5 years ago. I suspect that the v-cache chiplet was designed almost completely for Milan-x, but it is very simple to make a desktop product based on it, if needed. AM4 was supposed to be EOL in like 2020, but with Covid, really expensive DDR5, and whatever else, we got the 5800X3D.

The low end will still be a monolithic die, but with the power consumption improvements of 5 nm and improved design, a chiplet based mobile part is doable. A stacked mobile part would be better and a stacked mobile part with HBM or other on package memory would be the best. There have been some rumors of an APU with really high clocked memory. I don’t think you would see memory modules clocked that high in such a system, so I am thinking that might be on package memory. I am not sure how they would do a chiplet based solution though. It seems like the desktop IO die will have a small GPU, not suitable to a high-end mobile APU. Perhaps a stacked GPU chiplet on top of the IO die?

TESKATLIPOKA · May 14, 2022

Anhiel said:
Wrong.
The Core i7 12700H uses UHD Graphics 770, which is Xe-LP.
MTL uses Xe-HPG. Check the details yourself if you don't believe me.

You wrote that a 192 EU Meteor Lake IGP will be within striking range of the RTX 3060M.
I just wrote that by doubling UHD Graphics 770 you won't even get to the level of a desktop GTX 1650 and didn't expand on it, so I will do it now.

RTX 3060M is 4.5x faster than UHD 770.
Meteor Lake IGP would need to be ~4x faster than UHD 770 to be close enough to this Ampere.

Anhiel said:
Wrong.
RDNA2: 1WGP = 2CU = 4 SIMD32 = 128 shaders
RDNA3: 1WGP = 2CU = 8 SIMD32 = 256 shaders

RDNA2: 1WGP = 2CU = 128 shaders
RDNA3: 1WGP = 4CU = 256 shaders
I honestly don't care if an RDNA3 WGP will have 2x more CUs or one CU will have 2x more shaders. From a TFLOPs perspective, it's the same thing for me.

Anhiel said:
See my references at the bottom of the post. You literally confirmed the calculations.

I saw it the first time, and my point was that Phoenix won't have an IGP with 8-12 WGPs (2048-3072 shaders), just one with 6 WGPs (1536 shaders), based on leaks.
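
A minimal sketch of why the two naming schemes don't matter for the totals: it assumes 64 shaders per CU under the "4 CU per WGP" reading and 128 shaders per CU under the "2 CU per WGP" reading, both taken from the leaks discussed here rather than from any confirmed spec.

```python
def shaders_per_wgp(cus_per_wgp: int, shaders_per_cu: int) -> int:
    """Shader count of one WGP for a given CU layout."""
    return cus_per_wgp * shaders_per_cu


# Two ways of describing the same rumoured RDNA3 WGP.
four_cu_view = shaders_per_wgp(cus_per_wgp=4, shaders_per_cu=64)
two_cu_view = shaders_per_wgp(cus_per_wgp=2, shaders_per_cu=128)

# Total shaders for the WGP counts mentioned in the thread.
totals = {wgps: wgps * four_cu_view for wgps in (6, 8, 12)}

print(f"Shaders per WGP: {four_cu_view} (4 CU view), {two_cu_view} (2 CU view)")
for wgps, shaders in totals.items():
    print(f"{wgps} WGPs -> {shaders} shaders")
```

Either way a 6 WGP IGP ends up with 1536 shaders, so the TFLOPs figures are unaffected.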

soresu · May 14, 2022

TESKATLIPOKA said:
I honestly don't care if an RDNA3 WGP will have 2x more CUs or one CU will have 2x more shaders. From a TFLOPs perspective, it's the same thing for me.

I'm inclined to think that CU nomenclature will simply cease to exist with RDNA3 in favor of only counting WGPs, much as CCXs have with Zen3 in favour of CCDs.

Simplification of PR always seems the better road to travel, even if it introduces some short term confusion as it is likely to for RDNA2 -> RDNA3.

It's confusing enough as it is when you have ALUs, CUs, WGPs, SIMDs and Shader Groups to account for, to say nothing of TDP, mm2, the number of transistors and who knows what else I'm probably missing.

🤣

Ajay · May 14, 2022

Mopetar said:
Raja never had a good history with drivers.

He’s not in charge of the driver team. He has a direct report who is. The problem is getting Intel's driver team on level with the more experienced teams at AMD and NV. AMD managed the change when they dumped most of their US team and built up a team in China. AMD did start with a better game oriented driver stack from the get go, however.

Mopetar · May 14, 2022

Was more of a comment that during his time at AMD during the release of Vega a lot of the performance shortcoming was said to be due to drivers that would eventually materialize.

I'm sure that the driver team for Intel will be back from their trip to the corner store for cigarettes any minute now. Maybe they ran into the Vega team there.

soresu · May 14, 2022

Mopetar said:
Was more of a comment that during his time at AMD during the release of Vega a lot of the performance shortcoming was said to be due to drivers that would eventually materialize.

Pretty sure that the µArch itself was b0rked in the 16nm implementation - even the 7nm spin still didn't have NGG working properly, tho exactly how much of that was µArch and how much was drivers I have no idea.

DrMrLordX · May 14, 2022

soresu said:
even the 7nm spin still didn't have NGG working properly, tho exactly how much of that was µArch and how much was drivers I have no idea.

Allegedly it was always a hardware problem. I don't think AMD bothered fixing NGG for Vega20 since it was mostly an enterprise product anyway.

(typing this on my Radeon VII-equipped machine for extra lulz)

izaic3 · May 15, 2022

So to sum up the leaks so far, Phoenix will have ~20% ipc increase, and ~100% gpu capability increase over Rembrandt?

Gideon · May 16, 2022

izaic3 said:
So to sum up the leaks so far, Phoenix will have ~20% ipc increase, and ~100% gpu capability increase over Rembrandt?

100% more GPU units. Considering the memory subsystem and bandwidth remain mostly the same, I don't believe this will correspond to anywhere near a 100% uplift in performance. That could only hold in general if it also had Infinity Cache (which I doubt, for die-area reasons). It would still be a hefty increase for sure, and ALU-bound stuff will get the 2x perf uplift.

And speaking of the +20% IPC, it sure would be nice, but I'm a bit sceptical considering how little the architecture has changed (that we know of). It's easily doable when we also count AVX-512, but I'd like to see more evidence before I believe it will hold in most general workloads.

leoneazzurro · May 16, 2022

So far the rumor for Phoenix is +100% GPU units, revised architecture and higher clocks. So on the math side the GPU should have more than 100% uplift. Which will of course not translate into a 100% FPS uplift in every case.

LightningZ71 · May 16, 2022

While it won't double, we do expect that the memory bandwidth will improve with higher clocked DRAM. In addition, the doubled L2 caches, coupled with continuing the policy of exclusive caches, should help slightly reduce memory bus contention. In addition, RDNA3 is supposed to also bring larger internal GPU caches outside of any infinity cache present. All of that should translate to a substantial GPU performance uplift.

NTMBK · May 16, 2022

LightningZ71 said:
While it won't double, we do expect that the memory bandwidth will improve with higher clocked DRAM. In addition, the doubled L2 caches, coupled with continuing the policy of exclusive caches, should help slightly reduce memory bus contention. In addition, RDNA3 is supposed to also bring larger internal GPU caches outside of any infinity cache present. All of that should translate to a substantial GPU performance uplift.

Sorry, is that the CPU or GPU L2 that is getting doubled?

rainy · May 16, 2022

NTMBK said:
Sorry, is that the CPU or GPU L2 that is getting doubled?

Zen 4 core will have 1 MB of L2.

Btw, RDNA 3 would have bigger L2 as well.

Glo. · May 16, 2022

NTMBK said:
Sorry, is that the CPU or GPU L2 that is getting doubled?

4 MB of L2 for 1536 ALUs in PHX, vs 2 MB of L2 for 768 ALUs in RMB.
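
A quick sketch of what those two figures imply per ALU, using only the numbers above (the helper name is made up for illustration):

```python
def l2_kib_per_alu(l2_mib: float, alus: int) -> float:
    """L2 capacity available per ALU, in KiB."""
    return l2_mib * 1024 / alus


phx = l2_kib_per_alu(4, 1536)  # rumoured Phoenix figures
rmb = l2_kib_per_alu(2, 768)   # Rembrandt figures

print(f"PHX: {phx:.2f} KiB of L2 per ALU")
print(f"RMB: {rmb:.2f} KiB of L2 per ALU")
```

So on these numbers the L2 scales with the ALU count rather than growing relative to it.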

Shivansps · May 16, 2022

RDNA3 may be more memory efficient. I would find it very hard to believe they would do a 100% IGP increase without any changes to the mem subsystem.

LightningZ71 · May 16, 2022

I also suspect that the RDNA3 iGPU will last for a couple of generations, meaning that it may be “forward looking” with respect to DDR5 memory bandwidth. In other words, it may be moderately memory starved in Phoenix, but may be more performant in later generations with even higher clocks and faster bins of memory.

NTMBK · May 16, 2022

rainy said:
Zen 4 core will have 1 MB of L2.

Btw, RDNA 3 would have bigger L2 as well.

And the CPU L2 increase is definitely coming to APUs? We've seen different cache sizes between server and APU designs before.

maddie · May 16, 2022

Shivansps said:
RDNA3 may be more memory efficient. I would find it very hard to believe they would do a 100% IGP increase without any changes to the mem subsystem.

It would have to be if the Navi 3 rumors are true. I also agree with you that wasting area (surplus shaders) is unbelievable.

Glo. · May 16, 2022

Shivansps said:
RDNA3 may be more memory efficient. I would find it very hard to believe they would do a 100% IGP increase without any changes to the mem subsystem.

GCN to RDNA1 resulted in better memory bandwidth efficiency. RDNA2 amplified it by using Infinity Cache. Based on speculation on the web, RDNA3 is also supposed to do more with less, similarly to the jump from GCN to RDNA1.

So yeah, people expecting only 60% performance increase from 100% ALU increase, 100% L2 cache increase, 33% core clocks increase, and potentially 20-25% "IPC" of the ALUs increase are a bit pessimistic.

We should see AT LEAST GTX 1660 Ti desktop performance from Phoenix in the highest-end SKU, because of the maths that we know about or can make educated guesses on. What we don't know is how the 25% per-ALU performance increase on Navi 33 would be achieved. What changes has RDNA3 undergone compared to RDNA1 and 2?

And if it's inherently part of the throughput of the ALUs, Phoenix's iGPU should be around RTX 2060 desktop in performance, even without the Infinity Cache.
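
As a rough sanity check on the figures listed above, here is a sketch that simply multiplies the rumoured gains together. It assumes the ALU count, clock and per-ALU gains compound multiplicatively and leaves the L2 increase out, since cache size is not a direct throughput multiplier; all inputs are rumours, and the result is a theoretical ceiling, not an FPS prediction.

```python
def compound_uplift(*gains_percent: float) -> float:
    """Multiply independent percentage gains into one overall factor."""
    factor = 1.0
    for gain in gains_percent:
        factor *= 1.0 + gain / 100.0
    return factor


# +100% ALUs, +33% core clock, +20% to +25% per-ALU "IPC" (all rumoured).
low = compound_uplift(100, 33, 20)
high = compound_uplift(100, 33, 25)

print(f"Theoretical throughput vs Rembrandt: {low:.2f}x to {high:.2f}x")
```

Memory bandwidth is what decides how much of that ceiling shows up in games.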

rainy · May 16, 2022

NTMBK said:
And the CPU L2 increase is definitely coming to APUs? We've seen different cache sizes between server and APU designs before.

L3 was indeed smaller for desktop/mobile APUs, however L2 was the same (512 KB per core) for all segments with Zen 3/Zen 2.

https://en.wikichip.org/wiki/amd/microarchitectures/zen_3
https://en.wikichip.org/wiki/amd/microarchitectures/zen_2
