Question AMD Phoenix/Zen 4 APU Speculation and Discussion

uzzi38 · Apr 28, 2022

I can finally make this thread.

https://twitter.com/x/status/1519669375283957760

Phoenix is indeed RDNA3. My advice to everyone: treat the old APU rumours as being out of date.

TESKATLIPOKA · Dec 28, 2022

BorisTheBlade82 said:
@TESKATLIPOKA
For this to work AMD would need to unify cache access for CPU and GPU first. Intel is moving into the opposite direction because of disaggregation. My guess is that AMD will eventually follow suit with their Mobile lineup.

IMHO adding a unified cache by 3D stacking over/under multiple chiplets is cost prohibitive for the nearer future. Doing this with 2.5D would be rather expensive as you need a lot of bandwidth - so EFB or InFo-R to each consuming chiplet would be needed in order to keep consumption in check.
Not saying that it won't happen - but there is quite some way to go.

Ok, then maybe 3d stacking is too costly and out of the equation.
I think AMD will use a unified cache in the future.
It could look like this:
1 or 2 CPU chiplets, not sure If they won't separate big little cores sot hey can use different libraries, although for N3 It supposedly no longer matters.
1 cache chiplet, 1 IGP chiplet and 1 IO chiplet which could be also part of cache chiplet.

BorisTheBlade82 · Dec 28, 2022

TESKATLIPOKA said:
Ok, then maybe 3d stacking is too costly and out of the equation.
I think AMD will use a unified cache in the future.
It could look like this:
1 or 2 CPU chiplets, not sure If they won't separate big little cores sot hey can use different libraries, although for N3 It supposedly no longer matters.
1 cache chiplet, 1 IGP chiplet and 1 IO chiplet which could be also part of cache chiplet.

Yep, this is pretty much my line of thinking as well. An IOD with stacked cache connected to the CPU/GPU chiplets with a high bandwidth Interconnect just like RDNA3.
The thing is: For this to work you would need a >1 TByte/s connection to each chiplet, which is more than all the MCD to GCD connections of RDNA3 in total.
That is the beauty of the current design: Because of the big L3 cache on the CCD you can compensate the relatively low bandwidth of the IFoP.

Glo. · Dec 28, 2022

TESKATLIPOKA said:
Low end dGPUs won't become irrelevant, even if Phoenix would perform like them or better. You should realize Intel has a weak IGP, and they sell much larger volumes than AMD.

Once the iGPUs have enough memory bandwidth to compete even with 107 dies from Nvidia, and this goes for both: AMD and Intel - its game over for entry level dGPUs.

People who have budget of 250$ on dGPU, because that will be the minimum cost of 107 dies, from this moment on, will have a choice.

Either they buy miniPC with everything integrated, or building separate PC, with... god knows how high budget.

69$ - case, 69$ - PSU, dGPU - 250$, CPU - minimum 120$, you already are in 500$ price range.

Minisforum UM690 with 6900HX, 16 GB RAM, and 512 GB SSD - 650$, total, and it comes with Windows 11 Pro.

Lets say that future generations of MiniPCs will increase performance exponentially, and we will get either Strix Point, or Arrow Lake-p NUC, both with the same iGPU performance - RX 6600-RTX 3060.

What incentive do you guys see for building your own PC?

MiniPC will be cheaper to buy, more efficient.

And yes, I wish we would get 256 bit memory channels on mainstream CPUs. 8000 MHz 256 bit DDR5 - 256 GB/s bandwidth.

Glo. · Dec 28, 2022

TESKATLIPOKA said:
Ok, then maybe 3d stacking is too costly and out of the equation.
I think AMD will use a unified cache in the future.
It could look like this:
1 or 2 CPU chiplets, not sure If they won't separate big little cores sot hey can use different libraries, although for N3 It supposedly no longer matters.
1 cache chiplet, 1 IGP chiplet and 1 IO chiplet which could be also part of cache chiplet.

I believe that Chiplets, cache chiplets will work similarly to Navi 31.

Cache+ memory controller. That way - you design monolithic die for CPU and GPU, and move caches outside the logic dies. And you tick all of the boxes.

BorisTheBlade82 · Dec 28, 2022

Glo. said:
I believe that Chiplets, cache chiplets will work similarly to Navi 31.

Cache+ memory controller. That way - you design monolithic die for CPU and GPU, and move caches outside the logic dies. And you tick all of the boxes.

As I wrote before: Moving Caches out of (Graphics) Compute Dies poses big challenges: The sheer bandwidth needed is nothing to sneeze at. It is much simpler to disintegrate logic including caches than separating both.

Glo. · Dec 28, 2022

BorisTheBlade82 said:
As I wrote before: Moving Caches out of (Graphics) Compute Dies poses big challenges: The sheer bandwidth needed is nothing to sneeze at. It is much simpler to disintegrate logic including caches than separating both.

Nobody said anything about moving caches out of the die

. I did used that phrase, but in the context of what I have said, before that phrase it should be obvious to what I point.

It will literally work the same way as in RDNA3. As a bandwidth accelerator for the Logic. CPU will still have 16 MBs of L3 cache, and iGPU will still have L2 cache. Its Infinity Cache that will be on the Chiplets, alongside the Memory Controllers.

TESKATLIPOKA · Dec 28, 2022

Glo. said:
Lets say that future generations of MiniPCs will increase performance exponentially, and we will get either Strix Point, or Arrow Lake-p NUC, both with the same iGPU performance - RX 6600-RTX 3060.

That performance range is pretty wide. How many points in Time Spy graphic?
RTX 3060 Laptop is 6285-9235 points in Time Spy graphic.

Glo. · Dec 28, 2022

TESKATLIPOKA said:
That performance range is pretty wide. How many points in Time Spy graphic?
RTX 3060 Laptop is 6285-9235 points in Time Spy graphic.

I think safe, and reasonable, target is around 6000-6500 pts in TS.

TESKATLIPOKA · Dec 28, 2022

Glo. said:
I think safe, and reasonable, target is around 6000-6500 pts in TS.

So that's about RTX 3060 65W level of performance.
Truthfully, that's a lot.

If phoenix somehow manages 4000 points, then Strix Point would need to achieve 50-63% higher performance.
I think 16CU IGP for Strix Point is most likely and that's only 33% more CU, so frequency needs to be 13-23% higher.
Because scaling is not linear, you will need even higher frequency to compensate for this all.
And all this within 45W? I am pretty skeptical.

BorisTheBlade82 · Dec 28, 2022

Glo. said:
Nobody said anything about moving caches out of the die . I did used that phrase, but in the context of what I have said, before that phrase it should be obvious to what I point.

It will literally work the same way as in RDNA3. As a bandwidth accelerator for the Logic. CPU will still have 16 MBs of L3 cache, and iGPU will still have L2 cache. Its Infinity Cache that will be on the Chiplets, alongside the Memory Controllers.

"Cache+ memory controller. That way - you design monolithic die for CPU and GPU, and move caches outside the logic dies."

What the heck are you even writing about?

Glo. · Dec 28, 2022

TESKATLIPOKA said:
So that's about RTX 3060 65W level of performance.
Truthfully, that's a lot.

If phoenix somehow manages 4000 points, then Strix Point would need to achieve 50-63% higher performance.
I think 16CU IGP for Strix Point is most likely and that's only 33% more CU, so frequency needs to be 13-23% higher.
Because scaling is not linear, you will need even higher frequency to compensate for this all.
And all this within 45W? I am pretty skeptical.

Im not, because Strix Point will have Infinity Cache.

Secondly, the CU configuration may not be correct after all, at least for initial rumors.

But lets see what will happen on this front.

BorisTheBlade82 said:
"Cache+ memory controller. That way - you design monolithic die for CPU and GPU, and move caches outside the logic dies."

What the heck are you even writing about?

Infinity Cache. Isn't it obvious? Its funny that you completely missed this:

Glo. said:
I believe that Chiplets, cache chiplets will work similarly to Navi 31.

In the same post that you quoted.

moinmoin · Dec 28, 2022

FlameTail said:
Phoenix has an iGPU so powerful it renders low-end dGPUs irrelevant.

Previous mobile APU gens did that as well. OEMs still often coupled high-end APUs with unnecessary low-end dGPUs since they figured such products sell better. Only once that changes we may see a demand for and a development toward Apple M series style x86 laptops. Personally I'm not holding my breath.

TESKATLIPOKA said:
I forgot you can have only 2x more cache, so It could be 16MB+32MB or 32MB+64MB depending on how much you have inside die.

Is that an actual limit? That's news to me.

TESKATLIPOKA · Dec 28, 2022

Glo. said:
Im not, because Strix Point will have Infinity Cache.

Secondly, the CU configuration may not be correct after all, at least for initial rumors.

But lets see what will happen on this front.

If there will be 20CU(+67%), then that's much better, and you won't need higher clocks or at least not for shaders.

We don't know how much IGP performance will Phoenix lose because of low BW.
Adding IC to Strix Point will certainly help a lot, but first It needs to compensate the increased TFLOPs moving from Phoenix -> Strix Point, only after that It can compensate for the missing performance in Phoenix.
In my opinion, you would need at least ~32MB Infinity Cache to fully compensate for using only LPDDR5 memory.

Glo. · Dec 28, 2022

TESKATLIPOKA said:
If there will be 20CU(+67%), then that's much better, and you won't need higher clocks or at least not for shaders.

We don't know how much IGP performance will Phoenix lose because of low BW.
Adding IC to Strix Point will certainly help a lot, but first It needs to compensate the increased TFLOPs moving from Phoenix -> Strix Point, only after that It can compensate for the missing performance in Phoenix.
In my opinion, you would need at least ~32MB Infinity Cache to fully compensate for using only LPDDR5 memory.

If AMD will increase the number of CUs in Strix Point, compared to Phoenix - they will go to either for 16 or 24 CUs.

And 32 MB of IC - thats the most realistic IC cache size that we will get with Strix Point.

And don't worry about the clock speeds, I expect that SP will go to 3 GHz on Shaders.

TESKATLIPOKA · Dec 28, 2022

moinmoin said:
Is that an actual limit? That's news to me.

5800X3D had 64MB over 32MB L3 cache.
It looks like 3d cache die is not the same die size as L3 cache in CCD, but I don't think It's the best idea being on top of cores.
3d cache die is 41mm2(Link) and 32MB L3 cache is ~36mm2 as shown below.

TESKATLIPOKA · Dec 28, 2022

Glo. said:
If AMD will increase the number of CUs in Strix Point, compared to Phoenix - they will go to either for 16 or 24 CUs.

And 32 MB of IC - thats the most realistic IC cache size that we will get with Strix Point.

And don't worry about the clock speeds, I expect that SP will go to 3 GHz on Shaders.

There is no reason why It couldn't be 20CU (10 WGP). If they wanted, It could be also 18 or 22CUs If there is only one shader engine.

I am not worried about clocks, I am worried If 45W is enough to feed this + the CPU portion.

Glo. · Dec 28, 2022

TESKATLIPOKA said:
There is no reason why It couldn't be 20CU (10 WGP). If they wanted, It could be also 18 or 22CUs depending on how much die space is available.

I am not worried about clocks, I am worried If 45W is enough to feed this + the CPU portion.

To be fair, true, 10 WGP is also possible configuration.

BorisTheBlade82 · Dec 28, 2022

Glo. said:
Im not, because Strix Point will have Infinity Cache.

Secondly, the CU configuration may not be correct after all, at least for initial rumors.

But lets see what will happen on this front.

Infinity Cache. Isn't it obvious? Its funny that you completely missed this:
In the same post that you quoted.

I did not miss this. You seem to believe, that Infinity Cache is some magical thing. I can tell you, it is not. The same constraints and limits I already described do apply for this as well. It is just a nice marketing name for another ordinary kind of cache.

moinmoin · Dec 28, 2022

TESKATLIPOKA said:
5800X3D had 64MB over 32MB L3 cache.
It looks like 3d cache die is not the same die size as L3 cache in CCD, but I don't think It's the best idea being on top of cores.
3d cache die is 41mm2(Link) and 32MB L3 cache is ~36mm2 as shown below.

Early on there were rumors (and a screenshot of server BIOS settings) indicating multiple cache layers are possible which would multiply the amount added respectively. Furthermore it's a matter of planning what the cache die can cover. If a theoretical mobile APU is designed to run cold and at low frequency to begin with a bigger cache die could be both possible as well as a performance and efficiency boost. And the industry aught to be looking already at more efficient ways to cool within and between dies. But that's just me fantasizing, I just don't think there's a technical hard size limit for cache dies.

uzzi38 · Dec 28, 2022

FlameTail said:
Instead of messing around with sod infinity caches or expensive DDR5 modules,
Why can't AMD simply put a larger memory controller in Phoenix Point? What is stopping them? Apple already does this with their M chips and the results are marvellous.

256-bit or even 192-bit would suffice. Pair that with LPDDR5 (which i am sure is cheaper than standard DDR5, and you get plenty of bandwidth).

OEMs didn't want it before Apple did the thing. 256b is more expensive and less flexible than adding in a seperate dGPU. And at the end of the day, there's no point in producing chips that nobody wants.

That has changed now though, so it's only a matter of time...

poke01 · Dec 28, 2022

uzzi38 said:
OEMs didn't want it before Apple did the thing. 256b is more expensive and less flexible than adding in a seperate dGPU. And at the end of the day, there's no point in producing chips that nobody wants.

That has changed now though, so it's only a matter of time...

I would love a Zen 4/RDNA 3 based SoC. It will be very efficient and fast.

TESKATLIPOKA · Dec 29, 2022

poke01 said:
I would love a Zen 4/RDNA 3 based SoC. It will be very efficient and fast.

Then you are a lucky guy. Phoenix is what you want.

Glo. · Dec 29, 2022

uzzi38 said:
OEMs didn't want it before Apple did the thing. 256b is more expensive and less flexible than adding in a seperate dGPU. And at the end of the day, there's no point in producing chips that nobody wants.

That has changed now though, so it's only a matter of time...

The biggest problem for 256 bit memory controller is actually... the power draw.

Power consumption of mobile SOCs with 256 bit bus will go from 35-45W TDPs to 65W minimum.

uzzi38 · Dec 29, 2022

poke01 said:
I would love a Zen 4/RDNA 3 based SoC. It will be very efficient and fast.

Uh maybe not that soon for >128b

Kaluan · Dec 29, 2022

TESKATLIPOKA said:
If that leak was about gaming frequency, then the difference in clock speed will be larger, but that's something we don't know yet.

Well, that was that was my whole point, can't compare game clocks to boost clocks (the former are more realistic in mobile setups anyway) or cut die (6600M) to full die (7600M/7700M?)
(again, under similar TGP config, less silicon needed to be powered up likely means higher clocks)
Didn't say your numbers are wrong, I said the reasoning is. Apples to oranges IMO.
That would also more easily explain the performance rumour/projection.
But we still don't know for sure if that leaker was talking about mobile for sure lol

Exist50 said:
Sure, but by the time you can actually buy Phoenix, Ada-based GPUs will probably be in the market, so that's still only the low end of the stack. That's worth something, for sure, but when we start talking about even larger GPUs and more memory channels, it's a whole 'nother ball game.

AMD would have to design a new chip/platform, which is far from cheap both for them and OEMs. There are certainly some potential advantages, but it's a very tricky product to position.

And by the time we get Strix Point (CES 2024?) It'll only have a modest Lovelace refresh at best to contend with. Intel Battlemage being a wildcard.

Either way, I don't think people can say the rate of progress of IGPs hasn't dented the perception of low end dGPUs in the market. Especially on the laptop side.

Seems the thread's going some interesting speculative places 😂
Less than a week away and we can move that to a new Strix Point/Zen5/RDNA3+ one

Question AMD Phoenix/Zen 4 APU Speculation and Discussion

Platinum Member

Platinum Member

Senior member

Diamond Member

Diamond Member

Senior member

Diamond Member

Platinum Member

Diamond Member

Platinum Member

Senior member

Diamond Member

Diamond Member

Platinum Member

Diamond Member

Platinum Member

Platinum Member

Diamond Member

Senior member

Diamond Member

Platinum Member

Diamond Member

Platinum Member

Diamond Member

Platinum Member

Senior member