Discussion AMD’s Custom APU Discussion Thread

Tigerick · Jul 26, 2022

Next Gen XBox Consoles and Handheld Preliminary Specs

	ROG Xbox Ally / SteamDeck OLED	ROG Xbox Ally X	Xbox Series X	PS5 Pro	Magnus
Date	2025	2025	2020	2024
Codename	Aerith Plus	Strix Point		Custom
Model	Z2A	Z2 Extreme AI
Process				N4P	N3P + N3P ?
Die size				279 mm2	144 + 264 = 408 mm2
CPU	4 x Zen 2	4 x Zen 5 + 4 x Zen 5c	8 x Zen 2	8 x Zen 2	Zen 6
GPU	RDNA2 8CU	RDNA3.5 16CU	RDNA2 52CU	RDNA2 60CU	Navi5 68CU
Memory	128-bit LPDDR5-6400	128-bit LPDDR5x-8000	320-bit 10GB GDDR6 + 192-bit 6GB GDDR6	256-bit 16GB GDDR6	192-bit GDDR7
Memory Bandwidth	102 GB/s	120 GB/s	560 GB/s	576 GB/s	864 GB/s
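
A quick sanity check on the bandwidth row, as a minimal sketch: theoretical peak bandwidth is bus width times per-pin data rate divided by 8. The function names are made up for illustration, and the per-pin rates for the GDDR parts are not stated in the table; they are only back-calculated from the listed figures.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus width (bits) x per-pin rate (MT/s) / 8 bits per byte / 1000 MB per GB.
    """
    return bus_width_bits * data_rate_mtps / 8 / 1000


def implied_data_rate_gbps(bus_width_bits: int, bandwidth_gbs: float) -> float:
    """Per-pin rate in Gbps implied by a listed bus width and bandwidth."""
    return bandwidth_gbs * 8 / bus_width_bits


# LPDDR5 / LPDDR5x columns: the rate is given in the memory name.
print(peak_bandwidth_gbs(128, 6400))   # 128-bit LPDDR5-6400
print(peak_bandwidth_gbs(128, 8000))   # 128-bit LPDDR5x-8000

# GDDR columns: back out the per-pin rate from the table values.
print(implied_data_rate_gbps(320, 560))  # Xbox Series X, 320-bit pool
print(implied_data_rate_gbps(256, 576))  # PS5 Pro
print(implied_data_rate_gbps(192, 864))  # Magnus, 192-bit GDDR7
```

By this formula, 128-bit LPDDR5x-8000 works out to 128 GB/s rather than the 120 GB/s listed, and the Magnus figure would imply 36 Gbps GDDR7, so treat both numbers as preliminary.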

marees · Jul 18, 2025

Tigerick said:
MLID leaked details about Magnus, which could be the SoC for the upcoming PS6. The PS5 Pro's SoC is 279 mm2 on N4P; Magnus comes to a total of 408 mm2 on 3nm. That is a pretty big die, but kind of expected for a next-generation console.

Could be Xbox

https://twitter.com/x/status/1946062711730602372

MS_AT · Jul 18, 2025

3 big cores and 8 dense ones, looks weird.

marees · Jul 18, 2025

So the man said he used ChatGPT to conclude it is for Sony

It is ChatGPT vs @Kepler_L2 now

Kepler_L2 · Jul 18, 2025

MS_AT said:
3 big cores and 8 dense ones, looks weird.

Probably 10C/20T in the console, with one extra Zen6c for binning

basix · Jul 18, 2025

Or they simply had some space left to put another core on the die.

But 144 mm2 for the SoC part seems rather big. Are there additional memory PHYs on the SoC die? Or does the GPU die really have a 384-bit memory interface instead of 192-bit?

MS_AT · Jul 18, 2025

Kepler_L2 said:
Probably 10C/20T in the console, with one extra Zen6c for binning

So 2 + 8 or 3 + 7? The latter looks even weirder (sorry, if something is not a power of 2 it looks weird).

Still, for 12 cores I would expect a mix of 8 + 4 dense, with the 4 dense cores reserved for the OS/background stuff and the 8 performance cores fully available for games. Now the devs will need to do core-aware thread placement.

Kepler_L2 · Jul 18, 2025

basix said:
Or they simply had some space left to put another core on the die.

But 144 mm2 for the SoC part seems rather big. Are there additional memory PHYs on the SoC die? Or does the GPU die really have a 384-bit memory interface instead of 192-bit?

It looks like the GPU die is just Shader Engines + GPU Front-end, Memory controllers + MALL? And SoC is everything else (CPU, Display, Media, PCIe, NPU?)

ToTTenTranz · Jul 18, 2025

Kepler_L2 said:
It looks like the GPU die is just Shader Engines + GPU Front-end, Memory controllers + MALL? And SoC is everything else (CPU, Display, Media, PCIe, NPU?)

Does it make sense to put the NPU far from the GPU cores and caches? That way they can't use the NPU to offload FSR4 or other ML upscalers in games, for example.

Kepler_L2 · Jul 18, 2025

ToTTenTranz said:
Does it make sense to put the NPU far from the GPU cores and caches? That way they can't use the NPU to offload FSR4 or other ML upscalers in games, for example.

They aren't going to run upscalers on the NPU.

gdansk · Jul 18, 2025

Heterogeneous cores in a console. Running an operating system with a Microsoft-written scheduler.
That's a recipe for success.

marees · Jul 19, 2025

The speculation is that performance will be between the 5080 and the 4090/5080 Super.

Assuming this releases next November, what would the street price of the 5080 Super be at that point? I don't see the console selling below that.

As a sop to gamers, Microsoft could partner with AMD on an RDNA 5 based Z3 Extreme (?) for handhelds from various third-party partners.

Not sure if Microsoft will also partner with AMD on a Medusa Halo-like APU.

basix · Monday at 11:04 AM

Kepler_L2 said:
They aren't going to run upscalers on the NPU.

Wouldn't that be a really nice use case on APUs? XDNA2 supports FP8 (50 TFLOPS) and is much more powerful than the iGPU in this regard. It would give AMD's iGPUs a quite relevant competitive edge.

When I look at FSR4 Redstone and its upcoming features (Neural Radiance Caching, Ray Regeneration, ML-based frame generation, enhanced ML-based super resolution), all of them could potentially benefit from NPU acceleration. Even without RT, FG and SR alone would already make it worthwhile.
So instead of accepting a mediocre gaming experience or spending precious die area on a bigger GPU, leverage the NPU to amortize its cost, at least for gaming use cases. Currently, the NPU is dark silicon most of the time. If APU gaming gets added to the mix, few people would say no to that, and the added cost of the NPU is put to better use.

Kepler_L2 · Monday at 11:11 AM

basix said:
Wouldn't that be a really nice use case on APUs? XDNA2 supports FP8 (50 TFLOPS) and is much more powerful than the iGPU in this regard. It would give AMD's iGPUs a quite relevant competitive edge.

When I look at FSR4 Redstone and its upcoming features (Neural Radiance Caching, Ray Regeneration, ML-based frame generation, enhanced ML-based super resolution), all of them could potentially benefit from NPU acceleration. Even without RT, FG and SR alone would already make it worthwhile.
So instead of accepting a mediocre gaming experience or spending precious die area on a bigger GPU, leverage the NPU to amortize its cost, at least for gaming use cases. Currently, the NPU is dark silicon most of the time. If APU gaming gets added to the mix, few people would say no to that, and the added cost of the NPU is put to better use.

It's a lot of work to port GPU code to the NPU, the NPU isn't really that fast, and upscaling on it introduces an additional frame of input lag.

basix · Monday at 11:23 AM

I'm only talking about the matrix-accelerated part of the algorithm. Everything else would stay on the GPU. That would not add a single frame of latency.
Just look at the NPU as a "remote matrix core / accelerator", but since it is on-die, it is not as remote as it would be with PCIe or something else in between.

Sure, it would take effort, but it should be doable nevertheless. The prerequisite is that the matrix operations are cleanly separated from the rest of the algorithm. If they are intertwined with many other operations, it would become a mess.
We should be able to evaluate that when AMD releases the FSR4 code on GPUOpen.
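
To make the two positions concrete, here is a toy timing model, not a measurement of any real hardware. All function names and millisecond values are made up for illustration. It only contrasts running the DNN in-line between GPU stages of the same frame with handing a finished frame to the NPU while the GPU already works on the next one.

```python
def inline_offload_latency_ms(gpu_ms: float, dnn_ms: float, handoff_ms: float) -> float:
    """DNN runs between GPU stages of the same frame.

    Cost is GPU work + DNN work + two handoffs (GPU -> NPU and NPU -> GPU).
    """
    return gpu_ms + dnn_ms + 2 * handoff_ms


def pipelined_offload_latency_ms(
    gpu_ms: float, dnn_ms: float, handoff_ms: float, frame_time_ms: float
) -> float:
    """NPU processes frame N while the GPU renders frame N+1.

    Assumption of this toy model: the result is presented one frame
    interval later than in the in-line case.
    """
    return inline_offload_latency_ms(gpu_ms, dnn_ms, handoff_ms) + frame_time_ms


frame_time_ms = 1000 / 60  # 60 fps target, hypothetical
inline = inline_offload_latency_ms(gpu_ms=10.0, dnn_ms=2.0, handoff_ms=0.5)
pipelined = pipelined_offload_latency_ms(10.0, 2.0, 0.5, frame_time_ms)
print(f"in-line: {inline:.1f} ms, pipelined: {pipelined:.1f} ms")
```

In this model the in-line variant only fits if GPU work, DNN work and both handoffs together stay within the frame budget; whether a real GPU/NPU pair can do the handoffs that cheaply is exactly the open question.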

Kepler_L2 · Monday at 12:26 PM

basix said:
I'm only talking about the matrix-accelerated part of the algorithm. Everything else would stay on the GPU. That would not add a single frame of latency.
Just look at the NPU as a "remote matrix core / accelerator", but since it is on-die, it is not as remote as it would be with PCIe or something else in between.

Sure, it would take effort, but it should be doable nevertheless. The prerequisite is that the matrix operations are cleanly separated from the rest of the algorithm. If they are intertwined with many other operations, it would become a mess.
We should be able to evaluate that when AMD releases the FSR4 code on GPUOpen.

There is an extra frame of latency, just look at the Windows Upscaler using the X Elite NPU.

basix · Monday at 3:02 PM

As far as I know, that one functions completely differently compared to FSR or DLSS.

Edit:
Here is a presentation from ARM at Siggraph 2024, where they describe different ML approaches to upsampling. What they came up with was basically a super-charged FSR2 ("v3 - parameter prediction"):
- "Regular" Upsampling algorithm like FSR2+ or DLSS2+ (FSR2 was the basis in the ARM presentation)
- Filter weight / parameter prediction with a DNN showed best results ("v3" in the presentation; Microsoft's Auto SR is more of an image prediction like "v1" in the presentation). Pretty much what AMD and Nvidia are doing with FSR4 and DLSS2+

https://community.arm.com/cfs-file/__key/communityserver-blogs-components-weblogfiles/00-00-00-20-66/siggraph_5F00_mmg_5F00_2024_5F00_mobile_5F00_nss_2D00_LiamONeil_2D00_v7_2D00_speakers_5F00_notes.pdf

Now, if the DNN has clean interfaces (in the presentation, pre-compute and post-compute are executed on the GPU, with the DNN in between on an "accelerator"), it could be executed on any DNN accelerator hardware. That might be the matrix cores in the GPU, or it might be offloaded to an NPU.
I do not see any fundamental obstacle to that. Sure - datapaths, caches etc. need to support it, and some kind of API is needed to do that on a chip. But since it is possible to move data from CPU to GPU and vice versa, the same should be doable between GPU and NPU. Also, no added frames of latency.

In the presentation it is not clear what ARM has in mind (NPU or GPU acceleration). They just call it "AcceleratorTOPs".
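
A minimal sketch of that split, assuming the DNN really can be isolated behind a clean interface. Every name here is hypothetical and the "DNN" is a trivial stand-in; this is not FSR4, ARM's code, or any real NPU API. It only shows that the GPU-side stages do not need to know where the DNN runs.

```python
from typing import Callable, List, Sequence

# A backend takes DNN input features and returns predicted blend weights.
DnnBackend = Callable[[Sequence[float]], List[float]]


def pre_compute(low_res: Sequence[float]) -> List[float]:
    """GPU stage (hypothetical): normalize pixels into DNN input features."""
    return [p / 255.0 for p in low_res]


def dnn_on_gpu_matrix_cores(features: Sequence[float]) -> List[float]:
    """Stand-in for the DNN running on GPU matrix cores."""
    return [0.5 for _ in features]


def dnn_on_npu(features: Sequence[float]) -> List[float]:
    """Stand-in for the same DNN offloaded to an NPU (same math, other device)."""
    return [0.5 for _ in features]


def post_compute(low_res: Sequence[float], weights: Sequence[float]) -> List[float]:
    """GPU stage (hypothetical): 2x upsample using the predicted weights."""
    out: List[float] = []
    last = len(low_res) - 1
    for i, (p, w) in enumerate(zip(low_res, weights)):
        nxt = low_res[min(i + 1, last)]
        out.append(p)                      # original sample
        out.append((1 - w) * p + w * nxt)  # new sample, blended by weight
    return out


def upscale(low_res: Sequence[float], backend: DnnBackend) -> List[float]:
    """pre-compute (GPU) -> DNN (any accelerator) -> post-compute (GPU)."""
    return post_compute(low_res, backend(pre_compute(low_res)))


frame = [0.0, 100.0]
print(upscale(frame, dnn_on_gpu_matrix_cores))
print(upscale(frame, dnn_on_npu))
```

Swapping the backend does not change the result here because both stand-ins compute the same thing; on real hardware the difference would be in handoff cost and precision, which this sketch does not model.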

marees · Thursday at 10:18 PM

More code names drop

orion (PS6 ?)
Robin (PS5 handheld ?)
Robin plus (PS5 handheld revision ?)
Canis (Microsoft handheld?)

CORRECTIONS:

orion (PS6)
Robin (PS5 blockchain mining version)
Robin plus (PS5 blockchain mining version)
Canis (PS6 handheld)

https://twitter.com/x/status/1948823527072710917

soresu · Friday at 8:06 AM

marees said:
Canis (Microsoft handheld?)

Siriusly? 😁

marees · Friday at 4:06 PM

soresu said:
Siriusly? 😁

Update:

marees said:
More code names drop

orion (PS6 ?)

Robin (PS5 handheld ?)

Robin plus (PS5 handheld revision ?)

Canis (Microsoft handheld?)

CORRECTIONS:

orion (PS6)

Robin (PS5 blockchain mining version)

Robin plus (PS5 blockchain mining version)

Canis (PS6 handheld)

https://twitter.com/x/status/1948823527072710917

soresu · Friday at 4:50 PM

marees said:
Update:

I think my joke was a bit too subtle, my bad 😅

Sirius = Alpha Canis Majoris (brightest star of the Great Dog constellation)

soresu · Friday at 4:57 PM

Seems very odd for PS6 and PS6 handheld to have opposing codenames tho.

It would have made a lot more sense for PS6 to be Magnus or Majoris rather than Xbox Next.

DZero · Friday at 5:02 PM

Kepler_L2 said:
Probably 10C/20T in the console, with one extra Zen6c for binning

Or maybe the core is for the UI and background tasks

marees said:
https://twitter.com/x/status/1948823527072710917

It's so funny how the situation evolved, from a funny situation to the unveiling of the return of handheld consoles.

soresu · Jul 26, 2025

DZero said:
From a funny situation to the unveiling of the return of handheld consoles

I mean, they never left, and on Nintendo's side they have usually been the best-selling consoles in any generation.

Given Nintendo are being increasingly full of themselves that may not be the case for Switch 2 this time around.

Sony have a real chance to make inroads this time if they don't mess it up.

If they made a 'lite' version of PSVR to go with it for a HMD/NED in place of TV tethering then it could be just what the doctor ordered.

Something along the lines of Bigscreen Beyond in size/weight would be perfect, with the console acting as the compute/battery/IO 'puck' equivalent to Apple Vision Pro.

Doesn't even need to do full tracked VR, just to display a beeg screen with some visual passthrough.
