Discussion AMD's Soundwave ARM APU: The Beginning of Transformation !!!


marees

Platinum Member
Apr 28, 2024
2,271
2,891
96
Why does Apple still keep the Neural Engine in M5?

With the Neural Accelerator feature (akin to NV's Tensor cores) being introduced in the GPU, why doesn't Apple ditch the Neural Engine, aka the NPU? According to Bilibili, a lot of AI calculations will benefit from GPU + NA, as shown in the graph below:

View attachment 132487

He doesn't really specify the reason for keeping the NE. My guess is that Apple still has some dedicated features that require a dedicated NPU. But Apple is not expanding the NPU's TOPS, as shown below:

View attachment 132488

About 7% faster is in line with my calculation. It seems everybody knows the TOPS formula, but nobody really knows the MAC unit count and frequency, as Hubweb shows: Hubweb gets the clock speed right but not the TOPS.

Back to the Windows platform: Microsoft is the one who sets the rules for Windows. An 80TOPS INT8 NPU should be the standard for Windows 12, so all OEMs will have to adhere to it. As shown in the D9500 layout below, MediaTek has to set aside about 20% of the die area (~30mm2) to accommodate 16384 MAC units in order to hit 100TOPS. And if you study my table, AMD will introduce XDNA3 at different clock speeds for different TDPs. Soundwave is designed to be a power-efficient APU, which is why AMD won't set its TOPS much higher than 80, unlike Medusa Point and Premium. Hope you guys learned something about TOPS calculation, including power and TDP. :cool:

Oh ya, Nova Lake's NPU6 will most likely come with 16384 MAC units in order to hit 80TOPS. PTL, as usual, is one step behind in NPU TOPS...:p
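
For anyone who wants to check the numbers, here is a minimal sketch of the TOPS arithmetic. It assumes the usual peak-throughput convention (each MAC unit counts as two operations per clock: one multiply plus one add). The function names are made up for illustration, and the clocks it prints are simply what that convention implies for 16384 MAC units, not confirmed specs for any chip.

```python
# Peak dense INT8 TOPS under the common convention:
#   TOPS = MAC units * 2 ops per MAC * clock (Hz) / 1e12
# The helper names are hypothetical; only the MAC counts and TOPS targets
# come from the post above.

def npu_tops(mac_units: int, clock_ghz: float, ops_per_mac: int = 2) -> float:
    """Peak TOPS for a given MAC array size and clock in GHz."""
    return mac_units * ops_per_mac * clock_ghz * 1e9 / 1e12


def clock_for_tops(target_tops: float, mac_units: int, ops_per_mac: int = 2) -> float:
    """Clock in GHz needed to reach target_tops with mac_units MACs."""
    return target_tops * 1e12 / (mac_units * ops_per_mac) / 1e9


if __name__ == "__main__":
    for target in (80, 100):
        ghz = clock_for_tops(target, 16384)
        print(f"{target} TOPS with 16384 MACs needs about {ghz:.2f} GHz")
```

With the same MAC count, the only lever left is clock speed, which is why the same 16384 units can be quoted at 80TOPS in one design and 100TOPS in another.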

View attachment 132489
I still don't see the need for an NPU
 

Tigerick

Senior member
Apr 1, 2022
942
857
106
There is Hope: It is AMD Soundwave

Sounwave Specs.jpg

Hoho, here is my final speculation about the Soundwave APU: the challenger to Apple's M5. I know most of the leakers are getting the same roadmap as MLID. My advice remains: don't believe blindly without using your own judgement. AMD won't spend billions designing an ARM APU just to end up with a low-power part that is similar to the upcoming NV N1x (4-core A725).

What I believe:
  • 128-bit LPDDR5x-9600, same as M5
  • 16MB of MALL Cache. Do you really think AMD will bundle 4 CU with 16MB Cache?
Here are my speculated specs:
  • About 200mm2 die size, same as Medusa Point. Based on my calculation of the FF5 dimensions, I am expecting a 200mm2 die, bigger than M5's ~180mm2...
  • Leaked: 4th generation AIE. First lie: there are only two generations of XDNA so far. Thus AMD will use XDNA3 with ~80TOPS, as calculated here, the standard for Windows 12.
  • Leaked: 2P+4E CPU. As listed in the table above, AMD is most likely using a combination of Prime and Premium cores; a 6+6 config is what I suspect, the same as Qualcomm's upcoming X2.
  • Leaked: 4 x RDNA3.5+ CU. AMD has been trying to hide the GPU architecture in the roadmap by mentioning RDNA3, 4 and 5. I believe AMD will name the latest generation RDNA4.5, as speculated here. I am expecting 12CU with 16MB MALL Cache, unlike Medusa Point, which will employ 8CU of RDNA4.5 without MALL Cache. That's the difference between clock speeds on the x86 and ARM platforms. AMD will create more dies than in the Zen5 generation to cater for Zen6 and ARM platforms. No doubt AMD will try to use the same GPU and NPU IP across the N3P node. I am expecting more than 10 3nm dies for desktop and mobile APUs from AMD. And many of those dies are being created for one sole reason: XDNA3 AIE.
  • One PCIe5.0 x4 controller to support high-speed SSDs. X2-EE has PCIe 5.0 x12 support; I wonder what Qualcomm uses it for.
  • Integrated 5G modem: Hoho, remember AMD did sign an agreement with Samsung in 2019.
  • Android PC support: AMD has provided Android driver support for Samsung Exynos for years. Thus, it is safe to assume AMD is ready to support the upcoming Android PC from day one. How the tide will turn: the ARM platform will get first-party support for Android PC, while the x86 platform will face app compatibility issues. And AMD is ready from the beginning, starting with the Soundwave APU: The Beginning of Transformation.
  • TDP: ~20W (fanless Surface) to 28W (fan-cooled T&L notebooks). The same positioning as M5: fanless iPad and fan-cooled MacBook Pro.
  • SRP: > USD $1000
 

Tigerick

RDNA4+.jpg
Hoho, what is an RDNA4-derived micro-architecture???

Doesn't it refer to RDNA4+ or RDNA4.5???

Per the 2019 agreement: why would AMD license its latest RDNA IP to Samsung with Android support? Think, guys.... :rolleyes:

I have calculated FP32 throughput from the info above. There are a few ways to hit 6TF at 1.4GHz. One of them is doubling the SPs per CU. Let's see whether that is the RDNA4.5 improvement we will see in Soundwave and E2600. As I said, E2600 5G is the little brother of Soundwave with half the memory bandwidth. Do you still think Soundwave, which is fabbed on N3P with a 128-bit memory bus, comes with tiny CPU and GPU cores? I have decoded the XDNA3 AIE; now it is RDNA4.5's turn...:cool:
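
A rough sketch of the FP32 arithmetic, for anyone following along. It assumes the standard peak formula (CUs x FP32 lanes per CU x 2 FLOPs per lane per clock for a fused multiply-add x clock). The function names and the CU counts tried below are illustrative assumptions, not leaked specs; only the 6TF and 1.4GHz figures come from the post.

```python
# Peak FP32 throughput:
#   TFLOPS = CUs * FP32 lanes per CU * 2 (FMA) * clock (GHz) / 1000
# Helper names and the CU counts below are hypothetical.

def fp32_tflops(cus: int, lanes_per_cu: int, clock_ghz: float,
                flops_per_lane: int = 2) -> float:
    """Peak FP32 TFLOPS for a GPU configuration."""
    return cus * lanes_per_cu * flops_per_lane * clock_ghz / 1000.0


def lanes_needed(target_tflops: float, cus: int, clock_ghz: float,
                 flops_per_lane: int = 2) -> float:
    """FP32 lanes per CU required to reach target_tflops."""
    return target_tflops * 1000.0 / (cus * flops_per_lane * clock_ghz)


if __name__ == "__main__":
    for cus in (8, 12, 16):  # illustrative CU counts only
        lanes = lanes_needed(6.0, cus, 1.4)
        print(f"{cus} CUs at 1.4 GHz need about {lanes:.0f} FP32 lanes per CU for 6 TF")
```

Under these assumptions, 16 CUs with 128 lanes each land at about 5.7TF at 1.4GHz, so hitting 6TF at that clock takes either more CUs or more FP32 lanes per CU.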

PS: E2600 with 8CU hits 3100+ in SNL, higher than A19 Pro. Go figure. :cool:
 

Doug S

Diamond Member
Feb 8, 2020
3,836
6,784
136
Why does Apple still keep Neural Engine in M5?

With the Neural Accelerator feature (akin to NV's Tensor cores) being introduced in the GPU, why doesn't Apple ditch the Neural Engine, aka the NPU? According to Bilibili, a lot of AI calculations will benefit from GPU + NA, as shown in the graph below:

The NPU is optimized for low power, plus it doesn't take up much space. Think of the GPU as being the AI "P cores" and the NPU being the AI "E cores".
 

marees

You think Valve is going from a ~131mm2 6nm part to a ~370mm2 4nm part? I guess anything's possible...
Very similar in specs to Fremont (Steam Frame), although as of now a discrete 7600M is probably cheaper than the 8060S iGPU.
 

Tigerick

Deckard's SoC is most likely the upcoming XR2 Gen3, which should launch at CES 2026. Based on leaks from Brad, XR2 Gen3 should use a variant of Oryon; that means the first-gen 8 Elite SoC on N3E with a 64-bit LPDDR5x memory interface. He is wrong about GPU performance though; iGPU performance should be double that of XR2 Gen2, as compared here.

As for Steam Deck 2 changing SoC to the ARM platform: Hoho, I knew Valve would eventually switch platforms, but I didn't expect it in the second generation. Steam Deck currently commands about 50% of the handheld market. That means by 2028, SD2 (OLED-1080p?) with an ARM SoC should be available alongside SD1 (OLED-720p) with the Aerith+ SoC. Why do you think Valve is going for the ARM platform, where it has to provide the Proton layer for both x86-to-ARM CPU and DirectX-to-Vulkan GPU translation? That job is much harder than OS-layer translation. And yet Valve goes for it....I will let you guys think why. :p

AMD most likely lost the contract because it doesn't have an off-the-shelf solution for the ARM platform. Soundwave is too powerful and expensive (~200mm2) for SD2. Valve needs something like E2600 5G with LPDDR6 support, which Qualcomm could supply.

AND here is my prediction: once Valve changes to the ARM platform, it won't switch back to x86. That means the successor to SD2 will use the ARM platform as well. And Valve gets to choose whichever vendor offers the best features and prices, including Qualcomm, AMD or NV. That's how business works. :rolleyes:

I hope that puts everything in perspective: ARM is the future of computing, including handhelds and tablets, period.
 

Thunder 57

Diamond Member
Aug 19, 2007
4,291
7,097
136
Ah yes the ARMada to let us know again what the future holds. I would be surprised if the Deck 2 went ARM. Perhaps in the future but right now it would probably piss off too many people. Just look at what happened days ago with the AMD GPU driver debacle.
 

Tigerick

I have created a table on the front page in anticipation of AMD's new roadmap unveiling at Financial Analyst Day. Can't wait to fill in the slots later .... :p

I have said AMD has prepared more than 10 dies on 3nm nodes (some might be N2) for 2026/2027. We know Zen7 will be available from Q3-2029/2030. Do you really think AMD will be empty-handed in 2028-2029?

A14 is considered TSMC's 2nd-gen GAA node, which will be used for Zen7 CPUs and APUs. How about the N2-N2P-A16 nodes? Does AMD have any plans for first-gen GAA?

I have a rough idea of how AMD is going to position its roadmap over the next five years. Thanks to the leaks from MLID about Zen7 Grimlock, I have a clearer picture as well. Here is the list of AMD mainstream APUs, past and future:
  1. Rembrandt: Q2-2022, N6, 128-bit LPDDR5-6400
  2. Phoenix: Q1-2023, N4, 128-bit LPDDR5x-7500
  3. Strix Point: Q2-2024, N4P, 128-bit LPDDR5x-8000
  4. Soundwave?: Q1-2026, N3P, 128-bit LPDDR5x-9600 ~ Apple M5, Q4-2025, N3P, 128-bit LPDDR5x-9600
  5. Medusa Point MDS1?: Q1-2027, N3P, 128-bit LPDDR5x-9600
  6. 2028
  7. 2029
  8. Grimlock Point 2?: Q1-2030, A14, ?
Grimlock Point 1 should replace Medusa Premium, Gator Range and the canceled MDS1+CCD. Thus, Grimlock Point 2 should be the mainstream APU solution, to be released in Q1-2030. There is a gap of at least two years between the releases of MDS1 and Grimlock2...we shall hear more in two days' time. :cool:
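
To put the memory configs in the list above into numbers, here is a small sketch that converts bus width and transfer rate into peak theoretical bandwidth (bus width in bytes x MT/s). The function name is made up for illustration; the entries are the ones from the list, with the speculated parts left marked as speculation.

```python
# Peak theoretical bandwidth in GB/s:
#   (bus width in bits / 8) * transfer rate in MT/s / 1000
# The helper name is hypothetical; configs are copied from the list above.

def peak_bandwidth_gbs(bus_bits: int, mtps: int) -> float:
    """Peak DRAM bandwidth in GB/s for a given bus width and MT/s rate."""
    return bus_bits / 8 * mtps / 1000


CONFIGS = [
    ("Rembrandt", 128, 6400),
    ("Phoenix", 128, 7500),
    ("Strix Point", 128, 8000),
    ("Soundwave? (speculated)", 128, 9600),
    ("Apple M5", 128, 9600),
    ("Medusa Point MDS1? (speculated)", 128, 9600),
]

if __name__ == "__main__":
    for name, bus_bits, mtps in CONFIGS:
        print(f"{name}: {peak_bandwidth_gbs(bus_bits, mtps):.1f} GB/s")
```

A 128-bit LPDDR5x-9600 interface works out to 153.6 GB/s peak, versus 102.4 GB/s for Rembrandt's 128-bit LPDDR5-6400.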
 

Thunder 57

That sounds far too early for Soundwave.
 

Tigerick

Geez, what a disappointing FAD. AMD can't even list a proper yearly roadmap like last time. NP, I have listed all the leaked APUs in the thread above; let's see which APUs AMD is going to release in the next five years...

Have you wondered why AMD has not mentioned the Verano CPU, which is supposed to be Zen6 Venice+ according to leakers? OTOH, AMD has been keen to mention the Helios rack with Venice and the MI400 series. Where is the next-gen AI rack with Verano? AMD has even shown the projected AI performance of MI500, which will be released in 2027, the same timing as the next-gen AI rack and Verano. Why keep quiet? Something to think about.... :rolleyes: :p Not to mention AMD's unveiling of the Zen7 roadmap...
 

Tigerick

XDNA3 and XDNA4 Roadmap Unveiled

Hoho, as I decoded here, Soundwave will use the 3rd-gen AIE, namely XDNA3, which comes online in 2026. Not the bluff of a 4th-gen AIE in the leaked SWV specs; I will let you guess why. XDNA4, which should be bundled with the Zen7 APU, offers better scaling and power efficiency. It sounds like XDNA4's MAC unit count should remain the same, just with better efficiency due to the A14 node. Things could change, and my formula is ready for future changes. :cool:

As for the second slide: as I calculated with help from MLID, Medusa having up to 110 TOPS is indeed correct. Thus, the 10X AI performance increase refers to TOPS going from 10 to 110. What the first slide shows is just marketing talk; as long as you know how to calculate TOPS, you won't be fooled..:p And I hope you realize my assumption that Magnus SoC = Medusa Premium SoC is correct. AMD is indeed doubling down on the AIE; in fact, many Medusa dies are being created for one sole reason: XDNA3.

AIE-XDNA Roadmap.jpg
AIE-TOPS.jpg
 

eek2121

Diamond Member
Aug 2, 2005
3,482
5,174
136
Ah yes the ARMada to let us know again what the future holds. I would be surprised if the Deck 2 went ARM. Perhaps in the future but right now it would probably piss off too many people. Just look at what happened days ago with the AMD GPU driver debacle.

ICYMI: Valve has been working on FEX, an x86 emulation layer for non-x86 devices. Their new VR headset (using Qualcomm ARM chips) relies on that, plus PC streaming.

They will use that platform to take the same approach they took with SteamOS: refine FEX and other tooling to the point where games run just as well as on Windows. At that point, they will drop an ARM-based Steam Deck.

Valve is doing all the hard work other companies aren’t doing. Kudos to them. ARM will finally become a viable gaming platform…eventually.
 

NTMBK

Lifer
Nov 14, 2011
10,523
6,048
136
This is fundamentally different from Proton. Proton/Wine is an alternative implementation of the Windows APIs; if Valve do a good job, there's no reason why they can't match or exceed native Windows performance. But emulating a different instruction set is always going to have an overhead; whether it uses on-the-fly recompilation or runtime interpretation, the overhead is going to exist.