Discussion AMD's Soundwave ARM APU: The Beginning of Transformation !!!


Jan Olšan

Senior member
Jan 12, 2017
603
1,182
136
Upcoming Q1? X Elite 2 should honestly embarrass Panther Lake CPU wise.


Apple's not a chip maker, so it makes sense that they leave it out, since the chips only come with their own devices.

However, Qualcomm doesn’t and that’s weird….
It's the norm in the mobile/ARM ecosystem as a whole (or perhaps x86/PC is the odd helpful one out in the first place).

Have you ever seen TDP value quoted for *any* phone SoC?
 

poke01

Diamond Member
Mar 8, 2022
4,606
5,916
106
Have you ever seen TDP value quoted for *any* phone SoC?
Nope

TDPs are funny in laptops and frankly kinda useless, especially the way Intel implements them.

Take the 226V: Intel rates the maximum turbo TDP at 37 W, but the SoC goes up to 55 W under full load, which Intel does not mention. The base TDP is 17 W, which is only useful if turbo boost is turned off.


The way AMD implements them on mobile CPUs is better but still vague: they give a default TDP and a configurable TDP.
 

Doug S

Diamond Member
Feb 8, 2020
3,746
6,613
136
It's the norm in the mobile/ARM ecosystem as a whole (or perhaps x86/PC is the odd helpful one out in the first place).

Have you ever seen TDP value quoted for *any* phone SoC?

Why would they? It is meaningless, since you can't choose another CPU for your phone or build your own. It started in the PC world because you can select different models with different CPUs, or buy a CPU and install it in a board yourself, and then you need to know what kind of cooling it requires.
 

fastandfurious6

Senior member
Jun 1, 2024
838
999
96
The 12-core Zen 6 die with 48MB of L3 cache will be fabbed on the N3P process, not N2 as the last rumor claimed. The 32-core Zen 6c die with 128MB will be fabbed on the N2 process.

can't be true......

if true, then AMD will lose #1.... why would they concede the lead just to save some bucks?
 

Tigerick

Senior member
Apr 1, 2022
919
834
106
7. Zen 7 no more - Prometheus is AMD's first custom ARM core after Soundwave.
Lisa Su says AMD is on track to a 100x power efficiency improvement by 2027.

Remember, Lisa Su mentioned that AMD is going to achieve a 100x (times, not percent) power efficiency improvement by 2027. Verano seems to be the platform for it. No, it is not a Zen 6c platform refresh (Venice with 256 cores has a TDP of about 600W); it is going to be a new ARM platform focused on power efficiency, so that AMD can divert the power to GPU/AI, just like NV's Vera.

NV's Vera is targeting the end of 2026. AMD's Verano is a total platform solution, and the CPU cores could be ready by then as well. If I have to guess, Qualcomm's ARM server CPU could be ready by the end of 2026 or in 2027, pending a roadmap unveiling. And the driver behind all the changes is Microsoft's WoA Server Edition. Just like on the client platform, Microsoft is the one that initiated this and set the rules (70 TOPS AI) for the upcoming WoA 12. It is the new operating system for the next decade, focused on AI. Qualcomm, NV, AMD and Mediatek are going to pursue the opportunity. Except Intel :cool:, that's why Intel bets heavily on its foundry business..

If 70 TOPS is a mandatory requirement for Windows 12, AMD's Gorgon Point and Intel's PTL will not be able to meet Co-Pilot++ status. I bet all the new ARM SoCs will be able to achieve that, including AMD's Soundwave.

I have already explained the reasons behind the changes in past pages; let's wait for the Soundwave and WoA12 unveiling...:cool:
 

inquiss

Senior member
Oct 13, 2010
590
852
136
Just so I understand: they spent a bunch of time convincing people that you need more CPU grunt to get the most out of a GPU. You think that AMD doing a from-scratch Arm design is the way they will get above Zen 6 performance?
 

StefanR5R

Elite Member
Dec 10, 2016
6,793
10,828
136
This figure is about datacenter-level power efficiency in model training, and it takes not only efficiency gains per node and of the rack-level and DC-level interconnects into account, but also speculated software advances. I.e., the figure includes more than AMD's own part of it.

The figure also has a lot to do with the fact that AMD doesn't have an actual integrated rack-level AI solution on the market before 2026, unlike a certain competitor.

Finally, AMD proposing this figure also has a lot to do with countering potential investor concerns regarding "The Limits to Growth" (you may want to look it up if it doesn't ring a bell) in AI.
 

Panino Manino

Golden Member
Jan 28, 2017
1,150
1,389
136
The Advent of Computing podcast released an episode about the iAPX286, which got me thinking again.
Besides how "bad" Intel was at that time, it made me think that maybe we have an opportunity to get rid of the x86 legacy.
Suppose AMD gets really serious about their ARM CPUs and in a few years their ARM CPUs are as good as or better than their Ryzen CPUs. If Intel once again misses the timing and Qualcomm is still trying to bring their CPUs to the desktop and Windows, couldn't the market agree to make the switch? Legacy software could still be run by emulation.

Because one day we'll eventually move on from x86, right?
Can't that day come within the next three years?
 

Thunder 57

Diamond Member
Aug 19, 2007
4,206
6,995
136

I'd bet AMD has some ARM project hidden away, but I doubt that is their priority. Why compete with many when you can compete with one?
 

Cheesecake16

Member
Aug 5, 2020
40
164
106
This figure is about datacenter-level power efficiency in model training, and it takes not only efficiency gains per node and of the rack-level and DC-level interconnects into account, but also speculated software advances. I.e., the figure includes more than AMD's own part of it.

The figure also has a lot to do with the fact that AMD doesn't have an actual integrated rack-level AI solution on the market before 2026, unlike a certain competitor.

Finally, AMD proposing this figure also has a lot to do with countering potential investor concerns regarding "The Limits to Growth" (you may want to look it up if it doesn't ring a bell) in AI.
Ehhh... I would disagree with calling the NVL72 system a rackscale system, as it isn't actually a single system image (SSI) across the whole NVL72: there are 18 system images in an NVL72 system (1 per compute node). That means the DGX/HGX system is the most GPUs you can get in a single system image, which is the same as the MI350X platform.

Now, if MI400 isn't an SSI either, then I would not call that a rackscale system either... and honestly, SGI beat them all to a rackscale system back in the early '00s with the Altix 3000 series of supercomputers.
 

StefanR5R

Elite Member
Dec 10, 2016
6,793
10,828
136
(OT) SSI has remained a niche in supercomputing. SSI or not, what matters is that your accelerators have local memory and then there is remote memory, and your application has got to deal with that. As for AI, I admit I am not into it and am not informed about what the actual state-of-the-art scaling factors [or feasible model sizes] are with one vendor or the other once multiple nodes are involved.
 

marees

Platinum Member
Apr 28, 2024
2,038
2,672
96
Highly Confident:

When Reuters reported that NV and AMD are going to launch ARM SoCs for PC:


My first reaction was: NV, yeah, possible; AMD, no way. With a full lineup of mobile APUs coming, namely Sonoma Valley, Krackan Point, Strix Point and Sarlak, what is the point? The above SoCs cover almost all the performance tiers (from 64-bit to 256-bit LPDDR5x) and price points.

However, when I started a thread on the upcoming LPDDR6 here, I thought AMD might create a new lineup of ARM SoCs with LPDDR6; more info below:

I contacted my source, and it turned out there might be some truth to it. Apparently, MS Surface will presumably upgrade the Surface Pro X to the Snapdragon X Elite this year. For next year, 2025, and later, NV and AMD have won the contract for the upcoming Surface X series. PS: I believe this is the source of the Reuters leak. If Microsoft is going to launch a new Surface in 2025, the specs must be better than the upcoming Surface Pro X w/ X Elite.......or not? See below:



Highly Speculative

My source insisted that AMD will create a custom off-the-shelf ARM core for Microsoft. They won't be used in the standard PC lineup; I actually have a different opinion.

SoC | Dimensity 9500 | AMD's Soundwave | Medusa Point R7 | Medusa Point R9 | Gorgon Point | Panther Lake-H | Qualcomm's X G2 | Qualcomm's X Elite G2
Launch Date | Q4-2025 | Q4-2025 | Q1-2027 ? | Q1-2027 ? | Q1-2026 | Q1-2026 | ? | Q4-2025
Node | N3P | N3P | N3P ? | N3P + N3P ? | N4P | 18-A + N3E + N6 | N3P | N3P
Die | 1 | 1 | 1 | 2 | 1 | 3 + Base | 1 | 1
5G Modem | Integrated | NA | NA | NA | NA | NA | External | External
Memory Bus | 64-bit LPDDR5T-9600 ? | 192-bit LPDDR6 ? | ? | ? | 128-bit LPDDR5x | 128-bit LPDDR5x | 128-bit LPDDR5x | 192-bit LPDDR6 ?
Cooling | Fanless | Fan | Fan | Fan | Fan | Fan | Fan | Fan
CPU
Prime | 1 x X935 + 3 x X930 | ? | 4 x Zen 6 | 16 x Zen 6 | 4 x Zen 5 | 4 | 6 x Oryon V3 | 12 x Oryon V3
Performance | 4 x A730 | ? | 4 x Zen 6c | 4 x Zen 6c | 8 x Zen 5c | 8 | 6 x Oryon V3-M | 6 x Oryon V3-M
Total Cores | 8 | ? | 8 | 20 | 12 | 12 | 12 | 18
Total Threads | 8 | ? | 16 + 2 LPe | 40 + 2 LPe | 24 | 12 + 4 LPe | 12 | 18
Extension | SME | SME | AVX-512 | AVX-512 | AVX-512 | AVX-10 | SME + SVE2 | SME + SVE2
GPU | Drage G935 ? | RDNA3.5+ FSR4 16MB ? | RDNA3.5+ FSR4 ? | RDNA3.5+ FSR4 ? | RDNA3.5 | XE3 | ? | ?
Compute Unit | 12 | ? | 8 CU | 8 CU | 16 CU | 12 | ? | ?
NPU | 100 TOPS ? | 70 TOPS ? | 70 TOPS ? | 70 TOPS ? | 65 TOPS | 50 TOPS | 70 TOPS ? | 70 TOPS ?




Core | Cortex-X3 | Cortex-X4 | Cortex-X925 | Cortex-X935 | Cortex-X945 ?
SoC Launch Date | Q4 2022 | Q4 2023 | Q4 2024 | Q4 2025 | Q4 2026
SoC | SD 8 Gen 2 | SD 8 Gen 3 | D9400 | D9500 | D9600
Node | N4 | N4P | N3E | N3P | N2
Clock Speed | 2.0 - 3.2 GHz | 3.3 GHz
L2 Cache | 1 MB | 2 MB ?
L3 Cache | 8 MB | 12 MB
ROB | 320 | 384
Decode | 6 | 10
Int ALU | 6 | 8
Max Memory Speed | 64-bit LPDDR5X-8400 | 64-bit LPDDR5X-9600 | 64-bit LPDDR5X-9600 | 96-bit LPDDR6 ?
Memory BW | 67 GB/s | 77 GB/s | 77 GB/s
GB6 1T | ~ 2043 | ~ 2277 | ~ 2800
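
The Memory BW figures in the table above follow from bus width times transfer rate. A minimal sketch, assuming peak theoretical bandwidth with no protocol or refresh overhead (the function name `peak_bandwidth_gbs` is made up for illustration):

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 10^9 bytes).

    bus_width_bits: memory bus width in bits (e.g. 64)
    transfer_rate_mts: data rate in mega-transfers per second (e.g. 8400)
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mts / 1000


# The two distinct configurations from the table; real-world throughput is lower.
configs = [
    ("64-bit LPDDR5X-8400", 64, 8400),
    ("64-bit LPDDR5X-9600", 64, 9600),
]

for label, width, rate in configs:
    print(f"{label}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

The results round to the 67 GB/s and 77 GB/s quoted in the table.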
Could Valve use this for a Steam Deck 2?
 

Tigerick

Senior member
Apr 1, 2022
919
834
106
Team Red on WoA - AMD and Samsung


Ask Exynos.jpg

I asked this question a year and a half ago; now I have the answer below:

Red ARM Lineup.jpg

Samsung and AMD signed a deal in 2019 under which AMD would supply RDNA IP to Samsung. I wondered why. TL;DR: please read my Team Green ARM lineup post and you will find the same reasoning behind the partnership between AMD and Samsung. There is a difference, though: I am afraid the deal only applies to three SoCs, namely E2400, E2500 and E2600. The reason lies in the table above.

I am expecting Team Red to target 30% of 100M per year, which would be 30M per year.
 

Tigerick

Senior member
Apr 1, 2022
919
834
106
Once you have studied the table, you will see that Soundwave is actually a bigger brother of the Exynos 2600 with C1 cores. Thus, Soundwave should have the same price target as the upcoming X2 or Apple M5. However, Qualcomm has released a binned version of the X2-EE with a 128-bit memory bus. Let's check out the specs of the X2 Elite with 128-bit LPDDR5x-9600 memory:
  • 12 Prime cores up to 4.7GHz
  • 6 performance cores up to 3.4GHz
  • X2-90 iGPU running at 1.7GHz. Geez, Qualcomm won't even bother to name its GPU. I am expecting 6 WGP x 4 CU = 24 CU.
  • 80 TOPS (INT8)
  • Integrated X75 5G modem
  • TDP : ~50W


Damn impressive. How about the upcoming X2 model with 128-bit LPDDR5x-8533?
  • 6 Prime cores
  • 6 performance cores
  • iGPU should have 4 WGP x 4 CU = 16 CU
  • 80 TOPS (INT8)
  • Integrated 5G Modem
  • TDP: ~28W


Thus, my expectation for Soundwave with 128-bit LPDDR5x-9600? is below:
  • A combination of C1-Ultra and C1-Premium cores, with a core count above 12 (STX and Medusa Premium have 4+8 with SMT)
  • RDNA4.5 - at least 12CU
  • 80 TOPS (INT8)
  • Integrated 5G Modem support?
 

Tigerick

Senior member
Apr 1, 2022
919
834
106
RDNA4.5: Real Dual Issue Implementation?

When I looked at the E2600, which showed higher clock speeds overall, I thought SF had finally improved their PPA with the SF2 process. It turns out it is due to the reduction in iGPU CUs from 16 to 8. Then it hit me: could RDNA4.5 have finally implemented a real dual-issue design?

Apple has tweaked its FP16 pipelines to boost TOPS by 85%, as explained here. And based on SNL scores, Soundwave's 16CU? might get a 78% boost compared to STX's 16CU RDNA3.5.

Soundwave and Exynos 2600 might be getting a new generation of the RDNA architecture. I would say RDNA4.5 is more like a mobile version of RDNA5. That explains why Medusa Point has only 8CU: there are big changes in the pipeline design.
 

Farfle

Member
Jan 10, 2006
95
5
71
Is Intel Panther Lake confirmed not to have an 80 TOPS NPU? With the Qualcomm X Elite 2 having an 80 TOPS NPU, and with your rumors above about AMD Soundwave having an 80 TOPS NPU as well, it surely lends weight to your rumors of Microsoft brewing some sort of WoA 12 edition. Exciting!
 

Tigerick

Senior member
Apr 1, 2022
919
834
106
RDNA3.5+: Real Dual Issue Implementation?

When I looked at the E2600, which showed higher clock speeds overall, I thought SF had finally improved their PPA with the SF2 process. It turns out it is due to the reduction in iGPU CUs from 16 to 8. Then it hit me: could RDNA3.5+ have finally implemented a real dual-issue design?

Apple has tweaked its FP16 pipelines to boost TOPS by 85%, as explained here. And based on SNL scores, Soundwave's 16CU? might get a 78% boost compared to STX's 16CU RDNA3.5.

Soundwave and Exynos 2600 might be getting a new generation of the RDNA architecture. I would say RDNA3.5+ is more like a mobile version of RDNA5. That explains why Medusa Point has only 8CU: there are big changes in the pipeline design.

Even though the A19 Pro's FP16 throughput showed an 85% boost, Geekerwan's benchmarks of 3 real games showed FPS that was 54% higher on average. Thus, I am assuming the new RDNA architecture will boost FPS by 40% on average: not as high as the SNL figure, but a more realistic estimate.
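
For comparison, here is what a straight proportional scaling would give. This is only a sketch: it assumes, purely for illustration, that the ratio of real-game gain to synthetic gain seen on the A19 Pro carries over to the 78% SNL figure, which nothing in the post establishes. The function name `scale_gain` is hypothetical.

```python
def scale_gain(synthetic_gain_pct: float,
               ref_synthetic_pct: float,
               ref_game_pct: float) -> float:
    """Scale a synthetic gain by a reference chip's game-to-synthetic ratio."""
    return synthetic_gain_pct * ref_game_pct / ref_synthetic_pct


# Reference: A19 Pro, 85% FP16 boost vs 54% average FPS gain in 3 games.
# Target: the speculated 78% SNL boost for Soundwave.
estimate = scale_gain(synthetic_gain_pct=78.0,
                      ref_synthetic_pct=85.0,
                      ref_game_pct=54.0)
print(f"Proportionally scaled game gain: {estimate:.1f}%")
```

Under that assumption the scaled figure lands just under 50%, so the 40% estimate is on the conservative side of a simple proportional scaling.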

AMD is definitely working on a new shader architecture, namely GFX125x. As with any new design that boosts FPS, AMD will try to hide the technique, especially with a new ARM platform. Therefore, I am creating the tables below to try to demystify the ISA codenames. If anyone has any new info, feel free to pitch in. :D

GFX11.5 - RDNA 3.5 | CU | SP | Shader ISA
Strix Point | 16 | 1024 | gfx1150
Strix Halo | 40 | 2560 | gfx1151
Krackan Point | 8 | 512 | gfx1152
GFX12.0 - RDNA 4
N44 | 32 | 2048 | gfx1200
N48 | 64 | 4096 | gfx1201
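
The SP column above is consistent with 64 stream processors per CU. A quick check, with the table's values retyped by hand (the constant `SP_PER_CU` is inferred from the table itself, not taken from any AMD document):

```python
SP_PER_CU = 64  # assumption inferred from the CU and SP columns

# (CU, SP) pairs copied from the table
parts = {
    "Strix Point": (16, 1024),
    "Strix Halo": (40, 2560),
    "Krackan Point": (8, 512),
    "N44": (32, 2048),
    "N48": (64, 4096),
}

# True for a row when CU * 64 reproduces the listed SP count
consistent = {name: cu * SP_PER_CU == sp for name, (cu, sp) in parts.items()}

for name, ok in consistent.items():
    print(f"{name}: {'matches' if ok else 'does not match'} 64 SP per CU")
```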



GFX12.5 - RDNA 4.5? | Platform | CU (New ISA) | CU (Old ISA) | Shader ISA
Exynos 2600 | ARM | 8 CU | 16 CU
Soundwave | ARM | 12 CU? | 24 CU
Medusa Point MDS1 | x86 | 8 CU | 16 CU
Medusa Point MDS2 | x86 | 4 CU | 8 CU
Medusa Point MDS3 | x86 | 2 CU | 4 CU



GFX13.0 - RDNA 5? | Platform | CU (New ISA) | CU (Old ISA) | Shader ISA
Medusa Premium | x86 | 24 | 48
ARM + AT4 ? | ARM | 24 | 48
Medusa Halo | x86 | 48 | 96
ARM + AT3 ? | ARM | 48 | 96
AT2 | x86 | 64 | 128
AT0 | x86 | 154 | 308
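
In both speculative tables above, the "Old ISA" CU count is exactly double the "New ISA" count in every row, which is what you would expect if each new-ISA CU is being counted as two old-ISA CUs. A small check over the rows, retyped by hand (this only verifies the tables' internal arithmetic, not the underlying rumor):

```python
# (new_isa_cu, old_isa_cu) pairs copied from the GFX12.5 and GFX13.0 tables
rows = {
    "Exynos 2600": (8, 16),
    "Soundwave": (12, 24),
    "Medusa Point MDS1": (8, 16),
    "Medusa Point MDS2": (4, 8),
    "Medusa Point MDS3": (2, 4),
    "Medusa Premium": (24, 48),
    "ARM + AT4": (24, 48),
    "Medusa Halo": (48, 96),
    "ARM + AT3": (48, 96),
    "AT2": (64, 128),
    "AT0": (154, 308),
}

# Old-to-new ratio per row; every row should come out as 2.0
ratios = {name: old / new for name, (new, old) in rows.items()}

for name, ratio in ratios.items():
    print(f"{name}: old/new = {ratio:.1f}")
```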

PS: Almost all of the new SoCs / GPUs above are targeting a 2027 release; well, except the E2600 and Soundwave....
 

Tigerick

Senior member
Apr 1, 2022
919
834
106
2. Performance

Architecture | Date Announced | Features | ARM core | Custom Core | RISC-V
ARMv8.6-A | Q4-2019 | GEMM | | Apple A17 Pro (N3B, Q3-2023) |
ARMv8.7-A | Q4-2020 | | | Oryon V1 |
ARMv9.0-A | Q1-2021 ? | SVE2; TME; CCA | Cortex-X2 (2021); Cortex-X3 (N4, 2022) | Apple M3 (N3B, Q4-2023) | RVA-23 - SiFive P870
ARMv9.2-A | Q4-2020 | SME | Cortex-X4 (N4P, 2023); Cortex-X925 (N3E, 2024) | Apple M4 (N3E, Q2-2024); Apple A18 Pro (N3E, Q3-2024) |
ARMv9.3-A | Q4-2021 | Non-maskable interrupts; Instructions to optimize memcpy() and memset() style operations; Enhancements to PAC; Hinted conditional branches | C1 (N3P, Q3-2025) | Apple A19 Pro (N3P, Q3-2025) ? |
ARMv9.4-A | Q4-2022 | Virtual Memory System Architecture (VMSA) enhancements; SME2; Guarded Control Stack (GCS); Confidential Computing | C2 (N2, Q3-2026) ? | NV Vera ? |
ARMv9.5-A | Q4-2023 | FP8 support (E5M2 and E4M3 formats) added to SME2, SVE2, Advanced SIMD (Neon) | C3 (N2P, Q3-2027) ? | |
ARMv9.6-A | Q4-2024 | Improved SME efficiency with structured sparsity and quarter tile operation; New SVE instructions for expand/compact and finding first/last active element | | |
ARMv9.7-A | Q4-2025 | Targeted memory invalidation broadcasts; Flexible resource management (MPAMv2); 6-bit data types for Artificial Intelligence; Video-codecs updates; GICv5 | | |
ARMv9.8-A | Q4-2026 | | | |

A table says a thousand words... I also listed the latest RISC-V profile, RVA23; its performance is at about Cortex-X2 level. RISC-V is still far behind in the development cycle; let's see how much improvement the newer profiles bring..

Yep, ARM published its latest ISA, ARMv9.7-A, this month. I have updated the table above to show the progress of the ARM ISA. Usually it takes 3-4 years for a new ISA version to show up in an end product. Sometimes a core mixes versions; C1, for example, could mix features from v9.3-A and v9.4-A. That means an ARMv9.7-A SoC could be released by the end of 2029.
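
The 2029 projection is simple quarter arithmetic. A sketch, taking the 3-4 year lag as this post's own rule of thumb rather than an established figure, and using the announcement and product dates from the table (the helper names `lag_years` and `project` are made up for illustration):

```python
from typing import Tuple

Quarter = Tuple[int, int]  # (year, quarter 1-4)


def lag_years(announced: Quarter, shipped: Quarter) -> float:
    """Years between an ISA announcement and a product shipping with it."""
    return (shipped[0] - announced[0]) + (shipped[1] - announced[1]) / 4


def project(announced: Quarter, min_lag: int, max_lag: int) -> Tuple[Quarter, Quarter]:
    """Earliest and latest ship quarter for a whole-year lag range."""
    year, quarter = announced
    return (year + min_lag, quarter), (year + max_lag, quarter)


# ARMv9.3-A (Q4-2021) to C1 (Q3-2025), both as listed in the table
c1_lag = lag_years((2021, 4), (2025, 3))
print(f"v9.3-A to C1: {c1_lag} years")

# ARMv9.7-A (Q4-2025) with the assumed 3-4 year lag
earliest, latest = project((2025, 4), 3, 4)
print(f"v9.7-A SoC window: Q{earliest[1]}-{earliest[0]} to Q{latest[1]}-{latest[0]}")
```

With those inputs the window runs from Q4-2028 to Q4-2029, which is where the "end of 2029" upper bound comes from.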

The upcoming NV Vera is most likely based on v9.4-A with SMT support. All the vendors have access to the same ISA; it is up to them to customize it with features like SMT, and that includes Apple, Qualcomm and AMD.
 
  • Like
Reactions: igor_kavinski
Jul 27, 2020
28,173
19,203
146
The upcoming NV Vera is most likely based on v9.4-A with SMT support. All the vendors have access to the same ISA; it is up to them to customize it with features like SMT, and that includes Apple, Qualcomm and AMD.
Finally! I think this is where AMD could excel with their ARM APU. Their SMT implementation seems to be the strongest in the industry. Well, if you discount IBM but they operate in niche markets.