Question Zen 6 Speculation Thread

Page 229

adroc_thurston

Diamond Member
Jul 2, 2023
6,330
8,910
106
Honestly I've come around on the AMD NPUs: they're great dataflow accelerators when used for non-AI tasks, even if the focus on low-precision datatypes is kind of annoying when you're trying to get full performance out of them for other purposes.
But generally the architecture and hardware is fascinating and looks very capable, even if using them for general-purpose tasks is entirely underexplored IMO, and it will likely stay that way if they do get abandoned.
(Think of them as a non-graphics GPU with less flexible compute scheduling (cores don't change what code they're executing on the fly), but able to use some of the routing and spatial-optimization tricks you normally see in FPGAs to shuffle data around more efficiently.)
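A toy sketch of that mental model (plain Python, nothing like the actual XDNA toolchain; purely illustrative):

```python
# Toy spatial-dataflow model: each "tile" is pinned to ONE fixed kernel and
# data streams tile-to-tile over a fixed route, instead of a GPU-style
# scheduler re-dispatching arbitrary kernels onto cores on the fly.

def scale(xs):          # tile 0: runs only this kernel
    for x in xs:
        yield 2 * x

def offset(xs):         # tile 1: runs only this kernel
    for x in xs:
        yield x + 1

def accumulate(xs):     # tile 2: runs only this kernel
    total = 0
    for x in xs:
        total += x
    return total

# The "routing": each tile's output stream is wired to the next tile's input.
print(accumulate(offset(scale(range(8)))))  # sum(2*x + 1 for x in range(8)) = 64
```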

That said, I'd not be surprised at all if they end up replaced by more capable graphics hardware, considering the die space they occupy and the fact that people actually do program for GPUs already, which can do many of the same things, even if not always quite as well, and do a lot of other things better.
They're just a waste of area when amdgcn exists.
 
  • Like
Reactions: Joe NYC

Anacapols

Junior Member
Mar 2, 2025
8
19
41
No argument there, they don't really have a place in consumer CPUs from a product or cost perspective (both extremely relevant ofc; the die area is currently better spent on other things for 99.9% of users), but they're interesting and IMO underexplored nonetheless.
 
Last edited:
  • Like
Reactions: BorisTheBlade82

marees

Golden Member
Apr 28, 2024
1,458
2,050
96
(quoting the NPU post above)
AMD has claimed that they can add the NPU back as a Xilinx FPGA block in an instant if customers need it.

Right now only Qualcomm has a use case for the NPU, but it's a large one: replacing the GPU for some kinds of edge inference (which the Halo chip can easily do now).
 
  • Like
Reactions: Elfear

marees

Golden Member
Apr 28, 2024
1,458
2,050
96
The Qualcomm edge-inference use case mentioned above? This one:

Dell's new laptop ditches the GPU for a discrete NPU — here's why that's a big deal

News
By Luke J. Alden, published 26 May 2025
Dell’s Pro Max Plus is built to run massive AI models locally. No GPU, no cloud, no compromises.

Dell ran a 109-billion-parameter Llama 4 model in a live demo on the laptop without an Internet connection or cloud server.
You get 32 AI cores, 64GB of LPDDR4x memory, and around 450 TOPS (trillions of operations per second) of 8-bit AI compute.



The Qualcomm AI 100 card is built on a 7nm process and uses two chips connected over PCIe. Each one offers 16 AI cores and 32GB of memory. Together, they act as a unified engine with enough bandwidth to handle some of the largest models available today.

In terms of thermal management, the card is designed to operate under a 75W thermal design power, which is considerably more than typical NPUs found in consumer laptops (usually under 10W).


https://www.laptopmag.com/laptops/dells-new-laptop-ditches-gpu-for-npu
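Quick sanity check on those numbers (my arithmetic, not from the article): a 109B-parameter model only fits in 64GB if the weights are quantized to around 4 bits.

```python
# Rough weight-memory footprint of a 109B-parameter model
# (weights only, ignoring KV cache and activations)
params = 109e9

for bits in (16, 8, 4):
    gb = params * bits / 8 / 1e9
    print(f"{bits:2d}-bit weights: {gb:6.1f} GB -> fits in 64 GB: {gb < 64}")

# 16-bit: 218 GB (no), 8-bit: 109 GB (no), 4-bit: ~54.5 GB (yes)
```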


AMD: We’re Exploring A Discrete GPU Alternative For PCs

By Dylan Martin
July 30, 2025, 4:08 PM EDT

Rahul Tikoo, a top AMD PC executive, tells CRN that the chip designer is ‘talking to customers’ about ‘use cases’ and ‘potential opportunities’ for a dedicated accelerator chip that is not a GPU but could be a neural processing unit. ‘We can get there pretty quickly,’ he says.

Rahul Tikoo, the head of AMD’s client CPU business, confirmed that the Santa Clara, Calif.-based company is “talking to customers” about “use cases” and “potential opportunities” for a dedicated accelerator chip that is not a GPU but could be a neural processing unit (NPU) in response to a CRN question at a briefing held last month before AMD’s Advancing AI event.


The CTO of AMD systems integration partner Sterling Computers told CRN last week that he believes the way AMD is using the AI engine technology from its Xilinx acquisition to serve as the basis for an NPU component in Ryzen processors “opens up a broad path” for the company to introduce discrete products with faster NPU performance in the future.
https://www.crn.com/news/components...re-exploring-discrete-gpu-alternative-for-pcs
 
Last edited:

Magras00

Member
Aug 9, 2025
28
60
46
How does pJ-per-bit for LP6 compare to GDDR7?
It does lend itself to the very high memory capacities that many will be looking for when running AI models locally.

Micron stated 4.5pJ/bit for GDDR7 vs 6.5pJ/bit for GDDR6.

Soldered LPDDR5X in the Grace CPU is ~5pJ/bit; other figures I saw mentioned around 4-4.1pJ/bit. LPDDR6 will probably go sub-3pJ/bit.

There's also Samsung LPW going as low as 1.2pJ/bit, while the figure I saw quoted multiple times is 1.9pJ/bit for other LPW designs. Irrelevant, see #5,707.

No idea which implementation AMD will use, but like @adroc_thurston said, power draw is much lower.
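For scale, here's what those figures mean in watts at a given bandwidth (my arithmetic, interface energy only, using the numbers quoted above):

```python
# Interface power = energy per bit * bits per second (I/O energy only)
def dram_io_power_w(bandwidth_gb_s, pj_per_bit):
    return bandwidth_gb_s * 1e9 * 8 * pj_per_bit * 1e-12

for name, pj in [("GDDR7", 4.5), ("LPDDR5X (Grace)", 5.0),
                 ("LPDDR6 (guess)", 3.0), ("LPW", 1.9)]:
    print(f"{name:16s} @ 576 GB/s: {dram_io_power_w(576, pj):5.1f} W")

# GDDR7 ~20.7 W, LPDDR5X ~23.0 W, LPDDR6 ~13.8 W, LPW ~8.8 W
```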
 
Last edited:
  • Like
Reactions: Joe NYC

Thibsie

Golden Member
Apr 25, 2017
1,111
1,305
136

(quoting the CRN article above: "AMD: We’re Exploring A Discrete GPU Alternative For PCs")

IMO this is just marketing blabla, basically 'cos they axed the NPU: blabla, we can put it back if clients ask us to, blabla.
 

Magras00

Member
Aug 9, 2025
28
60
46
I find it quite intriguing that AMD is able to contain so much of the bandwidth requirements using on-die L2s, to the point that AMD can get away with LPDDR memory.

LPDDR6 is very fast even with the early spec. AT3 with 384-bit quad-channel LPDDR6 @ 12Gbps has 576GB/s memory BW, halfway between a 4070 Ti and a 9070 XT. Should be plenty with a next-gen clean-slate RDNA 5 µarch and ISA and a unified L2 and MALL like NVIDIA's.
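That 576GB/s checks out:

```python
# Peak bandwidth = bus width (bits) * per-pin data rate (Gbps) / 8
bus_bits, gbps = 384, 12
print(bus_bits * gbps / 8, "GB/s")  # 576.0
```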

Still wondering how big that L2 will be on AT3 and AT4. 20MB seems too low, but perhaps they'll beef up the wiring and cache-control circuitry like they did with RDNA 4, and even 32MB might be enough.

AMD could be trolling NVidia on the low end with big LPDDR5 memory sizes.

What I wonder though: why not do the same throughout the stack?

If NVidia can go up to a 512-bit memory bus (8 channels), why not go to 6 LPDDR6 channels on the high-end card, which would be 576 bits?

Because then, if the biggest LPDDR5 memory chip is 64 GB, the high-end professional / AI card could have 384 GB, which would be maximum trolling.

Or maybe split it: high-end gaming using GDDR7 and high-end professional / AI using LPDDR6.

But it's good to keep in mind that NVidia is also doing a lot of work with LPDDR across the product stack, so AMD may not have a monopoly here.

There's no need. 4GB modules over 192-bit = 24GB, so I doubt the PS6 will go any higher. 24GB seems like the sweet spot.

The gaming stack could look like this:
AT0 36/48GB
AT2 24GB
AT3 24/32GB
AT4 12/16GB

At some point LPDDR6 PHYs become comically large and GDDR7 makes more sense. Maybe a split memory-controller design (LPDDR6 + GDDR7), but that will probably be too much work.

AT3 can already top out at 512GB without clamshell: 64GB x 8 = 512GB. 576-bit would be 768GB.
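Spelling out the capacity math in the last few posts (my arithmetic; the package widths are assumptions, 32-bit for GDDR7-style and 48-bit for dual-channel LPDDR6):

```python
# Capacity = (bus width / package width) * per-package density
def capacity_gb(bus_bits, pkg_bits, pkg_gb):
    return (bus_bits // pkg_bits) * pkg_gb

print(capacity_gb(192, 32, 4))   # 24 GB  (4GB packages on 192-bit)
print(capacity_gb(384, 48, 64))  # 512 GB (64GB packages on 384-bit AT3)
print(capacity_gb(576, 48, 64))  # 768 GB (same packages on 576-bit)
```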

Datacenter and very high end professional will probably lean into some other tech like HBF in addition to HBM, but we'll see.

Yeah, aren't N1X and N1 using LPDDR5X?
 
Last edited:
  • Like
Reactions: Joe NYC and marees

511

Diamond Member
Jul 12, 2024
3,708
3,491
106
(quoting Magras00's LPDDR6 post above)
The introductory speed is 10667 MT/s, the same as LPDDR5X's 10667; the only difference is channel width. Don't forget DDR5 got introduced with DDR5-4800.
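Per channel, that width difference is the whole gain (my arithmetic; assumes 16-bit LPDDR5X channels vs 24-bit LPDDR6 channels):

```python
# Same 10667 MT/s pin speed; LPDDR6's wider channel supplies the extra bandwidth
mt_s = 10667
for name, bits in [("LPDDR5X", 16), ("LPDDR6", 24)]:
    print(f"{name}: {mt_s * bits / 8 / 1000:.1f} GB/s per channel")

# LPDDR5X ~21.3 GB/s, LPDDR6 ~32.0 GB/s
```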