Question Zen 6 Speculation Thread

Page 229

adroc_thurston

Diamond Member
Jul 2, 2023
6,330
8,910
106
Honestly I've come around on the AMD NPUs: they're great dataflow accelerators when used for non-AI tasks, even if the focus on low-precision datatypes is kind of annoying when you're trying to get full performance out of them for other purposes.
But generally the architecture and hardware is fascinating and looks very capable, even if using them for general-purpose tasks is entirely underexplored IMO, and it will likely stay that way if they do get abandoned.
(Think of them as a non-graphics GPU with less flexible compute scheduling (cores don't change what code they're executing on the fly), but able to use some of the routing and spatial-optimization tricks you normally see in FPGAs to shuffle data around more efficiently.)
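A toy sketch of that mental model (plain Python, nothing like the actual XDNA toolchain; purely illustrative):

```python
# Toy spatial-dataflow model: each "tile" is pinned to ONE fixed kernel and
# data streams tile-to-tile over a fixed route, instead of a GPU-style
# scheduler re-dispatching arbitrary kernels onto cores on the fly.

def scale(xs):          # tile 0: runs only this kernel
    for x in xs:
        yield 2 * x

def offset(xs):         # tile 1: runs only this kernel
    for x in xs:
        yield x + 1

def accumulate(xs):     # tile 2: runs only this kernel
    total = 0
    for x in xs:
        total += x
    return total

# The "routing": each tile's output stream is wired to the next tile's input.
print(accumulate(offset(scale(range(8)))))  # sum(2*x + 1 for x in range(8)) = 64
```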

That said, I'd not be surprised at all if they end up replaced by more capable graphics hardware, considering the die space they occupy and the fact that people actually do program for GPUs already, which can do many of the same things, even if not always quite as well, and do a lot of other things better.
They're just a waste of area when amdgcn exists.
 
  • Like
Reactions: Joe NYC

Anacapols

Junior Member
Mar 2, 2025
8
19
41
No argument there, they don't really have a place in consumer CPUs from a product or cost perspective (both extremely relevant ofc; the die area is currently better spent on other things for 99.9% of users), but they're interesting and IMO underexplored nonetheless.
 
Last edited:
  • Like
Reactions: BorisTheBlade82

marees

Golden Member
Apr 28, 2024
1,458
2,050
96
(quoting the NPU post above)
AMD has claimed that they can add the NPU back as a Xilinx FPGA block in an instant if customers need it.

Right now only Qualcomm has a use case for the NPU, but it's a large one: replacing the GPU for some kinds of edge inference (which the Halo chip can easily do now).
 
  • Like
Reactions: Elfear

marees

Golden Member
Apr 28, 2024
1,458
2,050
96
The Qualcomm edge-inference use case mentioned above? This one:

Dell's new laptop ditches the GPU for a discrete NPU — here's why that's a big deal

News
By Luke J. Alden, published 26 May 2025
Dell’s Pro Max Plus is built to run massive AI models locally. No GPU, no cloud, no compromises.

Dell ran a 109-billion-parameter Llama 4 model in a live demo on the laptop without an Internet connection or cloud server.
You get 32 AI cores, 64GB of LPDDR4x memory, and around 450 TOPS (trillions of operations per second) of 8-bit AI compute.



The Qualcomm AI 100 card is built on a 7nm process and uses two chips connected over PCIe. Each one offers 16 AI cores and 32GB of memory. Together, they act as a unified engine with enough bandwidth to handle some of the largest models available today.

In terms of thermal management, the card is designed to operate under a 75W thermal design power, which is considerably more than typical NPUs found in consumer laptops (usually under 10W).


https://www.laptopmag.com/laptops/dells-new-laptop-ditches-gpu-for-npu
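Quick sanity check on those numbers (my arithmetic, not from the article): a 109B-parameter model only fits in 64GB if the weights are quantized to around 4 bits.

```python
# Rough weight-memory footprint of a 109B-parameter model
# (weights only, ignoring KV cache and activations)
params = 109e9

for bits in (16, 8, 4):
    gb = params * bits / 8 / 1e9
    print(f"{bits:2d}-bit weights: {gb:6.1f} GB -> fits in 64 GB: {gb < 64}")

# 16-bit: 218 GB (no), 8-bit: 109 GB (no), 4-bit: ~54.5 GB (yes)
```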


AMD: We’re Exploring A Discrete GPU Alternative For PCs

By Dylan Martin
July 30, 2025, 4:08 PM EDT

Rahul Tikoo, a top AMD PC executive, tells CRN that the chip designer is ‘talking to customers’ about ‘use cases’ and ‘potential opportunities’ for a dedicated accelerator chip that is not a GPU but could be a neural processing unit. ‘We can get there pretty quickly,’ he says.

Rahul Tikoo, the head of AMD’s client CPU business, confirmed that the Santa Clara, Calif.-based company is “talking to customers” about “use cases” and “potential opportunities” for a dedicated accelerator chip that is not a GPU but could be a neural processing unit (NPU) in response to a CRN question at a briefing held last month before AMD’s Advancing AI event.


The CTO of AMD systems integration partner Sterling Computers told CRN last week that he believes the way AMD is using the AI engine technology from its Xilinx acquisition to serve as the basis for an NPU component in Ryzen processors “opens up a broad path” for the company to introduce discrete products with faster NPU performance in the future.
https://www.crn.com/news/components...re-exploring-discrete-gpu-alternative-for-pcs
 
Last edited:

Magras00

Member
Aug 9, 2025
28
60
46
How does pJ-per-bit for LP6 compare to GDDR7?
It does lend itself to the very high memory capacities that many will be looking for when running AI models locally.

Micron stated 4.5pJ/bit for GDDR7 vs 6.5pJ/bit for GDDR6.

Soldered LPDDR5X in the Grace CPU is ~5pJ/bit; other figures I saw mentioned around 4-4.1pJ/bit. LPDDR6 will probably go sub-3pJ/bit.

There's also Samsung LPW going as low as 1.2pJ/bit, while the figure I saw quoted multiple times is 1.9pJ/bit for other LPW designs. Irrelevant, see #5,707.

No idea which implementation AMD will use, but like @adroc_thurston said, power draw is much lower.
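For scale, here's what those figures mean in watts at a given bandwidth (my arithmetic, interface energy only, using the numbers quoted above):

```python
# Interface power = energy per bit * bits per second (I/O energy only)
def dram_io_power_w(bandwidth_gb_s, pj_per_bit):
    return bandwidth_gb_s * 1e9 * 8 * pj_per_bit * 1e-12

for name, pj in [("GDDR7", 4.5), ("LPDDR5X (Grace)", 5.0),
                 ("LPDDR6 (guess)", 3.0), ("LPW", 1.9)]:
    print(f"{name:16s} @ 576 GB/s: {dram_io_power_w(576, pj):5.1f} W")

# GDDR7 ~20.7 W, LPDDR5X ~23.0 W, LPDDR6 ~13.8 W, LPW ~8.8 W
```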
 
Last edited:
  • Like
Reactions: Joe NYC

Thibsie

Golden Member
Apr 25, 2017
1,111
1,305
136

(quoting the CRN article above: "AMD: We’re Exploring A Discrete GPU Alternative For PCs")

IMO this is just marketing blabla, basically 'cos they axed the NPU: blabla, we can put it back if clients ask us to, blabla.
 

Magras00

Member
Aug 9, 2025
28
60
46
I find it quite intriguing that AMD is able to contain so much of the bandwidth requirements using on-die L2s, to the point that AMD can get away with LPDDR memory.

LPDDR6 is very fast even with the early spec. AT3 with 384-bit quad-channel LPDDR6 @ 12Gbps has 576GB/s memory BW, halfway between a 4070 Ti and a 9070 XT. Should be plenty with a next-gen clean-slate RDNA 5 µarch and ISA and a unified L2 and MALL like NVIDIA's.
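That 576GB/s checks out:

```python
# Peak bandwidth = bus width (bits) * per-pin data rate (Gbps) / 8
bus_bits, gbps = 384, 12
print(bus_bits * gbps / 8, "GB/s")  # 576.0
```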

Still wondering how big that L2 will be on AT3 and AT4. 20MB seems too low, but perhaps they'll beef up the wiring and cache-control circuitry like they did with RDNA 4, and even 32MB might be enough.

AMD could be trolling NVidia on the low end with big LPDDR5 memory sizes.

What I wonder though: why not do the same throughout the stack?

If NVidia can go up to a 512-bit memory bus (8 channels), why not go to 6 LPDDR6 channels on the high-end card, which would be 576 bits?

Because then, if the biggest LPDDR5 memory chip is 64 GB, the high-end professional / AI card could have 384 GB, which would be maximum trolling.

Or maybe split it: high-end gaming using GDDR7 and high-end professional / AI using LPDDR6.

But it's good to keep in mind that NVidia is also doing a lot of work with LPDDR across the product stack, so AMD may not have a monopoly here.

There's no need. 4GB modules over 192-bit = 24GB, so I doubt the PS6 will go any higher. 24GB seems like the sweet spot.

The gaming stack could look like this:
AT0 36/48GB
AT2 24GB
AT3 24/32GB
AT4 12/16GB

At some point LPDDR6 PHYs become comically large and GDDR7 makes more sense. Maybe a split memory-controller design (LPDDR6 + GDDR7), but that will probably be too much work.

AT3 can already top out at 512GB without clamshell: 64GB x 8 = 512GB. 576-bit would be 768GB.
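Spelling out the capacity math in the last few posts (my arithmetic; the package widths are assumptions, 32-bit for GDDR7-style and 48-bit for dual-channel LPDDR6):

```python
# Capacity = (bus width / package width) * per-package density
def capacity_gb(bus_bits, pkg_bits, pkg_gb):
    return (bus_bits // pkg_bits) * pkg_gb

print(capacity_gb(192, 32, 4))   # 24 GB  (4GB packages on 192-bit)
print(capacity_gb(384, 48, 64))  # 512 GB (64GB packages on 384-bit AT3)
print(capacity_gb(576, 48, 64))  # 768 GB (same packages on 576-bit)
```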

Datacenter and very high end professional will probably lean into some other tech like HBF in addition to HBM, but we'll see.

Yeah, aren't N1X and N1 using LPDDR5X?
 
Last edited:
  • Like
Reactions: Joe NYC and marees

511

Diamond Member
Jul 12, 2024
3,708
3,491
106
(quoting Magras00's LPDDR6 post above)
The introductory speed is 10667 MT/s, the same as LPDDR5X's 10667; the only difference is channel width. Don't forget DDR5 got introduced with DDR5-4800.
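Per channel, that width difference is the whole gain (my arithmetic; assumes 16-bit LPDDR5X channels vs 24-bit LPDDR6 channels):

```python
# Same 10667 MT/s pin speed; LPDDR6's wider channel supplies the extra bandwidth
mt_s = 10667
for name, bits in [("LPDDR5X", 16), ("LPDDR6", 24)]:
    print(f"{name}: {mt_s * bits / 8 / 1000:.1f} GB/s per channel")

# LPDDR5X ~21.3 GB/s, LPDDR6 ~32.0 GB/s
```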