Zen 6 Speculation Thread


Fjodor2001

Diamond Member
Feb 6, 2010
4,585
734
126
Well, possibly both.

Anyone who works professionally and NEEDS this type of performance will absolutely have a discrete GPU.

For the average Joe (or Jill ;) ), the iGPU in Zen6 will likely run circles around a CPU + AMX solution.

I guess I just don't see the market here for AMX.
Even if you don't NEED this level of performance from mostly-AMX workloads, it'll still matter for workloads where part of the work can be offloaded to AMX on the CPU, or to the iGPU / NPU. So it may still affect performance by 10%, 30%, or whatever.

Regarding the tiny peasant iGPU in Zen6 running circles around 24C/48T Zen6 (or 52C/52T NVL-S) with AMX, what do you base that on? E.g. any benchmark, or just speculation?
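
The "10%, 30%, or whatever" framing above is essentially Amdahl's law: accelerating only the matrix-math fraction of a workload bounds the overall gain. A minimal sketch — the fraction and per-unit speedup below are made-up illustrative numbers, not measurements of any real chip:

```python
# Amdahl's-law estimate of how much accelerating only the GEMM-heavy
# fraction of a workload (via AMX, an iGPU, or an NPU) moves the total.
# The fraction (0.30) and speedup (5x) are illustrative assumptions.

def overall_speedup(gemm_fraction: float, gemm_speedup: float) -> float:
    """Whole-workload speedup when only the GEMM part is accelerated."""
    return 1.0 / ((1.0 - gemm_fraction) + gemm_fraction / gemm_speedup)

# If 30% of a workload is matrix math and the accelerator runs that part
# 5x faster, the whole workload only gets ~1.32x faster.
print(round(overall_speedup(0.30, 5.0), 2))
```

So even a huge matmul accelerator moves whole-workload numbers far less than its headline throughput suggests, which is why the argument above is about 10–30% deltas rather than multiples.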
 

adroc_thurston

Diamond Member
Jul 2, 2023
Regarding the tiny peasant iGPU in Zen6 running circles around 24C/48T Zen6 with AMX (or 52C/52T NVL-S), what do you base that on? E.g. any benchmark, or just speculation?
those gemm blobs are one or two per cluster.
Their point isn't to outrace iGPs, but to pump a higher geekbench score.
 

Fjodor2001

Diamond Member
Feb 6, 2010
It's the only DT that matters. Rest are no margin povertyholes.
Again, margins do not matter in this case, since we're not talking about sales but technical feasibility.
Yeah those tend to accelerate GEMM via iGPs or dedicated GEMM blobs.
ACE/SME are for pumping out Geekbench scores.
Maybe better to use the NPU instead of the iGPU as the AMX-on-CPU alternative? Unless the NPU is occupied with AI stuff.

Same on DT, which will get an NPU too.
 

Fjodor2001

Diamond Member
Feb 6, 2010
Sales are only worth anything at good margin. Otherwise, you're Intel.
Again, we're talking about technical feasibility. Regardless of whether it's AMD, Intel, or someone else, and what their sales are.
GEMM blobs suck at modern ML.
Translate that into links showing actual benchmarks of an NPU being worse than an iGPU at handling AMX-style CPU tasks, given the same silicon area to play with.
 

adroc_thurston

Diamond Member
Jul 2, 2023
Translate that into links showing actual benchmarks of an NPU being worse than an iGPU at handling AMX-style CPU tasks, given the same silicon area to play with.
You gotta wait for M5 Pro/Max for that.
Apple is the only OEM with both NPU and GPU as a first class s/w citizen.
 

MS_AT

Senior member
Jul 15, 2024
Translate that into links showing actual benchmarks of an NPU being worse than an iGPU at handling AMX-style CPU tasks, given the same silicon area to play with.
https://fastflowlm.com/ — I'll leave it to you to see how the iGPUs in Strix Point and Strix Halo compare. The NPU does give lower power draw, though. After all, there's a reason AMD's software folks prefer a hybrid approach at best on Strix Point (read up on their Lemonade Server).
 

Doug S

Diamond Member
Feb 8, 2020
Assuming you can expect developers to actually target your GPU with their software.

It isn't like there are a lot of GPU architectures out there. Where do you expect this to be a problem? Yeah, if you have an out-of-date GPU it may not be supported by a given software package, but if you are doing something where you need tons of matrix-op throughput, you can rectify that situation.

The real world use for a lot of matrix op throughput is kinda limited. AI, big data, math geeks...anything I'm missing? Which is why wasting the area on putting a beefy unit in every CPU core makes no sense.
 

Doug S

Diamond Member
Feb 8, 2020
The opposite, they added matmul piles to their GPU IP because ANE sucks.

ANE is designed for low power background processing, like recognizing when you say "hey Siri". You don't want to wake up the (comparatively) power hungry SME unit (let alone the GPU) every time the microphone receives audio input to check if you're talking to your phone.
 

adroc_thurston

Diamond Member
Jul 2, 2023
ANE is designed for low power background processing, like recognizing when you say "hey Siri"
That was the OG idea but they've pumped matmul rates there ever since.
This was their primary ML offering with first-class s/w support but no one liked it.
For a good reason, too.
 

Doug S

Diamond Member
Feb 8, 2020
That was the OG idea but they've pumped matmul rates there ever since.
This was their primary ML offering with first-class s/w support but no one liked it.
For a good reason, too.

They haven't increased the size of the ANE; it has always been 16 cores. It has become a little faster as new processes allow higher clock rates without compromising power efficiency, and they added support for smaller datatypes, which doubled the claimed TOPS. It hasn't changed all that much since they introduced it nearly a decade ago, since its intended role hasn't changed.
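
The "smaller datatypes doubled the claimed TOPS" point is just arithmetic: the same MAC array does twice as many operations per cycle at half the element width. A back-of-envelope sketch — the MACs-per-core count and clock below are illustrative assumptions, not Apple's real ANE figures:

```python
# Back-of-envelope TOPS for a fixed-silicon NPU: halving the datatype
# width doubles MACs/cycle on the same array, doubling the claimed TOPS.
# All numbers below are illustrative assumptions, not real ANE specs.

def tops(macs_per_cycle: int, clock_ghz: float) -> float:
    # 1 MAC = 2 ops (multiply + add); TOPS = ops per second / 1e12.
    return macs_per_cycle * 2 * clock_ghz * 1e9 / 1e12

fp16_macs = 16 * 256        # 16 cores x 256 FP16 MACs/cycle (assumed)
int8_macs = fp16_macs * 2   # same array, half-width elements -> 2x MACs

print(tops(fp16_macs, 1.0))  # FP16 TOPS at an assumed 1 GHz
print(tops(int8_macs, 1.0))  # INT8 TOPS: exactly double, same silicon
```

Which is why a doubled TOPS headline doesn't imply the unit itself grew.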
 

Doug S

Diamond Member
Feb 8, 2020
the core matmul rates went up.
Bigger configs had dual-ANE iirc.

Again, this was their primary ML offering with first-class SW and **no one** liked it. So GPU it is.

There was no "dual ANE" unless you count the Ultra. That had dual ANE, but only because it used two Max dies; that wasn't a choice, but rather a consequence of how they chose to implement the Ultra.

If you think Apple made such a blunder here, what about Qualcomm, Intel, and AMD, all of whom have a similar separate NPU for similar roles, have GPU AI capability, and are also adding SME / ACE? Sure seems like they all agree on this, so I'll trust the four of them rather than you, "bro".
 

adroc_thurston

Diamond Member
Jul 2, 2023
If you think Apple made such a blunder here what about Qualcomm, Intel and AMD, all of whom have a similar separate NPU for similar roles, have GPU AI capability, and are also adding SME / ACE. Sure seems like they all agree on this, so I'll trust the four of them rather than you "bro".
All those dumb VLIW blobs are goddamn useless for doing actual real ML.
GPU is what you want.
 

Joe NYC

Diamond Member
Jun 26, 2021
DIY margins are higher.
It's a tricky market to win, though.

I think it is kind of a lesson AMD is learning. When you "own" the market, like DIY desktop, suddenly a lot of money in revenue and profits starts pouring in.

I think Lisa and Jean like it and will want to replicate it in other segments (such as mobile, dGPU).

AMD is more used to trying to squeeze a drop of water from a dry rock, while other companies walk away with fat profits.
 

adroc_thurston

Diamond Member
Jul 2, 2023
I think it is kind of a lesson AMD is learning. When you "own" the market, like DIY desktop, suddenly a lot of money in revenue and profits starts pouring in.
They own DIY market via pure accident. They shipped server scraps and just won.
It happened but that was never, ever the intent.
I think Lisa and Jean like it and will want to replicate it in other segments (such as mobile, dGPU).
Mobile is heavily commoditized with margin pressure exerted from multiple directions.
discrete AIC graphics hates the sacks outta Radeon. Not viable.
Mobile APU-driven gfx is a *maybe*. But generally speaking proles and goymers hate the sacks outta Radeon for not providing 'competition' (i.e. NV pricecuts).
AMD is more used to trying to squeeze a drop of water from a dry rock, while other companies walk away with fat profits.
This was true 10 years ago.
Now Intel is the one with 35% GM lmao.
 

OneEng2

Senior member
Sep 19, 2022
Even if you don't NEED this type of performance via mostly AMX instructions, it'll impact workloads where part of the instructions are potentially performed by AMX on CPU, or iGPU / NPU. So it may still affect the perf to 10%, 30%, or whatever.

Regarding the tiny peasant iGPU in Zen6 running circles around 24C/48T Zen6 (or 52C/52T NVL-S) with AMX, what do you base that on? E.g. any benchmark, or just speculation?
Possibly a bad assumption, but an assumption just the same. GPUs and NPUs (pitiful or not) are architected to handle matrix loads. They generally handle them better for the same amount of die space, IMO.

For actual real-world loads that need these operations (not Geekbench), I believe (we will see) the iGPU in Zen 6 will outperform AMX in Intel CPUs.

Let's revisit it later this year ;).
They own DIY market via pure accident. They shipped server scraps and just won.
It happened but that was never, ever the intent.
What a day it is when I agree with something logical adroc stated!
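
For reference, the kernel everything in this argument — AMX tiles, GPU matrix cores, NPU MAC arrays — is built to accelerate is blocked GEMM. A minimal pure-Python sketch of the loop structure (the tile size here is arbitrary; real hardware fixes it in silicon, e.g. AMX tiles are up to 16 rows of 64 bytes):

```python
# Blocked (tiled) matrix multiply: the loop structure that AMX tiles,
# GPU matrix cores, and NPU MAC arrays all implement in hardware.
# Tile size is arbitrary here; real units fix it in silicon.

def gemm_blocked(A, B, tile=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # One tile x tile x tile block: roughly what a single
                # AMX TDPBSSD or GPU MMA instruction computes at once.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for kk in range(k0, min(k0 + tile, k)):
                            C[i][j] += A[i][kk] * B[kk][j]
    return C
```

The whole iGPU-vs-NPU-vs-AMX debate above is really about which unit executes that inner block at the best throughput per watt and per mm² of die area.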