Question Zen 6 Speculation Thread


OneEng2

Senior member
Sep 19, 2022
998
1,200
106
As I mentioned, it’s perf/thread AND thread count. Thread count is 48T in both cases though, so that leaves perf/thread for comparison.
Yes, but total FULL cores matter for overall throughput of the chip. SMT helps a single core have more throughput .... and is therefore good (as others stated).

I agree with your assessment that 48 full cores without SMT are likely to eclipse 24 cores with SMT in highly thread scalable applications.
Threads don't execute anything. They are simply streams of (decoded) instructions. The only valid thing to care about for rendering is total throughput. Any uniform divisor is nearly useless. Perf per core is misleading in heterogeneous designs and perf per thread is misleading for SMT designs.
Agree. What does deserve some thought IMO is PPA, as this affects the profitability of the company. It is also important to understand how many cores you can fit in a power envelope, as this affects the overall throughput per socket.
SMT Zen5 in CB26 gives +37%.
Really? Do you have a link? I haven't seen SMT give such a boost in desktop. Thanks.
I'd be surprised if AMD supports AMX.
Why would they need to? AMD GPUs run circles around any AMX-supporting Intel CPU in matrix operations.

I think Intel is barking up the wrong tree with AMX. Just my opinion.
 

Doug S

Diamond Member
Feb 8, 2020
3,814
6,753
136
Think about why all ARM vendors who also have SME (Qualcomm/MediaTek) went per cluster and not per core. It's not something your average user is gonna use every day, so it's a waste to do it per core.

People here don't seem to grasp the tradeoff between throughput and time to result. If you need to perform a LOT of these types of matrix operations, then having a dedicated AMX/SME unit in every CPU core doesn't make sense, because beyond a certain total amount of work a GPU will always be faster. So the AMX/SME unit is sized such that it can handle a limited amount of work faster than passing it to the GPU (because the overhead of that back and forth dominates the time to result). Beyond a certain level, though, the GPU being faster more than makes up for the back and forth, and you get your result more quickly.

Thus the unit shouldn't be too large NOR TOO NUMEROUS because even beyond the area penalty it wouldn't make sense to have dozens of AMX/SME units all churning away - GPUs have far more resources to throw at such work.
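To put a shape on that break-even point, here is a minimal sketch in Python. The throughput and overhead figures are purely hypothetical placeholders, not measurements of any AMX/SME unit or GPU, and the function names are made up for illustration. It only models a fixed offload overhead plus compute time on each side.

```python
def time_on_cpu_unit(work_ops, unit_ops_per_s):
    """Time to result on the in-core matrix unit: no offload overhead."""
    return work_ops / unit_ops_per_s


def time_on_gpu(work_ops, gpu_ops_per_s, offload_overhead_s):
    """Time to result on the GPU: fixed round-trip overhead plus compute."""
    return offload_overhead_s + work_ops / gpu_ops_per_s


def break_even_work(unit_ops_per_s, gpu_ops_per_s, offload_overhead_s):
    """Work size at which both paths take the same time.

    Solves W / unit = overhead + W / gpu for W. Requires gpu > unit.
    """
    if gpu_ops_per_s <= unit_ops_per_s:
        raise ValueError("GPU must be faster than the CPU unit for a crossover")
    return offload_overhead_s * unit_ops_per_s * gpu_ops_per_s / (
        gpu_ops_per_s - unit_ops_per_s
    )


# Hypothetical numbers, chosen only to illustrate the shape of the tradeoff.
UNIT_OPS_PER_S = 2e12   # assumed matrix-unit throughput
GPU_OPS_PER_S = 40e12   # assumed GPU throughput
OVERHEAD_S = 50e-6      # assumed dispatch + copy round trip

w_star = break_even_work(UNIT_OPS_PER_S, GPU_OPS_PER_S, OVERHEAD_S)
print(f"break-even work: {w_star:.3e} ops")

# Compare one job well below and one well above the break-even size.
for work in (w_star / 10, w_star * 10):
    cpu_t = time_on_cpu_unit(work, UNIT_OPS_PER_S)
    gpu_t = time_on_gpu(work, GPU_OPS_PER_S, OVERHEAD_S)
    winner = "CPU unit" if cpu_t < gpu_t else "GPU"
    print(f"work={work:.3e} ops  cpu={cpu_t * 1e6:.1f} us  gpu={gpu_t * 1e6:.1f} us  -> {winner}")
```

With any numbers of this kind, small jobs finish sooner on the in-core unit and large jobs finish sooner on the GPU; only the position of the crossover moves.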
 

reaperrr3

Member
May 31, 2024
173
495
96
As is gfx9+ iirc. Unless it was also bugged and disabled.
Guess what...


That being said, all I remember from synthetic tests of Vega in the good ol' times of hardware.fr etc., aka when real tech sites still existed, is that the gap AMD had to bridge was MASSIVE in terms of tiling, culling, compression etc., one of the reasons Vega was so meh vs. Maxwell and Pascal.

No idea how RDNA4 stacks up nowadays in terms of tiling/culling, since as I mentioned, the skill and knowledge to perform the kind of synthetic tests hardware.fr did has apparently been lost to youtubers and websites run by clueless people.
It's entirely possible AMD's tiling/binning still isn't on Nvidia's level and that AMD won't have true TBIMR until gfx13.
 

basix

Senior member
Oct 4, 2024
316
612
96
If gfx13 makes good improvement towards a (better) TBIMR, we could expect decent efficiency gains (energy and memory bandwidth).
 

OneEng2

Senior member
Sep 19, 2022
998
1,200
106
R5 9600X + DDR5 2x16GB 5600MT
CB26 ~5.4GHz
ST 525p
SC(SMT) 719p (+37%)
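As a quick sanity check on the quoted uplift, the two scores can be compared directly. This is a small sketch in Python; the scores are the ones quoted above, and the helper name is just for illustration.

```python
def smt_uplift_percent(one_thread_score, smt_core_score):
    """Percentage gain of one core running two SMT threads over one thread."""
    return (smt_core_score / one_thread_score - 1.0) * 100.0


# Scores from the quoted CB26 run: ST 525p, SC(SMT) 719p.
uplift = smt_uplift_percent(525, 719)
print(f"SMT uplift: +{uplift:.0f}%")  # prints "SMT uplift: +37%"
```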
How do CB23 and CB24 stack up?

What I am getting at is whether the benchmarks have any real-world application equivalent. In other words, do these benchmarks represent the performance of popular desktop applications that do real work for people? .... or are they just an exercise in selling clicks?
 

Z O X

Member
Oct 31, 2022
35
37
61
How do CB23 and CB24 stack up?

What I am getting at is whether the benchmarks have any real-world application equivalent. In other words, do these benchmarks represent the performance of popular desktop applications that do real work for people? .... or are they just an exercise in selling clicks?
Yes, some of us use C4D for work, and the benches use the current-gen rendering engine.
From Nehalem onward, one gets at least 25% better rendering perf with SMT on/more threads available.
You can see the difference in everyday situations: it takes 10s longer for the system to boot and load everything with SMT off (5700X3D).
The price for this is roughly 0.2 ms of additional system latency ...
 

Win2012R2

Golden Member
Dec 5, 2024
1,325
1,363
96
Both Intel and AMD are doing 1 ACE unit per 2 cores
Well, this ratio is ok-ish, like half-rate AVX512 to drive adoption without a high penalty; that's sensible.

arswpes claim was that it's 1 per cluster, which one can naturally assume is a CCD - 12 or 16 cores. I even used 12 as an example, and he did not contend that the ratio would be a lot more sane.

so in a 12 core CPU there are 6 ACE units. Hopefully those units are small
That would be a sensible ratio for client.

We are talking about Zen 6 LP cores. So one AMX unit per 2C/4C cluster sounds very reasonable to me

Yes, that would be a reasonable ratio. The discussion, however, was about "1 per cluster", with cluster sizes thrown around as 8/12, and nobody before Kepler commented that it will be 1 per 2; this obviously makes it far more feasible.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,595
739
126
Why would they need to? AMD GPUs run circles around any AMX-supporting Intel CPU in matrix operations.
Problem is not everyone has a discrete (AMD) GPU. Or do you mean that the iGPU in Zen6 DT CPUs will be sufficient, so it's at least as fast as using AMX on the CPU?
 

StefanR5R

Elite Member
Dec 10, 2016
6,844
10,998
136
Arachnophobia is the term for the irrational fear of spiders.
Belonephobia is the term for the irrational fear of needles and pins.
Claustrophobia is the term for the irrational fear of confined spaces.
What was the term again for the irrational fear of missing out on Intel® AMX?
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,595
739
126
iGPU should be faster, with higher latency and higher power usage tho, but who is using NPUs on client right now? Copilot bombed
Depends on what iGPU we're talking about. The one in e.g. 9950X and friends is tiny. Will that still be sufficient?

Also, what's the relationship to the NPU, which you mentioned? Do you mean that it should be used instead of the iGPU (for the "AMX on CPU" replacement use case)?
The only thing that matters mang.
Otherwise you become Intel.
From a sales perspective it matters, yes. But it does not matter when we're just judging the technical aspect of whether the iGPU will be sufficient to make AMX on CPU pointless.
 

OneEng2

Senior member
Sep 19, 2022
998
1,200
106
Problem is not everyone has a discrete (AMD) GPU. Or do you mean that the iGPU in Zen6 DT CPUs will be sufficient, so it's at least as fast as using AMX on the CPU?
Well, possibly both.

For anyone that works professionally and NEEDS this type of performance, they will absolutely have a discrete GPU.

For the average Joe (or Jill ;) ), the iGPU in Zen6 will likely run circles around a CPU + AMX solution.

I guess I just don't see the market here for AMX.