Question Zen 6 Speculation Thread

Page 353

Win2012R2

Golden Member
Dec 5, 2024
1,319
1,358
96
we don't care about Intel because Intel sucks.
They are certainly the benchmark for AMX execution and will be compared against AMD.

So the question stands: if Intel uses dedicated per-core AMX units, then why the **** won't perf be terrible if a single AMX unit gets shared by multiple cores, potentially a high number like 8 or more?
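To put a number on the worry, a toy model (all rates invented, worst case where every core issues matmul work nonstop):

```python
# Toy model of the contention question (rates are made up, worst case
# where every core keeps the matmul unit fully busy).

def throughput(unit_rate, n_cores, shared):
    """Return (aggregate, per-core) matmul throughput in ops/cycle."""
    total = unit_rate if shared else unit_rate * n_cores
    return total, total / n_cores

# 8 cores, each matmul unit assumed worth 512 ops/cycle
print(throughput(512, 8, shared=False))  # (4096, 512.0)
print(throughput(512, 8, shared=True))   # (512, 64.0)
```

Under total saturation the per-core rate drops by exactly the core count, which is the scenario I'm asking about.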
 

adroc_thurston

Diamond Member
Jul 2, 2023
8,347
11,111
106
They are certainly the benchmark for AMX execution and will be compared against AMD.
no, the benchmark is Apple since they ship it everywhere.
So the question stands: if Intel uses dedicated per-core AMX units, then why the **** won't perf be terrible if a single AMX unit gets shared by multiple cores, potentially a high number like 8 or more?
a) you know you can build a bigger matmul unit for 8 cores. The Apple one is p chungus and has a ton of juice. very nice for DGEMM.
b) no, seriously, ARM SME works. read the docs and look at the benchmarks.
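point a) in numbers (same invented per-unit rate as above): size the one shared unit for the whole cluster and the aggregate rate matches the per-core layout.

```python
# Sketch of point a): one shared unit built ~N times wider matches the
# aggregate throughput of N per-core units (all figures invented).
n_cores = 8
per_core_unit = 512                        # ops/cycle per small unit, assumed
dedicated_total = per_core_unit * n_cores  # 8 small units, one per core
shared_total = per_core_unit * n_cores     # 1 chungus unit sized for 8 cores
print(dedicated_total, shared_total)  # 4096 4096
```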
 

Joe NYC

Diamond Member
Jun 26, 2021
4,169
5,709
136
They are certainly the benchmark for AMX execution and will be compared against AMD.

So the question stands: if Intel uses dedicated per-core AMX units, then why the **** won't perf be terrible if a single AMX unit gets shared by multiple cores, potentially a high number like 8 or more?

I bet that when Intel implements ACE, AMX will be deprecated into legacy.

Then the question is if ACE makes it to Diamond Rapids or Coral Rapids.

On AMD side, it will be in Zen 7, which should launch between Diamond and Coral.
 

adroc_thurston

Diamond Member
Jul 2, 2023
8,347
11,111
106
Ok, shared unit too, so what will run faster at the same frequency -
1) 12 Intel cores with AMX as is now
or
2) 12 AMD cores with per cluster AMX
The former.
But why would you want that? If you wanna do Actual Real Matmul at AMD, GPU is your friend.
It should also be a competition, by proxy, between TSMC A14 and Intel 14A, in time for HVM.
man that's an optimistic view of 14A lmao
 

Win2012R2

Golden Member
Dec 5, 2024
1,319
1,358
96
The former.
Ok, good, dedicated per-core units will run faster, all other things being equal.

Now how much faster would Intel be in the 12-core scenario vs AMD: like 2 times faster, or perhaps 12?
But why would you want that? If you wanna do Actual Real Matmul at AMD, GPU is your friend.
Indeed, why implement this in the CPU in the first place? Intel had to do it because of their lack of a GPU; having an integrated GPU seems a far better way, maybe with higher latency than the CPU.
 

Win2012R2

Golden Member
Dec 5, 2024
1,319
1,358
96
How much faster is a single gfx1311 CU over an AMX core in GEMM?
No idea, we are talking about CPUs here. You just don't want to answer the obvious question: a shared single unit for 12 cores under heavy usage will be more than 12 times slower than in the scenario of 1 dedicated unit per core, i.e. dogsh*t slow like I said, which is obvious to anybody who actually profiles stuff and runs into contention like this.

The solution is obviously running this stuff on GPUs and dropping the AMX crap.

It's not 'faster', you just dedicate more area to matmul.
Oh wow, a 600 sq mm GPU is not faster than a 200 sq mm one, it's just got more area dedicated.

Or an 8 core CPU isn't faster than 1 core, it's just got more area dedicated to cores.
 

adroc_thurston

Diamond Member
Jul 2, 2023
8,347
11,111
106
No idea, we are talking about CPUs here. You just don't want to answer the obvious question: a shared single unit for 12 cores under heavy usage will be more than 12 times slower than in the scenario of 1 dedicated unit per core, i.e. dogsh*t slow like I said, which is obvious to anybody who actually profiles stuff and runs into contention like this.
shared units give you nice matmul rates at relatively minimal area investment.
Again, see Apple SGEMM/DGEMM benchmarks on M4-era stuff.
The solution is obviously running this stuff on GPUs and dropping the AMX crap.
No, cuz shared units are cheap and nice nuff.
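the "nice nuff" bit as a toy utilization model (numbers invented): one shared unit only becomes a bottleneck once the cores' combined demand exceeds its capacity, and even then the penalty caps at the core count.

```python
def shared_slowdown(n_cores, busy_fraction):
    """Per-core slowdown vs dedicated units in a crude steady-state model:
    total demand is n_cores * busy_fraction of one unit's capacity;
    below saturation nobody waits, above it everyone slows proportionally."""
    return max(1.0, n_cores * busy_fraction)

print(shared_slowdown(12, 0.05))  # 1.0  -> occasional matmul use: no penalty
print(shared_slowdown(12, 1.0))   # 12.0 -> all 12 cores hammering: 12x, not >12x
```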
 
  • Like
Reactions: Joe NYC

Win2012R2

Golden Member
Dec 5, 2024
1,319
1,358
96
shared units give you nice matmul rates at relatively minimal area investment.
Yes sure, nice of you to acknowledge it is a shared unit, thus leading to contention = reduction in speed.

How relevant will it be for AMX, which almost nobody is using? Well, I reckon THOSE few people who actually use it will want proper perf from all cores. Shared contention will result in non-deterministic latencies and lower throughput, so it seems dumb to support this very niche thing in the first place, especially in client and especially when you sell GPUs.
 

poke01

Diamond Member
Mar 8, 2022
4,788
6,111
106
why implement this in the CPU in the first place? Intel had to do it because of their lack of a GPU; having an integrated GPU seems a far better way, maybe with higher latency than the CPU.
Because the CPU is standardised, unlike the GPU or NPU. It's easy for devs too.
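For illustration (assuming numpy with a vendor BLAS is installed): the dev-side appeal is that one standard call hits whatever matmul hardware the CPU has, with no GPU driver stack or vendor-specific API in sight.

```python
import numpy as np

a = np.random.rand(256, 256)
b = np.random.rand(256, 256)
c = a @ b  # dispatches to the BLAS dgemm; the library decides whether
           # that means AVX-512, AMX tiles, SME, or plain scalar code
print(c.shape)  # (256, 256)
```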