Question Post your Geekbench AI scores!

Jul 27, 2020
19,613
13,475
146
Here ya go, folks!


So what are we looking at here?

Same system. The faster one is just overclocked using Intel XTU (5.1 GHz P-cores, 4.2 GHz E-cores, 275 W, 320 A).

The red scores? That's when the system thermally throttled pretty hard.

Hail's 7950X beating the crap out of my system: https://browser.geekbench.com/ai/v1/compare/4729?baseline=6860

Det0x's 9950X ES taking the wind out of my PC: https://browser.geekbench.com/ai/v1/compare/4408?baseline=6860
 

poke01

Golden Member
Mar 8, 2022
1,988
2,524
106
Can someone post the Ryzen HX 370 result?
On paper it has the highest figure, 50 TOPS.

However, Apple claimed to have the fastest NPU with the M4 at 38 TOPS, but that was before the HX 370 came out.
True to Apple's claims, they do have the fastest NPU from what I've seen so far, at least in INT8.

M4 NPU:

8 Gen 3, Qualcomm's best NPU:

Snapdragon X Elite - X1E78100 NPU:
 
Jul 27, 2020
19,613
13,475
146
There is something else to note there.

Correction: The Apple NPU is more accurate than the CPUs!
 
Jul 27, 2020
19,613
13,475
146
True, that's why I'm curious to see how the Strix Point NPU performs.
I don't think we are gonna see that soon. Either AMD hasn't been able to get the necessary software support ready, or no one who has an HX 370 laptop knows enough to test the NPU. I'm kinda leaning towards the former.
 

Hitman928

Diamond Member
Apr 15, 2012
6,024
10,351
136

This benchmark doesn’t support the AMD NPU yet.
 

FlameTail

Diamond Member
Dec 15, 2021
3,755
2,203
106
This benchmark doesn’t support the AMD NPU yet.
Yes, the Signal65 article noted that:
Despite having one of the fastest NPUs on paper, Geekbench AI 1.0 still doesn’t have the ability to measure the AMD Ryzen AI NPU performance. When I asked Primate Labs about this, I was told the reasoning was that it was a representation of where AMD stood today in terms of its consumer AI framework implementations, and that trying to integrate support through Vitis software (carryover from Xilinx) just wasn’t working out. Disappointing for sure, but also is mirrored by the fact that you cannot run the Procyon AI benchmark on AMD NPUs. Hopefully we’ll have a solution from AMD on this soon.
 

mikegg

Golden Member
Jan 30, 2010
1,833
459
136
Device                                     Single Precision   Half Precision   Quantized
Intel Core Ultra 9 185H (NPU)              7172               7176             11000
Qualcomm Snapdragon X Elite 80-100 (NPU)   2177               11069            21549
M3 (NPU)                                   2499               13971            14877
M4 (NPU)                                   4702               32052            40743
NVIDIA GeForce RTX 4090                    36800              50531            27568

Source: https://signal65.com/research/ai/new-geekbench-ai-1-0-benchmark-analysis-and-early-results/

Using best framework for each NPU. Added RTX GPU (ONNX DirectML) for reference.

Based on this benchmark, we can clearly see that a GPU is geared towards training (FP32 & FP16) and is not very efficient for inference (INT8/INT4).
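To put a rough number on that, here is a minimal Python sketch that divides each device's Quantized score by its Single Precision score. The scores are the ones read from the table above; the function name `quantized_gain` is just illustrative and not part of Geekbench AI.

```python
# Scores as read from the table above:
# (single precision, half precision, quantized)
scores = {
    "Intel Core Ultra 9 185H (NPU)": (7172, 7176, 11000),
    "Snapdragon X Elite 80-100 (NPU)": (2177, 11069, 21549),
    "M3 (NPU)": (2499, 13971, 14877),
    "M4 (NPU)": (4702, 32052, 40743),
    "GeForce RTX 4090": (36800, 50531, 27568),
}


def quantized_gain(single, half, quantized):
    """Return how many times higher the quantized score is than the single precision score."""
    return quantized / single


for device, row in scores.items():
    print(f"{device}: quantized is {quantized_gain(*row):.2f}x single precision")
```

A ratio above 1 means the device scores higher on the quantized workloads than on FP32; with these numbers only the RTX 4090 comes out below 1.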
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,911
3,522
136
Is it a driver restriction? Nvidia does that a lot with the allowed throughput rates of "datacentre" formats. Nvidia should be eating most of the various precisions alive.
 

itsmydamnation

Platinum Member
Feb 6, 2011
2,911
3,522
136
No. Highly doubt it is.

Gaming emphasizes FP32.
Not really been using packed math for years

The GA10x SM continues to support double-speed FP16 (HFMA) operations which are supported in Turing. And similar to TU102, TU104, and TU106 Turing GPUs, standard FP16 operations are handled by the Tensor Cores in GA10x GPUs.
 
Jul 27, 2020
19,613
13,475
146
Anyone know how one may ascertain the NPU TOPS from these GB scores? Or should one just double the Quantized score to arrive at the TOPS? That would mean the M4 NPU has 80 TOPS!
 

Doug S

Platinum Member
Feb 8, 2020
2,698
4,577
136
Anyone know how one may ascertain the NPU TOPS from these GB scores? Or should one just double the Quantized score to arrive at the TOPS? That would mean the M4 NPU has 80 TOPS!

Since different vendors are reporting different things with "TOPS" (e.g. some may be INT8, some INT4, some FP8) there's no formula for conversion. But we'll be able to see what "TOPS" figures their marketers claim, and compare to GB AI scores, and figure out a "fudge factor" to compare e.g. the TOPS figure for Qualcomm to Intel, or whatever. Obviously that's pointless once GB AI scores are available, but when something new is announced but not yet released, vendor claimed TOPS are all you have to go by.
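A rough sketch of that "fudge factor" idea in Python, under stated assumptions: the Quantized scores are the ones from the Signal65 table posted earlier in the thread, the 38 TOPS figure for the M4 is the one quoted earlier in the thread, and the 45 TOPS figure for the Snapdragon X Elite is Qualcomm's advertised number, which is not from this thread. The function name is made up for illustration.

```python
# Vendor-claimed TOPS. 38 (M4) is quoted earlier in the thread;
# 45 (Snapdragon X Elite) is Qualcomm's advertised figure, an outside assumption.
claimed_tops = {"M4": 38, "Snapdragon X Elite": 45}

# Geekbench AI Quantized scores from the table posted earlier in the thread.
quantized_score = {"M4": 40743, "Snapdragon X Elite": 21549}


def score_per_tops(name):
    """Per-vendor fudge factor: Geekbench AI quantized points per claimed TOPS."""
    return quantized_score[name] / claimed_tops[name]


for name in claimed_tops:
    print(f"{name}: {score_per_tops(name):.0f} points per claimed TOPS")

# How much more one Apple "TOPS" is worth than one Qualcomm "TOPS" in this benchmark.
relative = score_per_tops("M4") / score_per_tops("Snapdragon X Elite")
print(f"M4 vs Snapdragon X Elite, per claimed TOPS: {relative:.2f}x")
```

With these inputs the M4 delivers a bit over twice as many points per claimed TOPS as the Snapdragon X Elite, which is the kind of per-vendor correction you would apply to a marketing TOPS number before scores for a new part exist.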
 

Det0x

Golden Member
Sep 11, 2014
1,216
3,785
136
Did some test runs in preparation for hwbot and found out that this benchmark doesn't care about threads at all. I'm getting pretty much the same score with SMT enabled and disabled on my 9950X.

16/32 SMT enabled
1726093232707.png

16/16 SMT disabled
1726093265316.png
 

MS_AT

Member
Jul 15, 2024
199
459
96
Have you observed thread utilization? OpenVINO might limit itself to physical cores, since HT won't give you much benefit in backend-bound code. What you might see is noticeable performance scaling with DDR MT/s if the benchmark is using LLMs underneath.
 