igor_kavinski
Lifer
Det0x did the honors here: https://forums.anandtech.com/thread...ranite-ridge-ryzen-9000.2607350/post-41277152
I'll try to post my score as soon as I get home.
> I tried running on my M1 Pro NPU and it's been stuck at Pose Estimation (Q) for 5 minutes.

Maybe a v1.0 bug.
> M4 NPU:
>
> 8 Gen 3, Qualcomm's best NPU:
> Samsung Galaxy S24 - Geekbench: Benchmark results for a Samsung Galaxy S24 with an ARM ARMv8 processor. (browser.geekbench.com)
>
> Snapdragon X Elite - X1E78100 NPU:
> LENOVO 83ED - Geekbench: Benchmark results for a LENOVO 83ED with a Snapdragon X Elite - X1E78100 - Qualcomm Oryon processor. (browser.geekbench.com)

There is something else to note there.
> There is something else to note there. The NPUs are more accurate than CPUs!

True, that's why I'm curious to see how the Strix Point NPU performs.
> True, thats why I'm curious to see how Strix point NPU performs.

I don't think we're gonna see that soon. Either AMD hasn't been able to get the necessary software support ready, or no one who has an HX 370 laptop knows enough to test the NPU. I'm kinda leaning towards the former.
> HX 370 laptop

It's only been a couple of weeks since they were released, and only a couple of models from Asus. As for the desktop side, those were only really released this week at the Ryzen 9 level, and I don't recall if they had AI or not.
Can someone post the Ryzen HX 370 result?
On paper it has the highest rating, 50 TOPS.
However, Apple claimed to have the fastest NPU with the M4 at 38 TOPS, but that was before the HX 370 came out.
True to Apple's claims, they do have the fastest NPU from what I've seen so far, at least in INT8:
M4 NPU:

8 Gen 3, Qualcomm's best NPU:
Samsung Galaxy S24 - Geekbench: Benchmark results for a Samsung Galaxy S24 with an ARM ARMv8 processor. (browser.geekbench.com)

Snapdragon X Elite - X1E78100 NPU:
LENOVO 83ED - Geekbench: Benchmark results for a LENOVO 83ED with a Snapdragon X Elite - X1E78100 - Qualcomm Oryon processor. (browser.geekbench.com)
> This benchmark doesn’t support the AMD NPU yet.

Yes, the signal65 article noted that:
> Despite having one of the fastest NPUs on paper, Geekbench AI 1.0 still doesn’t have the ability to measure the AMD Ryzen AI NPU performance. When I asked Primate Labs about this, I was told the reasoning was that it was a representation of where AMD stood today in terms of its consumer AI framework implementations, and that trying to integrate support through Vitis software (carryover from Xilinx) just wasn’t working out. Disappointing for sure, but also is mirrored by the fact that you cannot run the Procyon AI benchmark on AMD NPUs. Hopefully we’ll have a solution from AMD on this soon.
| Device | Single Precision | Half Precision | Quantized |
|---|---|---|---|
| Intel Core Ultra 9 185H (NPU) | 7172 | 7176 | 11000 |
| Qualcomm Snapdragon X Elite 80-100 (NPU) | 2177 | 11069 | 21549 |
| M3 (NPU) | 2499 | 13971 | 14877 |
| M4 (NPU) | 4702 | 32052 | 40743 |
| NVIDIA GeForce RTX 4090 | 36800 | 50531 | 27568 |
Is it a driver restriction? Nvidia does that a lot with the allowed throughput rates of "datacentre" formats; NV should be eating most of the various precisions alive.
Source: https://signal65.com/research/ai/new-geekbench-ai-1-0-benchmark-analysis-and-early-results/
Using the best framework for each NPU. Added an RTX GPU (ONNX DirectML) for reference.
Based on this benchmark, we can clearly see that a GPU is geared towards training (FP32 & FP16) and is not very efficient for inference (INT8/INT4).
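The table's own numbers bear this out. A quick back-of-the-envelope sketch (scores copied from the signal65 table above; the device labels and the ratio idea are mine, not part of the benchmark):

```python
# Ratio of the Quantized (INT8) score to the Half-Precision (FP16) score
# for each device in the signal65 table above. A ratio above 1 means the
# device gains throughput from INT8; below 1 means it loses throughput.
scores = {  # device: (half-precision score, quantized score)
    "Intel Core Ultra 9 185H (NPU)": (7176, 11000),
    "Snapdragon X Elite (NPU)": (11069, 21549),
    "Apple M3 (NPU)": (13971, 14877),
    "Apple M4 (NPU)": (32052, 40743),
    "GeForce RTX 4090 (DirectML)": (50531, 27568),
}

for device, (half, quant) in scores.items():
    print(f"{device}: INT8/FP16 ratio = {quant / half:.2f}")
```

Every NPU lands above 1.0 while the 4090 lands around 0.55, which is consistent with the training-versus-inference point, though the DirectML path may simply not be exercising the GPU's INT8 hardware.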
> is it driver restriction ? Nvidia do that alot on allowed throughput rates of "datacentre" formats , Nv should be eating most of the various precision alive.

No. I highly doubt it is.
> No. Highly doubt it is.

Not really; they've been using packed math for years.
Gaming emphasizes FP32.
> The GA10x SM continues to support double-speed FP16 (HFMA) operations which are supported in Turing. And similar to TU102, TU104, and TU106 Turing GPUs, standard FP16 operations are handled by the Tensor Cores in GA10x GPUs.
Anyone know how one may ascertain the NPU TOPS from these GB scores? Or should one just double the Quantized score to arrive at the TOPS? That would mean the M4 NPU has 80 TOPS!
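For what it's worth, here is what that doubling guess yields for the other NPUs in the table. The scale factor is purely the assumption from the post above (roughly 2 Geekbench AI quantized points per INT8 GOPS), not an official conversion:

```python
# Applying the thread's guess that TOPS ~= 2 * (Quantized score) / 1000.
# The factor of 2 is an assumption from the post above, not an official
# Geekbench conversion; scores are from the signal65 table.
quantized = {
    "Intel Core Ultra 9 185H (NPU)": 11000,
    "Snapdragon X Elite (NPU)": 21549,
    "Apple M3 (NPU)": 14877,
    "Apple M4 (NPU)": 40743,
}

for device, score in quantized.items():
    print(f"{device}: ~{2 * score / 1000:.0f} TOPS (guessed)")
```

The X Elite estimate (~43 TOPS) lands close to Qualcomm's advertised 45 TOPS, which is probably why the doubling heuristic looks plausible, but the M4 figure (~81 TOPS) is far above Apple's stated 38 TOPS, so the score clearly isn't a fixed multiple of TOPS across vendors.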
> Did some test runs in preparation for HWBot and found out that this benchmark doesn't care about threads at all. Getting pretty much the same score with SMT enabled/disabled on my 9950X.
>
> 16/32 SMT enabled
> View attachment 107381
>
> 16/16 SMT disabled
> View attachment 107382

Have you observed thread utilization? OpenVINO might limit itself to physical cores, since HT won't give you lots of benefit in backend-bound code. What you might see is noticeable performance scaling with DDR MT/s if the benchmark is using LLMs underneath.
> Have you observed thread utilization? OpenVino might limit itself to physical cores since HT won't give you lots of benefits in backend bound code.

Yeah, that's exactly what it looked like when I had HWiNFO open while running.