igor_kavinski · Lifer · Jul 27, 2020
> For multi-core performance, and also single-core performance with anything involving recent Apple chips, GB6 is useless.
> The multicore benchmark no longer measures peak performance.
> Single core with Apple chips (M4 and later) added an advantage for SME. I would not normally raise this particular issue, except that last I checked, GB6.x was the only application to support SME.
> To the GB6 authors: maybe don't put the cart before the horse. Also, for GB7, maybe also measure peak "theoretical" performance and put that in a third number. (Also, since we are adding more numbers, can we exclude SME, AVX2, and AVX-512, or at least get dedicated tests that test the actual int/fp performance of these chips?)

AVX2, I wonder; it's finally present in enough processors to be viable as an assumed baseline (though still with a runtime-detection fallback for older/Atom processors). As for AVX-512, it seems GB6 may use it, but it apparently adds close to no performance, so you don't need to worry about that, sadly. It likely didn't get the attention/effort SME got from Primate Labs, or whoever wrote the code, although I would argue it's more general and more widely used.
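Roughly what I mean by a runtime-detection fallback, as a minimal sketch assuming the GCC/Clang x86 builtins (the kernels here are made up for illustration, not anything from Geekbench):

```c
/* Pick an AVX2 path at startup if the CPU has it, else fall back.
   x86 GCC/Clang only; sum_avx2/sum_scalar are hypothetical kernels. */
#include <stdint.h>
#include <stdio.h>

__attribute__((target("avx2")))
static int64_t sum_avx2(const int32_t *v, size_t n) {
    int64_t s = 0;                 /* compiler may auto-vectorize with AVX2 here */
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

static int64_t sum_scalar(const int32_t *v, size_t n) {
    int64_t s = 0;                 /* baseline path for older/Atom parts */
    for (size_t i = 0; i < n; i++) s += v[i];
    return s;
}

int main(void) {
    int32_t v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    /* Dispatch once, based on what the CPU actually supports. */
    int64_t (*sum)(const int32_t *, size_t) =
        __builtin_cpu_supports("avx2") ? sum_avx2 : sum_scalar;
    printf("%lld\n", (long long)sum(v, 8));
    return 0;
}
```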
> AVX2, I wonder; it's finally present in enough processors to be viable as an assumed baseline…

AVX2 is really old. It is in most modern chips.
> …can we exclude SME, AVX2, AVX-512, or at least get dedicated tests that test the actual int/fp performance of these chips?

As you yourself mention, there is going to be SIMD usage in the libraries already, and in the kernel. It's unfair to prohibit application code from using the same.
> It is pointless to include fp instructions in a single core result

You browse the internet without JavaScript?
> Perhaps so, but that's primarily hand-crafted assembly, and in the case of stuff like memcpy() it is far from being generally useful. That is, the code has to check whether calling the SIMD path is worth doing, which it isn't for short copies. Ditto for the kernel's use of SIMD.

You have a pretty negative view of SIMD, and an at least somewhat outdated one. To get what you want, you would have to use unrealistic compiler settings, prohibit the use of existing libraries, and disable the standard libraries of some languages. What you propose would best fit as a legacy score, for apps written years ago.
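The short-copy check the quoted post describes looks roughly like this: a sketch with an invented threshold and function names, not any actual libc's code:

```c
/* Below some size threshold, the setup cost of the vector path
   outweighs its benefit, so a real memcpy falls back to a byte loop.
   The cutoff and names here are illustrative assumptions. */
#include <stddef.h>
#include <string.h>

#define SIMD_THRESHOLD 64   /* assumed cutoff; real libcs tune this per CPU */

static void *copy_simd(void *dst, const void *src, size_t n) {
    /* stand-in for a wide vector copy loop */
    return memcpy(dst, src, n);
}

void *copy_dispatch(void *dst, const void *src, size_t n) {
    if (n < SIMD_THRESHOLD) {
        /* short copies: branch/setup overhead of the SIMD path would
           dominate, so just copy byte by byte */
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--) *d++ = *s++;
        return dst;
    }
    return copy_simd(dst, src, n);
}
```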
> I'd like to see a single core "native int" result that doesn't leverage any SIMD (other than incidental use by system libraries, i.e. if a memcpy() call uses it, but the benchmarks compiled for GB6 would have SSE/AVX/NEON/SVE/SME disabled in the compiler flags) and doesn't include any fp; just test the regular integer instructions. It is pointless to include fp instructions in a single core result: I can't think of any real-world tasks that are limited by single-thread fp. When you're doing fp (and 98% of the time when you're heavily using SIMD), you're running across multiple threads, and the ST number doesn't matter all that much to you.
> Then you have three MT results: integer "max" MT, floating point "max" MT, and an integer "cooperative" MT. The "max" tests would be sort of like Cinebench-type stuff where the more cores the better, at least until you run out of memory bandwidth or other resources. The "cooperative" test would be something where all the threads have to talk to each other, so it would test the efficiency of the fabric, OS scheduling, and locking efficiency, that sort of thing. I think that's what GB6's MT test was intended to do, but it doesn't do all that good a job of it.
> Then I guess whatever AI/GPU type test(s) people feel are needed.

I think it's fine to use a well-supported instruction set. The thing that irked me was that Geekbench did it RIGHT AWAY with Apple, despite not a single app supporting SME.
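The "cooperative" test the quote asks for is essentially a core-to-core handoff benchmark. A minimal sketch of the idea, assuming plain pthreads (my illustration, not how GB6's MT section actually works):

```c
/* Two threads bounce a token through a mutex/condvar, so elapsed time
   is dominated by cross-core communication and lock handoff, not ALU
   throughput. Time the whole run to compare fabric/scheduler quality. */
#include <pthread.h>
#include <stdio.h>

#define ROUNDS 100000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static int token = 0;               /* whose turn it is: 0 or 1 */

static void *bouncer(void *arg) {
    int me = *(int *)arg;
    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&lock);
        while (token != me)         /* wait until the other core hands off */
            pthread_cond_wait(&cv, &lock);
        token = 1 - me;             /* pass the token back */
        pthread_cond_signal(&cv);
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    int id0 = 0, id1 = 1;
    pthread_create(&a, NULL, bouncer, &id0);
    pthread_create(&b, NULL, bouncer, &id1);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    puts("done");
    return 0;
}
```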
> The thing that irked me was that Geekbench did it RIGHT AWAY with Apple, despite not a single app supporting SME.

Apple Intelligence probably uses SME on M4.
> …do hope we get some additional details on the NVIDIA platform pretty soon. It is actually the first ARM platform outside of the Raspberry Pi that has me interested.

Why? Without a custom CPU it's meh. Only the GPU is interesting, but then you can just get an RTX 5090 or RTX 6000 Pro with an x86 CPU.
> Apple Intelligence probably uses SME on M4.

Pretty sure the Apple ML stack is heavy on GPU for slow stuff, and ANE for anything real-time.
> Apple Intelligence probably uses SME on M4.

And how many months after GB6.3 was Apple blessed with Intelligence? Investors wonder!
> Pretty sure the Apple ML stack is heavy on GPU for slow stuff, and ANE for anything real-time.

Okay, I was wrong; it's all on ANE.
> …unless you mean to say that browser performance is irrelevant to the market.

You browse the internet without JavaScript?
SME is there for Geekbench padding.
> There are many uses for matrix multiplication that have nothing to do with AI, you know.

I know what GEMM is, and no, you're not doing it on dinky client CPUs.
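For anyone following along: GEMM is the general matrix-multiply kernel, C = alpha*A*B + beta*C. The scalar reference version is just a triple loop (illustrative only; this is the loop that SME/AMX-style units and wide SIMD exist to accelerate):

```c
/* Reference GEMM, C = alpha*A*B + beta*C, row-major, no SIMD.
   Purely illustrative; real BLAS libraries block/tile this and use
   the widest vector or matrix unit the CPU offers. */
#include <stdio.h>

static void gemm_ref(int m, int n, int k,
                     float alpha, const float *A, const float *B,
                     float beta, float *C) {
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
    }
}

int main(void) {
    float A[4] = {1, 2, 3, 4};      /* 2x2 */
    float B[4] = {5, 6, 7, 8};      /* 2x2 */
    float C[4] = {0};
    gemm_ref(2, 2, 2, 1.0f, A, B, 0.0f, C);
    printf("%g %g / %g %g\n", C[0], C[1], C[2], C[3]);  /* 19 22 / 43 50 */
    return 0;
}
```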
> Why? Without a custom CPU it's meh. Only the GPU is interesting, but then you can just get an RTX 5090 or RTX 6000 Pro with an x86 CPU.

Because GeForce. We get ARM Windows drivers for GeForce, and GeForce GPUs with the platform.
I guess the CUDA box with 128GB of RAM for $3000 is interesting, but other than that it's completely boring.
> I used GB5 quite a bit to determine whether a piece of hardware was performing as it should, and to a lesser extent, for overclocking/undervolting. It ran fast, results were reproducible (again, in a controlled environment), and they were roughly comparable, even across chip architectures. They butchered it all, and as a result, I've not bothered to buy GB6, so congrats, I guess? 🤣

They've been defeaturing it since GB5. You now need specific commands to sort; it used to be that you could sort peak scores, and also filter by OS (Linux/Android is 5-10% faster than Windows). And GB4 used to separate the results into ST Int, ST FP, MT Int, MT FP, and Cryptography. Now you have to look at them one by one and combine them yourself.