Geekbench 6 - Geekbench Blog
Weird choice of baseline CPU, and even weirder is that the baseline score is 2500.
The i7-12700 barely reaches 2000 in GB5, even with the fastest DDR5.
> Geekbench was never meant to be, nor ever has been, a good workload by which to judge servers.

There is a distinct lack of good cross-platform benchmark suites. While not optimal, GB4 and GB5 were rather serviceable. Now the MT score is pretty much useless for any kind of system, not only servers.
Including trivially parallelizable tasks in GB6 is meaningless if the overall MT score barely reflects the actual core count. Ideally GB would split the MT score in two: one for workload tests that only use a limited number of cores, and one for parallelizable tests that extend to all available threads.
I think the "unrealistic expectations" part is spot on here. With the current set of tests, there are not many where AVX512 could help much with a direct recompile of the same code. Maybe the photo library and photo filter tests.
The other tests could hardly use AVX512 without proper handwritten code paths, and that would defeat the multi-platform, vendor-"agnostic" part of the tests: the ARM guys would ask for SVE3 with 666-bit vectors and so on. In the previous GB5 we had absurd outliers like, say, FFT or encryption that directly calculated "throughput", where it would probably have been possible to double the FLOPS just by using some vendor library or some short code that gets autovectorized and supports AVX512.
Even then, CPU vendors found it easier to pad the score by including a 256-bit VAES instruction with ridiculous throughput, which made some new laptop beat a whole server in AES encryption throughput (and of course utterly suck in the real world, since actually serving up content for encryption and delivering it requires an actual server).
So GB6 is, in my opinion, a great desktop/workstation performance test that emphasizes the way people actually use CPUs in 2023 and is brave enough to shatter the illusions of people who disagree that 8 strong cores are plenty and that the rest (be it 8 more strong cores or 16 marketing cores) gives very diminishing gains. Kudos to them for not catering to the Cinebench/DC runner crowd.
It's not even good for phones: it measures only burst performance, and most phones nowadays can't even sustain 70% of their performance.
> On a less serious note, seems there are currently some really powerful ARM cores out there

Notice how they are all Xiaomi devices. Cheating, they are.
I don't think it's that much off for code compile. Compilation is known to show diminishing returns with increasing core count.
Timed LLVM Compilation Benchmark - OpenBenchmarking.org
openbenchmarking.org
Here you can see how doubling the number of cores (with the same power/frequency per core) doesn't cut the time in half. The scaling observed there isn't that different from what we see in GB6.
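For what it's worth, that curve has the shape Amdahl's law predicts. A minimal sketch, assuming a made-up 90% parallel fraction for the build (an illustrative number, not something measured from the LLVM results linked above):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup over one core under Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical: 90% of the build parallelizes (assumed, not measured).
PARALLEL_FRACTION = 0.90

for cores in (8, 16, 32, 64):
    speedup = amdahl_speedup(PARALLEL_FRACTION, cores)
    print(f"{cores:>2} cores: {speedup:.2f}x over a single core")
```

Under that assumption, going from 8 to 16 cores gives roughly 1.36x, not 2x, and each further doubling buys less.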
> Notice how they are all Xiaomi devices. Cheating, they are.

This is most certainly a bug...

> This is most certainly a bug...

No, Xiaomi is known to boost scores, and why would the bug only affect Xiaomi devices? Other Android devices are fine.

> No, Xiaomi is known to boost scores, and why would the bug only affect Xiaomi devices? Other Android devices are fine.

Such inflated numbers are clearly a bug. When companies trick benchmarks, historically that's been accomplished with whitelists and higher power limits/overclocks. That nets you a couple more percent, not triple.

> No, Xiaomi is known to boost scores, and why would the bug only affect Xiaomi devices? Other Android devices are fine.

Yes, because they want people to think that their phones are 4x faster in ST than the fastest CPU in the world.
As always, there is truth to that, but there is also a fact that a lot depends on which compiler you're using, what type and size of the project, and what is your target language, so your mileage might vary a lot.
My brother does reasonably large C/C++ projects on his laptop and after increasing cores by 50% he got almost exactly 50% speedup in the projects he works on.
Besides, game developers often offload compile tasks to a dedicated box and it can run multiple compile jobs in parallel.
> It does reflect the actual core count and it does use all cores for every test.

You are plainly misleading people with statements like this. There is a huge difference between having all cores available and effectively scaling across all cores available. For a single workload you can measure a chip's MT performance up to the point where that workload stops scaling with more threads. Below that point the chip is the bottleneck; above that point the workload is the bottleneck. The more cores a chip has, the more workloads are incapable of effectively scaling across all of them, and the more GB6's MT score is hampered and made misleading by its increasingly large share of bottlenecked benchmarks. With GB6's MT score on bigger chips you are not measuring the overall MT performance of the chip but a mix of ST performance limited to the cores effectively used and workloads incapable of scaling across the remaining idle cores.
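To put rough numbers on that argument, here is a toy model. Everything in it is an assumption for illustration: the per-workload thread limits are invented, linear scaling up to the limit is idealized, and the geometric mean is just one plausible way to combine workloads, not GB6's documented formula.

```python
from math import prod


def composite_mt_speedup(scaling_limits, cores):
    """Geometric mean of per-workload speedups.

    Each workload is assumed to scale linearly until it reaches its own
    thread limit, after which extra cores sit idle.
    """
    speedups = [min(cores, limit) for limit in scaling_limits]
    return prod(speedups) ** (1.0 / len(speedups))


# Hypothetical suite: three workloads stop scaling at 4, 8 and 8 threads,
# and one keeps scaling (its limit is set far above any core count used).
LIMITS = [4, 8, 8, 10**6]

for cores in (8, 16, 32):
    score = composite_mt_speedup(LIMITS, cores)
    print(f"{cores:>2} cores: composite speedup {score:.2f}x")
```

In this toy suite, doubling the core count from 16 to 32 raises the composite by only about 19%, because three of the four workloads are already the bottleneck.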
In Geekbench 6, the biggest change is probably the way multi-core scores are calculated, measuring "how cores cooperate to complete a shared task" rather than assigning different tasks to each core. This is meant to better reflect how actual multi-core workloads operate, especially for hybrid CPU architectures that mix big, fast cores and small, power-efficient ones, an ever-growing category of chips that includes most modern ARM processors and Intel's 12th- and 13th-generation CPUs.
> Some background information about the choices, straight from the creator.

That sounds like the "MT" score is mainly for comparing hybrid processors to otherwise similar non-hybrid processors.
Geekbench’s creator on version 6 and why benchmarks matter in the real world
“How hard can it be to write a benchmark? Maybe I should write my own.”…
arstechnica.com
Some background information about the choices, straight from the creator.
Funniest of all, they removed XTS/AES, so can we conclude that such instructions are no longer used in phones, for instance?
@Doug S those are good points that I agree with in general, but we have to draw a line somewhere, as there are benchmarks from all camps:
1) We have benchmarks where the source code is available, for example SPEC, which follows what you wrote to the letter; they even allow overriding the malloc library used for allocations with some custom heap library. Then the compiler and vendor games begin in earnest, results between two vendors or even two systems are not really comparable, and who knows what they mean?
> That sounds like the "MT" score is mainly for comparing hybrid processors to otherwise similar non-hybrid processors.

He's saying the change is to better reflect real workloads, which just happens to include how they utilize hybrid core schemes. Are you seeing some similar, comparable workload with radically different characteristics?

Not that I mind (aside from the misleading title for the score). But that's a huge scheduler can of worms they get into right there. For hybrid processors it's essentially benchmarking how well the scheduler works.

> He's saying the change is to better reflect real workloads, which just happens to include how they utilize hybrid core schemes. Are you seeing some similar, comparable workload with radically different characteristics?

Well, the quote is:
> The "especially" in there has a different weight to me than your "just happens".

I think you're reading too much into it. It's just an observation about how hybrid CPUs are utilized in the real world. Most workloads don't spawn exactly N identical tasks for each core. They have a few ST-sensitive threads, and some more ST-insensitive ones.

> If that were it, the exact opposite is actually happening right now, with the 13900K/F/S filling pages of "MT" results before chips with more cores appear.

That's not because of all the E-cores, but rather primarily the 8 P-cores yielding strong lightly threaded performance. If someone has 8+0 vs 8+8 vs 8+16 numbers, that would probably help illustrate it.

> How cores cooperate to complete a shared task is a behaviour that's mainly enforced by schedulers, unless the software applies more specific rules for which cores to use and how.

I guess, but how's that an advantage for hybrid designs? If anything, it makes the benchmark harder than blindly giving each thread an identical, isolated task.

> What I know is that so far I see no use for GB6's "MT" scores as a comparative metric since it's completely unclear what quality it is supposed to represent. Aside from "best performance for an isolated lightly multi-threaded workload" maybe.

Well that does appear to be a succinct summary of (primarily) client MT workloads, so yeah, I think that was their intention. Other, better benchmarks exist for specifically workstation/server usage.
> That's not because of all the E-cores, but rather primarily the 8 P-cores yielding strong lightly threaded performance. If someone has 8+0 vs 8+8 vs 8+16 numbers, that would probably help illustrate it.

I actually agree. I was already complaining earlier that the correlation of the MT score to the ST score is way too strong now.

> I guess, but how's that an advantage for hybrid designs? If anything, it makes the benchmark harder than blindly giving each thread an identical, isolated task.

It's not. I was previously complaining that the new MT score moves the bottleneck from the chip to the workload. And if this MT test suite is (as I interpreted the quote) indeed targeted at benching the particular difference between hybrid and non-hybrid designs, the bottleneck on hybrid designs possibly moves onward to the scheduler.
> No, Xiaomi is known to boost scores, and why would the bug only affect Xiaomi devices? Other Android devices are fine.

Well, I managed to test the Poco X4 GT and the scores are the following: