Question Geekbench 6 released and calibrated against Core i7-12700

igor_kavinski · Feb 14, 2023

Geekbench 6 - Geekbench Blog (www.geekbench.com)

Weird choice of baseline CPU, and even weirder is that the baseline score is 2500.

The i7-12700 barely does 2000 in GB5, even with the fastest DDR5.
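Why a baseline of 2500 is less weird than it looks: the baseline score is just a calibration constant. A minimal sketch, assuming (this is not taken from Primate Labs' documentation) that each workload score is the baseline score scaled by the tested machine's speedup over the baseline machine, and that workload scores are combined with a geometric mean. The function names are hypothetical.

```python
from statistics import geometric_mean

BASELINE_SCORE = 2500  # GB6 calibrates the Core i7-12700 to this value

def workload_score(baseline_seconds: float, test_seconds: float,
                   baseline_score: float = BASELINE_SCORE) -> float:
    """Assumed model: baseline score scaled by the speedup over the baseline machine."""
    if baseline_seconds <= 0 or test_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_score * baseline_seconds / test_seconds

def composite_score(scores: list[float]) -> float:
    """Assumed aggregation: geometric mean of the per-workload scores."""
    if not scores:
        raise ValueError("need at least one workload score")
    return geometric_mean(scores)

# The baseline machine lands exactly on the baseline score:
print(workload_score(10.0, 10.0))  # 2500.0
# A machine that takes 12.5 s where the baseline takes 10 s scores 2000:
print(workload_score(10.0, 12.5))  # 2000.0
```

Under this model the baseline constant rescales every score by the same factor, so it changes no rankings; it does mean GB5 and GB6 numbers aren't directly comparable, quite apart from the changed workloads.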

roger_k · Feb 15, 2023

igor_kavinski said:
There is still a flaw inherent in GB6's testing methodology. It's not measuring multitasking efficiency which is what AMD excels at. They should run at least two different tests concurrently to see how the CPU handles multiple workloads thrown its way.

What I'd like to see is some sort of long, complex, hybrid workload. Like building and packaging a game, which can consist of multiple processing steps such as texture generation, mesh optimisation, pre-baked raytraced lighting, compilation, asset compression etc., all using a parallel build system. This kind of stuff goes in the direction you mention while still being practically relevant.

igor_kavinski · Feb 15, 2023

senttoschool said:
Can you define this more?

ASRock Industrial NUC BOX-1260P and 4X4 BOX-5800U Review: Alder Lake-P and Cezanne UCFF Faceoff (www.anandtech.com)

I don't know who came up with this test (Anand or Ganesh or someone else) but it's pure genius. Shows how 4800U/5800U trump ADL-mobile in concurrent workloads. 12th gen is worse than even Comet Lake!

igor_kavinski · Feb 15, 2023

senttoschool said:
Do we have to have benchmarks that are designed to favor AMD? It's never going to be 100% fair.

It's not about favoring AMD. It's a valid use case. Lots of people will run some workload and start doing something else while the original workload chugs along in the background. AMD is just better at it and if there is more awareness about that through benchmarks exposing this benefit of AMD's architecture, it will also pressure Intel to improve the deficiencies in their architecture to address this common use case.

igor_kavinski · Feb 15, 2023

Kocicak said:
How do you know that? What benchmarks measure simultaneously running different sorts of tasks?

See post #102

Kocicak · Feb 15, 2023

igor_kavinski said:
ASRock Industrial NUC BOX-1260P and 4X4 BOX-5800U Review: Alder Lake-P and Cezanne UCFF Faceoff (www.anandtech.com)

I don't know who came up with this test (Anand or Ganesh or someone else) but it's pure genius. Shows how 4800U/5800U trump ADL-mobile in concurrent workloads. 12th gen is worse than even Comet Lake!

They run various benchmarks while looping a video transcoding task. I'm sorry, but quantifying the effects of ONE specific concurrent task with ONE set of parameters (video formats, transcoding parameters) has very limited general significance.

What if the Alder Lake CPU gets too distracted by one particular video format, while it could handle some other format effortlessly? Did they allow the integrated graphics to help with the encoding, and if not, is it fair to test a product with the part intended for this specific workload disabled? Etc etc etc.

So yeah, I am not convinced that AMD CPUs are better in multitasking than Intel CPUs by this at all.
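Kocicak's objection could be partly addressed by repeating the measurement against several different background loads and reporting the spread rather than one number. A minimal sketch of the bookkeeping, assuming you already have wall-clock times for the foreground task alone and alongside each background load; the function names and load labels are hypothetical, and this is not AnandTech's actual methodology.

```python
from statistics import mean

def slowdown(isolated_s: float, concurrent_s: float) -> float:
    """Fractional slowdown of the foreground task caused by a background load."""
    if isolated_s <= 0 or concurrent_s <= 0:
        raise ValueError("timings must be positive")
    return concurrent_s / isolated_s - 1.0

def summarize(isolated_s: float, concurrent_s_by_load: dict[str, float]) -> dict[str, float]:
    """Slowdown against each background load, plus mean/min/max across loads."""
    if not concurrent_s_by_load:
        raise ValueError("need at least one background load")
    per_load = {name: slowdown(isolated_s, t) for name, t in concurrent_s_by_load.items()}
    values = list(per_load.values())
    return {**per_load, "mean": mean(values), "min": min(values), "max": max(values)}

# Hypothetical timings for one foreground task against two background transcodes:
print(summarize(100.0, {"h264_loop": 125.0, "hevc_loop": 150.0}))
```

A wide min/max gap would support Kocicak's point that one background format says little on its own.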

nicalandia · Feb 15, 2023

Whoever thinks a 7950X and 13900K are better than the Threadripper Pro 5995WX because Geekbench 6 says so has gone senile. Simple as that. GB6 is a light "real world" benchmark that is designed for phones and laptops.

When Xeon W9 releases this will be the worst app to use to test it.

Simply put: do you want to test workstation performance? Don't use GB6; use Puget Systems' benchmarks or something else.

mikegg · Feb 15, 2023

nicalandia said:
Whoever thinks a 7950X and 13900K are better than the Threadripper Pro 5995WX because Geekbench 6 says so has gone senile. Simple as that. GB6 is a light "real world" benchmark that is designed for phones and laptops.

When Xeon W9 releases this will be the worst app to use to test it.

Simply put: do you want to test workstation performance? Don't use GB6; use Puget Systems' benchmarks or something else.

I 100% believe that a 7950x and a 13900K are better than a Zen3 Threadripper for most consumer applications.

nicalandia · Feb 15, 2023

senttoschool said:
I 100% believe that a 7950x and a 13900K are better than a Zen3 Threadripper for most consumer applications.

Do you believe that an Android Phone is better than a 13900K?

Xiaomi M2012K11C vs ASUS System Product Name - Geekbench

mikegg · Feb 15, 2023

igor_kavinski said:
ASRock Industrial NUC BOX-1260P and 4X4 BOX-5800U Review: Alder Lake-P and Cezanne UCFF Faceoff (www.anandtech.com)

I don't know who came up with this test (Anand or Ganesh or someone else) but it's pure genius. Shows how 4800U/5800U trump ADL-mobile in concurrent workloads. 12th gen is worse than even Comet Lake!

So why don't you just run Geekbench 6 while VLC is encoding some video and see. It seems bizarre that you think Geekbench should implement this sort of test.

Hitman928 · Feb 15, 2023

nicalandia said:
Do you believe that an Android Phone is better than a 13900K?

Xiaomi M2012K11C vs ASUS System Product Name - Geekbench

You're pointing at a borked result for the Xiaomi. There are ways of spoofing Geekbench into giving unrealistically high scores; you've found one such result.

Abwx · Feb 15, 2023

senttoschool said:
I 100% believe that a 7950x and a 13900K are better than a Zen3 Threadripper for most consumer applications.

7-Zip, HandBrake and Agisoft are all consumer apps. Besides, tests that focus on a single task with a low core count are useless for estimating performance, since the CPU is underused and the task takes very little time to execute. That's akin to microbenchmarks, which say nothing about performance under a really demanding set of loads or with software like the ones I quoted.

RTX2080 · Feb 15, 2023

igor_kavinski said:
ASRock Industrial NUC BOX-1260P and 4X4 BOX-5800U Review: Alder Lake-P and Cezanne UCFF Faceoff (www.anandtech.com)

I don't know who came up with this test (Anand or Ganesh or someone else) but it's pure genius. Shows how 4800U/5800U trump ADL-mobile in concurrent workloads. 12th gen is worse than even Comet Lake!

This is the big-little design disaster. In theory it should be impossible for the 1260P to lose to a 5800U, let alone be no better than a 4800U; multitasking handled by the crappy Thread Director is the only explanation for this. Frequently switching between big and little cores causes too much latency and performance loss. Every similar case I've heard of from others involved either multitasking lag or performance loss during aggressive switching, and sometimes it was as bad as only the little cores working while the big cores sat idle.

mikegg · Feb 15, 2023

Abwx said:
7-Zip, HandBrake and Agisoft are all consumer apps. Besides, tests that focus on a single task with a low core count are useless for estimating performance, since the CPU is underused and the task takes very little time to execute. That's akin to microbenchmarks, which say nothing about performance under a really demanding set of loads or with software like the ones I quoted.

I said most, not all. There's a name for this kind of fallacy, where someone tries to argue that a statement is untrue by providing a few niche examples.

How much 7-Zip, HandBrake, or Agisoft work are people doing on a 13900K?

nicalandia said:
Do you believe that an Android Phone is better than a 13900K?

Xiaomi M2012K11C vs ASUS System Product Name - Geekbench

It depends. What application are we trying to use? An Android phone could be better than a 13900k. It depends, right?

Also, this erroneous result doesn't change anything.

Why are people so sensitive about this? This statement isn't controversial: a 7950X and a 13900K should run consumer applications better than a Zen 3 Threadripper. Relax, people.

Abwx · Feb 15, 2023

senttoschool said:
I said most, not all. There's a name for this kind of fallacy, where someone tries to argue that a statement is untrue by providing a few niche examples.

How much 7-Zip, HandBrake, or Agisoft work are people doing on a 13900K?

It depends. What application are we trying to use? An Android phone could be better than a 13900k. It depends, right?

Also, this erroneous result doesn't change anything.

Why are people so sensitive about this? This statement isn't controversial: a 7950X and a 13900K should run consumer applications better than a Zen 3 Threadripper. Relax, people.

The thing is that GB no longer shows the full extent of performance: anything with enough performance gets stuck at about 20,000 points. The fallacy is to claim that such an arbitrary measurement is good enough, so that anything better at some tasks gets counted as equal, if not slower...

For instance, a 7700X at 88 W is quite a bit faster than a 5950X in the MT test, yet in general usage like browsing there won't be any difference, while in heavy MT the latter will be way faster. So real usage performance is not reflected: a browser that reacts in 0.05 s instead of 0.06 s is a useless measurement, because that's not the kind of task whose execution speed matters to the user.

Edit: ComputerBase published an article about GB6 with a graph of ST and MT scores populated by their forum members. I guess it's now clear for whom it was specifically (re)designed, and it certainly wasn't for users' needs...

Geekbench 6: The new benchmark suite in a reader benchmark (www.computerbase.de)

Geekbench 6 brings some new workloads, and existing ones have been updated with new data sets. Readers can submit results.

poke01 · Feb 15, 2023

igor_kavinski said:
There is still a flaw inherent in GB6's testing methodology. It's not measuring multitasking efficiency which is what AMD excels at. They should run at least two different tests concurrently to see how the CPU handles multiple workloads thrown its way.

Compare a 13900K and a 7950X: the 13900K comes out ahead in MT. That makes sense because the 13900K has more cores.

But Geekbench was never perfect in MT.

nicalandia said:
Do you believe that an Android Phone is better than a 13900K?

Xiaomi M2012K11C vs ASUS System Product Name - Geekbench

No non-x86 chip other than Apple's comes close to the i9 in GB6. That result is fake.

JoeRambo · Feb 15, 2023

roger_k said:
P.S. I do think that excluding AVX-512 was a bit of a shame. If a CPU supports the technology and can perform faster on a task, it should be used. At the same time, I do understand the decision — AVX512 isn't really used all that much in desktop software, and basing benchmarks off it might create unrealistic expectations.

I think the "unrealistic expectations" part is spot on here. With the current set of tests, there aren't many where AVX-512 could help much with a direct recompile of the same code. Maybe the photo library and photo filter tests.
The other tests could hardly use AVX-512 without proper handwritten code paths, and that would defeat the multi-platform and vendor-"agnostic" part of the tests: the ARM guys would ask for SVE3 with 666-bit vectors and so on. In GB5 we had absurd outliers like FFT or encryption that directly calculated "throughput", where it would probably have been possible to double FLOPS just by using some vendor library or some short code that gets autovectorized with AVX-512 support.
Even then, CPU vendors found it easier to pad the score by adding 256-bit VAES instructions with ridiculous throughput, which made some new laptop beat a whole server in AES encryption throughput (and of course utterly suck in the real world, since actually serving up content for encryption and delivering it requires an actual server).

So GB6 is, in my opinion, a great desktop/workstation performance test that emphasizes the way people actually use CPUs in 2023, and it is brave enough to shatter the illusions of people who disagree that 8 strong cores are plenty and that the rest (be it 8 more strong cores or 16 marketing cores) gives strongly diminishing returns. Kudos to them for not catering to the Cinebench/DC runner crowd.

moinmoin · Feb 15, 2023

eek2121 said:
They aren't 'ratios' or anything like that.

No, but there is a strong relationship between the ST and MT scores, regardless of the actual number of cores present, which makes the latter effectively meaningless.

eek2121 said:
Zen 4 Threadripper will very likely tell the complete story. A combination of high clocked Zen 4 cores should, in theory, beat out the 7950X significantly.

The numbers I see so far, and the ratio between ST and MT in them, make me expect Zen 4 Threadripper to beat the 7950X slightly, but nowhere near in proportion to the additional cores; the gain would rather reflect the higher ST achievable on more cores across the chip.

poke01 said:
GB6 is highly tuned to be CPU agnostic.

Beyond a rather small number of cores, GB6's MT score is highly tuned to be agnostic to the core count.

naukkis said:
That's actually how most workloads scale to MT. The main thread is dominant, and there's a limit to how much can be offloaded to child threads. ARM designs for phones are tuned precisely for that: one prime core to run the main thread and somewhat less powerful cores to take load off it. Only very special kinds of work scale to an unlimited number of threads, and for normal desktop/phone use, cores beyond ~8 are just as beneficial as Geekbench 6 shows them to be.

Yeah, a single specific workload run in isolation. That's not what happens in reality. That's also not what users looking at the 7950X, Threadripper, workstation or server chips usually want. Instead, they want to run a larger number of workloads concurrently without hitting bottlenecks. For that purpose GB6's MT score is completely misleading, while GB4's and GB5's MT scores were rather serviceable.

senttoschool said:
Through marketing, first AMD, and now Intel, the x86 CPU industry has brainwashed people into using Cinebench

Reminder that before Ryzen, Intel actually started using Cinebench as a marketing benchmark (wtftech article and slide from 2011). Cinebench was known to be Intel-optimized to boot, so naturally, once AMD could showcase beating Intel with it using Ryzen, they did just that.

moinmoin · Feb 15, 2023

Abwx said:
Geekbench 6: The new benchmark suite in a reader benchmark (www.computerbase.de)

Geekbench 6 brings some new workloads, and existing ones have been updated with new data sets. Readers can submit results.

The current MT and ST scores with the ratio between them, sorted by ratio.
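The ratio table described above is easy to reproduce from any set of submitted results. A minimal sketch, assuming results arrive as a mapping from CPU name to (ST, MT) scores; that data structure and the function name are assumptions, and the placeholder names and numbers below are not ComputerBase data.

```python
def rank_by_mt_st_ratio(results: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Sort CPUs by MT/ST ratio, highest first.

    `results` maps a CPU name to its (single_thread, multi_thread) scores.
    """
    ratios = []
    for cpu, (st, mt) in results.items():
        if st <= 0:
            raise ValueError(f"{cpu}: single-thread score must be positive")
        ratios.append((cpu, mt / st))
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)

# Placeholder names and scores, not real submissions:
for cpu, ratio in rank_by_mt_st_ratio({"cpu_a": (2000, 10000), "cpu_b": (1000, 8000)}):
    print(f"{cpu}: {ratio:.2f}")
```

If the ratio barely moves as core count grows, that is the pattern moinmoin is pointing at.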

hemedans · Feb 15, 2023

nicalandia said:
Whover Thinks a 7950X and 13900K are better than The ThreadRipper Pro 5995WX because GeekBench6 says so has gone Senile. Simple as that. GB6 is a light "Real World" Benchmark that is design for Phones and Laptops.

When Xeon W9 releases this will be the worst app to use to test it.

Simply put it, Do you want to test Workstation performance? Don't use GB6, but PugetSystem or something else.

It's not even good for phones: it measures only burst performance, and most phones nowadays can't even sustain 70% of their peak performance.

lightmanek · Feb 15, 2023

Geekbench 6 is good for ST and for benchmarking OS/app responsiveness, but it is nowhere near good at indicating MT performance for prosumers. Luckily we have a ton of other, more useful benchmarks for that, so I'm OK with GB6 showing MT performance as it does, as long as people understand that it is not what you look at to evaluate CPUs for HPC, heavy CPU transcoding, code compilation, scientific work and much more.
One feature I would like to see in GB is the ability to select the duration of each task, so you could adjust the load on the CPU and see how well it sustains high clocks under light MT loads and when it starts hitting power or cache limits as the load increases.

To me, GB6 is like EPA fuel-economy ratings versus real-world driving: yes, it is a good basis for comparison between manufacturers, and you can achieve the claimed results if you drive in a very specific way, but your mileage will vary depending on a lot of real-world factors.

name99 · Feb 15, 2023

Abwx said:
At the same time, one has to wonder why GB shrank its potential buyer population. That's not the best way to sell more of their stuff, quite the contrary, so much so that it's somewhat suspicious...

Uhh, wot?
So your contention is that most people were buying GB5 to engage in dick-measuring, and are upset that it's now targeted at useful information rather than dick-measuring?
OK...

roger_k · Feb 16, 2023

moinmoin said:
No, but there is a strong relationship between the ST and MT scores, regardless of the actual number of cores present, which makes the latter effectively meaningless.

No, it just shows that multithreading has overhead and does not scale all that well with the number of cores. And it also shows that Intel's E-cores are only useful for trivial parallel workloads.

moinmoin said:
Beyond a rather small number of cores, GB6's MT score is highly tuned to be agnostic to the core count.

Not really. It's just how CPUs are marketed. Higher-end CPUs usually have both more cores and higher single-core boost. Which is why for many of the current models the two scores appear correlated. Well, because they are, that's how the chip is configured to begin with.

It becomes a bit clearer with Apple, which is probably the only manufacturer that ships CPUs with a consistent clock configuration across models. If you look at the MT scores of the M1, M1 Pro/Max, and M1 Ultra, you will see that every additional cluster of four cores adds about 50% of the original MT score. That's your locking/cache coherency overhead. Makes sense to me.
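roger_k's rule of thumb can be written as a simple linear model. A minimal sketch, assuming his estimated ~50% gain holds for every added four-core cluster and that all clusters are identical (a simplification, since the M1 variants mix core types differently); the function names are hypothetical.

```python
def projected_mt_score(one_cluster_mt: float, clusters: int,
                       gain_per_cluster: float = 0.5) -> float:
    """Each cluster beyond the first adds `gain_per_cluster` x the one-cluster MT score."""
    if clusters < 1:
        raise ValueError("need at least one cluster")
    return one_cluster_mt * (1 + gain_per_cluster * (clusters - 1))

def scaling_efficiency(clusters: int, gain_per_cluster: float = 0.5) -> float:
    """Projected score as a fraction of perfect linear scaling (one_cluster_mt * clusters)."""
    return projected_mt_score(1.0, clusters, gain_per_cluster) / clusters

for n in (1, 2, 4):
    print(n, projected_mt_score(1000.0, n), round(scaling_efficiency(n), 3))
```

Under this model per-cluster efficiency keeps falling as clusters are added: 75% of linear scaling at two clusters and 62.5% at four, which is the overhead he attributes to locking and cache coherency.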

moinmoin said:
Yeah, a single specific workload run in isolation. That's not what happens in reality. That's also not what users looking at the 7950X, Threadripper, workstation or server chips usually want. Instead, they want to run a larger number of workloads concurrently without hitting bottlenecks. For that purpose GB6's MT score is completely misleading, while GB4's and GB5's MT scores were rather serviceable.

For some workloads, sure. But GB6 also includes trivially parallelizable tasks that scale very well with the number of cores, like RT. And of course, if you have some specific use case in mind you should benchmark that use case.

moinmoin said:
Reminder that before Ryzen, Intel actually started using Cinebench as a marketing benchmark (wtftech article and slide from 2011). Cinebench was known to be Intel-optimized to boot, so naturally, once AMD could showcase beating Intel with it using Ryzen, they did just that.

Cinebench massively favours high core counts (trivially parallel work) as well as fast/wide SIMD units and caches (long data dependency chains, SIMD heavy). It overestimates performance for any kind of workload that doesn't use SIMD much or that actually needs some cooperation between the cores.

lightmanek said:
Luckily we have a ton of other, more useful benchmarks for that, so I'm OK with GB6 showing MT performance as it does, as long as people understand that it is not what you look at to evaluate CPUs for HPC, heavy CPU transcoding, code compilation, scientific work and much more.

I don't think it's that much off for code compile. Compilation is known to show diminishing returns with increasing core count.

Timed LLVM Compilation Benchmark - OpenBenchmarking.org (openbenchmarking.org)

Here you can see how doubling the amount of cores (with the same power/frequency per core) doesn't reduce the time in half. The scaling observed there isn't that different from what we see in GB6.
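The diminishing returns described here (and naukkis's "main thread is dominant" point) are usually modelled with Amdahl's law. A minimal sketch, assuming a fixed parallelizable fraction p of the work and otherwise ideal scaling; real compiles also lose time to linking, I/O and memory bandwidth, which this ignores. The p = 0.9 below is an illustrative value, not one measured from the LLVM benchmark.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

def time_ratio_when_doubling(parallel_fraction: float, cores: int) -> float:
    """Run time on 2*cores divided by run time on `cores`; 0.5 would be perfect halving."""
    return amdahl_speedup(parallel_fraction, cores) / amdahl_speedup(parallel_fraction, 2 * cores)

# Illustrative p = 0.9, not measured from the LLVM benchmark:
print(round(time_ratio_when_doubling(0.9, 8), 3))  # 0.735
```

With p = 0.9, going from 8 to 16 cores cuts run time to about 73.5% rather than 50%, even before any coherency or memory effects.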

Kocicak · Feb 16, 2023

deleted for stupidity

moinmoin · Feb 16, 2023

roger_k said:
For some workloads, sure. But GB6 also includes trivially parallelizable tasks that scale very well with the number of cores, like RT. And of course, if you have some specific use case in mind you should benchmark that use case.

GB6 also including trivially parallelizable tasks is meaningless if the overall MT score barely reflects the actual core count. Ideally GB would split the MT score into workload tests that use only a limited number of cores and parallelizable tests that extend to all available threads.

Isolated benchmarks only make sense for isolated workloads. As you yourself point out at length, workloads that scale well to any number of cores are the exception. But typical usage of chips with a massive number of cores is not running isolated workloads; especially in servers, it's running a lot of workloads concurrently. For that purpose GB6's MT score is completely useless at best and misleading at worst.
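The split suggested above (one MT number for thread-limited workloads, one for workloads that extend to all threads) is easy to express. A minimal sketch, assuming each workload is tagged by whether it is meant to use every available thread; the tags, names and the use of a geometric mean are illustrative assumptions, not how Geekbench actually aggregates its scores.

```python
from statistics import geometric_mean
from typing import Optional

def split_mt_scores(workloads: dict[str, tuple[float, bool]]) -> dict[str, Optional[float]]:
    """Return separate composites for thread-limited and all-thread workloads.

    `workloads` maps a workload name to (score, uses_all_threads).
    A group with no workloads gets None instead of a score.
    """
    limited = [score for score, all_threads in workloads.values() if not all_threads]
    scalable = [score for score, all_threads in workloads.values() if all_threads]
    return {
        "limited_mt": geometric_mean(limited) if limited else None,
        "all_threads_mt": geometric_mean(scalable) if scalable else None,
    }

# Hypothetical workloads and scores:
print(split_mt_scores({"browser": (1000.0, False), "photo": (4000.0, False),
                       "raytrace": (9000.0, True)}))
```

Keeping the two numbers separate means a high-core-count chip's advantage in the all-thread group isn't averaged away by the thread-limited group.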

Exist50 · Feb 16, 2023

moinmoin said:
GB6 also including trivially parallelizable tasks is meaningless if the overall MT score barely reflects the actual core count. Ideally GB would split the MT score into workload tests that use only a limited number of cores and parallelizable tests that extend to all available threads.

Isolated benchmarks only make sense for isolated workloads. As you yourself point out at length, workloads that scale well to any number of cores are the exception. But typical usage of chips with a massive number of cores is not running isolated workloads; especially in servers, it's running a lot of workloads concurrently. For that purpose GB6's MT score is completely useless at best and misleading at worst.

Geekbench was never meant to be, nor ever has been, a good workload by which to judge servers.
