Question Geekbench 6 released and calibrated against Core i7-12700


roger_k

Member
Sep 23, 2021
102
219
86
There is still a flaw inherent in GB6's testing methodology. It's not measuring multitasking efficiency which is what AMD excels at. They should run at least two different tests concurrently to see how the CPU handles multiple workloads thrown its way.

What I'd like to see is some sort of long, complex, hybrid workload, like building and packaging a game. That can consist of multiple processing steps such as texture generation, mesh optimisation, pre-baked raytraced lighting, compilation, and asset compression, all driven by a parallel build system. This kind of test goes in the direction you mention while still being practically relevant.
 
  • Like
Reactions: igor_kavinski
Jul 27, 2020
24,237
16,892
146
Do we have to have benchmarks that are designed to favor AMD? It's never going to be 100% fair.
It's not about favoring AMD. It's a valid use case. Lots of people will run some workload and start doing something else while the original workload chugs along in the background. AMD is just better at it and if there is more awareness about that through benchmarks exposing this benefit of AMD's architecture, it will also pressure Intel to improve the deficiencies in their architecture to address this common use case.
 
  • Like
Reactions: moinmoin

Kocicak

Golden Member
Jan 17, 2019
1,177
1,232
136

I don't know who came up with this test (Anand or Ganesh or someone else) but it's pure genius. Shows how 4800U/5800U trump ADL-mobile in concurrent workloads. 12th gen is worse than even Comet Lake!
They run various benchmarks while looping a video transcoding task. I am sorry, but quantifying the effects of ONE specific concurrent task with ONE set of parameters (video formats, transcoding parameters) has very limited general significance.

What if an Alder Lake CPU gets too distracted by one particular video format, while it could run some other format effortlessly? Did they allow the integrated graphics to help with the encoding, and if not, is it fair to test a product with a part intended for this specific workload disabled? Etc. etc. etc.

So yeah, I am not convinced that AMD CPUs are better in multitasking than Intel CPUs by this at all.
 
  • Like
Reactions: Nothingness

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Whoever thinks a 7950X and 13900K are better than the Threadripper Pro 5995WX because Geekbench 6 says so has gone senile. Simple as that. GB6 is a light "real world" benchmark that is designed for phones and laptops.

When Xeon W9 releases, this will be the worst app to use to test it.

Simply put: do you want to test workstation performance? Don't use GB6; use Puget Systems or something else.
 

mikegg

Golden Member
Jan 30, 2010
1,885
501
136
Whoever thinks a 7950X and 13900K are better than the Threadripper Pro 5995WX because Geekbench 6 says so has gone senile. Simple as that. GB6 is a light "real world" benchmark that is designed for phones and laptops.

When Xeon W9 releases, this will be the worst app to use to test it.

Simply put: do you want to test workstation performance? Don't use GB6; use Puget Systems or something else.
I 100% believe that a 7950x and a 13900K are better than a Zen3 Threadripper for most consumer applications.
 

mikegg

Golden Member
Jan 30, 2010
1,885
501
136

I don't know who came up with this test (Anand or Ganesh or someone else) but it's pure genius. Shows how 4800U/5800U trump ADL-mobile in concurrent workloads. 12th gen is worse than even Comet Lake!
So why don't you just run Geekbench 6 while VLC is encoding some video and see. It seems bizarre that you think Geekbench should implement this sort of test.
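For anyone who wants to try the "benchmark while something else is running" idea without waiting for Geekbench to add it, here is a minimal sketch. It is only an illustration: `busy_work` is a made-up CPU-bound stand-in for a real benchmark, `background_loop` is a stand-in for the looping transcode, and none of these names come from Geekbench, VLC or the article. A real run would launch the actual applications instead.

```python
import multiprocessing as mp
import time


def busy_work(n: int) -> int:
    """Hypothetical CPU-bound stand-in for one benchmark workload."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def background_loop(stop) -> None:
    """Stand-in for the looping transcode: keep a core busy until told to stop."""
    while not stop.is_set():
        busy_work(50_000)


def time_foreground(n: int, repeats: int = 3) -> float:
    """Best-of-N wall-clock time for the foreground task, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        busy_work(n)
        best = min(best, time.perf_counter() - start)
    return best


def slowdown_under_load(n: int, background_procs: int) -> float:
    """Loaded time divided by idle time; 1.0 means the background load cost nothing."""
    idle = time_foreground(n)
    stop = mp.Event()
    workers = [
        mp.Process(target=background_loop, args=(stop,))
        for _ in range(background_procs)
    ]
    for w in workers:
        w.start()
    try:
        loaded = time_foreground(n)
    finally:
        # Always stop and reap the background processes, even on error.
        stop.set()
        for w in workers:
            w.join()
    return loaded / idle


if __name__ == "__main__":
    ratio = slowdown_under_load(1_000_000, background_procs=4)
    print(f"foreground slowdown with 4 background workers: {ratio:.2f}x")
```

The ratio depends heavily on which background task is chosen and how many copies run, which is the objection raised earlier in the thread about testing ONE concurrent task with ONE set of parameters.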
 
  • Like
Reactions: Nothingness

Abwx

Lifer
Apr 2, 2011
11,783
4,692
136
I 100% believe that a 7950x and a 13900K are better than a Zen3 Threadripper for most consumer applications.

7-Zip, Handbrake and Agisoft are all consumer apps. Besides, tests that focus on a single task with a low core count are useless for estimating performance, since the CPU is underused and the task takes very little time to execute. That's akin to microbenchmarks, which say nothing about performance under a really demanding set of loads or in software like the apps I cited.
 
  • Haha
  • Like
Reactions: mikegg and Tlh97

RTX2080

Senior member
Jul 2, 2018
334
533
136

I don't know who came up with this test (Anand or Ganesh or someone else) but it's pure genius. Shows how 4800U/5800U trump ADL-mobile in concurrent workloads. 12th gen is worse than even Comet Lake!


This is the big-little design disaster. In a theoretical test it's impossible that a 1260P would lose to a 5800U and not even be any better than a 4800U, so multitasking being handled by the crappy Thread Director is the only explanation for this. Frequently switching between big and little cores causes too much latency and performance loss. Every similar situation I have heard about from others is either multitasking lag or performance loss during aggressive switching, and sometimes it is as bad as only the little cores working while the big cores sit idle.
 
  • Love
Reactions: igor_kavinski

mikegg

Golden Member
Jan 30, 2010
1,885
501
136
7-Zip, Handbrake and Agisoft are all consumer apps. Besides, tests that focus on a single task with a low core count are useless for estimating performance, since the CPU is underused and the task takes very little time to execute. That's akin to microbenchmarks, which say nothing about performance under a really demanding set of loads or in software like the apps I cited.

I said most. Not all. There's a name for this kind of fallacy, where someone tries to argue that a statement is untrue by providing a few niche examples.

How much 7zip, Handbrake, Agisoft are people doing on a 13900k?

It depends. What application are we trying to use? An Android phone could be better than a 13900k. It depends, right?

Also, this erroneous result doesn't change anything.

Why are people so sensitive about this? This statement isn't controversial: A 7950x and a 13900k should run consumer applications better than a Zen3 Threadripper. Relax people.
 

Abwx

Lifer
Apr 2, 2011
11,783
4,692
136
I said most. Not all. There's a name for this kind of fallacy, where someone tries to argue that a statement is untrue by providing a few niche examples.

How much 7zip, Handbrake, Agisoft are people doing on a 13900k?


It depends. What application are we trying to use? An Android phone could be better than a 13900k. It depends, right?

Also, this erroneous result doesn't change anything.

Why are people so sensitive about this? This statement isn't controversial: A 7950x and a 13900k should run consumer applications better than a Zen3 Threadripper. Relax people.


The thing is that GB no longer shows the full extent of the performance: whatever has enough performance is stuck at about 20,000 points. The fallacy is to state that such an arbitrary measurement is good enough, so that anything better at some tasks gets counted as equal if not slower...

For instance, a 7700X at 88 W is quite a bit faster in the MT test than a 5950X, yet in general usage, like browsing, there won't be any difference, while in heavy MT the latter will be way faster. So real usage performance is not reflected. A browser that reacts in 0.05 s instead of 0.06 s is a useless measurement, because that is not the kind of task whose execution speed matters to the user.

Edit: Computerbase published an article about GB6 with a graph populated by their forum members' ST and MT scores. I guess it's now clear for whom it was specifically re-"designed", and it certainly wasn't for users' needs...

 
  • Like
Reactions: igor_kavinski

poke01

Diamond Member
Mar 8, 2022
3,433
4,713
106
There is still a flaw inherent in GB6's testing methodology. It's not measuring multitasking efficiency which is what AMD excels at. They should run at least two different tests concurrently to see how the CPU handles multiple workloads thrown its way.
Comparing a 13900K and a 7950X, the 13900K comes out ahead in MT. This makes sense because the 13900K has more cores.

But Geekbench was never perfect in MT.
No non-x86 chip on GB6 comes close to the i9 except Apple's. That's fake.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
P.S. I do think that excluding AVX-512 was a bit of a shame. If a CPU supports the technology and can perform faster on a task, it should be used. At the same time, I do understand the decision — AVX512 isn't really used all that much in desktop software, and basing benchmarks off it might create unrealistic expectations.

I think the "unrealistic expectations" part is spot on here. With the current set of tests, there are not that many where AVX-512 could help much with a direct recompile of the same code. Maybe the photo library stuff, or the photo filter.
The other tests could hardly use AVX-512 without proper handwritten code paths, and that would defeat the multi-platform and vendor-"agnostic" part of the tests: the ARM guys would ask for SVE3 with 666-bit vectors and so on. In the previous GB5 we had absurd outliers like, say, FFT or encryption that were directly calculating "throughput", where it would probably make sense to double the FLOPS just by using some vendor library or some short code that gets autovectorized and supports AVX-512.
Even then, CPU vendors found that it was easier to pad the score by including some 256-bit VAES instruction with ridiculous throughput that made some new laptop beat a whole server in AES encryption throughput (and of course utterly suck in the real world, since actually serving up content for encryption and delivering it requires an actual server).

So GB6 is, in my opinion, a great desktop/workstation performance test that emphasizes the way people actually use CPUs in 2023, and is brave enough to shatter the illusions of people who disagree that 8 strong cores are plenty and that the rest (be it 8 more strong cores or 16 marketing cores) gives very diminishing gains. Kudos to them for not catering to the Cinebench/DC runner crowd.
 
  • Like
Reactions: Etain05 and Exist50

moinmoin

Diamond Member
Jun 1, 2017
5,206
8,367
136
They aren't 'ratios' or anything like that.
No, but there is a strong relation between the ST and the MT score, regardless of the actual amount of cores present. Which makes the latter effectively meaningless.

Zen 4 Threadripper will very likely tell the complete story. A combination of high clocked Zen 4 cores should, in theory, beat out the 7950X significantly.
The numbers I see so far and their ratios/balance between ST and MT make me expect Zen 4 Threadripper to beat 7950X slightly, but by far not reflecting the additional amount of cores, rather reflecting the higher ST achievable on more cores across the chip.

GB6 is highly tuned to be CPU agnostic.
Beyond a rather small amount GB6's MT score is highly tuned to be agnostic to the amount of cores present.

That's actually how most workloads scale to MT. The main thread is dominant and there's a limit to how much can be offloaded to child threads. ARM designs for phones are tuned precisely to that: one prime core to run the main thread and somewhat less powerful cores to offload it. Only very special kinds of work scale to an unlimited number of threads, and for normal desktop/phone use, cores beyond ~8 are just as beneficial as Geekbench 6 shows them to be.
Yeah, single specific workload run in isolation. That's not what happens in reality. That's also not what users looking at 7950X, Threadripper, workstation or server chips usually want. Instead they want to run a higher amount of workloads concurrently without hitting bottlenecks. For that purpose GB6's MT score is completely misleading while GB4's and GB5's MT scores were rather serviceable.

Through marketing, first AMD, and now Intel, the x86 CPU industry has brainwashed people into using Cinebench
Reminder that before Ryzen Intel actually started with Cinebench as marketing benchmark (wtftech article and slide from 2011). Cinebench was known as Intel-optimized to boot, so obviously once AMD could showcase beating Intel with it using Ryzen they did just that.

 
  • Like
Reactions: inf64

hemedans

Senior member
Jan 31, 2015
254
143
116
Whoever thinks a 7950X and 13900K are better than the Threadripper Pro 5995WX because Geekbench 6 says so has gone senile. Simple as that. GB6 is a light "real world" benchmark that is designed for phones and laptops.

When Xeon W9 releases, this will be the worst app to use to test it.

Simply put: do you want to test workstation performance? Don't use GB6; use Puget Systems or something else.
It's not even good for phones. It measures only burst performance, and most phones nowadays can't even sustain 70% of their performance.
 

lightmanek

Senior member
Feb 19, 2017
508
1,245
136
Geekbench 6 is good for ST and for benchmarking OS/app responsiveness, but it is nowhere near good at indicating MT performance for prosumers. Luckily we have a ton of other, more useful benchmarks for that, so I'm OK with GB6 showing MT performance as it is, as long as people understand that it is not what you look at to evaluate CPUs for HPC, heavy CPU transcoding, code compilation, scientific work and many more.
One feature I would like to see in GB is the ability to select the duration of each task, so you could adjust the load on the CPU and see how well it sustains high clocks at light MT loads, and when you might be hitting power/cache limits with increased loads.
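A rough sketch of what a selectable task duration could look like, purely as an illustration: `work_unit` is a made-up CPU-bound placeholder, and the function names and window sizes are assumptions, not anything Geekbench exposes. It runs the same unit of work back to back for a chosen duration and reports throughput per time window, so a drop from the first (burst) window to the last (sustained) window becomes visible.

```python
import time


def work_unit(n: int = 20_000) -> int:
    """Hypothetical fixed-size chunk of CPU-bound work."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def throughput_over_time(duration_s: float, window_s: float) -> list[float]:
    """Run work units for about duration_s; return units per second for each window."""
    rates = []
    end = time.perf_counter() + duration_s
    while time.perf_counter() < end:
        window_start = time.perf_counter()
        units = 0
        # Each window completes at least one unit, so every rate is positive.
        while time.perf_counter() - window_start < window_s:
            work_unit()
            units += 1
        rates.append(units / (time.perf_counter() - window_start))
    return rates


def sustained_ratio(rates: list[float]) -> float:
    """Throughput of the last window relative to the first (burst) window."""
    return rates[-1] / rates[0]


if __name__ == "__main__":
    rates = throughput_over_time(duration_s=10.0, window_s=1.0)
    print(f"sustained / burst throughput: {sustained_ratio(rates):.2f}")
```

A ratio well below 1.0 after a long enough run would point at thermal or power limits, which a short burst-style run cannot show.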

GB6 to me is like EPA car mileage to real world use - yes, it is a good indication between manufacturers and you can strive to achieve claimed results when using a car in very specific way, but your mileage will vary depending on a lot of factors in real world.
 
  • Like
Reactions: gdansk

name99

Senior member
Sep 11, 2010
597
491
136
At the same time one has to wonder why GB did shrink the eventual buyer s population, that s not the best way to sell more of their stuff, quite the contrary, and so much that it s somewhat suspicious...

Uhh, wot?
So your contention is that most people were buying GB5 to engage in dick-measuring, and are upset that it's now targeted at useful information rather than dick-measuring?
OK...
 

roger_k

Member
Sep 23, 2021
102
219
86
No, but there is a strong relation between the ST and the MT score, regardless of the actual amount of cores present. Which makes the latter effectively meaningless.

No, it just shows that multithreading has overhead and does not scale all that well with the number of cores. And it also shows that Intel's E-cores are only useful for trivial parallel workloads.


Beyond a rather small amount GB6's MT score is highly tuned to be agnostic to the amount of cores present.

Not really. It's just how CPUs are marketed. Higher-end CPUs usually have both more cores and higher single-core boost. Which is why for many of the current models the two scores appear correlated. Well, because they are, that's how the chip is configured to begin with.

It becomes a bit more clear with Apple who is probably the only manufacturer that ships CPUs with consistent clock configuration across models. If you look at MT scores of M1, M1 Pro/Max, and M1 Ultra you will see that every additional cluster of four cores adds about 50% of the original MT score. That's your locking/cache coherency overhead. Makes sense to me.
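As a back-of-the-envelope model of that observation (a sketch only: the 0.5 gain per extra cluster is the rough estimate from the paragraph above, while the function names and the normalized base score of 100 are made-up placeholders, not measured Geekbench results):

```python
def modeled_mt_score(base_score: float, clusters: int,
                     gain_per_cluster: float = 0.5) -> float:
    """MT score if each extra cluster adds a fixed fraction of the one-cluster score."""
    if clusters < 1:
        raise ValueError("need at least one cluster")
    return base_score * (1 + gain_per_cluster * (clusters - 1))


def scaling_efficiency(clusters: int, gain_per_cluster: float = 0.5) -> float:
    """Modeled score divided by perfect linear scaling for the same cluster count."""
    return (1 + gain_per_cluster * (clusters - 1)) / clusters


if __name__ == "__main__":
    for k in (1, 2, 4):
        score = modeled_mt_score(100.0, k)   # 100 is a normalized placeholder
        print(f"{k} cluster(s): score {score:.0f}, "
              f"efficiency {scaling_efficiency(k):.0%}")
```

Under this model two clusters score 1.5x and four clusters 2.5x the single-cluster result, so efficiency relative to linear scaling falls from 100% to 75% to about 62%.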

Yeah, single specific workload run in isolation. That's not what happens in reality. That's also not what users looking at 7950X, Threadripper, workstation or server chips usually want. Instead they want to run a higher amount of workloads concurrently without hitting bottlenecks. For that purpose GB6's MT score is completely misleading while GB4's and GB5's MT scores were rather serviceable.

For some workloads, sure. But GB6 also includes trivially parallelizable tasks that scale very well with the number of cores, like RT. And of course, if you have some specific use case in mind you should benchmark that use case.



Reminder that before Ryzen Intel actually started with Cinebench as marketing benchmark (wtftech article and slide from 2011). Cinebench was known as Intel-optimized to boot, so obviously once AMD could showcase beating Intel with it using Ryzen they did just that.

Cinebench massively favours high core counts (trivially parallel work) as well as fast/wide SIMD units and caches (long data dependency chains, SIMD heavy). It overestimates performance in any kind of workload that doesn't use SIMD that much or actually needs some cooperation between the cores.

Luckily we have a ton of other, more useful benchmarks for that, so I'm OK with GB 6 showing MT performance as it is, as long as people understand that is not what you look at to evaluate CPU's for HPC/heavy CPU transcode/Code Compile/Scientific and many more.

I don't think it's that much off for code compile. Compilation is known to show diminishing returns with increasing core count.


Here you can see how doubling the amount of cores (with the same power/frequency per core) doesn't reduce the time in half. The scaling observed there isn't that different from what we see in GB6.
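One common way to reason about why doubling cores doesn't halve the time is Amdahl's law. To be clear, that is a model brought in here for illustration, not something measured in this thread, and the parallel fraction of 0.9 below is an assumed value, not a property of any real build.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: speedup over one core when only part of the work parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.9  # assumed share of the build that can run in parallel
    for cores in (8, 16, 32):
        print(f"{cores:>2} cores: {amdahl_speedup(p, cores):.2f}x speedup")
```

With that assumed fraction, going from 8 to 16 cores raises the speedup from about 4.7x to 6.4x, roughly a 1.36x gain rather than 2x, and no core count can exceed 10x.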
 

moinmoin

Diamond Member
Jun 1, 2017
5,206
8,367
136
For some workloads, sure. But GB6 also includes trivially parallelizable tasks that scale very well with the number of cores, like RT. And of course, if you have some specific use case in mind you should benchmark that use case.
GB6 also including trivially parallelizable tasks is meaningless if the overall MT score barely reflects the actual core count. Ideally GB would split up MT scores between workload tests only using a limited amount of cores and parallelizable tests extending to all available threads.
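A minimal sketch of what such a split could look like. Everything here is hypothetical: the workload names, the scores, and the flag marking which tests scale with core count are invented for illustration, and using a geometric mean to combine them is an assumption of this sketch rather than a statement about how Geekbench computes its score.

```python
from statistics import geometric_mean


def split_mt_scores(workloads: dict[str, tuple[float, bool]]) -> tuple[float, float]:
    """Return (limited-core score, all-thread score) as separate geometric means.

    Each entry maps a workload name to (score, scales_with_cores).
    """
    limited = [score for score, scales in workloads.values() if not scales]
    scaling = [score for score, scales in workloads.values() if scales]
    return geometric_mean(limited), geometric_mean(scaling)


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    example = {
        "document_editing": (100.0, False),
        "web_browsing": (400.0, False),
        "ray_tracing": (1000.0, True),
        "asset_compression": (4000.0, True),
    }
    limited_score, scaling_score = split_mt_scores(example)
    print(f"limited-core MT: {limited_score:.0f}, all-thread MT: {scaling_score:.0f}")
```

Reporting the two numbers side by side would keep the high-core-count tests from being averaged away by the workloads that stop scaling after a few threads.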

Isolated benchmarks only make sense for isolated workloads. As you yourself point out at length workloads that scale well to any number of cores are the exception. But typical usage of chips that contain a massive amount of cores is not running isolated workloads, especially in servers it's running a lot of workloads concurrently. For that purpose GB6's MT score is just completely useless at best and misleading people at worst.
 
  • Like
Reactions: lightmanek

Exist50

Platinum Member
Aug 18, 2016
2,452
3,105
136
GB6 also including trivially parallelizable tasks is meaningless if the overall MT score barely reflects the actual core count. Ideally GB would split up MT scores between workload tests only using a limited amount of cores and parallelizable tests extending to all available threads.

Isolated benchmarks only make sense for isolated workloads. As you yourself point out at length workloads that scale well to any number of cores are the exception. But typical usage of chips that contain a massive amount of cores is not running isolated workloads, especially in servers it's running a lot of workloads concurrently. For that purpose GB6's MT score is just completely useless at best and misleading people at worst.
Geekbench was never meant to be, nor ever has been, a good workload by which to judge servers.