They're (Almost) All Dirty: The State of Cheating in Android Benchmarks


lothar

Diamond Member
Jan 5, 2000
6,674
7
76
Guys, the benchmarks are poorly coded and the OEM governors are set too loosely to trigger all cores to hit full clocks throughout the whole bench, and that is why OEMs are putting these in place: to make sure the bench runs all the cores without shutting a few off when it tests single threads.

Please, someone, anyone, post one single piece of proof saying the Snapdragon 800 can't run all 4 cores at 2.3GHz in regular use.
So why are they only enabling the governors on a "few" select apps rather than on all apps?
The benchmarks are poorly coded, but all other Android apps are coded perfectly so there's no need to optimize for them? Lol...

Then why limit it to only benchmarking apps rather than allowing all apps to do the same if the SoC can already run all 4 cores at 2.3GHz in regular use?
 

grkM3

Golden Member
Jul 29, 2011
1,407
0
0
So why are they only enabling the governors on a "few" select apps rather than on all apps?
The benchmarks are poorly coded, but all other Android apps are coded perfectly so there's no need to optimize for them? Lol...

Then why limit it to only benchmarking apps rather than allowing all apps to do the same if the SoC can already run all 4 cores at 2.3GHz in regular use?

If the app is not stressing the SoC enough, it will not run the cores at max clocks. I'm guessing you have never messed with kernel settings and governors on Android.

You usually have 4 to 5 choices in the kernel, with voltages and clock speeds that adapt to how much demand the SoC is seeing. Samsung also added a really low power-saving profile that you can enable called... power saver!

OEMs are trying to balance power with demand, but if the app is not putting enough load on the CPUs, the main profile it's running will take effect and not clock the chip to max clocks, because, say, it's set to a lazy governor rather than an aggressive or max-performance one. There are usually 4-5 profiles in the kernel, and the benchmark app should make the phone run in the max performance profile. This is built into the Android kernel.

It's there for devs to use, and Samsung and others are doing what the devs are not doing: changing the profile for them when a bench app is run.

People are also overclocking the SoCs by 300-400 MHz with modified kernels that allow overclocking straps.

Care to explain how the chip is benching way past 2.3GHz on some modded kernels without throttling?

Again, show me proof that the S800 can't run all 4 cores in regular use at 2.3GHz. You seem to think that the SoC never hits 2.3GHz on its own without these optimizations, and that is just plain wrong.

Go on YouTube and look up S800 reference boards with CPU power meters running, showing how it ramps up all 4 cores when pushed hard.
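
For anyone who wants to check the governor claims on their own device rather than argue about them, here is a minimal sketch. It assumes a Linux-style cpufreq sysfs tree (the usual location is /sys/devices/system/cpu/cpuN/cpufreq/), that the files are readable from wherever you run it (an adb shell or a terminal app, possibly needing root), and that Python is available there. The function name `read_cpufreq` and the `root` parameter are my own, not part of any Android API.

```python
from pathlib import Path

def read_cpufreq(cpu: int = 0, root: str = "/sys/devices/system/cpu") -> dict:
    """Return governor and frequency info for one core, or {} if unavailable."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in ("scaling_governor", "scaling_available_governors",
                 "scaling_cur_freq", "scaling_max_freq"):
        try:
            info[name] = (base / name).read_text().strip()
        except OSError:
            # File missing or unreadable (core hot-unplugged, no permission, etc.)
            continue
    return info

if __name__ == "__main__":
    # Assumes a quad-core part; adjust the range for other chips.
    for cpu in range(4):
        print(cpu, read_cpufreq(cpu))
```

Polling this while a benchmark runs and again while an ordinary app runs would show whether the governor or the reported max frequency differs between the two, which is the actual point of disagreement in this thread.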
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
Guys, the benchmarks are poorly coded and the OEM governors are set too loosely to trigger all cores to hit full clocks throughout the whole bench, and that is why OEMs are putting these in place: to make sure the bench runs all the cores without shutting a few off when it tests single threads.
I guess this is why Motorola and Google never do this.

I'll bet anything you can see all 4 cores being used recording 4K video on the Note 3, and you can record at 4K for 5 min at a time.
Snapdragon 800 has a dedicated hardware H.264 encoder for video.

The A7 in the 5S also clocks at 1.7GHz, not 1.2 like some are saying.
Anand says 1.3 GHz. Dunno where you got 1.7 GHz from.

Anyways, it seems you're doing the same thing you always do in these threads, which is to pull values and specs out of thin air.
 

WelshBloke

Lifer
Jan 12, 2005
33,108
11,287
136
The problem is with the benchmarks.

From the "cheaters" side the active cycle of the benchmark is so short that the act of ramping up and down the speed of the CPU from idle to full to idle affects the score negatively.

From the consumers side the active cycle of the benchmark is so short that it doesn't really put the CPU under any stress so it can perform abnormally well for the short time of the benchmark.

Easy solution. Make the active span of the benchmark much longer. Even if the OEM tries to cheat it should be more indicative of the real world performance of the CPU.
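
To put rough numbers on both sides of that argument, here is a toy model, not a measurement. It assumes the clock sits at an idle frequency while the governor ramps, then at max, then drops to a throttled frequency once the chip heats up. Every figure in it (50 ms ramp, throttling after 120 s, 0.3 / 2.3 / 1.5 GHz) is a made-up placeholder, not a spec for any real SoC.

```python
def average_clock(duration, ramp=0.05, throttle_at=120.0,
                  f_idle=0.3, f_max=2.3, f_hot=1.5):
    """Mean clock in GHz over a burst of `duration` seconds (toy 3-phase model).

    Assumes ramp < throttle_at. All default values are illustrative only.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    t_idle = min(duration, ramp)                             # still ramping up
    t_max = max(0.0, min(duration, throttle_at) - ramp)      # at full clocks
    t_hot = max(0.0, duration - throttle_at)                 # thermally throttled
    return (f_idle * t_idle + f_max * t_max + f_hot * t_hot) / duration

if __name__ == "__main__":
    for seconds in (0.1, 10, 600):
        print(seconds, round(average_clock(seconds), 3))
```

Under these assumptions a 0.1 s burst averages well below the max clock because half of it is spent ramping, a 10 s run sits almost entirely at max, and a 10 minute run is dominated by the throttled phase. That is why a longer active span would say more about real-world performance than a burst that is over before either effect settles.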
 

WelshBloke

Lifer
Jan 12, 2005
33,108
11,287
136
AnTuTu releasing AnTuTu X to combat cheating by Android OEMs.

http://antutu.com/view.shtml?id=7242

NO.2 To cheat with a temporary high performance aiming at AnTuTu

  Those devices have added a recognition to AnTuTu Benchmark in their system. For instance, the highest CPU frequency in normal condition is 480MHz, while after recognized the operation of AnTuTu Benchmark, the CPU frequency will self-improve to above 500MHz. By adjusting voltage, a short-time over-clock won’t cause the calorification and instability of devices, but can somehow boost the final result. As a consequence, the result customers saw is differ with the real fluency feel in daily use.


In test process, we will call the upper limit performance of system. A full load operation do fully express the performance of devices, but by using a non-normal over-clock method to ger a high score is useless.

That seems like a sticking plaster over the fact that their benchmark is so short that thermal issues don't get taken into account.

The answer, as I said earlier, is to make the benchmark stress the device for longer, not to guesstimate what you think the device should be running at.
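
As a rough illustration of what "stress the device for longer" could look like, here is a minimal single-threaded sketch. It repeats a fixed chunk of work and counts how many chunks finish in each time window, so a drop in later windows would hint at throttling. The function names and the size of the work unit are my own choices, and a real benchmark would need one worker per core plus a far more representative workload.

```python
import time

def busy_work(n: int = 20000) -> int:
    """One fixed unit of CPU work (sum of squares); the size is arbitrary."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def sustained_run(total_seconds: float, window_seconds: float) -> list:
    """Count completed work units in each consecutive time window."""
    counts = []
    end = time.perf_counter() + total_seconds
    while time.perf_counter() < end:
        window_end = time.perf_counter() + window_seconds
        done = 0
        while time.perf_counter() < window_end:
            busy_work()
            done += 1
        counts.append(done)
    return counts

if __name__ == "__main__":
    # Ten minutes in 30 s windows; compare the first window with the last.
    counts = sustained_run(600, 30)
    print(counts, "last/first =", counts[-1] / counts[0])
```

Reporting the whole list rather than one number means a phone that starts fast and then sags shows up as exactly that.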
 

MrX8503

Diamond Member
Oct 23, 2005
4,529
0
0
Guys, the benchmarks are poorly coded and the OEM governors are set too loosely to trigger all cores to hit full clocks throughout the whole bench, and that is why OEMs are putting these in place: to make sure the bench runs all the cores without shutting a few off when it tests single threads.

Please, someone, anyone, post one single piece of proof saying the Snapdragon 800 can't run all 4 cores at 2.3GHz in regular use.

I'll bet anything you can see all 4 cores being used recording 4K video on the Note 3, and you can record at 4K for 5 min at a time.

The A7 in the 5S also clocks at 1.7GHz, not 1.2 like some are saying.

You have to set these profiles, as the apps are not coded to max out the cores through the whole bench, and the kernel is always trying to adapt the SoC for max battery life and a power boost when it sees full load.

Make a benchmark that runs 4 threads and demands full CPU calculations and the kernel will clock the S800 at max clocks until it needs to throttle, and these benchmarks are not pushing the chip long enough to throttle it.

I can run 4 back-to-back runs with the scores staying within 2%, and the phone is not hot at all. It actually gets warmer using 4G and web surfing.

The S800 SoC runs pretty cold and can easily sustain 2.3GHz on all 4 cores, and the Exynos 5 Octa evolved can run all 8 at once along with pushing code to the GPUs.

You sure do make up a lot of stuff.

This is the same guy that said Samsung could have released an A15 smartphone in 2011 if they wanted to, lol. It's funny how some people are trying to give these OEMs a pass.
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
I dunno, what he wrote there seems pretty accurate actually.
Nope. MrX8503 had it right, and given who posted that, it should come as no surprise.

That seems like a sticking plaster over the fact that their benchmark is so short that thermal issues don't get taken into account.

The answer, as I said earlier, is to make the benchmark stress the device for longer, not to guesstimate what you think the device should be running at.
I agree that benchmarks can be improved, but re: that bolded part, are you talking about them not wanting to run on overclocked devices? Or maybe it's their English, because English is obviously not their first language.

In the meantime, trying to camouflage themselves from benchmark detectors is a reasonable first step.
 

WelshBloke

Lifer
Jan 12, 2005
33,108
11,287
136
Nope. MrX8503 had it right, and given who posted that, it should come as no surprise.

I don't tend to remember a lot of posters' histories (there are a few notable exceptions), so I was just going on what was in the post he made. Ignoring the clock speed figures he posted (as I can't remember if those are right or not), what seemed off with his post?


I agree that benchmarks can be improved, but re: that bolded part, are you talking about them not wanting to run on overclocked devices? Or maybe it's their English, because English is obviously not their first language.

In the meantime, trying to camouflage themselves from benchmark detectors is a reasonable first step.

Benchmarks should test how fast the device can run at the speed it's running; they shouldn't test at the speed you think it should be running. If a device is stable and sold at a speed which is higher than others, why is that bad?
 

shortylickens

No Lifer
Jul 15, 2003
80,287
17,081
136
You'd be amazed how many 3D games I DON'T play on my phone.

So long as web pages don't take 20 seconds to load, I'm basically happy.
 

ControlD

Diamond Member
Apr 25, 2005
5,440
44
91
You'd be amazed how many 3D games I DON'T play on my phone.

So long as web pages don't take 20 seconds to load, I'm basically happy.

Yep. I'm not sure exactly what the purpose is behind cheating at these benchmarks in the first place. I suppose in the Android world there is more of an incentive in trying to set yourself apart from the crowd because there are so many players using similar hardware.

Still, I can't believe benchmarks play a huge role in most consumers' decisions on what phone to purchase. I don't see many apps out there requiring cutting edge hardware either. Angry Birds and Candy Crush play just fine on an iPhone 4 or Droid X. I bought my Note 2 because of the features it gave me, not because it had a Snapdragon quad core processor in it.
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
I don't tend to remember a lot of posters' histories (there are a few notable exceptions), so I was just going on what was in the post he made. Ignoring the clock speed figures he posted (as I can't remember if those are right or not), what seemed off with his post?
We shouldn't give him a free pass on the clock speeds, given his history. And since the 1.3 GHz number comes from AnandTech, we definitely shouldn't be giving him a pass on this. ;)

Also, he was trying to use the example of 4K video recording being OK for 5 minutes as an example of why the chip is fine with all 4 cores stressed for extended periods. So I pointed out (in not so many words) that that is unlikely to be true, since Snapdragon 800 has a hardware H.264 video encoder.

http://www.anandtech.com/show/7082/...ce-preview-qualcomm-mobile-development-tablet


Benchmarks should test how fast the device can run at the speed it's running; they shouldn't test at the speed you think it should be running. If a device is stable and sold at a speed which is higher than others, why is that bad?
Well, this modification of the AnTuTu benchmark is not just to address the issues with the major international OEMs, but also some of the smaller Chinese ones, which do even more acrobatics to game the system.

Furthermore your question is like asking why OEMs shouldn't be allowed to overclock all their systems from the factory, but just for benchmarks, because they can be stable enough for several minutes of usage. If you suggested that to the desktop people they'd all laugh at you, for good reason.

I should also clarify that AnTuTu releasing a new version of the software to combat benchmarking shenanigans doesn't preclude you from running the regular version, which is still available. It sounds from their post like the internals of the benchmark suite are identical, but the special version goes through checks to ensure the cheating isn't happening. In fact, I hope tech sites run BOTH versions from now on, just to see the differences.

Yep. I'm not sure exactly what the purpose is behind cheating at these benchmarks in the first place. I suppose in the Android world there is more of an incentive in trying to set yourself apart from the crowd because there are so many players using similar hardware.

Still, I can't believe benchmarks play a huge role in most consumers' decisions on what phone to purchase. I don't see many apps out there requiring cutting edge hardware either. Angry Birds and Candy Crush play just fine on an iPhone 4 or Droid X. I bought my Note 2 because of the features it gave me, not because it had a Snapdragon quad core processor in it.
One benchmark I think some end users might be interested in is the fact that video exporting (i.e. encoding) takes twice as long on the iPhone 5 as it does on the 5S. That's a real world benchmark that has real significance to consumers.
 

ControlD

Diamond Member
Apr 25, 2005
5,440
44
91
One benchmark I think some end users might be interested in is the fact that video exporting (i.e. encoding) takes twice as long on the iPhone 5 as it does on the 5S. That's a real world benchmark that has real significance to consumers.

That's a good example, but how many people are actually doing much encoding on their phones? I bet it is somewhere around (but not quite) 0%.

Is video exporting part of the standard benchmark suite used in these reviews?
 

WelshBloke

Lifer
Jan 12, 2005
33,108
11,287
136
...
Furthermore your question is like asking why OEMs shouldn't be allowed to overclock all their systems from the factory, but just for benchmarks, because they can be stable enough for several minutes of usage. If you suggested that to the desktop people they'd all laugh at you, for good reason...

No. If you suggested to desktop people that benchmark software would only benchmark hardware at stock clocks you'd get quite a reaction as well.

I wasn't suggesting what you are saying. The reason the OEMs are gaming the benchmarks is because the benchmarks are making no effort to behave like real world software. If they did there'd be no point in not letting all apps run at the boosted settings.
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
That's a good example, but how many people are actually doing much encoding on their phones? I bet it is somewhere around (but not quite) 0%.
Just about anyone emailing a slo-mo video from their 5S of their toddler dancing to Sesame Street music video will be doing video processing on the phone. Cuz the clip is actually captured as full-speed 120 fps. If you export it as is, there is no slo-mo at all. Plus it's full high bit rate 120 fps 1080p, not something you'd be emailing too often.

No. If you suggested to desktop people that benchmark software would only benchmark hardware at stock clocks you'd get quite a reaction as well.

I wasn't suggesting what you are saying. The reason the OEMs are gaming the benchmarks is because the benchmarks are making no effort to behave like real world software. If they did there'd be no point in not letting all apps run at the boosted settings.
Like I said, BOTH versions of the benchmark software are available. One regular one, and one with the benchmark no-cheating modifications.

This wasn't available when Ars did their tests, so they created their own quick hack of the benchmark software with one of those no-cheating modifications. That's how they were able to show graphs with cheating on vs. cheating off.
 

grkM3

Golden Member
Jul 29, 2011
1,407
0
0
When the A7 came out I read on 2 sites that it was clocking up to 1.7GHz. If that is not correct, I'm sorry, but please, someone post me some proof that the S800 or S600 or Exynos Octa can't run max cores at max MHz.

If you watch the big.LITTLE videos from CES, they show all cores pegging max clocks when pushed hard. And for MrX: Samsung had the A15 made in 2011, and maybe you should bring back some of your old posts saying the A6 was not Apple's version of the A15/A9 hybrid like the Snapdragon was, where you kept saying Apple will come out with an A15 SoC next.

Again, if you go on YouTube and watch CPU monitors showing the S800 or Octa, they all show instances when all the cores are pegged, and the new Exynos evolved will peg all 8 cores at max clocks if the demand is there.
 

MrX8503

Diamond Member
Oct 23, 2005
4,529
0
0
No. If you suggested to desktop people that benchmark software would only benchmark hardware at stock clocks you'd get quite a reaction as well.

You can't run the Note 3 at these boosted speeds all the time whereas you CAN OC a desktop and run it 24/7. There's a big difference.
 

grkM3

Golden Member
Jul 29, 2011
1,407
0
0
You can't run the Note 3 at these boosted speeds all the time whereas you CAN OC a desktop and run it 24/7. There's a big difference.

Prove it

PS you can't run any chip maxed out full load but under normal use there is no reason why a note 3 can't hit full clocks across all cores until the chip gets over its tdp limit.
 

desura

Diamond Member
Mar 22, 2013
4,627
129
101
lol @ android advocates who whip out their benchmarks to prove superiority of the platform.

As opposed to um, a holistic view of usability and actual practicality.
 

ControlD

Diamond Member
Apr 25, 2005
5,440
44
91
lol @ android advocates who whip out their benchmarks to prove superiority of the platform.

As opposed to um, a holistic view of usability and actual practicality.

Yeah, because nobody is bragging about the A7 benchmarks at all.
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
PS you can't run any chip maxed out full load but under normal use there is no reason why a note 3 can't hit full clocks across all cores until the chip gets over its tdp limit.
I ran my desktop CPUs at full load for months at a time (distributed computing) before I decided it was too much of a waste of electricity. Now my computers sleep when I'm not around.
 

MrX8503

Diamond Member
Oct 23, 2005
4,529
0
0
Prove it

PS you can't run any chip maxed out full load but under normal use there is no reason why a note 3 can't hit full clocks across all cores until the chip gets over its tdp limit.

You can't prove that it can run 24/7 without thermals or battery life hindering your usage. The OEMs cheated, accept it.
 

WelshBloke

Lifer
Jan 12, 2005
33,108
11,287
136
... Like I said, BOTH versions of the benchmark software are available. One regular one, and one with the benchmark no-cheating modifications.

This wasn't available when Ars did their tests, so they created their own quick hack of the benchmark software with one of those no-cheating modifications. That's how they were able to show graphs with cheating on vs. cheating off.

But both of those benchmarks are still essentially worthless. They are testing a high load for a very short amount of time. Just because a phone gets through that without thermally throttling doesn't mean that it could do it in the real world.
Just renaming the apk isn't addressing the fundamental problem.
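
For context on why a rename changes the score at all, here is a sketch of name-based detection as I understand it from the coverage. It assumes the boost is keyed purely on the app's package name. The whitelist entries, `pick_profile`, and `boost_gap` are all hypothetical names for illustration, not anything taken from a real firmware.

```python
# Hypothetical whitelist; these package names are made up for illustration.
BOOSTED_PACKAGES = {"com.example.benchmark", "com.example.gputest"}

def pick_profile(package_name: str) -> str:
    """Return the CPU profile a name-matching scheme would pick for an app."""
    return "performance" if package_name in BOOSTED_PACKAGES else "default"

def boost_gap(score_boosted: float, score_default: float) -> float:
    """Percentage by which the boosted score exceeds the default one."""
    return (score_boosted - score_default) / score_default * 100.0

if __name__ == "__main__":
    print(pick_profile("com.example.benchmark"))          # name on the list
    print(pick_profile("com.example.benchmark.renamed"))  # same code, new name
```

A renamed copy runs identical code but falls through to the default profile, so comparing the two scores isolates the boost. It says nothing about whether either run was long enough to matter, which is the fundamental problem.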
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
But both of those benchmarks are still essentially worthless. They are testing a high load for a very short amount of time. Just because a phone gets through that without thermally throttling doesn't mean that it could do it in the real world.
Just renaming the apk isn't addressing the fundamental problem.
I'm sure they appreciate people calling all their hard work worthless. Maybe more constructive criticism is an idea.

I agree though that mobile benchmarketing in general needs to be improved. Calling out the OEMs for doing this is an important first step. Kudos to Anand for putting their work on this on the front page of AnandTech.

BTW, there are some other benchmarking tests available, like battery life tests. These typically run the phone at low usage, to see how long they last. Maybe you should check some of those out, if you don't think a high load for short time is important.