They're (Almost) All Dirty: The State of Cheating in Android Benchmarks


ControlD

Diamond Member
Apr 25, 2005
5,440
44
91
Hmm I do somewhat get that - but then how do you take bias out of a benchmark b/c it doesn't run equally well on all devices and certainly not across platforms.

Frankly, it seems like the only true real life "benchmarks" would be actual real life tests - not synthetic tests.
- production games available across platforms
- heavy duty video encoding using the same codec
- some super heavy Excel number crunching
- others like this

I agree completely with your conclusion. Real world tests are what should be stressed in the end.

To the prior point, that is what the benchmark is showing. How is the same code handled on different systems? Not running equally well on different systems is what is being looked for. The benchmark should be showing the strengths and weaknesses of different platforms by stressing the systems (with the same code) with no tweaks employed.
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
I should have been clearer, I was talking about modern (Core i series chips) not ancient Core 2 systems. Still the point is fair, mobile CPUs are getting to be quite powerful in a short period of time.
It's been quite remarkable. ARM has been doubling CPU speed almost every generation or so in recent years.

However, that said, I'm predicting that Apple's A8 will be an incremental improvement speed-wise over today's A7, not a doubling.
 

dawheat

Diamond Member
Sep 14, 2000
3,132
93
91
For example, the Snapdragon 800 is the same across different brands of phone models. The chip manufacturer (Qualcomm) has designed the chip to work in a specific way. Those who don't game the system allow the chip to function as designed. Those who do (esp. Samsung) specifically design the OS to bypass this normal behaviour, but they do it ONLY for benchmarks and nothing else. For EVERYTHING else, it works the way the manufacturer intended. Furthermore, if you simply change the name of the benchmark, it works the way the manufacturer intended, because the OS doesn't know you're running that specific benchmark.

That is indefensible.
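
The rename test described above can be shown with a toy model. This is only a minimal sketch, assuming the boost is keyed on the app's package name. The package names and the `governor_mode` function are hypothetical and are not taken from any vendor's firmware.

```python
# Toy model of a name-keyed benchmark boost. The package names and this
# function are hypothetical illustrations, not actual vendor code.
BOOSTED_PACKAGES = {
    "com.example.benchmark",  # hypothetical whitelisted benchmark
    "com.example.gputest",    # hypothetical whitelisted benchmark
}


def governor_mode(package_name: str) -> str:
    """Return the CPU policy this toy OS would apply to an app."""
    if package_name in BOOSTED_PACKAGES:
        return "boost"   # special-cased: cores held at max clock
    return "normal"      # everything else: the chip behaves as designed


# Same benchmark code, two different names.
original = governor_mode("com.example.benchmark")
renamed = governor_mode("com.example.benchmark.renamed")
```

In this model the renamed copy falls back to the normal policy even though nothing about its workload changed, which is the behaviour the rename test is meant to expose.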

The SOC manufacturer designed their chips to work at certain frequencies within certain thermal windows. How well a program takes advantage of the hardware is hugely dependent on how well it's optimized, etc.

You do have a point in that the same benchmark should run similarly on the same SOC, within the bounds of slightly higher clockspeeds and additional memory. But no one really cares about comparing the same SOC - these benchmarks are most useful for comparing across SOCs, which was my primary concern. How can you be confident that a benchmark doesn't have a bias (almost always unintentional) toward a certain SOC and not others? What if it does or does not take advantage of an optimization that exists in a certain SOC but still needs software makers to add support in the future - is that right or wrong? We've all seen where taking advantage of some new hardware extension can fundamentally change a benchmark score - should the benchmark be updated to support that extension? Should it support all extensions possible for fairness? Only the most popular ones?
 

dawheat

Diamond Member
Sep 14, 2000
3,132
93
91
I dunno, but that is a really big waste of time IMO. What these guys really should be doing is optimizing the hell out of the OS. That's what Apple does. Apple also optimizes the compilers to make use of the chip it designs as well as possible. That does lead to faster benchmark results, but it also leads to faster software in general.

Samsung looking bad x 2 here, because not only are they wasting time paying attention to benchmarks, they're doing so with a product that doesn't even run basic apps properly. The worst example of this was illustrated in Ars' review. It took 2.5 MINUTES to open the Gallery app just to look at pictures on the phone. What is this 1995?

Seriously will you stop beating a dead horse. I get you have a bias - but you don't have to have such a hardon about it. There are actually threads on xda from Note 3 owners saying Gallery pops up instantly and I don't think I've seen anyone yet be able to replicate the issue Ars did. There clearly is a bug in the code, likely related to when there are a lot of Google+/Dropbox photos. This will get fixed, just like iMessage dropping messages for Apple users on iOS7. It's a bug - no more, no less.

Unless you want to state that Apple can't even run basic apps like Messaging properly :) I'd hazard to say dropped messages are a bit more critical than slow Gallery loading in day to day life.
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
Seriously will you stop beating a dead horse. I get you have a bias - but you don't have to have such a hardon about it. There are actually threads on xda from Note 3 owners saying Gallery pops up instantly and I don't think I've seen anyone yet be able to replicate the issue Ars did. There clearly is a bug in the code, likely related to when there are a lot of Google+/Dropbox photos. This will get fixed, just like iMessage dropping messages for Apple users on iOS7. It's a bug - no more, no less.
Of course it's a bug. And it's one that shouldn't have been let through.

Unless you want to state that Apple can't even run basic apps like Messaging properly :) I'd hazard to say dropped messages are a bit more critical than slow Gallery loading in day to day life.
I agree. I don't run iMessage at all because I find it less reliable than SMS. And I think that's lame.

SMS, at least with the carriers I deal with, is bulletproof. iMessage, not so much. I have it turned off for my wife's iPhone 4, and will have it turned off for my iPhone 5S. I had turned it off when it first came out because it was unreliable. Then I tried a few weeks later and it was still unreliable. Then I had left it off ever since, but even in 2013 it seems Apple still hasn't figured this stuff out.

http://techcrunch.com/2013/10/01/ap...message-users-seeing-issues-working-on-a-fix/

The only really good thing about iMessage IMO is that its introduction has forced the carriers to rethink their plans. Because of the revamped plans, I now have free unlimited domestic and international SMS and MMS, so there is no reason for me at all to go back to iMessage.

SMS and MMS have the advantage of being cross-platform too, and don't even require a smartphone.
 

lothar

Diamond Member
Jan 5, 2000
6,674
7
76
Hmm I do somewhat get that - but then how do you take bias out of a benchmark b/c it doesn't run equally well on all devices and certainly not across platforms.

Frankly, it seems like the only true real life "benchmarks" would be actual real life tests - not synthetic tests.
- production games available across platforms
- heavy duty video encoding using the same codec
- some super heavy Excel number crunching
- others like this
What exactly makes you think those "real life" tasks/benchmarks you mentioned can't be cheated upon?

It's a bit like stating "3DMark is a synthetic benchmark", so run "Quake 3" or "Tomb Raider" instead, since those are true real life benchmarks.
ATI didn't cheat in Quake 3?
Nvidia didn't cheat in Tomb Raider?

Or are you automatically assuming that it's not possible to cheat in video encoding or Excel number crunching tests?
I guarantee you that if those tests prove to be as popular as running 3DMark or GLBenchmark, Samsung/HTC/LG will cheat there as well.
 

lothar

Diamond Member
Jan 5, 2000
6,674
7
76
I dunno, but that is a really big waste of time IMO. What these guys really should be doing is optimizing the hell out of the OS. That's what Apple does. Apple also optimizes the compilers to make use of the chip it designs as well as possible. That does lead to faster benchmark results, but it also leads to faster software in general.

Samsung looking bad x 2 here, because not only are they wasting time paying attention to benchmarks, they're doing so with a product that doesn't even run basic apps properly. The worst example of this was illustrated in Ars' review. It took 2.5 MINUTES to open the Gallery app just to look at pictures on the phone. What is this 1995?
No, thank you.
I already hate TouchWiz, Sense, and LG's nonsense. If they doubled their efforts there, then they would only make them even more crappy.

I don't get the uproar with Intel on AnTuTu, and I don't see how them using their own compiler is cheating.
Why should an x86 processor be gimped by running an ARM compiler and not an x86 one developed by Intel?
 

dawheat

Diamond Member
Sep 14, 2000
3,132
93
91
My last post in this thread - only b/c I don't think it's productive for me to stay in it.

Not to rag on the OP since he seems like a decent guy, but he clearly finds it necessary to post and repeat a specific Note 3 issue (Gallery) that one reviewer experienced, while clearly not being interested in buying the Note 3 or an Android device for that matter.

So what is this - some sort of PSA service? I get it - you like Apple devices. And there's a lot to like there, but no it's not the perfect device either. You don't have to validate your choices by being negative on the competition. My wife prefers her iPhone and I'm glad that she has a device she likes.

The Note 3 is a fantastic device, the posts by owners on XDA and my own in-store experience have made me very eager to get mine delivered, and no amount of posting by you is going to change the fact that it is an all-round exceptional device with capabilities no other smartphone has. No it's not perfect and yes it has bugs - but you're not going to change anyone's mind with your repetitive yammering.
 

lothar

Diamond Member
Jan 5, 2000
6,674
7
76
Of course it's a bug. And it's one that shouldn't have been let through.


I agree. I don't run iMessage at all because I find it less reliable than SMS. And I think that's lame.

SMS, at least with the carriers I deal with, is bulletproof. iMessage, not so much. I have it turned off for my wife's iPhone 4, and will have it turned off for my iPhone 5S. I had turned it off when it first came out because it was unreliable. Then I tried a few weeks later and it was still unreliable. Then I had left it off ever since, but even in 2013 it seems Apple still hasn't figured this stuff out.

http://techcrunch.com/2013/10/01/ap...message-users-seeing-issues-working-on-a-fix/

The only really good thing about iMessage IMO is that its introduction has forced the carriers to rethink their plans. Because of the revamped plans, I now have free unlimited domestic and international SMS and MMS, so there is no reason for me at all to go back to iMessage.

SMS and MMS have the advantage of being cross-platform too, and don't even require a smartphone.
iMessage forced carriers to stop ass raping on SMS and instead start ass raping on data by reducing data caps, setting data overages, and/or getting rid of unlimited data plans.

I don't see that as a good thing at all...
 

lothar

Diamond Member
Jan 5, 2000
6,674
7
76
My last post in this thread - only b/c I don't think it's productive for me to stay in it.
You were typing your last response when I was typing my first response and we posted at almost the same time, so I hope you'll still respond to it.
 

thedosbox

Senior member
Oct 16, 2009
961
0
0
Not to rag on the OP since he seems like a decent guy, but he clearly finds it necessary to post and repeat a specific Note 3 issue (Gallery) that one reviewer experienced

Both androidpolice and ars experienced the Gallery slowdown.
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
My last post in this thread - only b/c I don't think it's productive for me to stay in it.

Not to rag on the OP since he seems like a decent guy, but he clearly finds it necessary to post and repeat a specific Note 3 issue (Gallery) that one reviewer experienced, while clearly not being interested in buying the Note 3 or an Android device for that matter.
My current primary phone is Android. (At least until my iPhone 5S arrives, that is. ;))
My current primary tablet is Android. (I'll be sticking with this.)

You were saying?

iMessage forced carriers to stop ass raping on SMS and instead start ass raping on data by reducing data caps, setting data overages, and/or getting rid of unlimited data plans.

I don't see that as a good thing at all...
Well, things may be a little different in Canada. I actually got a better data plan for cheaper at the same time as getting my free unlimited SMS and MMS. Mind you, that was a holiday promotional plan from Dec. 2012, but still, it was a way better in-market retail plan than what my carrier has ever offered before. However, we haven't had true unlimited plans on my carrier since the GPRS days, and they were quite high priced too so most people never got them.
 

lopri

Elite Member
Jul 27, 2002
13,314
690
126
Not trying to excuse the bad behavior of the phone designers.

But would it be cheating if they had an option in the settings (since they do their own version of Android this is trivial for them) where you can turn these cheats "on" or "off", yet they keep them "on" by default since everybody else does it?

You know just misleading people by not lying to them.

This is essentially how SLI/CF "profiles" were born. Those first began with benchmark optimizations, went through huge firestorms, and settled with what we have now in AMD/NV's drivers.

These smartphone "optimizations" and SLI/CF profiles for games are not logically distinguishable, except for the fact that the phone OEMs are optimizing things for the benchmarks and the benchmarks only. Lesson No.1: Always treat popular benchmarks with suspicion. AnandTech's smartphone benchmarks are full of useless tests (esp. that Java stuff that is always called out) that are ripe for this kind of "cheating".
 

lothar

Diamond Member
Jan 5, 2000
6,674
7
76
Well, things may be a little different in Canada. I actually got a better data plan for cheaper at the same time as getting my free unlimited SMS and MMS. Mind you, that was a holiday promotional plan from Dec. 2012, but still, it was a way better in-market retail plan than what my carrier has ever offered before. However, we haven't had true unlimited plans on my carrier since the GPRS days, and they were quite high priced too so most people never got them.
Things are worse in Canada in most cases, than the US.
I thought the US was bad, but man... compare the rates on the AT&T, Verizon, Sprint, and T-Mobile websites to the ones on the Rogers, Bell, and Telus websites, and that's with 3 year contracts instead of the 2 years of ass raping we have to endure with our carriers here in the US.
Also:
[Image: cell-plan-cost.png]

http://arstechnica.com/gadgets/2010/10/us-canada-lead-the-world-in-expensive-cell-packages/

You got yours on a "special, time-limited" promotion so I don't think you can compare that.

The people that have had true unlimited plans since the GPRS days: are they grandfathered and allowed to continue getting subsidized phones, or are they grandfathered but have to pay out of pocket for the full cost of the phone, like Verizon requires here to keep your grandfathered status?
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
Yes, traditionally, our Canadian phone plans are expensive. I was just saying that in 2012 they actually got better rather than worse. My plan was a promotional plan but the in-market plans were better than before too. They weren't good, just not as bad. They only became good for the promotional limited time offerings. Mind you they've gotten worse again this year. :p

Those with GPRS unlimited are grandfathered AFAIK and I believe they get phone subsidies, but Fido limits the unlimited speed to one generation behind. So right now they're still stuck on HSPA+ IIRC.

The speeds on HSPA+ in uncongested areas are decent, but in congested areas it can be unusable during busy times. Hence LTE really is necessary in some large Canadian urban centres.
 

Bateluer

Lifer
Jun 23, 2001
27,730
8
0
Since my Note 8 just got its 4.2.2 ota this morning, I need to play around more with it. Under 4.1.2, it was perfectly fast, fluid, with the only real issue being the intermittent little freezes. The screen would stop recognizing touches for a second or two, then continue as normal. Every app would open perfectly fast, background fast, etc. I assume the version of TW on the Note 3 is a more recent revision since that device runs 4.3 though.

I cannot say the same about Sense 5 on my HTC One though, despite it having a technically faster CPU than the Exy4412 in the Note 8. Sense, from my personal use and comparison between the Note 8 and One, was the more laggy and bloated skin by far. At least with TW's bloat, apps still work. In Sense, they don't and Play Store alternatives need to be installed for basic things, email, gallery, launcher, and so on.
 

makken

Golden Member
Aug 28, 2004
1,476
0
76
I would like to see the benchmark devs write a test into their benchmark that just idles the phone. If it detects the CPU refusing to down clock during this test, it would notify the user of cheating and automatically deduct 10% from the final scores.

that should get the OEM's attention.
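
The idle test proposed above could look roughly like this. It is a sketch under stated assumptions, not how any real benchmark works: the function names and the 5% tolerance are made up for illustration, the 10% penalty is the figure from the post, and the frequency samples are assumed to have been collected already (on Linux-based systems such as Android, the cpufreq sysfs file `scaling_cur_freq` commonly reports the current clock in kHz).

```python
# Sketch of an idle-phase check. The tolerance is an assumed value and the
# penalty is the 10% suggested in the post; neither comes from a real
# benchmark.
IDLE_PENALTY = 0.10


def stays_pinned(idle_freqs_khz, max_freq_khz, tolerance=0.05):
    """True if every idle sample sits within `tolerance` of the max clock."""
    if not idle_freqs_khz:
        return False  # no samples, so nothing to flag
    floor = max_freq_khz * (1.0 - tolerance)
    return all(freq >= floor for freq in idle_freqs_khz)


def adjusted_score(raw_score, idle_freqs_khz, max_freq_khz):
    """Return (score, flagged), deducting the penalty if the check fails."""
    flagged = stays_pinned(idle_freqs_khz, max_freq_khz)
    score = raw_score * (1.0 - IDLE_PENALTY) if flagged else raw_score
    return score, flagged
```

A device that drops to a few hundred MHz while idling keeps its raw score, while one whose idle samples all sit at the maximum clock is flagged and loses 10%.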
 

Bateluer

Lifer
Jun 23, 2001
27,730
8
0
I would like to see the benchmark devs write a test into their benchmark that just idles the phone. If it detects the CPU refusing to down clock during this test, it would notify the user of cheating and automatically deduct 10% from the final scores.

that should get the OEM's attention.

Think they'd care?

I was going to say they should simply deny listing in online databases, but how would that affect custom ROMs and OC'd kernels? Block them too?
 

lopri

Elite Member
Jul 27, 2002
13,314
690
126
That's what made me tolerate Touchwiz. I have a Note 8 in my household and while I do not like the skin at all, I find it very functional along with all those hardware components.
 

lothar

Diamond Member
Jan 5, 2000
6,674
7
76
I would like to see the benchmark devs write a test into their benchmark that just idles the phone. If it detects the CPU refusing to down clock during this test, it would notify the user of cheating and automatically deduct 10% from the final scores.

that should get the OEM's attention.
And what if the cheating puts the OEM ahead by 20%?
Even after your automatic 10% deduction, the OEM will still be ahead.

You truly think Samsung(and HTC and LG) will stop cheating because of that?
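
The arithmetic behind that objection, as a quick sketch. The 20% gain and 10% deduction are the hypothetical figures from this exchange, not measured values, and the function names are made up for illustration.

```python
def net_advantage(cheat_gain: float, penalty: float) -> float:
    """Relative lead over an honest device once the penalty is applied."""
    return (1.0 + cheat_gain) * (1.0 - penalty) - 1.0


def break_even_penalty(cheat_gain: float) -> float:
    """Penalty that exactly cancels a given cheat gain."""
    return cheat_gain / (1.0 + cheat_gain)


# Hypothetical figures from the posts above: 20% gain, 10% deduction.
net = net_advantage(0.20, 0.10)        # 1.20 * 0.90 - 1, about 0.08
needed = break_even_penalty(0.20)      # 0.20 / 1.20, about 0.167
```

So under these numbers the OEM would still come out about 8% ahead, and the deduction would need to be roughly 16.7% just to break even.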
 

Eug

Lifer
Mar 11, 2000
24,142
1,791
126
The sad part here is it's the geeks calling out the OEMs to stop them from hacking the profiles, not the other way around.

How things have turned around.

It's almost like the OEMs are the parents'-basement-dwelling geek overclockers, and the enthusiast sites like Ars are the mature ones, reversing the OEM tweaks for fair testing.
 

makken

Golden Member
Aug 28, 2004
1,476
0
76
And what if the cheating puts the OEM ahead by 20%?
Even after your automatic 10% deduction, the OEM will still be ahead.

You truly think Samsung(and HTC and LG) will stop cheating because of that?

There's a 4.4% increase in performance from the CPU optimization. Some of that gap is actually due to differences in compiler optimizations (V1 is tuned by the OEMs for performance, V2 is tuned for compatibility as it's still in beta).

...
 

grkM3

Golden Member
Jul 29, 2011
1,407
0
0
Guys, the benchmarks are poorly coded, and the OEM governors are set too loosely to trigger all cores to hit full clocks throughout the whole bench. That is why OEMs are putting these in place: to make sure the bench runs all the cores without shutting a few off when it tests single threads.

Please, someone, anyone, post one single piece of proof that the Snapdragon 800 can't run all 4 cores at 2.3 GHz in regular use.

I'll bet anything you can see all 4 cores being used when recording 4K video on the Note 3, and you can record at 4K for 5 min at a time.

The A7 in the 5S also clocks at 1.7 GHz, not 1.2 like some are saying.

You have to set these profiles because the apps are not coded to max out the cores through the whole bench, and the kernel is always trying to adapt the SOC for max battery life, with a power boost when it sees full load.

Make a benchmark that runs 4 threads and demands full CPU calculations, and the kernel will clock the S800 at max clocks until it needs to throttle. These benchmarks are not pushing the chip long enough to throttle it.

I can run 4 back-to-back runs with the scores staying within 2%, and the cell is not hot at all. It actually gets warmer using 4G and web surfing.
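
For anyone who wants to repeat that back-to-back check, here is one way to put a number on it. This is a sketch only: the function names are made up, the 2% limit is the figure from the post, and measuring the spread relative to the best run is an assumption about what "within 2%" means.

```python
def run_spread(scores):
    """Gap between the best and worst run, relative to the best run."""
    best, worst = max(scores), min(scores)
    return (best - worst) / best


def looks_throttled(scores, limit=0.02):
    """True if back-to-back runs drift apart by more than `limit`."""
    return run_spread(scores) > limit
```

Four runs that stay within 2% of the best one would not be flagged, while a run that falls 10% behind would be.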

The S800 SOC runs pretty cold and can easily sustain 2.3 GHz on all 4 cores, and the Exynos 5 Octa evolved can run all 8 at once along with pushing code to the GPUs.
 