I more or less agree with this notion too. If a legitimate compiler optimization breaks a benchmark, that doesn't necessarily make the optimization wrong; it makes the benchmark bad. But if the optimization does nothing except break that benchmark, then the optimization is dishonest.
As far as I'm concerned, you can't break a non-synthetic benchmark, and you generally can't break a good synthetic benchmark either. nbench is quite bad (some parts worse than others), and it's also very old. Even if the authors realized this part could be broken like this (they should have, but may not have), they may have assumed no compiler would bother, since compilers were a fair bit more primitive back then.
110% agree, especially with the bolded part.
I remember well when SUN broke the SPEC benchmark with a compiler optimization that improved the score in just one test of the suite by something like 10x (or was it even more?).
So they scored the same in 14 out of 15 tests, but in that one test their score went from something like 8.7 to an astonishing 93 (IIRC), which pulled the composite score up enough that it suddenly looked like they were kicking POWER6's ass in SPEC.
It completely broke the purpose of the benchmark, and the compiler trick wasn't applicable to real-world software.
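To see how much one inflated subtest can move the overall number: SPEC composite scores are geometric means of the subtest ratios, so a single outlier lifts the whole composite. A quick sketch using the (half-remembered, hypothetical) figures from the anecdote above:

```python
import math

def geomean(xs):
    # SPEC composites are geometric means of the per-test scores.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical scores: 15 identical subtests vs. 14 identical plus
# one gamed by the compiler trick (8.7 -> 93, per the anecdote).
baseline = [8.7] * 15
gamed = [8.7] * 14 + [93.0]

print(geomean(baseline))  # 8.7
print(geomean(gamed))     # ~10.2: one outlier lifts the composite ~17%
```

So even with the geometric mean's relative resistance to outliers, gaming one test out of fifteen by ~10x inflates the headline number by roughly a sixth.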
Everyone knew it, and the subtest results were there for all to see... but that didn't stop SUN marketing from hyping and grandstanding about their uber processor and its SPEC scores.
(And yes, the irony never escaped me: while we were SUN's foundry, I knew their chips were crappy and delivered bottom-tier performance, the Atoms of the big-iron world.)