[ElReg] ARM tests: Intel flops on Android compatibility, Windows power

Shivansps

Diamond Member
Sep 11, 2013
3,918
1,570
136
But how much performance will Intel need in order to compensate for the binary translation tax? If the figures in the article are correct, that's a pretty staggering margin.

Those translations are fast. Remember, MIPS does it too: even the slow JZ4770 does it, and performance was not an issue back then.

Still, I'm already seeing Chinese tablets with Bay Trail (BT), so software support will be coming fast.
 

erunion

Senior member
Jan 20, 2013
765
0
0
But how much performance will Intel need in order to compensate for the binary translation tax? If the figures in the article are correct, that's a pretty staggering margin.

Intel needs devs to support x86. The translator is just a hack.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Those translations are fast. Remember, MIPS does it too: even the slow JZ4770 does it, and performance was not an issue back then.

I've never heard of any MIPS tablet performing binary translation. And the JZ4770 has pretty low native performance to begin with.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Okay, I had no idea there was a third-party solution for this. Interesting. It looks like it didn't get as far as it could have, though, with less-than-stellar compatibility for ARMv6 and no ARMv7 support.

But I don't think you can really say performance or perf/W was no problem just because you didn't happen to have performance problems with the apps you ran. And even you said that one of the apps you tried was slow. The JZ4770 is around Cortex-A5 level; slowing things down further is the last thing I'd want to do in a lot of real-world scenarios.
 

Shivansps

Diamond Member
Sep 11, 2013
3,918
1,570
136
Actually, the JZ4770 was faster than an Allwinner A10 (Cortex-A8) on CPU: marginally faster, but faster. It also used a lot less power. The Novo 7 Basic could play back a movie for almost 11 hours with no Wi-Fi and low brightness; the Novo 7 Aurora was lucky to hit 6 under the same conditions, and they both had the same battery.

MAME4droid was the one that was slow, but it actually emulates CPUs to run MAME ROMs, which is pretty intensive; not even the A10 was able to run MAME ROMs very well. The translator allowed the JZ4770 to play ARM-only games very well. Most of the problem was compatibility: for example, the translator could not translate executable binaries, only the .so libraries, so if an app also used an external executable binary it would not work. That's the only reason Skype never worked on MIPS. But if it ran, it ran well.
You can't really expect more of an A8-like single-core CPU. Bay Trail is a hell of a lot faster than that, and with an advanced translator too.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Today Dell updated their CloverTrail+ based Venue 8 to run Android KitKat 4.4.2, and we found something quite terrible. All of a sudden our app, which has native x86 libraries, was running the translated ARM libraries instead. This never happened before (Venue 8 is our x86 test device). Suffice it to say that performance was much, much worse despite the fact that our native app source is much better optimized for ARM than x86.

After a lot of trial and error we found the problem: if the number of native library files in the armeabi-v7a directory doesn't match the number in the x86 directory, the loader will put the app in ARM mode. Maybe Intel (or Google?) did this because there were apps that had incomplete x86 library sets. But this is a really big mistake and something that needs to be changed ASAP.
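
For illustration only, the reported symptom can be modelled roughly as below. This is a guess at the rule from the behaviour described above, not actual Intel or Google loader code; the function name `abi_chosen_on_x86` and the library file names are made up.

```python
def abi_chosen_on_x86(lib_dirs):
    """Model of the reported KitKat 4.4.2 behaviour on an x86 device.

    lib_dirs maps an ABI directory name to the list of .so files in it.
    Assumption: this only mirrors the symptom described in the post.
    """
    x86_libs = lib_dirs.get("x86", [])
    arm_libs = lib_dirs.get("armeabi-v7a", [])
    if not x86_libs:
        # No native x86 code at all: translation is the only option.
        return "armeabi-v7a"
    if arm_libs and len(arm_libs) != len(x86_libs):
        # File counts differ: reported fallback to translated ARM code.
        return "armeabi-v7a"
    return "x86"


# Hypothetical APK layout: two ARM variants (NEON / no NEON), one x86 library.
mismatched = {
    "armeabi-v7a": ["libemu_neon.so", "libemu_compat.so"],
    "x86": ["libemu.so"],
}
matched = {
    "armeabi-v7a": ["libemu.so"],
    "x86": ["libemu.so"],
}
print(abi_chosen_on_x86(mismatched))  # armeabi-v7a
print(abi_chosen_on_x86(matched))     # x86
```

Under this model, an app that ships extra ARM-only variants ends up translated even though it has a complete x86 build.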

We had a different number of ARM and x86 library files for a good reason: the ARM directory has different libraries for CPUs with and without NEON. At runtime the CPU features are queried to determine which of the libraries to load: ARMv7 with NEON, ARMv7 without NEON, or x86. This is a totally bog-standard approach that both Intel and Google have recommended in guides and presentations. It's actually necessary to select NEON libraries at runtime because Android doesn't treat ARMv7 with NEON as a separate ABI (something they really should have done - we can all blame nVidia for this). So I really doubt that we're the only app that has this problem.
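
A minimal sketch of that runtime selection, written in Python purely for illustration (a real Android app would do this in Java or native code). The function `library_to_load` and the library names are hypothetical; only the three-way choice comes from the description above.

```python
def library_to_load(abi, cpu_features):
    """Pick one of three library variants from the ABI and CPU features.

    abi: ABI string the process is running under ("x86" or "armeabi-v7a").
    cpu_features: set of feature names reported at runtime (assumed format).
    """
    if abi == "x86":
        return "libemu_x86.so"
    if abi == "armeabi-v7a":
        # NEON is not a separate ABI, so it has to be checked at runtime.
        if "neon" in cpu_features:
            return "libemu_neon.so"
        return "libemu_compat.so"
    raise ValueError(f"unsupported ABI: {abi}")


print(library_to_load("armeabi-v7a", {"neon"}))  # libemu_neon.so
print(library_to_load("armeabi-v7a", set()))     # libemu_compat.so
print(library_to_load("x86", set()))             # libemu_x86.so
```

Because the ARM directory carries two variants and the x86 directory one, the file counts in the two directories differ by design.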

And Galaxy Tab 3 10.1 is also running KitKat 4.4.2 now. This also explains reports we've gotten about very poor performance on this device, that we previously didn't understand.

So ARM says that 41% of apps are running emulated ARM code on x86. Here's the big question: is this the number of apps that simply don't have an x86 library directory at all, or are they somehow determining this at runtime? Because if it's the latter, it could mean that the number is artificially (and probably temporarily) worse than it should be due to this issue. But if it's the former, then right now the situation is actually even worse than they've said.

BTW, is anyone interested in some numbers comparing our app in x86 native vs the ARM emulation? The app is a Nintendo DS emulator and it's very CPU intensive, and runs a broad range of code.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Sorry, I missed this earlier..

Actually, the JZ4770 was faster than an Allwinner A10 (Cortex-A8) on CPU: marginally faster, but faster. It also used a lot less power. The Novo 7 Basic could play back a movie for almost 11 hours with no Wi-Fi and low brightness; the Novo 7 Aurora was lucky to hit 6 under the same conditions, and they both had the same battery.

What were you using to compare performance? If it's something like AnTuTu then that covers a ton more than CPU performance (it's also a terrible benchmark for a variety of reasons). For example, I wouldn't be amazed if JZ4770 has a faster GPU than A10. But that doesn't mean much regarding CPU. Video playback also doesn't say that much, since it depends so much on the decode hardware and software ecosystem.

GCW-Zero uses JZ4770, while Open Pandora uses Cortex-A8, and I've seen a lot of CPU-only tests that show the latter to be significantly faster than the former, with 800MHz MIPS vs 1GHz ARM.
 
Mar 10, 2006
11,715
2,012
126
BTW, is anyone interested in some numbers comparing our app in x86 native vs the ARM emulation? The app is a Nintendo DS emulator and it's very CPU intensive, and runs a broad range of code.

Yes. Any information you can provide would be fantastic, Exophase.
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
BTW, is anyone interested in some numbers comparing our app in x86 native vs the ARM emulation? The app is a Nintendo DS emulator and it's very CPU intensive, and runs a broad range of code.

This is very interesting, as console emulation has always been an issue when run on x86. I haven't tested your program, although I acknowledge it's very popular for Nintendo DS emulation. DeSmuME runs well enough on my desktop for me, even though I'm sure it might be over 10 times less efficient at running the emulation than the ARM Android version.

This actually reminds me that I have yet to download the free version just to test my current phone's performance. Interesting test indeed.
 

Shivansps

Diamond Member
Sep 11, 2013
3,918
1,570
136
Sorry, I missed this earlier..



What were you using to compare performance? If it's something like AnTuTu then that covers a ton more than CPU performance (it's also a terrible benchmark for a variety of reasons). For example, I wouldn't be amazed if JZ4770 has a faster GPU than A10. But that doesn't mean much regarding CPU. Video playback also doesn't say that much, since it depends so much on the decode hardware and software ecosystem.

GCW-Zero uses JZ4770, while Open Pandora uses Cortex-A8, and I've seen a lot of CPU-only tests that show the latter to be significantly faster than the former, with 800MHz MIPS vs 1GHz ARM.

The GC860 on the JZ4770 is slower than the Mali-400 on the Allwinner chip, and by a lot; I was 100% sure of that at the time.

It was like 3 years ago, but I still have the Quadrant numbers on hand.

JZ4770: [Quadrant screenshot, TXFmM.png]

A10: [Quadrant screenshot, XaZIE.png]


AnTuTu also showed a faster CPU and a slower GPU.

BTW, the JZ4770 ran at 1.2GHz on the Novo 7 after a firmware update at some point.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Okay, I have some results from testing native vs translated on the DS emulator. Also some figures from some ARM devices.

Code:
                                        Mario Kart   NSMB world 1   NSMB world 1
                                        countdown    map            in-game
Galaxy S3       4x1.4GHz A9             135          140            290
JXD S7300       2x1.2GHz A9             85           90             180
POV Mobii       2x1.0GHz A9 (Tegra 2)   52           52             135
Xperia Play     1x1.0GHz Scorpion       50           58             135
Venue 8 (x86)   2x2.0GHz Atom Z2580     62           56             150
Venue 8 (ARM)   2x2.0GHz Atom Z2580     26           27             70

These are in percentages (so 100 = full speed), taken without frameskip. There are three different versions of the code running:

1) ARMv7 NEON: CPU emulation with ARM recompiler, 2D/3D/Geometry emulation with hand-optimized NEON routines.
2) ARMv7 compatibility: CPU emulation with ARM recompiler, 2D/3D/Geometry emulation with C functions (this is only used on Tegra 2)
3) x86: CPU emulation with less advanced x86 recompiler, 2D/3D/Geometry emulation with C functions.

If available, 3D emulation will use 2-3 threads to divide the work between cores and 2D emulation will use 2 threads. The Venue 8 uses 3 threads (but the difference between this and 2 threads is negligible; hyperthreading doesn't really help here), the Xperia Play uses 1, the POV Mobii and JXD S7300 use 2, and the Galaxy S3 uses 3.

Of the three scenes tested, the first two have heavy CPU and 3D loads, while the third one has much lighter CPU and 3D load (but a heavier 2D load.. which doesn't really impact things as much). The x86 version is hit hardest in the second test (NSMB world 1 map) because of the weaker CPU emulation vs all of the ARM versions.

But despite the fact that the x86 version is running much less optimized code than the ARM NEON version - which is what the translation layer is using, because NEON support is reported in this mode - the x86 native version is a lot faster. About 2 to 2.4 times faster.

A better comparison may actually be between the JXD S7300 and the Venue 8 in ARM mode, since they're both running the same code (ARM NEON) and both have the same core count and thread configuration. The Venue 8's up to 2GHz Saltwell cores should ostensibly be decently faster than the S7300's 1.2GHz Cortex-A9 cores. Here the S7300 is 2.5 to 3.33 times faster.
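
The ratios quoted above can be rechecked directly from the table. The snippet below is just that arithmetic; the variable and function names are arbitrary, and the numbers are copied from the three test columns.

```python
# Percent of full speed, in table order:
# Mario Kart countdown, NSMB world 1 map, NSMB world 1 in-game.
venue8_x86 = [62, 56, 150]   # Venue 8, native x86 library
venue8_arm = [26, 27, 70]    # Venue 8, translated ARM NEON library
s7300 = [85, 90, 180]        # JXD S7300, native ARM NEON library


def speedups(fast, slow):
    """Per-scene ratio of two result rows, rounded to two decimals."""
    return [round(f / s, 2) for f, s in zip(fast, slow)]


native_vs_translated = speedups(venue8_x86, venue8_arm)
s7300_vs_translated = speedups(s7300, venue8_arm)

print(native_vs_translated)  # [2.38, 2.07, 2.14]
print(s7300_vs_translated)   # [3.27, 3.33, 2.57]
```

So native x86 comes out roughly 2.1 to 2.4 times faster than translation on the same device, and the S7300 roughly 2.6 to 3.3 times faster than the Venue 8 in ARM mode, in line with the rounded figures in the post.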

To be fair though, our code - both the ARM recompiler and the NEON functions - may not exactly be representative. They have heavy register pressure (going to punish x86 w/8 registers, especially only 8 SSE and MMX regs) and the NEON instructions will often not gracefully map to SSE instructions. But this is what you get with something that is heavily optimized that also desperately needs as much performance as it can get. If it's running with ARM translation it's pretty much useless on current gen x86 Android hardware. I don't know how much Silvermont cores will help things but it'll probably still be pretty bad. Running the x86 native code is an unfortunate compromise, but it's at least good enough to be worth using (if you throw in frameskip), but doing it with the ARM translation is unacceptable.

UPDATE: We also tried testing the Venue 8 using the ARM w/o NEON library, and it was actually about 5-12% faster. Meaning that, at least in our case, it's better off translating less optimized scalar ARM code to scalar x86 code than more optimized NEON code to SSE! So I guess I was right, translating (our) NEON code to SSE is a mess...
 

xpea

Senior member
Feb 14, 2014
458
156
116
So what, is there an Android game that the Bay Trail IGP can't play compared to the S800? The Bay Trail IGP was mostly criticized for its performance on x86 Windows tablets, and that's something no ARM chip can even try.
On Android, no, but it is if I chroot into a Linux root, and that also adds Wine x86 support. There is also desktop OGL support, which is not common, and with good drivers; that does not exist on ARM.

Hmm, hmm, T4 in Surface 2 :hmm:
And TK1 is coming with full Linux support, OGL 4.4, OGL ES 3.1, CUDA 6.0, DX12 and of course Win RT too...
That's not all: the 64-bit Denver TK1 is already in the 64-bit Android source code...