Blender on ARM

Updated with Snapdragon 801. Had to underclock to 422 MHz because of throttling issues of my phone. Rather impressed that performance/clock is similar to Core 2! Not what I expected at all.
 
That looks very wrong to me. There is no way the S801 could be 3x more efficient than the Cortex-A15 in the Exynos 5250, especially on FP. I guess it wasn't running at 422 MHz 😉.
 
It is anomalous. Unfortunately the app (No Frills) says it's at 422 MHz. Is there a better way to check CPU frequency on phones?

Thing is, when I turn the CPU frequency down to 422 MHz, the phone UI is slow as molasses while Blender is running. When I turn it back to normal, the UI response returns to normal while Blender is running. Also, the phone doesn't warm up at 422 MHz, whereas it does at normal frequencies.
 
I've been using TinyCore to monitor CPU core 0 frequency in the system bar. Really accurate and updates/polls quickly.
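For a cross-check that doesn't depend on a monitoring app, the Linux kernel underneath Android normally reports the current clock through the cpufreq sysfs files. Below is a minimal sketch, assuming Python is available in a terminal on the device and that the kernel exposes the standard `scaling_cur_freq` file (some vendor kernels restrict or hide it). The helper name `read_cpu_mhz` is made up for illustration.

```python
from pathlib import Path

# Standard Linux cpufreq sysfs location; values are reported in kHz.
# Assumption: the device's kernel exposes this file and it is readable.
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu{core}/cpufreq"


def read_cpu_mhz(core=0, base=CPUFREQ_DIR):
    """Return the current frequency of one core in MHz, or None if unavailable."""
    path = Path(base.format(core=core)) / "scaling_cur_freq"
    try:
        khz = int(path.read_text().strip())
    except (OSError, ValueError):
        return None  # core offline, file missing, or unreadable
    return khz / 1000.0


if __name__ == "__main__":
    # Poll the first four cores once; offline cores show up as None.
    for core in range(4):
        print(f"cpu{core}: {read_cpu_mhz(core)} MHz")
```

Running it a few times while the benchmark is active should show whether the clock really stays at the value the governor app claims.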
 
Updated with OMAP 4470. Removed Snapdragon result because I can't tell what the actual speed of the processor is. I've narrowed it down to the following more plausible numbers (40057 samples/s single core only):

1.267 GHz - 31615 samples/s/GHz
1.498 GHz - 26740 samples/s/GHz
1.574 GHz - 25449 samples/s/GHz
1.728 GHz - 23181 samples/s/GHz
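The per-GHz figures above are just the measured 40057 samples/s divided by each candidate clock. A short sketch of that arithmetic, in case anyone wants to plug in another frequency (the variable and function names are mine, and the results are truncated to whole numbers to match the list):

```python
# Measured single-core throughput from the post above.
SAMPLES_PER_SEC = 40057

# Candidate clock speeds in GHz, as listed above.
CANDIDATE_GHZ = [1.267, 1.498, 1.574, 1.728]


def per_ghz(samples_per_sec, ghz):
    """Throughput normalized to 1 GHz (samples/s/GHz)."""
    return samples_per_sec / ghz


for ghz in CANDIDATE_GHZ:
    # int() truncates, which matches how the figures were listed.
    print(f"{ghz:.3f} GHz - {int(per_ghz(SAMPLES_PER_SEC, ghz))} samples/s/GHz")
```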
 
The Exynos build was most likely compiled without any feature-set flags, meaning VFP/NEON optimisations were left out. Every other processor on the list had some sort of SSE or AVX flag set when the binaries were compiled.
VFP is used but NEON isn't. And in fact NEON is not very good for FP since it's not IEEE compliant and anyway can only be used for single precision. ARMv8 64-bit NEON fixes these issues at last.
 
BTW, if anyone is running Ubuntu 14.04 on a Haswell Core i3 or Core i7, I'd like to get your results too.
 
You don't need results on 4770k with no HT, no OC and Fedora 19 Blender 2.68a? 😛

EDIT: Time 2:13.90
 
That seems wrong. An i7 4770k stock shouldn't be slower than a Core i5 4570 @ 3.4 GHz turbo (time 2 minutes 3.7 seconds). Unless you really are running Fedora 19 with Blender 2.68a...
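To put a number on the gap, the two render times can be converted to seconds and compared. A quick sketch using only the two times quoted above (the helper name `to_seconds` is made up for illustration); it works out to roughly 8% slower for the 4770K result:

```python
def to_seconds(stamp):
    """Convert an 'M:SS.ss' render time to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


i7_4770k = to_seconds("2:13.90")  # Fedora 19, Blender 2.68a result
i5_4570 = to_seconds("2:03.7")    # 2 minutes 3.7 seconds

# Relative slowdown of the 4770K run versus the 4570 run.
slowdown = i7_4770k / i5_4570 - 1.0
print(f"4770K: {i7_4770k:.2f} s, 4570: {i5_4570:.2f} s, {slowdown:.1%} slower")
```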
 
4C/4T 4770K

2.72b: 2:08.76
2.71: 2:01.09

EDIT: I guess this doesn't use AVX, as my temps only went up to 45°C, while AVX-heavy programs tend to go higher than 65°C.
 
Could you try it with HT enabled? That'd be more informative (already have a 4C/4T Haswell result).
 
That would be a different type of comparison. The POV-Ray compilation via ICC and GCC is just to show which one is better on FX. Still, I haven't tried the Open64 compiler that AMD is supporting (I don't know why they don't just funnel that support into GCC and LLVM instead). I also haven't tested Intel's MKL libraries, since I'm more interested in rendering performance.

My mistake, I read your comment as a rebuttal to mine.
 
You are correct. The official binary has these build flags:

-DWITH_FREESTYLE -pipe -fPIC -funsigned-char -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -fopenmp -DNDEBUG -O2 -msse -msse2 -DWITH_MOD_FLUID -DWITH_MOD_OCEANSIM -D__LITTLE_ENDIAN__ -DWITH_AUDASPACE -DWITH_AVI -DWITH_OPENNL -DHAVE_STDBOOL_H

I think there was some talk about adding AVX or AVX2 support eventually, maybe. You can always compile your own and see what happens. Thus far, I can't get my compiled binaries to run on Linux because they keep segfaulting.
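For anyone who wants to check a flag list like the one above without reading it by eye, here is a small sketch that pulls out the x86 SIMD options. The flag string is copied from the post; the helper name `simd_flags` and the choice of prefixes to look for are my own assumptions, not anything from the Blender build system.

```python
import shlex

# Build flags quoted above for the official binary.
BUILD_FLAGS = (
    "-DWITH_FREESTYLE -pipe -fPIC -funsigned-char -fno-strict-aliasing "
    "-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE "
    "-fopenmp -DNDEBUG -O2 -msse -msse2 -DWITH_MOD_FLUID -DWITH_MOD_OCEANSIM "
    "-D__LITTLE_ENDIAN__ -DWITH_AUDASPACE -DWITH_AVI -DWITH_OPENNL "
    "-DHAVE_STDBOOL_H"
)


def simd_flags(flag_string):
    """Return the -m options that select x86 SIMD extensions."""
    prefixes = ("-msse", "-mavx", "-mfma")
    return [f for f in shlex.split(flag_string) if f.startswith(prefixes)]


found = simd_flags(BUILD_FLAGS)
print("SIMD flags:", found)
print("AVX enabled:", any(f.startswith("-mavx") for f in found))
```

Only `-msse` and `-msse2` turn up, which fits the low temperatures reported earlier.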
 
Updated with custom compiled binaries. These are faster than the precompiled binaries. Other speed improvements would probably come from also compiling the other libraries that Blender is linked to (particularly Python). And, of course, the major speed improvement would come from using NVIDIA GPUs. I'd actually like someone to do that comparison.
 
Why not -march?

Refuses to compile with -march=core-avx2

It just stops at ~12% with an error.

What happens? Anyway I don't think it would bring a significant speedup.

Refuses to compile with NEON support (-mfpu=neon -funsafe-math-optimizations); Same as above: stops at ~12% with an error.

BTW, someone on reddit figured out how to set and keep clock speeds on Android devices (mainly by disabling /system/bin/mpdecision), so I've put up the Snapdragon 801 results again. Now working on the Snapdragon S4 Pro.
 
Updated with Snapdragon S4 Pro (APQ8064) results. Why would Qualcomm design processors slower than the stock ARM ones?
 
It's not like Blender is very representative of common loads for an apps processor that's deployed almost entirely in mobile (phones and tablets). In the apps space here double precision FP is rarely used very heavily. I don't know what else Blender really stresses - I doubt it's just doubles or Saltwell wouldn't perform that well either - but I do know that it's an application that isn't popular for the platform in general.

That said, Krait 200 came in products about 10 months before Cortex-A15 did, so it's not like Qualcomm had it as an alternative. As far as Cortex-A9 goes, Krait 200 usually beats it, although not always, and especially when it runs near its peak frequencies rather than at the low frequency you have it clocked at. Krait 300 and 400 improve things a little further, but maybe not as much as Qualcomm would have hoped. The performance is really all over the place vs the competition. It does seem to have some pretty big glass jaws, like small L1 caches, a fairly high L1 dcache latency when the L0 cache is missed (and some loads will probably miss it pretty frequently), a very high L2 cache latency, and some weird decoding penalties - see here:

http://www.7-cpu.com/cpu/Krait.html

Now that's just looking at performance. Power efficiency and area are also huge factors, so it's hard to judge it on performance alone.

I think going with Cortex-A57 in their current flagship (810) is a way of conceding that their uarch has fallen too far behind. Not that adding 64-bit support is trivial, but if it came down to only that I think they could have managed it in time.
 