Blender on ARM

jhu · Dec 27, 2014

Updated with Snapdragon 801. Had to underclock to 422 MHz because of throttling issues on my phone. Rather impressed that performance/clock is similar to Core 2! Not what I expected at all.

Nothingness · Dec 27, 2014

jhu said:
Updated with Snapdragon 801. Had to underclock to 422 MHz because of throttling issues on my phone. Rather impressed that performance/clock is similar to Core 2! Not what I expected at all.

It looks very wrong to me, there is no way S801 could be 3x more efficient than the Cortex-A15 in Exynos 5250 especially on FP. I guess it wasn't running at 422MHz 😉.

jhu · Dec 27, 2014

Nothingness said:
It looks very wrong to me, there is no way S801 could be 3x more efficient than the Cortex-A15 in Exynos 5250 especially on FP. I guess it wasn't running at 422MHz 😉.

It is anomalous. Unfortunately the app (No Frills) says it's at 422 MHz. Is there a better way to check CPU frequency on phones?

Thing is, when I turn the CPU frequency down to 422 MHz, the phone UI is slow as molasses while Blender is running. When I turn it back to normal, the UI response returns to normal while Blender is running. Also, the phone doesn't warm up at 422 MHz, whereas it does at normal frequencies.

Nothingness · Dec 27, 2014

jhu said:
It is anomalous. Unfortunately the app (No Frills) says it's at 422 MHz. Is there a better way to check CPU frequency on phones?

Getting frequency under Android seems to be a pain. Did you try CPU Z for Android?

wilds · Dec 27, 2014

I've been using TinyCore to monitor CPU core 0 frequency in the system bar. Really accurate and updates/polls quickly.
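
For anyone who would rather check from a shell than trust an app: Linux-based systems, Android included, usually expose each core's current clock through the cpufreq entries in sysfs. Below is a minimal sketch, assuming the standard `/sys/devices/system/cpu/cpuN/cpufreq/scaling_cur_freq` layout is present and readable (some devices restrict it, and offline cores may have no cpufreq directory). The function name and the `root` parameter are made up for illustration.

```python
from pathlib import Path


def read_core_freqs_mhz(root="/sys/devices/system/cpu"):
    """Return {core_name: MHz} for every core exposing scaling_cur_freq.

    Assumes the usual Linux cpufreq sysfs layout, where values are in kHz.
    Cores without a readable cpufreq entry (e.g. offline cores) are skipped.
    """
    freqs = {}
    for path in sorted(Path(root).glob("cpu[0-9]*/cpufreq/scaling_cur_freq")):
        try:
            khz = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip this core
        # path is .../cpuN/cpufreq/scaling_cur_freq, so two parents up is cpuN
        freqs[path.parent.parent.name] = khz / 1000.0
    return freqs


if __name__ == "__main__":
    for core, mhz in read_core_freqs_mhz().items():
        print(f"{core}: {mhz:.0f} MHz")
```

Running it a few times while the render is going should show whether the clock really stays where the governor app claims it is.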

jhu · Dec 28, 2014

Updated with OMAP 4470. Removed Snapdragon result because I can't tell what the actual speed of the processor is. I've narrowed it down to the following more plausible numbers (40057 samples/s single core only):

1.267 GHz - 31615 samples/s/GHz
1.498 GHz - 26740 samples/s/GHz
1.574 GHz - 25449 samples/s/GHz
1.728 GHz - 23181 samples/s/GHz
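
The per-GHz figures above are just the single measured throughput divided by each candidate clock. A small sketch that reproduces them, assuming the values were truncated to whole numbers (that matches all four entries; the variable and function names are only illustrative):

```python
# Figures from the post: one measured throughput, four candidate clocks.
SAMPLES_PER_SEC = 40057
CANDIDATE_GHZ = [1.267, 1.498, 1.574, 1.728]


def per_ghz(samples_per_sec, ghz):
    """Throughput per GHz, truncated to a whole number (an assumption)."""
    return int(samples_per_sec / ghz)


table = {ghz: per_ghz(SAMPLES_PER_SEC, ghz) for ghz in CANDIDATE_GHZ}

for ghz, value in table.items():
    print(f"{ghz:.3f} GHz - {value} samples/s/GHz")
```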

Nothingness · Dec 28, 2014

greatnoob said:
The Exynos build was most likely compiled without any sort of feature set identification meaning VFP/NEON optimisations were left out. Every other processor on the list had some sort of SSE or AVX flag set when the binaries were being compiled.

VFP is used but NEON isn't. And in fact NEON is not very good for FP since it's not IEEE compliant and anyway can only be used for single precision. ARMv8 64-bit NEON fixes these issues at last.

jhu · Dec 28, 2014

BTW, if anyone is running Ubuntu 14.04 on a Haswell Core i3 or Core i7, I'd like to get your results too.

Nothingness · Dec 28, 2014

jhu said:
BTW, if anyone is running Ubuntu 14.04 on a Haswell Core i3 or Core i7, I'd like to get your results too.

You don't need results on 4770k with no HT, no OC and Fedora 19 Blender 2.68a? 😛

EDIT: Time 2:13.90

jhu · Dec 28, 2014

Nothingness said:
You don't need results on 4770k with no HT, no OC and Fedora 19 Blender 2.68a? 😛

EDIT: Time 2:13.90

That seems wrong. An i7 4770k stock shouldn't be slower than a Core i5 4570 @ 3.4 GHz turbo (time 2 minutes 3.7 seconds). Unless you really are running Fedora 19 with Blender 2.68a...

Nothingness · Dec 28, 2014

jhu said:
That seems wrong. An i7 4770k stock shouldn't be slower than a Core i5 4570 @ 3.4 GHz turbo (time 2 minutes 3.7 seconds). Unless you really are running Fedora 19 with Blender 2.68a...

My post clearly states my configuration 🙂

jhu · Dec 28, 2014

Nothingness said:
My post clearly states my configuration 🙂

You wouldn't mind downloading 2.71 and testing that, would you?

Nothingness · Dec 28, 2014

jhu said:
You wouldn't mind downloading 2.71 and testing that, would you?

Hmm, do you really insist on 2.71? 2.72b is the latest official one.

jhu · Dec 28, 2014

Nothingness said:
Hmm, do you really insist on 2.71? 2.72b is the latest official one.

I do because I can't get 2.72b to build on FreeBSD, and then I would also have to retest every machine again.

Nothingness · Dec 28, 2014

4C/4T 4770K

2.72b: 2:08.76
2.71: 2:01.09

EDIT: I guess this doesn't use AVX, as my temps only went up to 45°C, while AVX-heavy programs tend to go higher than 65°C.

jhu · Dec 28, 2014

Nothingness said:
4C/4T 4770K

2.72b: 2:08.76
2.71: 2:01.09

EDIT: I guess this doesn't use AVX, as my temps only went up to 45°C, while AVX-heavy programs tend to go higher than 65°C.

Could you try it with HT enabled? That'd be more informative (already have a 4C/4T Haswell result).

Nothingness · Dec 28, 2014

With HT enabled 1:31.28 for 2.71.

Note my RAM is OC to 2400.
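
To put the 4770K times posted in this thread side by side, here is a rough sketch that converts the `m:ss.ss` strings to seconds and takes ratios. The helper name is illustrative, and the ratios say nothing about why the numbers differ (RAM speed, Blender version, and so on).

```python
def to_seconds(stamp):
    """Convert an 'm:ss.ss' render time such as '2:01.09' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


# Times reported in this thread for the 4770K.
t_271_no_ht = to_seconds("2:01.09")   # Blender 2.71, HT off
t_272b_no_ht = to_seconds("2:08.76")  # Blender 2.72b, HT off
t_271_ht = to_seconds("1:31.28")      # Blender 2.71, HT on

ht_speedup = t_271_no_ht / t_271_ht          # > 1 means HT is faster
slowdown_272b = t_272b_no_ht / t_271_no_ht   # > 1 means 2.72b is slower

print(f"HT speedup on 2.71: {ht_speedup:.2f}x")
print(f"2.72b vs 2.71 (HT off): {slowdown_272b:.2f}x the render time")
```

That works out to roughly a 1.33x speedup from enabling HT on 2.71, and 2.72b taking about 6% longer than 2.71 with HT off.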

jhu · Dec 28, 2014

Nothingness said:
With HT enabled 1:31.28 for 2.71.

Note my RAM is OC to 2400.

Thanks, much appreciated. It's about in line with what I expected.

By my calculations, a stock 5960X should do this in about 56 seconds!

soccerballtux · Dec 28, 2014

jhu said:
That would be a different type of comparison. The Povray compilation via ICC and GCC is just to show which one is better on FX. Still, I haven't tried the Open64 compiler that AMD is supporting (I don't know why they don't just funnel that support into GCC and LLVM instead). I also haven't tested Intel's MKL, since I'm more interested in rendering performance.

My mistake, I read your comment as a rebuttal to mine.

jhu · Dec 29, 2014

Nothingness said:
4C/4T 4770K

2.72b: 2:08.76
2.71: 2:01.09

EDIT: I guess this doesn't use AVX, as my temps only went up to 45°C, while AVX-heavy programs tend to go higher than 65°C.

You are correct. The official binary has these build flags:

-DWITH_FREESTYLE -pipe -fPIC -funsigned-char -fno-strict-aliasing -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE -fopenmp -DNDEBUG -O2 -msse -msse2 -DWITH_MOD_FLUID -DWITH_MOD_OCEANSIM -D__LITTLE_ENDIAN__ -DWITH_AUDASPACE -DWITH_AVI -DWITH_OPENNL -DHAVE_STDBOOL_H

I think there was some talk about adding AVX or AVX2 support eventually, maybe. You can always compile your own and see what happens. Thus far, I can't get my compiled binaries to run on Linux because they keep seg faulting.
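
A quick way to see this from the flag list itself is to pull out the `-m...` options. This is only a sketch: it looks for explicit `-mavx*` flags and would not notice AVX being implied by a `-march=` setting, and the helper name is made up for illustration.

```python
import shlex

# Build flags as quoted in the post above.
BUILD_FLAGS = (
    "-DWITH_FREESTYLE -pipe -fPIC -funsigned-char -fno-strict-aliasing "
    "-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE "
    "-fopenmp -DNDEBUG -O2 -msse -msse2 -DWITH_MOD_FLUID "
    "-DWITH_MOD_OCEANSIM -D__LITTLE_ENDIAN__ -DWITH_AUDASPACE -DWITH_AVI "
    "-DWITH_OPENNL -DHAVE_STDBOOL_H"
)


def machine_flags(flags):
    """Return the -m... (target/ISA) options from a compiler flag string."""
    return [f for f in shlex.split(flags) if f.startswith("-m")]


isa = machine_flags(BUILD_FLAGS)
uses_avx = any(f.startswith("-mavx") for f in isa)

print("ISA flags:", isa)
print("Explicit AVX flag present:", uses_avx)
```

Only `-msse` and `-msse2` show up, which fits the low temperatures reported earlier.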

jhu · Dec 31, 2014

Updated with custom compiled binaries. These are faster than the precompiled binaries. Other speed improvements would probably come from also compiling the other libraries that Blender links against (particularly Python). And, of course, the major speed improvement would be using NVIDIA GPUs. I'd actually like someone to do that comparison.

Nothingness · Jan 1, 2015

jhu said:
03: Ubuntu 14.04, gcc 4.8 -mtune=core-avx2

Why not -march?

Unable to compile the ARM versions with NEON support. Oh well.

What happens? Anyway I don't think it would bring a significant speedup.

jhu · Jan 1, 2015

Nothingness said:
Why not -march?

Refuses to compile with -march=core-avx2

It just stops at ~12% with an error.

Nothingness said:
What happens? Anyway I don't think it would bring a significant speedup.

Refuses to compile with NEON support (-mfpu=neon -funsafe-math-optimizations); Same as above: stops at ~12% with an error.

BTW, someone on reddit figured out how to set and keep clock speeds on the Android devices (mainly by disabling /system/bin/mpdecision), so I've put up the Snapdragon 801 results again. Now working on the Snapdragon S4 Pro.

jhu · Jan 1, 2015

Updated with Snapdragon S4 Pro (APQ8064) results. Why would Qualcomm design processors slower than the stock ARM ones?

Exophase · Jan 1, 2015

jhu said:
Updated with Snapdragon S4 Pro (APQ8064) results. Why would Qualcomm design processors slower than the stock ARM ones?

It's not like Blender is very representative of common loads for an apps processor that's deployed almost entirely in mobile (phones and tablets). In the apps space here double precision FP is rarely used very heavily. I don't know what else Blender really stresses - I doubt it's just doubles or Saltwell wouldn't perform that well either - but I do know that it's an application that isn't popular for the platform in general.

That said, Krait 200 came in products about 10 months before Cortex-A15 did, so it's not like Qualcomm had it as an alternative. As far as Cortex-A9 goes, Krait 200 usually beats it, although not always, and especially so when clocked closer to its peak frequencies rather than at the low frequency you have it clocked at. Krait 300 and 400 improve things a little further, but maybe not as much as Qualcomm would have hoped. The performance is really all over the place vs the competition. It does seem to have some pretty big glass jaws, like small L1 caches, a fairly high L1 dcache latency when the L0 cache is missed (and some loads will probably miss from it pretty frequently), a very high L2 cache latency, and some weird decoding penalties - see here:

http://www.7-cpu.com/cpu/Krait.html

Now that's just looking at performance, where power efficiency and area are also huge factors. So it's hard to judge it purely on that basis alone.

I think going with Cortex-A57 in their current flagship (810) is a way of conceding that their uarch has fallen too far behind. Not that adding 64-bit support is trivial, but if it came down to only that I think they could have managed it in time.
