Povray on ARM

jhu

Lifer
Oct 10, 1999
11,918
9
81
Finally figured out how to compile 3.7 on ARM!

Povray 3.7
Code:
FX 8350 (4m/8t, 4.1 GHz turbo):        1985.83 pps ; 121.09 pps/module/GHz
Core i5 3317U (Ivy Bridge)
  (2c/4t, 1.7 GHz, 2.4 GHz turbo):      573.68 pps ; 119.52 pps/core/GHz
Core i7 2600 (Sandy Bridge)
  (4c/8t, 3.4 GHz, 3.5 GHz turbo):     1618.48 pps ; 115.61 pps/core/GHz
Core i5 4570 (Haswell)
  (4c/4t, 3.2 GHz, 3.4 GHz turbo):     1540.24 pps ; 113.25 pps/core/GHz
Core i5 660 (Nehalem)
  (2c/4t, 3.33 GHz, 3.46 GHz turbo):    694.34 pps ; 100.34 pps/core/GHz
Core i5 3317U (Ivy Bridge)
  (2c/2t, 1.7 GHz, 2.4 GHz turbo):      474.19 pps ;  98.79 pps/core/GHz
Core i5 2400S (Sandy Bridge)
  (4c/4t, 2.5 GHz, 2.6 GHz turbo):      991.64 pps ;  95.35 pps/core/GHz
Core i7 2600 (Sandy Bridge)
  (4c/4t, 3.4 GHz, 3.5 GHz turbo):     1318.28 pps ;  94.16 pps/core/GHz
Core i5 660 (Nehalem)
  (2c/2t, 3.33 GHz, 3.46 GHz turbo):    561.51 pps ;  81.14 pps/core/GHz
Core 2 Duo E8400 (2c/2t 3.0 GHz):       462.46 pps ;  77.08 pps/core/GHz
Phenom II x6, 1090T 
  (6c/6t, 3.2 GHz):                    1388.11 pps ;  72.30 pps/core/GHz
FX 8350 (4m/1t, 4.2 GHz turbo):         292.03 pps ;  69.53 pps/module/GHz 
AMD E-450 (2c/2t, 1.6 GHz):             159.86 pps ;  49.95 pps/core/GHz
Exynos 5250 (1 thread, 1.7 GHz):         83.95 pps ;  49.38 pps/GHz
Exynos 5250 (2c/2t, 1.7 GHz):           160.62 pps ;  47.24 pps/core/GHz
Snapdragon 801, MSM8974AB
  (1t, 0.96 GHz):                        35.34 pps ;  36.81 pps/GHz
PowerPC 970MP 
  (G5, 4c/4t, 2.5 GHz):                 357.72 pps ;  35.77 pps/core/GHz
PowerPC 7400 (G4, 0.47 GHz):             16.20 pps ;  34.71 pps/core/GHz
Pentium 4 HT (1c/2t, 3.2 GHz):          105.92 pps ;  33.10 pps/core/GHz
Pentium 4m (1c/1t, 1.5 GHz):             48.99 pps ;  32.66 pps/core/GHz
S4 Pro APQ8064 (1t, 1.026 GHz):          31.58 pps ;  30.78 pps/GHz
Pentium 4 HT (1c/1t, 3.2 GHz):           86.04 pps ;  26.89 pps/core/GHz
Atom N270 (1c/2t, 1.6 GHz):              42.05 pps ;  26.28 pps/core/GHz
OMAP4430 (2c/2t, 1.0 GHz):               50.73 pps ;  25.37 pps/core/GHz
OMAP4470 (2c/2t, 1.5 GHz):               72.88 pps ;  24.29 pps/core/GHz
Exynos 4210 (2c/2t, 1.2 GHz):            48.15 pps ;  20.06 pps/core/GHz
S4 Pro APQ8064 (4c/4t, 1.5 GHz):        110.76 pps ;  ????? pps/core/GHz
Snapdragon 801, MSM8974AB 
  (4c/4t, 2.5 GHz):                     177.66 pps ;  ????? pps/core/GHz

The systems:

FX8350 - FreeBSD 10, gcc 4.8 / gcc 4.9
Core i5 4570 - Ubuntu 14.04, icc 14.0.3
Core i5 3317U - Ubuntu 14.04, icc 14.0.3
Core i5 2400S - Ubuntu 14.04, icc 14.0.3
Phenom IIx6 1090T - Ubuntu 14.04, gcc 4.8
AMD E-450 - Ubuntu 12.04, gcc 4.6
Pentium 4 HT - Debian 7, gcc 4.7
Pentium 4m - Debian 7, gcc 4.7
Atom N270 - Ubuntu 12.04, icc 12

PowerPC 7400 - Debian 7, gcc 4.6

Exynos 5250 (Chromebook) - Ubuntu 14.04, gcc 4.8
Snapdragon 801 MSM8974AB (HTC One M8) - Android 4.4, gcc 4.8
S4 Pro APQ8064 (Nexus 4) - Android 4.4, gcc 4.6
OMAP 4430/4470 (Nook Tablet, Nook HD+) - Android 4.3, gcc 4.6
 
Dec 30, 2004
12,554
2
76
If you wanted to calculate these as a normalized clock-for-clock score for the lazy and brain-dead among us, that would be appreciated. This is a great thread.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Updated with pps and pps/GHz. Looks like Exynos is faster than first generation Atom per GHz. I wonder how the new Medfield performs?
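
For anyone who wants to redo the normalization, here is a minimal Python sketch of how the pps/core/GHz column appears to be derived: pps divided by (cores or modules times clock). That formula is inferred from the posted rows rather than stated outright, and where a turbo clock is listed it is the turbo clock that reproduces the posted figure. The function name and row layout are made up for illustration.

```python
def pps_per_core_ghz(pps, cores, ghz):
    """Normalize a Povray pps score by core (or module) count and clock in GHz."""
    return pps / (cores * ghz)

# (label, pps, cores or modules, GHz) copied from the table above.
# Assumption: the turbo clock is the divisor when one is listed.
rows = [
    ("FX 8350 (4m/8t)", 1985.83, 4, 4.1),
    ("Core i5 3317U (2c/4t)", 573.68, 2, 2.4),
    ("Exynos 5250 (2c/2t)", 160.62, 2, 1.7),
    ("Atom N270 (1c/2t)", 42.05, 1, 1.6),
]

for label, pps, cores, ghz in rows:
    score = pps_per_core_ghz(pps, cores, ghz)
    print(f"{label:24s} {score:7.2f} pps/core/GHz")
```

The same call could be used on the rows still marked ????? (for example pps_per_core_ghz(110.76, 4, 1.5)), but only under the assumption that all four cores actually hold the listed clock under load.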
 
Dec 30, 2004
12,554
2
76
This is a cool thread.
It'll be interesting to see how the hard-float ABI and NEON affect things.
Also, A15 out now which is ~25% faster clock/clock.
Pretty soon we won't need Intel...
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Povray 3.6 here are the numbers:

Exynos 4210 @ 1.2 GHz (Samsung Galaxy SII, ARM Cortex A9), Debian 6.0,
gcc 4.4 -mfloat-abi=softfp -mcpu=cortex-a9
Parse Time: 0 hours 0 minutes 4 seconds (4 seconds)
Photon Time: 0 hours 1 minutes 43 seconds (103 seconds)
Render Time: 1 hours 49 minutes 59 seconds (6599 seconds)
Total Time: 1 hours 51 minutes 46 seconds (6706 seconds)

For comparison

Athlon II x4 (K10h), 2.8 GHz, one thread
gcc 4.4.5, -march=barcelona, -ffast-math -funroll-loops
Parse Time: 0 hours 0 minutes 1 seconds (1 seconds)
Photon Time: 0 hours 0 minutes 16 seconds (16 seconds)
Render Time: 0 hours 13 minutes 23 seconds (803 seconds)
Total Time: 0 hours 13 minutes 40 seconds (820 seconds)
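
As a rough sanity check on the two Povray 3.6 runs above, the sketch below converts the posted total times to seconds and compares them both in wall time and scaled by clock. Multiplying seconds by GHz is only a crude stand-in for cycles spent, and the helper name is made up for illustration.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s triple, as printed by Povray, to seconds."""
    return hours * 3600 + minutes * 60 + seconds

# Total times and clocks as posted above.
exynos_total = to_seconds(1, 51, 46)   # Exynos 4210 at 1.2 GHz
athlon_total = to_seconds(0, 13, 40)   # Athlon II x4 at 2.8 GHz, one thread

# How many times longer the Exynos run took in wall time.
wall_ratio = exynos_total / athlon_total

# Same comparison with each time weighted by clock (seconds * GHz),
# i.e. a rough per-clock ratio. Assumes both chips ran at the listed clock.
per_clock_ratio = (exynos_total * 1.2) / (athlon_total * 2.8)

print(f"wall time ratio: {wall_ratio:.2f}x")
print(f"per-clock ratio: {per_clock_ratio:.2f}x")
```

The two runs used different compilers flags and float ABIs, so the per-clock figure says as much about the build as about the cores.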


^ neat :)
Also holy cow that Athlon II kicks arse compared to the others.
Now do one with an Ivy Bridge :D
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
^ neat :)
Also holy cow that Athlon II kicks arse compared to the others.
Now do one with an Ivy Bridge :D

They're actually all single thread. What's more interesting is pps/GHz, and you'll see even the lowly Celeron beat K10h. Ouch!
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
This is a cool thread.
It'll be interesting to see how the hard-float ABI and NEON affect things.
Also, A15 out now which is ~25% faster clock/clock.
Pretty soon we won't need Intel...

I wouldn't count on that. These guys did a comparison of armel vs. armhf on the Raspberry Pi. The numbers are better for armhf, but the ranges vary quite a bit.

Hmm, looks like Debian's armhf beta installer is downloadable. I don't know how I missed that. I'll report back later!
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
An "ARM Cortex A9" @ 4 GHz would give a Celeron 220 a run for its money :)

I wonder what its power usage would be like?
How easy would it be to overclock one to around 4 GHz?

Does anyone feel like pulling apart a mobile phone,
and slapping a heatsink onto the chip and trying?


edit:

http://www.engadget.com/2012/05/03/tsmc-ramps-28nm-arm-cortex-a9-chip-to-3-1ghz/

Lab workers at Taiwan's semiconductor giant have successfully run a dual-core ARM Cortex-A9 processor at 3.1GHz under normal conditions.
Hmmm, looks like they "could" make Cortex-A9s at around 3.1 GHz if they wanted to.
It wouldn't be a "speedy" chip (still slower than a Celeron 220, going by the pps/GHz above),
but it might be enough to run Windows 8 and still have a semi-enjoyable experience.

(Weird, it's max rated at 2,000 MHz, but they can get them running stable at 3.1 GHz in labs.)

I wonder when we'll see the first ARM chips meant for PC users.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Not really. That's an incredibly meaningless metric on its own.

It actually isn't. The Celeron 220 is Core 2 based. And that's about the time when Intel started to overtake AMD in performance. The pps/GHz is a decent proxy for IPC (Core 2 is nearly 2.5x faster per clock vs. Netburst, ouch!) and the number is, not surprisingly, fairly consistent across the range of a processor family so it's easy to see what pps numbers a processor would get at a certain frequency.
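
To make that concrete, here is a small Python sketch using rows from the table in the first post. It checks the Core 2 vs. Netburst per-clock ratio from the Core 2 Duo E8400 and Pentium 4m rows, and projects what a Cortex-A9 might score at 3.1 GHz from the Exynos 4210 row. The projection assumes pps scales linearly with clock and core count, which ignores memory bottlenecks and throttling; the function name is made up for illustration.

```python
def projected_pps(pps_per_core_ghz, cores, ghz):
    """Estimate pps at a given core count and clock, assuming linear scaling."""
    return pps_per_core_ghz * cores * ghz

# pps/core/GHz values copied from the table in the first post.
core2_e8400 = 77.08
pentium_4m = 32.66
exynos_4210 = 20.06   # dual Cortex-A9

# Per-clock advantage of Core 2 over Netburst.
print(f"Core 2 vs. Netburst: {core2_e8400 / pentium_4m:.2f}x per clock")

# Hypothetical dual-core Cortex-A9 at 3.1 GHz.
print(f"Cortex-A9 @ 3.1 GHz: {projected_pps(exynos_4210, 2, 3.1):.1f} pps")
```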
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
It has absolutely no practical meaning by itself. It is only when combined with clockspeed and power usage that it has any meaning at all.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
It has absolutely no practical meaning by itself. It is only when combined with clockspeed and power usage that it has any meaning at all.

Uhm, data, in general, has no practical meaning without context, but you also forgot price, which is more important (e.g. Oracle touting its TPS with TCO numbers on different hardware, etc.). With pps/GHz, it's easy to figure out how one particular chip will perform compared with another particular chip, then compare prices to see if the performance gain (if any) is worth the money to upgrade.
 

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
I assume POVRay 3.7 tests the FPU more than anything in these chips?

The Exynos, N270, P4m, and PPC750 all have 64 bit FPUs, with the K10h and Celeron 220 having 128 bit yes? From even such a small amount of data, you can interpolate other processors into it, simply based on their FPU widths. I assume any AVX equipped i series or AMD BD module to be about twice the pps/GHz.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
I assume POVRay 3.7 tests the FPU more than anything in these chips?

The Exynos, N270, P4m, and PPC750 all have 64 bit FPUs, with the K10h and Celeron 220 having 128 bit yes? From even such a small amount of data, you can interpolate other processors into it, simply based on their FPU widths. I assume any AVX equipped i series or AMD BD module to be about twice the pps/GHz.

Povray 3.7 is multithreaded, 3.6 is only single threaded. The program uses the SIMD registers as scalar entities. Looking through the code, there's not much that can be vectorized in critical paths and AVX2 probably will not be of much benefit for this program. Watching the program compile, the only things getting autovectorized are in the file manipulation routines.
 

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
Povray 3.7 is multithreaded, 3.6 is only single threaded. The program uses the SIMD registers as scalar entities. Looking through the code, there's not much that can be vectorized in critical paths and AVX2 probably will not be of much benefit for this program. Watching the program compile, the only things getting autovectorized are in the file manipulation routines.

Sorry, I get a bit confused as to how SIMD, vector units, FPUs, et al. all work in their own manner. :oops:

But is the SIMD width directly affecting how much data is getting shoved through?
 

Haserath

Senior member
Sep 12, 2010
793
1
81
This is a cool thread.
It'll be interesting to see how the hard-float ABI and NEON affect things.
Also, A15 out now which is ~25% faster clock/clock.
Pretty soon we won't need Intel...

Improvements made by ARM's army of designers mean that the Cortex-A15's performance is significantly improved: official figures put the chip's integer performance at around 1.5 times that of the Cortex-A9, while floating-point performance is doubled.
http://www.bit-tech.net/news/hardware/2012/08/10/samsung-exynos-5/1

You can say that again. I was thinking they were going to start out with 2ghz dual cores though, but 1.7ghz would still put this a league ahead of an A9.

It's amazing that these chips will be using 12.8GB/s memory bandwidth when the desktop APUs are merely getting 30GB/s or less.
 

Arkadrel

Diamond Member
Oct 19, 2010
3,681
2
0
Improvements made by ARM's army of designers mean that the Cortex-A15's performance is significantly improved: official figures put the chip's integer performance at around 1.5 times that of the Cortex-A9, while floating-point performance is doubled.
http://www.bit-tech.net/news/hardware/2012/08/10/samsung-exynos-5/1

You can say that again. I was thinking they were going to start out with 2ghz dual cores though, but 1.7ghz would still put this a league ahead of an A9.

It's amazing that these chips will be using 12.8GB/s memory bandwidth when the desktop APUs are merely getting 30GB/s or less.

At 1.5x int and 2x fp, how many pps would that give it in this type of bench? 50ish? Still lower than a Celeron 220.

I agree about the memory bandwidth however, I'm guessing they're gonna put it to good use:
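
A rough way to run that estimate against the table in the first post: scale the Cortex-A9 row (Exynos 4210) by the quoted 1.5x and 2x factors and compare with the Cortex-A15 row that was eventually measured (Exynos 5250). This is only a sketch; the two rows may differ in compiler, OS and float ABI, so the comparison is not clean, and the function name is made up for illustration.

```python
def scaled_range(base, low_factor, high_factor):
    """Return the (low, high) estimate after applying two scaling factors."""
    return base * low_factor, base * high_factor

# pps/core/GHz values copied from the table in the first post.
A9_PPS_PER_CORE_GHZ = 20.06    # Exynos 4210 (2c/2t, 1.2 GHz)
A15_PPS_PER_CORE_GHZ = 47.24   # Exynos 5250 (2c/2t, 1.7 GHz), measured

# Quoted claims: 1.5x integer and 2x floating point over the A9.
low, high = scaled_range(A9_PPS_PER_CORE_GHZ, 1.5, 2.0)
print(f"estimated A15: {low:.2f} to {high:.2f} pps/core/GHz")

# Ratio of the measured A15 row to the measured A9 row.
ratio = A15_PPS_PER_CORE_GHZ / A9_PPS_PER_CORE_GHZ
print(f"measured A15 / A9: {ratio:.2f}x")
```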

1) "promises 1080p video playback at 60 frames per second with full support for wireless displays and 3D stereoscopic videos."

2) "the image processor portion can capture pictures and video from an eight megapixel sensor at 30 frames per second and features hardware post-processing units including 3D noise reduction, image stabilisation and optical distortion compensation"

Also impressive:
Comes with USB3 and SATA3.
 

nenforcer

Golden Member
Aug 26, 2008
1,767
1
76
This is a great thread. I will be very interested in following these numbers starting with the release of Windows 8 RT this fall and obviously Debian 7.

Also interested in Qualcomm Snapdragon Krait and nVidia Tegra 4, since those are supposed to be considered the flagship ARM Cortex A15 processors.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
This is a great thread. I will be very interested in following these numbers starting with the release of Windows 8 RT this fall and obviously Debian 7.

Also interested in Qualcomm Snapdragon Krait and nVidia Tegra 4, since those are supposed to be considered the flagship ARM Cortex A15 processors.

Same here, but I don't have access to all that hardware.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
This is a great thread. I will be very interested in following these numbers starting with the release of Windows 8 RT this fall and obviously Debian 7.

Windows RT is a stillborn child tho. Almost all OEMs rejected it in favour of x86 tablets.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Exynos hard float numbers are up. Had to install Debian wheezy armhf on my phone. Loads of fun. Looks like performance is about on par with a Pentium 4 and better than first generation Atom.
 