nbench benchmarks

jhu

Lifer
Oct 10, 1999
It appears AnTuTu has portions based on nbench. Why not just compile nbench directly and see what the numbers are? So I did:

Exynos 4 Dual (Samsung Galaxy S II, Debian 7)

gcc 4.6.3; compiler options: -mcpu=cortex-a9:
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          627.56  :      16.09  :       5.29
STRING SORT         :           75.44  :      33.71  :       5.22
BITFIELD            :      1.9833e+08  :      34.02  :       7.11
FP EMULATION        :          81.607  :      39.16  :       9.04
FOURIER             :          7411.8  :       8.43  :       4.73
ASSIGNMENT          :          9.9088  :      37.70  :       9.78
IDEA                :          1789.3  :      27.37  :       8.13
HUFFMAN             :          914.15  :      25.35  :       8.09
NEURAL NET          :          9.7649  :      15.69  :       6.60
LU DECOMPOSITION    :          406.08  :      21.04  :      15.19
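For reference, the two index columns appear to be plain ratios: each test's iterations/sec divided by a fixed per-test baseline score from the reference machine (Pentium 90 for the old index, AMD K6/233 for the new one). You can back the baseline out of any row. For example, 627.56 / 16.09 ≈ 39.0 here, and 1969.9 / 50.52 ≈ 39.0 for the Phenom II run further down, so it's the same constant. A minimal sketch of that arithmetic (the function names are made up for illustration):

```c
/* Hypothetical helpers, not part of nbench.
   Assumption: index = iterations_per_sec / baseline_iterations_per_sec. */

/* Back out the reference machine's score from one table row. */
double implied_baseline(double iterations_per_sec, double index)
{
    return iterations_per_sec / index;
}

/* Forward direction: index of a result relative to a known baseline. */
double index_vs_baseline(double iterations_per_sec, double baseline)
{
    return iterations_per_sec / baseline;
}
```

If the implied baseline for a test comes out the same across different machines and compilers, the index is just a rescaled iterations/sec and carries no extra information.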

gcc 4.6.3; compiler options: -mcpu=cortex-a9 -mfpu=neon:
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          604.32  :      15.50  :       5.09
STRING SORT         :           75.88  :      33.91  :       5.25
BITFIELD            :      1.9742e+08  :      33.86  :       7.07
FP EMULATION        :          81.567  :      39.14  :       9.03
FOURIER             :          7501.2  :       8.53  :       4.79
ASSIGNMENT          :          9.5924  :      36.50  :       9.47
IDEA                :          1791.4  :      27.40  :       8.13
HUFFMAN             :          914.15  :      25.35  :       8.09
NEURAL NET          :             9.8  :      15.74  :       6.62
LU DECOMPOSITION    :           411.6  :      21.32  :      15.40

For comparison, on an Intel Core i5 3317U (Ubuntu 12.04, 64-bit):

gcc 4.6.4; compiler options: none
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :            1191  :      30.54  :      10.03
STRING SORT         :          775.68  :     346.60  :      53.65
BITFIELD            :       4.655e+08  :      79.85  :      16.68
FP EMULATION        :          457.84  :     219.69  :      50.69
FOURIER             :           36478  :      41.49  :      23.30
ASSIGNMENT          :          44.732  :     170.21  :      44.15
IDEA                :            8848  :     135.33  :      40.18
HUFFMAN             :          4032.7  :     111.83  :      35.71
NEURAL NET          :          79.456  :     127.64  :      53.69
LU DECOMPOSITION    :            2089  :     108.22  :      78.14

icc 13.1.1; compiler options: none
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :            1316  :      33.75  :      11.08
STRING SORT         :          895.92  :     400.32  :      61.96
BITFIELD            :      4.8141e+08  :      82.58  :      17.25
FP EMULATION        :          433.44  :     207.98  :      47.99
FOURIER             :       1.789e+05  :     203.46  :     114.28
ASSIGNMENT          :          39.193  :     149.14  :      38.68
IDEA                :          9131.6  :     139.66  :      41.47
HUFFMAN             :          4011.2  :     111.23  :      35.52
NEURAL NET          :          129.12  :     207.42  :      87.25
LU DECOMPOSITION    :          3696.7  :     191.51  :     138.29

S4 Pro (APQ8064, Krait 200 @ 1.5 GHz)
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          714.84  :      18.33  :       6.02
STRING SORT         :          104.12  :      46.52  :       7.20
BITFIELD            :      1.5008e+08  :      25.74  :       5.38
FP EMULATION        :          110.67  :      53.10  :      12.25
FOURIER             :          7181.5  :       8.17  :       4.59
ASSIGNMENT          :           10.24  :      38.97  :      10.11
IDEA                :          2441.2  :      37.34  :      11.09
HUFFMAN             :          1289.2  :      35.75  :      11.42

The presence of NEON doesn't make much of a difference. Or maybe gcc just isn't very good at vectorizing?
 

monstercameron

Diamond Member
Feb 12, 2013
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 compiler options: none:
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1969.9  :      50.52  :      16.59
STRING SORT         :          357.28  :     159.64  :      24.71
BITFIELD            :       6.668e+08  :     114.38  :      23.89
FP EMULATION        :           644.6  :     309.31  :      71.37
FOURIER             :           40831  :      46.44  :      26.08
ASSIGNMENT          :          46.805  :     178.10  :      46.20
IDEA                :           11364  :     173.81  :      51.61
HUFFMAN             :          4016.8  :     111.39  :      35.57
NEURAL NET          :          83.447  :     134.05  :      56.39
LU DECOMPOSITION    :          2300.5  :     119.18  :      86.06
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 138.625
FLOATING-POINT INDEX: 90.523
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU AuthenticAMD AMD Phenom(tm) II X4 965 Processor 4000MHz
L2 Cache            : 512 KB
OS                  : Linux 3.2.0-49-generic
C compiler          : test
libc                : 
MEMORY INDEX        : 30.100
INTEGER INDEX       : 38.397
FLOATING-POINT INDEX: 50.207
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 compiler options: -march=native:
Side note: I have no clue what I'm doing, I'm not a compiler guy...
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          1928.6  :      49.46  :      16.24
STRING SORT         :           361.6  :     161.57  :      25.01
BITFIELD            :      6.4363e+08  :     110.40  :      23.06
FP EMULATION        :          634.32  :     304.38  :      70.23
FOURIER             :           41848  :      47.59  :      26.73
ASSIGNMENT          :          48.021  :     182.73  :      47.40
IDEA                :           12201  :     186.61  :      55.40
HUFFMAN             :          3853.8  :     106.87  :      34.13
NEURAL NET          :           98.08  :     157.56  :      66.27
LU DECOMPOSITION    :          2330.2  :     120.71  :      87.17
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 138.520
FLOATING-POINT INDEX: 96.730
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU AuthenticAMD AMD Phenom(tm) II X4 965 Processor 4000MHz
L2 Cache            : 512 KB
OS                  : Linux 3.2.0-49-generic
C compiler          : gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 
libc                : libc-2.15.so
MEMORY INDEX        : 30.123
INTEGER INDEX       : 38.324
FLOATING-POINT INDEX: 53.651
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04) compiler options: none:
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          2767.2  :      70.97  :      23.31
STRING SORT         :          358.56  :     160.21  :      24.80
BITFIELD            :      6.3871e+08  :     109.56  :      22.88
FP EMULATION        :             688  :     330.13  :      76.18
FOURIER             :           42039  :      47.81  :      26.85
ASSIGNMENT          :          46.341  :     176.34  :      45.74
IDEA                :           11192  :     171.17  :      50.82
HUFFMAN             :            4000  :     110.92  :      35.42
NEURAL NET          :            85.2  :     136.87  :      57.57
LU DECOMPOSITION    :          2264.8  :     117.33  :      84.72
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 145.444
FLOATING-POINT INDEX: 91.564
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU AuthenticAMD AMD Phenom(tm) II X4 965 Processor 4000MHz
L2 Cache            : 512 KB
OS                  : Linux 3.2.0-49-generic
C compiler          : gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04) 
libc                : libc-2.15.so
MEMORY INDEX        : 29.608
INTEGER INDEX       : 42.282
FLOATING-POINT INDEX: 50.785
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04) compiler options: -march=native:
Note: first run
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          2850.5  :      73.10  :      24.01
STRING SORT         :          361.04  :     161.32  :      24.97
BITFIELD            :      6.4562e+08  :     110.75  :      23.13
FP EMULATION        :          673.12  :     322.99  :      74.53
FOURIER             :           41896  :      47.65  :      26.76
ASSIGNMENT          :           45.69  :     173.86  :      45.10
IDEA                :           11176  :     170.93  :      50.75
HUFFMAN             :          3993.6  :     110.74  :      35.36
NEURAL NET          :          90.124  :     144.78  :      60.90
LU DECOMPOSITION    :          2271.1  :     117.65  :      84.96
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 145.617
FLOATING-POINT INDEX: 93.275
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU AuthenticAMD AMD Phenom(tm) II X4 965 Processor 4000MHz
L2 Cache            : 512 KB
OS                  : Linux 3.2.0-49-generic
C compiler          : gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04) 
libc                : libc-2.15.so
MEMORY INDEX        : 29.643
INTEGER INDEX       : 42.333
FLOATING-POINT INDEX: 51.734
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04) compiler options: -march=native:
Note: second run

Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          2872.8  :      73.67  :      24.20
STRING SORT         :          361.52  :     161.54  :      25.00
BITFIELD            :      6.5065e+08  :     111.61  :      23.31
FP EMULATION        :          672.64  :     322.76  :      74.48
FOURIER             :           41728  :      47.46  :      26.66
ASSIGNMENT          :          45.625  :     173.61  :      45.03
IDEA                :           11176  :     170.93  :      50.75
HUFFMAN             :            3992  :     110.70  :      35.35
NEURAL NET          :           91.97  :     147.74  :      62.15
LU DECOMPOSITION    :          2281.7  :     118.20  :      85.35
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX       : 145.916
FLOATING-POINT INDEX: 93.928
Baseline (MSDOS*)   : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU                 : 4 CPU AuthenticAMD AMD Phenom(tm) II X4 965 Processor 4000MHz
L2 Cache            : 512 KB
OS                  : Linux 3.2.0-49-generic
C compiler          : gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~12.04) 
libc                : libc-2.15.so
MEMORY INDEX        : 29.719
INTEGER INDEX       : 42.403
FLOATING-POINT INDEX: 52.096
Baseline (LINUX)    : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.
 

Exophase

Diamond Member
Apr 19, 2012
You need the -ftree-vectorize option to enable auto-vectorization in GCC, or at least last I was aware.
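For what it's worth, here's a sketch of the kind of loop the vectorizer handles well and one it generally can't. The flag notes in the comment reflect GCC of the 4.x era and may differ in other versions, so check the manual for yours; the function names are just for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Vectorizer-friendly: unit stride, independent iterations, and
   restrict tells the compiler the arrays don't overlap.
   Flag notes (GCC 4.x era, check your version's manual):
     -O3 turns on -ftree-vectorize; at -O2 it has to be added explicitly.
     On 32-bit ARM, NEON code also needs -mfpu=neon and a float ABI
     other than soft.
     -ftree-vectorizer-verbose=N (4.6/4.7) or -fopt-info-vec (4.8+)
     reports which loops were vectorized. */
void add_arrays(int32_t *restrict dst, const int32_t *restrict a,
                const int32_t *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Typically left scalar: each iteration depends on the previous
   one (a loop-carried dependence), so lanes can't run independently. */
void prefix_sum(int32_t *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```

Most of nbench's inner loops look more like the second case (sorting, Huffman coding, bit manipulation), which would be consistent with NEON making little difference.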
 

jhu

Lifer
Oct 10, 1999

No significant changes.

Oh, I just read the manual: it wants -funsafe-math-optimizations.

That didn't make much of a difference either:

Exynos 4
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          576.32  :      14.78  :       4.85
STRING SORT         :           75.81  :      33.87  :       5.24
BITFIELD            :      1.9718e+08  :      33.82  :       7.06
FP EMULATION        :          81.647  :      39.18  :       9.04
FOURIER             :          7521.1  :       8.55  :       4.80
ASSIGNMENT          :          9.4033  :      35.78  :       9.28
IDEA                :          1818.5  :      27.81  :       8.26
HUFFMAN             :          904.44  :      25.08  :       8.01
NEURAL NET          :          10.079  :      16.19  :       6.81
LU DECOMPOSITION    :          411.44  :      21.31  :      15.39
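The reason for that flag, per the GCC manual's note on -mfpu=neon, is that NEON doesn't fully implement IEEE 754 (denormals are flushed to zero), so GCC won't auto-vectorize floating-point code onto NEON unless -funsafe-math-optimizations is given. Reductions have a second problem: a vectorized sum adds the elements in a different order, and float addition isn't associative, so GCC also needs reassociation permission (-fassociative-math, which -funsafe-math-optimizations turns on). A small illustration with made-up function names:

```c
#include <stddef.h>

/* Strict left-to-right sum: the order an IEEE-conforming compiler
   must preserve without reassociation flags. */
float sum_sequential(const float *v, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Four independent partial sums, one per SIMD lane, combined at the
   end: roughly the shape of a vectorized reduction. Same numbers,
   different order. Assumes n is a multiple of 4 for brevity. */
float sum_four_lanes(const float *v, size_t n)
{
    float lane[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += v[i + k];
    return ((lane[0] + lane[1]) + lane[2]) + lane[3];
}
```

With v[0] = 1e8f followed by 31 ones, the sequential version returns 100000000 (each +1 is below half an ulp at 1e8 and rounds away), while the four-lane version returns 100000024. That assumes every float operation is rounded to single precision, as on x86-64 with SSE or ARM VFP/NEON; 32-bit x87 builds may carry extra precision and behave differently.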
 

Exophase

Diamond Member
Apr 19, 2012
Linaro has some more information and insight into GCC's auto-vectorization, in particular for ARM:

http://www.linaro.org/documents/download/c2df5509df84260a37f8feae73dcaf7b4fbb998d2ff1b

It would be interesting to turn on the reporting options to see what 4.6.3 vectorizes vs later versions.

I don't have dynamic numbers, so this isn't very conclusive, but AnTuTu 3.3's x86 binary has at least some integer SIMD in the nbench parts, in what looks like the middle of loops: for instance, in DoStringSort @ 0xf4f07.
 

jhu

Lifer
Oct 10, 1999
Vectorization for ARM really improved with gcc 4.7 and 4.8. Also note the Android NDK uses poor compilation options; try the thumb flags mentioned here: http://code.google.com/p/android/issues/detail?id=56951

For icc, what options did you use? Your BITFIELD score should be much higher. Try -O3 -xSSSE3 (or -xSSSE3_ATOM).

Looks like there's no significant change.

Core i5 3317U
icc 13.1.1; options -xSSSE3
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          2212.8  :      56.75  :      18.64
STRING SORT         :          934.88  :     417.73  :      64.66
BITFIELD            :      4.9362e+08  :      84.67  :      17.69
FP EMULATION        :          443.52  :     212.82  :      49.11
FOURIER             :      1.8477e+05  :     210.14  :     118.03
ASSIGNMENT          :          41.167  :     156.65  :      40.63
IDEA                :          9404.2  :     143.84  :      42.71
HUFFMAN             :          4201.3  :     116.50  :      37.20
NEURAL NET          :           136.8  :     219.76  :      92.44
LU DECOMPOSITION    :          3784.6  :     196.06  :     141.58
 

Nothingness

Diamond Member
Jul 3, 2013
With -O3?
 

jhu

Lifer
Oct 10, 1999

icc 13, compiler options: -O3 -xSSSE3
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          2188.2  :      56.12  :      18.43
STRING SORT         :          936.64  :     418.52  :      64.78
BITFIELD            :       4.949e+08  :      84.89  :      17.73
FP EMULATION        :           449.4  :     215.64  :      49.76
FOURIER             :      1.8523e+05  :     210.66  :     118.32
ASSIGNMENT          :          40.775  :     155.16  :      40.24
IDEA                :            9480  :     144.99  :      43.05
HUFFMAN             :            4191  :     116.22  :      37.11
NEURAL NET          :           137.2  :     220.40  :      92.71
LU DECOMPOSITION    :          3847.2  :     199.30  :     143.92

Looks like there really are no significant changes.
 

jhu

Lifer
Oct 10, 1999
Do you have any Atom hardware to test on?

I believe you and Nothingness are on to something...

Atom N270 @ 1.6 GHz, Ubuntu 12.04 (32-bit)

gcc 4.6.3, compiler options: -O3 -march=atom
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          543.24  :      13.93  :       4.58
STRING SORT         :           132.3  :      59.11  :       9.15
BITFIELD            :      2.6148e+08  :      44.85  :       9.37
FP EMULATION        :          100.16  :      48.06  :      11.09
FOURIER             :          8203.5  :       9.33  :       5.24
ASSIGNMENT          :          12.995  :      49.45  :      12.83
IDEA                :          2168.2  :      33.16  :       9.85
HUFFMAN             :          986.97  :      27.37  :       8.74

icc 13.1.1, compiler options: -O3 -march=atom
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :          789.12  :      20.24  :       6.65
STRING SORT         :           132.8  :      59.34  :       9.18
BITFIELD            :      1.1926e+10  :    2045.74  :     427.30
FP EMULATION        :          53.975  :      25.90  :       5.98
FOURIER             :           15062  :      17.13  :       9.62
ASSIGNMENT          :          9.6777  :      36.83  :       9.55
IDEA                :          2317.2  :      35.44  :      10.52
HUFFMAN             :          731.23  :      20.28  :       6.48

icc 13.1.1, compiler options: -O3 -march=core2
Code:
TEST                : Iterations/sec.  : Old Index   : New Index
                    :                  : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT        :           706.8  :      18.13  :       5.95
STRING SORT         :          107.24  :      47.92  :       7.42
BITFIELD            :      9.4628e+09  :    1623.21  :     339.05
FP EMULATION        :          45.324  :      21.75  :       5.02
FOURIER             :           15729  :      17.89  :      10.05
ASSIGNMENT          :           8.585  :      32.67  :       8.47
IDEA                :          2085.8  :      31.90  :       9.47
HUFFMAN             :          822.68  :      22.81  :       7.28

This appears to be an issue with nbench compiled on 32-bit systems and not 64-bit systems.
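For scale: the icc BITFIELD result is about 45x the gcc one on the same Atom (1.1926e+10 vs 2.6148e+08 iterations/sec). A jump that size is hard to explain by better instruction scheduling alone; it suggests the amount of work per reported operation changed. As a hypothetical illustration (not nbench's code, and not what icc actually emits), here is a run of bits flipped one bit at a time versus whole 32-bit words at a time. Both produce the same bitmap:

```c
#include <stddef.h>
#include <stdint.h>

/* Flip bits [start, start + len) one bit at a time. */
void flip_run_bitwise(uint32_t *map, size_t start, size_t len)
{
    for (size_t b = start; b < start + len; b++)
        map[b / 32] ^= (uint32_t)1u << (b % 32);
}

/* Same result, but whole words are flipped with a single XOR. */
void flip_run_wordwise(uint32_t *map, size_t start, size_t len)
{
    size_t b = start, end = start + len;

    /* Leading bits up to the first word boundary. */
    while (b < end && b % 32 != 0) {
        map[b / 32] ^= (uint32_t)1u << (b % 32);
        b++;
    }
    /* Full 32-bit words: one XOR replaces 32 single-bit updates. */
    for (; b + 32 <= end; b += 32)
        map[b / 32] ^= UINT32_MAX;
    /* Trailing bits after the last full word. */
    for (; b < end; b++)
        map[b / 32] ^= (uint32_t)1u << (b % 32);
}
```

Whether icc's rewrite is a general optimization or a benchmark-specific trick can't be told from the scores alone; the sketch only shows how large the gap between bit-at-a-time and word-at-a-time code can get.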
 

Nothingness

Diamond Member
Jul 3, 2013
Thanks a lot for that! The fact that the optimization only applies to 32-bit targets is probably another hint that this is a trick.

Also note that the doubling of FOURIER is due to the use of Intel's math library versions of cos, sin and pow.

Last point: as many devs know, icc is useless for most programs, and as can be seen here, gcc is competitive except where icc cheats or uses hand-written code.
 

tempestglen

Member
Dec 5, 2012

The Intel compiler is speedy and buggy.

http://polyhedron.com/pb05-lin64-f90bench_SBhtml