Athlon XP/MP / P4 benchmark for Linux

hwstock

Senior member
Oct 7, 2001
254
0
0
Hi -- here is the Linux version of the benchmark (link) previously posted. I've reprised my original comments below, with minor changes.
====================================
Is there anyone out there who might run this
SSE-enabled benchmark on an Athlon XP or MP system? This
benchmark was discussed on the Ace's Hardware tech forum
back in the spring; there is now a slight update to the SSE
optimizations. This version works under Linux -- I've run it on three
systems under Red Hat 6.2 and SuSE 7.1. Since the libraries are statically linked,
the OS version shouldn't be an issue. If there is a worry about
viruses, I will gladly e-mail the files from my gov address, which
should provide some traceability. The test requires at least
200 MB of memory; otherwise it pages to disk and slows down.

Download:

http://www.wizard.com/~hwstock/bench/3d0/3d0_linux.zip
or
http://home.earthlink.net/~stockman3/bench/3d0/3d0_linux.zip

Unzip to obtain the 3 files. You'll probably have to change
the permissions on the executable (in KDE, just right-click
on the exe; otherwise use chmod). Open a console (xterm)
in the directory with the 3 files. The batch file contains
the command line for a win32 system; either turn the batch file
into a shell script, or simply paste its command line into the xterm window.
Return the lb_data.txt file to me (and
return the bmp files if you so choose, so I can verify
that the test ran correctly).

The full test may take a few minutes. If you don't want to
tie up the machine for that long, edit the command line
(from now.bat), changing the switch -s1000 to -s200
(i.e., run 200 steps instead of 1000).
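
For anyone who would rather script the edit than do it by hand, here is a minimal Python sketch of the switch change described above. It assumes the command line contains a single -s<N> switch separated by whitespace. The executable name "./bench" in the example is a placeholder of mine, not the real file name; use whatever command line now.bat actually contains.

```python
import re


def set_steps(command_line: str, steps: int) -> str:
    """Return command_line with its -s<N> switch replaced by -s<steps>.

    Only a standalone switch (bounded by whitespace or the ends of the
    string) is touched, so other switches are left alone.
    """
    if steps <= 0:
        raise ValueError("steps must be a positive integer")
    new_line, count = re.subn(r"(?<!\S)-s\d+(?!\S)", f"-s{steps}", command_line)
    if count == 0:
        raise ValueError("no -s<N> switch found in the command line")
    return new_line


# Hypothetical command line; the real one comes from now.bat.
example = "./bench -s1000 -l38 -r151 -c432"
print(set_steps(example, 200))
```

If you read the line straight out of now.bat, strip the trailing Windows line ending first (for example with str.strip()) before pasting the result into the xterm.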


The fastest previous Athlon results were: 2.135 MUPs for a 1.2 GHz Tbird with 2-2-2 SDRAM.
The fastest previous P4 results were: 5.723 MUPs for a 1.4 GHz P4 with 1 GB RDRAM.

We now have a 1.4 GHz Athlon XP at 3.400 MUPs, and a 1.7 GHz P4 at 6.324 MUPs.

Higher MUPs (millions of updates per second) are better. If the command line switches include -s1000, -l38, -r151 and -c432, the number of updates is (1000 steps)*(432 columns)*(151 rows)*(38 layers). The start and end times in the lb_data.txt file allow one to calculate a total execution time of "sec" seconds. Thus the MUPs figure is [(1e-6)*(1000 steps)*(432 columns)*(151 rows)*(38 layers)]/sec.
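
The MUPs arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name is mine, and the elapsed time has to be worked out by hand from the start and end times in lb_data.txt, since this sketch does not parse that file. The 400-second figure in the example is made up purely to show the calculation and is not a measured result.

```python
def mups(steps: int, columns: int, rows: int, layers: int, seconds: float) -> float:
    """Millions of updates per second for one benchmark run.

    updates = steps * columns * rows * layers
    MUPs    = 1e-6 * updates / seconds
    """
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    updates = steps * columns * rows * layers
    return 1e-6 * updates / seconds


# Full run from the formula above: 1000 steps on a 432 x 151 x 38 grid.
# 400.0 s is a hypothetical elapsed time, used only as an example.
print(f"{mups(1000, 432, 151, 38, 400.0):.3f} MUPs")
```

Since the update count scales with the number of steps, a -s200 run does one fifth of the updates; the MUPs figure should stay comparable as long as the run fits in memory.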

However, the Athlon exe used hand-coded 3DNow under MS VC++, whereas the P4 used SSE intrinsics under Intel C++ 5. The Intel compiler generally produces superior code. In principle, the SSE-enabled executable should run just fine on an Athlon XP (it is compiled for PIII instructions, and even runs on my Celeron CuMine), and the memory interface on the Athlon XP (with DDR) is supposed to be a significant improvement.

I won't try to mislead you -- I'm not expecting the Athlon XP to be a knock-out performer for this code, because I don't think DDR is yet on a par with RDRAM, and the code is memory-intensive. However, I have two Athlon systems (and one P4, a Celeron, and two PIIIs), and I would love to be pleasantly surprised.

If you want to know more about this code, visit these web sites for background info:
http://www.sandia.gov/eesector/gs/gc/hws/saltfing.htm
http://www.sandia.gov/eesector/gs/gc/hws/3d.htm

If more info is needed, I can e-mail (or post links to) two pdfs from peer-reviewed scientific journals, one of which gives details of the code architecture and optimization strategy.

 

hwstock

Senior member
Oct 7, 2001
254
0
0


<< Hi -- here is the linux version of the benchmark (link) previously posted. I've reprised my original comments below, with minor changes.
[...]
The fastest previous Athlon results were: 2.135 MUPs for a 1.2 GHz Tbird with 2-2-2 SDRAM.
The fastest previous P4 results were: 5.723 MUPs for a 1.4 GHz P4 with 1 GB RDRAM.

We now have a 1.4 GHz Ath XP at 3.400 MUPs, and a 1.7 GHz P4 at 6.324 MUPs.

However, the Athlon exe used hand-coded 3DNow under MS VC++, whereas the P4 used SSE intrinsics under Intel C++ 5. The Intel compiler generally produces superior code. In principle, the SSE-enabled executable should run just fine on an Athlon XP (it is compiled for PIII instructions, and even runs on my Celeron CuMine), and the memory interface on the Athlon XP (with DDR) is supposed to be a significant improvement.

[...]
>>



Just to clarify -- the posted links are for exes that run only on SSE-enabled Athlons and SSE-enabled Intel chips. The "previous" Athlon result on a Tbird (2.135 MUPs) used a version with hand-coded 3DNow. The newer exes use pure SSE, and were compiled with Intel's C++ 5.