Mathtester: return of the un-benchmark

DrMrLordX

Lifer
Apr 27, 2000
22,939
13,024
136
Mathtester returns! Here are the links for the latest build:

https://www.dropbox.com/s/21p61jcwjv9w3js/mathtesterclasses03272015.zip?dl=0
.class files. Read below for instructions. Requires Oracle Java 8 or equivalent.

https://www.dropbox.com/s/5v0arvstf9y1zw9/mathtestersource03272015.zip?dl=0
Source files. Open source under a BSD 2-clause license.

VirusTotal says it's safe.

Previous builds

build 0203
https://www.dropbox.com/s/21mch2lmqlloppg/mathtesterclasses02032015.zip?dl=0
.class files.

https://www.dropbox.com/s/s99hq4qfbrsgei0/mathtestersource02032015.zip?dl=0
source files.

build 0217
https://www.dropbox.com/s/ivfevk6exan5r5q/mathtesterclasses02172015.zip?dl=0
.class files.

https://www.dropbox.com/s/7otk5v0s9qjglc5/mathtestersource02172015.zip?dl=0
source files.

build 0227
https://www.dropbox.com/s/jxuaf3v26kb856a/mathtesterclasses02272015fixed.zip?dl=0
.class files.

https://www.dropbox.com/s/c93z1om51d75l3f/mathtestersource02272015fixed.zip?dl=0
source files.

build 0311
https://www.dropbox.com/s/wfhu70qbwdtmio9/mathtesterclasses03112015.zip?dl=0
.class files.

https://www.dropbox.com/s/53zuk227qr4wdpn/mathtestersource03112015.zip?dl=0
source files.

Mathtester has come a long way from its original form, and enough has changed that the two programs are only roughly comparable. It still runs ~103 billion iterations of a particular set of very simple calculations. IntegerLoop, for example, is essentially this:

Code:
int3 = int1 + int2;
int3 = int1 * int2;
int3 = int1 / int2;

Those of you who bother to look at any code revision of the current Mathtester project can see that some of the code is a wee bit different. We can discuss that in greater detail below.
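For scale, the ~103 billion figure is the product of the test parameters described in the "What Mathtester does" post: 48 jobs, 2^24 outer-loop iterations per job, and 128 inner operations per outer iteration (either 128 inner-loop passes, or 8 passes over 16 unrolled statements). A quick Java check of that arithmetic; the class name is made up for illustration and isn't part of Mathtester:

```java
// Hypothetical helper (not part of Mathtester) that reproduces the ~103 billion figure.
class IterationCount {
    static final int JOBS = 48;          // worker jobs per test
    static final long OUTER = 1L << 24;  // 2^24 = 16,777,216 outer-loop iterations per job
    static final int INNER = 128;        // inner operations per outer iteration

    static long total() {
        return JOBS * OUTER * INNER;     // 48 * 16,777,216 * 128, computed in long arithmetic
    }
}
```

`IterationCount.total()` returns 103,079,215,104, which is where "~103 billion" comes from.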

Bottom line: Mathtester is still the “un-benchmark”. The results of this test signify very little. Will you ever see mathematical calculations aligned in this way in real code? Probably not. But it is interesting nevertheless to see how different microarchitectures handle this workload.

And yeah . . . it’s still Java. C++ version incoming any day now! Maybe! Uhm yeah. Ahem. Right. Anyway . . .

Feel free to run it and submit your times! Turn on logging and you can grab your results from results.txt.
 

DrMrLordX

Instructions for Operation

0203 build:
Mathtester now has some new options. Like before, starting the program is as simple as downloading the .class files archive, extracting the archive to its own directory, and running runme.bat within the directory (just double-click it, you don’t have to do it from the command line).

The first time you start the program, Mathtester will run every available test routine as well as one alternate version of each test routine to see which one is fastest on your system. It will store preferences in a file named config.ini in the same directory where you started runme.bat. Editing config.ini by hand may cause problems that prevent Mathtester from running properly. In the event that your config.ini file becomes corrupt or otherwise misconfigured, either delete it and re-run the program or replace it with a known-good backup copy if you have one.

Some slow computers may take a very long time (1+ hours) to complete the config procedure. For those users, I have provided config.slo, which is a valid config.ini file with a change to its file extension. Before you start the program, copy config.slo to config.ini so that you can start the program without iterating through every test and its alternate all at once. Later on you can change config.ini inside the program (or by hand) to run the proper tests to get the highest possible scores for your machine. It will take some trial-and-error, but at least you won’t have to run everything all at once to sort out which is best.

Once you are in the program, there are several options available. Options 1 and 2 show you the various tests and give you the opportunity to run them individually or in a batch. Option “f” lets you alter your config.ini file, and option “p” enables or disables logging.

Selecting code paths

This process is fairly simple. After entering “f”, you’ll get a list of every test available, as well as the settings from your config.ini file showing which code path is used for each test. Note that initiating this procedure actually deletes your config.ini file and replaces it with a blank; only by properly returning to the main menu will the contents of your config.ini (changed or otherwise) be written to the file. So don't press Ctrl-C during this procedure, or you’ll wind up with a blank file. Oops.

Anyway, here you can toggle between the normal and alternate code paths. Just select the test (1-18), and then enter 1 for the normal path or 2 for the alternate. Pretty easy stuff.

If your config.ini file is hosed up from hand-editing gone wrong, this process probably won’t work properly. You’re better off copying in a known-good file (config.slo will work in a pinch).
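For anyone who wants to poke at config.ini programmatically rather than by hand: judging from the config.ini listings posted later in this thread, the file appears to be one test name per line, with an "Alternate" (or lowercase "alternate") suffix selecting the alternate code path. The sketch below assumes that is the entire format; Mathtester's real parser may expect more, and `ConfigSketch` and its methods are hypothetical names, not part of the program:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assumes config.ini is one test name per line and that a
// trailing "Alternate"/"alternate" selects the alternate code path.
class ConfigSketch {
    static final String SUFFIX = "Alternate"; // same length as "alternate"

    static boolean usesAlternate(String entry) {
        return entry.endsWith("Alternate") || entry.endsWith("alternate");
    }

    // Returns a copy of the config with the entry at 'index' switched to the requested path.
    static List<String> setPath(List<String> config, int index, boolean alternate) {
        List<String> copy = new ArrayList<>(config);
        String entry = copy.get(index);
        String base = usesAlternate(entry)
                ? entry.substring(0, entry.length() - SUFFIX.length())
                : entry;
        // The listings mix CamelCase names (IntegerLoop) with lowercase ones (intaddloop).
        String suffix = Character.isUpperCase(base.charAt(0)) ? "Alternate" : "alternate";
        copy.set(index, alternate ? base + suffix : base);
        return copy;
    }
}
```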

Logging

After entering “p”, you’ll notice that the logging status switches from false to true, or from true to false. If logging status is true, then running any test will attempt to create a file named results.txt in the same directory where you started runme.bat. It will then append the test output to the file. When recording execution times, it’s often best to turn on logging so you can go back to the file and get your results that way instead of copying and pasting from a terminal.

. . . and that’s it! Nothing to it.

0217 build:

Operation is the same as before, though check the included readme.txt in case you are confused by the new tests invalidating config.ini files from previous versions. tl;dr version: the easiest way to use your old config.ini file with 0217 is to open up config.slo from 0217, copy the last four lines, and then paste them onto the end of your config.ini from 0203. Then you can use the path selector to dial in which of the new tests run fastest on your system.
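The same splice, written as a hypothetical Java helper that works on lists of lines (reading and writing the actual files, e.g. with `java.nio.file.Files.readAllLines` and `Files.write`, is left out). It assumes, as the tl;dr says, that the new tests are exactly the last four lines of the 0217 config.slo:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the manual 0203 -> 0217 upgrade: keep the old config.ini
// entries and append the last 'n' lines of 0217's config.slo.
class ConfigUpgrade {
    static List<String> upgrade(List<String> oldConfig, List<String> newSlo, int n) {
        if (n > newSlo.size()) {
            throw new IllegalArgumentException("config.slo has fewer than " + n + " lines");
        }
        List<String> merged = new ArrayList<>(oldConfig);
        merged.addAll(newSlo.subList(newSlo.size() - n, newSlo.size()));
        return merged;
    }
}
```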

0227 build:

No changes to operation. Your old config.ini file should work just fine.

0311 build:

No changes to operation. Your old config.ini file should work just fine.
 

DrMrLordX

What Mathtester does

0203 build:

Mathtester spawns 48 separate worker "jobs" sent to an ExecutorService thread pool. Each worker job runs an outer do/while loop 2^24 (16777216) times. Nested inside the outer loop is an inner loop that either runs 128 times or 8 times, depending on the contents of the loop. Generally speaking, there are two forms of the inner loop:

Code:
int3 = int4[i] <someoperator> int5[i];

or

Code:
int3 = int1[i][0] <someoperator> int2[i][0];
int3 = int1[i][1] <someoperator> int2[i][1];
. . .
int3 = int1[i][14] <someoperator> int2[i][14];
int3 = int1[i][15] <someoperator> int2[i][15];

There’s also some funky stuff involving casting and rounding (in the isolated test routines, the last three tests do not have any operator and just involve casting or rounding exclusively).

In any case, as you can see, the second type of loop still runs a particular instruction or set of instructions 128 times. The second type simply accomplishes this feat by using a two-dimensional array. Those of you who know anything about SIMD will probably detect that I have aligned the instructions in the code paths following the second type of loop in a rather blatant attempt to cue the JVM's JIT compiler to turn that bytecode into SIMD instructions if the host CPU supports them. My main test machine (A10-7700k) has mostly responded well to this treatment, though there are a few exceptions, some of them bizarre. The JVM can be an odd beast, and so can Steamroller.

Anyway, once the worker threads are spawned, they run until complete (ExecutorService overhead is pretty low now, thank goodness). Each job measures the run time (in milliseconds) of all 2^24 iterations of its outer loop and stores that total in a static array of long values established in the core class file (Mathtester.class). Later on, the program sifts through that array, compares values, and selects the largest value as the total run time for the test.
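Here's a rough Java sketch of that 0203-era timing scheme: every job times its own outer loop, the times land in a shared long array, and the slowest job sets the test's run time. It's an illustration under assumptions, not Mathtester's code: `MaxTimeSketch`, `runTest`, and `sink` are made-up names, the loop body is a stand-in, and the times are collected through `invokeAll` futures rather than written into the array directly from the worker threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the 0203 timing scheme (not Mathtester's actual code).
class MaxTimeSketch {
    static long[] jobTimes;       // stands in for the static long array in Mathtester.class
    static volatile long sink;    // consumes results so the JIT can't drop the work

    static long runTest(int jobs, int outer, int[] a, int[] b) {
        ExecutorService pool = Executors.newFixedThreadPool(jobs); // 0203: one thread per job
        List<Callable<Long>> tasks = new ArrayList<>();
        for (int j = 0; j < jobs; j++) {
            tasks.add(() -> {
                long start = System.currentTimeMillis();
                long acc = 0;
                for (int n = 0; n < outer; n++) {        // outer loop (2^24 in the real test)
                    for (int i = 0; i < a.length; i++) { // inner loop over 128 entries
                        acc += a[i] + b[i];
                    }
                }
                sink = acc;
                return System.currentTimeMillis() - start; // this job's run time in ms
            });
        }
        try {
            List<Future<Long>> results = pool.invokeAll(tasks);
            jobTimes = new long[jobs];
            long slowest = 0;
            for (int j = 0; j < jobs; j++) {
                jobTimes[j] = results.get(j).get();
                slowest = Math.max(slowest, jobTimes[j]);
            }
            return slowest; // the largest job time is reported as the test's run time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```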

Additionally, there is a brief warmup routine that iterates the outer loop 200000 times instead of 2^24 times. There is also a “blank” loop provided for every test that attempts to gauge how much time the system spends handling the do/while loops without their normal contents. Each time a test is run, the “blank” loop runs the same number of times as the normal loop, and the run time for that “blank” loop is subtracted from the normal test’s run time. It isn’t a perfect method for removing the loop handling from consideration, but it does make a small difference.
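A minimal sketch of the “blank” loop correction, assuming the simplest possible version: time the real loop, time an identical loop with an empty body, and subtract. The names are hypothetical, and this uses `System.nanoTime()` where Mathtester reports milliseconds. Note that the JIT is free to delete an empty loop outright, which is one reason the correction is imperfect:

```java
// Hypothetical sketch of the "blank" loop correction (not Mathtester's actual code).
class BlankLoopSketch {
    static volatile int sink; // consumes the real loop's result

    // Times the loop with its normal contents (here: multiply pairs of values).
    static long timeWork(int outer, int[] a, int[] b) {
        long start = System.nanoTime();
        int acc = 0;
        for (int n = 0; n < outer; n++) {
            for (int i = 0; i < a.length; i++) {
                acc += a[i] * b[i];
            }
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    // Times the same loop structure with nothing inside it.
    static long timeBlank(int outer, int innerCount) {
        long start = System.nanoTime();
        for (int n = 0; n < outer; n++) {
            for (int i = 0; i < innerCount; i++) {
                // empty: only loop bookkeeping remains (and the JIT may remove even that)
            }
        }
        return System.nanoTime() - start;
    }

    // Net time = real loop minus blank loop; can come out negative on a noisy run.
    static long netTime(int outer, int[] a, int[] b) {
        return timeWork(outer, a, b) - timeBlank(outer, a.length);
    }
}
```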

The values used in computation are established when the program first starts. Random values ranging from 1 to 13 are generated and loaded into eight different arrays: two integer and two float one-dimensional arrays with 128 entries each, and two integer and two float two-dimensional arrays of size [8][16] (8 rows of 16 entries).
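The start-up data could be generated along these lines. The int1/int2/int4/int5 names follow the ones used in this thread; the float array names and the choice to store whole numbers in the float arrays are assumptions for illustration:

```java
import java.util.Random;

// Hypothetical sketch of the start-up data: random values from 1 to 13 in eight arrays.
class TestData {
    static final Random RNG = new Random();
    static int[] int4 = new int[128], int5 = new int[128];
    static float[] float4 = new float[128], float5 = new float[128];       // assumed names
    static int[][] int1 = new int[8][16], int2 = new int[8][16];
    static float[][] float1 = new float[8][16], float2 = new float[8][16]; // assumed names

    static int next() {
        return 1 + RNG.nextInt(13); // 1..13 inclusive
    }

    static void fill() {
        for (int i = 0; i < 128; i++) {
            int4[i] = next();
            int5[i] = next();
            float4[i] = next(); // assumption: the float arrays hold whole numbers too
            float5[i] = next();
        }
        for (int i = 0; i < 8; i++) {
            for (int j = 0; j < 16; j++) {
                int1[i][j] = next();
                int2[i][j] = next();
                float1[i][j] = next();
                float2[i][j] = next();
            }
        }
    }
}
```

With every value between 1 and 13, the largest possible product is 13 × 13 = 169, so overflow and Infinity aren't a concern, and since no value is 0, the division tests never divide by zero.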

Computed values are stored in the variables int3 or float3, which are never used in any meaningful way. The values computed are small enough that there is no risk of integer overflow or of reaching a float value of Infinity.

For information regarding what any particular test does, inspect the code yourself, and you’ll probably figure it out pretty quickly (just look at the inner loop). Note that the isolated tests are all located in OmniLoop.java in their own self-descriptive methods, while the “Awesomeballs” tests are located in their own individual files (IntegerLoop.java, for example).

0217 build:

Many tweaks have been made, and at least one bug that was causing the "blank" loops not to subtract their run times from the totals has been fixed. The big news is that a new test category is added: benchmark routines.

Benchmark routines? In the "un-benchmark"?

Yes, indeedy! They still don't signify real-world performance, but at least these tests . . . do something. Included in 0217 are loops that calculate the distance formula, the angle between the x-axis and a line defined by two points (I'm using a numerical approximation of arctangent here, not the (in)famous atan2), the quadratic formula, and the volume of a sphere.
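For reference, here are plain float versions of three of those formulas. These are textbook definitions written as a hypothetical `Formulas` class, not the loops in the 0217 source, which may arrange the math differently:

```java
// Hypothetical float versions of three of the 0217 benchmark formulas.
class Formulas {
    // Distance between (x1, y1) and (x2, y2).
    static float distance(float x1, float y1, float x2, float y2) {
        float dx = x2 - x1, dy = y2 - y1;
        return (float) Math.sqrt(dx * dx + dy * dy);
    }

    // Root of a*x^2 + b*x + c = 0 using the "+" branch of the quadratic formula
    // (assumes a != 0 and b^2 - 4ac >= 0).
    static float quadraticRoot(float a, float b, float c) {
        return (float) ((-b + Math.sqrt(b * b - 4 * a * c)) / (2 * a));
    }

    // Volume of a sphere: (4/3) * pi * r^3.
    static float sphereVolume(float r) {
        return (float) (4.0 / 3.0 * Math.PI * r * r * r);
    }
}
```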

It should be pointed out that I am not using Java's built-in Math.atan or Math.atan2 methods for angle calculation. Both proved to be impossibly slow (far too slow for ~103 billion iterations). I chose James Gregory's series (the Taylor series for arctangent) with a k value of 3, which does not give the best accuracy. I may experiment with the improved approximation described later in the linked document, but that can come later, and it may prove to be much slower in execution.
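A sketch of Gregory's series, atan(x) = x - x^3/3 + x^5/5 - x^7/7 + ..., assuming "a k value of 3" means summing the terms for n = 0 through 3 (the actual source may count terms differently). `GregoryAtan` is a hypothetical name:

```java
// Hypothetical sketch of Gregory's series for arctangent, truncated after terms n = 0..k.
// atan(x) = sum over n of (-1)^n * x^(2n+1) / (2n+1); converges only for |x| <= 1.
class GregoryAtan {
    static double atan(double x, int k) {
        double term = x;      // holds x^(2n+1), starting at n = 0
        double x2 = x * x;
        double sum = 0.0;
        for (int n = 0; n <= k; n++) {
            double signed = (n % 2 == 0) ? term : -term; // alternate the sign of each term
            sum += signed / (2 * n + 1);
            term *= x2;                                   // advance to x^(2n+3)
        }
        return sum;
    }
}
```

For the angle formula, x would be the slope (y2 - y1) / (x2 - x1). Because the series only converges for |x| <= 1, and converges slowly near 1 (four terms give 76/105 ≈ 0.7238 at x = 1, versus pi/4 ≈ 0.7854), the accuracy drawback mentioned above shows up most for steep lines.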

0227 build:

The threading model has been altered so that the total number of threads spawned is equal to your machine's thread-handling capacity. It still creates 48 jobs, but it no longer creates one thread per job. This change makes things run a bit faster with the current timing mechanism (much older versions of Mathtester suffered performance loss with this threading model).

Also, all times generated are now time averages. The new threading model altered the timing mechanism's output so that it reported the completion time for individual jobs instead of individual threads. So what Mathtester does now is add together the completion times for all jobs and then divide the total by the number of threads, producing a fairly reliable average completion time for all of the work. The change should result in less variation in completion times between runs.
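A sketch of the 0227 model as described: a fixed pool with one thread per hardware thread (`Runtime.availableProcessors()`), 48 jobs queued on it, and a reported time equal to the sum of the job times divided by the thread count. Class and method names are hypothetical, and each job is assumed to return its own elapsed milliseconds:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the 0227 timing model (not Mathtester's actual code).
class AveragedTimeSketch {
    // Sum of all job times divided by the thread count (integer division truncates).
    static long averagedTime(long[] jobTimesMs, int threads) {
        long total = 0;
        for (long t : jobTimesMs) {
            total += t;
        }
        return total / threads;
    }

    // Runs 'jobs' copies of 'job' on one thread per hardware thread.
    static long runJobs(int jobs, Callable<Long> job) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int j = 0; j < jobs; j++) {
                tasks.add(job);
            }
            List<Future<Long>> futures = pool.invokeAll(tasks);
            long[] times = new long[jobs];
            for (int j = 0; j < jobs; j++) {
                times[j] = futures.get(j).get(); // each job's elapsed ms
            }
            return averagedTime(times, threads);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

On an 8-thread machine, for example, 48 jobs of 4000 ms each would report 48 × 4000 / 8 = 24000 ms.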
 

DrMrLordX

Here's my run: A10-7700k, 4.7 GHz, 2.1 GHz NB, DDR3-2400 10-12-13-32-2T

It took 33367 milliseconds to complete IntegerLoop.
It took 2087 milliseconds to complete IntegerLoopNoDiv.
It took 4535 milliseconds to complete FloatLoop.
It took 2106 milliseconds to complete FloatLoopNoDiv.
It took 4264 milliseconds to complete CastFloatToInt.
It took 2638 milliseconds to complete CastFloatToIntNoDiv.
It took 14492 milliseconds to complete RoundFloatToInt.
It took 16515 milliseconds to complete RoundFloatToIntNoDiv.
It took 39144 milliseconds to complete CastIntToFloat.
It took 2709 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 121857 milliseconds.

It took 2056 milliseconds to complete integer addition.
It took 2062 milliseconds to complete integer multiplication.
It took 70181 milliseconds to complete integer division.
It took 2188 milliseconds to complete float addition.
It took 2305 milliseconds to complete float multiplication.
It took 3881 milliseconds to complete float division.
It took 10366 milliseconds to complete rounding of float to integer.
It took 2 milliseconds to complete casting of float to integer.
It took 3 milliseconds to complete casting of integer to float.

Total execution time for your selection is 93044 milliseconds.

Contents of my config.ini:

IntegerLoop
IntegerLoopNoDiv
FloatLoop
FloatLoopNoDiv
CastFloatToInt
CastFloatToIntNoDiv
RoundFloatToIntAlternate
RoundFloatToIntNoDiv
CastIntToFloat
CastIntToFloatNoDivAlternate
intaddloop
intmultloop
intdivloopalternate
floataddloop
floatmultloop
floatdivloop
roundfloattointloopalternate
castfloattointloop
castinttofloatloopalternate

I am more than a little confused by the RoundFloatToInt/RoundFloatToIntNoDiv results. I'm running what is the more SIMD-friendly code path for RoundFloatToIntNoDiv, *and* it has 1/3 of the total number of calculations to carry out, yet it's slower? Weird stuff. I could dump the JIT's assembly output and pretend I knew what I was looking at, but I'd just be pretending.

It's also funny to see that integer division (intdivloop) is slower than IntegerLoop, which contains three times as many calculations. Furthermore, the alternate code path for intdivloop is the faster one, which is at odds with the behavior exhibited by IntegerLoop. Weird stuff.
 

DrMrLordX

Here's a slower machine: E1-2500 @ stock

It took 209939 milliseconds to complete IntegerLoop.
It took 19569 milliseconds to complete IntegerLoopNoDiv.
It took 17739 milliseconds to complete FloatLoop.
It took 15000 milliseconds to complete FloatLoopNoDiv.
It took 31905 milliseconds to complete CastFloatToInt.
It took 24608 milliseconds to complete CastFloatToIntNoDiv.
It took 113659 milliseconds to complete RoundFloatToInt.
It took 111101 milliseconds to complete RoundFloatToIntNoDiv.
It took 341055 milliseconds to complete CastIntToFloat.
It took 19877 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 904452 milliseconds.

It took 21507 milliseconds to complete integer addition.
It took 18410 milliseconds to complete integer multiplication.
It took 413186 milliseconds to complete integer division.
It took 13804 milliseconds to complete float addition.
It took 17085 milliseconds to complete float multiplication.
It took 25934 milliseconds to complete float division.
It took 88480 milliseconds to complete rounding of float to integer.
It took 4 milliseconds to complete casting of float to integer.
It took 4 milliseconds to complete casting of integer to float.

Total execution time for your selection is 598414 milliseconds.

Config.ini for this machine:

IntegerLoop
IntegerLoopNoDiv
FloatLoop
FloatLoopNoDiv
CastFloatToInt
CastFloatToIntNoDivAlternate
RoundFloatToIntAlternate
RoundFloatToIntNoDiv
CastIntToFloat
CastIntToFloatNoDivAlternate
intaddloop
intmultloop
intdivloopalternate
floataddloop
floatmultloop
floatdivloop
roundfloattointloopalternate
castfloattointloop
castinttofloatloopalternate

Again, some peculiar results, but the behavior of the Jaguar chip seems to be very much in line with that of Steamroller, albeit at about 10-20% speed depending on the test. With only a hair over 1/4 the clockspeed and half the thread capacity, what do you expect? There's no weird-arsed situation where a test with additional instructions runs faster, though.

edit: those of you who are in the habit of panning AMD's FP performance may be scratching your heads at these results. IntegerLoop runs slowly because it contains integer division, which seems to break SIMD all to hell (and cause other strangeness). I am not so sure about my old Stars chip, but I can tell that both Steamroller and Jaguar prefer floating point division to integer division.

It is still interesting to see how, on Jaguar, the floating point code is consistently faster than integer code across the board. For Steamroller, it's kind of a wash until division is involved, at which point Steamroller is much faster handling floating point instructions.

Weird, huh?

edit edit: for whatever reason, my code to pre-configure config.ini b0rked things up pretty good on the E1-2500 machine. It picked the wrong tests in maybe 2-3 situations. I am not sure why that happened. I resorted to using config.slo to start, running a batch, then switching all tests to alternate and re-running the batch and comparing result files.
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
Oh, my head hurts. What does it do in layman's terms?
 

DrMrLordX

Hmm! Good question.

It does a lot of really simple math using a lot of worker threads. Enough worker threads to load up all the cores on any modern desktop x86 CPU, and probably more than a few modern MP systems as well. 48 threads, to be exact.

The total number of loops is around 103 billion, counting all the work done by all the threads. Each loop contains anywhere from 1-3 statements that each do something. What each loop does is at least partially described in the test name.

Like, IntegerLoop has integer math. It has an integer add, integer mult, and integer div. IntegerLoopNoDiv is basically the same thing, except no division.

FloatLoop/FloatLoopNoDiv are the same as IntegerLoop, just using 32-bit floating point values instead.

Some of the code has been set up to let SIMD work, and it looks correspondingly . . . different. For those of you unfamiliar with arrays, it may look a little strange. Arrays are just collections of values of the same kind. In Java, I declare one array like this:

Code:
int[] int4 = new int[128];

int[] just means it's an array of integers. new means . . . it's new. It's named "int4" and it has 128 positions in the array. So I can put an integer (let's say, 7) at int4[0], which is the first position. Then I can put another integer (12?) at int4[1]. Etc.

Two-dimensional arrays work like an x/y grid of numbers. If I have this array:

Code:
int[][] int1 = new int[8][16];

then that means I have an array with 8 positions in its first dimension, each of which holds 16 integer values. 8 * 16 = 128, so the total number of integer values in int1 will be the same as in int4. They're just stored and accessed a little bit differently.
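To make the "same 128 values, different layout" point concrete: position [i][j] in an int[8][16] array corresponds to position i * 16 + j in a flat int[128]. A small hypothetical helper (not part of Mathtester) that copies one into the other:

```java
// Hypothetical helper: copies a rectangular int[rows][cols] grid into a flat array,
// placing element [i][j] at index i * cols + j.
class ArrayLayout {
    static int[] flatten(int[][] grid) {
        int cols = grid[0].length;
        int[] flat = new int[grid.length * cols];
        for (int i = 0; i < grid.length; i++) {
            for (int j = 0; j < cols; j++) {
                flat[i * cols + j] = grid[i][j];
            }
        }
        return flat;
    }
}
```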

With that in mind, code like this:

Code:
i = 0;

do
{
	int3 = Mathtester.int1[i][0] + Mathtester.int2[i][0];
	int3 = Mathtester.int1[i][1] + Mathtester.int2[i][1];
	int3 = Mathtester.int1[i][2] + Mathtester.int2[i][2];
	int3 = Mathtester.int1[i][3] + Mathtester.int2[i][3];
	int3 = Mathtester.int1[i][4] + Mathtester.int2[i][4];
	int3 = Mathtester.int1[i][5] + Mathtester.int2[i][5];
	int3 = Mathtester.int1[i][6] + Mathtester.int2[i][6];
	int3 = Mathtester.int1[i][7] + Mathtester.int2[i][7];
	int3 = Mathtester.int1[i][8] + Mathtester.int2[i][8];
	int3 = Mathtester.int1[i][9] + Mathtester.int2[i][9];
	int3 = Mathtester.int1[i][10] + Mathtester.int2[i][10];
	int3 = Mathtester.int1[i][11] + Mathtester.int2[i][11];
	int3 = Mathtester.int1[i][12] + Mathtester.int2[i][12];
	int3 = Mathtester.int1[i][13] + Mathtester.int2[i][13];
	int3 = Mathtester.int1[i][14] + Mathtester.int2[i][14];
	int3 = Mathtester.int1[i][15] + Mathtester.int2[i][15];

	int3 = Mathtester.int1[i][0] * Mathtester.int2[i][0];
	int3 = Mathtester.int1[i][1] * Mathtester.int2[i][1];
	int3 = Mathtester.int1[i][2] * Mathtester.int2[i][2];
	int3 = Mathtester.int1[i][3] * Mathtester.int2[i][3];
	int3 = Mathtester.int1[i][4] * Mathtester.int2[i][4];
	int3 = Mathtester.int1[i][5] * Mathtester.int2[i][5];
	int3 = Mathtester.int1[i][6] * Mathtester.int2[i][6];
	int3 = Mathtester.int1[i][7] * Mathtester.int2[i][7];
	int3 = Mathtester.int1[i][8] * Mathtester.int2[i][8];
	int3 = Mathtester.int1[i][9] * Mathtester.int2[i][9];
	int3 = Mathtester.int1[i][10] * Mathtester.int2[i][10];
	int3 = Mathtester.int1[i][11] * Mathtester.int2[i][11];
	int3 = Mathtester.int1[i][12] * Mathtester.int2[i][12];
	int3 = Mathtester.int1[i][13] * Mathtester.int2[i][13];
	int3 = Mathtester.int1[i][14] * Mathtester.int2[i][14];
	int3 = Mathtester.int1[i][15] * Mathtester.int2[i][15];

	i++;
}
while (i < 8);

might make a little bit more sense. This code here is an example of just one of many inner loops in Mathtester. This one is IntegerLoopNoDiv. The value of i is used as a loop counter, and it goes from 0 to 7. The statements inside the loop check arrays int1 and int2 at position i, and carry out instructions using all 16 integers stored at that position. It may not be obvious, but all this code is really doing is an add and a multiply 128 times.

Then the outer loop (not shown here) repeats that stuff 16777216 times.

You could rewrite the inner loop above to make it easier to understand, and it would look like this (IntegerLoopNoDivAlternate has this code):

Code:
i = 0;
			
do
{
	int3 = Mathtester.int4[i] + Mathtester.int5[i];
	int3 = Mathtester.int4[i] * Mathtester.int5[i];
				
	i++;
}
while (i < 128);

Pretty straightforward. The value i goes from 0 to 127, and each time it does, it grabs the values from int4 and int5 at position i and then adds them and multiplies them together. In the end, it's the same darn thing as in the standard IntegerLoopNoDiv, just laid out differently.

While that code is much easier to understand, Java runs it slower, at least on my 7700k and on a Jaguar. I've included both blocks of code in Mathtester in case there's a CPU out there where the JVM will make the simpler code run faster.
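A quick way to convince yourself that the two layouts do the same math: total up every add and multiply result both ways and compare. This is a hypothetical check, not Mathtester code, and the inner `for` loops stand in for the 16 hand-written statements, so it doesn't reproduce the SIMD-friendly layout itself:

```java
// Hypothetical equivalence check for the two IntegerLoopNoDiv layouts.
class LayoutEquivalence {
    // Simple layout: one pass per index over int[128] arrays.
    static long simple(int[] a, int[] b) {
        long total = 0;
        int i = 0;
        do {
            total += a[i] + b[i];
            total += a[i] * b[i];
            i++;
        } while (i < 128);
        return total;
    }

    // Grid layout: 8 passes over int[8][16] arrays, 16 adds then 16 multiplies per pass.
    static long grid(int[][] a, int[][] b) {
        long total = 0;
        int i = 0;
        do {
            for (int j = 0; j < 16; j++) {
                total += a[i][j] + b[i][j];
            }
            for (int j = 0; j < 16; j++) {
                total += a[i][j] * b[i][j];
            }
            i++;
        } while (i < 8);
        return total;
    }
}
```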

And that's pretty much what it does. A lot of simple math, over and over and over again. There's some "fancy" stuff in there to make sure that all the non-math stuff I had to do to get 48 threads running and time those threads doesn't slow things down unnecessarily or poison the results. What you see when the program spits out a time in milliseconds is pretty close to the amount of time it took churning through all that math and not much else.
 

DrMrLordX

0217 has a bug with sphereloopalternate that is causing the JVM to NOOP the code (run time is ~1 millisecond for the entire test). For the time being, stick to sphereloop until I can fix the problem.

Anyway, my run for 0217 thus far:


It took 24199 milliseconds to complete IntegerLoop.
It took 2015 milliseconds to complete IntegerLoopNoDiv.
It took 3874 milliseconds to complete FloatLoop.
It took 3538 milliseconds to complete FloatLoopNoDiv.
It took 3337 milliseconds to complete CastFloatToInt.
It took 1670 milliseconds to complete CastFloatToIntNoDiv.
It took 11739 milliseconds to complete RoundFloatToInt.
It took 10675 milliseconds to complete RoundFloatToIntNoDiv.
It took 37845 milliseconds to complete CastIntToFloat.
It took 2576 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 101468 milliseconds.

It took 2054 milliseconds to complete integer addition.
It took 2028 milliseconds to complete integer multiplication.
It took 79710 milliseconds to complete integer division.
It took 2327 milliseconds to complete float addition.
It took 2272 milliseconds to complete float multiplication.
It took 3926 milliseconds to complete float division.
It took 8648 milliseconds to complete rounding of float to integer.
It took 2 milliseconds to complete casting of float to integer.
It took 4 milliseconds to complete casting of integer to float.

Total execution time for your selection is 100971 milliseconds.

It took 26914 milliseconds to complete distance formula.
It took 34405 milliseconds to complete angle formula.
It took 34409 milliseconds to complete quadratic formula.
It took 5505 milliseconds to complete sphere volume formula.

Total execution time for your selection is 101233 milliseconds.

As expected, small reductions in run times across the board due to a bug fix. Though integer division is slower now than it used to be . . . might be a bum run there, who knows.
 

DrMrLordX

I have (hopefully) fixed the problem with sphereloopalternate. Now the alternate code path should run fine. All I had to do was switch from an [8][16] array to a [16][8] array, which produced more inner loop iterations (16 instead of 8). This is the first time I've seen the JVM NOOP anything using my partially-unrolled loops. Learn something new every day . . .

In any case, the fixed .class files and source are both available at the above links, so you can redownload if you need to do so.
 

DrMrLordX

New build available: build 0227

https://www.dropbox.com/s/jxuaf3v26kb856a/mathtesterclasses02272015fixed.zip?dl=0
.class files. Use runme.bat as always. Java 8 required.

https://www.dropbox.com/s/c93z1om51d75l3f/mathtestersource02272015fixed.zip?dl=0
Source files.

VirusTotal reports the .class file archive to be safe:
https://www.virustotal.com/en/file/...3dca20788e3a6126cdd6f9b2/analysis/1425109390/

Long story short, I introduced a new threading model hoping it would speed things up a bit. It did, but I really jumped the gun on how much. So there was a broken version of 0227 returning insanely fast results.

So, I fixed the problem where I was grossly underestimating run times because of the way the new threading model altered the program's reporting mechanism. Now it reports time averages by adding together the run times for each individual job (all 48 of them) and then dividing the total by the number of threads your machine can run. The resulting time is a fair representation of how long it should take your CPU, on average, to complete an entire test. It should also make the autoconfig sequence far more reliable. Should.

7700k on fixed code:

It took 24435 milliseconds to complete IntegerLoop.
It took 2196 milliseconds to complete IntegerLoopNoDiv.
It took 4003 milliseconds to complete FloatLoop.
It took 2294 milliseconds to complete FloatLoopNoDiv.
It took 4076 milliseconds to complete CastFloatToInt.
It took 3043 milliseconds to complete CastFloatToIntNoDiv.
It took 11313 milliseconds to complete RoundFloatToInt.
It took 9158 milliseconds to complete RoundFloatToIntNoDiv.
It took 32416 milliseconds to complete CastIntToFloat.
It took 2711 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 95645 milliseconds.

It took 2782 milliseconds to complete integer addition.
It took 2098 milliseconds to complete integer multiplication.
It took 69005 milliseconds to complete integer division.
It took 2392 milliseconds to complete float addition.
It took 2452 milliseconds to complete float multiplication.
It took 4055 milliseconds to complete float division.
It took 6692 milliseconds to complete rounding of float to integer.
It took 11 milliseconds to complete casting of float to integer.
It took 5 milliseconds to complete casting of integer to float.

Total execution time for your selection is 89492 milliseconds.

It took 26258 milliseconds to complete distance formula.
It took 33439 milliseconds to complete angle formula.
It took 33525 milliseconds to complete quadratic formula.
It took 4373 milliseconds to complete sphere volume formula.

Total execution time for your selection is 97595 milliseconds.

Just a bit faster.
 

DrMrLordX

New build available: build 0311

https://www.dropbox.com/s/wfhu70qbwdtmio9/mathtesterclasses03112015.zip?dl=0
.class files. Use runme.bat or java mathtester.Mathtester. Needs Java 8.

https://www.dropbox.com/s/53zuk227qr4wdpn/mathtestersource03112015.zip?dl=0
Source files.

VirusTotal says it's safe.

The only change here is the addition of a fourth option, which is a mixed selection of tests. This mode of operation loads one instance of every test available in Mathtester (yes, this includes all the alternate code paths) and cues each one up as a job in the thread pool. So, you're running all the Mathtester code simultaneously. Currently there are only 46 different tests available, so the last 2 of the 48 are selected randomly.
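The batch assembly for mixed mode might look roughly like this: every test once, then random picks until the 48 slots are full. The generic helper and its names are hypothetical, and the real selection code may differ:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the 0311 mixed mode: one job per available test,
// then random picks to fill the rest of the batch (46 tests -> 2 random extras of 48).
class MixedBatch {
    static <T> List<T> build(List<T> allTests, int batchSize, Random rng) {
        List<T> batch = new ArrayList<>(allTests);
        while (batch.size() < batchSize) {
            batch.add(allTests.get(rng.nextInt(allTests.size())));
        }
        return batch;
    }
}
```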

In other news, I have HSA working marginally on my system. It may be very, very difficult for me to present HSA software in redistributable form since a). it would only run on Linux and b). you'd have to install the same software stack that I have installed which is a bitch to configure. But I'll see what I can do and offer up the best solution that I can with aparapi-lambda. I haven't tried the Sumatra alpha/beta/whateveritis but I may try that sometime soon-ish.
 

DrMrLordX

Whoops, I fixed a problem in which I forgot to provide any of the complementary files with the .class archive. No runme.bat or license file or anything. My mistake! The archive has been fully stocked with the right stuff.

In a separate issue, it would appear that Java 8 update 40 has ruined Math.round() performance in all of my code that uses it. It's now nearly 20x slower than before. I just updated the JVM on my Win10 partition and ran tests before and after, and the update killed performance. I'm guessing Math.round() was available for SIMD optimization by the JVM before the update and (for some reason) no longer is after the update. Quite a shame really.
 

DrMrLordX

New build available: build 0327

https://www.dropbox.com/s/21p61jcwjv9w3js/mathtesterclasses03272015.zip?dl=0
.class files. The usual drill here . . .

https://www.dropbox.com/s/5v0arvstf9y1zw9/mathtestersource03272015.zip?dl=0
source files.

VirusTotal says it's safe.

This version has a fairly significant bugfix related to things I learned tracking down the changes in Java 8u40. It turns out that all previous versions of Mathtester referenced in this thread had problems with the JVM selectively NOOPing certain elements of code ("dead code") while still handling the loop logic normally. So, there were a lot of tests just running empty loops or partially-empty loops, which is one reason why some of the execution times were so low. In a few select cases, Java 8u40 stopped doing that (such as in my Math.round() code), which is why it took longer to run that code in u40 than in u31. I had to implement a fairly convoluted system of recording the computational results of the tests in order to stop the JVM from eliminating "dead code" (doing this without breaking SIMD was an interesting challenge) everywhere. The sum is posted at the end of every test circuit as a "nonsense" or "nonsense number". I have gone to no effort to reset the value of the sum to 0 between tests, so sometimes the number builds up as you run more tests.
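The general idea behind the fix, in a much simpler form than what Mathtester actually does: fold every result into a running sum that gets printed, so the JIT can't prove the math is unused. This hypothetical version is not the author's SIMD-preserving scheme, and accumulating like this may itself change how the JIT vectorizes the loop:

```java
// Hypothetical sketch of defeating dead-code elimination with a "nonsense number".
class NonsenseSum {
    static long nonsense = 0; // never reset between tests, so it keeps growing

    static void integerLoopNoDiv(int[] a, int[] b, int outer) {
        long local = 0;
        for (int n = 0; n < outer; n++) {
            for (int i = 0; i < a.length; i++) {
                int sum = a[i] + b[i];
                int product = a[i] * b[i];
                local += sum + product; // results are consumed, so they can't be discarded
            }
        }
        nonsense += local; // printed after the test in the real program
    }
}
```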

Due to the code actually running all the mathematical operations it was always meant to run, and due to the convoluted method of recording the results and adding them all together, the actual run time of the program is significantly longer. For this reason, I highly recommend that you stick with your old config.ini from version 0311 until you've had time to check every test and its alternate individually to see which will run fastest on your machine.

Contrary to what some might have expected, this is not the HSAIL build. That is also (mostly) complete, but I'll probably reference that elsewhere . . . perhaps in the HSA thread that I started recently.
 