java test: unexpected results on AMD CPUs


Ramses

Platinum Member
Apr 26, 2000
FX-9590
Beginning entire batch.
It took 46642 milliseconds to complete IntegerLoop.
It took 87099 milliseconds to complete FloatLoop.
It took 15346 milliseconds to complete IntegerLoopNoDiv.
It took 45124 milliseconds to complete FloatLoopNoDiv.
It took 46036 milliseconds to complete IntegerLoopWithLatch.
It took 85097 milliseconds to complete FloatLoopWithLatch.
It took 15347 milliseconds to complete IntegerLoopWithLatchNoDi
It took 46140 milliseconds to complete FloatLoopWithLatchNoDiv.

Total execution time for your selection is 386831 milliseconds.
 

DrMrLordX

Lifer
Apr 27, 2000
Thanks! I added that to the record in the OP. Your Integer nodiv performance is doubled by the latest code. Eliminating rollovers seemed to help your chip in ways vastly different from my Stars chip. I got more gains where Integer division was present, but your times there stayed virtually the same. My gains from Integer nodiv were certainly more modest when moving from 9/22 to 9/26. Different architectures, different results.

edit: interesting to see that your Float results are lower across the board. Perhaps you were not paying much of a penalty when handling float3 as Infinity, or maybe you were just gaining a lot from the failed incrementation of float1/float2. Or maybe the conversion of increment++ (now j++/increment++) to floating point instructions is the cause of it.
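For anyone wondering what "rollover", "float3 as Infinity" and "failed incrementation" look like in plain Java, here's a small standalone sketch. It is not code from the benchmark; the class and method names are made up. It only shows what Java does at those edges, not which effect actually explains the FX-9590 Float times.

```java
// Hypothetical demo class (not part of the benchmark) showing the numeric
// edge cases discussed above: int rollover, float overflow to Infinity,
// and a float that stops incrementing.
final class NumericEdgeCases {

    // Adding 1 to Integer.MAX_VALUE silently wraps around to Integer.MIN_VALUE.
    static int intRollover() {
        int x = Integer.MAX_VALUE;
        x++;
        return x;
    }

    // Multiplying past Float.MAX_VALUE gives positive Infinity; once a float
    // accumulator is Infinity, further adds and mults leave it at Infinity.
    static float floatOverflow() {
        float f = Float.MAX_VALUE;
        f *= 2.0f;
        return f;
    }

    // Floats represent every integer up to 2^24 exactly; 2^24 + 1 is not
    // representable, so 16777216f + 1.0f rounds back to 16777216f.
    static boolean floatIncrementStalls() {
        float f = 16777216f; // 2^24
        float next = f + 1.0f;
        return next == f;
    }

    public static void main(String[] args) {
        System.out.println(intRollover());          // prints -2147483648
        System.out.println(floatOverflow());        // prints Infinity
        System.out.println(floatIncrementStalls()); // prints true
    }
}
```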
 

Ramses

Platinum Member
Apr 26, 2000
I'll run it again without my hardware monitoring stuff on, I've known it to muck with test results for some reason here and there. I should probably also turn off CnQ and all that and see what it does. Will do this evening or tomorrow. Thanks for the instructions, I should have downloaded and looked before I asked. :)
 

SlowSpyder

Lifer
Jan 12, 2005
Ramses said: I'll run it again without my hardware monitoring stuff on, I've known it to muck with test results for some reason here and there. [...]


I noticed coretemp bounced around on my display a little while running this. Just a few pixels one way then back every few seconds.
 

Ramses

Platinum Member
Apr 26, 2000
I use SpeedFan for fan control. It's been less bothersome than a lot of monitoring software, but I've definitely seen it affect benchmarks and even games sometimes. Dunno if that is an AMD thing or not, but it's there in my system. It isn't a problem unless load is extreme.
 

biostud

Lifer
Feb 27, 2003
5820K @ 4.4GHz

Original code:
IntegerLoop: 53551 ms
FloatLoop: 40257 ms

Original, no division:
IntegerLoop: 9422 ms
FloatLoop: 31806 ms

Latch code:
IntegerLoop: 48481 ms
FloatLoop: 40303 ms

Latch code, no division:
IntegerLoop: 9503 ms
FloatLoop: 31909 ms
 

Maximilian

Lifer
Feb 8, 2004
Updated GUI version for those having trouble with the .bat (IE won't moan about viruses; only Chrome seems to).


i5 4570 @ stock
Running batch mode (all code)...
It took 110155 milliseconds to complete IntegerLoop.
It took 78311 milliseconds to complete FloatLoop.
It took 19136 milliseconds to complete IntegerLoopNoDiv.
It took 57978 milliseconds to complete FloatLoopNoDiv.
It took 110787 milliseconds to complete IntegerLoopWithLatch.
It took 78052 milliseconds to complete FloatLoopWithLatch.
It took 18937 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 58678 milliseconds to complete FloatLoopWithLatchNoDiv.
Total execution time for your selection is 532034 milliseconds.

Benchmark program:
https://www.dropbox.com/s/bkjhe5l2ycixm6e/benchmark2609.jar?dl=0

Source for GUI version:
https://www.dropbox.com/sh/pwbe2pr42z4h6xb/AADv7_wCU7ONZqeZZvaVzzIaa?dl=0

I used a glue object to act as a medium between OmniCode and the View so the View doesn't lock up waiting for the return of the long value.
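The glue-object idea can be sketched roughly like this. This is not Maximilian's actual source; BenchmarkGlue, runAsync and fakeBenchmark are hypothetical names, and the long-returning benchmark is just a stand-in for an OmniCode call.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

// Hypothetical "glue" object: runs a long benchmark on a background thread and
// hands the elapsed time back through a callback, so the caller never blocks.
final class BenchmarkGlue {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // benchmark: returns elapsed milliseconds (a stand-in for an OmniCode call).
    // onResult: invoked on the worker thread once the benchmark finishes.
    void runAsync(LongSupplier benchmark, Consumer<Long> onResult) {
        worker.execute(() -> onResult.accept(benchmark.getAsLong()));
    }

    // Lets the worker thread exit once queued benchmarks are done.
    void shutdown() {
        worker.shutdown();
    }

    static volatile long sink;

    // Stand-in benchmark: busy work, returns elapsed milliseconds.
    static long fakeBenchmark() {
        long start = System.nanoTime();
        long acc = 0;
        for (int i = 0; i < 50_000_000; i++) {
            acc += i;
        }
        sink = acc; // publish the result so the loop isn't dead code
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        BenchmarkGlue glue = new BenchmarkGlue();
        glue.runAsync(BenchmarkGlue::fakeBenchmark,
                ms -> System.out.println("It took " + ms + " milliseconds."));
        System.out.println("Caller is free while the benchmark runs.");
        glue.shutdown(); // the queued benchmark still completes
    }
}
```

In a Swing GUI the callback above runs on the worker thread, so it should pass the value to the UI with SwingUtilities.invokeLater before touching any component. SwingWorker is the JDK's built-in take on the same idea.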
 

DrMrLordX

Lifer
Apr 27, 2000
SlowSpyder said: I noticed coretemp bounced around on my display a little while running this. Just a few pixels one way then back every few seconds.

That's odd. Wonder why it's doing that?

biostud said: [5820K @ 4.4GHz results, quoted from above]

I'm assuming you're running the 9/26 version here. Regardless, it looks like eliminating int rollovers has sped up the nodiv integer code by quite a bit on your system. Should be interesting to see what happens on the next build with your chip (see below).

Maximilian said: [updated GUI version and i5 4570 results, quoted from above]

Great, thanks! I'll update the OP to link to the latest version of the GUI test.

Obviously you've seen your times quicken moving from 9/22 to 9/26. I haven't done all the comparatives, but it looks like some of the HT-induced skew between Integer and Float loop execution time has been altered or just flat-out eliminated. Just looking at IntegerLoop and FloatLoop, you now have better Float times like Biostud, and the relative difference is very close. You've got ~.711 seconds spent on FloatLoop per second spent on IntegerLoop, while Biostud has ~.752.
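The ratios above are just FloatLoop time divided by IntegerLoop time. A tiny sketch that recomputes them from the posted numbers (LoopRatio is a made-up name):

```java
// Recomputes the FloatLoop/IntegerLoop ratios quoted above from the posted times.
final class LoopRatio {

    // Seconds spent on FloatLoop per second spent on IntegerLoop.
    static double floatPerInt(long floatMs, long intMs) {
        return (double) floatMs / intMs;
    }

    public static void main(String[] args) {
        // i5 4570 @ stock: FloatLoop 78311 ms, IntegerLoop 110155 ms
        System.out.printf("i5 4570: %.3f%n", floatPerInt(78311, 110155));
        // 5820K @ 4.4GHz: FloatLoop 40257 ms, IntegerLoop 53551 ms
        System.out.printf("5820K:   %.3f%n", floatPerInt(40257, 53551));
    }
}
```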

Working on a new build now. It may or may not see completion today, but we'll see. The main difference will be elimination of the if test in the division loops (all I did was change int1, int2, float1, and float2 to have starting values of 1 to eliminate the possibility of division by 0). Testing thus far has shown a significant increase in division loop performance, which is fine since I didn't set out to test the performance of if tests.

I am also going to look at switching the while loop to a for loop to see how that will improve performance, if at all.

Oh, I also need to look at slapping a simple open source license on this thing. I'm thinking BSD 2-clause. Anyone have any objection to that?
 

SlowSpyder

Lifer
Jan 12, 2005
I just ran this on my A6-3400M (overclocked). I'm about to run it at stock clocks and have my Kill A Watt running. If you like to tweak, you should look into your p-states; I'm able to undervolt and overclock at the same time. Puts a little life in this budget machine (so does the SSD!). I'll edit this with power numbers and results in a bit.

A6-3400M (1.4GHz @ 1.075v / 2.3GHz turbo @ 1.375v) <- Factory settings and voltages.
(~18 watts idle, ~39 watts IntegerLoop, ~38 watts FloatLoop)
Running original code...
It took 578590 milliseconds to complete IntegerLoop.
It took 301283 milliseconds to complete FloatLoop.
Total execution time for your selection is 879873 milliseconds.


A6-3400M OC (1.667GHz @ 1.0375v / 2.733GHz turbo @ 1.3125v) <- P-states changed.
(~16.5 watts idle, ~39 watts IntegerLoop, ~ )
Running original code...
It took 503550 milliseconds to complete IntegerLoop.
It took 260329 milliseconds to complete FloatLoop.
Total execution time for your selection is 763879 milliseconds.

*edit - Added factory settings / clock results.
 

Ramses

Platinum Member
Apr 26, 2000
Need any more old CPU data? I've got a lappy with a TK-57 and a wayback 3.2GHz P4 HT Socket 478 chip in the shop box.
 

DrMrLordX

Lifer
Apr 27, 2000
Any and all data is appreciated (thanks SlowSpyder, will update the OP again when I have more time). Bear in mind that I have another update in testing with some additions that may take a long time to execute, especially on older systems like that P4. But if you want to run 9/26 then have at it!
 

Ramses

Platinum Member
Apr 26, 2000
I started it on the P4, after it took right at 30 minutes on the first test, I decided to run it later.. lol... wow..
 

DrMrLordX

Lifer
Apr 27, 2000
Ramses said: I started it on the P4, after it took right at 30 minutes on the first test, I decided to run it later.. lol... wow..

30 minutes? Sweet. If you think that's bad, get a load of some of the new stuff I've added:

Introducing the 9/27 build of Awesomeballs. A few changes:

I removed the if test intended to avoid division-by-zero in the division-enabled loops. To avoid immediate division by zero, int2/float2 now start at 1 and reset to 1 at the completion of each loop. To be fair, I made the same change even in the loops that feature no division. After all, it is possible that the first set of operations was never happening (effectively it was int3/float3 += 0 + 0 each time the loop started anew). Also, thanks to the old if statement, the first division never happened. Now it does, even if it is division by 1.
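To illustrate the change, here's a rough sketch of the old and new division-loop styles. It is NOT the actual Awesomeballs source: loop counts are parameters, plain for loops are used for clarity, and DivisionLoopSketch and its methods are hypothetical. It does show the effect described above: with the old start-at-0 style, the first pass adds 0 + 0 and skips the division entirely.

```java
// Hypothetical sketch of the 9/27 change (not the actual Awesomeballs source).
final class DivisionLoopSketch {

    // Old style: counters start at 0, and an if test skips division by zero.
    // On the first inner pass this adds 0 + 0 and never divides.
    static int withIfTest(int outer, int inner) {
        int int3 = 0;
        for (int o = 0; o < outer; o++) {
            int int1 = 0, int2 = 0;
            for (int i = 0; i < inner; i++) {
                int3 += int1 + int2;
                if (int2 != 0) {
                    int3 += int1 / int2;
                }
                int1++;
                int2++;
            }
        }
        return int3;
    }

    // New style: counters start at 1 and are reset to 1 each time the inner
    // loop restarts, so the first division is 1 / 1 and no if test is needed.
    static int startAtOne(int outer, int inner) {
        int int3 = 0;
        for (int o = 0; o < outer; o++) {
            int int1 = 1, int2 = 1; // reset at the start of every inner loop
            for (int i = 0; i < inner; i++) {
                int3 += int1 + int2;
                int3 += int1 / int2;
                int1++;
                int2++;
            }
        }
        return int3;
    }

    public static void main(String[] args) {
        System.out.println(withIfTest(1, 3) + " vs " + startAtOne(1, 3)); // prints 8 vs 15
    }
}
```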

A wee bit of testing showed that the quickest loops were, in fact, do/while loops. Apparently I'm not the only one who has made this discovery (see the last answer that shows differences in bytecode). So, I converted the while loop and for loop both to do/while loops. The internal set of instructions should still iterate 2^31 times.
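For reference, here are the three loop shapes side by side in a minimal sketch (LoopShapes is a made-up name, and this says nothing about which one is faster on a given JVM). The one behavioral difference worth remembering is that a do/while checks its condition after the body, so the body always runs at least once.

```java
// Hypothetical sketch of the three loop shapes; each returns how many times
// its body ran. The count is a long so 2^31 iterations don't overflow it.
final class LoopShapes {

    static long whileLoop(long n) {
        long count = 0;
        long i = 0;
        while (i < n) {
            count++;
            i++;
        }
        return count;
    }

    static long forLoop(long n) {
        long count = 0;
        for (long i = 0; i < n; i++) {
            count++;
        }
        return count;
    }

    // do/while tests the condition after the body, so the body always runs
    // at least once; it matches the other two only when n >= 1.
    static long doWhileLoop(long n) {
        long count = 0;
        long i = 0;
        do {
            count++;
            i++;
        } while (i < n);
        return count;
    }

    public static void main(String[] args) {
        long n = 1L << 31; // 2^31, the iteration count mentioned above
        System.out.println(whileLoop(n) + " " + forLoop(n) + " " + doWhileLoop(n));
    }
}
```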

Once I had all that tested and working, I added three new code paths that I had been planning for a very short while: casting and rounding code. All three feature mixed integer and float operations. The code can do a better job of describing what's involved than anything else, methinks. Here's the key code from CastFloatToInt:

Code:
int3 += (int)(float1 + float2);
int3 += (int)(float1 * float2);
int3 += (int)(float1 / float2);

Basically, there's CastFloatToInt + CastFloatToIntNoDiv, RoundFloatToInt + RoundFloatToIntNoDiv, and CastIntToFloat + CastIntToFloatNoDiv (there's no point in trying to use Math.round() when converting integers to floats). All three work pretty much like the normal code, except they have casting or Math.round() statements such as the ones above. In the float-to-int code, the incrementing variables are integers, while in the int-to-float code, the incrementing variables are floats.
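Here's a single-iteration sketch of the three mixed int/float paths, built around the CastFloatToInt lines above. The round and int-to-float versions are my guess at the pattern, not the posted source. In particular, whether the real CastIntToFloat casts before or after integer division is an assumption, and CastRoundSketch is a made-up name.

```java
// Hypothetical single-iteration versions of the three mixed int/float paths.
final class CastRoundSketch {

    // CastFloatToInt-style: (int) truncates toward zero.
    static int castFloatToInt(float float1, float float2) {
        int int3 = 0;
        int3 += (int) (float1 + float2);
        int3 += (int) (float1 * float2);
        int3 += (int) (float1 / float2);
        return int3;
    }

    // RoundFloatToInt-style: Math.round(float) returns an int; ties round
    // toward positive infinity (4.5f becomes 5).
    static int roundFloatToInt(float float1, float float2) {
        int int3 = 0;
        int3 += Math.round(float1 + float2);
        int3 += Math.round(float1 * float2);
        int3 += Math.round(float1 / float2);
        return int3;
    }

    // CastIntToFloat-style: int operands, float accumulator. Here the integer
    // division happens before the cast (an assumption about the real code).
    static float castIntToFloat(int int1, int int2) {
        float float3 = 0f;
        float3 += (float) (int1 + int2);
        float3 += (float) (int1 * int2);
        float3 += (float) (int1 / int2);
        return float3;
    }

    public static void main(String[] args) {
        System.out.println(castFloatToInt(2.5f, 2.0f));  // prints 10
        System.out.println(roundFloatToInt(2.5f, 2.0f)); // prints 11
        System.out.println(castIntToFloat(7, 2));        // prints 26.0
    }
}
```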

Be warned that the rounding code is especially slow. Yes, Ramses, I'm thinking about your P4 when I say that.

Without further ado, here are the .class files for the 9/27 build. Source is also available.

For all you Chrome users, I went to the trouble of checking the .class archive at virscan.org. It's clean, so far as I can tell.

Note: after today, I'm going to slap a BSD 2-clause license on future builds and, if possible (which I'm pretty sure it is), all previous builds. Still haven't done it to allow more time for comment and, frankly, because I'm a bit lazy.

edit: I forgot to post my times on 9/27. Here they are:

Stars chip:

Beginning entire batch.
It took 147884 milliseconds to complete IntegerLoop.
It took 176483 milliseconds to complete FloatLoop.
It took 37215 milliseconds to complete IntegerLoopNoDiv.
It took 102077 milliseconds to complete FloatLoopNoDiv.
It took 108602 milliseconds to complete IntegerLoopWithLatch.
It took 169686 milliseconds to complete FloatLoopWithLatch.
It took 36515 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 100453 milliseconds to complete FloatLoopWithLatchNoDiv.
It took 194541 milliseconds to complete CastFloatToInt.
It took 114170 milliseconds to complete CastFloatToIntNoDiv.
It took 400444 milliseconds to complete RoundFloatToInt.
It took 263148 milliseconds to complete RoundFloatToIntNoDiv.
It took 398612 milliseconds to complete CastIntToFloat.
It took 134204 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 2384034 milliseconds.

Jaguar chip:

Beginning entire batch.
It took 353819 milliseconds to complete IntegerLoop.
It took 587758 milliseconds to complete FloatLoop.
It took 147106 milliseconds to complete IntegerLoopNoDiv.
It took 481985 milliseconds to complete FloatLoopNoDiv.
It took 340907 milliseconds to complete IntegerLoopWithLatch.
It took 591690 milliseconds to complete FloatLoopWithLatch.
It took 146562 milliseconds to complete IntegerLoopWithLatchNoDiv.
It took 477767 milliseconds to complete FloatLoopWithLatchNoDiv.
It took 977271 milliseconds to complete CastFloatToInt.
It took 428923 milliseconds to complete CastFloatToIntNoDiv.
It took 1513947 milliseconds to complete RoundFloatToInt.
It took 1010457 milliseconds to complete RoundFloatToIntNoDiv.
It took 861010 milliseconds to complete CastIntToFloat.
It took 600603 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 8519805 milliseconds.

Notice the big difference in IntegerLoop and IntegerLoopWithLatch times on the Stars chip. I can't speak for the Jaguar (it didn't have as many test runs), but the Stars chip showed completion times of around 80-160 seconds for either of the div-enabled IntegerLoops (average probably 120-140 seconds). It's still faster than 9/26 regardless of where it falls in the range, but the variation is something I can't explain away just now. A bit odd.
 

DrMrLordX

Lifer
Apr 27, 2000
I have gone ahead and applied a BSD 2-clause license. The only effect is to guarantee that there are virtually no restrictions on distribution or use of the source code or binaries. It probably won't make a big difference one way or the other, but hey, at least now it's official.

Probably no build for today, especially since 9/27 seems to work well without any horrendous flaws à la the 9/24 build. Guess it's time to brush up on my C++ to see if I can do some builds with that. I have some other ideas on what else I might also add to Mathtester/Awesomeballs/whateveryouwanttocallit but that can wait.
 

DrMrLordX

Lifer
Apr 27, 2000
I'll see if I can toast my old tk-57 laptop with it some here in a bit.

Go for it! It might help to run the tests individually instead of the entire batch, especially when it comes to the new ones. I guess it depends on how long you can afford to let it sit there, doing nothing but churning through goofy Java.
 

DrMrLordX

Lifer
Apr 27, 2000
Various distractions, plus work on the 9/30 build, have prevented work on a C++ version from achieving any meaningful progress.

But that's okay, because I did put together a 9/30 build. As always, there is source available.

Some notable changes:

All latch code has been removed. It did not seem to be helping anyone in the most recent builds.

A new set of tests has been introduced. These new tests (sort of) resemble a more-traditional microbenchmark. They isolate a particular instruction inside a fairly-uniform loop structure, rather than groups of related instructions as in the "Awesomeballs" tests. These new tests are much faster, don't loop as often, and have fewer instructions to complete per loop iteration.

So, instead of testing integer adds, mults, and divs in a single loop, we can now test adds in one loop, mults in another, and divs in yet another. Same deal for float.
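A minimal sketch of that "one operation per loop" structure, assuming System.nanoTime() timing and a volatile sink so the JIT can't throw the loop away. This is not the 9/30 source, and IsolatedOps and its methods are hypothetical. Keep in mind that the JIT may unroll or even simplify loops like these, so tiny times like 16-30 ms say as much about the loop as about the operation.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical isolated tests: one kind of operation per loop, same loop shape.
// Each method returns its accumulator so the JIT can't drop the loop as dead code.
final class IsolatedOps {
    static volatile int sink;

    static int intAdd(int iterations) {
        int acc = 0;
        for (int i = 1; i <= iterations; i++) {
            acc += i;
        }
        return acc;
    }

    static int intMul(int iterations) {
        int acc = 1;
        for (int i = 1; i <= iterations; i++) {
            acc *= i | 1; // odd multipliers keep acc odd, so it never collapses to 0
        }
        return acc;
    }

    static int intDiv(int iterations) {
        int acc = 0;
        for (int i = 1; i <= iterations; i++) {
            acc += Integer.MAX_VALUE / i; // i starts at 1: no division by zero
        }
        return acc;
    }

    // Times a single call and returns elapsed milliseconds.
    static long timeMillis(IntUnaryOperator test, int iterations) {
        long start = System.nanoTime();
        sink = test.applyAsInt(iterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int n = 100_000_000;
        System.out.println("It took " + timeMillis(IsolatedOps::intAdd, n) + " milliseconds to complete integer addition.");
        System.out.println("It took " + timeMillis(IsolatedOps::intMul, n) + " milliseconds to complete integer multiplication.");
        System.out.println("It took " + timeMillis(IsolatedOps::intDiv, n) + " milliseconds to complete integer division.");
    }
}
```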

There is still loop overhead. Attempts to isolate and mitigate that overhead have, thus far, met with failure. Perhaps I'll have more luck with that in future builds. What was supposed to happen was that I was to run an empty loop, time it, and then subtract that amount of time from the time-of-execution for a fully-populated loop with an identical loop structure. Sadly, for a few of the loops, the measured "empty loop" time was longer than the time of execution for the populated loop (leading to a negative corrected time of execution) on some or most runs. I suspect that the populated loops are warming up while the empty loops aren't, which is a bit weird but oh well.
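Here's a rough sketch of the "subtract an empty loop" idea with two tweaks: both loops get the same warmup before timing, and the corrected time is clamped at zero. OverheadCorrection and its methods are hypothetical. Clamping just hides a bad measurement rather than fixing it, and the JIT is free to shrink an empty counting loop to almost nothing, which is one more reason this subtraction tends to be unreliable.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch of empty-loop overhead correction (not the 9/30 code).
final class OverheadCorrection {
    static volatile long sink; // stops the JIT from discarding either loop

    static long emptyLoop(long n) {
        long i = 0;
        do {
            i++;
        } while (i < n);
        return i;
    }

    static long populatedLoop(long n) {
        long i = 0, acc = 0;
        do {
            acc += i;
            i++;
        } while (i < n);
        return acc;
    }

    static long elapsedNanos(LongUnaryOperator loop, long n) {
        long start = System.nanoTime();
        sink = loop.applyAsLong(n);
        return System.nanoTime() - start;
    }

    // Corrected time = populated - empty, never below zero. A negative value
    // only means the two measurements weren't comparable (e.g. one loop was
    // JIT-compiled and the other wasn't), not that the work took negative time.
    static long corrected(long populatedNanos, long emptyNanos) {
        return Math.max(0L, populatedNanos - emptyNanos);
    }

    public static void main(String[] args) {
        long n = 100_000_000L;
        // Warm both loops the same way so neither gets a JIT head start.
        for (int w = 0; w < 5; w++) {
            elapsedNanos(OverheadCorrection::emptyLoop, 10_000);
            elapsedNanos(OverheadCorrection::populatedLoop, 10_000);
        }
        long empty = elapsedNanos(OverheadCorrection::emptyLoop, n);
        long full = elapsedNanos(OverheadCorrection::populatedLoop, n);
        System.out.println("corrected ms: " + corrected(full, empty) / 1_000_000);
    }
}
```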

Another issue you will notice is that some of the loops will actually warm up (at least, they will on my Stars test machine). This applies to the new loops as well as the older loops. I did not observe this behavior on build 9/22, but a good bit has changed since then. In any case, I have not introduced any way to warm up methods quickly before beginning the main program. If you want the "fastest possible" times for any or all of the tests, I recommend running them twice. You will see increased performance on a few of them (IntegerLoop, for example). It's considerably easier to do this on the new tests since they are so much faster.
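If you'd rather automate the "run it twice" advice, a helper like this hypothetical RepeatRunner would do it. Taking the fastest of N runs is my choice here, not something from the thread; reporting the second run or an average would fit the advice just as well.

```java
import java.util.function.LongSupplier;

// Hypothetical helper: runs one test several times and keeps the fastest time,
// so a cold (not yet JIT-compiled) first run doesn't decide the result.
final class RepeatRunner {

    static long bestOf(int runs, LongSupplier timedTestMillis) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long best = Long.MAX_VALUE;
        for (int r = 0; r < runs; r++) {
            best = Math.min(best, timedTestMillis.getAsLong());
        }
        return best;
    }

    static volatile long sink;

    // Stand-in test: some integer work, returns elapsed milliseconds.
    static long sampleTest() {
        long start = System.nanoTime();
        long acc = 0;
        for (int i = 1; i <= 200_000_000; i++) {
            acc += i % 7;
        }
        sink = acc;
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("Best of 2: " + bestOf(2, RepeatRunner::sampleTest) + " ms");
    }
}
```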

Anyway, I'm not going to sweat the loop overhead issue and/or the warmup issues just now.

Eventually I'll move all this to a new, slightly-more-descriptive post, but not just now.

edit: results coming eventually, I'm tired and the E1 is slow. Also, virscan.org detects no threats inside the .class archive for the 9/30 build.

edit edit:

results for 9/30 build:

Stars chip:

"Awesomeballs" test:

Beginning entire batch.
It took 60318 milliseconds to complete IntegerLoop.
It took 178747 milliseconds to complete FloatLoop.
It took 34547 milliseconds to complete IntegerLoopNoDiv.
It took 98585 milliseconds to complete FloatLoopNoDiv.
It took 190617 milliseconds to complete CastFloatToInt.
It took 103623 milliseconds to complete CastFloatToIntNoDiv.
It took 407643 milliseconds to complete RoundFloatToInt.
It took 271224 milliseconds to complete RoundFloatToIntNoDiv.
It took 415065 milliseconds to complete CastIntToFloat.
It took 130431 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 1890800 milliseconds.

Isolated test:

Beginning batch mode (all tests).
It took 19 milliseconds to complete integer addition.
It took 33 milliseconds to complete integer multiplication.
It took 28 milliseconds to complete integer division.
It took 17 milliseconds to complete float addition.
It took 22 milliseconds to complete float multiplication.
It took 8855 milliseconds to complete float division.
It took 13073 milliseconds to complete rounding float to integer.
It took 6586 milliseconds to complete casting float to integer.
It took 25 milliseconds to complete casting integer to float.

Total execution time for your selection is 28658 milliseconds.

Jaguar chip:

"Awesomeballs" test:

Beginning entire batch.
It took 327581 milliseconds to complete IntegerLoop.
It took 586038 milliseconds to complete FloatLoop.
It took 143055 milliseconds to complete IntegerLoopNoDiv.
It took 495517 milliseconds to complete FloatLoopNoDiv.
It took 979270 milliseconds to complete CastFloatToInt.
It took 428087 milliseconds to complete CastFloatToIntNoDiv.
It took 1567989 milliseconds to complete RoundFloatToInt.
It took 1028298 milliseconds to complete RoundFloatToIntNoDiv.
It took 857251 milliseconds to complete CastIntToFloat.
It took 587177 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 7000263 milliseconds.

Isolated test:

Beginning batch mode (all tests).
It took 126 milliseconds to complete integer addition.
It took 136 milliseconds to complete integer multiplication.
It took 137 milliseconds to complete integer division.
It took 136 milliseconds to complete float addition.
It took 137 milliseconds to complete float multiplication.
It took 38160 milliseconds to complete float division.
It took 51780 milliseconds to complete rounding float to integer.
It took 29447 milliseconds to complete casting float to integer.
It took 173 milliseconds to complete casting integer to float.

Total execution time for your selection is 120232 milliseconds.
 

biostud

Lifer
Feb 27, 2003
"Awesomeballs" test:

Beginning entire batch.
It took 38864 milliseconds to complete IntegerLoop.
It took 35691 milliseconds to complete FloatLoop.
It took 9997 milliseconds to complete IntegerLoopNoDiv.
It took 32312 milliseconds to complete FloatLoopNoDiv.
It took 29374 milliseconds to complete CastFloatToInt.
It took 22069 milliseconds to complete CastFloatToIntNoDiv.
It took 75265 milliseconds to complete RoundFloatToInt.
It took 52424 milliseconds to complete RoundFloatToIntNoDiv.
It took 59655 milliseconds to complete CastIntToFloat.
It took 31999 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 387650 milliseconds.

Isolated

It took 16 milliseconds to complete integer addition.
It took 29 milliseconds to complete integer multiplication.
It took 27 milliseconds to complete integer division.
It took 25 milliseconds to complete float addition.
It took 25 milliseconds to complete float multiplication.
It took 1977 milliseconds to complete float division.
It took 2555 milliseconds to complete rounding float to integer.
It took 1554 milliseconds to complete casting float to integer.
It took 14 milliseconds to complete casting integer to float.

Total 6222 milliseconds
 

DrMrLordX

Lifer
Apr 27, 2000
could you maybe add code that put the results in a txt file?

Yeah, actually, that's a good idea.

biostud said: ["Awesomeballs" and isolated test results, quoted from above]

Whew, that thing smokes my Stars chip. 16 milliseconds, hah. Did you run each batch twice and record the second result? One of these days I'll put in some proper warmup code. One of these days . . . !

(what's funny is that I *did* have a massive warmup set in 9/30 for a while, but it didn't work! I think the problem was that I was calling the methods differently, and in a different scope, or something . . . anyway I think I know how to fix it)
 

biostud

Lifer
Feb 27, 2003
DrMrLordX said: Whew, that thing smokes my Stars chip. 16 milliseconds, hah. Did you run each batch twice and record the second result? [...]

without warmup it was the hundreds :p, but it differed between the runs.
 

NostaSeronx

Diamond Member
Sep 18, 2011
Factory FX-8320; 9/30 build

"Awesomeballs" test:

Beginning entire batch.
It took 73400 milliseconds to complete IntegerLoop.
It took 88730 milliseconds to complete FloatLoop.
It took 23446 milliseconds to complete IntegerLoopNoDiv.
It took 65422 milliseconds to complete FloatLoopNoDiv.
It took 85362 milliseconds to complete CastFloatToInt.
It took 40775 milliseconds to complete CastFloatToIntNoDiv.
It took 205737 milliseconds to complete RoundFloatToInt.
It took 130333 milliseconds to complete RoundFloatToIntNoDiv.
It took 107337 milliseconds to complete CastIntToFloat.
It took 83268 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 903810 milliseconds.

Isolated test:

Beginning batch mode (all tests).
It took 1933 milliseconds to complete integer addition.
It took 23 milliseconds to complete integer multiplication.
It took 22 milliseconds to complete integer division.
It took 22 milliseconds to complete float addition.
It took 22 milliseconds to complete float multiplication.
It took 4965 milliseconds to complete float division.
It took 6135 milliseconds to complete rounding float to integer.
It took 3614 milliseconds to complete casting float to integer.
It took 19 milliseconds to complete casting integer to float.

Total execution time for your selection is 16755 milliseconds.

==
I think from this that the actual results for biostud should be 14 to 16 ms for: Int Add, Mul, Div // Float Mul, Div, to Int.
 

DrMrLordX

Lifer
Apr 27, 2000
biostud said: without warmup it was the hundreds :p, but it differed between the runs.

Yeah, it does that. A lot. The tighter the times, the more the variations skew the results. But really your 16 ms time is quite fast. That 19 ms time for my Stars chip is quite the outlier. Most of the warmed-up runs for the integer addition come out around 32 ms. I just had the one lucky run . . .

NostaSeronx said: Factory FX-8320; 9/30 build [...] It took 1933 milliseconds to complete integer addition. [...]

Just curious, but did you run the isolated tests twice? Your integer addition score is strangely high, even for a first run.

New build coming in just a second. Same tests, but now with results logging and built-in warmup that actually works (yay).

edit: New build is now here. So is the source. As mentioned above, this build can write results to a text file (it should create the file wherever you have runme.bat) where possible, and it has built-in warmup. In other words, you don't have to run the entire batch twice to get good results. You CAN do it multiple times to try to eliminate outliers/take advantage of averaging, which is probably a good idea for the isolated tests since they're quick anyway.
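For anyone curious how results logging like this can be done with the standard library, here's a minimal sketch. ResultsLog and the file name results.txt are assumptions, not what the build actually uses. A relative path resolves against the JVM's working directory, which is normally the folder runme.bat was started from.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Collections;

// Hypothetical results logger: appends one line per finished test to a text file.
final class ResultsLog {
    private final Path file;

    ResultsLog(Path file) {
        this.file = file;
    }

    // Creates the file if needed and appends the line plus a line separator.
    void record(String testName, long millis) throws IOException {
        String line = "It took " + millis + " milliseconds to complete " + testName + ".";
        Files.write(file, Collections.singletonList(line), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        ResultsLog log = new ResultsLog(Paths.get("results.txt")); // file name is an assumption
        log.record("IntegerLoop", 38864);
    }
}
```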

The issue with warmup seemed to have to do with the way I was calling the run methods within the loop classes. Calling them directly, from a different scope (that is, from somewhere other than OmniCode), caused the VM to not optimize anything. Calling the warmup via ExecutorService.execute from within OmniCode worked. What's also interesting is this: let's say you've got a method with two loops, one nested in the other (as we have with Mathtester). You can iterate the main loop over 16 million times in one method call, and it'll never warm up for that one method call. But if you run the same loop only 10000 times calling the same method and then call the method again, the second time you call the method, the loop will be all warmed-up.

In other words, it doesn't matter how many times you iterate a loop within a method. You have to call the method at least one more time for the VM to apply any optimizations having to do with warmup.
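Here's a minimal sketch of that warmup pattern: one short call through an executor, then the timed call. WarmupSketch, run and timedRunMillis are hypothetical names, and it uses ExecutorService.submit (execute plus a Future to wait on) instead of execute. HotSpot can also compile a long-running loop in the middle of a call via on-stack replacement (OSR), so how much the extra call matters may vary by JVM and version. Treat this as the pattern that worked in this thread, not a rule of the JVM.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical warmup sketch: call the test method once with a small
// iteration count, then call it again for the timed run.
final class WarmupSketch {
    static volatile long sink;

    // Stand-in for a benchmark's run method: nested loops like Mathtester's.
    static long run(int outer, int inner) {
        long acc = 0;
        for (int o = 0; o < outer; o++) {
            for (int i = 1; i <= inner; i++) {
                acc += o ^ i;
            }
        }
        return acc;
    }

    static long timedRunMillis(int outer, int inner) {
        long start = System.nanoTime();
        sink = run(outer, inner);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Warmup call: short run, result discarded.
            pool.submit(() -> timedRunMillis(1, 10_000)).get();
            // Timed call: the method has already been invoked once.
            Future<Long> timed = pool.submit(() -> timedRunMillis(16, 1_000_000));
            System.out.println("timed run: " + timed.get() + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```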

edit edit: added times for NostaSeronx and Biostud's latest to the results archive in the OP.

Also, as before, virscan.org reports that the .class archive is free from malware/viruses/trojans/etc.
 

NostaSeronx

Diamond Member
Sep 18, 2011
DrMrLordX said: Just curious, but did you run the isolated tests twice? Your integer addition score is strangely high, even for a first run.
I ran it a couple times, it never went lower.
===

New build fixed the integer addition problem. Ran the below only once.
Code:
Factory FX-8320; 10/03/2014 build

"Awesomeballs" test:

Beginning entire batch.
It took 38330 milliseconds to complete IntegerLoop.
It took 81580 milliseconds to complete FloatLoop.
It took 17114 milliseconds to complete IntegerLoopNoDiv.
It took 69448 milliseconds to complete FloatLoopNoDiv.
It took 81454 milliseconds to complete CastFloatToInt.
It took 38704 milliseconds to complete CastFloatToIntNoDiv.
It took 174403 milliseconds to complete RoundFloatToInt.
It took 114261 milliseconds to complete RoundFloatToIntNoDiv.
It took 101174 milliseconds to complete CastIntToFloat.
It took 84387 milliseconds to complete CastIntToFloatNoDiv.

Total execution time for your selection is 800855 milliseconds.

Isolated test:

Beginning batch mode (all tests).
It took 23 milliseconds to complete integer addition.
It took 23 milliseconds to complete integer multiplication.
It took 22 milliseconds to complete integer division.
It took 19 milliseconds to complete float addition.
It took 25 milliseconds to complete float multiplication.
It took 4871 milliseconds to complete float division.
It took 6489 milliseconds to complete rounding float to integer.
It took 4354 milliseconds to complete casting float to integer.
It took 24 milliseconds to complete casting integer to float.

Total execution time for your selection is 15850 milliseconds.
 

DrMrLordX

Lifer
Apr 27, 2000
NostaSeronx said: I ran it a couple times, it never went lower.

New build fixed the integer addition problem. Ran the below only once.

Yeah the new build is good like that. Interesting that 9/30 wouldn't warm up for you, but it makes no difference now.

I've been (almost) tearing my hair out figuring out how to consistently correct for loop times so that I can isolate loop contents. I have no interest in pursuing that degree of precision when running the "Awesomeballs" code, but the isolated loops are dominated by time spent iterating, which is not really what I'm looking to measure here. I'll keep working on it as time permits. This problem (plus an Android version of the client . . . no, rly) has set back the C++ port semi-indefinitely.

Also, if anyone is wondering why float division, round float to int, and cast float to int are so slow, part of the reason is that they use a float as the iterating variable on the internal do/while loop. If you run an empty version of a loop like that (blankfloatloop in OmniLoop), the execution time on my Stars chip is ~4000 ms.
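To see the float-counter effect yourself, here's a pair of blank do/while loops, one counting with an int and one with a float. These are hypothetical stand-ins, not OmniLoop's actual blankfloatloop. One plausible factor in the slowdown is that each `i += 1.0f` has to wait for the previous float add to finish, and the JIT can't rewrite float math as freely as int math. A float counter also can't count past 2^24 at all, so the float version refuses limits above that.

```java
// Hypothetical blank loops (not OmniLoop's actual code): same do/while shape,
// one with an int counter and one with a float counter.
final class BlankLoops {
    static volatile int intSink;
    static volatile float floatSink;

    static int blankIntLoop(int n) {
        int i = 0;
        do {
            i++;
        } while (i < n);
        return i;
    }

    // A float counter only counts exactly up to 2^24 (16777216); past that,
    // i += 1.0f stops changing i and the loop would never end.
    static float blankFloatLoop(float n) {
        if (n > 16777216f) {
            throw new IllegalArgumentException("float counter can't reach " + n);
        }
        float i = 0f;
        do {
            i += 1.0f;
        } while (i < n);
        return i;
    }

    public static void main(String[] args) {
        int n = 16_000_000;
        long t0 = System.nanoTime();
        intSink = blankIntLoop(n);
        long t1 = System.nanoTime();
        floatSink = blankFloatLoop(n);
        long t2 = System.nanoTime();
        System.out.println("int counter:   " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("float counter: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```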