ChipAbuse: Mathtester's misbegotten child


DrMrLordX

Lifer
Apr 27, 2000
23,223
13,301
136
Wow, it looks like Oracle is pretty serious about supporting AVX/AVX2 in the JVM. I'm not sure whether that's fully implemented in Java 8, but I don't see why it wouldn't be. It may well be that ChipAbuse options 3 and 4 can trigger the AVX2 overvolt. The older code obviously won't, since everything in it is declared volatile.

In any case, the article is an interesting read, especially when they start going into optimizing code snippets to let the JVM peel away loops sequentially to use SIMD to its fullest.

In my experience, the JVM does better if you explicitly feed it operands of like type with like operators in groupings of 256 bits (or 512 bits such as in the first example below). That's on a Steamroller though, and who knows what kind of optimization the JVM is doing for that thing. Regardless:

Code:
j = 0;

do
{      
    float3[j][0] = float1[j][0] + float2[j][0];
    float3[j][1] = float1[j][1] + float2[j][1];
    // elements 2-14 follow the same pattern
    float3[j][15] = float1[j][15] + float2[j][15];

    j++;
}
while (j < 8);

seems to work a lot better than:

Code:
i = 0;

do
{
    float3[i] = float1[i] + float2[i];
    i++;
}
while (i < 128);

It may be due to my habit of using do/while loops and declaring the iterator variable outside the loop structure, but whatever. The fastest completion times I've been able to cook up have come from grouping statements in blocks that are multiples of 8. What's weird is that the Intel document explicitly mentions arrays of 1024 elements, whereas I've had the best results with 128 elements. That might come down to cache size or some other performance limit. Again, Steamroller, not Haswell . . .
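
For anyone who wants to try the comparison themselves, here is a rough, self-contained sketch of the two loop shapes above. The class name LoopShapes, the method names, the fill values, and the repetition counts are all made up for illustration and are not from ChipAbuse. The timing is crude (a proper harness would be more trustworthy), so treat any numbers it prints as a hint, not a measurement.

```java
public class LoopShapes {
    static final int ROWS = 8;   // outer iterations of the blocked loop
    static final int COLS = 16;  // 16 floats x 32 bits = 512 bits per row

    // Flat form: one add per iteration over all 128 elements.
    static void addFlat(float[] a, float[] b, float[] out) {
        int i = 0;
        do {
            out[i] = a[i] + b[i];
            i++;
        } while (i < ROWS * COLS);
    }

    // Blocked form: 16 like-typed adds written out per iteration.
    static void addBlocked(float[][] a, float[][] b, float[][] out) {
        int j = 0;
        do {
            out[j][0] = a[j][0] + b[j][0];
            out[j][1] = a[j][1] + b[j][1];
            out[j][2] = a[j][2] + b[j][2];
            out[j][3] = a[j][3] + b[j][3];
            out[j][4] = a[j][4] + b[j][4];
            out[j][5] = a[j][5] + b[j][5];
            out[j][6] = a[j][6] + b[j][6];
            out[j][7] = a[j][7] + b[j][7];
            out[j][8] = a[j][8] + b[j][8];
            out[j][9] = a[j][9] + b[j][9];
            out[j][10] = a[j][10] + b[j][10];
            out[j][11] = a[j][11] + b[j][11];
            out[j][12] = a[j][12] + b[j][12];
            out[j][13] = a[j][13] + b[j][13];
            out[j][14] = a[j][14] + b[j][14];
            out[j][15] = a[j][15] + b[j][15];
            j++;
        } while (j < ROWS);
    }

    public static void main(String[] args) {
        float[] a = new float[ROWS * COLS];
        float[] b = new float[ROWS * COLS];
        float[] out = new float[ROWS * COLS];
        float[][] a2 = new float[ROWS][COLS];
        float[][] b2 = new float[ROWS][COLS];
        float[][] out2 = new float[ROWS][COLS];

        // Same data in both layouts (arbitrary fill values).
        for (int k = 0; k < ROWS * COLS; k++) {
            a[k] = k;
            b[k] = 2 * k;
            a2[k / COLS][k % COLS] = a[k];
            b2[k / COLS][k % COLS] = b[k];
        }

        // Warm-up so the JIT has a chance to compile both methods.
        for (int r = 0; r < 20_000; r++) {
            addFlat(a, b, out);
            addBlocked(a2, b2, out2);
        }

        int reps = 1_000_000; // arbitrary repetition count
        long t0 = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            addFlat(a, b, out);
        }
        long t1 = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            addBlocked(a2, b2, out2);
        }
        long t2 = System.nanoTime();

        System.out.println("flat:    " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("blocked: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods compute the same 128 sums, so any difference in the two times comes from how the JVM handles each loop shape on the chip it is running on.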

DrMrLordX

New build of ChipAbuse (02/27/2015):

https://www.dropbox.com/s/lj65iqihiom193c/chipabuseclasses02272015.zip?dl=0
.class files. Read below for changes.

https://www.dropbox.com/s/zo1543de0zl2dzr/chipabusesource02272015.zip?dl=0
source files.

The program runs exactly as before, except that it now spawns n threads, where n is the number of hardware threads your CPU(s) can handle, and you no longer have to press Ctrl-C to stop it. Hooray! Much less messy.
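
For the curious, the general pattern looks roughly like the sketch below. This is not the actual ChipAbuse source (grab the source zip for that). The class name WorkerPoolSketch, the runWorkers method, and the dummy float workload are made up for illustration. It assumes the thread count comes from Runtime.getRuntime().availableProcessors() and that each worker does a fixed amount of work, so the program exits on its own.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class WorkerPoolSketch {
    // Runs one worker per thread and returns how many workers finished.
    static long runWorkers(int threads, int iterationsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong completed = new AtomicLong();
        float[] results = new float[threads]; // keeps the work observable

        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.submit(() -> {
                // Placeholder workload: a bounded run of float math.
                float acc = 0f;
                for (int i = 0; i < iterationsPerThread; i++) {
                    acc += i * 0.5f;
                }
                results[id] = acc;
                completed.incrementAndGet();
            });
        }

        // No new tasks; wait for the workers to finish, then return.
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int n = Runtime.getRuntime().availableProcessors();
        long done = runWorkers(n, 10_000_000);
        System.out.println(done + " of " + n + " workers finished");
    }
}
```

Because every worker has a fixed iteration count and the pool is shut down after the tasks are submitted, the JVM exits by itself once the last worker returns.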

Still looking for AVX2-capable Haswell owners (notably desktop chips) to run options 3 or 4 in this program. I'm interested in seeing if you get the AVX2 overvolt doing it.