New Linpack Stress Test Released

#1

Linpack Xtreme is a console front-end for the latest build of Linpack (Intel Math Kernel Library Benchmarks 2018.3.011). Linpack is a benchmark and the most aggressive stress-testing software available today, best used to test the stability of overclocked PCs. Linpack tends to crash unstable PCs in a shorter period of time than other stress-testing applications.

Linpack solves a dense (real*8) system of linear equations (Ax=b), measures the time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. Linpack uses partial pivoting to ensure the accuracy of the results.
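The "performance rate" can be sketched as follows. This is a minimal illustration, assuming the standard operation count for LU factorization with partial pivoting plus the triangular solves (2/3·N³ + 2·N² floating-point operations); `linpack_gflops` is a hypothetical helper, not part of the tool:

```python
# Sketch of how a solve time for an N x N system becomes a GFLOP/s
# rate, using the conventional Linpack operation count.

def linpack_gflops(n: int, seconds: float) -> float:
    """Convert the time to factor and solve an N x N system into GFLOP/s."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2  # LU factorization + solves
    return flops / seconds / 1e9

# Example: an N=30000 run that takes 90 s works out to ~200 GFLOP/s.
rate = linpack_gflops(30000, 90.0)
```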

Linpack Xtreme was created because Prime95 is no longer as effective as it used to be, and LinX, IntelBurnTest, and OCCT use outdated Linpack binaries from 2012. Modern hardware requires a modern stress-testing methodology with support for the latest instruction sets.

Linpack Xtreme is available for Windows, Linux, and as bootable media. The bootable version is considered the most effective, as the Linux SMP kernel is far more sensitive to hardware instability than Microsoft Windows. Watch this video for a short comparison of Prime95 vs. Linpack Xtreme.

Make sure to keep an eye on the temperatures, as Linpack generates an extreme amount of stress.

Changes (v1.1.1):
* Added /residualcheck command-line switch. This improves error detection
on legacy Intel CPUs. It is enabled by default on AMD CPUs.
* Added stress test profiles of 14GB and 30GB.
* Added quick and extended benchmark profiles.
* Fixed false positive hardware errors.
* Some minor changes.

Downloads:
Linpack Xtreme for Windows | Mirror #1 | Mirror #2
Linpack Xtreme for Linux | Mirror #1 | Mirror #2
Linpack Xtreme Bootable Media
 
#2
Seems to be a nice tool, but it is very picky about which processor it will run on!
For anyone thinking of testing their VIA or AMD processors: only Intel CPUs are supported.
 
#3
Yeah, you have to use patched versions of Linpack to run on AMD chips.
 

#4 (ehume)
Does this new Linpack version invoke AVX2? AVX512?
 
#5
Support for non-Intel CPUs is being addressed...

The latest Linpack uses AVX-512 (if supported).
 
#6
A new version was released with support for AMD CPUs.
 

#7 (Edrick)
Thank you for this, but you do know that you can run LinX with updated binaries, right? I use LinX regularly, simply drop the newest binaries from Intel into the LinX folder, and it runs perfectly.
 
#8
LinX is no longer maintained... the last official release doesn't support the latest binaries, and it also lacks AMD support.

Linpack doesn't work well with problem sizes above 35000 (9500MB), yet this is not documented in LinX or IBT.

The latest version (v0.4) features unlimited runs, an option to disable Windows' sleep mode, and an integrated HWMonitor... and it works with AMD CPUs.
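The link between problem size and memory can be sketched as follows: the dense N×N matrix of 8-byte doubles dominates the working set, so memory grows with the square of N. `problem_size_mb` is an illustrative helper; the real binaries add padding and buffers on top:

```python
# Rough working-set estimate for a Linpack run: an N x N matrix of
# 8-byte double-precision values.

def problem_size_mb(n: int) -> float:
    """Approximate size in MiB of the dense N x N matrix of doubles."""
    return 8.0 * n * n / (1024 * 1024)

# N = 35000 -> roughly 9346 MiB, in line with the ~9500 MB figure
# quoted above once overhead is included.
size = problem_size_mb(35000)
```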
 
#9
There's a new version...

v0.8
- Added benchmark feature.
- Added option to specify the number of threads.
- Changed the project name.
 
#10
Works great on Threadripper, thanks!

EDIT:
Picture!
 

#12 (tamz_msc)
Each Zen core can do 8 DP FLOPs/clock, so @lightmanek it would seem you're getting around half the throughput (140 ≈ 0.85*3.4*12*4, i.e. typical efficiency of 85%, base clock at 3.4-3.5 GHz, 12 cores, and only 4 FLOPs/clock).

@DavidOf, are you sure that it fully utilizes AVX/AVX2 according to what Zen is capable of?
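That back-of-the-envelope estimate can be written out explicitly. `peak_gflops` is a hypothetical helper; the 85% figure is the typical Linpack efficiency quoted in the post above, not a measured value:

```python
# Estimated Linpack throughput:
# GFLOPS ~= efficiency * clock (GHz) * cores * DP FLOPs per clock per core.

def peak_gflops(cores: int, clock_ghz: float, flops_per_clock: int,
                efficiency: float = 0.85) -> float:
    """Rough expected Linpack score given core count, clock, and SIMD width."""
    return efficiency * clock_ghz * cores * flops_per_clock

# 12 Zen cores at 3.4 GHz doing only 4 DP FLOPs/clock (half of the 8
# the core is capable of) lands near the observed ~140 GFLOPS.
estimate = peak_gflops(12, 3.4, 4)
```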
 
#13
My CPU was fixed at 3.9GHz for that test. I did work on it during the run, though.
 

#14 (tamz_msc)
Then it makes even less sense, if this is to be believed. I think one needs to run the multi-socket version of Linpack on a Threadripper system to get correct results.
 

#15 (JoeRambo)
Running 24 threads is counterproductive; I get the best results with Linpack when the number of threads equals the number of cores, and by manually setting the CPU affinity mask to one thread per core.
 

#16 (tamz_msc)
AFAIK the Intel MKL linpack benchmark on which this front-end is based already spawns threads according to the core count, not virtual core count.
 

#17 (JoeRambo)
I have run this test, and it was using 12 threads.
 

#18 (.vodka)
I don't think it's using AVX/2 on Zen.

There's an updated, Korean version of LinX named "LinX v1.0.1K (AMD edition)" that has a Ryzen logo as its icon. Using the same problem size as the 8GB option here (32209) and all 16 threads on my R7 1700 @ 3.8GHz, I get the following:



~170 GFLOPS, and ~130s for each run (~2m 10s). Enabling Relaxed EDC throttling (at the cost of higher power consumption and temperatures) in the AMD CBS menu and using a larger problem size (12GB) got me 200 GFLOPS once, at the same 3.8GHz...



When running the OP's version, which doesn't seem to report between runs, at around ~120s I don't see the CPU power meter dropping to ~80-90W, which would indicate that the main calculation part of the run has finished.



It took 4m 52s from the start to get to the lower-power end of the run. That's more than double the time of the Korean version, and per-core peak power + package power are reported higher overall, too.

I assume that if it's taking longer, it's not using AVX/AVX2. We saw the same behavior back when Sandy Bridge was released: earlier versions of Linpack that didn't use AVX took twice as long to run as the AVX-capable version and naturally produced around half the GFLOPS, while not being as much of a stress test for the new CPUs.


I can't seem to find the Korean version including the Linpack binaries anymore, so I've uploaded the complete one I'd found back then. VirusTotal reports it's clean, for anyone who'd like to give it a try.
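The "twice as long without AVX" observation follows from the per-core SIMD throughput of each generation. These are rough, commonly cited figures (assuming the widest SIMD path each core offers), collected here purely for illustration:

```python
# Approximate double-precision FLOPs per clock per core for a few
# microarchitectures, showing why a non-AVX Linpack binary roughly
# halves Zen's score.

DP_FLOPS_PER_CLOCK = {
    "pre-AVX (SSE2)": 4,          # 128-bit add + 128-bit mul per clock
    "Sandy Bridge (AVX)": 8,      # 256-bit add + 256-bit mul per clock
    "Haswell (AVX2 + FMA)": 16,   # 2x 256-bit FMA per clock
    "Zen 1 (128-bit FP units)": 8,
    "Skylake-SP (AVX-512)": 32,   # 2x 512-bit FMA per clock
}

# A binary limited to SSE2 leaves Zen at 4 of its 8 DP FLOPs/clock,
# consistent with the roughly doubled run time described above.
```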
 

#19 (tamz_msc)
That's more like it, but you still need to test with SMT off, which is recommended when running linpack.
 

#20 (tamz_msc)
I also think that the reason the OP's version does poorly on Threadripper is possibly that it is not using the OpenMP version.
 

#21 (.vodka)
I see. Well then, let's do an 8-thread run and set affinity to thread 0 of each core, instead of disabling SMT.



Oh, an even lower running time and... 210-214 GFLOPS on an 8GB problem size? Nice. I don't think I'd ever seen such a high result at only 3.8GHz and that problem size...

Well, good to know that Linpack runs best with one thread per core: power consumption is similar to (or higher than) using all 16 threads, and performance is better.

I'll leave OP's version running for a while to compare results once it prints it all out.
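The "one thread per core" pinning described above can be sketched as an affinity mask. This assumes the common SMT enumeration where logical CPUs 2k and 2k+1 share physical core k (OS-specific, so treat it as illustrative); `one_thread_per_core_mask` is a hypothetical helper, and on Linux the resulting set could be applied with `os.sched_setaffinity`:

```python
# Build an affinity bitmask selecting logical CPU 0 of each physical
# core, assuming sibling threads are numbered 2k and 2k+1.

def one_thread_per_core_mask(physical_cores: int) -> int:
    """Bitmask with one logical CPU (the even-numbered one) per core."""
    mask = 0
    for core in range(physical_cores):
        mask |= 1 << (2 * core)  # select logical CPU 2*core only
    return mask

# 8-core R7 1700: bits 0, 2, 4, ..., 14 set -> 0x5555.
mask = one_thread_per_core_mask(8)
```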
 

#22 (tamz_msc)
^That is exactly what I would expect: 86-87% efficiency, 8 DP FLOPs/clock, 8 cores, 3.8 GHz. Multiply it all together and it comes to ~210 GFLOP/s.
 

#23 (JoeRambo)
The important part here is probably taking the Windows scheduler out of the picture: the threads stay pinned to the same cores and aren't migrated around, whereas when all 16 threads are busy there are penalties after rescheduling.
Still, going from 174 to 210 is a big jump; it shows how fragile the AMD setup is with respect to the OS scheduler.
 

#24 (.vodka)


Yeah, the OP's version is not using AVX/AVX2 on Zen. Same setup as the other post: 8 threads, pinned to thread 0 of each core.

Power consumption was ~10W less than the Korean version (which gives ~210 GFLOPS), and CPU temperature was 57.4°C instead of 64.8°C.
 
#25
I quickly ran the Korean version on all 24 threads with the same settings as the previous run posted a few days ago.
Power consumption at peak went from 200W to 218W, and the performance result from 140 GFLOPS to 303 GFLOPS.
 

