New Linpack Stress Test Released

Regeneration

Member
Nov 3, 2007
[attachment: Screenshot.jpg]

Linpack Xtreme is a console front-end bundled with the latest build of Linpack (Intel Math Kernel Library Benchmarks 2018.3.011), developed and maintained by ngohq.com. Linpack is a benchmark and the most aggressive stress-testing tool available today, best used to test the stability of overclocked PCs: it tends to crash unstable systems in a shorter period of time than other stress-testing applications.

Linpack solves a dense (real*8) system of linear equations (Ax=b), measures the amount of time it takes to factor and solve the system, converts that time into a performance rate, and tests the results for accuracy. The generalization is in the number of equations (N) it can solve, which is not limited to 1000. Linpack uses partial pivoting to assure the accuracy of the results.
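
For the curious, here's a minimal sketch of how such a performance rate is derived (illustrative Python, not the actual MKL code; the 2/3·N³ + 2·N² flop count is the standard Linpack convention, and NumPy's solver also does LU with partial pivoting):
[CODE=python]
# Minimal sketch of a Linpack-style measurement: solve a dense real*8
# system Ax=b, time it, convert to a rate, and check the residual.
import time
import numpy as np

def linpack_rate(n=5000, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.random((n, n))          # dense double-precision matrix A
    b = rng.random(n)               # right-hand side

    start = time.perf_counter()
    x = np.linalg.solve(a, b)       # LU factor + solve with partial pivoting
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2   # standard Linpack flop count
    gflops = flops / elapsed / 1e9

    # Accuracy check: the residual should be tiny for a stable system
    residual = np.linalg.norm(a @ x - b, np.inf)
    print(f"N={n}  time={elapsed:.2f}s  rate={gflops:.1f} GFLOPS  resid={residual:.2e}")

linpack_rate()
[/CODE]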

Linpack Xtreme was created because Prime95 is no longer as effective as it used to be. LinX, IntelBurnTest, and OCCT use outdated Linpack binaries from 2012. Modern hardware requires a modern stress-testing methodology with support for the latest instruction sets.

Linpack Xtreme is available for Windows, Linux, and as bootable media. The bootable version is considered the best option, as the Linux SMP kernel is far more sensitive to hardware instabilities than Microsoft Windows. Watch this video for a short comparison of Prime95 vs. Linpack Xtreme.

Make sure to keep an eye on your temperatures, as Linpack generates a level of stress beyond anything else out there.

Changes (v1.1.4):
* Fixed a crash on AMD Ryzen processors.
* Updated CPUID HWMonitor to version 1.43.

Downloads:
Linpack Xtreme for Windows | Mirror #1 | Mirror #2
Linpack Xtreme for Linux | Mirror #1 | Mirror #2
Linpack Xtreme Bootable Media
 

lightmanek

Senior member
Feb 19, 2017
Seems to be a nice tool, but it is very picky about which processors it will run on!
For anyone thinking of testing their VIA or AMD processors: only Intel CPUs are supported.
 

Edrick

Golden Member
Feb 18, 2010
Thank you for this, but you do know you can run LinX with updated binaries, right? I use LinX regularly and simply drop the newest binaries from Intel into the LinX folder, and it runs perfectly.
 

Regeneration

Member
Nov 3, 2007
LinX is no longer maintained... the last official release doesn't support the latest binaries, and it also lacks AMD support.

Linpack doesn't work well with problem sizes above 35000 (~9500 MB), yet this is not documented anywhere in LinX or IBT.
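
For reference, the memory footprint follows directly from the problem size: the dense N×N double-precision matrix needs N² × 8 bytes. A quick sketch:
[CODE=python]
# Memory footprint of a Linpack run: the dense N x N matrix of 8-byte
# doubles dominates, so bytes ~ N^2 * 8 plus a little workspace.
def linpack_mem_mib(n):
    return n * n * 8 / 2**20

for n in (32209, 35000):
    print(f"N={n}: ~{linpack_mem_mib(n):,.0f} MiB")
# N=32209 -> ~7,915 MiB (an "8GB" preset); N=35000 -> ~9,346 MiB,
# in line with the ~9500 MB limit mentioned above.
[/CODE]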

The latest version (v0.4) features unlimited runs, an option to disable Windows' sleep mode, an integrated HWMonitor... and it works with AMD CPUs.
 

lightmanek

Senior member
Feb 19, 2017
There's a new version...

v0.8
- Added benchmark feature.
- Added option to specify the number of threads.
- Changed the project name.

Works great on Threadripper, thanks!

EDIT:
Picture!
[screenshot: nL0JuDI.png]
 

tamz_msc

Diamond Member
Jan 5, 2017
Each Zen core can do 8 DP FLOPs/clock, so @lightmanek it would seem you're getting around half the expected throughput: 140 ≈ 0.85 × 3.4 GHz × 12 cores × 4 FLOPs/clock (typical efficiency is ~85%, base clock at 3.4-3.5 GHz).

@DavidOf are you sure that it fully utilizes AVX/AVX2, given what Zen is capable of?
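
The arithmetic behind that estimate, as a rough sketch (the ~85% efficiency and the clocks are the assumptions stated above):
[CODE=python]
# Expected Linpack throughput: efficiency * cores * clock (GHz) * DP FLOPs/clock.
# Zen does 8 DP FLOPs/clock with AVX2 FMA, roughly half that without it.
def expected_gflops(cores, ghz, flops_per_clock, efficiency=0.85):
    return efficiency * cores * ghz * flops_per_clock

print(expected_gflops(12, 3.4, 8))   # ~277 GFLOPS -- what AVX2 should deliver
print(expected_gflops(12, 3.4, 4))   # ~139 GFLOPS -- matches the ~140 observed
[/CODE]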
 

lightmanek

Senior member
Feb 19, 2017
tamz_msc said: "Each Zen core can do 8 DP FLOPs/clock, so it would seem you're getting around half the expected throughput [...] are you sure that it fully utilizes AVX/AVX2?"

My CPU was fixed at 3.9 GHz for that test. I did work on the PC during the run, though.
 

tamz_msc

Diamond Member
Jan 5, 2017
lightmanek said: "My CPU was fixed at 3.9 GHz for that test. I did work on the PC during the run, though."
Then it makes even less sense, if this is to be believed. I think one needs to run the multi-socket version of linpack on a Threadripper system to get the correct results.
 

JoeRambo

Golden Member
Jun 13, 2013
Running 24 threads is counterproductive. I get the best results with Linpack when the number of threads equals the core count, and by manually setting the CPU affinity mask to one thread per core.
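
For anyone who wants to try that, here's a rough sketch of building such a mask; it assumes the common enumeration where logical CPUs 2i and 2i+1 are SMT siblings of core i (verify your own topology), and the binary names are placeholders:
[CODE=python]
# Build an affinity mask selecting one logical CPU per physical core on an
# SMT machine, assuming siblings are enumerated as (0,1), (2,3), ...
def one_thread_per_core_mask(cores):
    return sum(1 << (2 * i) for i in range(cores))

mask = one_thread_per_core_mask(12)   # e.g. a 12-core/24-thread Threadripper
print(hex(mask))                      # 0x555555
# Windows:  start /affinity 555555 linpack_xtreme.exe   (name is a placeholder)
# Linux:    taskset 0x555555 ./xlinpack_xeon64
[/CODE]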
 

tamz_msc

Diamond Member
Jan 5, 2017
JoeRambo said: "Running 24 threads is counterproductive. I get the best results with Linpack when the number of threads equals the core count, and by manually setting the CPU affinity mask to one thread per core."
AFAIK the Intel MKL Linpack benchmark this front-end is based on already spawns threads according to the physical core count, not the virtual (SMT) core count.
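
The worker count can also be forced by hand: MKL honours the MKL_NUM_THREADS environment variable, and Intel's OpenMP runtime honours KMP_AFFINITY. A sketch, assuming the binary and input-file names from Intel's benchmark package (adjust to whatever the front-end actually ships):
[CODE=python]
# Launch the Intel Linpack binary with an explicit MKL thread count and
# thread pinning. MKL_NUM_THREADS and KMP_AFFINITY are standard
# MKL / Intel-OpenMP variables; the file names below are assumptions.
import os
import subprocess

env = os.environ.copy()
env["MKL_NUM_THREADS"] = "12"                          # one worker per physical core
env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"   # pin one thread per core

subprocess.run(["linpack_xeon64.exe", "lininput_xeon64"], env=env)
[/CODE]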
 

.vodka

Golden Member
Dec 5, 2014
tamz_msc said: "Each Zen core can do 8 DP FLOPs/clock, so it would seem you're getting around half the expected throughput [...] are you sure that it fully utilizes AVX/AVX2?"

I don't think it's using AVX/AVX2 on Zen.

There's an updated, Korean version of LinX named "LinX v1.0.1K (AMD edition)" that has a Ryzen logo as its icon. Using the same problem size as the 8 GB option here (32209), with all 16 threads on my R7 1700 @ 3.8 GHz, I get the following:

[screenshot: 0heOKZG.png]


~170 GFLOPS, and ~130 s for each run (~2 m 10 s). Enabling Relaxed EDC throttling in the AMD CBS menu (at the cost of higher power consumption and temperatures) and using a larger problem size (12 GB) got me 200 GFLOPS once, at the same 3.8 GHz...
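
Those figures are self-consistent, by the way: the expected wall time per run follows directly from the flop count of the problem size. A quick check:
[CODE=python]
# Expected wall time of one Linpack run of size N at a sustained rate,
# using the dominant 2/3 * N^3 flop term.
def run_seconds(n, gflops):
    return (2 / 3) * n ** 3 / (gflops * 1e9)

print(f"{run_seconds(32209, 170):.0f} s")   # ~131 s -- matches the ~130 s per run
[/CODE]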



When running the OP's version, which doesn't seem to report between runs, at around ~120 s I don't see the CPU power meter dropping to ~80-90 W, which would indicate that the main calculation part of the run has finished.

[screenshot: L2dPlIh.png]


It took 4 m 52 s from the start to reach the lower-power end of the run: more than double the time of the Korean version, and per-core peak power and package power are reported higher overall, too.

I assume that if it's taking longer, it's not using AVX/AVX2. We saw the same behavior back when Sandy Bridge was released: earlier versions of Linpack that didn't use AVX took twice as long to run as the AVX-capable version, and naturally produced around half the GFLOPS, while not being as much of a stress test for those new CPUs.


I can't seem to find the Korean version with the Linpack binaries included, so I've uploaded the complete copy I found back then. VirusTotal reports it's clean, for anyone who'd like to give it a try.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,865
3,729
136
.vodka said: "I don't think it's using AVX/AVX2 on Zen. [...] It took 4 m 52 s from the start to reach the lower-power end of the run: more than double the time of the Korean version [...] I've uploaded the complete copy I found back then."
That's more like it, but you still need to test with SMT off, which is recommended when running linpack.
 

tamz_msc

Diamond Member
Jan 5, 2017
I also think the reason the OP's version does poorly on Threadripper may be that it isn't using the OpenMP version.
 

.vodka

Golden Member
Dec 5, 2014
tamz_msc said: "That's more like it, but you still need to test with SMT off, which is recommended when running linpack."

I see. Well then, let's do an 8-thread run with affinity set to thread 0 of each core, instead of disabling SMT.

[screenshot: K1flwh4.png]


Oh, an even lower running time and... 210-214 GFLOPS on an 8 GB problem size? Nice. I don't think I'd ever seen such a high result at only 3.8 GHz with that problem size...

Well, good to know that Linpack runs best with one thread per core: power consumption is similar to (or higher than) using all 16 threads, and performance is better.

I'll leave OP's version running for a while to compare results once it prints it all out.
 

tamz_msc

Diamond Member
Jan 5, 2017
^That is exactly what I would expect: 86-87% efficiency, 8 DP FLOPs/clock, 8 cores, 3.8 GHz. Multiply it all together (0.865 × 8 × 8 × 3.8) and it comes to ~210 GFLOP/s.
 

JoeRambo

Golden Member
Jun 13, 2013
.vodka said: "I see. Well then, let's do an 8-thread run with affinity set to thread 0 of each core, instead of disabling SMT."

The important part here is probably taking the Windows scheduler out of the equation: the threads stay pinned to the same cores and don't get migrated around. When all 16 threads are busy, there are penalties after rescheduling.
Still, 174 going to 210 is a big jump; it kinda shows how sensitive the AMD setup is to the OS scheduler.
 

.vodka

Golden Member
Dec 5, 2014
[screenshot: bUdvJ09.png]


Yeah, the OP's version is not using AVX/AVX2 on Zen. Same setup as the other post: 8 threads, pinned to thread 0 of each core.

Power consumption was ~10 W less than the Korean version that produced ~210 GFLOPS, and the CPU temperature was 57.4°C instead of 64.8°C.
 

lightmanek

Senior member
Feb 19, 2017
I quickly ran the Korean version on all 24 threads with the same settings as the previous run posted a few days ago.
Peak power consumption went from 200 W to 218 W, and the performance result went from 140 GFLOPS to 303 GFLOPS.