How do you make Linpack work properly?

ehume

Golden Member
Nov 6, 2009
The Linpack redistributable package comes with a sample batch file that you can invoke to run it, plus a sample data input file. The sample script runs progressively larger problems until it reaches the largest specified size, then stops. It will also run without errors (but see below) if you edit the input file to specify a single problem size and the number of repetitions to do.
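A sketch of a single-size input file follows. It assumes the layout of Intel's sample data file: two free-form header lines, then one line each for the number of tests, problem sizes, leading dimensions, trials per test and alignment values. The helper names are hypothetical and not part of the Linpack package, so compare the generated text against the sample file that ships with your copy before relying on it.

```python
from typing import Optional


def make_linpack_input(size: int, trials: int,
                       lda: Optional[int] = None, align_kb: int = 4) -> str:
    """Return input-file text for ONE problem size run `trials` times.

    Hypothetical helper: the line order mirrors the sample input file
    as understood here (an assumption, not an official spec).
    """
    if size <= 0 or trials <= 0:
        raise ValueError("size and trials must be positive")
    lda = size if lda is None else lda
    if lda < size:
        raise ValueError("leading dimension must be at least the problem size")
    rows = [
        (1, "number of tests"),
        (size, "problem sizes"),
        (lda, "leading dimensions"),
        (trials, "times to run a test"),
        (align_kb, "alignment values (in KBytes)"),
    ]
    lines = ["Single-size Linpack input (generated)",
             "Intel(R) Optimized LINPACK Benchmark data"]
    # Value first, then a trailing "# comment", as in the sample file.
    lines += [f"{value:<10} # {comment}" for value, comment in rows]
    return "\n".join(lines) + "\n"


def matrix_bytes(size: int, lda: Optional[int] = None) -> int:
    """Rough memory for the double-precision matrix alone (8 bytes each)."""
    return (size if lda is None else lda) * size * 8


if __name__ == "__main__":
    # Same size as the 25000 run later in this thread, but 20 trials.
    print(make_linpack_input(25000, 20), end="")
    print(f"(matrix alone needs about {matrix_bytes(25000) / 1e9:.1f} GB)")
```

Saving that text as a file and presumably pointing the batch file's command line at it instead of the sample input would give one fixed size repeated 20 times, which is what LinX does. A 25000 problem needs roughly 5 GB just for the matrix, so the chosen size has to fit in installed RAM.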

LinX runs Linpack at a specified problem size for up to 20 reps. On my 4770K at default settings I get around 100 GFlops with LinX and Linpack 10 (which uses AVX). With the Linpack batch file I get less than that: 66 GFlops or so.

With LinX and Linpack 11 (AVX2) I get 160 GFlops for one rep, and then it crashes. The batch file gives me around 100 GFlops.

Interestingly, the batch-file run also reports the CPU speed as 3.9 GHz. With all cores loaded you should get 3.7 GHz, and when LinX+Linpack runs, I do get 3.7 GHz.

So I ran the batch files and LinX while watching CPUID's TMonitor with the -T option ("TMonitor64.exe -T"; that 'T' must be upper case). When LinX runs, all 8 threads go to their max for the whole run. When the batch files run, there is a stretch in the middle of the Linpack 11 run when threads 5-8 drop out, which (I think) means it is using only 2 cores during that time. The Linpack 10 runs are far uglier.

All of this would be academic except that LinX + Linpack 11 won't finish, even at default speeds, which means I can't use it to establish a ceiling for overclocking.

If I can't get Linpack to work with a batch file, I don't know how I will be able to test temperatures properly. The only IntelBurnTest (IBT) distributions I have seen are old, so I can't trust that they include the AVX2 Linpack.

I'm assuming the problem is either LinX + Linpack 11 or my use of Linpack's batch and data input files. With the standard small-to-large problem sets in the sample files, Linpack runs to completion and declares that my processor passed the tests.

If there is something wrong with the CPU, I'd like to RMA it now rather than later.

Can someone tell me they have run Linpack 11 on LinX for 20 reps? If so, I'd like a copy of your zip file (please).

Or can anyone replicate my failures? I need better tools, better technique or a better cpu.

Thanks.
 

stahlhart

Super Moderator Graphics Cards
Dec 21, 2010
This is probably completely unrelated to your issue, but I have never been able to get any Linpack-based benchmark to stress the logical CPUs of my 2700K when HT is enabled. I've tried LinX, IBT and OCCT, and they all behave similarly. Prime95 drives all eight to 100% utilization without a problem, but Linpack only loads four: usually the four physical CPUs, but on rare occasions three of the physical CPUs and one logical one. Overall CPU utilization settles at 50% after a brief initial surge past it.

Never figured out how to fix it, and while I found through searching that many others ran into the same issue I did, I never saw anyone post a solution that worked for me.
 

ehume

Golden Member
Nov 6, 2009
stahlhart said: "...I have yet to be able to have any Linpack-based benchmark stress the logical CPUs of my 2700K when I have HT enabled..."

While LinX does provide loading on all 8 threads for me, the latest version crashes after one rep.

I forgot about OCCT. I'll have to check which version of Linpack it uses. That TMonitor -T is really helpful for eyeballing thread utilization.
 

24601

Golden Member
Jun 10, 2007
You are not supposed to run Intel Math Kernel Library (MKL) Linpack on hyperthreading cores.

Intel MKL Linpack is extremely well optimized.

Hyper-threading doesn't magically make things go faster, it only runs a thread on spare resources left behind from the 4 physical cores.

If those resources are not available, hyper-threading goes absolutely haywire.

Turn off hyperthreading when you are running Intel MKL Linpack.

The reason hyperthreading works so well nowadays is basically that, the second Conroe came out, most programmers stopped caring about maximum-performance code paths because CPU performance was "Good Enough." That leaves quite a large amount of idle execution resources in the super-wide Intel cores. The magical extra performance from your extra hyper-threading "cores" is just Intel jamming another execution path through the remaining execution resources.
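Disabling HT in the BIOS is the fix suggested here. A possible alternative, sketched below under assumptions, is to leave HT on and cap the benchmark at one thread per physical core through environment variables. MKL_NUM_THREADS, OMP_NUM_THREADS and KMP_AFFINITY are standard Intel MKL / Intel OpenMP settings, but whether a given Linpack build honors them is an assumption to verify (for example with TMonitor -T), and the executable and input file names below are assumed, not taken from this thread.

```python
import os
import subprocess
from typing import Mapping, Optional

# Assumed names from the Windows sample package; adjust to your copy.
LINPACK_EXE = "linpack_xeon64.exe"
INPUT_FILE = "lininput_xeon64"


def linpack_env(physical_cores: int,
                base: Optional[Mapping[str, str]] = None) -> dict:
    """Copy the environment and cap threads at the physical core count."""
    env = dict(os.environ if base is None else base)
    n = str(physical_cores)
    env["MKL_NUM_THREADS"] = n   # MKL's own thread-count setting
    env["OMP_NUM_THREADS"] = n   # generic OpenMP fallback
    # Intel OpenMP affinity: "compact,1,0" places consecutive threads on
    # different physical cores before using hyperthread siblings.
    env["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
    return env


def run_linpack(physical_cores: int, output_path: str = "linpack_out.txt") -> int:
    """Run the benchmark with output captured to a file; return its exit code."""
    with open(output_path, "w") as out:
        result = subprocess.run([LINPACK_EXE, INPUT_FILE],
                                env=linpack_env(physical_cores),
                                stdout=out, stderr=subprocess.STDOUT)
    return result.returncode
```

On a 4770K (four cores, eight threads) the call would be run_linpack(4). If TMonitor still shows threads wandering onto hyperthread siblings, the BIOS route remains the reliable option.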
 

stahlhart

Super Moderator Graphics Cards
Dec 21, 2010
24601 said: "...Hyper-threading doesn't magically make things go faster, it only runs a thread on spare resources left behind from the 4 physical cores..."

Thanks very much for the clarification/explanation. That behavior was just puzzling to me, when compared to Prime95.
 

stahlhart

Super Moderator Graphics Cards
Dec 21, 2010
ehume: I've got Linpack 11 installed here; I'll try running what you've specified.
 

ehume

Golden Member
Nov 6, 2009
24601 said: "...Hyper-threading doesn't magically make things go faster, it only runs a thread on spare resources left behind from the 4 physical cores..."

At bottom, I am trying to find what uses the max resources so I will know the max load that anything can put on my chip. LinX clearly was doing that before Linpack 10 (I don't know what the original version was), was doing that with Linpack 10, and does that for one rep of Linpack 11. I would like to let it do that for 20 reps so I can know what this chip's limits are.

LinX can do it with 8 threads, but I can't do it with Intel's sample script, even modified.

I also need to know soon, because July 15 is the monthiversary of my purchasing the chip, and if there is something wrong with it I would like to take it back to Micro Center.

stahlhart said: "I've got Linpack 11 installed here; I'll try running what you've specified."

Thank you. Replication is the soul of science.
 

ehume

Golden Member
Nov 6, 2009
Well, thank you, Communism and stahlhart, for your assistance. I disabled HT in the BIOS on Communism's suggestion. Following stahlhart's demonstration, I ran LinX + Linpack 11.0.5.009. LinX failed, but because stahlhart had clearly made Linpack 11 work, I tried it with the batch file. Results below:

Code:
Intel(R) Optimized LINPACK Benchmark data

Current date/time: Sun Jul 07 15:01:25 2013

CPU frequency:    3.439 GHz
Number of CPUs: 1
Number of cores: 4
Number of threads: 4

Parameters are set to:

Number of tests: 1
Number of equations to solve (problem size) : 25000
Leading dimension of array                  : 25000
Number of trials to run                     : 2    
Data alignment value (in Kbytes)            : 4    

Maximum memory requested that can be used=705536800, at the size=25000

=================== Timing linear equation system solver ===================


Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
25000  25000  4      55.216     188.6748 5.194889e-010 2.954147e-002   pass
25000  25000  4      55.161     188.8630 5.194889e-010 2.954147e-002   pass

Performance Summary (GFlops)

Size   LDA    Align.  Average  Maximal
25000  25000  4       188.7689 188.8630

Residual checks PASSED

End of tests

Sun 07/07/2013 
03:04 PM

TMonitor -T showed 4 threads active. I would say that 188 GFlops shows AVX2 is quite an addition to the Core architecture. (It is also the same result I got on the single pass LinX completed before it crashed.)
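The GFlops column can be reproduced from the problem size and the time, assuming the conventional operation count for solving an n×n system by LU factorization, 2/3·n³ + 2·n² floating-point operations. With n = 25000 that is about 1.04 × 10¹³ operations per trial. The peak estimate assumes 16 double-precision flops per cycle per core for Haswell with AVX2/FMA (two 256-bit FMA units) versus 8 for AVX without FMA; those per-cycle figures are an assumption here, not something Linpack reports.

```python
def linpack_gflops(n: int, seconds: float) -> float:
    """Achieved GFlops for one trial, assuming 2/3*n^3 + 2*n^2 operations."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9


def peak_gflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical double-precision peak: cores x clock x flops per cycle.

    Assumed flops_per_cycle: 16 for Haswell AVX2+FMA, 8 for AVX without FMA.
    """
    return cores * ghz * flops_per_cycle


if __name__ == "__main__":
    peak = peak_gflops(4, 3.7, 16)  # 4770K at the all-core clock cited earlier
    for t in (55.216, 55.161):      # the two trials from the log above
        g = linpack_gflops(25000, t)
        print(f"{t:.3f} s -> {g:.2f} GFlops ({100 * g / peak:.0f}% of {peak:.1f})")
```

Both computed values agree with the logged 188.6748 and 188.8630 to within the rounding of the printed times. The same arithmetic suggests why AVX2 matters: with FMA the per-core peak per cycle doubles, which is in line with the jump from roughly 100 to roughly 188 GFlops.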

Thanks again to both of you.

I am running 20 reps of Linpack 11 as I write this. And clearly I need a different copy of LinX.
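For a 20-rep run, the per-trial rows are easier to check with a small script than by eye. The sketch below assumes the row layout shown in the log above (size, LDA, alignment, time, GFlops, two residuals, then pass/fail); `summarize` is a hypothetical helper, and SAMPLE is an excerpt of the output posted in this thread.

```python
import re

# One per-trial row, e.g.
# 25000  25000  4      55.216     188.6748 5.194889e-010 2.954147e-002   pass
ROW = re.compile(
    r"(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+(\S+)\s+(\S+)\s+(pass|fail)",
    re.IGNORECASE,
)


def summarize(log_text: str):
    """Return (trials, min GFlops, max GFlops, all passed) from Linpack output."""
    # Match line by line so a pattern can never span two lines.
    rows = [m for m in (ROW.fullmatch(line.strip())
                        for line in log_text.splitlines()) if m]
    if not rows:
        raise ValueError("no per-trial result rows found")
    gflops = [float(m.group(5)) for m in rows]
    all_passed = all(m.group(8).lower() == "pass" for m in rows)
    return len(rows), min(gflops), max(gflops), all_passed


SAMPLE = """\
Size   LDA    Align. Time(s)    GFlops   Residual     Residual(norm) Check
25000  25000  4      55.216     188.6748 5.194889e-010 2.954147e-002   pass
25000  25000  4      55.161     188.8630 5.194889e-010 2.954147e-002   pass
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The summary table rows (average and maximal) have no residual or check columns, so they are skipped. A failed residual check on any trial, or a large spread between the minimum and maximum GFlops, is generally taken as a sign that the settings under test are not stable, even when the run does not crash.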