Exquisitely sensitive stability testing - the Linux kernel!

graysky

Senior member
Mar 8, 2007
TL; DR Summary
The Linux kernel is a powerful tool for detecting instabilities in your overclock settings, with both greater accuracy and greater sensitivity than either Prime95 or IBT/LinX.

More Details
The Linux kernel gives users a dead-simple method for detecting hardware instabilities, such as those caused by an 'unstable' overclock. There is nothing special to install, as this functionality seems to be natively included in the kernel itself. To use it, simply run a standard stress test such as Prime95 or Linpack and watch the output from dmesg. If the system is unstable due to insufficient voltage or excessive heat, it will report:

Code:
[Hardware Error]: Machine check events logged

I have seen the kernel throw these errors during a prime95 run before prime95 reported an error in the math. Further, I have seen these errors appear when linpack did not detect that the settings were unstable, as evidenced by the residual number not changing during the run in which the error occurred.
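A snapshot check like this is easy to script. The following is only a minimal sketch: the function name count_mce is made up for illustration, and the search patterns are an assumption based on the message quoted above, so they may need adjusting for other kernel versions.

```shell
# count_mce: hypothetical helper that counts the lines on standard input
# which look like machine check reports. The patterns are assumptions
# taken from the message quoted in this post.
count_mce() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so "|| true" keeps the function's exit status clean.
    grep -c -E 'Hardware Error|Machine check events logged' || true
}
```

Run it as `dmesg | count_mce` (some distributions require sudo for dmesg). A result of 0 means no matching lines were found in the current ring buffer; anything higher means the kernel logged at least one hardware error.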

How to Stress Test Under Linux
Probably the most newb-friendly flavor of Linux is Ubuntu. Users can run it live off a CD or a USB drive without installing it to their systems. Further, it is pre-configured to boot into a GUI with network and hardware autodetected. Download an image from http://www.ubuntu.com - I recommend the 64-bit version, as 32-bit Linux suffers from the same <4 GB memory limitation that 32-bit Windows does.

Note: don't feel like Ubuntu is your only option. There are many other Linux distributions out there from which to choose.

Download the ISO, burn it to a disc or write it to a USB drive, and boot. Ubuntu prompts users to either "try ubuntu" or "install ubuntu." Just hit the "try ubuntu" button and you will be dropped into the live Linux environment.

Here are a few suggestions for stress testing:
1) mprime ---> linux version of prime95. Help to download and run mprime.
2) linpack ---> back end to both LinX and IBT. Help to download and run linpack.

Next, run mprime using your favorite torture test (small FFTs, for example). To see the output from the kernel, you need to print the contents of the kernel ring buffer. You can do this in one of two ways:

1) Open a terminal and type dmesg to see a snapshot.
2) Perhaps more useful is to be informed when something happens rather than typing dmesg over and over again! You can do this with the following command:
Code:
sudo cat /proc/kmsg

It looks like nothing is happening, but actually, the command more or less opened a connection to the ring buffer; it will update when something happens. To test it, plug in a USB thumb drive.

Example on my box:
Code:
<5>[13393.025582] scsi 10:0:0:0: Direct-Access     Kingston DataTraveler 112 1.00 PQ: 0 ANSI: 2
<5>[13393.026103] sd 10:0:0:0: [sdc] 7831552 512-byte logical blocks: (4.00 GB/3.73 GiB)
<5>[13393.026449] sd 10:0:0:0: [sdc] Write Protect is off
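The <5> at the start of each line is the kernel log level of the message (0 is the most severe, 7 is debug). As a rough sketch, the prefix can be decoded like this; kmsg_level is a made-up helper name, and it only handles the plain single-digit levels 0 through 7, reporting anything else as "unknown".

```shell
# kmsg_level: hypothetical helper that prints the name of the log level
# encoded in the leading <N> prefix of a /proc/kmsg line.
kmsg_level() {
    line=$1
    case "$line" in
        "<"*">"*) ;;                      # line starts with a <...> prefix
        *) echo unknown; return 0 ;;      # no prefix at all
    esac
    n=${line#"<"}       # drop the leading "<"
    n=${n%%">"*}        # keep only what sits before the first ">"
    case "$n" in
        0) echo emerg ;;
        1) echo alert ;;
        2) echo crit ;;
        3) echo err ;;
        4) echo warning ;;
        5) echo notice ;;
        6) echo info ;;
        7) echo debug ;;
        *) echo unknown ;;                # anything other than 0-7
    esac
}
```

For the USB lines above, `kmsg_level '<5>[13393.025582] scsi 10:0:0:0: ...'` prints notice, which is why plugging in a thumb drive makes a harmless test.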

Anyway, you will want to watch for that message I posted above:
Code:
[Hardware Error]: Machine check events logged
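If you would rather not stare at the terminal during a long run, the stream can be filtered so that only the interesting lines show up. This is a minimal sketch, not a polished tool: alert_on_mce is a hypothetical name, and the match patterns are assumed from the message above.

```shell
# alert_on_mce: hypothetical filter that reads kernel log lines from
# standard input and prints only those that look like machine check
# reports, each prefixed with "ALERT: ".
alert_on_mce() {
    while IFS= read -r line; do
        case "$line" in
            *"Hardware Error"*|*"Machine check events logged"*)
                printf 'ALERT: %s\n' "$line" ;;
        esac
    done
}
```

Feed it the live stream with `sudo cat /proc/kmsg | alert_on_mce`. On systems with a reasonably recent util-linux, `sudo dmesg --follow | alert_on_mce` should work as well. It stays silent until a matching line arrives, so leave it running in a terminal next to the stress test.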
 
Dec 30, 2004
Hm, someone should slipstream an Ubuntu version for overclocking use.
Limited fan, voltage, and multiplier control keeps me out of Linux, though.
 

CTho9305

Elite Member
Jul 26, 2000
The Windows Event Viewer should report machine check exceptions too (in cases where the machine check doesn't halt the computer immediately / BSOD).