The RIGHT way to test for stability

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
This could be seen as a "Mini FAQ" which I saw the need to write after doing quite some research. My findings are based not only on my own observations, but also on statements by the programmers of various stress-testing programs.

Of course, we all know the usual programs: OCCT, Prime95, wPrime, Orthos, Memtest, etc.

What many people probably don't know is that none of those programs can be used alone to make a solid statement about a "stable" system.


In fact, some programs are far superior at finding errors/instability in certain components of your overclocked PC (e.g. CPU, RAM, northbridge, FSB) - and you NEED a combination of tools, running the appropriate test against one hardware component at a time. NONE of the tools is an all-round stability tester, and using ONE tool to test CPU, memory, northbridge, and FSB stability is a big mistake!

Ok, let's start:

1) CPU:

The *currently* best and most sensitive test for CPU stability is the latest multithreaded Prime95 http://mersenne.org/gimps/p64v256.zip
run at the "Small FFTs" setting only, solely for testing the CPU.

This is indeed a fact, and it has even led other tools like OCCT http://www.ocbase.com to rewrite their CPU stress code to adapt to the new Prime95 code. Nothing beats Prime95 when it comes to testing your CPU's stability.
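(For the curious: the reason a Prime95-style test is so sensitive is that it runs a fully deterministic calculation whose correct result is known in advance, so even a single flipped bit shows up as a mismatch. The following is only a minimal sketch of that compute-and-verify principle - it is NOT Prime95's actual code, which runs huge FFT-based multiplications and checks them against known residues:)

/*
 * Sketch of the compute-and-verify principle behind Prime95-style
 * CPU testing: a deterministic workload must produce bit-identical
 * results on every pass, so any mismatch indicates a hardware error.
 */
#include <stdio.h>
#include <math.h>

static double workload(void)
{
    double acc = 0.0;
    /* FPU-heavy, fully deterministic loop: same input -> same output */
    for (int i = 1; i <= 1000000; i++)
        acc += sin((double)i) * sqrt((double)i);
    return acc;
}

int main(void)
{
    const double reference = workload();   /* known-good result from the first pass */

    for (unsigned long pass = 1; ; pass++) {
        if (workload() != reference) {     /* bit-exact comparison */
            printf("HARDWARE ERROR detected on pass %lu\n", pass);
            return 1;
        }
        printf("pass %lu OK\n", pass);
    }
}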

2) MEMORY:
The king of memory testing is without a doubt memtest *for Windows* http://hcidesign.com/memtest/
There is no other tool that finds memory errors as quickly as memtest for Windows. (Try it for yourself: use various tools and compare.)

Many, many people use a bootable Memtest86+ CD http://www.memtest.org/ to get an impression of memory stability without the need to boot into an OS. There are also some BIOSes with memtest "built in". All those Memtest86+ versions are the "DOS" versions of memtest. The bootable DOS/memtest version is a good first indicator for checking whether your memory is defective - but it is a very long way from Memtest86+ stability in DOS to "real life" memory stability under Vista or XP! I have had situations where memtest 2.0 from DOS/CD passed test #5 for 30 minutes, yet I wasn't even able to boot into Windows! Ever since, I have recommended it only for "a first glance" at your memory - the real testing has to be done in the OS!

3) NORTHBRIDGE, VMEM Voltage and Ram Timings
OCCT http://www.ocbase.com in "Ram" testing mode is the best at testing VMEM voltage, memory timings, or low northbridge voltage!


Combine the above programs: P95 for the CPU - OCCT/Ram for FSB, timings, and NB - Memtest/Windows for overall RAM integrity. Only if you pass all those tests can you be halfway sure that your system is actually stable.

For a VERY sloppy first-"impression" test, you can also use OCCT in "CPU/Ram" mode and let it run for 60 minutes - this might be sufficient if time is short. But be advised that it cannot replace testing with the above tools, targeting each component with the tool that does the best job.

georg.


Added 3/20:

The new version, OCCT 2.0.0a/b, has changed its CPU-testing code. CPU testing in OCCT now uses the same sensitive code as Prime95 with small FFTs. This is, as of now, the most sensitive code for testing your CPU.

Observations show that the *MEM* testing in OCCT 2.0.0 seems to have improved as well - but I think a run of memtest (Windows) is still advised.

 

sjwaste

Diamond Member
Aug 2, 2000
8,757
12
81
Thanks for the info, very good. But I think you should add a little about iTAT. My system can prime for hours at 3.3 GHz (E2200), but iTAT crashes it in a minute or so. I think that's one hell of a stress test :)
 

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
iTAT? Do you have a link? I am on various OC forums; I admit I've never heard of it!
 

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
Oh... I got it, iTAT is the Intel Thermal Analysis Tool. It doesn't run here under Vista64.

Loiusss: But my list is definitive :) Seriously, I wrote that list based in part on statements by, e.g., the programmer of OCCT.

Linpack: Yes, I've heard of it. But it's too confusing... I didn't even find the right download link.
 

JPForums

Junior Member
May 1, 2008
2
0
0
It seems you got most of this correct by targeting applications specifically at what they are good at testing.
Generally, you chose the most effective settings in each test as well.
However, I think your memory testing needs to be better thought out.

Originally posted by: flexy

2) MEMORY:
The king of memory testing is without a doubt memtest *for Windows* http://hcidesign.com/memtest/
There is no other tool that finds memory errors as quickly as memtest for Windows. (Try it for yourself: use various tools and compare.)

Many, many people use a bootable Memtest86+ CD http://www.memtest.org/ to get an impression of memory stability without the need to boot into an OS. There are also some BIOSes with memtest "built in". All those Memtest86+ versions are the "DOS" versions of memtest. The bootable DOS/memtest version is a good first indicator for checking whether your memory is defective - but it is a very long way from Memtest86+ stability in DOS to "real life" memory stability under Vista or XP! I have had situations where memtest 2.0 from DOS/CD passed test #5 for 30 minutes, yet I wasn't even able to boot into Windows! Ever since, I have recommended it only for "a first glance" at your memory - the real testing has to be done in the OS!

First off, Memtest86/Memtest86+ have nothing to do with DOS. They are released under the GNU General Public License (GPL) and thus cannot be tied to DOS. They have their own boot loader.

Second, they are not the same thing as a DOS version of memtest.

Third, and most importantly, if you are only running test 5, then you aren't using the test properly and shouldn't be surprised when it doesn't find something.

In previous versions of Memtest86 (not sure about Memtest86+), test 5 was a quick and dirty way to find problems with the memory controller or CPU. If it didn't fail any of the other tests, the memory wasn't really at fault. Conversely, I've been able to pass test 5 and fail test 4 or test 7.

Now I make sure to run every test, as I've tested enough computers to have seen memory fail each given test while passing all or almost all of the others.
Side note: I'm not sure that test 5 can still be used as a quick-and-dirty test in the more recent versions, as the tests have been refined; I now run the full array of tests every time.

While memtest for Windows is very useful, I find it lacking in several areas (due to the OS, not the program).
First, since it runs on top of the OS, it can never test all of memory. The OS steals a significant chunk, and memtest has no control over that.

Second, when running on top of Windows, memtest has to play ball with the Windows memory manager. In cases where people are running more than 2 GB of RAM in 32-bit Windows, memtest is confined to a maximum of 2 GB of address space by the memory manager, so these people need to run at least two instances of the program. Also, since the address range the program sees is a virtual address space, you don't really know where in physical memory it is actually testing. A DOS memory tester would actually be better, as you could program it in real mode with physical addressing instead of protected mode with virtual addressing.
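(To illustrate the kind of workaround a Windows-based tester needs: the buffer under test has to be pinned so the memory manager cannot page it out mid-test. A minimal sketch using the Win32 VirtualAlloc/VirtualLock APIs follows - this illustrates the idea only, it is not HCI memtest's actual code, and the 256 MB buffer size is an arbitrary example:)

/*
 * Sketch: pinning a test buffer in physical RAM on Windows so the
 * memory manager cannot page it out during a write/verify test.
 * VirtualLock is limited by the process working-set size, so the
 * limit is raised first; very large locks may still be refused.
 */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T size = 256 * 1024 * 1024;   /* 256 MB test buffer (example size) */

    /* Reserve and commit the buffer */
    void *buf = VirtualAlloc(NULL, size, MEM_COMMIT | MEM_RESERVE,
                             PAGE_READWRITE);
    if (!buf) { printf("VirtualAlloc failed\n"); return 1; }

    /* Raise the working-set limit so the lock can succeed */
    SetProcessWorkingSetSize(GetCurrentProcess(),
                             size + (16 << 20), size + (32 << 20));

    /* Pin the pages: they now stay resident in physical RAM */
    if (!VirtualLock(buf, size)) {
        printf("VirtualLock failed (error %lu)\n", GetLastError());
        return 1;
    }

    /* ... run write/verify test patterns over buf here ... */

    VirtualUnlock(buf, size);
    VirtualFree(buf, 0, MEM_RELEASE);
    return 0;
}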

Let me give you a simplistic example (please forgive the simplifying assumptions):
If you take a 2 GB chunk of memory, write a pattern to the whole chunk, and then go back and check the whole chunk, there is nothing that prevents the default behavior of Windows from paging/swapping the "old data" out to disk to free up memory space. So while your program thinks it is writing to 2 GB, it is actually writing to a smaller physical space as Windows makes room for other potential programs. Vista goes to the additional measure of prefetching other data into that memory. The memory is then recalled when the tester comes back around for the check. The only way this approach would work is if the program could convince the memory manager that the data wasn't old.
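(In code, the naive fill-and-check approach described above looks roughly like this hypothetical sketch - and note that nothing in it stops the OS from paging the buffer out to disk and back in between the write pass and the verify pass, which is exactly the problem:)

/*
 * Naive memory test: fill a buffer with a pattern, then read it back
 * and verify. Real testers (Memtest86+, HCI memtest) cycle through
 * many more patterns (moving inversions, random data, etc.) and must
 * also keep the OS from paging the buffer between write and verify.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    /* 512 MB buffer - an arbitrary example size */
    size_t words = (512u * 1024 * 1024) / sizeof(uint32_t);
    uint32_t *buf = malloc(words * sizeof(uint32_t));
    if (!buf) { printf("allocation failed\n"); return 1; }

    /* classic stuck-bit patterns */
    const uint32_t patterns[] = { 0x00000000u, 0xFFFFFFFFu,
                                  0x55555555u, 0xAAAAAAAAu };

    for (size_t p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++) {
        for (size_t i = 0; i < words; i++)          /* write pass */
            buf[i] = patterns[p];
        for (size_t i = 0; i < words; i++)          /* verify pass */
            if (buf[i] != patterns[p])
                printf("error at word %zu: wrote %08X, read %08X\n",
                       i, (unsigned)patterns[p], (unsigned)buf[i]);
    }
    free(buf);
    return 0;
}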

If your program takes smaller chunks and tests them, you can be sure that it tests those physical chunks. However, you still have the same problem at a different scale: once you've left a chunk alone for a while, it will get paged/swapped out and new data written in its place. So you'd end up doing a lot of checking on only a portion of your RAM. Once again, the program would have to convince the memory manager that the data isn't old.

I realize that there are a lot of things you can do to minimize these issues (and memtest for Windows does them), but it still leaves doubt. You include many more variables when running the test in Windows. If you do find a problem, it could indeed be a memory, memory controller, or CPU error. However, it could also be a buggy or corrupted memory manager, or the hard disk could be at fault due to paging/swapping. So you're really not narrowing down the problem as much as you could.

The final reason I wouldn't rely on memtest for Windows is that, if your memory is potentially bad/unstable, it's a bad idea to boot into Windows and risk writing important system files through bad memory before you find out whether it is stable. It's a good way to corrupt your Windows (or any other OS) installation.

In conclusion, I would ask that you revisit Memtest86/Memtest86+.
Memtest for Windows is a good second test, but I'd give an OS-independent test a full run before using it. I've actually never found an error with memtest for Windows (Memtest86 catches it first).
Your other procedures seemed well enough thought out that I have to wonder if someone else influenced your memory section. Due to protected-mode programming, Windows is less "real life" as far as the RAM is concerned than DOS, not to mention specialized boot loaders.

Don't take this as a negative, as I am really a fan of this post. I just wanted to present enough information to make sure my points aren't taken as fanboy fluff. It would be nice to see this post expand into a simple but comprehensive test guide that includes basic GPU and HDD testing as well.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
21,019
3,489
126
And I thought I was anal about my stress-test methodology.

It's nicely written.

However, Mark will pop in here and comment about how F@H is the best stress program out there. And you know what, I have to agree with him.

I've had F@H fail at settings where it would pass OCCT/Prime for many, many hours - like 8+.


But on that same setting I've never had any games or anything else crash. Even WCG works fine; it's just F@H. :T

But yeah, I give F@H people with high overclocks TONS of respect. It's not easy pushing high clocks on that program.
 

Cutthroat

Golden Member
Apr 13, 2002
1,104
0
0
Originally posted by: aigomorla
And I thought I was anal about my stress-test methodology.

It's nicely written.

However, Mark will pop in here and comment about how F@H is the best stress program out there. And you know what, I have to agree with him.

I've had F@H fail at settings where it would pass OCCT/Prime for many, many hours - like 8+.


But on that same setting I've never had any games or anything else crash. Even WCG works fine; it's just F@H. :T

But yeah, I give F@H people with high overclocks TONS of respect. It's not easy pushing high clocks on that program.

F@H FTW - it's definitely the best stress-testing program. The only problem is that it takes too long; you can't consider yourself F@H-stable for at least 24 hours.

I like OCCT 2.0.0a because it generates errors very quickly, especially at low vcore IMO - it would usually find errors in less than 10 minutes. Plus it doesn't take the system down with it, it just quacks. :laugh:

I also use the Everest stress tests for the ability to test components separately, and I especially like them for testing memory timings and voltage; they will usually generate errors fairly quickly.

I thought I had this current system stable at 3.6 GHz; it would pass all the tests I ran, including F@H. But I discovered it could not run F@H and play a .wmv at the same time - it would cause a Stop 0x00000124 BSOD every time after 5-10 minutes. Weird. Anyway, 3.56 GHz works fine.

 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
Originally posted by: Cutthroat
But I discovered it could not run F@H and play a .wmv at the same time - it would cause a Stop 0x00000124 BSOD every time after 5-10 minutes. Weird. Anyway, 3.56 GHz works fine.

And that is the key right there...

Until there is a program that can truly stress every feature set of a given CPU, the only way to ensure guaranteed stability is to run at the manufacturer's guaranteed speed.

This is why a system may be stable in XP but crashes with Vista.

Linpack will make a CPU cry uncle and runs hotter than most programs by 10 degrees C or more. My passively cooled Xeons will hit 93C running it and never produce errata, but then they're never o/c'd either.
 

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
Hello JP,

thanks for the good input. You have valid points, for sure!
I will take this into consideration and probably revise my little guide.
 

Drsignguy

Platinum Member
Mar 24, 2002
2,264
0
76
Good job, Flexy - not only for the post but for being very respectful also. JP, yours is appreciated too, thanks. FWIW, in short, because time isn't a luxury for me during the five-day work week, this is what I like to do when I test. Set a goal of a short-term OC, get the voltage set so I know Windows boots without errors or crashes, and test with OCCT for about an hour or so. Usually if there is a voltage problem, it finds it pretty quickly. During that time, if it "quacks", I reboot, bump the voltage up, and repeat the process. If it runs OCCT for the hour, I shut it down and then run Prime95 overnight. I continue to repeat the process until I have (possibly) reached my goal overclock. I always put memory at 1:1 first and try to work on one thing at a time.


I know there is a lot more to this, but "in short" it works for me. :)
 

superstition

Platinum Member
Feb 2, 2008
2,219
221
101
the only way to ensure guaranteed stability is to run at the manufacturer's guaranteed speed.
Not necessarily.

If you have a bad PSU or other component, you may not have stability. Or if something fails, or if something is marginal. You have to deal with drivers and multiple third-party components. Plus, I've seen manufacturers design products with inadequate cooling. An Apple iBook my husband had would overheat and crash with Age of Empires II. It was rock solid when not gaming, but clearly Apple has a history of pushing the envelope when it comes to adequate cooling. A friend's iMac G5 sounds like a hair dryer. The Apple III was actually dropped from two inches above a table to reseat popped chips because its case was designed before the innards. Anyway, a manufacturer can only be trusted to a point, although the closer one gets in the process to the original maker's specs, the better (it's better to trust Intel about its chips than Apple with Intel chips).

But yes, in ideal conditions (stock speed, within ambient temp guidelines, flawless components and software (like the motherboard BIOS), et cetera) a manufacturer's chip should be stable, as long as it's new (not a returned product that was damaged by voltage abuse). But that doesn't mean equal stability can't be found at a higher speed.

A manufacturer's guaranteed speed can be artificially low (such as the E2140's) due to factors that can be remedied reliably by an end user. An E2140 may run reliably at a higher speed with a replacement cooler that is more effective; the guaranteed speed becomes rather more irrelevant with a cooler change. Apple and other companies have down-clocked chips in the past to provide a less expensive product for consumers, which is another artificial lowering of guaranteed speed. While I've heard of design issues that do support a lower guaranteed speed for the E2140 compared with other C2D chips (such as the lack of solder in the heat spreader in favor of the less efficient paste bond), the downclocking to provide a less expensive product - one that's less competitive with higher-margin products - in conjunction with the cheap-to-produce weak cooler seems to make the guaranteed speed much lower than the chip's capability.

Another example is the Radeon 2400 Pro series from VisionTek. That company produced an "overclocked" version of the 2400 chip and a stock version. Is the overclocked version any less stable? Unlikely. Does it even have a different cooler? I doubt it. The copper cooler on the stock version is probably overkill in the first place.
 

Rubycon

Madame President
Aug 10, 2005
17,768
485
126
Originally posted by: superstition
Not necessarily.

If you have a bad PSU or other component, you may not have stability. Or if something fails, or if something is marginal...

And adding another point of failure with intentional o/c just makes the fault tree larger. ;)

 

jaqie

Platinum Member
Apr 6, 2008
2,471
1
0
After many years of messing with overclocking and stability tests, I personally have come to the conclusion that overclocking isn't worth the loss of stability and reliability for me. I also agree that folding@home seems to find problems more easily than almost any other test, on average, but there is a time and place for every one of those tests.
 

poohbear

Platinum Member
Mar 11, 2003
2,284
5
81
Why isn't this stickied? I had to find this through Google.

There's an ancient A64 overclocking guide stickied, and yet a thread like this that covers the most up-to-date overclocking tools isn't? I didn't even know memtest had a Windows program, nor had I ever heard of OCCT until I asked around. If this were stickied, it would've saved me and many others plenty of time.

plz sticky.
 

ShawnD1

Lifer
May 24, 2003
15,987
2
81
Another thing I think should be added to the top is to run OCCT's power supply test. After running that, OCCT discovered that my power supply has incredibly bad voltage regulation when everything is running full blast. A bunch of individual tests might not catch something like that, and you're left wondering why the computer only crashes when trying to run a specific game.

It's also a nice way to test if your UPS can handle your overclocked system at full load.
 

n7

Elite Member
Jan 4, 2004
21,281
4
81
Old thread necromancy...

What I rely on:

Memtest86+ purely for RAM
HCI Memtest for RAM/NB
P95 small FFTs for CPU; large FFTs for FSB/NB/overall mobo stability
LinX (or Intel Burn Test) for CPU & RAM

For GPU:
RTHDRIBL for basic checks
FurMark for heavy-duty loads
ATiTool for quick overclock checking

And actual gaming is good too, especially UT3.
 

akugami

Diamond Member
Feb 14, 2005
6,210
2,550
136
Originally posted by: jaqie
After many years of messing with overclocking and stability tests, I personally have come to the conclusion that overclocking isn't worth the loss of stability and reliability for me. I also agree that folding@home seems to find problems more easily than almost any other test, on average, but there is a time and place for every one of those tests.

I think one has to set goals. What is your goal with overclocking?

My goal has always been to get basically the performance level of the top CPUs out there at a fraction of the cost. If it clocks better than any retail CPU, great; but if it doesn't, yet gives great value after the overclock, that works too.

Generally I push my CPU nearly to the limit using only quick stability tests of an hour or less, then I dial it back down and start my real testing from there. Sure, I could probably tweak and tweak and tweak to get the absolute max out of my overclock, but as you say, that's too much time for too little gain. Just find a good speed gain and then work to stabilize it. It's relatively easy if you do your homework and aren't stressing the CPU to the last MHz. I could probably eke out another 100 to 200 MHz on most of my CPUs - maybe even more - but if I've already overclocked my CPU by 20-30%, why quibble over another 5%?


For those having trouble with Linpack, try Intel Burn Test. It's a very good tool for putting your CPU in the oven, so to speak. Your CPU should never otherwise reach the temps attained by Intel Burn Test. If it passes this test, it should pass most tests - keeping in mind, as Rubycon stated, that no one test can ever hope to stress every facet of a computer to assess weak points. Some apps stress one area more than others. It's just the nature of the beast.

Basically, for the CPU I use Prime95, Memtest86+, RealTemp (or use your own favorite temp-monitoring tool), and Intel Burn Test.

For the GPU I use Futuremark and RTHDRIBL.

Another thing I like to do is run Prime95 while running RTHDRIBL, to cover my bases and get a good overall stress test on the CPU and GPU at the same time. My preference is to heat up my room as well.

I currently have a small room that I keep toasty by closing all the windows and doing my tests in there. It can reach over 100F on hot days in the summer. In the winter I also turn on a small portable heater and get the room as hot as possible when doing overclocks. If the system can withstand these stability tests in 90F+ heat, it's good to go. Generally, after all of these stress tests, it's stable enough that it doesn't crash no matter what I throw at it in terms of real-world work and gaming.
 
Scholzpdx

Apr 20, 2008
10,065
984
126
IDK about you guys, but LinX finds instabilities at least a HUNDRED times faster than Prime95 while making my temps even higher.

For example, when I was searching for the lowest voltage possible for my overclock, Prime95 ran for 1 1/2 days and passed at 1.2 V. LinX took a minute and crashed. I raised the CPU voltage just a tad and voila! I kept going.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: Scholzpdx
IDK about you guys, but LinX finds instabilities at least a HUNDRED times faster than Prime95 while making my temps even higher.

For example, when I was searching for the lowest voltage possible for my overclock, Prime95 ran for 1 1/2 days and passed at 1.2 V. LinX took a minute and crashed. I raised the CPU voltage just a tad and voila! I kept going.

Minimum voltage for stable operation is a function of temperature. It is quite plausible that at the temperatures generated by Prime95, your rig actually was 100% stable for that temperature and any lower temperature.

Do you intend to operate your CPU at the temperatures produced by LinX?

If not, then you are likely needlessly over-volting your rig (relative to the voltage you could be operating at, not over-volting as in exceeding VID or spec) for the sake of being stable in a temperature regime in which you don't intend to operate.

I personally like Prime95, as it generates temperatures about 5C higher than those generated by my applications of choice. I'm not worried about being stable at temps up to TJmax, as I have no real-world applications that push my CPU temps to TJmax. If I did have an app that did that, then I'd naturally be interested in LinX.
 

E4300

Member
Apr 13, 2009
99
0
0
OCCT v3.0.1 is very similar to Prime95, and I like OCCT's timer. One hour each of Small, Blend, and Large at +82F ambient should be adequate for many overclockers. You can also supplement with the S&M memory test, Memtest86, or various Windows-based memtest applications.

The use of Linpack or equivalent to "roast" the CPU does not simulate a real-world condition. Passing Linpack tells me that the CPU is okay at +100F ambient; it does not guarantee system stability with other applications under normal use.

If the PC is not 100% stable with the stuff on your hard drive, then back off the core speed by 30-50 MHz. There's no need to torture the chip for 24 hours under OCCT/Prime95.
 

error8

Diamond Member
Nov 28, 2007
3,204
0
76
Originally posted by: Idontcare
Originally posted by: Scholzpdx
IDK about you guys, but LinX finds instabilities at least a HUNDRED times faster than Prime95 while making my temps even higher.

For example, when I was searching for the lowest voltage possible for my overclock, Prime95 ran for 1 1/2 days and passed at 1.2 V. LinX took a minute and crashed. I raised the CPU voltage just a tad and voila! I kept going.

Minimum voltage for stable operation is a function of temperature. It is quite plausible that at the temperatures generated by Prime95, your rig actually was 100% stable for that temperature and any lower temperature.

Do you intend to operate your CPU at the temperatures produced by LinX?

If not, then you are likely needlessly over-volting your rig (relative to the voltage you could be operating at, not over-volting as in exceeding VID or spec) for the sake of being stable in a temperature regime in which you don't intend to operate.

I personally like Prime95, as it generates temperatures about 5C higher than those generated by my applications of choice. I'm not worried about being stable at temps up to TJmax, as I have no real-world applications that push my CPU temps to TJmax. If I did have an app that did that, then I'd naturally be interested in LinX.

I had this dilemma for some time: should I use Linpack to test for stability, or should I just take the frequency OCCT gives me as stable? But for some reason, I just can't stand or admit that my computer BSODs in Linpack when I run the test. I can't get over that instability, created by a program that over-stresses my CPU in a way no other program on earth would in a normal-use situation. It's simply not stable! I know that I could probably take my Q6700, for example, to a "stable" 3.7 GHz with a safe voltage applied, but it doesn't pass Linpack, and that means it is just not absolutely 100% stable. I can't sleep with that thought in my mind. It's driving me mad. It has to pass Linpack. Curse you, Intel, for your stupid test. They should have kept it for themselves. Why was it offered to us? Why? :(