Some brief thoughts about stability "certification"

BonzaiDuck · Aug 11, 2007

We've touched on this topic many times.

On the matter of thermal equilibrium for overclocks, the THG (Tom's Hardware Guide) "C2D Temperature Guide" notes that ORTHOS at "priority 9" puts the processor under 88% of the stress produced by Intel TAT (Thermal Analysis Tool). It suggests that 10 minutes of ORTHOS will give you a fairly accurate temperature profile.

This, of course, doesn't get you to stability testing. For the process of adjusting voltage and speed, some suggest running ORTHOS for two hours between adjustments to see if stability is likely or possible.

Various opinions, mine included, suggest running ORTHOS for six, eight, twelve, sixteen or twenty-four hours to guarantee that a setting is stable. Yet, I have occasionally discovered that a nine-hour run suddenly terminates with a "STOPPED ERROR."

This doesn't mean that an eight-hour ORTHOS run is a bad rule, but it suggests that failures under stress-testing might arrive roughly like a Poisson process. For rare events that distribution piles most of its probability up near the left axis (statisticians call this right-skewed), and it has been used to model things like the number of flaws in a bolt of cloth at a textile mill, the arrival of customers at a checkout counter in a queueing model for grocery stores, and other phenomena.
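A minimal sketch of the Poisson idea, assuming errors arrive at a constant, made-up rate (`rate` below, in errors per hour, is hypothetical and would differ for every board and setting). The time to the first error is then exponentially distributed, the waiting-time companion of the Poisson count, so failure is most likely in the first minutes, yet a late failure like a ninth-hour "STOPPED ERROR" is still possible:

```python
import math

def p_survive(rate_per_hour: float, hours: float) -> float:
    """P(no error during `hours`) for a Poisson process with a constant rate."""
    return math.exp(-rate_per_hour * hours)

def p_fail_within(rate_per_hour: float, hours: float) -> float:
    """P(first error occurs within `hours`): the exponential CDF."""
    return 1.0 - p_survive(rate_per_hour, hours)

if __name__ == "__main__":
    rate = 0.05  # hypothetical: one error per 20 hours on average
    for h in (1, 2, 4, 8, 9, 24):
        print(f"P(first error within {h:>2} h) = {p_fail_within(rate, h):.3f}")
    # Conditional chance of failing in hour 9 after 8 clean hours.
    p_hour9 = 1.0 - p_survive(rate, 9) / p_survive(rate, 8)
    print(f"P(error in hour 9 | 8 clean hours) = {p_hour9:.3f}")
```

One property worth noting: with a known constant rate, the chance of failing in hour nine after eight clean hours equals the chance of failing in hour one (the exponential distribution is memoryless). Under this model, long runs earn their keep by showing that the rate itself is small, not by "using up" the risk.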

I would also suggest a sort of Bayesian logic or conditional probability for stress-testing: "If it completes one hour with 0 errors, 0 warnings, the probability is 'high' that it will go two hours to 0,0." "If it completes four hours with 0,0, the probability is 'high' that it will go eight hours with 0,0," and so forth. Maybe we could measure exactly what those conditional probabilities are, but the results would be so specific to hardware choices, the motherboard settings themselves, the accuracy of room-ambient measurements and so many other things that it would seem like a waste of effort. So we end up with concepts like "high" and "low."
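One way to put numbers on that conditional logic, offered only as a sketch: treat the unknown error rate as uncertain, give it a Gamma prior (the shape and rate values below are arbitrary assumptions, not measurements), and update it with the hours that passed at 0 errors. The predictive chance of another clean hour then rises the longer the run has already gone clean:

```python
def predictive_survival(prior_shape: float, prior_rate: float,
                        clean_hours: float, next_hours: float) -> float:
    """P(no error in the next `next_hours` | no error in `clean_hours`).

    Assumes errors follow a Poisson process whose unknown rate (errors per
    hour) has a Gamma(prior_shape, prior_rate) prior. Zero errors observed
    in `clean_hours` gives a Gamma(prior_shape, prior_rate + clean_hours)
    posterior; averaging exp(-rate * next_hours) over it yields this formula.
    """
    b = prior_rate + clean_hours
    return (b / (b + next_hours)) ** prior_shape

if __name__ == "__main__":
    # Hypothetical, deliberately pessimistic prior: mean of 1 error per hour.
    shape, rate = 1.0, 1.0
    for t in (1, 4, 8, 24):
        p = predictive_survival(shape, rate, t, 1.0)
        print(f"after {t:>2} clean h: P(next hour clean) = {p:.3f}")
```

Choosing the prior is exactly where the hardware-specific details come in, which is why pinning down exact values may not be worth the effort.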

Even so, "probability" is always a gamble (there seems something almost funny and redundant about saying it that way), which suggests why statistical methods have been slow to gain acceptance in the court system for both criminal and civil litigation. Judges have trouble accepting that "Your honor, we're 99.9% confident that the bank records show the defendant stole the plaintiff's briefcase from the ATM booth" meets the standard of "beyond reasonable doubt."

So I can accept a 13-hour ORTHOS run as "proving" the stability of my motherboard settings. I can also understand why someone might want to impose an arbitrary rule on the "Quad Overclock Thread!" that "all results must be validated with a 24-hour ORTHOS or Prime95 test."

But I won't insist on it, and I won't suggest that all the data thus far is "unreliable" because of different test standards.

lopri · Aug 11, 2007

Some people are way too obsessed with Prime95 run length. So the question becomes: how many hours is really enough? Well, if we're making a database, we'll need a guideline with screenshot requirements.

Then some conveniently forget that applications, drivers, or the OS can cause BSODs, hangs, reboots, etc. A lot of those typical symptoms are software-based.

MarcVenice · Aug 11, 2007

Meh, I have never run Orthos for more than 4-5 hours or so. I figured it would be stable, and I've never had a BSOD thus far. I dunno, man. Orthos is nice for seeing whether your rig is somewhat stable; if it's very unstable, it will often crash in the first 10 minutes. If it doesn't, you raise your overclock. Then, when you've reached your target overclock and it's Orthos-stable for 4 hours or so, it should be fine.

And if you get random BSODs, then it's apparently not stable, and you should go tweak some more. This works for me, since I don't really do things on my PC where being interrupted by a BSOD would cost me hours of work to redo. In essence, stability is important, but it doesn't have to be rock-solid, coz I'm not running a dedicated server here. If stability is the world to you, then you shouldn't try to get the maximum OC out of your PC.

On a side note, even though Orthos has only run for 4 hours, I have never had a BSOD, so it does seem rock-solid.

nullpointerus · Aug 11, 2007

This is the way I understand it:

Stability testing has no hard-and-fast rule. Running two years of Prime95 won't *prove* that the CPU is 100% stable at a particular clock speed. These chips have lots of transistors, of which any given software load will only test a portion (and even then, only a small fraction of combinations), so the failures could easily occur in ways that traditional testing will not reveal. Stability testing just tells you that the CPU will *probably* be stable at that speed, but even then you might have to back the chip down to fix a crash in some other application (3DMark?).

BonzaiDuck · Aug 11, 2007

Sure, nullpointerus, we're clear on that. Some people think there's such a thing as "absolute certainty," but in my world, it's all about the level of being "probable." Remember the OJ Trial? The jury might have accepted the notion that DNA evidence has a reliability of one-in-billions error expectation, but it only took Fuhrman's screw-ups to tip them towards absolute-certainty as a criterion for no reasonable doubt.

There's also the issue of variables introduced by user carelessness. The only thing I have in my system tray when running ORTHOS (x2) is a USB connection icon for that [damned useless] Sunbeam Theta 101 fan controller.

I've noticed, for example, that Everest Ultimate gives you the same temperatures for cores that CoreTemp does. But there's one problem: Everest has noticeably higher overhead in CPU resource usage. So I've either had a marginally stable setting crash on me when loading both Everest and ORTHOS, or the temperature for core 0 is out-of-whack from what core temp would report, because it apparently has an affinity for that core and adds to the stress on it -- raising the temperature. A little like the old Heisenberg Uncertainty Principle, I guess.

For things like the Quad OverClock Thread, you'd want some sort of basic guideline. As I said, I've had a setting that was perfectly stable for eight hours fail in the ninth. But I know that it's highly likely that the next time I run an ORTHOS test for eight hours, it's probably good for a whole day.

I see people posting "screenies." I should probably do it more often, but it's a minor PITA and you have to have the screen capture utility-of-choice loaded. However confident I am about my testing, I'm not all that eager to open and close software in the middle of the test run.

Even so, under that regimen, it would still be possible for anyone to manipulate their screenie with Corel PhotoPaint and forge their results. Good thing we don't offer cash prizes for over-clock records . . . .

BonzaiDuck · Aug 11, 2007

Anyway, here's another observation, and sorry I'm posting again before there's a reply.

In fall of 2005 I put a 3.2E Prescott in my 478-pin system -- affectionately called "MOJO" because I "got it workin'." The ASUS motherboard allowed you to avail yourself of the loose-jointed multiplier in that processor, so I dropped the multiplier a notch and ran it back up to 3.5 GHz. It was "rock-stable." In fact -- for several months -- it was my own version of "dream machine," even as the rest of you here were probably flipping Smithfields for Preslers.

Xmas comes along. I get a dual-tuner-capture PVR-500-MCE card as a gift. It goes into the machine in early January. Suddenly, in March -- hardware failure. And nothing with "revised settings" would fix it. I think I've confirmed, though, that I blew the Northbridge memory controller, and the processor is still good. Did adding the tuner-card after I'd completed the PRIME95 tests have something to do with it? Don't know, and the tuner card is still stellar.

But that board -- as good as it was -- was spec'd for an 800 MHz FSB only. After a few more generations of hardware bringing us into this year's market, if the board-maker says "good for processors running 1,333 FSB," one could deduce that the board is good at 1,333 regardless of the processor spec. So if you run the FSB up to 1,500, you're talking about putting the mobo only about 12% out of spec. That's a lot different than driving it 25% out of spec, which I did with an 800 MHz FSB board running the FSB at 1,000.
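The out-of-spec percentages work out as follows. This assumes the figures are the quad-pumped "effective" FSB ratings (FSB1333 is a 333 MHz base clock), which cancel out in a ratio anyway:

```python
def percent_over_spec(actual_fsb: float, rated_fsb: float) -> float:
    """How far above its rated FSB a board is being run, in percent."""
    return (actual_fsb - rated_fsb) / rated_fsb * 100.0

if __name__ == "__main__":
    for rated, actual in ((1333, 1500), (800, 1000)):
        pct = percent_over_spec(actual, rated)
        print(f"FSB {actual} on a {rated}-rated board: {pct:.1f}% over spec")
```

The first case comes to about 12.5%, the second to exactly 25%.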

Maybe the best way to avoid risk can be found in becoming a miserly Luddite hermit. Don't buy anything; don't engage technology; you'll only be at risk for becoming ill-informed and unable to keep up with everyone else. . .

SerpentRoyal · Aug 11, 2007

Run Orthos and full S&M for about 1 hour. If instability shows up down the road, then I drop the FSB or RAM speed by 1 or 2 MHz. So far, I've only had to drop the FSB by about 5 MHz (worst-case scenario).

JustaGeek · Aug 11, 2007

I agree - I run Orthos for 1 hour at most.

In case of instability, the errors would show within 2-3 minutes.

If you can run it for 1 hour with no errors, you're fine. If the computer is unstable, you will get BSODs, application errors or game freezes anyway.

BonzaiDuck · Aug 11, 2007

I agree in principle that instability will most readily appear within the first hour. I think that's what I meant by applying a distribution piled up near the left axis (right-skewed, in statistical terms) to the likelihood of errors occurring.

But since everything is probability and uncertainty, one has to wonder about those "outliers" that occur when you let 'em. For instance, my ORTHOS "STOPPED" error that popped up in the ninth hour of green-GO screens.

That would be the reason for first certifying stability with ORTHOS for some period of time (whether you choose one hour, six hours or 12 hours to do it), and then dropping the clock speed at the same VCORE and other voltage settings.

CTho9305 · Aug 11, 2007

Are you guys running your tests during the hottest hours of the day during the hottest weeks of the year, or are you setting yourself up to be posting in a few months, "Vista sucks! It's so unstable... it isn't my OC's fault because my hardware is rock-solid"?

orion23 · Aug 11, 2007

Just test it for a good 5 to 15 minutes. Really, if it passes that long, chances are it is rather stable.

Normal computer use does not load the CPU anywhere near as heavily as running Orthos does.

Just make sure that your System works fine when using your applications!

There are hundreds of threads from people who run Orthos or Prime95 for 36 hours or more and then say, "my PC is not stable because Orthos crashed after 36 hours."

Who knows, maybe CPUs are not even designed to work @ 100% for days in a row!

CCityInstaller · Aug 11, 2007

Originally posted by: CTho9305
Are you guys running your tests during the hottest hours of the day during the hottest weeks of the year, or are you setting yourself up to be posting in a few months, "Vista sucks! It's so unstable... it isn't my OC's fault because my hardware is rock-solid"?

Very good point, CTho. I run my OCs outside of my case so that there is little to no airflow being directed across my components. This gives me my absolute worst-case temps, which drop by 5-7C inside my Antec 900.

genec57 · Aug 11, 2007

I see so much testing for 12, 18, 24 hours that I am pleased to see posts here showing that many, like me, just don't go that long. My basic rule is to run Prime95 (small FFTs) for 30 minutes when I am trying to achieve maximum OC. When I feel that I am at about the max, I increase the test time to three hours. Settings that pass the three-hour mark I personally decide are stable. As a preliminary test, I also regard a passed OCCT as evidence of stability.

Over time I have never had problems with a system set up in this fashion. For me and what I do, those criteria give me a fully stable system. If a thread requires more, I will simply read it and not post my results. I don't have anything to prove, so capturing a mess of screenies is just a PITA.

nullpointerus · Aug 11, 2007

Originally posted by: BonzaiDuck
Sure, nullpointerus, we're clear on that.

I'm not sure you are clear on *everything* that I said.

Originally posted by: BonzaiDuck
Some people think there's such a thing as "absolute certainty," but in my world, it's all about the level of being "probable." Remember the OJ Trial? The jury might have accepted the notion that DNA evidence has a reliability of one-in-billions error expectation, but it only took Fuhrman's screw-ups to tip them towards absolute-certainty as a criterion for no reasonable doubt.

FWIW, I didn't follow the O.J. trial because (a) I couldn't affect the outcome, (b) I couldn't be bothered to wade through the media nonsense to get actual information, and (c) the outcome had no perceivable effect upon me or those around me. But I see the point you are making here: things are seldom as simple as we would like.

Getting back to the original discussion topic, if what I said earlier is true, then stability standards are arbitrary, in which case the discussion about Prime95/Orthos time limit doesn't matter--for the sake of comparison--assuming everybody picks the same time and agrees on the same software setup/options. So why are people still going on about what time limit to use? Arbitrary means you pick something reasonable and move on to more important things.

Originally posted by: BonzaiDuck
There's also the issue of variables introduced by user carelessness. The only thing I have in my system tray when running ORTHOS (x2) is a USB connection icon for that [damned useless] Sunbeam Theta 101 fan controller.

One can only assume that participants in the "contest" will try their best. So the contest becomes, in part, a measure of user experience (and dumb luck). I don't see how you could eliminate such factors except in the form of a warning to avoid running other programs and to use stable drivers. Personally, I don't see this as being worthy of consideration, either. Why try to fix this?

Originally posted by: BonzaiDuck
I've noticed, for example, that Everest Ultimate gives you the same temperatures for cores that CoreTemp does. But there's one problem: Everest has noticeably higher overhead in CPU resource usage. So I've either had a marginally stable setting crash on me when loading both Everest and ORTHOS, or the temperature for core 0 is out-of-whack from what core temp would report, because it apparently has an affinity for that core and adds to the stress on it -- raising the temperature. A little like the old Heisenberg Uncertainty Principle, I guess.

Launching a program while Prime95/Orthos are stress-testing increases temperature and decreases CPU voltage. I always have CPU-Z and TAT running in the background, so I spotted both of these things. Often I will load a web browser (e.g. Opera) because I get bored, and this activity will occasionally cause the PC to bluescreen in what should otherwise have been a much longer stress-testing session (judging by other results). A slight decrease in clock speeds or increase in voltage fixes this. But the problem itself goes back to what I was saying earlier:

"Stability testing has no hard-and-fast rule. Running two years of Prime95 won't *prove* that the CPU is 100% stable at a particular clock speed. These chips have lots of transistors, of which any given software load will only test a portion (and even then, only a small fraction of combinations), so the failures could easily occur in ways that traditional testing will not reveal. Stability testing just tells you that the CPU will *probably* be stable at that speed, but even then you might have to back the chip down to fix a crash in some other application (3DMark?)."

Again, all this stuff you keep mentioning is irrelevant to the notion of an overclocking "contest," which is a purely relative comparison. All that's required is a standard set of testing criteria and the assumption that people will compete to the best of their ability. If you want to draw additional conclusions, such as the model/stepping's overclockability according to your definition of stable, then that's fine, but this is a logically distinct task that could easily be performed *after* the contest has ended.

Originally posted by: BonzaiDuck
For things like the Quad OverClock Thread, you'd want some sort of basic guideline. As I said, I've had a setting that was perfectly stable for eight hours fail in the ninth. But I know that it's highly likely that the next time I run an ORTHOS test for eight hours, it's probably good for a whole day.

What does any of this hour-related discussion matter? If you stipulate one hour, people will compete to find the point at which their CPUs will last one hour. If you stipulate eight hours, people will compete to find the point at which their CPUs will last eight hours. The number of hours is unimportant so long as everyone uses the same value. Only when *you* want to draw conclusions about *your* definition of stable do the length of time and exact clock speeds become meaningful to you. But everyone has different definitions of stability. So building these different notions into the testing criteria limits the usefulness of the contest results to people who think the way you do, and additionally this complicates the contest process and gives people an excuse to bicker over the results. Set clear, arbitrary guidelines first, and tell people that's all the guidelines are: arbitrary.

Two useful pieces of information no one seems to have mentioned are the cooling setup and ambient temperatures. People will have to provide these things during the contest if anyone wants to draw meaningful conclusions after the fact. So making the description of these things part of a valid contest entry would be helpful--though not strictly necessary. Setting requirements for these things depends on the goals of the contest (i.e. who it is supposed to cater to: die-hard enthusiasts w/ extreme cooling, or average, air-cooling forumers).
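A rough sketch of that idea: one fixed, openly arbitrary set of criteria applied identically to every entry, with cooling and ambient temperature recorded so conclusions can be drawn later. The field names, the choice of Orthos, and the 8-hour threshold are all hypothetical placeholders, not rules anyone in the thread agreed on:

```python
from dataclasses import dataclass

# Hypothetical contest rules: arbitrary, but the same for everyone.
REQUIRED_TEST = "Orthos"
REQUIRED_HOURS = 8.0

@dataclass
class ContestEntry:
    user: str
    cpu: str
    core_mhz: int
    test_program: str
    hours_passed: float
    errors: int
    warnings: int
    cooling: str        # e.g. "stock air", "tower air", "water"
    ambient_c: float    # room temperature during the run, in Celsius

def is_valid(entry: ContestEntry) -> bool:
    """Apply the same arbitrary criteria to every entry."""
    return (entry.test_program == REQUIRED_TEST
            and entry.hours_passed >= REQUIRED_HOURS
            and entry.errors == 0
            and entry.warnings == 0
            and bool(entry.cooling.strip()))  # cooling must be described
```

`ambient_c` is recorded but deliberately not judged; it only matters for whatever analysis someone does after the contest ends.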

zach0624 · Aug 12, 2007

When I am trying to get to the max OC, I will run Orthos for about 30 minutes, because in my experience the majority of errors show up in the first couple of minutes of Orthos. When I get an error, I back down and run it all night (from about 12am to 9am), and once it is stable for 9 hours under Orthos stress, I consider it stable. I may push it farther later, because sometimes settings that were stable for 9 hours in Orthos can fail in less than an hour. Really, though, the gaming I do hardly gets over 70% CPU load, so even a setting that is stable for 10 minutes in Orthos can prove stable enough for gaming.
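The two-phase routine in this post (short screening runs while climbing, then one long overnight run, backing down on failure) can be sketched like this. `passes(fsb, minutes)` is a hypothetical stand-in for setting the BIOS and running Orthos for that long; the step size, 30-minute screen, and 9-hour confirmation are just the numbers from this post:

```python
from typing import Callable, Optional

def find_stable_fsb(start_fsb: int,
                    passes: Callable[[int, float], bool],
                    step: int = 5,
                    screen_minutes: float = 30.0,
                    confirm_minutes: float = 9 * 60.0,
                    max_fsb: int = 500) -> Optional[int]:
    """Raise the FSB with short screening runs, then confirm with a long
    run, backing off one step after each long-run failure.

    `passes(fsb, minutes)` is a hypothetical callback: "set this FSB, run
    Orthos/Prime95 for `minutes`, return True if 0 errors".
    """
    fsb = start_fsb
    # Phase 1: keep raising while the short screening run passes.
    while fsb + step <= max_fsb and passes(fsb + step, screen_minutes):
        fsb += step
    # Phase 2: long confirmation run; back down a step after each failure.
    while fsb >= start_fsb:
        if passes(fsb, confirm_minutes):
            return fsb
        fsb -= step
    return None  # not even the starting setting survived the long run

if __name__ == "__main__":
    # Toy stand-in for real hardware: short runs pass up to FSB 300,
    # but only FSB 290 and below survive a 9-hour run.
    def toy_passes(fsb: int, minutes: float) -> bool:
        return fsb <= (300 if minutes < 60 else 290)
    print(find_stable_fsb(200, toy_passes))  # prints 290
```

The toy callback mirrors the pattern described here: a setting can clear the short screen and still fail the long run, which is why the confirmation phase backs off rather than stopping.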

VirtualLarry · Aug 12, 2007

I'm of the school of thought that one MUST test OVER 24H using Prime95/Orthos for stability, at the very LEAST. Running various games or other stressful apps is also important, because while Prime95 can tell you if you're unstable, it cannot tell you if you are stable. This is an important distinction that is often overlooked.

The reasons for the 24h timeframe are varied, but they at least include all the hours of the day and the various thermal cycles that occur over them. It also gives a sort of probabilistic indicator: if it has been stable for X hours, then it is likely stable for Y more, and as X increases, the projected Y approaches a sort of infinite limit. You definitely want to be further along on the X when calculating that stability curve.
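A conservative way to sketch that "stable for X, likely stable for Y" projection, assuming errors follow a constant-rate Poisson process (which the daily thermal cycles mentioned here may well violate): zero errors in X hours bounds the error rate from above at about 3/X per hour at 95% confidence (the so-called rule of three), and that bound gives a worst-case chance of Y more clean hours:

```python
import math

def rate_upper_bound(clean_hours: float, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on a Poisson error rate (per hour)
    after `clean_hours` of testing with zero errors.
    At 95% this is about 3 / clean_hours (the 'rule of three')."""
    return -math.log(1.0 - confidence) / clean_hours

def worst_case_survival(clean_hours: float, next_hours: float,
                        confidence: float = 0.95) -> float:
    """Lower bound on P(no error in the next `next_hours`), taking the
    error rate to be as high as the upper bound allows."""
    return math.exp(-rate_upper_bound(clean_hours, confidence) * next_hours)

if __name__ == "__main__":
    for x in (1, 4, 9, 24, 48):
        print(f"{x:>2} clean h: rate <= {rate_upper_bound(x):.3f}/h, "
              f"P(next 6 h clean) >= {worst_case_survival(x, 6):.2f}")
```

The worst case only looks good when Y is small compared with X; with Y equal to X it falls to one minus the confidence level (5% here), which is one way to read "further along on the X."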

Oh, and for heaven's sake, don't run your rig at the absolute limit of "tested" stability. Test and tweak until it "tests stable", and then BACK OFF the settings just a bit, to allow for a little bit of engineering margin. Voltages/temps can fluctuate in unpredictable ways, and you don't want a slight variation in those to throw you into an unstable situation.

On my mobo, a Conroe865PE, there is no Vcore adjustment, and the board undervolts (sad but true). So I was able to clock my E4400 up to 2.85 GHz or so (285 FSB), but that was the limit of 24h Orthos stability. So to account for the undervolting, and so that I wasn't running right at the edge, I dropped my OC down 5 on the FSB, leaving it at 2.8 GHz. Good enough for me.
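The arithmetic behind that back-off, as a sketch: core clock is FSB times multiplier. The 10x multiplier is inferred from the post's own figures (285 FSB giving about 2.85 GHz), and the variable names are illustrative:

```python
def core_clock_mhz(fsb_mhz: int, multiplier: int) -> int:
    """Core clock = FSB (base clock) x CPU multiplier."""
    return fsb_mhz * multiplier

E4400_MULTIPLIER = 10  # inferred from 285 FSB -> ~2.85 GHz in the post

limit_fsb = 285                    # highest FSB that passed 24h of Orthos
daily_fsb = limit_fsb - 5          # back off 5 on the FSB for margin

limit = core_clock_mhz(limit_fsb, E4400_MULTIPLIER)
daily = core_clock_mhz(daily_fsb, E4400_MULTIPLIER)

if __name__ == "__main__":
    margin_pct = (limit - daily) / limit * 100
    print(f"tested limit {limit} MHz, daily {daily} MHz, "
          f"margin {margin_pct:.1f}%")
```

Here 2850 MHz drops to 2800 MHz, a margin of under 2% in clock speed.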

Noubourne · Aug 13, 2007

For me, the 24 hr testing is not important, because I run Prime 95 while I am at work, and because of my thermostat settings, that is the hottest my house gets (26C).

The best site I ever qualified a CPU OC for was DFI-Street, which included Prime 95, SuperPi, all three 3DMarks, AquaMark, and a couple of other things, with 10hrs on P95. As a result, I always start with 10hrs of P95 for my first test, as that is usually where I find the failure. I run the other 3D tests after that b/c I am a gamer and I do not want crashes in my games.

I also throw on a few of my favorite games - about an hour each - just to be sure.

JAG87 · Aug 13, 2007

Originally posted by: BonzaiDuck
Sure, nullpointerus, we're clear on that. Some people think there's such a thing as "absolute certainty," but in my world, it's all about the level of being "probable." Remember the OJ Trial? The jury might have accepted the notion that DNA evidence has a reliability of one-in-billions error expectation, but it only took Fuhrman's screw-ups to tip them towards absolute-certainty as a criterion for no reasonable doubt.

There's also the issue of variables introduced by user carelessness. The only thing I have in my system tray when running ORTHOS (x2) is a USB connection icon for that [damned useless] Sunbeam Theta 101 fan controller.

I've noticed, for example, that Everest Ultimate gives you the same temperatures for cores that CoreTemp does. But there's one problem: Everest has noticeably higher overhead in CPU resource usage. So I've either had a marginally stable setting crash on me when loading both Everest and ORTHOS, or the temperature for core 0 is out-of-whack from what core temp would report, because it apparently has an affinity for that core and adds to the stress on it -- raising the temperature. A little like the old Heisenberg Uncertainty Principle, I guess.

For things like the Quad OverClock Thread, you'd want some sort of basic guideline. As I said, I've had a setting that was perfectly stable for eight hours fail in the ninth. But I know that it's highly likely that the next time I run an ORTHOS test for eight hours, it's probably good for a whole day.

I see people posting "screenies." I should probably do it more often, but it's a minor PITA and you have to have the screen capture utility-of-choice loaded. However confident I am about my testing, I'm not all that eager to open and close software in the middle of the test run.

Even so, under that regimen, it would still be possible for anyone to manipulate their screenie with Corel PhotoPaint and forge their results. Good thing we don't offer cash prizes for over-clock records . . . .

Very good post, Bonzai.

I've had the machine reboot during Prime95 because Everest was running, even though Prime95 ran perfectly fine for 8 hours by itself. People should take note of that.

I think that you should stress test for the maximum length of time that your machine will be on. So if you use your computer 4 hours a day, you should test for 4 hours; if you use it 8 hours a day, you should test for 8 hours; if your machine needs to be on 24/7, then you should test for 24h. Personally, I never have my machine on for more than 6h straight, so 5-6 hours of Prime is more than enough.
