Microsoft research on hardware failures.

philipma1957

Golden Member
Jan 8, 2012
1,714
0
76
Nice article. It did not spell out the i5 2500T or the i7 3770T as factory-underclocked chips, but I think of both CPUs as underclocked, super-stable CPUs. My interpretation of this article is that if stability is important, those T chips are where it is at.

I read it quickly, and laptops crash less than desktops (maybe because desktops are overclocked much more often).

The other point was that HDDs die off 2x quicker than their rated MTTF. Other info: 2% of 480,000 CPUs were overclocked, and compared to non-overclocked CPUs, one vendor's CPUs had a 20x chance of a crash while the other vendor's had a 4x chance. I am trying to figure out which one is Intel, the 20x or the 4x. This is a very interesting number.
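
To put those numbers in perspective, here is a rough back-of-the-envelope sketch. The 2%, 480,000, 4x and 20x figures are the ones quoted above; the baseline crash probability is a made-up placeholder and NOT a number from the paper, and the vendor labels are hypothetical since I don't know which vendor is which.

```python
# Figures quoted in the post above.
TOTAL_CPUS = 480_000
OC_FRACTION = 0.02
# Hypothetical labels: which vendor got 4x and which got 20x is unknown here.
MULTIPLIERS = {"vendor A": 4, "vendor B": 20}

# ASSUMPTION: placeholder crash probability for a stock-clocked CPU.
# This is NOT taken from the paper; plug in the real figure if you have it.
BASELINE_CRASH_PROB = 0.005


def overclocked_count(total, fraction):
    """Number of machines in the sample that were overclocked."""
    return round(total * fraction)


def scaled_crash_prob(baseline, multiplier):
    """Crash probability after applying a relative-risk multiplier, capped at 1."""
    return min(1.0, baseline * multiplier)


if __name__ == "__main__":
    print("overclocked CPUs:", overclocked_count(TOTAL_CPUS, OC_FRACTION))
    for vendor, mult in MULTIPLIERS.items():
        prob = scaled_crash_prob(BASELINE_CRASH_PROB, mult)
        print(f"{vendor}: {mult}x -> crash probability {prob:.3f}")
```

The point is only that a 20x multiplier on a small baseline can still be a small absolute number, so the baseline matters as much as the multiplier.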
 

pantsaregood

Senior member
Feb 13, 2011
993
37
91
Nice article. It did not spell out the i5 2500T or the i7 3770T as factory-underclocked chips, but I think of both CPUs as underclocked, super-stable CPUs. My interpretation of this article is that if stability is important, those T chips are where it is at.

I read it quickly, and laptops crash less than desktops (maybe because desktops are overclocked much more often).

The other point was that HDDs die off 2x quicker than their rated MTTF. Other info: 2% of 480,000 CPUs were overclocked, and compared to non-overclocked CPUs, one vendor's CPUs had a 20x chance of a crash while the other vendor's had a 4x chance. I am trying to figure out which one is Intel, the 20x or the 4x. This is a very interesting number.

I wouldn't assume either of them were "super stable," as they run at a lower voltage than the non-T/S chips. Stability gain would likely be most significant on slightly overvolted units that have been underclocked by a relatively significant amount.

There's some extremely good research in here. When I opened this, I didn't expect it to be so in-depth, nor did I expect such a lack of bias or error. Usually "research" is good for little more than pointing you in the ballpark direction of truths, but this pretty well isolates every possible factor within reason.
 

Zap

Elite Member
Oct 13, 1999
22,377
7
81
That looks really neat. I just skimmed the first page, but will read it at my leisure later. Thanks!
 

borisvodofsky

Diamond Member
Feb 12, 2010
3,606
0
0
Why do you guys bother reading this when you KNOW that no matter what number they present, you're STILL going to overclock?

LOLOLOL
 

philipma1957

Golden Member
Jan 8, 2012
1,714
0
76
Why do you guys bother reading this when you KNOW that no matter what number they present, you're STILL going to overclock?

LOLOLOL

Some have two systems, like I do.

Oh, I OC the 2500K to 4.2 GHz, and the HD 6870 card inside it by 10%.

But I feel a lot better about the i7 3770T after reading this.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
It's from 2011, but still interesting. Not sure if it has been posted before.

1 million PCs examined. Overclocking the CPU gives a 4-20x higher chance of an OS crash. Underclocking reduces it by 40-80%.

http://research.microsoft.com/pubs/144888/eurosys84-nightingale.pdf

Gotta love those soft-errors and silent corruption. Basically it doesn't matter who is doing the "over" clocking - be it the CPU maker during binning or the end-user while OC'ing. Clocking too high makes an unstable system.
 

Borealis7

Platinum Member
Oct 19, 2006
2,901
205
106
Didn't read the article... but I bet it's still a lot less than crashes originating from nVidia and AMD drivers.
 

KingFatty

Diamond Member
Dec 29, 2010
3,034
1
81
So this study is based on data sets from the Windows Error Reporting (WER) system. I wonder if the study is affected by the kinds of people who click "submit" when presented with the WER prompt vs. the kinds of people who don't? I was concerned that maybe the data is skewed by overclockers who intentionally crash their systems to find overclocking limitations (e.g., an overclocker would fall into the pattern of a machine that suffers a failure repeatedly, which is similar to a conclusion the study found where a system that crashes once is likely to crash again). But, maybe overclockers won't be clicking the submit button on the WER prompt therefore keeping this study's source data relatively unbiased? I didn't read the whole study carefully enough to see how they dealt with these factors.
 

AsusGuy

Senior member
Dec 9, 2004
228
0
71
Interesting article, although I have rarely experienced a CPU issue that was caused by overclocking, so I don't think this will affect my OC habits much. CPU failures in any system, OC or not, seem so minor that I feel like it's a moot point.
 

borisvodofsky

Diamond Member
Feb 12, 2010
3,606
0
0
Interesting article, although I have rarely experienced a CPU issue that was caused by overclocking, so I don't think this will affect my OC habits much. CPU failures in any system, OC or not, seem so minor that I feel like it's a moot point.

The reason for the low failure rate is that most people just browse the internet and watch porn.

Overclocking has been proven to be perfectly porn stable. :D
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
Gotta love those soft-errors and silent corruption. Basically it doesn't matter who is doing the "over" clocking - be it the CPU maker during binning or the end-user while OC'ing. Clocking too high makes an unstable system.

Soft error rate is the bane of my existence. As we start packing more and more transistors into a smaller area, the spec to protect against SER for each core gets higher and higher.
 

peonyu

Platinum Member
Mar 12, 2003
2,038
23
81
It just confirms what overclockers have known since... well, since overclocking first started. Run programs to test stability if you overclock, and up the vcore if it's not stable. Test your RAM, check your cooling. Rinse and repeat. Even then an overclocked setup won't be as stable as a non-overclocked unit, but if done properly and tested I would hardly call it unstable.

Of course, in Microsoft's case it's anyone's guess whether they upped the voltage on their tests and properly overclocked. It's really not in their best interest to do so anyway; they sell software, and most [casual] people who overclock don't test their systems properly, so I'm sure they do crash a lot. Microsoft likely receives a lot of tech support calls about OS crashes as though the crash were their fault when it's not... OCing is not something they want to encourage.
 

Kristijonas

Senior member
Jun 11, 2011
859
4
76
So this study is based on data sets from the Windows Error Reporting (WER) system. I wonder if the study is affected by the kinds of people who click "submit" when presented with the WER prompt vs. the kinds of people who don't? I was concerned that maybe the data is skewed by overclockers who intentionally crash their systems to find overclocking limitations (e.g., an overclocker would fall into the pattern of a machine that suffers a failure repeatedly, which is similar to a conclusion the study found where a system that crashes once is likely to crash again). But, maybe overclockers won't be clicking the submit button on the WER prompt therefore keeping this study's source data relatively unbiased? I didn't read the whole study carefully enough to see how they dealt with these factors.

I think overclockers are less likely to send reports than regular/business users. I think what adds to the plausibility of the clock/crash connection is the stability of underclocked systems. It proves that underclocked systems (most of which are untampered with, unlike overclocked systems) are more stable than regular systems, which explicitly shows a connection between clock and crash rate.

Anyway, great find, ShintaiDK!
 

Subyman

Moderator, VC&G Forum
Mar 18, 2005
7,876
32
86
I think the results may be skewed. If Windows is sending a report with every BSOD recovery, then I sent them well over 10 when adjusting an overclock on a new chip. I'm sure a lot of AnandTech forum goers have sent their fair share of BSOD crashes due to testing OCs and BIOS settings. That doesn't mean we run it like that daily.
 
Dec 30, 2004
12,553
2
76
A lot of people overclocking are stability testing, though. So of course there are going to be failures.
What would be interesting is the number of failures per PC after the PC is "stable", i.e. what's the standard deviation of the failure rate? Obviously one PC fails a lot at first, until the dude figures out a good voltage/frequency.
 

Bill Brasky

Diamond Member
May 18, 2006
4,324
1
0
I particularly enjoyed this tidbit about creating an OS that is hardware-fault-aware. Pretty neat ideas!


"For example, a hardware-fault-tolerant (HWFT) OS might map out faulty memory locations, just as disks map out bad sectors. In a multi-core system, the HWFT OS could map out intermittently bad cores. More interestingly, the HWFT OS might respond to an MCE by migrating to a properly functioning core, or it might minimize susceptibility to MCEs by executing redundantly on multiple cores. A HWFT OS might be structured such that after boot, no disk read is so critical as to warrant a crash on failure. Kernel data structures could be designed to be robust against bit errors. Dynamic frequency scaling, currently used for power and energy management, could be used to improve reliability, running at rated speed only for performance-critical operations. We expect that many other ideas will occur to operating-system researchers who begin to think of hardware failures as commonplace even on single machines."
 

samboy

Senior member
Aug 17, 2002
223
94
101
A lot of people overclocking are stability testing, though. So of course there are going to be failures.
What would be interesting is the number of failures per PC after the PC is "stable", i.e. what's the standard deviation of the failure rate? Obviously one PC fails a lot at first, until the dude figures out a good voltage/frequency.

Agreed... my thought also: this would significantly bias things.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Doesn't it kind of confirm that ECC would be useful?

It shows that memory failures are far, far less of an issue than processor and disk issues.


Of course, to the ECC zealots, sure, it will be taken as proof, but it shows just how far down the list their pet problem really is.