Hard drive MTBF

NeoPTLD

Platinum Member
Nov 23, 2001
2,544
2
81
When you read in HDD specification sheets, there is a reliability specification named MTBF listed.

Seagate 15K.4 has an MTBF of 1,400,000 hours (about 160 yrs at 24/7 use)
A typical consumer hard drive might be rated around 500,000 hours (about 57 yrs @ 24/7 use)

My initial reaction is that it makes no difference whatsoever, as you likely won't have the drive around much past ten years. From what I read on one website, they run, say, 10,000 drives for 100 hours and one drive fails. They then calculate MTBF as 10,000 drives x 100 hours / 1 failure = 1,000,000 hours. I don't know what statistical evaluation they use, but if that's the formula, I can't see how it represents real-life conditions, because it doesn't factor in wear and tear.
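The calculation described above can be sketched in a few lines of Python. The test numbers here are the hypothetical ones from the website I read, not any manufacturer's actual qualification procedure:

```python
# Back-of-envelope MTBF from a population test (hypothetical numbers):
# run many drives for a short time, count failures.
drives_tested = 10_000
test_hours = 100
failures = 1

mtbf_hours = drives_tested * test_hours / failures
print(mtbf_hours)               # 1000000.0 device-hours
print(mtbf_hours / (24 * 365))  # ~114 "years" -- no single drive is tested that long
```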

It's like driving ten identical cars, each with 10,000 miles on the odometer, for 100 miles apiece, and trying to predict from that how one of those cars will behave after 100,000 miles.

Anyone know what the real life significance of 1/2 mil vs 1.4 mil hrs MTBF and the statistical method used to get these figures?
 

Mark R

Diamond Member
Oct 9, 1999
8,513
14
81
Originally posted by: NeoPTLD
It's like driving ten identical cars, each with 10,000 miles on the odometer, for 100 miles apiece, and trying to predict from that how one of those cars will behave after 100,000 miles.

But that's the important point. MTBF is not meant to evaluate how a drive will behave when it is old.

What MTBF means is how many aggregate hours, on average, a population of drives will run before one fails. The calculation is very simple, and is exactly as you describe it. MTBF tells you the risk of your drive dying prematurely. It does not tell you how long you can expect it to live.

Importantly, MTBF does NOT account for wear and tear. The MTBF measurement is only really relevant for populations of drives, which are retired when they reach old age (as little as 3 years for low-end desktop drives, or as long as 7 years for high-end server drives).

The MTBF is a useful measure for people running datacentres, or system builders, or corporations with a large number of computers. It is of limited value for people with only a few drives, unless used as part of a risk assessment process.
 

warhorse

Member
Dec 1, 2001
28
0
0
An example of what Mark R is saying:

For your disk of 1,400,000 MTBF hours, if you have 1,400,000 disks (pretend you're Google), then you should expect to have one disk fail every hour. If you have 700,000 disks, one disk will fail every 2 hours.

The number certainly does not apply to a single drive; there is no way to test that number on one drive :) The manufacturer simply runs a bunch of disks for some set amount of time and counts the number of dead drives. It works out to something like (#drives * time tested) / failures.
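A quick illustration of the fleet arithmetic above in Python, assuming failures arrive at a constant rate (which, as noted later in the thread, is a simplification):

```python
# Expected time between failures across a fleet, given a per-drive MTBF.
# Assumes a constant failure rate spread evenly over the population.
def hours_between_failures(mtbf_hours: float, fleet_size: int) -> float:
    return mtbf_hours / fleet_size

print(hours_between_failures(1_400_000, 1_400_000))  # 1.0 hour
print(hours_between_failures(1_400_000, 700_000))    # 2.0 hours
```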
 

warhorse

Member
Dec 1, 2001
28
0
0
You don't even have to replace them! If your servers can detect failed disks, and stuff is backed up all over the place, just leave the disk there and have the software remember to put the data elsewhere.
 
Jun 18, 2004
105
0
0
Surely it is still a good indicator of reliability for single drives? If one set of 10,000 drives has a failure every 10 hours and another set has one every 5 hours, then it is reasonable to presume that the drives averaging 10 hours between failures are individually more reliable.
 

PowerEngineer

Diamond Member
Oct 22, 2001
3,548
716
136
Originally posted by: warhorse
For your disk of 1,400,000 MTBF hours, if you have 1,400,000 disks (pretend you're Google), then you should expect to have one disk fail every hour. If you have 700,000 disks, one disk will fail every 2 hours.

This is true if you assume that the failures are evenly distributed, which might be a reasonable assumption if the 1,400,000 disks were of random ages. If they were all new disks, then I'd expect a higher-than-average number of "infant mortality" failures, followed by a long period of lower-than-average failures, maybe lasting past 1,000,000 hours.
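The evenly-distributed-failures assumption is the exponential (constant failure rate) model. Under it, a single drive's survival odds work out like this; note this is an illustration of the model only, since real drives follow the bathtub curve described above:

```python
import math

# Under a constant-failure-rate (exponential) model -- which ignores both
# infant mortality and wear-out -- the probability one drive survives
# t hours is exp(-t / MTBF).
mtbf = 1_400_000
five_years = 5 * 24 * 365  # 43,800 hours of 24/7 use

p_survive = math.exp(-five_years / mtbf)
print(p_survive)  # ~0.969, i.e. roughly a 3% chance of failure in 5 years
```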

Here's an interesting link.