Anand's last SSD article misses an important point


taltamir

Lifer
Mar 21, 2004
13,576
6
76
actually, the MTBF claim of spindle drives is pulled out of their asses, they use cheaty math so that the more drives they test the higher the number, and they also don't test it that often. They assume that new drives have similar MTBF to old ones.

1.4 million hours MTBF = 159.81735159817351598173515981735 years
300K MTBF = 34.246575342465753424657534246575 years
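For anyone checking the conversion above, it is just hours divided by hours per year. A minimal sketch, assuming a 365-day year (8,760 hours), which is what those figures imply; the function name is only illustrative:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; assumes a non-leap year


def mtbf_hours_to_years(mtbf_hours: float) -> float:
    """Convert an MTBF quoted in hours into years of continuous operation."""
    return mtbf_hours / HOURS_PER_YEAR


for hours in (1_400_000, 300_000):
    print(f"{hours:,} hours = {mtbf_hours_to_years(hours):.1f} years")
```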
 

Axonn

Senior member
Oct 14, 2008
216
0
0
taltamir: *grin*, exactly. And I bet it's similar with all the marketing bullcrap they serve both us & journalists alike.

Phynaz: oh no, I am aware that it slowly becomes read only, but still...

Cerb: yes, true, but it's still an indicator of a drive's life.

a123456: it's gonna die slower ::- D.

I'd like to remind you all that I'm not debating whether 32nm is THE BEST CHOICE. It's just more reliable, that's all. Ok, so if it's 100 vs 105, it's not such a big difference, but there IS a difference.
 

groberts101

Golden Member
Mar 17, 2011
1,390
0
0
Axonn said:
I'd like to remind you all that I'm not debating whether 32nm is THE BEST CHOICE. It's just more reliable, that's all. Ok, so if it's 100 vs 105, it's not such a big difference, but there IS a difference.


depends on what context "reliable" is being used. Write cycles?.. probably.. based on logic alone that makes sense.

"data reliability"?.. well that's a whole new debate. Along with die shrinks we are now getting better bit error correction as a result of the smaller dies' inherent weaknesses.

This would be similar to high PE/c-small capacity drives having similar overall lifespans compared to low PE/c-large capacity drives. One gain negates the loss over the other for not much change overall.
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
IIRC, MTBF for platter drives is more for failure rate assumptions than a single drive life (i.e., w/ a 10yr MTBF, you should average one failure per 10 drives per year).

Exactly right.

1 million hour MTBF means "If you have 5,000 drives in your datacenter, you should expect to RMA 44 drives each year".

It absolutely does not mean that a drive is expected to last for 114 years.
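The fleet arithmetic behind that "44 drives" figure can be sketched as follows. This assumes every drive runs 24/7, failed drives are replaced immediately, and the failure rate is constant over the year; the function name is hypothetical:

```python
HOURS_PER_YEAR = 24 * 365  # assumes drives are powered on all year


def expected_annual_failures(fleet_size: int, mtbf_hours: float) -> float:
    """Expected failures per year for a fleet, given a quoted MTBF in hours."""
    drive_hours_per_year = fleet_size * HOURS_PER_YEAR
    return drive_hours_per_year / mtbf_hours


failures = expected_annual_failures(5000, 1_000_000)
print(f"Expected failures per year: {failures:.1f}")
print(f"Annual failure rate: {failures / 5000:.2%}")
```

With 5,000 drives and a 1 million hour MTBF this comes to roughly 44 failures a year, or just under 0.9% of the fleet.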
 

Axonn

Senior member
Oct 14, 2008
216
0
0
But it still indicates a drive's estimated life. A drive with 5 mil MTBF definitely lasts longer than one with 1 mil. Statistically.
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
Axonn said:
But it still indicates a drive's estimated life. A drive with 5 mil MTBF definitely lasts longer than one with 1 mil. Statistically.

Maybe it does, but that is not what it is intended to mean.

The MTBF calculation assumes that the drives are all young enough to be within warranty, and that they are replaced with new drives once the warranty expires.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Mark R said:
Exactly right.

1 million hour MTBF means "If you have 5,000 drives in your datacenter, you should expect to RMA 44 drives each year".

It absolutely does not mean that a drive is expected to last for 114 years.

No, this is NOT the definition of MTBF...
But it is what drive manufacturers MEAN when they say MTBF, so they are bastardizing a term. But it just so happens that it is functionally identical. You see, both the math you suggested (which is NOT what they claim or how they test) and the math I suggested work perfectly and give the exact same result for the same figures.

You see, you proved yourself wrong, you calculated a failure rate of 44 / 5000 every year based on an MTBF of 1 million and said "this does not mean they would last 114 years" when such a failure rate absolutely DOES mean they last 114 years on average.

If you have 5000 drives and only 44 fail a year, then your failure rate is 0.88% per year.

But in this amazing place called reality failure rates are an order of magnitude higher. If you have a 5000 drive datacenter you can expect close to 500 of them to fail every year (~10%).

This gives you an MTBF closer to 11 years using both my formula AND your formula. Both formulas are correct and give identical results if you input identical numbers. The thing is, I am actually inputting numbers and calculating things out while you are just taking the figures from the manufacturers and declaring them facts.
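Going the other way, from an observed failure count back to an implied MTBF, can be sketched like this. It assumes the same simple model (constant failure rate, drives running all year); the function name is made up for illustration. Note that an exact 10% annual rate works out to 10 years, and the "closer to 11" figure corresponds to a rate nearer 9%:

```python
HOURS_PER_YEAR = 24 * 365


def implied_mtbf_hours(failures_per_year: float, fleet_size: int) -> float:
    """MTBF in hours implied by an observed annual failure count in a fleet."""
    annual_failure_rate = failures_per_year / fleet_size
    return HOURS_PER_YEAR / annual_failure_rate


for failures in (44, 500):
    hours = implied_mtbf_hours(failures, 5000)
    print(f"{failures} of 5000 per year -> {hours:,.0f} hours "
          f"({hours / HOURS_PER_YEAR:.1f} years)")
```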

Look here:
http://www.pcworld.com/article/1295...e_rates_much_higher_than_makers_estimate.html
The Carnegie Mellon study examined large production systems, including high-performance computing sites and Internet services sites running SCSI, FC and SATA drives. The data sheets for those drives listed MTBF between 1 million to 1.5 million hours, which the study said should mean annual failure rates "of at most 0.88%." However, the study showed typical annual replacement rates of between 2% and 4%, "and up to 13% observed on some systems."
I have no idea whether they used my formula, your formula, or some other formula entirely. But they got the exact same 0.88% yearly failure rate = 1 million hour MTBF. This is because in the amazing world of math you can solve the same problem many different ways and all of them are correct.

I would like to point out that I only looked it up after calculating and writing out the rest.
 

Mark R

Diamond Member
Oct 9, 1999
8,513
16
81
taltamir said:
No, this is NOT the definition of MTBF...
But it is what drive manufacturers MEAN when they say MTBF, so they are bastardizing a term. But it just so happens that it is functionally identical. You see, both the math you suggested (which is NOT what they claim or how they test) and the math I suggested work perfectly and give the exact same result for the same figures.

You see, you proved yourself wrong, you calculated a failure rate of 44 / 5000 every year based on an MTBF of 1 million and said "this does not mean they would last 114 years" when such a failure rate absolutely DOES mean they last 114 years on average.

I've not contradicted myself at all.

The MTBF (death) for a 30 year old man is 900 years (if you have 900 30 year old guys, only 899, on average, would make it to 31). This does not mean that life expectancy is 900 years.

My point was that when talking about a single drive, just because the MTBF is 114 years, this does not mean that you can expect the drive to operate for 114 years. MTBF and life expectancy are fundamentally different concepts.

MTBF is just another way of writing AFR - they are identical in meaning. And, of course, as they are identical in meaning, they are both related to age. In the case of hard drives, the MTBF is measured during the warranty period.

That the MTBF figures in the marketing literature might be a lie, is not my point.
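To illustrate the gap between an in-warranty MTBF and actual life expectancy, here is a rough sketch. The per-year failure probabilities are invented purely for illustration (low during a 3-year warranty, rising sharply afterwards) and are not measured data; the function names are hypothetical too:

```python
# Hypothetical annual failure probabilities by drive age (year 1, year 2, ...).
# These numbers are invented for illustration, not measured data.
ANNUAL_FAILURE_PROB = [0.01, 0.01, 0.01, 0.05, 0.15, 0.30, 0.50, 1.00]
WARRANTY_YEARS = 3


def warranty_mtbf_years(probs, warranty_years):
    """MTBF in years implied by the average failure rate inside the warranty window."""
    window = probs[:warranty_years]
    return 1.0 / (sum(window) / len(window))


def expected_whole_years_survived(probs):
    """Expected number of full years a single drive completes before failing."""
    surviving = 1.0
    total = 0.0
    for p in probs:
        surviving *= 1.0 - p  # chance the drive is still running at the end of this year
        total += surviving
    return total


print(f"MTBF from warranty period: {warranty_mtbf_years(ANNUAL_FAILURE_PROB, WARRANTY_YEARS):.1f} years")
print(f"Expected life of one drive: {expected_whole_years_survived(ANNUAL_FAILURE_PROB):.1f} years")
```

With these made-up rates the in-warranty MTBF is about 100 years while a single drive can be expected to complete only about 5.5 years, which is the same shape as the 30-year-old-man example.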
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76

From the second article you linked:
MTTF and MTBF are sometimes used interchangeably, but they are in fact different. MTTF refers to the average (the mean, in arithmetic terms) time until a component fails, can't be repaired and must therefore be replaced, or until the operation of a product, process or design is disrupted. MTBF is properly used only for components that can be repaired and returned to service. This introduces a couple of related abbreviations occasionally encountered: MTTR (mean time to repair) and, less common, MTTD (mean time to diagnose). With those notions in mind, we could say that MTBF = MTTF + MTTD + MTTR.

When I said definition I was referring to the words used to describe it.
Furthermore by convention MTBF, despite its definition, is estimated with the formula that you pointed out
second article said:
MTBF = 1/(sum of all the part failure rates)
and not with a simplistic "10 years MTBF = the average lifespan is 10 years; therefore 10 years MTBF = 10% fail per year".

This amusingly explicitly contradicts the actual word definition but by convention that is the formula that is used.
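The two formulas quoted from the article can be written out as a short sketch. The part names and per-hour failure rates below are hypothetical placeholders, not figures from any datasheet, and so are the MTTF/MTTD/MTTR values:

```python
def mtbf_from_part_failure_rates(failure_rates_per_hour):
    """MTBF = 1 / (sum of all the part failure rates), as quoted from the article."""
    return 1.0 / sum(failure_rates_per_hour)


def mtbf_from_repair_cycle(mttf_hours, mttd_hours, mttr_hours):
    """MTBF = MTTF + MTTD + MTTR, for components that are repaired and returned to service."""
    return mttf_hours + mttd_hours + mttr_hours


# Hypothetical failure rates (failures per hour) for the parts of one drive.
part_rates = {"motor": 2e-7, "heads": 5e-7, "controller": 3e-7}

print(f"{mtbf_from_part_failure_rates(part_rates.values()):,.0f} hours")
print(f"{mtbf_from_repair_cycle(1_000_000, 2, 6):,.0f} hours")
```

The first formula treats the drive as failing whenever any one part fails, so adding parts or raising any part's failure rate always lowers the result.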

Mark R said:
That the MTBF figures in the marketing literature might be a lie, is not my point.

Really? Because that certainly seemed to be the point. I thought you were contradicting my statement that "manufacturers falsify this figure"... As long as we agree on this then all is good.

Mark R said:
The MTBF (death) for a 30 year old man is 900 years (if you have 900 30 year old guys, only 899, on average, would make it to 31). This does not mean that life expectancy is 900 years.

Humans do not have an MTBF, they have an MTTF. This is because a dead human cannot be (feasibly) repaired and returned to service. Also this again runs into the explicit conflict between the word definition of MTBF and the formula used by convention.

Actually HDDs also have an MTTF and not an MTBF, because HDDs cannot be (feasibly) repaired after most failures.
 