Are 2 TB HDs more failure prone than disks with less capacity?

Barfo · May 27, 2010

I'm considering getting a 2 TB WD Green HD to upgrade my media storage. A smaller drive would make little sense to me and I don't want to wait for 3 TB drives.
So I was talking to a friend about this and he told me that 2 TB HDs are very failure prone at this point in time and I should be better off with 2 1 TB drives. I'm wondering if there's some truth in that or if he just pulled it out of his ass. What gives?

JAG87 · May 27, 2010

option 2, pulled out of ass.

sub.mesa · May 27, 2010

Your friend is talking about BER or Bit-Error Rate. This is typically 1 per 10^14.

As HDDs get much bigger in capacity (storing more bytes), the Bit-Error-Rate that roughly says "1 unrecoverable sector due to BER every 10TB read" stays relatively the same. So with more data, you have a higher chance of actually running into a Bit-Error.

Even worse, if you thought you were safe with a RAID5 array, think again. If one disk totally fails (dead) and you have a new disk to rebuild the array, you better pray that no bit-errors occur on the remaining disks. If they do, your recovery would be very complicated and many users give up and reformat their disks losing all data.

To protect against bit-errors, investing in a SOLID backup solution would be good advice anytime. Some more advanced users use the ZFS filesystem to protect against BER. With ZFS you would not have to worry about BER on redundant arrays; ZFS uses checksums to protect your data against corruption and when corruption occurs it will fix it by overwriting the corrupt parts with valid data, coming from a redundant source (RAID1/RAID5/RAID6).

So any large HDD is kind of vulnerable to BER. A 20GB disk has about the same bit-error rate, but due to its lower size the chance of actually running into one is a lot less likely.

Golgatha · May 27, 2010

There's been a lot of research concerning temperature and age effects, but not so much on the number of platters used in a drive. However, the more platters you have the hotter the drive will run. There is very little to no real correlation between the operating temperature and higher failure rates as far as I have read Online. I actually did read and digest that huge Google study published a few years ago (my God how boring!!), so I feel I'm fairly educated on this subject matter.

To summarize the results presented in Google's research, temperature had little to no effect on failure rates and most drives die within the first week to 6 months if they're a lemon to begin with. If they survive the infant mortality stage, they start dieing off quickly in the 4-6 year range, with 5 years being a good time to replace drives. Some outliers we noted for random "bad" batches of drives having abnormally high failure rates, but these "bad" batches were random and not specific to any one manufacturer of HDDs.

To actually answer your question, drives in the 2TB range use the same technologies and QC used for the 1TB ones, so it would stand to reason you wouldn't see any difference in failure rates for drives manufactured in the same factory on the same day.

Golgatha · May 27, 2010

sub.mesa said:
Your friend is talking about BER or Bit-Error Rate. This is typically 1 per 10^14.

As HDDs get much bigger in capacity (storing more bytes), the Bit-Error-Rate that roughly says "1 unrecoverable sector due to BER every 10TB read" stays relatively the same. So with more data, you have a higher chance of actually running into a Bit-Error.

Even worse, if you thought you were safe with a RAID5 array, think again. If one disk totally fails (dead) and you have a new disk to rebuild the array, you better pray that no bit-errors occur on the remaining disks. If they do, your recovery would be very complicated and many users give up and reformat their disks losing all data.

To protect against bit-errors, investing in a SOLID backup solution would be good advice anytime. Some more advanced users use the ZFS filesystem to protect against BER. With ZFS you would not have to worry about BER on redundant arrays; ZFS uses checksums to protect your data against corruption and when corruption occurs it will fix it by overwriting the corrupt parts with valid data, coming from a redundant source (RAID1/RAID5/RAID6).

So any large HDD is kind of vulnerable to BER. A 20GB disk has about the same bit-error rate, but due to its lower size the chance of actually running into one is a lot less likely.

Exactly, RAID does not equal a backup solution. Multiple copies of the same data on physically separate media do equal a backup solution. Hopefully one copy is stored in a fire and waterproof safe, or even better, off site. RAID is only useful for increased speed or to ensure greater uptime for mission critical servers or workstations.

taltamir · May 27, 2010

It is not just BER (although that is an issue, and I whole heartedly recommend ZFS)

there have been an unusual amount of firmware bugs, failure rate, and problems with 1TB, 1.5TB, and 2TB drives, getting progressively worse with each size iteration (although, newer versions of said drives have improved those)... This was due mostly to the rush to reach those sized first. A "dip" in drive quality so to speak... This caused some backlash and companies seem to be more cautious now, and I also think said bugs have since been fixed.

Always investigate a specific MODEL of drives you wish to buy (and let others buy a good amount of them first), since it might have issues.

Exactly, RAID does not equal a backup solution. Multiple copies of the same data on physically separate media do equal a backup solution. Hopefully one copy is stored in a fire and waterproof safe, or even better, off site. RAID is only useful for increased speed or to ensure greater uptime for mission critical servers or workstations.

ZFS RAID is superior to most backup only solutions, but is indeed not the same as backup, ZFS RAID1 + offsite backup together are the ultimate protection.
Fire safes are meant to protect PAPER, they reach temperatures that will destroy your DATA. Offsite backup is the way to go.

See my guide to protecting data: http://forums.anandtech.com/showthread.php?t=218081&highlight=backup

Golgatha · May 27, 2010

taltamir said:
It is not just BER (although that is an issue, and I whole heartedly recommend ZFS)

there have been an unusual amount of firmware bugs, failure rate, and problems with 1TB, 1.5TB, and 2TB drives, getting progressively worse with each size iteration (although, newer versions of said drives have improved those)... This was due mostly to the rush to reach those sized first. A "dip" in drive quality so to speak... This caused some backlash and companies seem to be more cautious now, and I also think said bugs have since been fixed.

Always investigate a specific MODEL of drives you wish to buy (and let others buy a good amount of them first), since it might have issues.

ZFS RAID is superior to most backup only solutions, but is indeed not the same as backup, ZFS RAID1 + offsite backup together are the ultimate protection.
Fire safes are meant to protect PAPER, they reach temperatures that will destroy your DATA. Offsite backup is the way to go.

See my guide to protecting data: http://forums.anandtech.com/showthread.php?t=218081&highlight=backup

Sentry makes 2 models which keep the internals at or below 52°C and 80% humidity. Very nice guide however.

http://www.sentrysafe.com/pdfs/OwnersManuals/504815.pdf

taltamir · May 27, 2010

Golgatha said:
Sentry makes 2 models which keep the internals at or below 52°C and 80% humidity. Very nice guide however.

http://www.sentrysafe.com/pdfs/OwnersManuals/504815.pdf

thats pretty cool, its nice to know that there are some fire safes that can protect data.
Just make sure you get a firesafe that is rated for data such as:

1⁄2-Hour UL Fire Endurance Test Subjected to temperatures up to 1550ºF (843ºC) for a duration of 1/2 hour, the interior of the unit will remain below 350ºF (177ºC) to protect documents.
Models 1710 and 6720 – The interior of the unit will remain below 125ºF (52ºC) and 80% humidity to protect computer and audio/visual media. (The unit withstands high-temperature exposure as fire moves through a building.)

RebateMonger · May 27, 2010

There's a couple of considerations in the reliability of 1 TB versus 2 TB disks.

1) Manufacturer - some makers are having more problems than others with the highest density disks

2) 4K Sectors - WDC is using these on some of their drives and, in theory, the better ECC can lower the uncorrectable error rate by 1/100.

3) Platter density and number of platters - The higher the density, the tougher it is to make reliable disks. Additional platters probably also affect reliability. There's likely a tradeoff in reliability between higher density platters and the number of platters.

Doubling the number of disks (using two 1 TB disks rather than a single 2 TB disk) might prove more reliable than a single 2 TB disk, but there's no single "correct" answer to this question.

taltamir · May 27, 2010

RebateMonger said:
There's a couple of considerations in the reliability of 1 TB versus 2 TB disks.

1) Manufacturer - some makers are having more problems than others with the highest density disks

2) 4K Sectors - WDC is using these on some of their drives and, in theory, the better ECC can lower the uncorrectable error rate by 1/100.

3) Platter density and number of platters - The higher the density, the tougher it is to make reliable disks. Additional platters probably also affect reliability. There's likely a tradeoff in reliability between higher density platters and the number of platters.

Doubling the number of disks (using two 1 TB disks rather than a single 2 TB disk) might prove more reliable than a single 2 TB disk, but there's no single "correct" answer to this question.

Very good observations, I wanted to add to it that doubling the number of disks with no RAID means that if one fails, you lose half your data.

I would say that as long as failure chance is below 50% (which it is), getting more smaller drives will improve your overall data safety.

However, drive failure WILL eventually happen, its only a question of when. And it can be made a non issue with redundancy and backups...
I'd get 2x2TB in RAID1, running ZFS, with an offsite backup (at least of the most important files)

Barfo · May 28, 2010

So I'm better off with 2 1 GB drives...oh well :/

Idontcare · May 28, 2010

Barfo said:
So I'm better off with 2 1 GB drives...oh well :/

The question shouldn't be "which option is better?" but rather "is there a material difference between these choices?".

Concluding that 2x1GB has lower aggregate fail-rate than 1x2GB says nothing about whether or not the aggregate fail-rate for either option is such that the info becomes actionable.

If 2x1GB has a fail rate of once every 10yrs and the 1x2GB option has a fail rate of once every 9yrs you might come to a very different conclusion as to what you will do with that info versus the case where the fail rates are 10yrs and 2 yrs respectively.

jiffylube1024 · May 28, 2010

Barfo said:
I'm considering getting a 2 TB WD Green HD to upgrade my media storage. A smaller drive would make little sense to me and I don't want to wait for 3 TB drives.
So I was talking to a friend about this and he told me that 2 TB HDs are very failure prone at this point in time and I should be better off with 2 1 TB drives. I'm wondering if there's some truth in that or if he just pulled it out of his ass. What gives?

I wouldn't say that 2TB drives are very failure prone at this moment in time, but they're definitely a bit more failure prone.

2TB drives aren't more failure prone because they're 2TB, they're more failure prone because they use so many platters - all manufacturers to my knowledge use 500GB platters to achieve 2TB, meaning 4 platters.

One of the most reliable hard drives I've deployed onto dozens of computers recently is the new Seagate 7200.12 500GB. Why is it so reliable? In part because it's a one-platter hard drive. There's much less to go wrong inside there, plus the 7200.12 500GB is thinner than other drives, and because of the reduced size and # of platters, it runs cooler. And heat is the enemy of hard drives.
-----

I would personally stick to two 1TB drives at this moment in time, and move up to ~1.5+ TB drives once disk platter size increases again.

The Seagate 7200.11 1.5TB is an infamous drive because it coupled 4 platters with buggy firmware, that led to lots and lots of dead drives.

Meanwhile, the Seagate 7200.9 160GB, 7200.10 250GB, 7200.11 320GB and 7200.12 500GB are all above-average in reliability because they're all one-platter drives. Just something to think about...

jimhsu · May 28, 2010

An argument about the safety of 2 TB disks shouldn't be made based on its capacity, but rather the intrinsic properties of such a disk.

Consider getting 2 x 1 TB drives instead of 1 x 2 TB drives. Neglecting all other factors, the failure rate PER DRIVE should be about the same. Say that the annual failure rate (AFR) is 1% for each drive. You then proceed to store 2 TB of data.

With the 2 TB disk, the probability of failure after 1 year is 0.01. For RAID 1, P(failure) = 0.01^2 = 1 x 10^-4 in the best case (no covariance, no controller failure), or if failures are completely correlated, P(failure) = 0.01. For RAID 0 or JBOD, P(failure) = 1 - (0.99)^2 = 0.0199 ~ 0.02. Predicting failure for things like ZFS is much more complicated. If no data loss is acceptable, you are at best equal, or at worse twice as worse off with 2 x 1 TB disks, simply because you have more disks that can fail. Or you can use RAID 1 and store less data. BER doesn't make a difference, because the total capacity of data stored is 2 TB (not in RAID 1); there is no evidence that 2 TB disks have a higher BER than smaller capacity disks. The argument extends to any number of smaller disks; if you are not using extra space for redundancy, whether as RAID 5, parity, PAR2, ZFS, backups, whatnot, you are only increasing your likelihood of experiencing a disk failure than by using a single huge disk.

As far as UREs are concerned, we simply need more CRC. The RAW failure rate for hard drives is astonishing ( http://forums.anandtech.com/showthread.php?t=2071008&highlight= ), and UREs simply express the statistical probability that enough failures occur in a sector to make the CRC useless. The probability of this happening is on the order of 10^-11 or less, per bit (see the slide).

So in other words, the capacity of a 2 TB disk should not be a deterrant to using one. Other members however addressed other real problems, e.g. bad firmware, multiple platters, etc.

taltamir · May 28, 2010

Barfo said:
So I'm better off with 2 1 GB drives...oh well :/

actually my point was that, statistically since the chance of a drive failing is less than 50%, if the failure rate is the SAME for 2TB and 1TB drive, 2x1TB drives means that the overall chance of EITHER drive failing is slightly higher then the chance of the 2TB drive failing, but the overall chance of loosing all your data (aka both drives failing at once) is much lower than that of the 2TB drive failing...

but both cases you are taking a big risk by having data that isn't backed up or redundant.
I would say that it is worth your time and money to go 2x2TB in RAID1 (mirroring), and perform a backup of your most important data. (an "important documents" folder should be small enough to burn unto a CD, etc.)

Also, as others have mentioned, the chances of failure of a 2TB and 1TB drive are not exactly the same, there is the number of platters to take into account, etc.

C1 · May 29, 2010

MTBF and BER are included as performance characteristics of of the HDD specification (at least that is so with Hitachis). Doubling the number of drives like HDDs should over double the probability of a failure (ie, sample space for failure now is HDD1 or HDD2 or HDD1 and HDD2 - probability 101?) ), but as has been pointed out, the chance of losing all the data simultaneously is less for any one failure event if the data are distributed or backed up on each HDD.

Mark R · May 29, 2010

Golgatha said:
Exactly, RAID does not equal a backup solution. Multiple copies of the same data on physically separate media do equal a backup solution. Hopefully one copy is stored in a fire and waterproof safe, or even better, off site. RAID is only useful for increased speed or to ensure greater uptime for mission critical servers or workstations.

RAID is useful, in that it makes a data store less vulnerable to physical damage or degradation. However, it's not a backup as you rightly say.

When handling large volumes of data, e.g. 2TB up - the BER does need to be considered on the backup as well. So, if you're backing up 10 TB - you'd better make sure that you use RAID for your backup - otherwise, there's a substantial risk that your backup may be unreadable if you need to restore.

lsv · May 29, 2010

All this talk of failing 2TB HDs made me go out and stick an old 80gig SATA in Vantec Nexstar to back up all my samples for my work. Thanks guys

taltamir · May 29, 2010

lsv said:
All this talk of failing 2TB HDs made me go out and stick an old 80gig SATA in Vantec Nexstar to back up all my samples for my work. Thanks guys

it WILL fail eventually... And google has proved that drives are most likely to fail when they are under 1 year or over 5...

Its 80GB, it shouldn't be too expensive to back up.

pitz · May 29, 2010

3.5" drives used to use 11-12 platters to achieve capacities such as 2.1gb or 4.3gb. Failures were not an epidemic in those drives (ie: Seagate Barracuda 2, Barracuda 4, etc.). Electronics in those particular Seagate drives, believe it or not, were more prone to failure, than the mechanicals.

So don't worry that the latest Hitachi drive has 5 platters, while other drives only have 3 or 4. Just establish an overall data management and data backup framework, that is more expansive than strictly RAID-1 or RAID-5, and you'll be all right.

Personally, I believe strongly in using iSCSI and remote booting whenever possible, simply to get the number of drives installed down, and to manage them in an intelligent fashion. Using a package such as gPXE or hardware/firmware based tools such as those Intel NICs.

taltamir · May 29, 2010

3.5" drives used to use 11-12 platters to achieve capacities such as 2.1gb or 4.3gb. Failures were not an epidemic in those drives (ie: Seagate Barracuda 2, Barracuda 4, etc.). Electronics in those particular Seagate drives, believe it or not, were more prone to failure, than the mechanicals.

How the heck do you stuff 12 platters into 3.5" drive? Now if it was a much larger drive I could understand, but that makes no sense...

And electronics for 12 platters are much more complex (and thus failure prone) then the electronics for a 1 platter drive...

pitz · May 29, 2010

taltamir said:
How the heck do you stuff 12 platters into 3.5" drive? Now if it was a much larger drive I could understand, but that makes no sense...

Its hard, but Seagate (Fujitsu, Quantum, Micropolis, etc.) did it routinely in the 1990s. Check out the ST15150N/ST15150W drives at the Seagate website. 11 data platters, 1 servo platter, 4.3Gb total capacity. All in a 3.5" package. Prices on such drives were typically in the $1300-$1500 range.

(most drives until the very late 1990s needed a dedicated servo platter, so as to provide tracking data so the feedback control system in the drive's actuator could line up the heads properly. Only when DSP's improved enough in performance were drive builders able to go to so-called 'embedded servo' solutions. Of course, PRML was also introduced in the late 1990s, which requires an absolute ton of DSP power because it is based on probabalistic calculations).

And electronics for 12 platters are much more complex (and thus failure prone) then the electronics for a 1 platter drive...

Not significantly. Its just a mux. Its all done in solid state components, so the statistical significance of failure is not meaningfully augmented. I'm not sure what used to fry on the old Seagates, but it was something on the logic boards, as opposed to the mechanical assembly. Merely swapping the boards almost always brought the drives back to life.

Emulex · May 29, 2010

6 discs - 11 data heads and 1 servo head - alot of weight you have to use a fast seek to gain momentum to the target then a fine adjustment or two to lock the servo track.

take a look at 3.5" cheetah 15K.7 compared to savvio 10K (2.5") sas drives - the platters are the same size (mostly). but you can see how much beefier the design is with the cheetah.

those new westy drives have two motor/seek design like two hard drive in one (raid-0) i wonder how much performance/reliability there is compared to two drives.

taltamir · May 29, 2010

pitz said:
Its hard, but Seagate (Fujitsu, Quantum, Micropolis, etc.) did it routinely in the 1990s. Check out the ST15150N/ST15150W drives at the Seagate website. 11 data platters, 1 servo platter, 4.3Gb total capacity. All in a 3.5" package. Prices on such drives were typically in the $1300-$1500 range.

http://www.seagate.com/support/disc/manuals/scsi/28880d.pdf

According to this, They sold a device comprised out of 4x3.5" disks in an enclosure that took TWO 5 1/4 slots, and contained the 4 drives, as well as a controller, and a single SCSI output to connect to them...
So, as I said, cramming 12 PLATTERS into 3.5" drive is physically impossible due to the SIZE of each platter.

Not significantly. Its just a mux. Its all done in solid state components, so the statistical significance of failure is not meaningfully augmented. I'm not sure what used to fry on the old Seagates, but it was something on the logic boards, as opposed to the mechanical assembly.

It is a very complicated device... and solid state does not preclude failure. HDD and CD and Floppy are the only non solid state components in computers... and every component can and does fail eventually.
Plenty of people had failure in their SSD, and the more complex something is, the more likely it is that something will fail.

pitz · May 29, 2010

taltamir said:
http://www.seagate.com/support/disc/manuals/scsi/28880d.pdf

According to this, They sold a device comprised out of 4x3.5" disks in an enclosure that took TWO 5 1/4 slots, and contained the 4 drives, as well as a controller, and a single SCSI output to connect to them...
So, as I said, cramming 12 PLATTERS into 3.5" drive is physically impossible due to the SIZE of each platter.

No, the ST15150/ST15150W fit into a 3.5" bay. I don't know about any supplementary mounting hardware they sold. They were 1.6" tall instead of the typical 1" tall 3.5" drives that you see today. That's the only difference. Oh yeah, and they would put any of todays' drives to shame in terms of heat generation.

Go take a look at that manual again. The Barracuda 4 series used 3.5" platters.

It is a very complicated device... and solid state does not preclude failure. HDD and CD and Floppy are the only non solid state components in computers... and every component can and does fail eventually.
Plenty of people had failure in their SSD, and the more complex something is, the more likely it is that something will fail.

The probability of failure is different by one or two order of magnitudes for solid state versus mechanical components. Most "SSD failures" that are reported today are the result of firmware malfunction or user malfunction, not of actual hardware failure. Flash chips, of course, have cycle limits, but that's due to their persistent nature.

Are 2 TB HDs more failure prone than disks with less capacity?

Lifer

Diamond Member

Senior member

Lifer

Lifer

Lifer

Lifer

Lifer

Elite Member

Lifer

Lifer

Elite Member

Diamond Member

Senior member

Lifer

Platinum Member

Diamond Member

Golden Member

Lifer

Senior member

Lifer

Senior member

Diamond Member

Lifer

Senior member