Reliability of SSDs?


theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
Personally I have found a single SSD quite a lot more reliable than HDDs.

...

In many ways I am disappointed with the high failure rate of SSDs; they should be acting like solid-state components such as CPUs, but right now they are only moderately better than hard drives in failure rate.

:confused:
 

Anteaus

Platinum Member
Oct 28, 2010
2,448
4
81
Well, the OP was interested in running 2x256GB SSDs in RAID 0, coming from 4x250GB HDDs, so I'm not sure capacity is that important here. Also, he mentioned trying to avoid "significant" downtime, described as days. Obviously any downtime is bad, but I'm thinking an hour or two is reasonable given the description. From what I gather, it is more important that the workstation is operational than that it covers large storage needs, as this is a workstation and not a file server. I also got the feeling that this was a system drive, since he's talking about downtime.

This might not be the actual situation, but that is how I read it. I do agree that 500GB SSDs are expensive, so continuing with RAID 5 on HDDs is still an option.
 

npaladin-2000

Senior member
May 11, 2012
450
3
76
I would agree: going RAID5 with an add-on controller would be the ideal solution, with speed and reliability being of high importance. No more than three drives though: the more drives you add to a RAID5, the slower it gets.

If that is cost-prohibitive, the next best solution would be a single SSD with periodic image-based backups (which you would have to do anyway if you went RAID0). While RAID0 would theoretically double the performance of an SSD, we're not talking about reducing load times from 10 seconds to 5 seconds here; we're talking about cutting a second in half at most. With most data, that performance difference isn't meaningful, but you also double your chances of hardware failure, since the loss of either drive will kill your volume and put you through a long restore process. Using one drive reduces the chance of that, allows the OS to TRIM the drive, and also eliminates the RAID as a point of failure.
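To put rough numbers on that (a back-of-the-envelope sketch, assuming independent failures and a made-up per-drive annual failure rate, not measured data):

Code:
# A RAID0 array is lost if ANY member drive fails.
# p is an assumed per-drive annual failure rate, purely illustrative.
def raid0_annual_failure_probability(p: float, drives: int) -> float:
    """P(at least one of `drives` independent drives fails in a year)."""
    return 1 - (1 - p) ** drives

p = 0.05  # hypothetical 5% annual failure rate per drive
print(raid0_annual_failure_probability(p, 1))  # ~0.05   -- single drive
print(raid0_annual_failure_probability(p, 2))  # ~0.0975 -- two-drive RAID0

For small p the two-drive figure is close to 2p, which is where the "doubles your chances" rule of thumb comes from.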

If you're satisfied with your current performance, you'll find a single SSD to be blisteringly fast. They're a whole new paradigm of thinking, and I honestly don't see the point of RAID with them at all except for the following scenarios:

1. Redundancy, as in RAID1 or RAID5, because you can't afford for your system to be down when a drive dies.
2. Databases being accessed by truly massive numbers of processes simultaneously.
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76

In the same period as my SSDs I have happily run 4 CPUs with no failures, and I would expect that to be true for the next decade as well. Memory has been equally reliable once you find sticks that work (why does RAM so often arrive faulty?!). There is a wide gulf between a 25% chance of failure a year and less than 1%. The statements are consistent when you look at the actual probability of failure.

I draw a reliability line such that SSDs in my mind fall under the "backup because it will fail" category and not the "one will do because if it fails you were really unlucky" category. It's that categorisation I am disappointed with: they should be as reliable as memory, but with a limited lifetime. Yet in practice they are a long way worse than that expectation, though not as bad as HDDs.
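To make that concrete, here is the naive arithmetic on the drive counts I mentioned (4 SSDs with 1 failure, 12 HDDs with 13 failures, over roughly 2 years); it's a crude per-drive-year rate that ignores replacements and wear:

Code:
# Naive annualized failure rate: failures per drive-year.
# Treats every drive-year as equally risky, which is a simplification.
def annualized_failure_rate(failures: int, drives: int, years: float) -> float:
    return failures / (drives * years)

print(annualized_failure_rate(1, 4, 2))    # SSDs: 0.125 -> ~12.5% per drive-year
print(annualized_failure_rate(13, 12, 2))  # HDDs: ~0.54 -> ~54% per drive-year

Both figures sit far above the sub-1% per year I see from CPUs and RAM, which is the gulf I'm describing.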
 

npaladin-2000

Senior member
May 11, 2012
450
3
76
I draw a reliability line such that SSDs in my mind fall under the "backup because it will fail" category and not the "one will do because if it fails you were really unlucky" category.

FYI, you shouldn't treat SSDs differently. ALL STORAGE should be "backup because it will fail." Never assume that it won't. If you lose your data, it's your fault for not having a backup, not the manufacturer's fault for not building a perfect storage device. :cool:
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
No more than three drives though: the more drives you add to a RAID5, the slower it gets.

Uhh, no.

If you're satisfied with your current performance, you'll find a single SSD to be blisteringly fast. They're a whole new paradigm of thinking, and I honestly don't see the point of RAID with them at all except for the following scenarios:

1. Redundancy, as in RAID1 or RAID5, because you can't afford for your system to be down when a drive dies.
2. Databases being accessed by truly massive numbers of processes simultaneously.

The reason I was looking at two 256GB SSDs rather than one 512GB SSD is that the 512GB drive was more expensive than two 256GB drives.

In the same period as my SSDs I have happily run 4 CPUs with no failures, and I would expect that to be true for the next decade as well. Memory has been equally reliable once you find sticks that work (why does RAM so often arrive faulty?!). There is a wide gulf between a 25% chance of failure a year and less than 1%. The statements are consistent when you look at the actual probability of failure.

I draw a reliability line such that SSDs in my mind fall under the "backup because it will fail" category and not the "one will do because if it fails you were really unlucky" category. It's that categorisation I am disappointed with: they should be as reliable as memory, but with a limited lifetime. Yet in practice they are a long way worse than that expectation, though not as bad as HDDs.

The :confused: was because of the dueling quotes, but it looks like I interpreted you correctly.

I'm disappointed that SSDs don't seem to be as reliable as other solid-state components. It looks like the technology hasn't matured as much as I thought.

I vote 1x256GB system SSD, with a backup image on a RAID 1 or 5 array of HDDs.

Not enough capacity :(
 

BrightCandle

Diamond Member
Mar 15, 2007
4,762
0
76
I have found the M4 drives to be about the same price per GB at 512GB as the 256GB drives. It's not true of all SSDs, but some of them are more reasonably priced at 512GB than others.
 

npaladin-2000

Senior member
May 11, 2012
450
3
76

Uhh, yes. One of these days I'll provide my test data, but I've got a lot of experience with large disk arrays.


The reason I was looking at two 256GB SSDs rather than one 512GB SSD is that the 512GB drive was more expensive than two 256GB drives.

Pay the extra. It's worth it. If you shop around, it's no more than a $100 premium. Otherwise, learn to segregate your boot drive and programs on one of the SSDs and your data on the other. If reliability is important at all, you do NOT use RAID0. Ever. RAID0 stands for Rickety Array of Inreliable Disks... Oh.
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
Uhh, yes. One of these days I'll provide my test data, but I've got a lot of experience with large disk arrays.

RAID 5 is basically just RAID 0 with the addition of parity data, which is also striped across each disk. Barring a controller bottleneck, the performance should increase linearly as more drives are added to the array, and my own experiences working with larger arrays bear that out. If you have data that shows otherwise, I would certainly be interested in seeing it.
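To spell out what "RAID 0 plus parity" means (a conceptual sketch of striping with rotating XOR parity, not any particular controller's implementation):

Code:
# Conceptual RAID 5 stripe layout: N-1 data chunks plus one XOR parity
# chunk per stripe, with the parity position rotating across drives.
def xor_parity(chunks):
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)

def raid5_stripe(data_chunks, stripe_index, n_drives):
    assert len(data_chunks) == n_drives - 1
    layout = list(data_chunks)
    layout.insert(stripe_index % n_drives, xor_parity(data_chunks))
    return layout  # one chunk per drive

chunks = [b"AAAA", b"BBBB", b"CCCC"]   # three data chunks, four drives
print(raid5_stripe(chunks, 0, 4))      # parity lands on drive 0
print(raid5_stripe(chunks, 1, 4))      # parity lands on drive 1

Each drive you add contributes one more data chunk per stripe, which is why, barring a controller bottleneck, sequential throughput should keep growing with N.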
 

npaladin-2000

Senior member
May 11, 2012
450
3
76
RAID 5 is basically just RAID 0 with the addition of parity data, which is also striped across each disk. Barring a controller bottleneck, the performance should increase linearly as more drives are added to the array, and my own experiences working with larger arrays bear that out. If you have data that shows otherwise, I would certainly be interested in seeing it.

The controller is exactly where the bottleneck is, but it's not what you're thinking. The more drives you add, the more data is written for each stripe set. That means more parity data has to be calculated and written to the drive holding parity for that particular stripe. That's what slows it down: the added calculation and extra writes. Some controllers handle it more gracefully than others, and it's less noticeable on SAS than it is on SATA, but they do slow down (it's really REALLY noticeable with SATA drives, actually). If you want a large, fast RAID array you use RAID10, not RAID5. No parity calculation needed. Of course, it's obviously more expensive, but that's the tradeoff...
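For reference, the textbook version of that write cost (an idealized I/O count, not my benchmark data; real controllers cache and coalesce writes, which is why they differ so much in how gracefully they handle it):

Code:
# Idealized back-end I/Os per small host write (classic RAID write penalty).
WRITE_PENALTY = {
    "raid0": 1,   # just the data write
    "raid10": 2,  # data written to both mirrors, no parity to compute
    "raid5": 4,   # read old data + old parity, write new data + new parity
    "raid6": 6,   # like RAID5 but with two parity blocks to update
}

def backend_ios(host_writes: int, level: str) -> int:
    return host_writes * WRITE_PENALTY[level]

for level in ("raid0", "raid10", "raid5", "raid6"):
    print(level, backend_ios(1000, level))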
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
Interesting white paper on SSD reliability.

Differential RAID: Rethinking RAID for SSD Reliability

From the abstract:
SSDs exhibit very different failure characteristics compared to hard drives. In particular, the Bit Error Rate (BER) of an SSD climbs as it receives more writes. As a result, RAID arrays composed from SSDs are subject to correlated failures. By balancing writes evenly across the array, RAID schemes can wear out devices at similar times. When a device in the array fails towards the end of its lifetime, the high BER of the remaining devices can result in data loss.

Sigh...
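The failure mode in the abstract is easy to sketch: balanced striping wears every drive at the same pace, so they all reach end-of-life (and peak BER) together. A toy wear model with invented endurance numbers:

Code:
# Toy model of correlated SSD wear under perfectly balanced RAID writes.
# The endurance figure is invented purely for illustration.
ENDURANCE_WRITES = 3000          # hypothetical lifetime writes per drive
drives = [0, 0, 0, 0]            # writes absorbed by each of four drives

for stripe in range(11_996):     # balanced striping spreads writes evenly
    drives[stripe % len(drives)] += 1

print([w / ENDURANCE_WRITES for w in drives])
# -> all four drives at ~99.97% of rated life at the same moment

When the first drive dies, the rebuild has to read every surviving drive right when their bit error rates are highest; that's the correlated-failure window. Diff-RAID's proposal, as I read it, is to skew parity placement so the drives age at different rates.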
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
The controller is exactly where the bottleneck is, but it's not what you're thinking. The more drives you add, the more data is written for each stripe set. That means more parity data has to be calculated and written to the drive holding parity for that particular stripe. That's what slows it down: the added calculation and extra writes. Some controllers handle it more gracefully than others, and it's less noticeable on SAS than it is on SATA, but they do slow down (it's really REALLY noticeable with SATA drives, actually). If you want a large, fast RAID array you use RAID10, not RAID5. No parity calculation needed. Of course, it's obviously more expensive, but that's the tradeoff...

I understand that RAID 10 will be faster than RAID 5 in writes, but your claim that adding more drives to a RAID 5 array makes it slower doesn't match my experience at all.
 

npaladin-2000

Senior member
May 11, 2012
450
3
76
I understand that RAID 10 will be faster than RAID 5 in writes, but your claim that adding more drives to a RAID 5 array makes it slower doesn't match my experience at all.

I compared 4- and 5-drive RAID5 arrays running on the same controller versus a single 10-disk RAID5 array (not all three at the same time, of course). This particular test case was SATA due to the need for a single extremely large storage volume. Night and day, in favor of the 4-drive RAID5. Ended up doing a couple of those with the actual filesystem spanned between them.

Now, I do deal with really REALLY big arrays here, so maybe your experience doesn't run to 10+ drive arrays? Not everyone's does...

Incidentally, that's an interesting whitepaper you posted. I figure the defining characteristic of enterprise SSDs won't be SAS; it will be ECC capability, and possibly an addition to the drive communication protocols to alert the OS when a drive's wear limit is nearing and put it in a predicted-fail state for replacement. Diff-RAID is just going to bog RAID5 down even further, and I already mentioned why people avoid it in favor of RAID10.
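Some of that alerting already exists in rough form via SMART: many SSDs expose a wear attribute that tooling can poll. A sketch using smartctl from smartmontools; attribute names and IDs vary by vendor (e.g. 233 "Media_Wearout_Indicator" on some Intel drives), so treat these as examples, not a standard:

Code:
# Sketch: poll SSD wear via smartctl (smartmontools). Attribute names
# differ by vendor; the set below is illustrative, not exhaustive.
import subprocess

WEAR_ATTRIBUTES = {"Media_Wearout_Indicator", "Wear_Leveling_Count",
                   "SSD_Life_Left"}

def wear_values(device: str) -> dict:
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=True).stdout
    found = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] in WEAR_ATTRIBUTES:
            found[fields[1]] = int(fields[3])  # normalized value, counts down from ~100
    return found

print(wear_values("/dev/sda"))  # e.g. {'Media_Wearout_Indicator': 97}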
 

theevilsharpie

Platinum Member
Nov 2, 2009
2,322
14
81
I compared 4- and 5-drive RAID5 arrays running on the same controller versus a single 10-disk RAID5 array (not all three at the same time, of course). This particular test case was SATA due to the need for a single extremely large storage volume. Night and day, in favor of the 4-drive RAID5. Ended up doing a couple of those with the actual filesystem spanned between them.

Now, I do deal with really REALLY big arrays here, so maybe your experience doesn't run to 10+ drive arrays? Not everyone's does...

I tend not to make large RAID 5 arrays due to the stress placed on the array when a disk fails. However, I've run RAID 6 arrays with up to 36 disks, and performance has always scaled linearly. The scaling isn't perfect (the performance improvement starts to taper off after a while), but I haven't yet encountered a situation where adding a drive has decreased performance.
 

jwilliams4200

Senior member
Apr 10, 2009
532
0
0
I tend not to make large RAID 5 arrays due to the stress placed on the array when a disk fails. However, I've run RAID 6 arrays with up to 36 disks, and performance has always scaled linearly. The scaling isn't perfect (the performance improvement starts to taper off after a while), but I haven't yet encountered a situation where adding a drive has decreased performance.

Right, with any decent hardware RAID controller or good software RAID, both RAID-5 and RAID-6 get faster with additional drives added. The other poster must have a poor RAID system, or a flawed measurement.

Also, for the same number of disks, large sequential writes will be faster with RAID-5 than with RAID-10, except with some LSI RAID cards which seem to be flawed with writes to striped RAID. In this context, "large" means at least a full stripe width. In that case, the RAID-5 write speed will be about (N-1) times the speed of a single drive, minus some overhead for RAID. In comparison, RAID-10 would have a large sequential write speed of N/2 times the speed of a single drive, minus some overhead for RAID.

In contrast, for small writes and random writes, the RAID-10 will usually be faster than the RAID-5, since RAID-10 does not suffer from the read-modify-write penalty.
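Plugging made-up numbers into those formulas shows the sequential-write gap (ideal scaling, overhead ignored; the per-drive speed is hypothetical):

Code:
# Ideal large sequential write throughput, overhead ignored.
def raid5_seq_write(n: int, single: float) -> float:
    return (n - 1) * single      # one drive's worth of bandwidth goes to parity

def raid10_seq_write(n: int, single: float) -> float:
    return (n / 2) * single      # every block is written to two drives

single = 120.0                   # MB/s per drive, hypothetical
for n in (4, 6, 8):
    print(n, raid5_seq_write(n, single), raid10_seq_write(n, single))
# 4 -> 360 vs 240; 6 -> 600 vs 360; 8 -> 840 vs 480 (MB/s)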
 

Burner27

Diamond Member
Jul 18, 2001
4,452
50
101
RAID 5 with 3x256GB drives is your best trade-off. Write performance is somewhat reduced, but you get some redundancy to reduce your downtime at the lowest cost point while maintaining the amount of space you need. You aren't going to benefit from TRIM at this point, so you need to get an SSD that holds up well without it. RAID 1 at 512GB is more expensive and won't bring anything more in performance, and all other solutions are going to cost hours on the failure of a disk.

To get safer or faster you are going to have to trade off against additional cost. Personally I have found a single SSD quite a lot more reliable than HDDs. I have had 4 SSDs running for over 2 years with a single failure, while in the same time with 12 hard drives I have had 13 failures. In many ways I am disappointed with the high failure rate of SSDs; they should be acting like solid-state components such as CPUs, but right now they are only moderately better than hard drives in failure rate.


RAID 0 with 3 x 256GB Crucial M4. FW 000F

 

Burner27

Diamond Member
Jul 18, 2001
4,452
50
101
And when any one of the three has an issue, what happens?

I would RMA it and get a replacement under warranty. Anyone who dabbles with RAID in ANY form does backups of their data. Anyone that relies on a HDD/SSD in their PC does backups.


What will you do when your single 512GB SSD has an issue? ^^^^^^^

;)
 

npaladin-2000

Senior member
May 11, 2012
450
3
76
I would RMA it and get a replacement under warranty. Anyone who dabbles with RAID in ANY form does backups of their data. Anyone that relies on a HDD/SSD in their PC does backups.


What will you do when your single 512GB SSD has an issue? ^^^^^^^

;)

I'll restore from a backup and use my Android phone, Android tablet, or one of the extra laptops at work until I get it. But the OP said he can't tolerate significant amounts of downtime, hence the push towards RAID5, since RAID1 is cost-prohibitive. Barring that, using one SSD instead of two in RAID0 cuts the chances of catastrophic failure in half.
 

Burner27

Diamond Member
Jul 18, 2001
4,452
50
101
I'll restore from a backup and use my Android phone, Android tablet, or one of the extra laptops at work until I get it. But the OP said he can't tolerate significant amounts of downtime, hence the push towards RAID5, since RAID1 is cost-prohibitive. Barring that, using one SSD instead of two in RAID0 cuts the chances of catastrophic failure in half.

I am not disagreeing with you on any of the points except one. When you RAID 0, you have multiple points of failure. If you have a single drive, you have a single point of failure. Neither situation is acceptable. Having two drives in RAID 1 is better for uptime. RAID5 introduces parity overhead that may or may not hinder performance.

If uptime and reliability are of chief importance, the OP should be running RAID 1.
 

npaladin-2000

Senior member
May 11, 2012
450
3
76
I am not disagreeing with you on any of the points except one. When you RAID 0, you have multiple points of failure. If you have a single drive, you have a single point of failure. Neither situation is acceptable. Having two drives in RAID 1 is better for uptime. RAID5 introduces parity overhead that may or may not hinder performance.

True, but 3x256GB SSDs are cheaper than 2x512GB in this case, and there is a price limitation. RAID5 provides fault tolerance at a lower price than RAID1 for the 512GB of storage the OP needs.
 

holden j caufield

Diamond Member
Dec 30, 1999
6,324
10
81
It seems that if your biggest need is minimizing downtime, you might be seeking a solution that is unnecessarily complex. A RAID 0 or 5 failure is going to require intervention on some level, so neither solution lets you fire and forget.

My suggestion is to get out of RAID completely. Spend a little money and buy two of the most reliable SSDs you can find, with enough capacity in a single-drive configuration. Use one SSD as your primary drive and keep the second on hand as a swap-out should the first fail. Keep two backups if you are really worried.

RAID has its place, but in this particular case I think you're actually increasing risk, and at the same time increasing potential downtime due to drive failure, because you're then dealing with RAID arrays and secondary drive controllers.

With a simple drive swap, you'd be back in business within minutes. In fact, in this particular case I'd still recommend this setup if you stuck with HDDs.

I do this with my laptop and one other important PC. I clone it weekly, and if anything happens, I'm up in minutes.