I often read that RAID 10 is preferred over RAID 5 for reliability and performance.
I would like to share my thoughts with you for critique and comments.
1) Small writes are certainly slower, and this may be a problem for some. However, if you use Linux and set up software RAID 5, you can get performance equal to or greater than RAID 10 by using an aggressive write cache (RAM is so cheap that for $60 you can have an 8 GB write cache), a modern filesystem (ZFS or ext4), and a lazy flush interval (240 seconds or more). Obviously a UPS is mandatory.
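For reference, the lazy-flush behaviour I mean maps to the Linux dirty-writeback sysctls. A sketch of this kind of tuning (the values are illustrative, not a recommendation, and a 240-second expiry assumes you really trust your UPS):

```
# /etc/sysctl.conf -- illustrative values only
vm.dirty_ratio = 40                  # allow up to 40% of RAM to hold dirty pages
vm.dirty_background_ratio = 20      # start background writeback at 20%
vm.dirty_expire_centisecs = 24000   # dirty data becomes flushable after 240 s
vm.dirty_writeback_centisecs = 3000 # wake the writeback threads every 30 s
```

Apply with `sysctl -p` and tune to your workload.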
2) Reliability is worse. I built a statistical model that shows this.
Hypotheses:
1) Each disk failure is independent of the others. This means that disk failures depend on internal causes rather than external ones, such as a buggy controller, mishandled disks, improper mechanical stress, and so on.
2) The declared MTBF is a lie. Take for example a Caviar Green 2 TB: it has a declared MTBF of 1.2 million hours, or 136 years, which I consider an outright lie, at least for continuous use. A more reasonable MTBF of 10 years is assumed instead.
This may be viewed as controversial by some, also because if we take the original MTBF as true, the conclusions I came to no longer hold.
Anyhow, thinking that a disk, ANY disk, can work for 136 years, even idle, is simply an unrealistic hypothesis.
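As a side note on how an MTBF turns into a failure probability: assuming a constant failure rate (exponential lifetime), the chance that a single disk fails within t years is 1 − exp(−t/MTBF), which for short horizons is close to the simpler t/MTBF approximation. A minimal sketch with the assumed 10-year MTBF:

```python
from math import exp

MTBF_YEARS = 10  # the "more reasonable" MTBF assumed above

def fail_prob(years, mtbf=MTBF_YEARS):
    # Probability that a single disk fails within `years`,
    # assuming a constant failure rate (exponential lifetime).
    return 1 - exp(-years / mtbf)

print(f"exponential model: {fail_prob(1):.1%}")   # ~9.5% in the first year
print(f"linear approx:     {1 / MTBF_YEARS:.1%}")  # 10.0%
```

With the vendor's 136-year MTBF instead, the first-year figure drops below 1%, which is why the conclusions depend so heavily on this hypothesis.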
3) Critical failure is:
RAID 5 = two disks are broken at the same time, thus data is lost.
RAID 10 = three disks, or two disks of the same mirror pair, are broken at the same time, thus data is lost.
This might be seen as another critical hypothesis by some readers, because it doesn't take into account that when a disk fails in RAID 10 it can be replaced without data loss.
Anyhow, I didn't consider this for several reasons:
- At very large scale (e.g. a datacenter) the two models converge.
- Replacing a disk is a cost and a burden, so my model leans toward minimal maintenance.
- With RAID 5 you can also replace a disk after a failure without data loss.
- The calculations would have been much more complex 😛
Conclusions
After one year, the critical failure probability is 2.8% for RAID 5 and 8.6% for RAID 10. After two years it is 10.4% for RAID 5 and 28.8% for RAID 10.
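For RAID 5 these figures come out of a plain binomial calculation. A minimal sketch, assuming a 3-disk array and approximating the per-disk failure probability as p ≈ t/MTBF (so 0.1 after one year and 0.2 after two, with the 10-year MTBF):

```python
from math import comb

def raid5_critical(p, n=3):
    # Probability that 2 or more of the n disks are broken at once,
    # with each disk failed independently with probability p.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2, n + 1))

print(f"1 year:  {raid5_critical(0.1):.1%}")   # 2.8%
print(f"2 years: {raid5_critical(0.2):.1%}")   # 10.4%
```

The RAID 10 case additionally depends on *which* disks fail (the mirror-pair structure), so I leave that to the spreadsheet.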
This shows that the benefit of RAID 10's two redundant disks is outweighed by the cumulative risk of running four disks rather than three.
The critical point is that if the two failed disks happen to belong to the same mirror pair, the array is compromised.
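To illustrate: with four disks in two mirror pairs, 2 of the 6 possible two-disk failure combinations hit the same pair. A quick sanity check (the disk labels here are just hypothetical names):

```python
from itertools import combinations

# Hypothetical labels: mirror pair A = (A1, A2), mirror pair B = (B1, B2)
disks = ["A1", "A2", "B1", "B2"]

two_disk_failures = list(combinations(disks, 2))
# A two-disk failure is fatal when both dead disks share a pair letter.
fatal = [c for c in two_disk_failures if c[0][0] == c[1][0]]

print(len(fatal), "of", len(two_disk_failures))  # 2 of 6
```

So given exactly two dead disks, there is a 1-in-3 chance they form the fatal combination.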
Obviously RAID 6 is safer than both RAID 5 and RAID 10, because any two disks can fail without data loss.
Thus RAID 5 is the best in reliability per dollar, performance per dollar, and functionality over complexity among the RAID levels considered (RAID 0, 1, 10, 01, 6).
Send me a PM with your email address if you want to receive an ODT spreadsheet that takes MTBF, years, and hours of operation and returns the critical failure probability for RAID 5 and RAID 10.