Thought I'd post a probability application involving RAID.

EarthwormJim

Diamond Member
Oct 15, 2003
Just thought I'd post a quick applications paper I had to do for my probability class. I chose to base it on RAID 5 and 6, looking at the probability of an array failing to rebuild due to UREs.

Anyway, if I'm completely wrong, feel free to shoot me down :)

For performance and reliability purposes, computer hard disk drives may be grouped together using a
scheme called RAID (Redundant Array of Independent Disks). RAID has quite a few different levels.
Level 0 stripes the data written across the array. As an example, two hard drives can be paired
together: one drive stores the odd-numbered bits, while the other drive stores the even-numbered
bits. This has the advantage of increasing the speed of the combined drives; almost double
the read and write performance can be extracted. The downside to using RAID 0 is that you double
the chance that you will lose your data. Since the data depends on having both hard drives, if one
drive fails, the data on the other drive is unusable. Figure 1.1, shown below, illustrates how the
data is saved in a two drive RAID 0 array.

raid0.png

Figure 1.1: RAID 0 Array. [1]

The opposite of RAID 0 is RAID 1, which mirrors the data. Going back to the two-drive example
discussed for RAID 0, in RAID 1 each drive holds the exact same set of data. No performance gains
are realized (possibly some read gains, depending on the controller); however, the reliability of
the system (data) has essentially doubled. Figure 1.2 shows how the data is arranged in RAID 1.

raid1.png

Figure 1.2: RAID 1 Array. [1]

Another set of RAID levels are RAID 5 and RAID 6. These RAID levels essentially combine
the advantages of RAID 0 and RAID 1. RAID 5 requires a minimum of 3 disks, while RAID 6 requires
a minimum of 4 disks. Using the exclusive or (XOR) function, a parity bit is created. So in
RAID 5, one bit is written to one disk, the next bit to the next disk, and, using XOR, a parity
bit is written on the third disk. This cycle rotates among the drives so that every drive
holds some parity bits as well as actual data. RAID 5 can scale to any number of drives. With RAID 5, one
drive can be lost and all the data can still be recovered. The total usable capacity of a RAID 5
array is (n-1) * capacity of each drive, where n is the number of drives in the array. RAID 6 is
very similar to RAID 5; however, two parity bits are generated rather than just one. As such, RAID 6
can lose two disks in the array and still recover all the data. RAID 6's capacity is (n-2) * capacity
of each drive. Figures 1.3 and 1.4 illustrate RAID 5 and RAID 6 respectively.
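As a quick capacity example, four 2 TB drives give (4-1)*2 TB = 6 TB of usable space in RAID 5, but (4-2)*2 TB = 4 TB in RAID 6.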


raid5.png

Figure 1.3: RAID 5 Array. [2]

raid6.png

Figure 1.4: RAID 6 Array. [1]

The interesting points about RAID 5 and 6 arise when the array is in recovery mode. All hard disks
have a probability of experiencing an unrecoverable read error (URE). This probability is around 1 in
10^14 bits read for consumer level hard drives.[3] During a RAID rebuild, all of the data in the array,
minus the failed drive's capacity, must be read, and new parity or real data bits must be generated on the
replacement drive(s). As hard disk capacities have increased, the number of bits read during a rebuild approaches 10^14,
so encountering a URE becomes increasingly likely.
With RAID 5, when the array is rebuilding, the array no longer has any protection from a read error or
drive failure. Should an unrecoverable read error occur, all of the data will be lost. RAID 6 can tolerate
a second drive failure or a single unrecoverable read error during a rebuild; however, if two or more read errors occur,
all of the data in the array will be lost. Using probability theory, specifically Bernoulli trials, the
probability that a RAID 5 or RAID 6 array will fail during recovery, given that one drive has already failed,
can be calculated. Calculating this probability requires the use of Equation 1.1, where n is the number of bits
being read, k is the number of unrecoverable read errors, p is the probability of a read error, and lastly q is
simply 1-p.

Equation 1.1: P=n!/(k!*(n-k)!)*p^k*q^((n-k))

For RAID 5, a rebuild fails if at least one read error occurs. It is much
simpler to calculate this probability as one minus the probability of zero errors, as shown below.

Equation 1.2: RAID 5 Restoration Failure Probability.
P(Rebuild Fail) = 1 - P(Rebuild) = 1 - P(0 Errors) = 1 - [(#Bits)!/(0!*(#Bits-0)!)]*(10^(-14))^0*(1-10^(-14))^(#Bits)

Since the desired number of read error bits in Equation 1.1 is zero, the binomial coefficient simplifies to one.

Equation 1.3: P(Rebuild Fail) = 1 - (10^(-14))^0*(1-10^(-14))^(#Bits) = 1 - (1-10^(-14))^(#Bits)
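Equation 1.3 is easy to evaluate numerically. As a minimal Matlab sketch (p and bits here are just assumed example values: a consumer URE rate and the rebuild size used later in this paper):

% Sketch: RAID 5 rebuild failure probability from Equation 1.3 %
p    = 1e-14;          % assumed URE probability per bit read
bits = 304*10^12;      % assumed number of bits read during the rebuild
PfailRaid5 = 1 - (1-p)^bits;
% Since 1-p is extremely close to 1, a numerically safer equivalent is: %
PfailRaid5 = -expm1(bits*log1p(-p));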

RAID 6, due to its extra redundancy, requires at least two read errors before a rebuild fails. As with RAID 5, the probability
that a rebuild will fail can be found by computing one minus the probability that a rebuild will succeed. In order
for the rebuild to succeed, either no errors or exactly one error must occur. Since no errors and one error are mutually exclusive,
their probabilities can simply be summed. Thus the probability can be found as shown below:

Equation 1.4: RAID 6 Restoration Failure Probability.
P(Rebuild Fail) = 1 - P(Rebuild) = 1 - P(0 Errors ∪ 1 Error) = 1 - ( [(#Bits)!/(0!*(#Bits-0)!)]*(10^(-14))^0*(1-10^(-14))^(#Bits) + [(#Bits)!/(1!*(#Bits-1)!)]*(10^(-14))^1*(1-10^(-14))^(#Bits-1) )

The binomial coefficients in Equation 1.4 can be simplified.

Equation 1.5: P(Rebuild Fail) = 1 - ( (1-10^(-14))^(#Bits) + (#Bits)*10^(-14)*(1-10^(-14))^(#Bits-1) )
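The same sort of sketch evaluates Equation 1.5 (again with assumed example values for p and bits):

% Sketch: RAID 6 rebuild failure probability from Equation 1.5 %
p    = 1e-14;          % assumed URE probability per bit read
bits = 304*10^12;      % assumed number of bits read during the rebuild
PfailRaid6 = 1 - (1-p)^bits - bits*p*(1-p)^(bits-1);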

The question posed for this application is whether or not consumer level hard drives are suitable for server use,
given their URE rate of 1 in 10^14 bits read.
A typical server array may include as many as twenty drives in each RAID array. Modern
disks come in capacities up to two terabytes (2*10^12 bytes). There are eight bits in one byte, so there are 16*10^12 bits on one hard drive.
Thus, twenty drives will hold 320*10^12 bits. During a rebuild with one drive failure, nineteen drives will need to be read in full,
whether their contents are real data or parity bits. Thus 304*10^12 bits will need to be read. Using Equations 1.3 and 1.5, the
probability of a RAID recovery failure for levels 5 and 6 can be computed respectively.

RAID 5: P(Recovery Failure) = 1 - (1-10^(-14))^(304*10^12) = 0.952165

RAID 6: P(Recovery Failure) = 1 - ( (1-10^(-14))^(304*10^12) + (304*10^12)*10^(-14)*(1-10^(-14))^(304*10^12 - 1) ) = 0.806365
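As a quick sanity check, the Poisson approximation gives the same numbers: with #Bits * p = 304*10^12 * 10^(-14) = 3.04 expected read errors, P(0 errors) ≈ e^(-3.04) ≈ 0.048 and P(1 error) ≈ 3.04*e^(-3.04) ≈ 0.145, so the failure probabilities come out to roughly 0.95 for RAID 5 and 0.81 for RAID 6, in agreement with the values above.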

As seen above, RAID 6 does significantly decrease the chance that an array will fail at recovering its data. Regardless,
in both cases with an array this large, the chance of a successful recovery is quite low: roughly 4% for RAID 5 and 20% for RAID 6. As
the numbers show, consumer level drives are definitely inadequate for server use, or any application where the data stored on
these drives is critical. For servers, drives with an unrecoverable read error rate of at least 1 in 10^15 are necessary, along
with the use of RAID level 6. If RAID level 5 is desired, then drives with a URE rate of 1 in 10^16 are required. Table 1.1 shows
the probability calculations for various drives with different URE rates and different RAID levels.


Probability of Restoration Failing

Drive URE| RAID 5| RAID 6
1 in 10^14 | 0.952049 | 0.806277
1 in 10^15 | 0.26196 | 0.037596
1 in 10^16 | 0.033188 | 0.003796
Table 1.1: Restoration Failure Probabilities for RAID 5 and 6 with Differing UREs.
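For a compact check, the table can be regenerated with a short Matlab loop (a sketch that assumes the same 20-drive array of 2 TB disks, i.e. 304*10^12 bits read during a rebuild; the full interactive program used for the calculations follows below). One caveat: at the 1 in 10^16 rate, 1-p is so close to 1 that plain double precision arithmetic inflates the computed probabilities somewhat, which the log1p/expm1 form shown earlier avoids.

% Sketch: regenerate Table 1.1 for the assumed 20-drive, 2 TB example %
bits = 19*2*10^12*8;            % bits read during a rebuild (19 surviving drives)
for p = [1e-14 1e-15 1e-16]     % URE rates from Table 1.1
    r5 = 1 - (1-p)^bits;                              % RAID 5, Equation 1.3
    r6 = 1 - (1-p)^bits - bits*p*(1-p)^(bits-1);      % RAID 6, Equation 1.5
    fprintf('1 in %g | %f | %f \n', 1/p, r5, r6);
end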

All calculations were performed using the following Matlab code:
% Bernoulli Trials Project Program %
% Start with a clean slate in Matlab %
clc
clear
% Program Purpose for Matlab Paper %
fprintf('Enter the required values for the Bernoulli equation. These values \n')
fprintf('include the number of drives in the array, the capacity of each drive, \n')
fprintf('the probability that a drive will suffer a URE, and the RAID level. \n\n');
fprintf('Thank you for using our program. \n\n');

% Data Entry %
n = input('Please ENTER the number of drives: -> ');
d = input('Please ENTER the capacity of each drive in terabytes: -> ');
p = input('Please ENTER the probability of a URE (e.g. 1e-14): -> ');
k = input('Please ENTER the RAID level (5 or 6): -> ');

% Data read during a rebuild: all n-1 surviving drives %
capac = (n-1)*d;             % terabytes read during the rebuild
capacBytes = capac*10^12;    % bytes (manufacturer definition of a terabyte)
bits = capacBytes*8;         % bits

% Calculations %

if k == 5
    % RAID 5: the rebuild fails if one or more UREs occur %
    Pr = 1 - (1-p)^bits;
else
    % RAID 6: the rebuild fails if two or more UREs occur %
    Pr = 1 - (1-p)^bits - bits*p*(1-p)^(bits-1);
end

fprintf('The probability that the array will fail at rebuilding is: %f \n', Pr);
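A sample session with the program (saved here as raid_rebuild.m; the script name is assumed, and the final digits depend on floating point rounding) looks like this:

>> raid_rebuild
Enter the required values for the Bernoulli equation. These values
include the number of drives in the array, the capacity of each drive,
the probability that a drive will suffer a URE, and the RAID level.

Thank you for using our program.

Please ENTER the number of drives: -> 20
Please ENTER the capacity of each drive in terabytes: -> 2
Please ENTER the probability of a URE (e.g. 1e-14): -> 1e-14
Please ENTER the RAID level (5 or 6): -> 5
The probability that the array will fail at rebuilding is: 0.952049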


Cliffs:

For a 20 drive array of 2 TB disks, these are the probabilities that the array will fail to recover during a rebuild.

Probability of Restoration Failing

Drive URE| RAID 5| RAID 6
1 in 10^14 | 0.952049 | 0.806277
1 in 10^15 | 0.26196 | 0.037596
1 in 10^16 | 0.033188 | 0.003796
 

KillerBee

Golden Member
Jul 2, 2010
This statement doesn't make sense to me:

As seen above, RAID 6 does significantly decrease the chance that an array will fail at recovering its data. Regardless, in both cases with a 20 terabyte array, the chance of recovery is quite low; roughly 4% for RAID 5 and 20% for RAID 6.

How can Raid6 have a worse chance of recovery than Raid5 :confused:
 

EarthwormJim

Diamond Member
Oct 15, 2003
This statement doesn't make sense to me:

As seen above, RAID 6 does significantly decrease the chance that an array will fail at recovering its data. Regardless, in both cases with a 20 terabyte array, the chance of recovery is quite low; roughly 4% for RAID 5 and 20% for RAID 6.

How can Raid6 have a worse chance of recovery than Raid5 :confused:

I worded that weird. A RAID 5 array that big has a 4% chance of being recovered; RAID 6 has a 20% chance.

So RAID 6 is almost 5 times more likely to be repaired.

What the table lists is the complement of this. So for RAID 5 the chance of NOT being recovered is 96%, and for RAID 6 it is 80%.
 

KillerBee

Golden Member
Jul 2, 2010
96% of RAID 5 recoveries don't work? ...come on, that's ridiculous!
Then again, I've never built a 10-disk RAID 5 using consumer disks, so I'll have to bow to your math skills :)
 

EarthwormJim

Diamond Member
Oct 15, 2003
96% of raid5 recoveries don't work? ...come on that's rediculous!
then again I've never built a 10-disk raid5 using consumer disks so will have to bow to your math skills :)

If you think about it, a 20 terabyte array is 8*2*10^13 bits = 1.6*10^14 bits (using hard drive manufacturers' tera/giga definitions). If a drive reads 1 out of 10^14 bits wrong, capacity-wise you're already past that 10^14 value.
 

KillerBee

Golden Member
Jul 2, 2010
That's why I would break up those 10 disks into 2 separate RAID groups... that would bring up the numbers, wouldn't it?
 

EarthwormJim

Diamond Member
Oct 15, 2003
Thats why I would break up those 20 disks into 2 or 3 separate raid groups

Or use server grade hardware. The reason I chose such a large array is that I have seen a few forum posts (not necessarily here) of people building a storage server for their home, often using five or more 2 TB consumer level drives in RAID 5.

I was curious to see if they basically had any protection from RAID 5 or not.

Not to mention this: http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162
 

KillerBee

Golden Member
Jul 2, 2010
Always felt better when I had a good tape backup - but with so much data the backup window shrank too small... remote drives are the only way now.
 

EarthwormJim

Diamond Member
Oct 15, 2003
ps - what are the 5-disk raid5 and raid6 chances?

Using an unrecoverable read error rate of 1 in 10^14 bits (consumer drives):

Raid 5: 47.2% chance that the array will NOT rebuild.

Raid 6: 13.5% chance that the array will NOT rebuild.

Failure chance versus capacity is semi-exponential for a given RAID level. Once you cross a certain threshold, you basically have a sure chance of failure.
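A quick Matlab sweep shows that saturation (a sketch; it assumes 2 TB drives at the 1 in 10^14 URE rate and reuses the RAID 5 equation from the original post):

% Sketch: RAID 5 rebuild failure chance vs. array size, 2 TB consumer drives %
p = 1e-14;                               % assumed URE probability per bit read
for n = [3 5 10 15 20]
    bits = (n-1)*2*10^12*8;              % bits read during the rebuild
    fprintf('%2d drives: %f \n', n, 1 - (1-p)^bits);
end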
 

RebateMonger

Elite Member
Dec 24, 2005
Just a reminder.

Although I doubt there are any exact numbers, there's probably a "non-trivial" number of correlated disk failures in an array. Many arrays have substantial numbers of "identical" disks. Given that these share the same design and manufacturing processes, have the same running time, and are connected to the same power supply, whatever caused the original disk failure may affect other disks in the array.

Also, although not directly disk-related, if you read reports of RAID failures on these Forums, it's apparent there can be mistakes by humans and by RAID management software and firmware that can have adverse effects on array recovery.
 

Emulex

Diamond Member
Jan 28, 2001
yeah everyone knows to run two arrays and stripe them with extra drives

11 + 1 raid-5
11 + 1 raid-5

striped to raid-0

Nobody uses 2TB SATA drives in a 20-drive RAID. That is nuts; I would shoot the customer. Stick to 600GB SAS2 (2.5" or 3.5" now).

Drive scrubbing was not considered. Arrays start scrubbing themselves almost immediately these days; that has to alter the percentages a little bit.

But yeah, raid-10 owns. I hate raid-5; raid-6 bogs down even the biggest storage devices.
 

Mark R

Diamond Member
Oct 9, 1999
I put together a spreadsheet to calculate these probabilities a while ago.

Of course, like all these things it doesn't take account of correlated failures (which are more common than you might expect) and doesn't simulate high-availability RAID like RAID 51/61. Nor does it take into account data scrubbing (although this can be corrected by adjusting the BER value).

Link to spreadsheet
 

Emulex

Diamond Member
Jan 28, 2001
yeah the biggest challenge is presenting a dual path (since SAS drives are all dual ported now) with two chassis. even the DL380 comes standard with this today.

I'll be curious to see if raid-3 comes back into fashion with SLC
 

Mark R

Diamond Member
Oct 9, 1999
yeah the biggest challenge is presenting a dual path (since SAS drives are all dual ported now) with two chassis. even the DL380 comes standard with this today.

I'll be curious to see if raid-3 comes back into fashion with SLC

There's good reason to use RAID 3 on SSDs. Although, to get the lowest correlated failure rate, you would ideally have to reshape the array at the time of parity drive replacement, in order to ensure that the non-parity drives receive uneven wear (i.e. move the parity onto a drive which previously contained data).

The issue here is that reshaping an online array is a less stable process than online capacity expansion: while you can expand an array without losing redundancy at any point, you cannot swap a parity drive with a data drive without compromising redundancy. Hence some sort of more involved process is needed (be it a backup of the array, or temporarily adding additional redundancy during the reshape process).
 

Emulex

Diamond Member
Jan 28, 2001
Well, if you level the error correcting data in spare time (an SSD has a lot of spare time) you might be able to spread the cheer.

I'm sure with super mad modern ECC you can get a higher ratio too.

i.e. 32GB of SLC could ECC for 8 160GB drives. So if you then spread that 32GB across those 8 160GB drives, you'd have a hybrid RAID that could rebuild itself really fricken fast.

Also, any unused portion of the drives (leveling again) would not need any ECC since it is unused.

With eMLC (reserve=SLC, storage=MLC) it will be interesting to see what they do in February.
 

Mark R

Diamond Member
Oct 9, 1999
well if you level the error correcting data in spare time (ssd has alot of spare time) you might be able to spread the cheer.

I'm sure with super mad modern ECC you can get a higher ratio too.

ie 32gb SLC can ecc for 8 160GB drives. so if you then spread that 32gb across those 8 160gb you'd have a hybrid raid that could rebuild it self really fricken fast.

also any unused portion of the drives (leveling again) would not need any ecc since they are unused.

With eMLC (reserve=slc, storage=mlc) it will be interesting to see what they do in february

Maybe I'm missing what you're saying.

However, I don't see how a 32 GB SLC drive would be a useful addition to an 8x160 GB array. You could use it as ECC - to detect and repair corruption - but it cannot stand in as a member of the RAID array, no matter how clever your algorithm. You cannot repair more data than you have parity/ECC/whatever you want to call it - that's a hard mathematical limit. Even ECC doesn't seem that useful in RAID - you merely need to store checksums, and then you have redundant data which can be searched combinatorially to recover the correct data.

Of course, you could use a RAID 3/4 with 8 160 GB MLC SSDs and 1 160 GB SLC SSD as parity. That would ensure relatively even wear as a proportion of life span, and the enhanced write speed of the SLC should ensure minimal bottlenecking of the parity drive.
 

EarthwormJim

Diamond Member
Oct 15, 2003
I put together a spreadsheet to calculate these probabilities a while ago.

Of course, like all these things it doesn't take account of correlated failures (which are more common than you might expect) and doesn't simulate high-availability RAID like RAID 51/61. Nor does it take into account data scrubbing (although this can be corrected by adjusting the BER value).

Link to spreadsheet

Now that is a neat spread sheet.
 

Emulex

Diamond Member
Jan 28, 2001
No, you want the ECC to reconstruct the missing data - not simply a checksum, real ECC. Then you could use the excess space on the other drives (assuming there is excess space) to keep a second copy of such data.

I assume someone has come up with ECC that is more complex than a checksum, where you can get a 16:1 ratio of ECC to actual data?
 

Mark R

Diamond Member
Oct 9, 1999
no you want the ECC to reconstruct the missing data - not simple a checksum. real ECC. then you could use the excess space on the other drives (assuming there is excess space) to keep a second copy of such data.

I assume someone has come up with ECC that is more complex than a checksum. where you can get 16:1 ratio of ECC to actual data?

I think I understand you.

But what happens when the drives get full? If you lose a 160 GB drive, you need 160 GB of ECC to restore it. There is no way around this. So having a smaller parity drive doesn't help you.

Of course, there's nothing wrong with filling unused space with backup copies of data - it doesn't buy you much extra reliability if the copy goes on the same RAID array or the same drive - but it can be useful in case of corruption.
 

Emulex

Diamond Member
Jan 28, 2001
How many ECC bytes are required to reconstruct a 4096-byte sector (those new-fangled Western Digitals)? Not much.

If you are talking about a checksum, that is one thing; ECC is another.
 

Mark R

Diamond Member
Oct 9, 1999
how much ECC bytes is required to reconstruct a 4096 sector ? (those new fangled western digitals?) not much.

If you are talking checksum that is one thing ECC is another.

The ECC can't reconstruct the whole sector; it can only reconstruct a few corrupted bytes.

If half the sector is gone, the ECC can't help.

The ECC cannot recover more missing data than its own size, and can only repair a far smaller amount of corrupted data. A 4k sector drive has about 100 bytes of ECC per sector - this will allow detection and recovery of 20-30 corrupted bytes. The point is that the vast majority of the sector must be intact for such a small ECC to be useful. This works because corruption on platter drives tends to come as short, few-byte bursts - not whole sectors wiped out.

This isn't what happens in a RAID array - the most likely failure is a whole drive dying. If you can lose 160 GB in one go, then you need 160 GB of ECC in order to recover it. Note the other flaw in RAID here - if you have 160 GB drives, having 160 GB of ECC is not enough to also detect corruption (if you want the ability to recover 160 GB of missing data). Thankfully, the drives' internal ECC is very good at detecting corruption, even if it can't correct it - so the RAID only ever has to deal with 'missing' data.
 

JustStarting

Diamond Member
Dec 13, 2000
I'm no avid proponent of RAID 5... I just wanted to try it once and have been too lazy to re-install XP in another configuration. When something really craps out I'll install Windows 7, but that's probably a long time coming, because my 3-drive RAID 5 array has been rebuilt over 10 times without incident in the past 4 years. It runs 24/7 and is a huge watercooled monster OC'd to 4.0 GHz on a daily basis. Usually it fails after I let one of the kids play on "dad's computer" because it's easier than listening to the whining. I've got three 360 GB Samsungs in RAID 5 with one spare at the ready in case a drive really craps out. I have no issues even working on the computer while it's rebuilding. It usually takes about an hour or less to rebuild. Everything is backed up on a spare Samsung in the computer and also to my laptop on the network. Something else other than the RAID 5 will probably force me to go to Windows 7.

I've obviously run single-drive OSes, RAID 0, and RAID 1, and have seen no better performance than RAID 0. I ran RAID 0 when I gamed... loaded UT instantly on a few old "Craptors". One day I lost the array and never built another RAID 0... at least not with kids playing on it :)