could someone explain raid 5 / parity to me?

Maezr

Senior member
Jan 20, 2002
353
0
0
my understanding is that you can have three identical drives with raid 5. the data is striped across two disks and the third disk stores parity info.

in the event one of the first two disks fails, the lost data can be rebuilt via the parity.

how is this possible? say I have data on my array that is already as compressed as it can possibly be. how can ALL of that info/data be recovered when one of the disks fails? doesn't this imply that everything can be compressed to half the size it currently is? obviously that can't be right. what am I missing?
 

DaveSimmons

Elite Member
Aug 12, 2001
40,730
670
126
wiki, google, AT FAQ, ...

Think of this:
(10 + 1 + 2 + 3) = 16

Given the 16 (parity) if you remove _any_ one of the other numbers I can tell you what it was, by comparing the remaining numbers to the parity number. Try it!


The trick is the parity is not all that you have, you still have the other drives too. (10 + :( + 2 + 3) = 16, solve for :(
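If you'd rather "Try it!" in code, here is a minimal Python sketch of the same sum trick. The function name recover_missing is made up for illustration, and plain addition is only the analogy used in this post, not necessarily what a real controller does.

```python
def recover_missing(survivors, parity):
    """Return the one missing number, given the others and the parity sum."""
    return parity - sum(survivors)


data = [10, 1, 2, 3]
parity = sum(data)  # 16, the "parity" number from the example above

# Remove each number in turn and rebuild it from what is left.
for i, lost in enumerate(data):
    survivors = data[:i] + data[i + 1:]
    assert recover_missing(survivors, parity) == lost
```

Note that the survivors are always part of the calculation; the parity number alone recovers nothing.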
 

Arcanedeath

Platinum Member
Jan 29, 2000
2,822
1
76
the parity data is stored on all the drives, not just one, so if one drive fails the other 2+ have enough info to rebuild the missing data. think of it like this: with three drives, each drive holds 2/3 data and 1/3 parity. when a drive dies, the other two drives use their data together with their 1/3 share of parity to put the missing 2/3 of data back, and then the array regenerates the parity that was on the dead drive. Check out Storagereview.com for clear and concise info on raid of all types and how it works.
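To picture "each drive holds 2/3 data and 1/3 parity", here is a rough Python sketch that lays out three stripes on three drives. The rotation rule in parity_disk is an assumption chosen for illustration; real controllers use several different layouts, and the names here are hypothetical.

```python
def parity_disk(stripe, n_disks):
    """Pick which disk holds parity for a stripe (one simple rotation, assumed)."""
    return (n_disks - 1 - stripe) % n_disks


def layout(n_stripes, n_disks):
    """Return one row per stripe: 'P' for parity, 'D<k>' for data block k."""
    rows = []
    next_block = 0
    for stripe in range(n_stripes):
        p = parity_disk(stripe, n_disks)
        row = []
        for disk in range(n_disks):
            if disk == p:
                row.append("P")
            else:
                row.append(f"D{next_block}")
                next_block += 1
        rows.append(row)
    return rows


for row in layout(3, 3):
    print(" ".join(f"{cell:>3}" for cell in row))
```

Reading down any column, one block in three is parity and the other two are data, which is where the 2/3 and 1/3 figures come from.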
 

Maezr

Originally posted by: DaveSimmons
wiki, google, AT FAQ, ...

Think of this:
(10 + 1 + 2 + 3) = 16

Given the 16 (parity) if you remove _any_ one of the other numbers I can tell you what it was, by comparing the remaining numbers to the parity number. Try it!


The trick is the parity is not all that you have, you still have the other drives too. (10 + :( + 2 + 3) = 16, solve for :(

I understand what you're saying.

what I don't understand is why this same concept can't be used to achieve better compression via, say, zip files online; if you don't need all of the data to rebuild it, why not just store the data you need to rebuild it?
 

Arcanedeath

because you actually lose space when you use raid, since some of it goes to storing the parity data. it's for data protection, not compression. a raid 5 array has a capacity of n-1 drives, where n is the number of hard drives, i.e. 3 drives only give you 2 drives' worth of capacity and the remaining drive's worth is used for parity.
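The n-1 rule as a tiny Python sketch, assuming identical drives. The function name and the 500 GB drive size are made up for the example.

```python
def raid5_usable_capacity(n_disks, disk_size_gb):
    """Usable space of a RAID 5 array of identical disks: (n - 1) disks' worth."""
    if n_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (n_disks - 1) * disk_size_gb


# Three hypothetical 500 GB drives: 1500 GB raw, 1000 GB usable.
raw = 3 * 500
usable = raid5_usable_capacity(3, 500)
assert usable == 1000
assert raw - usable == 500  # one drive's worth goes to parity
```

So the array always stores more than the data itself, never less.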
 

DaveSimmons

^ Exactly, the parity isn't compressing the data.

You start with (say) 10 integer numbers and 1 parity number. You've achieved a 10% increase in size.

That 11th parity number doesn't decompress back to the original 10 numbers, it only lets you take 9 of those numbers plus the 11th number to get back the missing 10th number. By itself the parity number is worthless. With 8 (out of 10) numbers instead of 9 the parity number is also useless.

It's error correction, not compression.
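A small Python sketch of that limit, again using a plain sum as the parity. The helper fill_missing is hypothetical; it marks lost numbers as None.

```python
def fill_missing(values, parity):
    """Fill in at most one lost value (marked None) using the parity sum."""
    missing = [i for i, v in enumerate(values) if v is None]
    if len(missing) > 1:
        # Only the SUM of the lost values is known, not each one.
        raise ValueError("more than one value lost; parity cannot help")
    restored = list(values)
    if missing:
        known = sum(v for v in values if v is not None)
        restored[missing[0]] = parity - known
    return restored


data = [4, 8, 15, 16, 23, 42, 7, 9, 1, 5]  # ten made-up numbers
parity = sum(data)                          # the 11th number

one_lost = data.copy()
one_lost[3] = None
assert fill_missing(one_lost, parity) == data  # 9 numbers + parity: fine

two_lost = one_lost.copy()
two_lost[7] = None
try:
    fill_missing(two_lost, parity)             # 8 numbers + parity: stuck
except ValueError as err:
    print(err)
```

With two numbers gone there are many pairs that add up to the same total, so the parity cannot tell them apart.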
 

Maezr

Originally posted by: DaveSimmons
^ Exactly, the parity isn't compressing the data.

You start with (say) 10 integer numbers and 1 parity number. You've achieved a 10% increase in size.

That 11th parity number doesn't decompress back to the original 10 numbers, it only lets you take 9 of those numbers plus the 11th number to get back the missing 10th number. By itself the parity number is worthless. With 8 (out of 10) numbers instead of 9 the parity number is also useless.

It's error correction, not compression.

but when a disk fails, you'd lose 5 out of 10, not 1 out of 10, no?

if 5 numbers + 1 can be used to rebuild it back to 10, isn't that taking up less space than it was originally? now do you see what I mean?
 

DaveSimmons

No, each hard drive stores only 1 of the numbers, not 5 of them.

This is an over-simplification though, since like Arcanedeath said the parity number and other numbers are mixed up across all the drives.

The key point is that if the data is split into, say, 5 chunks plus parity, you still need 4 of the 5 chunks plus the parity chunk to recover your data. That's 4 + 1 = 5 chunks, which is no smaller than your data was before you added error correction.
 

Maezr

how is it that you'd still have four out of five chunks after one of the drives (essentially 50% of the data in a three disc raid 5 array) is lost though?
 

Aluvus

Platinum Member
Apr 27, 2006
2,913
1
0
Originally posted by: Maezr
how is it that you'd still have four out of five chunks after one of the drives (essentially 50% of the data in a three disc raid 5 array) is lost though?

They are describing systems with more than 3 disks.
 

Maezr

but the data on a three disk system is still recoverable. I'm just not understanding how.

if you started with 10 pieces of information and one drive fails, you'd now have 5. how can 5 + 1 piece of parity info be used to rebuild the initial data that totaled 10 pieces of information?

that's my question, really, and I haven't seen anything here (or elsewhere) that can explain that.
 

Bobthelost

Diamond Member
Dec 1, 2005
4,360
0
0
Originally posted by: Maezr
but the data on a three disk system is still recoverable. I'm just not understanding how.

if you started with 10 pieces of information and one drive fails, you'd now have 5. how can 5 + 1 piece of parity info be used to rebuild the initial data that totaled 10 pieces of information?

that's my question, really, and I haven't seen anything here (or elsewhere) that can explain that.

You're getting confused. Very confused.

With a 3 disk array you take each file coming in and break it in half: part A goes to disk A, part B goes to disk B, and the parity is calculated and sent to disk C.

The reason they were talking about 10 pieces of info is to make it clearer with larger scale examples (like a 10 disc array, where the first 1/10th goes to A, the second 1/10th goes to B etc.). If you wanted to store 10 files with your array then each file would be ripped in half and stored with half on A, half on B and the parity on C.

Each file is stored with 50% on each drive and the parity information on the third. It's done on a much lower level than what you're thinking of. We're talking about kilobytes here when we talk about information being split up.
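Here is a rough Python sketch of that split for a 3 disk array. It uses XOR for the parity, which is what RAID 5 parity is based on; the function names, the zero padding, and the sample bytes are assumptions made for illustration, not how any particular controller is written.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings, one byte at a time."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data: bytes):
    """Split data into halves A and B and compute parity C = A xor B."""
    if len(data) % 2:
        data += b"\x00"  # pad so both halves are the same length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    return a, b, xor_bytes(a, b)


a, b, c = split_with_parity(b"kilobytes of file data")

# The parity chunk is as big as each half, not a tiny summary.
assert len(c) == len(a) == len(b)

# Lose disk A: rebuild it from B and C. Lose disk B: rebuild from A and C.
assert xor_bytes(b, c) == a
assert xor_bytes(a, c) == b
```

Two chunks are always needed to rebuild the third, so nothing is being stored in less space than it originally took.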
 

Maezr

I'm most curious about a 3 disk setup.

in such a setup, as you said, half of the data is stored on disk A, and half on disk B, and a parity file is placed on disk C.

let's say disk A dies. how can disk B + disk C be used to restore 100% of the information stored on disk A?
 

Bobthelost

Originally posted by: Maezr
I'm most curious about a 3 disk setup.

in such a setup, as you said, half of the data is stored on disk A, and half on disk B, and a parity file is placed on disk C.

let's say disk A dies. how can disk B + disk C be used to restore 100% of the information stored on disk A?

Ok, the information stored on:
A = 5
B = 6
C(Parity) = 11

A dies. The controller notices this, takes the parity number (11) and subtracts the value on the surviving disk B (6) from it, which gives 5, the value A would have held.

This was already explained above by DaveSimmons.
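The same numbers as a quick Python sketch, first with addition as above and then with XOR, which is the operation RAID 5 parity is actually based on. The function names are made up for illustration.

```python
def rebuild_sum(survivor, parity):
    """Addition analogy: lost value = parity - surviving value."""
    return parity - survivor


def rebuild_xor(survivor, parity):
    """XOR version: lost value = parity xor surviving value."""
    return parity ^ survivor


a, b = 5, 6

c_sum = a + b                       # 11, as in the example above
assert rebuild_sum(b, c_sum) == a   # 11 - 6 = 5

c_xor = a ^ b                       # binary 101 xor 110 = 011, i.e. 3
assert rebuild_xor(b, c_xor) == a   # 3 xor 6 = 5
assert rebuild_xor(a, c_xor) == b   # works for either lost disk

# With byte-sized values a sum can overflow one byte, but XOR never does,
# so the parity takes exactly as much room as one of the data values.
assert 200 + 100 > 255
assert (200 ^ 100) <= 255
```

Either way, the rebuild needs the surviving disk and the parity together.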
 

DaveSimmons

The parity chunk is _the same size_ as the A and B chunks.

think of it letter by letter:

Disk 1 H L O
Disk 2 E L !
Disk 3 p p p