RAID Performance

Wolfdreamer

Member
Apr 25, 2000
102
I have a dual P3 1 GHz computer on an Abit VP6 mobo. I recently purchased a WD 30 GB (ATA100, 7200 rpm) to complement my Seagate 30 GB (ATA66, 7200 rpm) in a RAID 0 (striping) array. Though I feel like I am seeing increased performance, SiSoft Sandra's scores are significantly lower than what they say a RAID 0 array (ATA100 or even ATA66) should get. I am running Windows 2000, and I really don't know what would cause the scores to be so much lower. Please help?

SocrPlyr

Golden Member
Oct 9, 1999
1,513
I don't know for sure, but it might be the fact that your drives aren't matched...
but what do I know anyways...

Josh

Wolfdreamer

Member
Apr 25, 2000
102
I don't know. I was under the impression that they didn't have to match exactly, as long as the general specs were comparable.

myforum

Member
Jul 6, 2001
69
Let's see how the various storage techniques used in RAID differ with regard to read versus write performance:

Mirroring: Read performance under mirroring is far superior to write performance. Let's suppose you are mirroring two drives under RAID 1. Every piece of data is duplicated, stored on both drives. This means that every byte of data stored must be written to both drives, making write performance under RAID 1 actually a bit slower than just using a single disk; even if it were as fast as a single disk, both drives are tied up during the write. But when you go to read back the data? There's absolutely no reason to access both drives; the controller, if intelligently programmed, will only ask one of the drives for the data--the other drive can be used to satisfy a different request. This makes RAID 1 significantly faster than a single drive for reads, under most conditions.
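To make that read/write asymmetry concrete, here is a toy sketch in Python (my own illustration, not any real controller's firmware) of a mirror that duplicates every write but alternates reads between the two drives:

class Drive:
    def __init__(self):
        self.blocks = {}                 # block number -> data

    def write(self, block, data):
        self.blocks[block] = data

    def read(self, block):
        return self.blocks[block]

class Mirror:
    def __init__(self, a, b):
        self.drives = [a, b]
        self.turn = 0                    # simple round-robin pointer

    def write(self, block, data):
        for d in self.drives:            # the write penalty: BOTH drives are busy
            d.write(block, data)

    def read(self, block):
        d = self.drives[self.turn]       # the read win: only ONE drive is used,
        self.turn = 1 - self.turn        # leaving the other free for another request
        return d.read(block)

m = Mirror(Drive(), Drive())
m.write(7, b"hello")
assert m.read(7) == m.read(7) == b"hello"   # the two reads hit different drives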
Striping Without Parity: A RAID 0 array has about equal read and write performance (or more accurately, roughly the same ratio of read to write performance that a single hard disk would have.) The reason is that the "chopping up" of the data without parity calculation means you must access the same number of drives for reads as you do for writes.
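A rough sketch of that address arithmetic (a hypothetical layout; real controllers differ in details like stripe unit size):

def raid0_locate(logical_block, num_drives, blocks_per_unit):
    """Map a logical block to (drive index, block offset on that drive)."""
    unit, within = divmod(logical_block, blocks_per_unit)
    drive = unit % num_drives                          # stripe units rotate across drives
    offset = (unit // num_drives) * blocks_per_unit + within
    return drive, offset

# Two drives, 16-block stripe units: consecutive units alternate drives.
print(raid0_locate(0, 2, 16))    # (0, 0)
print(raid0_locate(16, 2, 16))   # (1, 0)
print(raid0_locate(32, 2, 16))   # (0, 16)

The same mapping serves both reads and writes, which is why RAID 0 keeps a single disk's read-to-write ratio.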
Striping With Parity: As with mirroring, write performance when striping with parity (RAID levels 3 through 6) is worse than read performance, but unlike mirroring, the "hit" taken on a write when doing striping with parity is much more significant. Here's how the different accesses fare:
For reads, striping with parity can actually be faster than striping without parity. The parity information is not needed on reads, and this makes the array behave during reads in a way similar to a RAID 0 array, except that the data is spread across one extra drive, slightly improving parallelism.
For sequential writes, there is the dual overhead of parity calculations as well as having to write to an additional disk to store the parity information. This makes sequential writes slower than striping without parity.
The biggest discrepancy under this technique is between random reads and random writes. Random reads that only require parts of a stripe from one or two disks can be processed in parallel with other random reads that only need parts of stripes on different disks. In theory, random writes would be the same, except for one problem: every time you change any block in a stripe, you have to recalculate the parity for that stripe, which requires two writes plus reading back all the other pieces of the stripe! Consider a RAID 5 array made from five disks, and a particular stripe across those disks that happens to have data on drives #3, #4, #5 and #1, and its parity block on drive #2. You want to do a small "random write" that changes just the block in this stripe on drive #3. Without the parity, the controller could just write to drive #3 and it would be done. With parity though, the change to drive #3 affects the parity information for the entire stripe. So this single write turns into a read of drives #4, #5 and #1, a parity calculation, and then a write to drive #3 (the data) and drive #2 (the newly-recalculated parity information). This is why striping with parity stinks for random write performance. (This is also why RAID 5 implementations in software are not recommended if you are interested in performance.)
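Here is that small-write sequence as a toy sketch (drive numbering follows the example above: index 1 plays "drive #2" holding parity, index 2 plays "drive #3" holding the changed data; each "drive" is just a dict from stripe number to block):

from functools import reduce

def xor_blocks(*blocks):
    """XOR equal-sized blocks byte by byte (the parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def small_write_naive(drives, stripe, data_idx, parity_idx, new_data):
    """Rebuild parity by re-reading every OTHER data block in the stripe:
    N - 2 reads plus 2 writes on an N-drive array."""
    others = [d for i, d in enumerate(drives)
              if i not in (data_idx, parity_idx)]
    other_blocks = [d[stripe] for d in others]     # the costly reads of #4, #5 and #1
    drives[data_idx][stripe] = new_data            # write the new data to #3
    drives[parity_idx][stripe] = xor_blocks(new_data, *other_blocks)  # write parity to #2

# Five-drive array, one stripe, parity on index 1 ("drive #2"):
drives = [{0: bytes([i] * 4)} for i in range(5)]
drives[1][0] = xor_blocks(*(drives[i][0] for i in (0, 2, 3, 4)))
small_write_naive(drives, 0, 2, 1, b"\xff\xff\xff\xff")
assert drives[1][0] == xor_blocks(*(drives[i][0] for i in (0, 2, 3, 4)))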
Another hit to write performance comes from the dedicated parity drive used in certain striping with parity implementations (in particular, RAID levels 3 and 4). Since only one drive contains parity information, every write must write to this drive, turning it into a performance bottleneck. Under implementations with distributed parity, like RAID 5, all drives contain data and parity information, so there is no single bottleneck drive; the overheads mentioned just above still apply though.
Note: As if the performance hit for writes under striping with parity weren't bad enough, there is even one more piece of overhead! The controller has to make sure that when it changes data and its associated parity, all the changes happen simultaneously; if the process were interrupted in the middle, say, after the data were changed and not the parity, the integrity of the array would be compromised. To prevent this, a special process must be used, sometimes called a two-phase commit. This is similar to the techniques used in database operations, for example, to make sure that when you transfer money from your checking account to your savings account, it doesn't get subtracted from one without being certain that it was added to the other (or vice-versa). More overhead, more performance slowdown.
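A toy illustration of the idea (hypothetical; a real controller logs to NVRAM or a reserved disk area, not a Python list):

def journaled_small_write(journal, drives, stripe, data_idx, parity_idx,
                          new_data, new_parity):
    """Record intent before touching either block, so a crash between the
    two writes can be detected and the stripe repaired on restart."""
    journal.append(("begin", stripe))          # phase 1: durably log the intent
    drives[data_idx][stripe] = new_data        # update the data block...
    drives[parity_idx][stripe] = new_parity    # ...and the parity block
    journal.append(("commit", stripe))         # phase 2: mark the pair as done

def stripes_needing_repair(journal):
    """After a crash: any stripe with a 'begin' but no matching 'commit'
    must have its parity rebuilt from the surviving data blocks."""
    committed = {s for op, s in journal if op == "commit"}
    return [s for op, s in journal if op == "begin" and s not in committed]

journal, drives = [], [{}, {}, {}]
journaled_small_write(journal, drives, 0, 0, 1, b"data", b"par ")
assert stripes_needing_repair(journal) == []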

The bottom line that follows from the difference between read and write performance is that the net improvement a RAID level delivers, especially one involving striping with parity, depends heavily on the ratio of reads to writes in the intended application. Some applications have a relatively low number of writes as a percentage of total accesses; for example, a web server. For these applications, the very popular RAID 5 solution may be an ideal choice. Other applications have a much higher percentage of writes; for example, an interactive database or development environment. These applications may be better off with a RAID 01 or 10 solution, even if it does cost a bit more to set up.

And the story goes on and on.....

CU


SCSIRAID

Senior member
May 18, 2001
579
Myforum.....

You state in your post (quite good, by the way):

"With parity though, the change to drive #3 affects the parity information for the entire stripe. So this single write turns into a read of drives #4, #5 and #1, a parity calculation, and then a write to drive #3 (the data) and drive #2 (the newly-recalculated parity information)."

The algorithm you describe is not the one normally used for RAID 5. It is not necessary to read all the data stripes (except for the updated one). A more efficient algorithm is to read drive #2 (old parity) and drive #3 (old data), then XOR the old data with the new data. This creates a buffer that marks every place the data changed, which is exactly where the parity information needs to change to stay correct. Then XOR this 'difference buffer' with the old parity to create the new parity. You finish by writing the new parity to drive #2 and the new data to drive #3. This results in one less read in your 4+1 RAID 5 array. More importantly, this algorithm costs 2 reads and 2 writes no matter how large the array is, whereas the algorithm you describe gets nasty as the array grows (a 7+1 array would need 6 reads and 2 writes).
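In code form, using the same toy model as in myforum's post above, that read-modify-write sequence looks something like this (my sketch of the algorithm as described, not actual firmware):

def xor_blocks(a, b):
    """XOR two equal-sized blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write_rmw(drives, stripe, data_idx, parity_idx, new_data):
    """Read-modify-write small write: always 2 reads + 2 writes,
    no matter how wide the array is."""
    old_data = drives[data_idx][stripe]          # read #1: old data   (drive #3)
    old_parity = drives[parity_idx][stripe]      # read #2: old parity (drive #2)
    delta = xor_blocks(old_data, new_data)       # the 'difference buffer'
    drives[parity_idx][stripe] = xor_blocks(old_parity, delta)  # write #1: new parity
    drives[data_idx][stripe] = new_data                          # write #2: new data

It lands on exactly the same parity as the full-stripe rebuild: XORing the old data into the old parity cancels it out, and XORing in the new data folds it back in.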

myforum

Member
Jul 6, 2001
69
There's no possible way to discuss every factor that affects RAID performance in a section like this one--and there really isn't any point in doing so anyway. As you read about RAID levels, and RAID implementation and configuration, many issues related to performance will come up. One important step is to define more precisely what is meant by "performance" in a RAID context. Most people who know something about RAID would say "RAID improves performance", but some types improve it more than others, and in different ways than others. Understanding this will help you differentiate between the different RAID levels on a performance basis.

And the walk through RAID goes on and on. I like it..... period.

Later

PS:
Hard disks perform two distinct functions: writing data, and then reading it back. In most ways, the electronic and mechanical processes involved in these two operations are very similar. However, even within a single hard disk, read and write performance are often different in small but important ways.

myforum

Member
Jul 6, 2001
69
Almost forgot about Wolfdreamer's question:
Best to have the exact same drives, but at the very least get drives with the same spindle speed.

Reason:
The hard disk platters are spinning around at high speed, and the spin speed is not synchronized to the process that moves the read/write heads to the correct cylinder on a random access. Therefore, at the time that the heads arrive at the correct cylinder, the actual sector that is needed may be anywhere on the track. After the actuator assembly has completed its seek to the correct track, the drive must wait for the correct sector to come around to where the read/write heads are located. This time is called latency. Latency is directly related to the spindle speed of the drive, and as such is influenced solely by the drive's spindle characteristics.
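The average figure falls straight out of the spindle speed: one revolution takes 60,000/RPM milliseconds, and on average the sector you need is half a revolution away. A quick calculation:

def avg_rotational_latency_ms(rpm):
    """Average rotational latency: half of one full revolution."""
    return (60_000 / rpm) / 2

for rpm in (5400, 7200, 10_000):
    print(f"{rpm:>6} rpm: {avg_rotational_latency_ms(rpm):.2f} ms average latency")
# 5400 rpm: 5.56 ms, 7200 rpm: 4.17 ms, 10000 rpm: 3.00 ms

Mismatched spindle speeds in a stripe mean the slower drive's latency sets the pace.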

Later

MortaniuS

Senior member
Oct 12, 2000
654
SiSoft Sandra reports back bogus numbers under 2000, BTW. I score like 24,000 where I scored 43,000 under ME. Use a program like ATTO bench to get a real speed estimate.


Bumboy

Member
Jun 21, 2001
83
All I know is that RAID kicks booty when you've got 2 or more HDs of the SAME KIND.

I could be wrong... I'm probably drunk on peaches

SCSIRAID

Senior member
May 18, 2001
579
Right on, myforum... The read speed of the striped array is no better than that of the slowest drive, since it will be the last one to deliver its data.

Access time to data for a disk drive is more than just the seek time... you have to add in the rotational delay, which is statistically one half of the time for one rotation of the disk. For a 7200 rpm drive one rotation takes 8.3 ms, which means you add about 4.2 ms to your seek time to get the 'real' access time. With today's drives the limiting factor is becoming rotational delay instead of seek time, thus the quest for faster spin speeds.
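Putting rough numbers on it (the seek times here are assumptions for illustration, not measured specs of any particular drive):

def access_time_ms(avg_seek_ms, rpm):
    """Approximate random access time: average seek plus half a revolution."""
    return avg_seek_ms + (60_000 / rpm) / 2

print(access_time_ms(9.0, 7200))     # ~13.2 ms for a typical 7200 rpm ATA drive
print(access_time_ms(5.5, 10_000))   # ~8.5 ms for a 10k rpm SCSI drive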

Wolfdreamer

Member
Apr 25, 2000
102
Okay, wow, this was some pretty informative stuff. I am going to have to try some different benchmarks to see how my drives compare on some different tests.