What is RAID and what does it do?


Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
exdeath said:
(The 5th is parity and doesn't add to user data rates in RAID 5). It's obviously bursting from cache, which is pointless for benchmarking storage speed.

Parity is striped across all drives, so yes, all 5 do participate equally. And while including the cache doesn't help in determining the raw speed of the disks, it's more real-world than not, because the cache will be used during normal operation.
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
Parity is striped across all drives, so yes, all 5 do participate equally. And while including the cache doesn't help in determining the raw speed of the disks, it's more real-world than not, because the cache will be used during normal operation.

I should have said blocks or stripes, not drives; my bad. With 5 disks in RAID 5, only 4/5ths of any particular block/stripe is going to be user data and contribute to the measured transfer rate. Parity blocks are skipped over on reads and therefore do not contribute to read speed. The maximum possible transfer rate of a RAID 5 is capped at the combined max speed of n-1 drives. If you have 5 drives that max out at 125 MB/s sequential in RAID 5, you will never see more than a 500 MB/s theoretical max sequential.

Relying on cache when bragging about sequential sustained read speeds on a RAID 5 is misleading. Copy a real 10 GB file off that partition and you will NOT see 2,000 MB/s; that's impossible with only 5 drives which individually cannot sustain more than 100-125 MB/s. More like 400-500 MB/s sustained. The sum cannot be greater than the individual contributions.
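The n-1 arithmetic above can be sketched in a few lines (the function name and the 5-drive/125 MB/s figures are just the thread's running example, not a claim about any specific array):

```python
def raid5_max_seq_read(n_drives, per_drive_mbps):
    """Theoretical sequential-read ceiling for RAID 5.

    In every full stripe, one block is parity and is skipped on reads,
    so at most n-1 drives' worth of user data moves per stripe.
    """
    if n_drives < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    return (n_drives - 1) * per_drive_mbps

# 5 drives at 125 MB/s each: the ceiling is 500 MB/s, not 2,000 MB/s
print(raid5_max_seq_read(5, 125))  # 500
```

Any benchmark number above that ceiling is, per the argument here, measuring cache rather than disks.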
 
Last edited:

Nothinman

Elite Member
Sep 14, 2001
30,672
0
0
Relying on cache when bragging about sequential sustained read speeds on a RAID 5 is misleading.

Benchmarks in general are misleading; they're rarely indicative of how things will work out for real.

Copy a real 10 GB file off that partition and you will NOT see 2,000 MB/s; that's impossible with only 5 drives which individually cannot sustain more than 100-125 MB/s. More like 400-500 MB/s sustained. The sum cannot be greater than the individual contributions.

I understand that; I was just expressing my disdain for benchmarks in general. They barely tell you anything worthwhile, yet still seem to cause tons of people to base their workload, hardware, etc., on them. Especially when it comes to I/O, as there are just way too many other variables involved during real usage.
 

Emulex

Diamond Member
Jan 28, 2001
9,759
1
71
RAID-5 LONG READS are faster than RAID-10 in reading, believe it or not (same drives, similar stripe size) - only in some cases. With 6 drives, RAID-5 has 6 drives of data to read from, while with RAID-10 you have 3 stripes to read from - so if the stars aren't aligned you may not get the full read benefit from the RAID-10. Odd. Probably why folks use RAID-5 for core DB storage and RAID-10 for logs/temp. This will all change with the enterprise MLC drives that are out now ($10K for an 800 GB unit, ouch!) - enterprise MLC is far cheaper than SLC :) and hot-swappable, unlike Fusion-io.
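The spindle-count argument above can be put into a toy model. This is a rough sketch of my own (the function and the "one side of each mirror" assumption are illustrative; real controllers vary, as the rest of the thread points out):

```python
def seq_read_spindles(level, n_drives):
    """Rough count of drives that feed one large sequential read."""
    if level == "raid5":
        # Parity rotates across stripes, so all n spindles stream data
        # (even though only n-1 blocks per stripe are user data).
        return n_drives
    if level == "raid10":
        # Assumes a naive mirror engine that reads each stripe from
        # only one side of the mirror pair.
        return n_drives // 2
    raise ValueError(level)

print(seq_read_spindles("raid5", 6))   # 6
print(seq_read_spindles("raid10", 6))  # 3
```

Under that assumption, the 6-drive RAID-5 simply has more independent stripes to stream from than the 3-stripe RAID-10, which matches the observation.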

- just reads, based on my tests - writing is horrifically slower. This is using a modern P410i Smart Array with 1 GB flash-backed write cache - not an ICH or soft-RAID.

But honestly, now that 900 GB SAS dual-ported 2.5" 10K drives are out, you can still use RAID-10 or a split and deal with it. I found that I needed to do RAID-10 with 6 RE4 2TB drives to have enough speed for storage vMotion, or ESXi (or vCenter) would time out. But you could outrun those RAID-5 speeds with 6 15K SAS 3.5" drives compared to RAID-10 with 6 2TB SATA RE4 drives (drive write cache enabled, no less).

I've got some RAID-5 systems that have been up for years - a 12-disk 500 GB Seagate setup (the biggest SATA drive at the time) - I could probably replace that whole setup with 2TB drives, but the disk IOPS I need wouldn't be there. And 3x 72 GB 10K 3.5" Ultra-320 SCSI (twice) - they are rocking out like no other. I do run the sector remap scan at a 3-second idle interval. RAID controllers worth a bean do that - they proactively seek out bad sectors and remap them when idle, and the controllers map away a small percentage of sectors so you aren't relying solely on the drives' spare sectors. Most cheap RAID cards don't do proactive work to look for bad sectors, since that isn't very green :)
 
Last edited:

sub.mesa

Senior member
Feb 16, 2010
611
0
0
When talking about RAID performance, you guys should always keep in mind that the performance is highly dependent on the actual RAID implementation, rather than the theoretically possible performance that the RAID scheme allows.

For example, many RAID1 engines (RAID10 or 0+1, for that matter) gain very little from the mirroring when reading large files. Either they just read from the master disk, not bothering with the identical mirrored slave disk at all, or, more often, they employ a 'split' algorithm, which can mean two things:
1) both disks perform the same I/O; whichever delivers it quickest 'wins'; little to no performance increase here
2) both disks each perform half of the requested I/O. So when reading 64KiB, disk1 gets the first 32KiB and disk2 gets the second 32KiB. This sounds good, but it means both drives will be seeking through the same file and skipping 50% of the LBAs on sequential reads. The end result is barely better performance than a single disk reading 100% of the LBAs without having to skip anything.
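The second policy is easy to visualize by laying out which sub-requests each mirror services. A minimal sketch (function name and 32 KiB chunking are illustrative, matching the example above):

```python
def split_read(offset, length, n=2):
    """Naive RAID1 'split' policy: mirrors take alternating chunks.

    Returns, per disk, the (start, size) sub-requests it services for
    one sequential read, in 32 KiB chunks. Note the gaps: each head
    must still sweep the whole extent while delivering only half the
    data, which is why this barely beats a single disk.
    """
    chunk = 32 * 1024
    plan = [[] for _ in range(n)]
    for i, start in enumerate(range(offset, offset + length, chunk)):
        plan[i % n].append((start, min(chunk, offset + length - start)))
    return plan

d1, d2 = split_read(0, 128 * 1024)
print(d1)  # [(0, 32768), (65536, 32768)]
print(d2)  # [(32768, 32768), (98304, 32768)]
```

Each disk touches every other chunk, so on rotating media the skipped chunks still pass under the head; no time is actually saved.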

RAID5 can use all disk members, though this requires the engine to have the drive holding parity (for that stripe block) skip its parity and read the next stripe block instead. Not all implementations do this. Also, the skipping ahead means the disks slow down a tiny bit. RAID1 theoretically has the best read performance, since a good engine will be able to make both disks read at full throughput. Virtually no implementation is capable of this, however.
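The rotating-parity layout behind this can be sketched as follows (the rotation formula below is one common style; actual controllers use several layouts, so treat this as an assumption):

```python
def raid5_parity_disk(stripe, n_drives):
    # Rotating parity: the parity block walks backwards across the
    # drives, one position per stripe (left-asymmetric style).
    return (n_drives - 1 - stripe) % n_drives

def data_disks(stripe, n_drives):
    # The n-1 drives holding user data for this stripe; a read stream
    # skips the parity drive's block (or reads ahead past it).
    p = raid5_parity_disk(stripe, n_drives)
    return [d for d in range(n_drives) if d != p]

# With 5 drives, parity rotates through every drive, so each drive is
# a data drive in 4 of every 5 stripes and all spindles stay busy.
print([raid5_parity_disk(s, 5) for s in range(5)])  # [4, 3, 2, 1, 0]
```

This is why all n spindles participate in long reads even though only n-1 blocks per stripe are user data.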

The best RAID1 performance scaling I've seen is from ZFS, which has sequential reads on RAID1 just below that of RAID0. That is very good, and a sign that the engine does quite a good job at letting the disks do useful things.

@Red Squirrel: you need to increase the test size to at least 8 times your RAM, or your RAM buffer cache will contaminate your results. I recommend using 'dd' instead of bonnie, because bonnie does not employ the cooldown period necessary for properly benchmarking RAIDs with write-back mechanisms.
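The "cooldown" point is essentially about forcing the flush into the measurement. A minimal Python sketch of the idea (not dd itself; the helper name is mine, and the 8 MiB size here is only to keep the demo cheap - a real run needs several times your RAM, per the advice above):

```python
import os
import tempfile
import time

def timed_write(path, size_mib, chunk_mib=1):
    """Time a sequential write INCLUDING the flush to stable storage.

    The fsync() is the 'cooldown': without it you mostly measure the
    RAM buffer cache absorbing the write, not the disks.
    """
    buf = b"\0" * (chunk_mib * 1024 * 1024)
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mib // chunk_mib):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # force dirty pages out to the device
    return size_mib / (time.perf_counter() - t0)  # MiB/s

with tempfile.NamedTemporaryFile(delete=False) as tf:
    path = tf.name
print(f"{timed_write(path, 8):.1f} MiB/s (flush included)")
os.remove(path)
```

Dropping the fsync from that loop typically multiplies the reported figure, which is exactly the contamination being warned about.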
 

exdeath

Lifer
Jan 29, 2004
13,679
10
81
My point is that there is no implementation or special magic that will allow any RAID to achieve faster performance than the sum of its individual disks. If you have 5 mechanical disks that do 100 MB/s and your RAID benchmark shows 2,000 MB/s, that should be raising some red flags about the benchmark results.
 

sub.mesa

Senior member
Feb 16, 2010
611
0
0
Well, the special magic is called write-back, and this in essence allows much faster I/O than the disks are capable of, but only in short bursts. The benchmark results are not erroneous, however. If you write a small amount of data to a write-back mechanism, you get 2 GB/s+ performance regardless of the speed of your disks. That is real-life performance.

The problem is not the actual benchmark result itself. The problem is the conclusion people draw from such scores. It's not a sustained sequential write score; it's a burst score of the entire I/O framework where both your RAM and disks are at work. In essence, the 2 GB/s score is a memory and driver benchmark, not really a disk benchmark. The problem is when you think you're testing disk performance while actually you were testing something else. That is extremely common with I/O benchmarking and computer benchmarks in general, and to a certain extent applies to statistics in general as well.

The numbers are not wrong, they just do not represent what you think they do. Benchmarking is tricky and proper benchmarking requires that you understand EXACTLY what the scores mean.
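Both positions in this exchange fall out of a toy write-back model: the first cache-full of data lands at RAM speed and the rest drains at disk speed. The function and the numbers (1 GB cache, 2000 MB/s RAM path, 500 MB/s array) are illustrative assumptions of mine, and real engines overlap the two phases, but the shape of the result holds:

```python
def observed_write_mbps(size_mb, cache_mb, ram_mbps, disk_mbps):
    """Toy write-back model: cache absorbs the first cache_mb at RAM
    speed; the remainder is paid for at the disks' sustained rate."""
    in_cache = min(size_mb, cache_mb)
    spill = size_mb - in_cache
    t = in_cache / ram_mbps + spill / disk_mbps
    return size_mb / t

# A 256 MB test that fits in a 1 GB cache reports near-RAM speed...
print(round(observed_write_mbps(256, 1024, 2000, 500)))    # 2000
# ...while a 10 GB copy converges toward the disks' sustained rate.
print(round(observed_write_mbps(10240, 1024, 2000, 500)))  # 541
```

Both numbers are "real"; they just answer different questions, which is the point being made about interpreting the scores.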
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Well the special magic is called write-back, and this in essence allows much faster I/O than the disks are capable of, but only in short bursts.
Sure, but then I can get EXACTLY the same result on a 15-year-old 4.2k RPM drive ;)

It may be hard to write a perfect disk benchmark, but if not even the most obvious invariants hold, that should raise some extremely large red flags. I don't think exdeath missed the part about the write cache; he was just pointing out that the cache obviously completely dominates the data.
 

sub.mesa

Senior member
Feb 16, 2010
611
0
0
You would not get the exact same score, depending on the exact test; but say 80% is RAM speed and 20% is actual disk speed - that would mean disk performance is only a small factor in the overall performance of small burst writes.

Of course you're right that he was familiar with the cache (actually: buffer) playing such a large role. But I guess I just want to stress that the 1 GB/s+ scores are not erroneous. I/O performance is more than the disk performance alone; it's the combination of all storage layers working together, and write-back in particular plays an important role in improving performance in realistic circumstances, temporarily allowing performance to far exceed the disks' actual 'raw' performance level.

But most people want to test sustained sequential throughput, with the buffer cache playing a marginal role at best. So in that regard you're totally right that the benchmark does not fit that goal.
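The 80/20 split above, and Voo's point about the antique drive, can both be checked with time-weighted blending (the fraction and the speeds are illustrative assumptions, continuing the thread's example figures):

```python
def blended_mbps(ram_fraction, ram_mbps, disk_mbps):
    """Effective throughput when ram_fraction of the bytes move at RAM
    speed and the rest at disk speed (time to move 1 MB, inverted)."""
    t = ram_fraction / ram_mbps + (1 - ram_fraction) / disk_mbps
    return 1 / t

# 80% buffered at 2000 MB/s, 20% from a fast array at 500 MB/s:
print(round(blended_mbps(0.8, 2000, 500)))  # 1250
# The same mix over a ~30 MB/s antique drive still "scores" well:
print(round(blended_mbps(0.8, 2000, 30)))   # 142
```

With the buffer dominating, even a huge gap in underlying disk speed barely registers in the blended score - which is exactly why such a number says little about sustained throughput.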