Originally posted by: Matthias99
snip -- exceedingly long discussion about RAID0 and why it is actually good because of I/O queueing
No, not I/O queueing; you did not read my post correctly. It is about parallelisation. With a single drive, all I/O requests are serviced serially: A, then B, then C. With a 2-disk RAID0 array it is possible to do two I/Os at once: A+B, then C+D, then E+F. That gives a theoretical performance increase of 100% for random I/O.
1) With RAID0, each particular block is only hosted on one drive. On average, each drive in a 2-way RAID0 can only service 50% of the random requests (25% each in a 4-way array), cutting into the efficiency.
If the I/Os land equally divided across all available disks, a linear performance increase is to be expected. That is the ideal situation, of course, and real-life workloads will be less evenly spread. But on average, a significant performance increase can be expected for random I/O. The major problem in practice is the quality of the RAID0 implementation and the filesystem logic, and Windows does badly on both, unfortunately.
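A quick way to see the "equally divided" point is to simulate it. The sketch below is a toy model of my own (not from any benchmark): it throws random block requests at an N-disk RAID0 array and takes the busiest disk as the completion time, which corresponds to the ideal case where plenty of independent requests are available:

```python
import random

def simulated_speedup(num_disks, num_requests=100_000, seed=42):
    """Distribute random block requests over the disks of a RAID0 array
    and report the speedup vs. a single disk. Completion time is set by
    the busiest disk, since the disks work in parallel."""
    rng = random.Random(seed)
    load = [0] * num_disks
    for _ in range(num_requests):
        # Random I/O: each request lands on a uniformly random stripe,
        # hence on a uniformly random disk.
        load[rng.randrange(num_disks)] += 1
    return num_requests / max(load)

for n in (1, 2, 4):
    print(f"{n} disk(s): ~{simulated_speedup(n):.2f}x")
```

With two disks the speedup comes out just under 2x, with four just under 4x: linear, minus the small imbalance you get from randomness.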
2) Most desktop programs don't queue I/Os deeply (if at all). Yeah, if you're running a webserver or something that might queue 100+ totally independent I/Os in the background, this factor can come into play. But it won't do much for a lot of desktop work.
It is true that desktop systems, due to their single-user application pattern, do not benefit as much from parallelisation as busy multi-user server systems do. But even at a queue depth of just 1, the worst possible situation, my numbers indicate a 63% performance increase. At a queue depth of 4, the increase is already 255%. These numbers are for strictly random I/O with transfer sizes ranging from 16KB to 128KB - very much non-sequential.
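To give the queue-depth effect some intuition, here is a toy probability model of my own (it is *not* the benchmark the 63%/255% numbers come from): with QD independent random requests outstanding on n disks, the expected number of disks kept busy in parallel is n*(1 - (1 - 1/n)^QD):

```python
def expected_parallelism(num_disks, queue_depth):
    """Expected number of distinct disks kept busy by `queue_depth`
    outstanding requests, each landing on a uniformly random disk."""
    n = num_disks
    return n * (1 - (1 - 1 / n) ** queue_depth)

for qd in (1, 2, 4, 8):
    print(f"2 disks, QD {qd}: {expected_parallelism(2, qd):.3f} disks busy")
# 2 disks: 1.000 at QD 1, 1.500 at QD 2, 1.875 at QD 4, 1.992 at QD 8
```

Note that in this simple model a queue depth of 1 gives no gain at all, so it clearly does not capture everything behind the measured numbers; it only shows how fast the parallelism ramps up once a few requests are outstanding.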
The one important assumption is that there is no misalignment between the stripe blocks and the filesystem clusters. On Windows this is a real problem, since Windows partitioning does nothing to prevent misalignment. It can be fixed manually, though. Unfortunately not many people know about it and the FAQs do not mention it - a shame!
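To make the misalignment concrete, here is a small sketch. The stripe size (64KB), cluster size (4KB) and partition offsets are just common example values I picked; it counts how many clusters end up straddling a stripe boundary, and each such cluster turns one logical I/O into two disk I/Os:

```python
STRIPE = 64 * 1024   # example stripe size
CLUSTER = 4 * 1024   # example filesystem cluster size

def disks_touched(offset_bytes, cluster_index):
    """Return how many stripe-sized chunks (hence disks) a single
    filesystem cluster touches, given the partition's byte offset."""
    start = offset_bytes + cluster_index * CLUSTER
    end = start + CLUSTER - 1
    return (end // STRIPE) - (start // STRIPE) + 1

# Legacy Windows partitions start at sector 63 (63 * 512 = 32256 bytes),
# which is not a multiple of the stripe size:
misaligned = sum(disks_touched(63 * 512, i) > 1 for i in range(16384))
# Manually aligned to a 1 MiB boundary, no cluster straddles a stripe:
aligned = sum(disks_touched(1024 * 1024, i) > 1 for i in range(16384))
print(misaligned, aligned)
```

With the sector-63 offset, one in every sixteen clusters straddles a stripe boundary and costs a second disk access; with the aligned offset, none do.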
"real" hardware controllers also usually provide a substantial amount of cache onboard, and will frequently do prefetching as well. Also, most RAID controller benchmarks involve server programs (databases, webservers, email servers, etc.) that deeply queue I/Os. IMO, this is not really representative of a typical desktop workload.
Agreed, but my point was that access time is useless when comparing a single disk to a multi-disk RAID system. While the latter might not have a lower access time, it can still process a stream of I/Os more quickly. Access time measures only ONE I/O request. RAID0 might not process that SINGLE request any faster, but it will process a *bunch* of I/O requests faster.
So someone cannot say "hey, this Raptor here has a lower access time than my 24-disk hardware RAID0 array and thus is faster". No way. Access time is hyped and widely misunderstood.
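To put some numbers on that (made up, purely illustrative): say the Raptor seeks in 8ms and each disk in the array in a sluggish 12ms. One request at a time, the Raptor wins; with a deep queue of independent requests, the array walks all over it:

```python
RAPTOR_MS = 8.0    # hypothetical access time of the single Raptor
ARRAY_MS = 12.0    # hypothetical (worse) access time per array disk
DISKS = 24

raptor_iops = 1000 / RAPTOR_MS
# With enough independent requests queued, all 24 spindles seek at once:
array_iops = DISKS * (1000 / ARRAY_MS)
print(f"Raptor: {raptor_iops:.0f} IOPS, array: {array_iops:.0f} IOPS")
```

The single disk answers any one request half again as fast, yet the array sustains an order of magnitude more random I/Os per second - which is exactly why quoting access time alone settles nothing.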