Hard drives aren't sexy.
That statement sums up in a nutshell the reason we have so few sources for reliable information on hard drive performance. In contrast to the latest crop of high performance video chipsets or CPUs, which have hardware reviewers swooning like teeny boppers, hard drives seem relegated to an esoteric corner of the web called StorageReview. No other site has been able or willing to match their steady output of competent reviews or their doggedly consistent methodology.
But like anyone at the top of their game, StorageReview is a little set in their ways. For one thing, their never-wavering choice of only two benchmarks to represent the entire performance spectrum of every hard drive released in the past 18 months has made their results somewhat difficult to swallow. Many hardware enthusiasts, accustomed to the reams of disparate and complementary benchmarks found in typical CPU and video reviews, find it difficult to accept that ZDNet Winbench 99 and Intel IOMeter are the final say on the performance of a modern hard drive. Further complicating matters is the fact that Winbench and IOMeter often contradict each other; the Maxtor DiamondMax 60 Plus, for instance, gets top billing on Winbench but looks fairly ordinary under IOMeter.
Actually, StorageReview's extensive performance database contains four useful measures of disk performance:
1) IOMeter indices
2) Winbench 99 Disk Winmarks
3) Access times
4) STR measurements
Of these, two are synthetic, one is real-world, and one (IOMeter) is a flexible hybrid benchmark that simulates real world usage according to user-defined access parameters. Ideally, we could assign a relative weight to each benchmark to reflect its correlation to typical disk performance. By using a standard formula to average the data, we would be close to achieving that magic bullet of performance comparisons -- the single number that tells the whole story. But which benchmarks are the most important, and which are the least? The following excerpts from StorageReview's Windows 2000 Testbed Disclosure Statement are probably our best guide:
(Please excuse the length -- skip it if you're anxious for the good stuff.)
"The best tests, of course, are common operations performed in a given application set. Doing so, again with human reflex/stopwatch is not as easily done as folks initially perceive. . . Macroing a standardized set of applications with a competent program for playback in the same sequence under the same conditions would be the best way to measure total system performance. . . while far from perfect, WinBench 99's Disk WinMarks are among the best-available scientific, standardized approaches to drive testing. . .
Over the last two years, we've used over 90 ATA and SCSI drives in our personal systems. We can attest through this sheer experience that performance and responsiveness as a whole correlate much more to WinBench than it does to file copies or other so-called "real world" measures.
That said, there are some legitimate concerns raised that WinBench, while accurate for testing the application workload it purports to measure, provides too light of a load on the system to represent performance on a more general basis. Further, as a given release of WinBench ages, manufacturers become better at "tuning" drive performance to reflect high numbers without a corresponding increase in actual performance. . . WinBench 99 is our "old faithful," but is getting a bit gray behind the ears. We've noticed certain instances (say, the Maxtor DiamondMax Plus 40 vs the Seagate Cheetah 36LP) where scores reported don't seem to correlate to our personal experiences. Hence the need for another good, corroborative benchmark: Enter IOMeter. . .
IOMeter isn't the user-friendliest program around, but it's definitely the most flexible. Unlike WinBench 99, which uses access patterns created by real applications, IOMeter's patterns are entirely synthetic. Such synthetic nature, however, yields incredible flexibility. . .
IOMeter can spawn multiple workers; Intel recommends one worker per CPU. Each worker, in turn, can tax target(s) consisting of either an unpartitioned "physical disk" or partition(s) within a disk. Each worker must also be assigned a specific "access pattern," a series of parameters that guide the worker's access of a given target. . .
Generally speaking, it's evident that random access dominates typical workstation usage (and is thus the principal reason ThreadMark has been jettisoned). Even WinBench corroborates when pressed with the issue. The question is, how much randomness is sufficient?
Though the loading of executables, DLLs, and other libraries are at first a sequential process, subsequent accesses are random in nature. Though the files themselves might be relatively large, parts of them are constantly being sent to and retrieved from the swapfile. Swapfile accesses, terribly fragmented in nature, are quite random. Executables call other necessary files such as images, sounds, etc. These files, though they may represent large sequential accesses, consist of a very small percentage of access when compared to the constant swapping that occurs with most system files. Combined with the natural fragmentation that plagues the disks of all but the most dedicated defragmenters, these factors clearly indicate that erring on the side of randomness would be preferred. . .
Through recent tests in WinBench 99, however, empirical results indicated that STR had relatively little effect upon overall drive performance. Today, it should be clear that steadily-increasing transfer rates have in effect "written themselves out" of the performance equation. . . Judging from the above examples, it should be clear that random access time is vastly more important than sequential transfer rate when it comes to typical disk performance. Thus, the reordered "hierarchy" of important quantifiable specs would read:
1) Seek Time
2) Spindle Speed
3) Buffer Size
4) Data Density
Note that an important yet quantitatively-immeasurable factor is the nebulous drive electronics/firmware/algorithms package. Both of our benchmarks (WinBench 99 and IOMeter) are strongly influenced by these factors. . .
The often-contrary nature between StorageReview.com's new set of benchmarks begs the question: What's more important, WinBench 99 or IOMeter? In the past we've admitted how we use each drive we benchmark in our own personal systems to catch any cases where manufacturers are obviously padding WinBench results. In general, however, it's safe to say that we've been satisfied with the general correlation between WB99 results and perceived performance under normal, everyday use (something that would fall in between Light and Moderate Loads in our current IOMeter suite). . .
Another benefit in IOMeter's nature is its rather low-level nature. The fact that it doesn't possess a scripted access pattern to be played back (rather, it creates its own loads) makes it much more difficult for manufacturers to "tune" a drive for better scores without actually resulting in a proportional increase in actual usage. . .
The net result is that StorageReview.com from this point forwards will weigh IOMeter results, particularly of the Workstation Access Pattern, more highly than WinBench 99. This isn't to say that WinBench 99 is useless. . .
Again we come to the question of how to weight the four important benchmarks in StorageReview's database (IOMeter, Winbench 99, access time and STR). From the above excerpts, it is clear that IOMeter is a stronger benchmark than Winbench 99 for representing real world drive performance. And random access times are now considered much more important than STR. A glance at the access patterns StorageReview employs in their IOMeter tests will confirm what we already know from its results -- IOMeter is heavily access-time-dependent. Similarly, we find that Winbench 99 Winmarks scale much better with higher STRs than IOMeter indices do.
After much soul searching and personal meditation, I've decided that the following weighting system would be most indicative of real world hard drive performance:
IOMeter Indices: 45%
Winbench 99 Disk Winmarks: 25%
Access time: 20%
STR: 10%
This system follows two important guidelines: it conforms to the spirit of StorageReview's evaluation of important performance factors, and it stresses real world or hybrid benchmarks over purely synthetic measures.
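For readers who prefer to see the scheme as arithmetic rather than prose, here is the combination written out as a minimal sketch. The function and variable names are mine, not StorageReview's; the inputs are the normalized percentage indices described in the procedure below.

```python
# Weights from the scheme above, expressed as fractions of 1.0.
WEIGHTS = {"iometer": 0.45, "winbench": 0.25, "access": 0.20, "str": 0.10}

def weighted_score(iometer, winbench, access, str_index):
    """Combine four normalized indices (each on a 0-100 scale).

    Note: the tables later in the article average the four weighted
    terms (i.e. divide this sum by 4). That only rescales the result
    and has no effect on how the drives rank.
    """
    return (WEIGHTS["iometer"] * iometer + WEIGHTS["winbench"] * winbench
            + WEIGHTS["access"] * access + WEIGHTS["str"] * str_index)
```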
Applying this weighting system to StorageReview's database takes only a calculator and a little patience. Perhaps one day they and other hardware review sites will feature databases that allow one to customize the weighting of each benchmark into an aggregate index. Until then, we'll have to do it the old-fashioned way.
The procedure to follow is this:
Go to StorageReview's Comparison Page.
Under "Compare Drive Performance of up to Six Drives", leave everything at its default and select up to six drives for comparison, clicking "Compare IOMeter Indicies" to continue. Scroll down to the three horizontal graphs. The normalized results are seen in your browser's status bar when you hover over a drive's bar. Average each drive's normalized results for the three IOMeter access patterns. This is our final IOMeter index for each drive.
Click Back in your browser and scroll down to "Compare Winbench and Threadmark 2.0 Results". Select up to six drives for comparison, clicking "Let's Compare!" to continue. For each drive, average its two normalized Winmark scores. This will be our final Winbench 99 index.
For access time, normalize all the drives in the comparison by finding the lowest access time (e.g. 11.5ms) and dividing it by each drive's access time: 11.5ms / 13.1ms = 0.878, which gives an 88% normalized access time for the 13.1ms drive and 100% for the 11.5ms drive.
For STR, average the beginning and end scores for each drive (this assumes the STR decreases fairly linearly over the platter surface, which is more or less true). Then normalize the results, this time dividing each drive's average by the highest one. E.g. if the highest average is 31000, then 28000 / 31000 = 0.903, or 90% for the drive averaging 28000 and 100% for the one averaging 31000.
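The normalization in the last two steps is just a pair of ratios, so it can be scripted instead of punched into a calculator one drive at a time. The sketch below uses the example figures from the text; the beginning/end STR pairs are made up purely so that they average to 31000 and 28000.

```python
# Normalization as described above: for access time, lower is better,
# so the best (lowest) time becomes 100% and slower drives score less.
def normalize_access_times(times_ms):
    best = min(times_ms)
    return [round(100 * best / t) for t in times_ms]

# For STR, average each drive's beginning and end rates, then scale so
# the highest average becomes 100%.
def normalize_str(begin_end_pairs):
    averages = [(begin + end) / 2 for begin, end in begin_end_pairs]
    best = max(averages)
    return [round(100 * avg / best) for avg in averages]

print(normalize_access_times([11.5, 13.1]))              # [100, 88]
print(normalize_str([(36000, 26000), (33000, 23000)]))   # [100, 90]
```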
Once we have the four final benchmark indices for each drive, we simply multiply each by its corresponding weight and average the four weighted values (a weighted sum scaled by one quarter, which leaves the rankings untouched) to arrive at a Total Performance Index for each drive. To start us off, I've done this for six of the most popular ATA/100 hard drives available today:
Drive            IOMeter Index   Winbench 99 Index   Access Time Index   STR Index
Quantum LM       98.3            78.5                100                 77
IBM 75GXP        99              92.5                93                  93
WD 400BB         88              99                  83                  93
Quantum AS       86.3            91                  87                  91
Maxtor 60Plus    84              98.5                88                  100
Maxtor 45Plus    84.3            92                  89                  89
Ranking of Total Performance Indices with corresponding PriceWatch quotes for 30 GB drives:
(according to 45% IOMeter, 25% Winbench, 20% Access Time and 10% STR)
1) IBM Deskstar 75GXP              23.89    $119
2) Quantum Fireball LM Plus        22.89    $106
3) Western Digital Caviar 400BB    22.56    $138 (40 GB only)
4) Maxtor DiamondMax 60 Plus       22.51    $119
5) Quantum Fireball AS Plus        22.02    $115
6) Maxtor DiamondMax 45 Plus       21.91    n/a
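For anyone who wants to check the arithmetic or try a different set of weights, the ranking above can be reproduced from the per-benchmark indices in a few lines. This is only a convenience script: the data is copied from the table above, and the index follows the convention already described (weight each score, then average the four weighted terms), which is how the 75GXP comes out at 23.89.

```python
# Reproduce the Total Performance Index ranking from the table above.
# Score order per drive: IOMeter, Winbench 99, access time, STR.
WEIGHTS = (0.45, 0.25, 0.20, 0.10)

DRIVES = {
    "IBM Deskstar 75GXP":           (99.0, 92.5, 93, 93),
    "Quantum Fireball LM Plus":     (98.3, 78.5, 100, 77),
    "Western Digital Caviar 400BB": (88.0, 99.0, 83, 93),
    "Maxtor DiamondMax 60 Plus":    (84.0, 98.5, 88, 100),
    "Quantum Fireball AS Plus":     (86.3, 91.0, 87, 91),
    "Maxtor DiamondMax 45 Plus":    (84.3, 92.0, 89, 89),
}

def total_performance_index(scores):
    # Weight each normalized score, then take the mean of the four
    # weighted terms (the same convention used for the table above).
    weighted = [w * s for w, s in zip(WEIGHTS, scores)]
    return sum(weighted) / len(weighted)

for name, scores in sorted(DRIVES.items(),
                           key=lambda item: total_performance_index(item[1]),
                           reverse=True):
    print(f"{name:30} {total_performance_index(scores):.2f}")
```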
What information can we glean from this kind of holistic performance comparison of currently competing hard drives? Two things stand out. First, StorageReview's "gut" feelings about the drives are extremely accurate. Their leaderboard featured, and continues to feature, the IBM 75GXP, which has the highest performance index under this system. The runner-up was previously the Quantum LM+, which had the second-highest index, but these days the LM+ is hard to find, so StorageReview fell back to the next drive in line, the WD 400BB.
Second, and more importantly, modern hard drives of the same generation are remarkably similar performers. This performance index shows only a 9% difference between the slowest drive, the DiamondMax 45+, and the leader of the pack, the 75GXP. This leads to a buying decision focused mainly on price and availability. Assuming one can still find the Quantum LM+ for its PriceWatch list price, it is by far the best value. If not, the IBM 75GXP is, pleasantly, the best buy and the best performer.
Modus