No, you misunderstood my post, I think. I never said people "prefer" [H] because it's subjective. I said that people refer to [H] because it tries to find a "playable setting" for each card in a manual run-through. Yet no one criticizes this type of testing as being subjective, which it clearly is, since there is no universal idea of what "playable settings" are unless you just stick to a 60 fps average. Now, when it comes to image quality - largely a subjective topic as well - they want to see objective measurements. See the post by Stoneburner, for example:
Maybe I misunderstood it. But the main point is still valid: people will hopefully always prefer analysis of real-world parameters (game-play performance, IQ, etc.). Although the natural reference points and scales seem "subjective" or vague, as exemplified by detail or playable settings, they still have to form the framework we base performance on.
That makes the measurements harder, but not subjective. Measuring the right parameters is always the first step; setting up the test for significance and repeatability comes next. But the measurement parameters should always be chosen for relevance first.
Compromising on the parameters because they are hard to measure, and losing the relevance of the measurement in the process, is just bad science. Criticizing [H] for being more subjective than canned benches is not only wrong, it's like pissing in your drinking water.
I think we had a discussion about a Google sheet of canned benches of the GTX 580 versus other cards at 2560x1600 a week ago. The variation between the same games at different test sites, including Anandtech, was just frightening. The randomness in itself indicated that the numbers were more or less noise. This could easily be verified with a simple covariance check if anyone bothered to... Knowing that canned benches are at the same time optimized by the vendors means that you are not only looking at noise, you are looking at manipulated noise, open to subjective analysis and outcome.
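Just to illustrate the kind of check I mean, here is a minimal Python sketch. The numbers, game names and site names are purely hypothetical placeholders, not real review data; the point is only the mechanics: if the sites were measuring the same underlying per-game difference, their columns should correlate strongly.

import numpy as np
import pandas as pd

# Per-game FPS advantage of card A over card B (%), as reported by each site.
# All values below are made up for illustration only.
data = pd.DataFrame({
    "site_1": [12.0,  5.0, -3.0,  8.0, 15.0],
    "site_2": [ 2.0, 14.0,  9.0, -6.0,  4.0],
    "site_3": [-4.0,  7.0, 11.0,  3.0, -2.0],
}, index=["game_a", "game_b", "game_c", "game_d", "game_e"])

# Strong positive correlation between sites -> they agree on which games
# favor which card. Near-zero correlation -> the spread is mostly noise
# (or site-specific bias), exactly the situation described above.
print(data.corr())
print(data.cov())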
The problem is that noise can be extremely repeatable. It's the significance of the measurement that has to be the quality check, not repeatability in an isolated system. Accomplishing this is not necessarily straightforward.
I don't know of any test site that uses modern statistical tools to deduce significance. They may repeat the bench three times, but that really only tests the functionality of the test system, nothing else.
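As a rough sketch of what such a significance check could look like, assuming a decent number of repeated runs per card (the FPS values and run count below are hypothetical, and Welch's t-test is just one reasonable choice of tool):

from scipy import stats

# Average FPS from ten independent runs of the same scene per card.
# Values are invented for illustration only.
card_a_runs = [61.2, 60.8, 62.1, 59.9, 61.5, 60.3, 61.9, 60.7, 61.1, 60.5]
card_b_runs = [59.8, 60.1, 59.5, 60.9, 59.2, 60.4, 59.7, 60.0, 59.3, 60.6]

# Welch's t-test: is the mean difference larger than what run-to-run
# variance alone would produce? Three repeats rarely give enough power
# for this; that is the whole point of the complaint above.
t_stat, p_value = stats.ttest_ind(card_a_runs, card_b_runs, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")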
Until then, best playable settings along a semilinear (granted) but open IQ scale, combined with a full diagram of FPS tracked over the full test time, in combination with apples-to-apples runs at the highest possible settings, is as full a story as you can get.
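For the "FPS tracked over the full test time" part, the idea is simply to plot the whole run rather than a single average. A minimal sketch, using synthetic frame times in place of a real log (the numbers and the normal-distribution assumption are placeholders only):

import numpy as np
import matplotlib.pyplot as plt

# Frame times in milliseconds for one run (synthetic placeholder data).
frame_times_ms = np.random.normal(16.7, 2.5, size=5000).clip(min=5.0)

timestamps_s = np.cumsum(frame_times_ms) / 1000.0   # elapsed test time
fps = 1000.0 / frame_times_ms                        # instantaneous FPS

plt.plot(timestamps_s, fps, linewidth=0.5)
plt.xlabel("Test time (s)")
plt.ylabel("FPS")
plt.title("FPS over the full run (single card, single setting)")
plt.show()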
You may not agree with the chosen reference point (playable settings), but at least you get the full information at that reference point. Direct comparisons at identical settings are also provided, which deconvolutes any supposed effect of the choice of reference point anyway.
This measurement design is clearly intended to eradicate subjectivism, IMHO.