You just need to think of [H]'s testing as showing which card provides better IQ, not which card provides better FPS. They pick a target FPS and see which card can turn on more settings, rather than picking a target setting and seeing which card can provide more FPS.
Either method suffers when a card is exceptionally slow or fast at a specific effect or combination of effects, so you can never get an all-encompassing measure of performance anyway: no review site has enough time to test a game with every combination of resolution, AA, AF, DoF, tessellation, HDR, etc.
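To put a rough number on that, here's a minimal sketch of how fast the test matrix grows. The option counts per setting are hypothetical, purely to show the scaling:

```python
# Hypothetical option counts for each image-quality setting.
settings = {
    "resolution": 3,    # e.g. three common resolutions
    "AA": 4,            # off / 2x / 4x / 8x
    "AF": 3,            # off / 8x / 16x
    "DoF": 2,           # off / on
    "tessellation": 2,  # off / on
    "HDR": 2,           # off / on
}

# Full test matrix = product of all option counts.
combos = 1
for n in settings.values():
    combos *= n

print(combos)  # 3*4*3*2*2*2 = 288 benchmark runs per game, per card
```

Even with these modest counts that's hundreds of runs per game per card, which is why every site has to pick some subset and, implicitly, a priority order.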
Most sites only change AA settings between benches, which means you only learn how much of a hit AA causes on a card. If a card crawls at 4xAA and max settings, but turning off just DoF suddenly makes it playable, that's not something you'll usually find out from a non-[H] review.
By testing with AA on and off, those sites implicitly designate AA as the least important effect. Personally I find DoF and HDR the least important and would rather see tests with 4xAA and optional DoF/HDR.
This is a good point. Both reviews with canned benches and [H] use (by necessity) a certain priority order for increasing image quality. We may have different opinions on that order, and it may also vary quite a bit from game to game. While [H] may not hit the sweet-spot order that makes everyone happy, they at least try to cut a meaningful path through this jungle towards playability versus IQ.
What I don't understand, and objected to, was the notion that people would avoid [H] due to some possible "subjectivity", when the alternative is meaningless measurements: canned benches that repeatedly show themselves to be, in the best case, highly noisy to the point of randomness when compared between sites. In the worst case, the benches only show the optimizations (or secret reductions in image quality) a card maker has made for that specific bench.
Just because noise is easy to repeat it does not mean that it isn't noise.