It's the lazy reviewers who never mention this; very few sites even bother to look into it. They are the ones "cheating" their readers. As a reviewer, the onus is on them to provide the most informative review to the reading public, and failing to tell readers that they could be losing 10-20% of the performance (compared to the review's results!) just by running the card in a case is a pretty big fail.
So you would agree that, in a comparison test meant to simulate how the card performs while gaming, disabling boost and setting the frequency to the average boost clock obtained in dedicated testing is the way to go, if you don't have the time to run every benchmark for 30 minutes until the card is properly warmed up?
Because otherwise you won't get a decent comparison of real-world performance.
It's as much cheating, in my opinion, as the deliberate downclocking that was/is applied during Furmark runs. The goal, we can assume, was to stick to TDP, but for reviewers who had come to rely on Furmark to determine actual maximum power use and then ran it on the crippled cards, it amounted to benchmark optimization. And any kind of benchmark optimization is despicable: it forces reviewers to look for new benchmarks, it makes all the old benchmark results meaningless compared to the new generation, and eventually manufacturers end up wasting resources that could have gone into making a better product on making a product that merely produces better benchmark numbers. Now, as I was saying, this may not have been the intent, but it is the effect. Maybe they are trying to trick the less capable reviewers, maybe they see it as a way to make sure their cards stay quiet...
Also, I posit that boost, as it is currently implemented, is not ideal. It would be more interesting to limit frame rates intelligently, downclocking during light scenes to bank some thermal buffer, so that when a particularly intense scene comes along the card can briefly exceed TDP and raise min-FPS. But in benchmarks that focus on average FPS this would probably reduce the score, and so it is not in the interest of the GPU makers.
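Purely to illustrate the policy I have in mind (this is not any vendor's actual boost algorithm), here's a rough Python sketch; the function name, thresholds and wattages are all made up:

```python
# Made-up sketch of the idea above: run below TDP in light scenes to bank
# thermal headroom, then spend that headroom (briefly exceeding TDP) when a
# demanding scene arrives. Real firmware works on much finer timescales.

TDP_W = 250.0            # sustained power budget
BURST_W = 300.0          # short-term power allowed when headroom is banked
HEADROOM_MAX_J = 500.0   # maximum "thermal budget" we allow ourselves to bank

def choose_power_limit(scene_load, headroom_j, dt):
    """Return (power_limit_w, new_headroom_j) for one control step.

    scene_load: 0.0..1.0 estimate of how demanding the current scene is
    headroom_j: energy banked by running below TDP earlier
    dt:         control-step length in seconds
    """
    if scene_load < 0.7:
        # Light scene: cap power below TDP and bank the difference.
        power = 0.8 * TDP_W
        headroom_j = min(HEADROOM_MAX_J, headroom_j + (TDP_W - power) * dt)
    elif headroom_j > 0.0:
        # Heavy scene with banked headroom: briefly exceed TDP to lift min-FPS.
        power = BURST_W
        headroom_j = max(0.0, headroom_j - (power - TDP_W) * dt)
    else:
        # Heavy scene, no headroom left: fall back to the sustained limit.
        power = TDP_W
    return power, headroom_j

# e.g. a heavy scene right after a quiet one gets the burst budget:
limit, bank = choose_power_limit(scene_load=0.3, headroom_j=0.0, dt=1.0)
limit, bank = choose_power_limit(scene_load=0.9, headroom_j=bank, dt=1.0)
print(limit, bank)  # 300.0 0.0 -- burst power, with the 50 J of headroom spent
```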
Currently, from what I understand, it's more the opposite. But then, we have little analysis of what kind of scene actually uses "more" of a GPU than average and thereby reduces boost by increasing power draw. All we do know is that if you keep the card cool (or ignore the temperature recommendation), it will run faster, unless it's TDP-limited.
For SSDs, Anandtech has finally started to do a bit of differential analysis of access times, and for FPS even a Tukey box plot would allow a much better characterization. We can only hope that, with frame-time analysis, statistical results that offer more than a single value will become more commonplace.
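To make that concrete, here's a small Python sketch of the kind of frame-time summary I mean: a Tukey five-number summary plus outliers instead of a single average-FPS figure. The sample frame times are invented, and statistics.quantiles needs Python 3.8+:

```python
# Summarize frame times (ms) with quartiles, Tukey fences and outliers,
# instead of collapsing everything into one average-FPS number.
import statistics

def tukey_summary(frame_times_ms):
    data = sorted(frame_times_ms)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [t for t in data if t < lo_fence or t > hi_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        "outliers": outliers,                        # the stutters an average hides
        "avg_fps": 1000.0 / statistics.mean(data),   # the usual single number
    }

# Invented sample: steady ~60 FPS with one stutter frame.
print(tukey_summary([16.7, 16.9, 16.5, 17.1, 16.8, 33.4, 16.6, 16.7, 17.0, 16.8]))
```

The average FPS of that sample still looks respectable, while the box-plot view immediately flags the 33 ms stutter as an outlier.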