So I've got some thoughts here, and perhaps we'll turn these into a post on the main site, but I wanted to share them in this thread first since you guys are honestly the source/inspiration for any such post.
Let's start with why we want to define a clear framework for how general performance/power/sound testing is done. Not only does it allow for fair comparisons between products, it also helps us deal with the inevitable situation where a manufacturer submits a ringer for review (e.g. a factory overclocked card). I don't think there's much argument against this point - we all want a level playing field.
Similarly, it should be obvious why we'd want to include in such a framework the idea of testing a card at default settings. Having a strict policy there prevents a situation where AMD/Intel/NVIDIA show up and say: hey, we're selling the card in configuration x because of yields/experience/someothervalidreason, but it's really quite awesome and can run in configuration x+50%, and that's how you should test it, and btw we rule the world if you test it like that. This makes a lot of sense, particularly when manufacturers are encouraging factory overclocked comparisons.
The close relationship between fan speed and performance sort of throws a wrench in all of this. When Intel first started introducing aggressive turbo modes back with Lynnfield, I was worried that it would completely corrupt our ability to reliably test CPUs. It turns out that wasn't the case. With graphics, however, the situation is a bit different, and with the 290/290X we're beginning to get a feel for exactly why that is.
I originally assumed the reason this is a problem now (and is going to be in the future) is that we're stuck on 28nm, trying to get more performance without a good process tech solution until 14/16nm FinFET in 2015. Now I'm feeling like this is just going to be a part of the reality going forward, so we need a real solution.
AMD's Uber mode, in my eyes, isn't the same as a factory overclocked card. At the same time, it's not the same as what we've done in the past - which is test a totally stock configuration (reference clocks and fan speed). I personally believe in the whole living document philosophy when it comes to things like constitutions or review policies, but here's where we can get into trouble. In the case of the 290X, AMD has two modes and you can make a good argument for why you should test both. Let's now take it one step further: what happens if NVIDIA shows up next round with 3 modes? Do we test all 3? Which modes do we then compare against AMD's modes, particularly if they only line up along one vector (e.g. performance or acoustics, not both)? What if AMD responds the next round with 4 modes, and so on? It can quickly get out of hand.
What I'd like to do here is define a good policy for what to do if this turns into a fan speed arms race. Dealing with the 290X is simple: Ryan tested both quiet and uber modes, and I can totally appreciate the argument for including analysis based on both. What Ryan is concerned about is the future.
This isn't a matter of him being lazy. As the person he reports to, I can tell you that's definitely not the case - he's kept up an insane work schedule over these past several weeks in order to get everything done as well as possible. The launches aren't done yet for the year; add in short NDA windows, issues with cards/drivers and of course any travel, and the pace you have to keep in order to put out these reviews is insane.
The precedent we set here today will directly impact what manufacturers attempt to do with their review programs in the future. The safe bet is to stick with testing in default configurations. I am (and I assume Ryan is too) more than willing to expand/change/redefine that, but the question is how?
Let's look beyond the present 290X situation and think about what happens next. If acoustics and performance become even more tightly coupled in future GPU designs, and multiple optimization points exist for each card (with 1 default setting, obviously), how should we deal with that going forward? If things get crazy, we could be in a situation where there would even have to be a tradeoff between review depth and the number of card configurations tested. E.g. would you be willing to give up a resolution setting across all games tested in order to get another operating mode included? What does this do to the complexity of graphs?
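To put rough numbers on that tradeoff, here's a quick back-of-the-envelope sketch. The game, resolution, card and mode counts below are made-up placeholders for illustration, not our actual test suite, and the function names are hypothetical.

```python
# Back-of-the-envelope: how operating modes multiply benchmark runs.
# All counts are illustrative placeholders, not a real test suite.

def benchmark_runs(games, resolutions, modes_per_card, cards):
    """Total runs if every card is tested in every mode at every setting."""
    return games * resolutions * modes_per_card * cards


def resolutions_affordable(budget_runs, games, modes_per_card, cards):
    """How many resolutions fit in a fixed run budget (rounded down)."""
    return budget_runs // (games * modes_per_card * cards)


GAMES = 10        # assumed number of games in the suite
RESOLUTIONS = 3   # assumed number of resolution settings per game
CARDS = 2         # assumed number of cards that ship with multiple modes

# Assume the time budget is what two modes per card costs today.
budget = benchmark_runs(GAMES, RESOLUTIONS, 2, CARDS)

for modes in (1, 2, 3, 4):
    total = benchmark_runs(GAMES, RESOLUTIONS, modes, CARDS)
    fits = resolutions_affordable(budget, GAMES, modes, CARDS)
    print(f"{modes} mode(s): {total} runs at full depth, "
          f"or {fits} resolution(s) within a budget of {budget} runs")
```

With these placeholder numbers, going from two modes to three inside the same run budget costs one of the three resolutions, which is exactly the kind of trade I'm asking about.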
I don't know that I've got the answer/a solution here, but this is the discussion I'd love to have.
As I mentioned earlier, this is a discussion that we might take to the main site at some point. We felt like we owed it to you guys to start it here given the time/effort you've put into it already. We're here to listen and will obviously take your input into account (e.g. the 290 fan noise update was a direct result of your feedback). All that I'd ask is that you please be respectful of Ryan in your discussions of his work. He really puts a ton of time and effort into this stuff and takes all of your feedback very seriously. Obviously you're free to post/say whatever (as long as it doesn't violate our ToS), but I've always been a fan of the golden rule.
Thank you all for reading the site and for caring enough to engage in hundreds of comments on the forums and on the site itself. I'm off to bed for now but I'll check back tomorrow.
Take care,
Anand