Minimum framerates: Methods of assessment.

Nox51

Senior member
Jul 4, 2009
376
20
81
I believe assessing minimum frame rates is a good component of reviews. The thing I am wondering about at the moment is just how they are assessed, and I am mainly wondering about Anand's methods.

I tried to find some info on how it is done in the reviews, but an admittedly cursory look did not spot anything of interest. The problem, I guess, is one of sampling. For the sake of argument, say you run a benchmark for an hour. You have a flat line of 60 fps for 99% of the hour because your uber card is just that good (work with me here), but you have a drop to, say, 1 fps for a negligible instant. Looking at it from an entirely technical point of view, yes, the system did have a minimum of 1 fps. Is it important? It lasted a millisecond in an hour-long test. You probably would not notice it on screen, or would be so taken with the game you are playing that it just doesn't register.

Now presenting this set of data on a bar graph faces us with a problem. Minimum framerate: 1 fps. It looks bad on the graph, and you think it causes major problems. But reading the graph you do not know whether it occurred as a one-time event or repeats often enough to be significant.
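To put the toy example in concrete terms (all numbers here are made up, and this is not any review site's actual method), a raw minimum and a time-below-threshold figure tell very different stories:

Code:
import numpy as np

# Hypothetical hour-long run: a rock-solid 60 fps except for one 1 fps frame.
fps = np.full(60 * 60 * 60, 60.0)  # one sample per frame, 60 fps for an hour
fps[1000] = 1.0                    # the single momentary dip

print("raw minimum:", fps.min())   # 1.0 fps -- looks alarming on a bar graph
# Fraction of frames below 30 fps -- vanishingly small (~0.0005%).
print("frames below 30 fps:", (fps < 30).mean() * 100, "%")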

Now [H]'s method of displaying continuous graphs is better than a bar graph. It allows us to look at the data, make an informed decision about what we personally want, and see whether a dip is a one-time event. I think it is a far better way to present minimum framerates than a bar graph.

So I'm looking for your thoughts on testing methods, and also trying to find more info on what methods are actually employed, if anyone is familiar with them.

Please notice I have not named any companies, cards or specific fact-based examples from reviews, to try not to bog this down into a seething mass of anger. Please keep it that way.



Have corrected obvious thread title typo.

Super Moderator BFG10K.
 
Last edited by a moderator:

blastingcap

Diamond Member
Sep 16, 2010
6,654
5
76
Ideally you would like one gigantic continuous graph of the entire game. Yes, hours of gameplay. This would give you thousands of seconds' worth of data so you can see if the card hugs the minimum a lot or if it's usually much higher with the occasional very short spike downward.

Single numbers are not as useful because a low minimum could just mean the hard drive was loading something, or some other weird hiccup occurred. Single numbers are especially useless in certain games where everything falls to zero fps at some point, like Civ V.
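Building such a graph is straightforward if you log per-frame times; a rough sketch (the file name and CSV layout are assumptions -- FRAPS logs one timestamp in milliseconds per frame, but check your version's exact format):

Code:
import csv
import matplotlib.pyplot as plt

# Read per-frame timestamps (in ms) from a FRAPS-style frametimes log.
with open("frametimes.csv") as f:           # hypothetical file name
    rows = list(csv.reader(f))
times_ms = [float(r[1]) for r in rows[1:]]  # skip header; column 1 = time in ms

# Instantaneous fps per frame = 1000 / frame time (assumes increasing timestamps).
fps = [1000.0 / (b - a) for a, b in zip(times_ms, times_ms[1:])]
seconds = [t / 1000.0 for t in times_ms[1:]]

plt.plot(seconds, fps)
plt.xlabel("time (s)")
plt.ylabel("fps")
plt.show()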
 

Sylvanas

Diamond Member
Jan 20, 2004
3,752
0
0
Indeed, minimums are just as important to me these days as averages, if not more so. It's only really recently (the last two years) that minimums have gained traction with review websites; a few years ago there were no such results.

I too am in favour of a continuous line graph for displaying results; a minimum that occurs for 1% of the duration of a benchmark is of no help to me as a consumer looking to base my next purchase on a review.

What may be an issue is the repeatability of Fraps runs, the whole 'save game, run straight ahead for 1 minute, rinse and repeat'. Is every encounter going to be exactly identical if the game does not have a built-in benchmark or in-game scene? Will there be additional geometry to render if for some reason the in-game character is 1° off the bearing they took on the last in-game 'run' (as proposed above)? That could introduce a new piece of geometry to the scene, or different objects that may require z-culling, etc.

I am pleased to read every AT review, as IMO they are of very high quality, and I know that reviewers have very strict timelines to operate in. But perhaps there could be more follow-up reviews exploring minimums or new drivers, where there would be time to collect all the data rather than rushing one review out by the NDA deadline.
 
Last edited:

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
A minimum by itself is worthless given it’s often instantaneous benchmarking noise. Here’s a good example:

http://www.computerbase.de/artikel/prozessoren/2010/test-zwei-3-kern-cpus/27/#abschnitt_far_cry_2

Here a Core i3-530 has a higher minimum than an i7-870, and the X3 740 (3 GHz) has a higher minimum than the X4 965 (3.4 GHz). Clearly someone looking only at minimums would be deceived.

For a minimum to be relevant, it needs a benchmark plot putting it into context. Then you can actually see whether low performance happens for sustained periods that actually matter. In the absence of this, the only relevant single number is an average.
 

Blitzvogel

Platinum Member
Oct 17, 2010
2,012
23
81
I would agree that any measurement needs a plot or line graph covering the entire benchmarking period, so all the data can be interpreted at once in an easy-to-understand format. Random hiccups can and do happen.
 

Keysplayr

Elite Member
Jan 16, 2003
21,219
54
91
The FEAR benchmark was really good at showing what percentage of its time the framerate spent in each range: 10% below 30 fps, 70% between 30 and 50, 20% above 50, etc.
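Computing that breakdown from per-frame FPS samples is trivial; a minimal sketch, with the bucket edges simply mirroring the FEAR example above (exactly where the boundaries fall is an assumption):

Code:
def fear_style_buckets(fps_samples):
    # Percentage of samples below 30 fps, between 30 and 50, and 50 or above.
    n = len(fps_samples)
    below = 100 * sum(1 for f in fps_samples if f < 30) / n
    mid = 100 * sum(1 for f in fps_samples if 30 <= f < 50) / n
    above = 100 * sum(1 for f in fps_samples if f >= 50) / n
    return below, mid, above

print(fear_style_buckets([28, 35, 48, 60, 55, 42, 25, 61, 59, 33]))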
 

Rifter

Lifer
Oct 9, 1999
11,522
751
126
I agree graphs would be a lot better than just a number. I mean, who cares if the game went down to 15 fps once during the entire game because your HDD/PCIe bus/whatever was jammed up for a second, when it stayed above 30 fps for the rest? Having just one number is completely pointless and you might as well not bother listing it; I don't even look at the min framerate in reviews.
 

Absolution75

Senior member
Dec 3, 2007
983
3
81
Good thread - I wonder if we can convince Anand to do FPS-over-time graphs rather than single numbers.
 

busydude

Diamond Member
Feb 5, 2010
8,793
5
76
I wonder how many trials [H] takes for a single game; if it is just a single trial, then it's not a good estimate from which to draw a definitive conclusion.

Statistically speaking, a reviewer should have to run a single game no fewer than 30 times to come to a conclusion recommending one card over the other. But that is not practically possible, and personally I would like to have reviewers take at least 3 trials and average those FPS plots.

Alternatively, one can play a game for 15 minutes on each card, divide the whole set into 5 equal segments of 3 minutes each, and then average the corresponding data points to get a final average FPS plot (3 minutes, or 180 data points). I know this is not a scientific method... but it's worth a try at least. It would be interesting to see how the results turn out. I'll try this for a single game and post some results later in the day.
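A minimal sketch of that averaging idea, assuming one FPS sample per second (so the 15-minute capture is 900 samples; the segment count is just the one proposed above):

Code:
def averaged_plot(samples, segments=5):
    # Split the run into equal segments and average corresponding data points.
    seg_len = len(samples) // segments   # 900 samples / 5 = 180 points
    chunks = [samples[i * seg_len:(i + 1) * seg_len] for i in range(segments)]
    return [sum(points) / segments for points in zip(*chunks)]

# averaged = averaged_plot(fps_samples)  # yields the 180-point average FPS plot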

One obvious flaw I can think of, especially in FPS games: you should not die in the 15-minute time span, otherwise the results are flawed.

Can anyone fund me so that I can carry out some further research? :p
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Statistically speaking, a reviewer should have to run a single game no fewer than 30 times to come to a conclusion recommending one card over the other.
Uh, and where exactly did you get that number from? I must've missed the part in my statistics course where they described how you come up with one general number when all the important quantities (like, um, standard deviation) are missing.

Sure, one run is problematic for obvious reasons, but 30 sounds like some arbitrary high number too. I'd think 3 runs generally sounds more reasonable for the usual games where you can get a repeatable playthrough, and if not, you've got a bigger problem anyhow.
 
Last edited:

Absolution75

Senior member
Dec 3, 2007
983
3
81
Uh, and where exactly did you get that number from? I must've missed the part in my statistics course where they described how you come up with one general number when all the important quantities (like, um, standard deviation) are missing.

Sure, one run is problematic for obvious reasons, but 30 sounds like some arbitrary high number too. I'd think 3 runs generally sounds more reasonable for the usual games where you can get a repeatable playthrough, and if not, you've got a bigger problem anyhow.


Not to mention that the data in graphs at [H] tend to follow each other. If they looked like nvidia = cos(x) and amd = sin(x), then it'd be pretty obvious something was flawed. Most of the time they are pretty similar, with a fairly constant fps difference, not counting the valleys and peaks showing min/max.


Whenever I run benchmarks, they are probably close to 95% the same with each iteration... It's not like you have programs running in the background that really change things up. Three iterations is probably more than enough.
 
Last edited:

busydude

Diamond Member
Feb 5, 2010
8,793
5
76
Uh, and where exactly did you get that number from? I must've missed the part in my statistics course where they described how you come up with one general number when all the important quantities (like, um, standard deviation) are missing.

Typically, for any independent random variable, if the sample size is greater than 30, the central limit theorem can be applied to that particular set of samples.

The mean of a large set of samples will then closely approximate the true mean, and you can attach a 95% confidence interval to it.
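For what it's worth, even a handful of runs lets you attach an error bar to the average; a sketch using a Student's t interval (the three run averages are made up, and SciPy is assumed to be available):

Code:
from statistics import mean, stdev
from scipy.stats import t

runs = [61.2, 59.8, 60.5]              # hypothetical average fps of three runs
n = len(runs)
m, s = mean(runs), stdev(runs)
half_width = t.ppf(0.975, n - 1) * s / n ** 0.5  # 95% two-sided interval
print(f"{m:.1f} +/- {half_width:.1f} fps")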
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Typically, for any independent random variable, if the sample size is greater than 30, the central limit theorem can be applied to that particular set of samples.
Yeah, but if we assume a standard deviation of say <5% (which is reasonable for many repeatable benchmarks), the difference between 5 and 30 samples will be rather small.
Clearly you can use the law of large numbers or other similar results, but I'm pretty sure you can come up with a much smaller bound in that case. Although I'm way too lazy to try to prove that right now.
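To put rough numbers on that trade-off: with a standard deviation of 5% of the mean, the 95% interval half-width t * s / sqrt(n) works out as follows (a back-of-the-envelope sketch, again assuming SciPy):

Code:
from scipy.stats import t

s = 5.0  # standard deviation, as a percentage of the mean
for n in (3, 5, 30):
    hw = t.ppf(0.975, n - 1) * s / n ** 0.5
    print(f"n = {n:2d}: mean +/- {hw:.1f}%")  # roughly 12%, 6%, 2%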
 

busydude

Diamond Member
Feb 5, 2010
8,793
5
76
Yeah, but if we assume a standard deviation of say <5% (which is reasonable for many repeatable benchmarks), the difference between 5 and 30 samples will be rather small. Clearly you can use the law of large numbers or other similar results, but I'm pretty sure you can come up with a much smaller bound in that case. Although I'm way too lazy to try to prove that right now.

I said 'typically', and I also mentioned personally doing 3 runs for each game in the post you quoted before. Yes, the greater the number of samples, the tighter the confidence interval.

And don't worry about the proof, we are in agreement here.

Also, you mentioned standard deviation, which cannot be calculated for some random variables - a Cauchy RV, for instance. First, you need to check that the data you collected follows a distribution for which a standard deviation can be computed (usually Student's t-distribution for small sample sizes and the normal distribution for large ones).
 

Rezist

Senior member
Jun 20, 2009
726
0
71
I agree, and for that reason I like checking out [H] for their card reviews, but they don't benchmark enough games for my tastes.
 

Nox51

Senior member
Jul 4, 2009
376
20
81
My apologies rUmX, I didn't see your post.

It seems there is general agreement that bar graphs aren't the best way to represent this. Maybe there should be a certain point where there has to be consistent evidence of actually reaching it? Sort of like a threshold level that is attained 10% of the time? I think that would be a more indicative value than a pure snapshot.
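That threshold idea maps directly onto a percentile: the framerate the card stays above for 90% of the run. A quick sketch (numpy's percentile is just one way to get it):

Code:
import numpy as np

def low_threshold(fps_samples, pct=10):
    # The fps level the card dips below only pct% of the time.
    return float(np.percentile(fps_samples, pct))

# e.g. low_threshold(fps) gives a '10% low', far more robust than a raw minimum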

Also, maybe some of the guys who actually run benchmarks for review sites would like to comment on how they do it in more detail? I think some of us would be interested in that.
 
Last edited:

Ben90

Platinum Member
Jun 14, 2009
2,866
3
0
I bet if someone emailed Ryan Smith a very convincing, thought-out argument, along with methods, we could get time graphs on AnandTech.

*edit*

A problem I can see with this method is that it's impossible to have a whole bunch of cards benchmarked in the same time graph.
 
Last edited:

96Firebird

Diamond Member
Nov 8, 2010
5,742
340
126
The FEAR benchmark was really good at showing what percentage of its time the framerate spent in each range: 10% below 30 fps, 70% between 30 and 50, 20% above 50, etc.

This seems like the most logical way to test, to me. Perhaps have an interactive line graph (similar to [H]'s) with a slider on the y-axis, for three different datasets of your choice. Also have the option to quickly select different cards. It would probably require tons of data points, but it would tell all.
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
This seems like the most logical way to test, to me. Perhaps have an interactive line graph (similar to [H]'s) with a slider on the y-axis, for three different datasets of your choice. Also have the option to quickly select different cards. It would probably require tons of data points, but it would tell all.

I would think it would be simple enough to take FRAPS numbers, calculate the % of time spent below various levels, and just display that on a bar chart. Basically you would have FEAR-style results presented without the need for a big graph that would get horribly messy with lots of results.
 

96Firebird

Diamond Member
Nov 8, 2010
5,742
340
126
I didn't know FRAPS logged the FPS; I didn't know any program did. Sounds like it could be done easily enough...