Fermi's lead over Cypress shrinks w/ new drivers


Keysplayr

Elite Member
Jan 16, 2003
The results he presented seem to have mathematical errors and so are invalid - as in the raw data disagrees with the final result.

And that is assuming no foul play.

Or are you saying that Happy Medium's point is that FRAPS sucks at maths?

I'm actually saying what I said. Yes, believe it or not, some people actually say what they mean and mean what they say. :D
 

NoQuarter

Golden Member
Jan 1, 2001
I'm actually saying what I said. Yes, believe it or not, some people actually say what they mean and mean what they say. :D

But the inaccuracies people are pointing out aren't inaccuracies between multiple gameplay run-throughs, as he's trying to prove, but inaccuracies in the data he collected. You can't skip ahead to addressing the accuracy of real gameplay benches if the data itself wasn't collected and presented properly to begin with, which it wasn't.
 

Idontcare

Elite Member
Oct 10, 1999
That's the kind of person to wind up with a dual GTX480 setup. Simply buying one makes them no more of an enthusiast than a dentist with the most expensive Harley they can get their hands on is a motorcycle enthusiast.

As someone who likes the GPU guys to have fatter, not leaner, wallets, so that the R&D budgets for next year's GPUs are all the juicier, I fully support calling these guys enthusiasts if that is what they want to be called. I'm not picky. The guy who is enthusiastic about his GTS250 is the enthusiast that ain't exactly pulling his weight, if you know what I mean... sure, we can't say he's not an enthusiast (that's uncool), but inside we are all rolling our eyes as we mutter "yeah dude, you're hardcore all right :rolleyes:"
 

luv2increase

Member
Nov 20, 2009
As someone who likes the GPU guys to have fatter, not leaner, wallets, so that the R&D budgets for next year's GPUs are all the juicier, I fully support calling these guys enthusiasts if that is what they want to be called. I'm not picky. The guy who is enthusiastic about his GTS250 is the enthusiast that ain't exactly pulling his weight, if you know what I mean... sure, we can't say he's not an enthusiast (that's uncool), but inside we are all rolling our eyes as we mutter "yeah dude, you're hardcore all right :rolleyes:"


:D
 

GaiaHunter

Diamond Member
Jul 13, 2008
I'm actually saying what I said.

And I asked for clarification because, just by looking at the data presented (the raw data), it is clear he didn't prove his point (my reading of your post is that you believe he did).

The first set of benches, BFBC2, averages 100 and 92 fps with only 2 samples. If I were buying a card and making the decision based on real gameplay benches, and one card had 100 fps and the other had 92, I would call it a tie. Likewise, if the reviewers averaged the results and presented this card as a 96 FPS average, I don't think anyone could say that this result wasn't a valid bench of the card's value in this game. That is a +/-4% error at a setting where the card averages over 90 fps, with minimums of over 70.
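A minimal sketch of the arithmetic above, assuming each run is summarized only by its average fps: the center is the mean of the runs and the "+/-" figure is half the distance between the best and worst run. The helper name `summarize_runs` is hypothetical.

```python
def summarize_runs(averages):
    """Return (mean, half-range, half-range as % of the mean) for a list of run averages."""
    if not averages:
        raise ValueError("need at least one run")
    mean = sum(averages) / len(averages)
    # Half the distance between the best and worst run, i.e. the "+/-" figure
    half_range = (max(averages) - min(averages)) / 2
    return mean, half_range, 100.0 * half_range / mean


# BFBC2: two runs averaging 100 and 92 fps
mean, spread, pct = summarize_runs([100, 92])
print(f"{mean:.1f} fps +/- {spread:.1f} ({pct:.1f}%)")  # 96.0 fps +/- 4.0 (4.2%)
```

Applied to the three corrected FC2 averages (56.617, 59.283 and 58.533), the same helper gives a mean of about 58.14 fps with a spread of about +/-1.3 fps, roughly 2.3%.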

For the second set of FRAPS benches (FC2), Happy Medium concluded that real gameplay benches are crappy because he obtained 59.283, 58.53 and 73.533.

Then a user pointed out that a bench with no value over 70 cannot arrive at a 73.533 average, and that 3512 frames over 60 s is in fact 58.533 fps.
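Both of those points are easy to script: the average fps is total frames divided by elapsed seconds, and an arithmetic mean can never lie outside the range of the values being averaged. A minimal sketch follows; the function names and the 1 fps rounding tolerance are assumptions for illustration, not anything FRAPS itself provides.

```python
def average_fps(frames, time_ms):
    """Average frame rate from a frame count and a duration in milliseconds."""
    return frames * 1000.0 / time_ms


def average_is_plausible(avg_fps, per_second_fps, tolerance=1.0):
    """A mean cannot exceed the largest sample or fall below the smallest.

    The tolerance allows for the per-second values being rounded to whole fps.
    """
    return min(per_second_fps) - tolerance <= avg_fps <= max(per_second_fps) + tolerance


print(round(average_fps(3512, 60000), 3))  # 58.533
# Hypothetical per-second samples, all below 70 fps, cannot average 73.533
print(average_is_plausible(73.533, [67, 66, 64, 61, 58, 62, 64]))  # False
```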

So now the averages for the 3 runs are 56.617, 59.283 and 58.533, which seem consistent. Averaged, they give about 58.144 fps, which seems quite representative of this game in that situation.

I'll take a look at the raw data now that I have time.

Anyway, no one can expect to dismiss the validity of a benching technique that [H], for example, uses to get an overall view of GPU gaming performance, based on data that is obviously full of errors.

EDIT: So I just added up all the raw data points of those 3 FC2 runs and got 3332 frames, 3558 frames and 3234 frames, or 56.53 fps, 59.3 fps and 53.9 fps.

So Happy Medium's FRAPS sucks: it can't add up frames, it can't pick the minimum frame rate, it can't pick the maximum, and it can't compute the average.

Either FRAPS is quite bad (which it might well be), we don't understand how it works, or something is wrong and you can't take these results seriously.

Yes, believe it or not, some people actually say what they mean and mean what they say. :D

What is this supposed to mean?
 

railven

Diamond Member
Mar 25, 2010
I'm actually saying what I said. Yes, believe it or not, some people actually say what they mean and mean what they say. :D

The issue with Happy's numbers is:
Test #1
1 + 1 == 2
Test #2
1 + 1 == 2
Test #3
1 + 1 == 5
Conclusion: See, margin of error my rear!


I really wish Happy would state what could have given him this error, at least for his own reputation.
 

GaiaHunter

Diamond Member
Jul 13, 2008
Just went to check if my FRAPS sucked at maths.

This is a Guild Wars bench, not that it matters.

Frames, Time (ms), Min, Max, Avg
9172, 60000, 49, 228, 152.867

50 86 100 115 124 121 101 132 141 132 102 79 93 104 115 140 174 162 125 135 143 138 138 143 154 136 119 152 172 170 179 176 183 189 186 201 210 217 226 213 220 155 99 102 122 123 169 175 179 173 175 172 188 179 197 188 194 193 203 189

Summing the data points one by one gives 9171, not 9172. The average is correct, but the minimum data point is 50, not 49, and the maximum data point is 226, not 228. Looking at the frametimes, you can see this is due to rounding: FRAPS actually picks the min and max from the underlying frametime data rather than from the (rounded) per-second data points.

Still, these are minimal discrepancies.
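To repeat this check on any FRAPS log, a short script can recompute the figures from the per-second column and put them next to the summary line. This sketch uses the Guild Wars numbers above, typed in by hand rather than parsed from FRAPS's output files; the function name is hypothetical, and the "avg" row only checks the summary line against itself (frames divided by time).

```python
GW_SUMMARY = {"frames": 9172, "time_ms": 60000, "min": 49, "max": 228, "avg": 152.867}
GW_PER_SECOND = [
    50, 86, 100, 115, 124, 121, 101, 132, 141, 132,
    102, 79, 93, 104, 115, 140, 174, 162, 125, 135,
    143, 138, 138, 143, 154, 136, 119, 152, 172, 170,
    179, 176, 183, 189, 186, 201, 210, 217, 226, 213,
    220, 155, 99, 102, 122, 123, 169, 175, 179, 173,
    175, 172, 188, 179, 197, 188, 194, 193, 203, 189,
]


def compare_to_summary(summary, per_second):
    """Pair each reported figure with the value recomputed from the data."""
    recomputed = {
        "frames": sum(per_second),
        "min": min(per_second),
        "max": max(per_second),
        # Average implied by the summary line's own frames and time columns
        "avg": round(summary["frames"] * 1000.0 / summary["time_ms"], 3),
    }
    return {key: (summary[key], recomputed[key]) for key in recomputed}


for key, (reported, recomputed) in compare_to_summary(GW_SUMMARY, GW_PER_SECOND).items():
    print(f"{key:>6}: reported {reported}, recomputed {recomputed}")
```

A gap of one frame and a couple of fps at the extremes fits the rounding explanation; a gap of hundreds of frames, or an average outside the min/max range of the data, would point to a mismatched log instead.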
 

railven

Diamond Member
Mar 25, 2010
Just went to check if my FRAPS sucked at maths.

This is a Guild Wars bench, not that it matters.

Frames, Time (ms), Min, Max, Avg
9172, 60000, 49, 228, 152.867

50 86 100 115 124 121 101 132 141 132 102 79 93 104 115 140 174 162 125 135 143 138 138 143 154 136 119 152 172 170 179 176 183 189 186 201 210 217 226 213 220 155 99 102 122 123 169 175 179 173 175 172 188 179 197 188 194 193 203 189

Summing the data points one by one gives 9171, not 9172. The average is correct, but the minimum data point is 50, not 49, and the maximum data point is 226, not 228. Looking at the frametimes, you can see this is due to rounding: FRAPS actually picks the min and max from the underlying frametime data rather than from the (rounded) per-second data points.

Still, these are minimal discrepancies.

That said, wasn't one of Happy's data sets showing a minimum in the 50s, while you could clearly see a number in the 40s in the data?

Could a +/-1 difference in min, max and average really lead to a 6% margin of error? If we did the math manually, the averages might be even closer.
 

Lonyo

Lifer
Aug 10, 2002
Why not run FRAPS and a built-in timedemo at the same time and see if the results are equal?
 

GaiaHunter

Diamond Member
Jul 13, 2008
That said, wasn't one of Happy's data sets showing a minimum in the 50s, while you could clearly see a number in the 40s in the data?

Could a +/-1 difference in min, max and average really lead to a 6% margin of error? If we did the math manually, the averages might be even closer.

Man, in Happy Medium's run #3 of FC2 the max is 241 when the highest number you can see is 67!

I think the exact average is the one FRAPS computes from the frametime data (if you bench something with FRAPS, you will see it records how long each frame takes to render), which FRAPS then inverts from frametime to fps.

Something is really wrong with Happy Medium's data.
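The frametime-based calculation described above can be sketched as follows. This is only an illustration under the assumption that FRAPS derives its figures from per-frame render times; its exact internal method is not established in this thread. The function names are hypothetical and the frametimes are made up: one second of 10 ms frames containing a single 100 ms hitch, followed by one second of 20 ms frames.

```python
import math


def summarize_frametimes(frametimes_ms):
    """Return (average fps, slowest single-frame fps, fastest single-frame fps)."""
    total_ms = sum(frametimes_ms)
    avg_fps = len(frametimes_ms) * 1000.0 / total_ms
    # A single frame's instantaneous rate is the reciprocal of its frametime
    instantaneous = [1000.0 / t for t in frametimes_ms]
    return avg_fps, min(instantaneous), max(instantaneous)


def per_second_counts(frametimes_ms):
    """Count frames by the one-second window in which each frame finished."""
    counts = []
    elapsed = 0.0
    for t in frametimes_ms:
        elapsed += t
        # A frame ending exactly on a boundary (e.g. 1000 ms) counts toward the earlier second
        second = math.ceil(elapsed / 1000.0) - 1
        while len(counts) <= second:
            counts.append(0)
        counts[second] += 1
    # Drop a trailing partial second so every entry covers a full 1000 ms
    return counts[: int(elapsed // 1000)]


frametimes = [10.0] * 90 + [100.0] + [20.0] * 50  # hypothetical capture
avg, slowest, fastest = summarize_frametimes(frametimes)
print(per_second_counts(frametimes))  # [91, 50]
print(round(avg, 1), slowest, fastest)  # 70.5 10.0 100.0
```

Here the lowest per-second count is 50, yet the single slowest frame corresponds to 10 fps, so a tool that takes its min/max from frametimes can report values that never appear in the per-second column.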

Why not run FRAPS and a built-in timedemo at the same time and see if the results are equal?

From the [H] link you posted earlier, Lonyo:

http://www.hardocp.com/article/2008/02/11/benchmarking_benchmarks/4

The “Real Time Timedemo FRAPS” data you see is gleaned from running the canned GPU timedemo in real time, and recording the framerate with FRAPS.

The traditional canned bench gave 51.7 fps for the 8800GTX, while running the timedemo in real time with FRAPS gave a 38.8 fps average.
 

railven

Diamond Member
Mar 25, 2010
Man, in Happy Medium's run #3 of FC2 the max is 241 when the highest number you can see is 67!

I think the exact average is the one FRAPS computes from the frametime data (if you bench something with FRAPS, you will see it records how long each frame takes to render), which FRAPS then inverts from frametime to fps.

Something is really wrong with Happy Medium's data.

I'd rather give a person the benefit of the doubt before assuming the fault is theirs. But those numbers are really skewed, and his stance on these tests was set from about post #2 in the thread.

I'm starting to drink my Friday away so...might be my last post :D.

I'd love to see someone post FRAPS numbers from a canned demo by the time I eventually wake up! haha.

EDIT:

Saw your edit. Damn, that is a steep difference, so I ask:
Do you think FRAPS's memory allocation and writing to the HDD/RAM/whichever is affecting its own count?

Unless the drivers aren't screwing with the rendering but directly with how the application counts frames.
 

busydude

Diamond Member
Feb 5, 2010
Why not run FRAPS and a built-in timedemo at the same time and see if the results are equal?

I have done what you said.

Game: DiRT 2
Resolution: 1680 x 1050
Video Card: ATI 5750
Settings: Ultra

Time Demo Results:

1st run: F: 4518 / avg: 55.1 / min: 46.9
2nd run: F: 4629 / avg: 56.0 / min: 46.6
3rd run: F: 4645 / avg: 55.1 / min: 47.7

FRAPS results:

1st run: F: 4093 / avg: 53.771 / min: 44
2nd run: F: 4108 / avg: 54.884 / min: 48
3rd run: F: 4035 / avg: 53.96 / min: 45

Legend: F = total frames, avg = average fps, min = minimum fps
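One way to put these numbers side by side is to compare the mean of the run averages and express the gap as a percentage of the timedemo result. The helper below is a hypothetical sketch. Total frame counts are left out on purpose: frames divided by average fps implies the FRAPS captures covered a shorter window (roughly 75 to 76 s versus 82 to 84 s for the timedemo), so the totals aren't directly comparable.

```python
def percent_gap(reference_avgs, other_avgs):
    """How far (in %) the mean of other_avgs falls below the mean of reference_avgs."""
    ref = sum(reference_avgs) / len(reference_avgs)
    other = sum(other_avgs) / len(other_avgs)
    return 100.0 * (ref - other) / ref


timedemo = [55.1, 56.0, 55.1]    # DiRT 2 built-in benchmark averages
fraps = [53.771, 54.884, 53.96]  # FRAPS averages captured during the same runs
print(f"{percent_gap(timedemo, fraps):.1f}%")  # 2.2%
```

For contrast, the [H] figures quoted earlier in the thread (51.7 fps canned versus 38.8 fps with the timedemo run in real time under FRAPS) give a gap of about 25% with the same formula.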
 

GaiaHunter

Diamond Member
Jul 13, 2008
EDIT:

Saw your edit. Damn, that is a steep difference, so I ask:
Do you think FRAPS's memory allocation and writing to the HDD/RAM/whichever is affecting its own count?

Unless the drivers aren't screwing with the rendering but directly with how the application counts frames.

Can't really answer you.

What I know is that GW has an in-game FPS indicator and FRAPS always seems to match it.

I never noticed any slowdown when benching games with FRAPS, but I'm not really a specialist in the matter.

I bet someone like BFG10K, apoppin or even Kyle (if he is around) would be a better source for that.

EDIT:

I have done what you said.

Game: DiRT 2
Resolution: 1680 x 1050
Video Card: ATI 5750
Settings: Ultra

Guess we can rule out FRAPS affecting performance.

It also fits the earlier remarks that the DiRT 2 canned bench is quite indicative of actual gameplay.
 

Keysplayr

Elite Member
Jan 16, 2003
Well, I guess it would be great if he would rerun those benches then. Figure out what went wrong, if in fact something did go wrong. We'll have to wait I guess.
 

BFG10K

Lifer
Aug 14, 2000
Built-in canned benchmarks aren’t the answer since we know both IHVs optimize for them.

Manual Fraps runs aren’t the answer either because they’re non-deterministic, so every benchmark run plays differently.

My philosophy is to make custom timedemos specifically designed to reflect realistic gameplay. Also these demos are never released to the public, so IHVs can’t optimize around them.
 

busydude

Diamond Member
Feb 5, 2010
My philosophy is to make custom timedemos specifically designed to reflect realistic gameplay. Also these demos are never released to the public, so IHVs can’t optimize around them.

That is about the perfect way to bench. Are there any games that can be benched in-game, in real-world situations? I regularly play Urban Terror and can benchmark with the in-game tool while playing, but I hit the FPS cap more than 95% of the time, which makes it useless.

I found DiRT 2's benchmarking tool is the best of the lot in modern games with different cars and different outcomes, albeit on the same track.
 

konakona

Diamond Member
May 6, 2004
Built-in canned benchmarks aren’t the answer since we know both IHVs optimize for them.

Manual Fraps runs aren’t the answer either because they’re non-deterministic, so every benchmark run plays differently.

My philosophy is to make custom timedemos specifically designed to reflect realistic gameplay. Also these demos are never released to the public, so IHVs can’t optimize around them.

If I am reading your post right, one problem would be that readers have to trust the reviewer entirely, since the demo itself is never revealed and gives no clue as to what it includes.

Then again, reviews are always subjective and you would need a fair bit of faith to take anything you read seriously anyway. Better than canned timedemos, yes I do agree.
 

tannat

Member
Jun 5, 2010
Built-in canned benchmarks aren’t the answer since we know both IHVs optimize for them.

Manual Fraps runs aren’t the answer either because they’re non-deterministic, so every benchmark run plays differently.

My philosophy is to make custom timedemos specifically designed to reflect realistic gameplay. Also these demos are never released to the public, so IHVs can’t optimize around them.


Hi,

But do you know that the repeated custom timedemo puts the same stress on the system as the recording run? Has that ever been investigated? Not only on the graphics card, but on memory, CPU and hard drive reads?

If a timedemo does not rerun all the calls and processes exactly as the recording run did, it will not represent real gameplay stress. And more importantly, it may be easy to optimize the drivers around custom timedemos in general. No one needs to know the specific timedemo for that.

Since canned benching seems to add some 15%-20% to the Fermi cards' performance relative to the HD5870 and HD5850, compared with real gameplay runs, these cards might be a good way to see where recorded timedemos position themselves on the scale: closer to gameplay or to canned benches?
 

HurleyBird

Platinum Member
Apr 22, 2003
My philosophy is to make custom timedemos specifically designed to reflect realistic gameplay. Also these demos are never released to the public, so IHVs can’t optimize around them.

The obvious downside there is that your results become difficult to reproduce. There may also be fundamental differences between what gets processed in a timedemo (even a custom one) and actual real-world gameplay, depending on how the timedemo is done (e.g. whether the game recomputes or merely replays the decisions the AI makes). It just goes to show that there is no such thing as a free lunch, and for the record I think your methodology is one of the better ones.
 

GaiaHunter

Diamond Member
Jul 13, 2008
Custom timedemos are what [H] uses, and probably what the reviewers behind the OP's link used too (you just have to look at the graphs presented).

How do you create your own timedemos, by the way, BFG? I've never tried doing that.

The obvious downside there is that your results become difficult to reproduce. There may also be fundamental differences between what gets processed in a timedemo (even a custom one) and actual real-world gameplay, depending on how the timedemo is done (e.g. whether the game recomputes or merely replays the decisions the AI makes). It just goes to show that there is no such thing as a free lunch, and for the record I think your methodology is one of the better ones.

Difficult to reproduce for the rest of us, yes.

On the other hand, any static timedemo can be optimized for.

If you read what Kyle said in his articles, you would see that he says the benchmark is only part of his "evaluation".

So, yeah, as you say, we need to trust either the reviewer or the canned demos.

Presenting and comparing both, alongside the reviewer's impressions, would be best.

Actually scrap that.

The best thing would be for all of us to have several machines at home to test with. :)
 

RussianSensation

Elite Member
Sep 5, 2003
So you are saying that you aren't an enthusiast if you don't have dual 480s or 5870s?

His comments basically translate to: "You aren't an enthusiast unless you can brag about spending x amount of $ on a product so that you can brag to everyone that you have the latest and greatest". In other words, you aren't a car enthusiast unless you have a Ferrari Enzo, a Veyron or a McLaren F1. Yet I am pretty sure a 5970 CF or GTX480 Tri-SLI setup is faster than GTX 480 SLI. Therefore, by his very own arbitrary definition, he isn't an enthusiast either.

Also, luv2increase, how can you state that heat is a non-issue? I have to turn off Distributed Computing projects on my 4890 during hot summer days because my room gets too hot and I hate A/C.
 

happy medium

Lifer
Jun 8, 2003
Haha, there you go. Looking at the third data set, there are more <=65s than >70s; just using that as a base, there is no way he got a 73.533 FPS average.

How misleading. There is no way he copied and pasted that wrong; he must have deliberately changed the values.

Hey I just copied and pasted. If need be I will run them again.
I think it will turn out the same way unless I can make the game do the same thing over and over again, which in most cases is not possible.

I pick action-packed spots to simulate the real game experience, not just running down a hallway. That's probably why it's at a lower frame rate.
 

GaiaHunter

Diamond Member
Jul 13, 2008
Hey I just copied and pasted. If need be I will run them again.
I think it will turn out the same way unless I can make the game do the same thing over and over again, which in most cases is not possible.

I pick action-packed spots to simulate the real game experience, not just running down a hallway. That's probably why it's at a lower frame rate.

The problem is the average is 73 and all the data points are under 70.

How do you explain that? Averaging numbers under 70 will never give averages over 70.

Likewise the max of that run is 241 (or something like that) and no data points are over 70.

Just look at the numbers you provided.
 

happy medium

Lifer
Jun 8, 2003
The problem is the average is 73 and all the data points are under 70.

How do you explain that? Averaging numbers under 70 will never give averages over 70.

Likewise the max of that run is 241 (or something like that) and no data points are over 70.

Just look at the numbers you provided.

I think I got the dates mixed up; there were a lot of benches in the folder, sorry.

Here are the 4 benches I have in the folder, I deleted the others so there is no confusion.

Frames, Time (ms), Min, Max, Avg
4412, 60000, 15, 241, 73.533

FPS 67 66 66 67 67 65 66 64 61 58 62 64 70 71 70 68 70 71 70 65 64 65 65 64 65 64 65 64 65 65 65 65 64 65 65 65 62 61 31 19 25 196 229 237 233 238 228 175 25 25 25 25 25 25 25 25 25 25 19 45


Frames, Time (ms), Min, Max, Avg
3397, 60000, 42, 87, 56.617

FPS 55 58 62 49 52 50 51 55 59 66 67 65 63 66 64 63 58 65 64 69 77 86 58 51 51 52 52 53 48 43 50 60 70 71 55 43 42 43 43 47 48 49 45 46 46 45 47 50 52 59 61 61 51 51 60 66 63 65 70 65

Frames, Time (ms), Min, Max, Avg
3557, 60000, 13, 89, 59.283

FPS 57 58 63 59 58 60 64 63 60 62 60 70 66 70 62 57 53 53 54 50 46 51 54 56 54 51 53 53 47 49 56 59 45 46 40 45 45 44 48 55 58 85 76 83 83 84 85 84 85 87 88 89 88 88 84 14 25 25 25 24

Frames, Time (ms), Min, Max, Avg
3235, 60000, 41, 68, 53.917

FPS 61 65 58 57 57 62 63 62 60 65 61 53 55 57 52 46 47 53 49 50 57 62 64 53 52 52 54 64 59 60 60 63 60 57 60 67 60 50 49 50 49 50 45 54 51 50 46 49 53 45 41 43 44 45 45 44 43 42 54 55
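The same recomputation can be run on the four Far Cry 2 logs above. A sketch, with the per-second values copied from this post: it sums each FPS row and compares the total with the Frames column, and it also reports the median, which is much less sensitive than the mean to a short burst of extreme seconds. The dictionary layout, run labels and function name are assumptions made for this example.

```python
from statistics import median

# Reported frames, min and max from each summary line, plus the per-second FPS row
FC2_RUNS = {
    "73.533 run": (4412, 15, 241, [
        67, 66, 66, 67, 67, 65, 66, 64, 61, 58, 62, 64, 70, 71, 70, 68, 70, 71, 70, 65,
        64, 65, 65, 64, 65, 64, 65, 64, 65, 65, 65, 65, 64, 65, 65, 65, 62, 61, 31, 19,
        25, 196, 229, 237, 233, 238, 228, 175, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 19, 45,
    ]),
    "56.617 run": (3397, 42, 87, [
        55, 58, 62, 49, 52, 50, 51, 55, 59, 66, 67, 65, 63, 66, 64, 63, 58, 65, 64, 69,
        77, 86, 58, 51, 51, 52, 52, 53, 48, 43, 50, 60, 70, 71, 55, 43, 42, 43, 43, 47,
        48, 49, 45, 46, 46, 45, 47, 50, 52, 59, 61, 61, 51, 51, 60, 66, 63, 65, 70, 65,
    ]),
    "59.283 run": (3557, 13, 89, [
        57, 58, 63, 59, 58, 60, 64, 63, 60, 62, 60, 70, 66, 70, 62, 57, 53, 53, 54, 50,
        46, 51, 54, 56, 54, 51, 53, 53, 47, 49, 56, 59, 45, 46, 40, 45, 45, 44, 48, 55,
        58, 85, 76, 83, 83, 84, 85, 84, 85, 87, 88, 89, 88, 88, 84, 14, 25, 25, 25, 24,
    ]),
    "53.917 run": (3235, 41, 68, [
        61, 65, 58, 57, 57, 62, 63, 62, 60, 65, 61, 53, 55, 57, 52, 46, 47, 53, 49, 50,
        57, 62, 64, 53, 52, 52, 54, 64, 59, 60, 60, 63, 60, 57, 60, 67, 60, 50, 49, 50,
        49, 50, 45, 54, 51, 50, 46, 49, 53, 45, 41, 43, 44, 45, 45, 44, 43, 42, 54, 55,
    ]),
}


def recheck(runs):
    """Pair each reported figure with the value recomputed from the per-second row."""
    report = {}
    for name, (frames, rep_min, rep_max, fps) in runs.items():
        report[name] = {
            "frames": (frames, sum(fps)),
            "min": (rep_min, min(fps)),
            "max": (rep_max, max(fps)),
            "mean": round(sum(fps) / len(fps), 3),
            # Much less affected than the mean by a short stretch of extreme seconds
            "median": median(fps),
        }
    return report


for name, result in recheck(FC2_RUNS).items():
    print(name, result)
```

Run this way, every summed total lands within one frame of the reported Frames value, so these reposted logs look internally consistent. The high average of the first run comes from a stretch of roughly 175 to 238 fps partway through the capture (the thread doesn't say what caused it), which is why its median (65) sits well below its mean (about 73.5).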