Let's have some fun

lopri

Elite Member
Jul 27, 2002
13,310
687
126
As BFG noted, this is already being discussed in the other thread on this subject.

lopri, if you would like I can merge this thread with the other one, or you can repost your results in detail at the end of that thread.

DerekWilson
Forum Administrator


---------


Grab your popcorn. :D

Mr. DumbDumb published the following on 2/11/08.

Benchmarking the Benchmarks

Originally posted by: Kyle Bennett
Canned Testing with Crysis

Crysis represents what is probably the most graphically challenging game on the market right now that is enjoying a fairly large install base. Luckily enough it comes with a built in GPU benchmark that pretty much anyone can run, and they do...a lot. Even large sites like Anandtech exclusively relied on this canned benchmark for testing Crysis in its recent ATI Radeon HD 3870 X2 review.

Crysis ships with a built-in GPU benchmark, unfortunately the game is still too stressful to run at the highest quality settings so we're left running at the "high" defaults with no AA and at only two resolutions.

I am not sure if this is supposed to point the reader to any type of real world expectations at all, and it leaves me a bit confused, but Anandtech does go on to say this as well:

The last driver drop ensured that the 3870 X2 was actually faster than any single NVIDIA card in our lineup. At 1920 x 1200, the X2 is around 18% faster than the 8800 GTS 512. You'd need a pair of these X2s or faster in order to actually run at smooth frame rates at these settings unfortunately. It looks like the perfect card for Crysis still doesn't exist.

Does that mean if I play Crysis at 1920x1200 with the settings Anandtech used, my game on the X2 should run 18% faster than the 8800 GTS 512? Again, I am a bit confused here as to what the value of the information is.

Using real world gameplay, we come to much different conclusions at HardOCP about settings we can utilize and still play Crysis comfortably.

Canning H Benchmarks

All of this has left a lot of people very confused. And rightly so. You have a multitude of sites telling you that a 3870 X2 is "faster" than an 8800 Ultra and HardOCP is telling you that the card does not perform up to 8800 GTX levels. The above is not to pick on Anandtech, but obviously it is the highest profile site to conduct "canned" testing like this and its editors have openly defended their methods.

We decided to use the same canned Crysis benchmark and see how it compared to our own real world gaming testing of Crysis and see if we could understand all the results a bit better.
Then he reported his findings. First up is the HD 3870 X2. According to him, running the built-in benchmark gave vastly superior scores compared to the scores he got using Fraps in real time:

Crysis built-in demo report: Min 21 / Max 47 / Avg 33
Measuring the same exact demo with Fraps: Min 26 / Max 71 / Avg 46

Originally posted by: Kyle Bennett
What you are looking at above is the built in "GPU" Crysis benchmark that you have seen so many people run and report numbers on. The settings used above are EXACTLY the settings that we used for real world in-game testing here on the 3870 X2. It is also the same exact hardware and driver setup. That said, this canned demo has to stand on its own since we cannot replicate the exact demo in real world gameplay (we're getting to that, be patient), but we can run the canned demo in REAL TIME and record the framerate with FRAPS.

The "Real Time Timedemo FRAPS" data you see is gleaned from running the canned GPU timedemo in real time, and recording the framerate with FRAPS. The "Traditional Timedemo Benchmark" results are as you might expect from running in timedemo mode where the recorded demo runs as fast as it can till completion then gives you your benchmark scores.

So to put it simply, one is the canned GPU demo run real time and the other is the demo run in timedemo benchmark mode.

Now what you will immediately notice is that the two sets of results using the Crysis canned GPU demo are not even close to the same. Simply running the timedemo as a traditional "timedemo benchmark" gives us a 38% increase in average framerate over running the canned demo at real time speed using the 3870 X2. Average framerate increased 38% going from a real time canned demo to a traditional "fast as it can draw it" timedemo benchmark. Same demo, same settings, same hardware, same driver.
Next up is the 8800 GTX. Once again, the same exact demo gives vastly superior numbers when run using Crysis' built-in benchmark.

Crysis built-in demo report: Min 25 / Max 54 / Avg 39
Measuring the same exact demo with Fraps: Min 32 / Max 72 / Avg 51

Strange, isn't it? It's getting more exciting.
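For anyone who wants to check the size of the gap, here is a minimal sketch that computes the relative difference between the two averages quoted above for each card. It only uses the rounded numbers as listed in this post, and the function name `percent_gap` is made up for illustration.

```python
def percent_gap(lower, higher):
    """Return how much larger `higher` is than `lower`, in percent."""
    return (higher - lower) / lower * 100.0


# Rounded average framerates exactly as quoted in this post (frames per second).
results = {
    "HD 3870 X2": (33, 46),
    "8800 GTX": (39, 51),
}

for card, (low_avg, high_avg) in results.items():
    gap = percent_gap(low_avg, high_avg)
    print(f"{card}: {gap:.1f}% gap between the two averages")
```

With rounded inputs this gives roughly 39% for the X2 and 31% for the GTX, which is close to, but not exactly, the 38% figure quoted from the article.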

Originally posted by: Kyle Bennett
The fact is that we cannot responsibly compare real world gameplay to the Crysis GPU canned timedemos since it is impossible for us to put together a real world FRAPS-monitored run through that mirrors the canned timedemo. We can however put together our own comparable real world Crysis run throughs and record them so we have custom demos. After we use FRAPS to record our real world gaming Crysis run through, we can then "timedemo benchmark" that real world run through for comparison. It is also worth saying that this is not easily done as Crysis presents a long list of obstacles when it comes to recording real world demos. After a week of trials and a lot of practice, we finally got to a place where we could pull repeatable real world gameplay run throughs. You only get one shot at recording a "good" real world run through to later use as a demo. If you screw it up, you have to start over.

Continued..
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
From post #1

So according to Mr. DumbDumb, we can only draw some weird conclusions or conspiracy theories, such as:

- AMD and NV are both cheating in this specific time demo built into Crysis, and every time the drivers detect 'Benchmark_GPU.exe' they automatically (or by whatever means) inject 30~40% into Crysis' script, in such a way that only Crytek's script is affected but somehow not Fraps. (sounds like some sort of virus.. haha)

- Fraps is somehow capable of properly measuring the impact of A.I. or physics in a recorded time demo (whose main purpose is testing the system's graphics performance), an impact that even the programmers who made this 'canned' benchmark couldn't predict.

- Or maybe, just maybe, the folks @H don't really have what it takes to do what they do, let alone call out real professionals? Would it be possible at all that these people are indeed not educated or informed enough even to do what they're doing? Maybe these folks know no more than I do, or even less? How could that be? They're running a huge site that generates lots of traffic and, well, lots of revenue.. That's impossible..

The above speculation was inspired by the original thread about Mr. DumbDumb's loud voice to the industry and the opinions of his followers, 'Concerned Geeks against Canned Peanuts'

http://forums.anandtech.com/me...erthread=y&STARTPAGE=5

Their theory sounded all reasonable and ethical, but the above results (see post #1) were not what I expected, so I decided to go ahead and see for myself. My own results are also reported at the end of the previous thread.

In short, using an E8400 @3.6GHz and 8800 GT SLI, I got the following result:

Crysis built-in time demo (1920x1200 / High Quality)

Crysis built-in demo report: Min 21 / Max 54 / Avg 34
Measuring the same exact demo with Fraps: Min 16 / Max 56 / Avg 34

Here are the details.

Originally posted by: lopri
  1. CPU: E8400 @3.60GHz
    Motherboard: 780i SLI
    Memory: 4 x 2GB DDR2-667 @800MHz/4-4-4-12
    GPU: 2 x 8800 GT (SLI)
    Monitor: Dell 2405FPW
    OS: Vista Home Premium 64-bit
Testing was conducted at the monitor's native resolution (1920x1200) so that I could compare my result with AnandTech's, found here. I didn't crop the screenshots, so they are all at 1920x1200.
  1. Crysis
    Built-in Benchmark 64-bit
    Default High Quality
    ForceWare 169.25 Default Control Panel Setting
Result with Fraps NOT running: 33.79

http://img514.imageshack.us/im...8800gtslinofraput0.jpg

Result with Fraps running: 33.94

http://img177.imageshack.us/im...8800gtslifrapsnmz2.jpg

Result with Fraps running and manually benchmarked using F11: 33.14 / Result reported by Fraps: 33.80

http://img177.imageshack.us/im...8800gtslifrapsrei7.jpg
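The runs above can be put side by side with a quick calculation. This is just a sketch using the four averages listed in this post, with the run without Fraps taken as the baseline; the helper name `percent_diff` is hypothetical.

```python
def percent_diff(value, baseline):
    """Signed difference of `value` from `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


baseline = 33.79  # built-in benchmark, Fraps NOT running

# Average framerates from the other runs listed above.
runs = {
    "built-in, Fraps running": 33.94,
    "built-in, Fraps running + F11": 33.14,
    "reported by Fraps (F11)": 33.80,
}

for label, fps in runs.items():
    print(f"{label}: {percent_diff(fps, baseline):+.2f}% vs. baseline")
```

Every run lands within about 2% of the baseline, nowhere near a 30~40% gap.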

As you can see, I got just about equal results by both methods. I encourage everyone to test this out and see for yourself. This came to me as a total shock, because all this while I had been thinking there would be at least a tiny contribution that this so-called 'Real-world Gameplay' thingy could make to video card reviews. It couldn't have turned out in a more bogus or stupider way, one that reveals the real intelligence (or complete lack of it) of these people who dare to influence the community and the industry.

OK. I won't sugar coat this any more. I will make it clear and summarize things for busy folks:

1. First, HardOCP's real-world gameplay is a complete myth. The way they measure the 'real world' is not by sitting down and playing the games for hundreds of hours. What they do is record their own time demo using Fraps and take the avg. min. max that it reports. (with graphs)

2. Somehow, they believe that's better than other 'canned' benchmarks. I have no idea how a Fraps-recorded clip would differ from any other recorded clip. I guess they can claim that they record a 'realer' part of the games.. :laugh: Do you really? Mr. DumbDumb?

3. Further, and most importantly, they don't freaking even know how to use Fraps! Or maybe they haven't learned how to type yet. Wait, that's not possible. All it takes is pressing F11 to benchmark using Fraps. I can't believe they P-U-B-L-I-S-H-E-D this article. I think this is so incredibly, absolutely, astonishingly moronic that it can completely kill Mr. DumbDumb's career as a journalist. I admire his courage but I wish he had verified his data with his peers before he published..

4. This all boils down to how erroneous and bias-prone their 'real world' gameplay can be. In the end, they're human and we humans make mistakes and we all have bias, don't we? Alas, trouble begins with us humans, not with the machines. This in turn boils down to how intelligent, how well informed, how skilled, and how ethical the human behind the machine is. And with your unprecedented-in-the-industry article, you just blew everything away, Mr. DumbDumb.
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Not at all. No one else attempted to debunk their data yet as far as I know.

Edit: Check the last page of the previous thread. Here
 

Aberforth

Golden Member
Oct 12, 2006
1,707
1
0
They are going to shut down Nvidia if this is true. Actually, it leads to acquiring more market share by unfair business practices, and the consequences are serious.
 

milesl

Member
Oct 11, 2004
103
0
0
All Mr. B is doing is playing a bit of the games until he finds an area that is stressful on the system and the fps drops. Every game has an area like this. No company that includes in-game benchmarks has ever included an area like this in the benchmark because it makes their game look like it runs like crap and is unoptimized.

All he does is go to the stressful area and walk around in circles to make the graph look like he is playing. The longer he stays looking at the stressful area, the lower the score is. The longer he stays away from that area, the higher the score is. He can skew the score any way he likes. HOCP real-world bench results = all up to the reviewer.
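To put numbers on that, here is a rough sketch of how the average for a run shifts with the time spent in the heavy area. The framerates (20 fps in the stressful area, 40 fps elsewhere) and the durations are invented for illustration, not taken from any review.

```python
def run_average(fps_heavy, fps_light, seconds_heavy, seconds_light):
    """Average FPS over a run = total frames rendered / total seconds."""
    total_frames = fps_heavy * seconds_heavy + fps_light * seconds_light
    return total_frames / (seconds_heavy + seconds_light)


# Hypothetical two-minute run: 20 fps in the stressful area, 40 fps elsewhere.
mostly_light = run_average(20, 40, seconds_heavy=30, seconds_light=90)
mostly_heavy = run_average(20, 40, seconds_heavy=90, seconds_light=30)

print(f"30 s in the heavy area: {mostly_light:.1f} fps average")
print(f"90 s in the heavy area: {mostly_heavy:.1f} fps average")
```

Same hardware, same settings, same level: 35 fps in one run and 25 fps in the other, purely from where the player chose to stand.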

He certainly is making some page hits and some more ad dollars outta this one, but he lost my interest. I deleted his site from my bookmarks and won't go there anymore. I recommend all of you do the same. This isn't about vid card benching, it's about creating drama and page hits.

The only place I would go looking for real-world info from H is in the H forums (of course these forums are just as good, and others are also). That place has a lot of fanboys, but it also has a lot of intelligent people who truly give their real-world experiences with hardware and are good troubleshooters.
 

Narse

Moderator, Computer Help
Moderator
Mar 14, 2000
3,826
1
81
lopri, I did the Fraps vs. in-game benchmark last night and got the same results as you. I could not get the difference that HOCP was getting either.
 

n7

Elite Member
Jan 4, 2004
21,281
4
81
Hah, very interesting.

This does not surprise me one bit either.
 

chizow

Diamond Member
Jun 26, 2001
9,537
2
0
Originally posted by: lopri
Then he reported his findings. First up is the HD 3870 X2. According to him, running the built-in benchmark gave vastly superior scores compared to the scores he got using Fraps in real time:

Crysis built-in demo report: Min 21 / Max 47 / Avg 33
Measuring the same exact demo with Fraps: Min 26 / Max 71 / Avg 46

Crysis built-in demo report: Min 25 / Max 54 / Avg 39
Measuring the same exact demo with Fraps: Min 32 / Max 72 / Avg 51

Strange, isn't it? It's getting more exciting.

Are these backwards? It seems the built-in canned demos are performing worse than when measured with FRAPS. Also, I didn't pore over that post on H, but I got the impression he was attempting to discredit the canned built-in demo by using a FRAPS bench of a real in-game test. The built-in score was much higher than the real-time FRAPS bench for both the X2 and GTX, but I suppose the GTX showed less of a performance hit, which backs up his claim. From what I saw the difference would be pretty obvious: he was taking his bench from the level "Relic", which is where the game overall slowed down considerably. But overall I agree there should be little to no difference between the canned built-in results whether measured with FRAPS or the in-game counter. Small differences in min/max FPS are due to sampling/time differences.
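A small sketch of that last point: the same list of frame times gives the same average but different min/max depending on how often the framerate is sampled. The frame times are made up, and the two sampling schemes (one sample per frame vs. one per full second) are assumptions for illustration. I don't know how Fraps or the Crysis benchmark actually sample internally.

```python
def per_frame_stats(frame_ms):
    """Min/max/avg FPS when every single frame counts as one sample."""
    fps_samples = [1000.0 / dt for dt in frame_ms]
    average = len(frame_ms) * 1000.0 / sum(frame_ms)
    return min(fps_samples), max(fps_samples), average


def per_second_stats(frame_ms):
    """Min/max/avg FPS when frames are counted once per one-second window.

    Simplification: a frame belongs to the window in which it finishes, and
    a trailing partial window is ignored for min/max.
    """
    counts = []
    elapsed = 0
    frames_in_window = 0
    for dt in frame_ms:
        elapsed += dt
        frames_in_window += 1
        if elapsed >= 1000:  # window is full: record it and start a new one
            counts.append(frames_in_window)
            elapsed -= 1000
            frames_in_window = 0
    average = len(frame_ms) * 1000.0 / sum(frame_ms)
    return min(counts), max(counts), average


# Made-up frame times in milliseconds: a fast burst, a slow stretch, then steady.
frames = [20] * 25 + [50] * 10 + [40] * 25

print("per-frame sampling :", per_frame_stats(frames))
print("per-second sampling:", per_second_stats(frames))
```

Here both methods report a 30 fps average, but per-frame sampling sees a 20 to 50 fps range while per-second sampling sees only 25 to 35.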
 

ionoxx

Senior member
Jan 18, 2005
267
0
0
Oh lopri, good job.

Now... let's have this on the front page of Anandtech, shall we!
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Originally posted by: ionoxx
Oh lopri, good job.

Now... let's have this on the front page of Anandtech, shall we!
Thank you. :) To be frank, I told a small lie in the 2nd post when I said:

Their theory sounded all reasonable and ethical, but the above results (see post #1) were not what I expected, so I decided to go ahead and see for myself. My own results are also reported at the end of the previous thread.

Well, the truth is that I didn't know what the hell I was looking at, because the results were all the same and within the margin of error, other than a slight (<1%) overhead while using the F11 method. I actually read HardOCP's article afterward and couldn't believe what it was claiming. I mean, they should at least know how to use Fraps.. and just to think that they've done this all those years.. causing drama in the industry, swaying people to make purchase decisions based on their erroneous data.. It was just too much not to bring it up.