HardOCP says AnandTech has bad methods


lopri

Elite Member
Jul 27, 2002
13,209
594
126
Typical Kyle style. Why bother even talking about him? He's an attention wh*** and money hungry. I'd say I know more about video cards than he does. It's easy to target the biggest English-language tech website to raise the tension and page views. I just feel bad for the really knowledgeable folks who 'work for' him, like Dan Morris and even Brent Justice. It'd be hard to work when every time you write an article Mr. DumbDumb wants to 'edit'. It'd be very frustrating.

I even wonder whether the OP of this thread is Mr. Bennett's alias. (Joined 2/11/08) Is your site losing page views? Are you itching for a spotlight that you feel is not bright enough?
 

MagnusTheBrewer

IN MEMORIAM
Jun 19, 2004
24,135
1,594
126
Meh. Another tempest in a teapot. After reading replies on both sites, I have seen no reason to declare either side a winner. I will continue to read reviews and evaluations on both sites as well as many others.

I do give Kyle the lead in being a bigger fanboi of himself but the rhetoric in both camps just makes me smile until I realize how much time I wasted reading everyone's replies.
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: munim
I think their reviews are perhaps less scientific, but a hell of a lot more relevant and I'll take that over whatever anyone else does. I mean, take that Crysis example where frame rates increased a hell of a lot in the fly-by but only a little in the real game. Is this a moot point to you guys? This bears no relevance at all?

Also, if you read the article carefully and in full, you'll notice that he says that he doesn't use the graphs he provides as the basis for his conclusion. He claims to play the game in its entirety (this may be a point of contention, but I don't think he's a liar) and adjusts the settings until he gets the best image quality coupled with playability. He then picks out a particularly stressful scenario from the game and displays the FPS to support his conclusion. Perhaps this isn't repeatable by you or me, but I'll take it over the canned stuff, I suppose, because the canned stuff may lead to "cheating" like the 60% increase in FPS with AMD's hotfix.

yes Crysis is one BAD example
.. one out of hundreds, he picks a brand new one

i don't believe the GPU/CPU Crysis benchmarks are anything but dev tools 'modified' into something like a tech demo

notice it especially reports *where* the 'bad frame' is ... that would be a 'tool' for the dev to optimize the game

IF kyle is going to make a unique Benchmark Demo, he can post it for us ALL to evaluate .. i can load up his save and attempt to do the exact same thing at his settings. All these "canned benchmarks" are recorded so everything is as identical and repeatable as possible. Kyle won't let us see "his" special benchmarks like i can do with the other tech sites. For all i know, he plays through the game just to see the particular section where AMD cards "struggle" and where nvidia's "shines"
:roll:

he is asking us to TRUST him ... since he is OUT of LINE with *everyone else* [except for other fan sites] .. i DO NOT ... i require *proof*
He is the one making the outrageous claims. He needs to supply the documentation. Then i will believe ... and apologize.

do you hear that mr kyleB?

 

CP5670

Diamond Member
Jun 24, 2004
5,510
588
126
The thing I like about HardOCP is the maximum playable settings for each card, it gives you a more realistic image of what to expect with the card but AT's way of benchmarking is a better way of testing relative performance.

This is another problem with HardOCP's benchmarks. "Playable" can mean just about anything, depending on the player's point of view. I don't consider many of the settings and framerates HardOCP gives to be anywhere near playable, and there are others who would gladly settle with much lower performance than what their scores show.

Granted, there are problems with using cutscenes or official benchmarks too (like the issue with the AMD drivers in Crysis), but I think they're still better than the HardOCP style of testing, which is extremely subjective and has no more relevance to real world situations than a canned benchmark. The best alternative would probably be to have a prerecorded benchmark, but one that is built in-house by the reviewers and is not accessible to the GPU companies. This could perhaps be done using a sequence of macro commands or something, and sticking to maps or areas where randomness from AI behavior and so on wouldn't be an issue.

As for this latest tirade by Kyle, is anyone really surprised? He has quite a history of doing this sort of thing. :p
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: CP5670
The thing I like about HardOCP is the maximum playable settings for each card, it gives you a more realistic image of what to expect with the card but AT's way of benchmarking is a better way of testing relative performance.

This is another problem with HardOCP's benchmarks. "Playable" can mean just about anything, depending on the player's point of view. I don't consider many of the settings and framerates HardOCP gives to be anywhere near playable, and there are others who would gladly settle with much lower performance than what their scores show.

Granted, there are problems with using cutscenes or official benchmarks too (like the issue with the AMD drivers in Crysis), but I think they're still better than the HardOCP style of testing, which is extremely subjective and has no more relevance to real world situations than a canned benchmark. The best alternative would probably be to have a prerecorded benchmark, but one that is built in-house by the reviewers and is not accessible to the GPU companies. This could perhaps be done using a sequence of macro commands or something, and sticking to maps or areas where randomness from AI behavior and so on wouldn't be an issue.

As for this latest tirade by Kyle, is anyone really surprised? He has quite a history of doing this sort of thing. :p

we already do that ... it is SO easy to make a custom time demo; Keys made a good one for STALKER - the USUAL thing is to POST it for everyone to replicate and for peer review. Kyle's is HIDDEN ... no one can review his benchmarks because we don't know which section of the game he chose.
:roll:

unethical is a word that springs into my mind

... and there is NO way to keep recorded timedemos from "falling into the hands" of the GPU companies ... but it is SILLY for them to optimize for each benchmark - they really need to optimize the game ... and generally they do just that

 

WT

Diamond Member
Sep 21, 2000
4,818
59
91
Well ... I felt bad for Kyle, so I went and took the initiative .. I bought him an Ultra 750w PSU and had it sent to use on their testbench rigs from here on out ... it was a highly regarded model that I hoped to use in my next build, but I felt it was better served powering up their test rigs to give us all accurate benchmarks.

No need to thank me ... these units were a Hot Deal that I found. Maybe Kyle will thank me too !
;)
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Aug 22, 2001
28,453
20,465
146
Originally posted by: KristopherKubicki


Kyle's benchmarking method can justify whatever outcome he'd like. "Feel good" benchmarks are the easiest in the world since they can never be wrong. KillerNIC anyone?
Agreed.

However, the article does revisit an old issue that has been plaguing us for years i.e. benchmark specific optimizations. Perhaps the best methodology isn't Bennett's or Anand's, but a combination of both, a sort of compare and contrast, like the article. How else do you keep the IHVs honest? If just a "canned" benchmark for a game is run, how do you determine if benchmark specific optimizations are present? Particularly if both are doing it?

I don't take exception to what he wrote, just how he wrote it. While I am not going to rely solely on his very subjective, and arguably biased, opinion of a game play experience, having the canned benchmarks themselves, constantly subjected to close scrutiny, seems to me a good thing.

It is just a damned shame he concludes that his method is the better one, and fires a shot across the proverbial bow, instead of vowing to benchmark the benchmarks each and every time, for the benefit of the consumer. Even then, there is doubtless room for error, but it should consistently turn up a very blatant opt, right?
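
To make the "compare and contrast" idea concrete, here is a minimal Python sketch. It assumes you already have average fps for a canned benchmark and for a real gameplay segment, each measured before and after a driver change. The function names, the thresholds (`ratio`, `min_gain`), and the sample numbers are all made up for illustration; this is not any site's actual methodology.

```python
def optimization_gap(canned_old, canned_new, game_old, game_new):
    """Return (canned_gain, game_gain) as fractional fps increases.

    All four inputs are average fps values; the names are hypothetical.
    """
    if canned_old <= 0 or game_old <= 0:
        raise ValueError("baseline fps must be positive")
    canned_gain = (canned_new - canned_old) / canned_old
    game_gain = (game_new - game_old) / game_old
    return canned_gain, game_gain


def looks_benchmark_specific(canned_gain, game_gain, ratio=2.0, min_gain=0.10):
    """Flag a change whose canned gain is large and far exceeds the gameplay gain.

    ratio and min_gain are arbitrary illustrative thresholds, not established values.
    """
    if canned_gain < min_gain:
        return False  # too small a change to say anything
    return canned_gain > ratio * max(game_gain, 0.0)


# Invented numbers: canned run goes 30 -> 48 fps, gameplay goes 30 -> 31.5 fps.
canned_gain, game_gain = optimization_gap(30.0, 48.0, 30.0, 31.5)
print(looks_benchmark_specific(canned_gain, game_gain))
```

A flag from a check like this would only be a prompt for closer inspection, since gameplay runs vary from one run to the next and a single pair of numbers proves little.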
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
i believe this is gonna backfire on HardOCP - big time ... they may well get debunked once and for all
-it could work very well for AT and our forum

... i never thought a site owner would actually take notice of my posts and then attempt to defend by attacking credible sites and established benchmarking.
-he IS nervous

come on KyleB ... let loose with your 'saves' and let us review them.
-if you have nothing to hide then you will be vindicated

Maybe he doesn't have enough bandwidth left on his site to post the videos of the benchmarks he chooses.* :p

:D

*in many RPG and online games this is known as a "taunt" .. a challenge to really show what you got ... enough "talk"
:lips:
 

UltimaBoB

Member
Jul 20, 2006
154
0
0
Obviously, objective benchmarking trumps subjective benchmarking *if it is possible*.

But if the tools don't exist to objectively benchmark actual gameplay, and if that which can be scientifically benched is being manipulated, then inherently subjective benchmarks - done as "objectively" as you can - may be the best you can hope for.

 

Rusin

Senior member
Jun 25, 2007
573
0
0
Interesting that everyone keeps assuming that it's just OCP which is using that method and getting those results. There have been other examples, but everyone seems to ignore them


----
I just tested Unreal Tournament 3 with a time demo (CTF-OmicronDawn, 12 bots) at 1650x1080 + 4xAA 16xAF and graphics settings at level 5.

Average fps was 73.08, and minimum fps was between 0 and 5 (the first two seconds are always hard for my system in this game). 95% of the time it was over 30 fps, and over 83% of the time it was over 60 fps.
Then I tested actual gameplay five times on that same map with the same settings, and I got 61 fps on average on my best run.

This happened even though I:
-Closed a few programs that were running during the time demo test
-Excluded that first-two-seconds problem from the results
-Deliberately tried to avoid the most graphically hectic moments.. a clear difference from the time demo.

Then I ventured to test similar settings and there was a 20 fps difference between actual gameplay and the time demo.
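
For anyone who wants to produce numbers like these from their own runs, here is a rough Python sketch of how average fps, minimum fps, and "percent of time over X fps" could be computed from a list of per-frame times. The function name, the input format (frame times in milliseconds), and the warm-up skip are assumptions for illustration; this is not the tool used for the test above.

```python
def fps_summary(frame_times_ms, thresholds=(30, 60), skip_seconds=0.0):
    """Summarize a run given per-frame render times in milliseconds.

    skip_seconds drops frames that start before that much time has elapsed
    (for example, to exclude a slow first two seconds).
    """
    kept = []
    elapsed_ms = 0.0
    for ft in frame_times_ms:
        if ft <= 0:
            raise ValueError("frame times must be positive")
        if elapsed_ms >= skip_seconds * 1000.0:
            kept.append(ft)
        elapsed_ms += ft
    if not kept:
        raise ValueError("no frames left after skipping warm-up")

    total_ms = sum(kept)
    summary = {
        "avg_fps": 1000.0 * len(kept) / total_ms,  # frames divided by seconds
        "min_fps": 1000.0 / max(kept),             # slowest single frame
    }
    for t in thresholds:
        # Time-weighted: share of wall-clock time spent in frames faster than t fps.
        fast_ms = sum(ft for ft in kept if 1000.0 / ft > t)
        summary[f"pct_time_over_{t}"] = 100.0 * fast_ms / total_ms
    return summary


# Invented data: 50 frames at 10 ms each, then 10 frames at 50 ms each.
print(fps_summary([10.0] * 50 + [50.0] * 10))
```

The percentages here are weighted by time rather than by frame count, so a handful of very slow frames counts for more than it would in a simple per-frame tally. Which of the two a given benchmarking tool reports is something to check before comparing numbers.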
 

madh83

Member
Jan 14, 2007
149
0
0
Anyone who conducts academic research will tell you that the goal is usually to control all variables, making them as equal as possible except for the one you're testing. Otherwise the results could be influenced by something that's irrelevant to the test. This is pretty much the same thing as objective testing, as it tries to isolate the one different part of a system, i.e. the graphics card. If your goal is to compare graphics cards, this is really the ideal way to do it. Almost all statisticians would agree; I didn't say all because there's bound to be an oddball in any population, as statistics often show =p.

However, I think the argument really boils down to what testing is supposed to accomplish. HardOCP with their testing style is saying that they care more about giving a rough idea of how a new card will function in your system. However, this makes a lot of assumptions about how everyone who buys the card will play, and what kind of system they have. It certainly has its merits in giving you an idea of performance, but as a measure of relative performance it is extremely rough and unscientific as has been pointed out.

Personally, I think H's testing style has its place, in that when buying a card it would probably be helpful to read both types of reviews. However, H's testing is much more prone to personal bias.
 

Rusin

Senior member
Jun 25, 2007
573
0
0
Madh83:
Not entirely on your topic, but this came to my mind:
I'm studying mathematics at Helsinki University, and 3DMark, for example, is like most of the things we learn there: they don't have any real-life value, and we use them just because someone invented them. It's also like our courses there, which basically consist of two parts. First there are lectures (AnandTech) that go on and on about mathematical theories and theoretical situations. They don't usually tell how these theories apply in practice. Then after the lectures there is small-group teaching, which is all about how these things work in practice (OCP). The thing is that if you test a game many times, no single test affects the overall score enough to make a significant difference.

Well, I like Muropaketti's way more.. they try to find a setting that runs at 30 fps minimum on the card being reviewed and then test the other cards with that same setting. Sometimes they continue testing with the better cards to find that "30 fps" setting for them too.
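
As a rough illustration of that kind of approach (not Muropaketti's actual procedure), the idea could be sketched in Python like this: step through quality presets from highest to lowest on the reviewed card, keep the first one whose minimum fps reaches 30, then run the other cards at that same preset. `measure_min_fps` is a hypothetical callback standing in for a real benchmark run, and the preset names, card names, and numbers are invented.

```python
def pick_preset(presets, measure_min_fps, card, target=30.0):
    """Return the first preset (ordered highest quality first) meeting the target, or None."""
    for preset in presets:
        if measure_min_fps(card, preset) >= target:
            return preset
    return None


def compare_at_reference(presets, measure_min_fps, reviewed_card, other_cards, target=30.0):
    """Find the reviewed card's preset, then measure every card at that same preset."""
    preset = pick_preset(presets, measure_min_fps, reviewed_card, target)
    if preset is None:
        return None, {}
    results = {
        card: measure_min_fps(card, preset)
        for card in [reviewed_card, *other_cards]
    }
    return preset, results


# Invented minimum-fps table standing in for real benchmark runs.
FAKE_RESULTS = {
    ("card_a", "very_high"): 22.0, ("card_a", "high"): 34.0, ("card_a", "medium"): 51.0,
    ("card_b", "very_high"): 31.0, ("card_b", "high"): 45.0, ("card_b", "medium"): 66.0,
}


def fake_measure(card, preset):
    return FAKE_RESULTS[(card, preset)]


print(compare_at_reference(["very_high", "high", "medium"], fake_measure, "card_a", ["card_b"]))
```

Because every card is measured at the same preset, the comparison stays apples to apples, while the preset itself is still anchored to something playable on the card under review.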
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: Rusin
Interesting that everyone keeps assuming that it's just OCP which is using that method and getting those results. There have been other examples, but everyone seems to ignore them


----
I just tested Unreal Tournament 3 with a time demo (CTF-OmicronDawn, 12 bots) at 1650x1080 + 4xAA 16xAF and graphics settings at level 5.

Average fps was 73.08, and minimum fps was between 0 and 5 (the first two seconds are always hard for my system in this game). 95% of the time it was over 30 fps, and over 83% of the time it was over 60 fps.
Then I tested actual gameplay five times on that same map with the same settings, and I got 61 fps on average on my best run.

This happened even though I:
-Closed a few programs that were running during the time demo test
-Excluded that first-two-seconds problem from the results
-Deliberately tried to avoid the most graphically hectic moments.. a clear difference from the time demo.

Then I ventured to test similar settings and there was a 20 fps difference between actual gameplay and the time demo.

When Kyle benches you don't know ANY variables he uses

When AT benches you know it is a "clean" bare-bones rig set up from an image and all the games are played at optimum conditions for gaming

when i benchmark, it is with a "messy rig" like most of us have - suited also for all-purpose tasks ... i do not disable SuperFetch or shut down antiVirus nor do i disconnect from the net

what you did is *different* ... but i can ask YOU to give me your map, your save and a video of the run you did so i can REPLICATE it as closely as is humanly possible ... approximating what i can do with a "canned timedemo"

-I deliberately did try to avoid the most graphically hectic moments.. clear difference to that time demo.
and that will account for most of the difference :p

 

Quiksilver

Diamond Member
Jul 3, 2005
4,726
0
71
This is kind of why I wish developers would include a sort of timed 5 minute demo that is chock full of intense graphics, has set mouse movement (e.g. you can't mess it up by moving the cursor), is on a fixed path, and has plenty of physics and motion as well (but not so much or so little that it would be outside the realm of what happens during gameplay), not to mention that it changes accordingly as you increase/decrease your settings, so that you're not just watching a video. I guess it would be similar to Aquamark but would vary from game to game. At least then the only things that could vary on benchmarks are the settings you choose and the hardware.
 

L1Trauma

Member
Nov 29, 2000
57
0
0
Timedemo benching is as vulnerable to selection bias as any other research. If the sample is biased, then no matter how rigorous the trial is, the data is suspect. The H's point seems to be that the sample is not representative of the game. For those that overclock/bench, this is immaterial, as only the bench matters anyway. As for me, I read these articles to figure out what I can run at 1920x1080 (my monitor's native rez). When I read 1024x768 no AA bench charts I cringe.
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: Quiksilver
This is kind of why I wish developers would include a sort of timed 5 minute demo that is chock full of intense graphics, has set mouse movement (e.g. you can't mess it up by moving the cursor), is on a fixed path, and has plenty of physics and motion as well (but not so much or so little that it would be outside the realm of what happens during gameplay), not to mention that it changes accordingly as you increase/decrease your settings, so that you're not just watching a video. I guess it would be similar to Aquamark but would vary from game to game. At least then the only things that could vary on benchmarks are the settings you choose and the hardware.

anyone with the skills can do it for themselves .. there is nothing 'magic' about it
 
Feb 11, 2008
1
0
0
Came here through Slashdot-Hard-Anand and after reading some of the comments I just can't help myself. How about you, Apoppin, tell us whether the results they found are somehow faulty and don't represent the real game, or stop masturbating over those saves.

---

Good bye, TROLL.

Harvey
AnandTech Senior Moderator
 

Rusin

Senior member
Jun 25, 2007
573
0
0
Originally posted by: apoppin

When Kyle benches you don't know ANY variables he uses
When AT benches you know it is a "clean" bare-bones rig set up from a image and all the games are played at optimum conditions for gaming
One of my points is that Kyle ain't the only one getting these results with the same type of testing method.

-I deliberately did try to avoid the most graphically hectic moments.. clear difference to that time demo.
and that will account for most of the difference :p
What? Please explain? I mean, with the actual gameplay I didn't have much going on on screen, unlike in the time demo. Still, the actual gameplay lost. Are you really claiming that if there were more happening (more things to render) on screen it would have run faster?

Oh.. I also tested that (as mentioned in that post) and I was consistently getting 20 fps lower. Yes.. the difference actually got bigger, not smaller as you predicted.
---

I deliberately tried to make actual gameplay get better numbers, but I failed. I did it for the sake of my argument: I'm saying that these time demos run faster, so I gave actual gameplay better conditions to work with.. but the time demos were still faster.
---

Here is that test:
http://downloads.guru3d.com/download.php?det=1816

I heard that recent driver update would give Radeon users chance to use AA in Unreal3 based games..you could try that.
 

bluehaze013

Junior Member
Jan 29, 2008
16
0
0
Having just upgraded from an 8800GTS 512 to a 3870x2, I can say Anandtech's numbers are closer to the performance I get in game with Crysis. I'm not sure if this is due to testing methods or just coincidence. Most likely coincidence, as I never get numbers similar to any of these reviews, including H]'s in-game testing. They usually show much lower fps than I actually end up getting; however, in this case AT's review is pretty close to spot on.

I think the biggest problem with H]'s review method is that it caters to a very small audience: those running the exact same system they use to test. Any difference will yield different results, so your settings will be much different from H]'s unless you have the exact same hardware and clocks. Therefore the whole max playable settings concept does not seem very reliable to me, as it introduces far too many variables for little if any reason.

Case in point: H] tests the 3870x2 on a 2.9GHz dual-core processor, AT tests on a quad core, I am running a Q6600, and my results are similar to AT's. Now according to H], the CPU makes no difference in scaling with the GPU, but they are just plain wrong, as evidenced by all the reviews using quad cores versus all the reviews using dual cores. Driver Heaven used a quad core for their review and had much better results, and they use virtually the same testing methodology as H].

Here is a link to a site that actually tested the scaling of the 8800 Ultra vs the 3870x2 on different CPUs, and you can see the 3870x2 benefits far more from a quad core and higher CPU clocks than the 8800 Ultra does. That explains why H]'s review and conclusion are wrong for the vast majority of people who would purchase a high-end graphics card: anyone willing to spend $400+ on a graphics card will most likely be running a high-end processor to go with it, not a 2.9GHz dual core such as H] tested with.

http://www.pcgameshardware.de/...image_id=769656&page=1


 

Endgame124

Senior member
Feb 11, 2008
955
669
136
Hello All! I've been out of the hardware circle for quite a while now (the last big event I really followed was the Quake vs Quack timedemo cheating), and I don't quite understand the sides here. I half-heartedly keep up with things on Slashdot from time to time and saw the article, and thought I would post.

Back in the Quake vs Quack days, it seemed the video card companies were cheating the benchmarks used. A video card could get a drastically better score in the timedemo than in actual gameplay. Isn't that the same thing that's going on here? If timedemos were useless due to cheating 7 years ago, why are people still using them today?

An even better question: If we have 2 cards, A and B, and A is only good at timedemos, wouldn't B be the better card to buy? How would a review site know that B is the better card to buy for playing games unless they actually played the game and ignored the timedemos?

If possible, could someone lay it out in very small words for me? I thought the timedemo thing was gone long, long ago.

Thanks!
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Pot calls kettle black. LOL LOL. First off, AT's methods show consistency, even though time demos have their faults. Trying to show real-world results is subjective at best. Real world is easily skewed. Hard is all about ego. They are consistently trying to cause trouble with other sites.
 

nitromullet

Diamond Member
Jan 7, 2004
9,031
36
91
I really like AT forums a lot, and I like the fact that Anandtech.com gains access to a lot of hardware before other review sites do. However, I think a lot of people in this thread are being somewhat overly kind to AT's video benching methodology. Sure, I get that they want to use canned benchmarks to speed things up and make the benchmarks reproducible. What I don't get is that AT often benchmarks cards with no AA even when the fps on those benchmarks is over 100. Seeing two cards compete against each other at 1920x1200 with 0AA where the fps on the two cards is, say, 120 vs. 133 is meaningless to me. Both cards rock that game.

Another thing that AT never seems to touch on in benchmarking are special AA modes that both NVIDIA and ATI now have. AT may actually bench a game with what they list as "4xAA", but I have a sneaking suspicion that they aren't using NVIDIA's super sampling or ATI's comparable adaptive AA. Personally, when I game I always try to run with these special AA modes whenever possible, and I think most people that buy the high end cards they bench do the same.

I guess what it boils down to is that I don't really fault AT for using the time demo benchmark approach because it is quick and easily reproducible, but I do sort of feel that their benchmarking methods have become a little lowest common denominator. They really aren't pushing the cards hard enough with regards to IQ to separate the true top performers from just the good performers. Case in point would be the 8800GT vs. the 8800GTX. AT makes the GT look like a "$250 GTX" when it's pretty much been shown that this is not the case once higher levels of IQ are used.

...Right now, if I had to pick only one site to use as my information for a purchase, it would be neither AT nor HardOCP, but Computerbase.de - and I can't even read German well enough to read the articles!
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: Nemesis 1
Pot calls kettle black. LOL LOL. First off, AT's methods show consistency, even though time demos have their faults. Trying to show real-world results is subjective at best. Real world is easily skewed. Hard is all about ego. They are consistently trying to cause trouble with other sites.

Pretty much. It all comes down to "I'm right, you're wrong" B.S.

They both have a place in the world.

I like HardOCP's apples-to-apples tests that show the benchmark graphs fluctuating up and down. Everything else is just their opinion.
 

miniMUNCH

Diamond Member
Nov 16, 2000
4,159
0
0
I really like Anandtech's reviews and the forum community is great... generally kind, helpful, and very well informed...but I do not like the way graphics cards are benchmarked @ AT.

Canned benchies are basically playing cards with a loaded deck. I have a couple of friends in the game industry and have been told this point blank by my friends. e.g. "yeah... we helped nVIDIA make the timedemo run faster by cutting corners, etc."

I imagine that Anand, Derek, et al. probably have far more friends in the game industry so they must know this.

I generally ask my friends what card to get cause they design for and play on the best that nVIDIA and AMD have to offer... they give me their assessment of how well the cards actually run games in terms of performance and image quality.