[TECH Report] As the second turns: the web digests our game testing methods


VulgarDisplay

Diamond Member
Apr 3, 2009
erm... I didn't say 2 people. I don't even think a blind test is relevant. Too many variables are present.

So then frame time graphs are also completely irrelevant, because the conclusions about what the graphs show are based on the perception of one person running the tests?

Makes a lot of sense. :colbert:
 

f1sherman

Platinum Member
Apr 5, 2011
This is a neat graph:
[attached frame time graph: gw2.gif]

Now imagine how useful it would be if there were a line drawn across it at (just tossing a number out) 20 ms and anything above that line was a detectable stutter. In that case both manufacturers would have work to do. Why do we not want to know what the magic number is?

I agree. Just like with everything else in VC&G (FPS, AA, etc.). But for some reason I cannot grasp, there are posters here who prefer ignorance when they are offered knowledge.

Anyway, previously in this thread I posted some graphs together with my personal experience of them. Those cannot be taken as general guidelines, since we would need to test more people to see if they agree. But anyway, I reiterate:
  • t = 8.8 ± 1.5 ms -> No visible stuttering. Probably because even with the variation the frametimes are well below 16 ms. (High FPS saves the day.)
  • t = 16.7 ± 0.8 ms -> No visible stuttering. Probably because the variation is not large enough. (Too little microstutter.)
  • t = 15.9 ± 2.9 ms -> Acceptable microstuttering. A noticeable fraction of frames are rather slow due to the wider spread than in the previous case, but it is small enough not to be disturbing.
  • t = 18.6 ± 4.4 ms -> Unacceptable microstuttering. A large average frame time combined with a very large spread in frametimes gives a large fraction of frames with long frame times. At this level it gives a disturbing effect.
One reason I have not pushed this further is that the graphs involve different games, and different light levels, movement patterns, etc. can interfere with the conclusions. I would rather use BrightCandle's simulation, where conditions can be reproduced systematically, before taking this to the next level.
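For the record, numbers like these can be pulled straight from a frametime log. A minimal sketch (assuming a plain text file with one frame time in milliseconds per line; the file name is made up, and FRAPS itself logs cumulative timestamps, which you would difference first):

```python
# Minimal sketch: mean ± standard deviation of frame times from a log.
# Assumes one frame time in milliseconds per line; "frametimes.txt" is
# a made-up name.
import statistics

with open("frametimes.txt") as f:
    frametimes = [float(line) for line in f if line.strip()]

mean = statistics.mean(frametimes)     # average frame time (ms)
spread = statistics.stdev(frametimes)  # spread of frame times (ms)
print(f"t = {mean:.1f} ± {spread:.1f} ms")
```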

I suggest you do that, because the above two ideas are pretty hopeless ;)

VulgarDisplay is confusing stuttering
(which I would roughly define as a function of local frametime variation and FPS)
with frametimes, and you are confusing it with frametime deviation (across the whole 2-minute run).

At least in your attempt a correlation between the two quantities can be expected.
But please, no need for additional uncertainties and errors. You have enough on your plate as it is.

Carry on... ():)
 

Final8ty

Golden Member
Jun 13, 2007
Indeed. I'd be willing to bet most people on this forum would fail a proper double-blind ABX test between 50 FPS and 60 FPS. That doesn't mean that 60 FPS isn't better and smoother, or that the difference can't be felt on some level even by those who are unable to identify the 60 FPS sample in a statistically significant manner.

If the blind-testing crowd were logically consistent, they would swear off benchmarking entirely in favor of controlled double-blind ABX testing. Not likely, though.

I think that depends on the game.

When I was playing DOA on the Xbox you could set vsync for 60 fps or 50 fps. I put it on 50 fps one morning before work by mistake. I came home after work, fired up the Xbox, started playing, and noticed right away that it was not as smooth to me as it used to be. I went through the settings and realized what I had done.
 

f1sherman

Platinum Member
Apr 5, 2011
There was no burden on the reviewer to validate an FPS metric because it is a number that is entirely separate from the human element. Perceived smoothness requires a human observer, which is precisely why they need to blind-test a group of human observers to find the worst-case undetectable level of frame latency, so that anyone reading their review can understand the graph.

You are twisting the whole issue upside down.

TR did not measure perceived smoothness, which now needs to be independently blind-tested.
What they measured is FPS (frametimes), FPS variation, and the >50 ms frametime count.

The idea that FPS does not need to be examined, blind-tested, and evaluated for a threshold,
but FPS variation needs this threshold-finding effort, is somewhat weird.
That is, if you consider the fact that FPS itself is not a very sound indicator of the gameplay experience. We wouldn't be here if it were.

30 FPS can be playable, yet 80 FPS can be an utter mess.
That kind of threshold is the difference between a GTX 560 and a GTX 670. How freaking useful... :eek:
And how the hell you imagine finding the sweet spot for FPS (frametime) variation without considering and re-evaluating FPS is beyond me.
 

VulgarDisplay

Diamond Member
Apr 3, 2009
I suggest you do that, because the above two ideas are pretty hopeless ;)

VulgarDisplay is confusing stuttering
(which I would roughly define as a function of local FPS variation and FPS)
with FPS(!), and you are confusing it with FPS deviation (across the whole 2-minute run).

At least in your attempt a correlation between the two quantities can be expected.
But please, no need for additional uncertainties and errors. You have enough on your plate as it is.

Carry on... ():)

What are you even talking about?
 

VulgarDisplay

Diamond Member
Apr 3, 2009
You are twisting the whole issue upside down.

TR did not measure perceived smoothness, which now needs to be independently blind-tested.
What they measured is FPS (frametimes), FPS variation, and the >50 ms frametime count.

The idea that FPS does not need to be examined, blind-tested, and evaluated for a threshold,
but FPS variation needs this threshold-finding effort, is somewhat weird.
That is, if you consider the fact that FPS itself is not a very sound indicator of the gameplay experience. We wouldn't be here if it were.

30 FPS can be playable, yet 80 FPS can be an utter mess.
That kind of threshold is the difference between a GTX 560 and a GTX 670. How freaking useful... :eek:
And how the hell you imagine finding the sweet spot for FPS (frametime) variation without considering and re-evaluating FPS is beyond me.

Anyone looking at a specific FPS reading can easily say if that number is smooth to them. Any user can set a frame limiter to test their own limits.

You can't do that with frametime graphs.

Now explain to me what the purpose of a frametime graph is if it's not to measure perceived smoothness of motion? If that is not the purpose, then they can just completely discard it as a useless metric.

Maybe I need to clarify that the purpose of these proposed blind tests is not to find the number where a group of people can notice frame time variation, but to find the number where no one can notice it. With a large enough test group there would be those who are extremely sensitive to it, and when even those subjects can no longer sense frame time variations, then we have the number that Nvidia and AMD need to aim for. I will say it again that the purpose of these tests is not to absolve AMD of any shortcomings, but to give both manufacturers a goal for what their hardware needs to produce. It's obvious you have chosen a side in this argument and feel the need to reject others' opinions based on that. You do, however, need to realize that these tests would benefit everyone regardless of the brand of hardware they choose.
 

Final8ty

Golden Member
Jun 13, 2007
Anyone looking at a specific FPS reading can easily say if that number is smooth to them. Any user can set a frame limiter to test their own limits.

You can't do that with frametime graphs.

Now explain to me what the purpose of a frametime graph is if it's not to measure perceived smoothness of motion? If that is not the purpose, then they can just completely discard it as a useless metric.

+1
 

f1sherman

Platinum Member
Apr 5, 2011
Anyone looking at a specific FPS reading can easily say if that number is smooth to them. Any user can set a frame limiter to test their own limits.

You can't do that with frametime graphs.

WTH man. Do we really need to go back to basics??

For all intents and purposes frametimes and FPS are perfectly interchangeable terms, if you keep in mind the relation between the two:

FPS = 1000/frametime

In other words:
If you know one, you have perfect knowledge about the other one.
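A minimal illustration in code (frametime in milliseconds):

```python
# The FPS <-> frametime relation, with frametime in milliseconds.
def fps_from_frametime(frametime_ms: float) -> float:
    return 1000.0 / frametime_ms

def frametime_from_fps(fps: float) -> float:
    return 1000.0 / fps

print(fps_from_frametime(16.7))  # ~59.9 FPS
print(frametime_from_fps(60.0))  # ~16.7 ms
```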

...
 

VulgarDisplay

Diamond Member
Apr 3, 2009
WTH man. Do we really need to go back to basics??

For all intents and purposes frametimes and FPS are perfectly interchangeable terms, if you keep in mind the relation between the two:

FPS = 1000/frametime

In other words:
If you know one, you have perfect knowledge about the other one.

...

Then why is fps alone no longer good enough?

It's just 1000/frametime.
 

Final8ty

Golden Member
Jun 13, 2007
Then why is fps alone no longer good enough?

It's just 1000/frametime.

I would not waste any more of my time on him.
We all know full well that frame times can vary and still average out to the same overall FPS, even though, if things were perfect, each frame time would equal a specific interval based on the FPS.
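A quick made-up illustration of that point:

```python
# Two made-up runs: identical average FPS, very different pacing.
smooth = [16.7, 16.7, 16.7, 16.7]  # perfectly even frame times (ms)
jerky = [5.0, 28.4, 5.0, 28.4]     # same total time, uneven frame times (ms)

for name, run in (("smooth", smooth), ("jerky", jerky)):
    avg_fps = 1000.0 * len(run) / sum(run)  # frames per second of wall time
    print(name, round(avg_fps, 1))          # both print ~59.9
```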
 

f1sherman

Platinum Member
Apr 5, 2011
Then why is fps alone no longer good enough?

It's just 1000/frametime.

Previously, when we talked about FPS, we considered either the average FPS across the whole benchmark run, or the so-called current FPS (shown via FRAPS).

And yet even that was not the real current FPS, but an average across FRAPS's polling time (1 second).

But if you have frametimes, you have perfect knowledge about real FPS, and vice versa.
TR could have just as easily graphed FPS instead of frametimes... and the whole story would remain the same.

To answer your question:

why is fps alone no longer good enough

a) It was never good enough
b) It was never the real current FPS, but an averaged one (usually across 1 sec, as in FRAPS), or just the average FPS (across the whole benchmark run)
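To see the difference, here is a minimal sketch (made-up numbers) of how averaging over a ~1 second window hides a hitch that the real per-frame FPS exposes:

```python
# Made-up trace: ~1 second of steady 16.7 ms frames with one 100 ms hitch.
frametimes = [16.7] * 50 + [100.0] + [16.7] * 9

# Real current FPS, frame by frame: the hitch is obvious.
real_fps = [1000.0 / ft for ft in frametimes]
print(min(real_fps))  # 10.0 FPS for the hitched frame

# FRAPS-style averaging over the whole ~1 s window: the hitch nearly vanishes.
avg_fps = 1000.0 * len(frametimes) / sum(frametimes)
print(round(avg_fps, 1))  # ~55.3 FPS, which looks fine on a counter
```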
 

VulgarDisplay

Diamond Member
Apr 3, 2009
Previously, when we talked about FPS, we considered either the average FPS across the whole benchmark run, or the so-called current FPS (shown via FRAPS).

And yet even that was not the real current FPS, but an average across FRAPS's polling time (1 second).

But if you have frametimes, you have perfect knowledge about real FPS, and vice versa.
TR could have just as easily graphed FPS instead of frametimes... and the whole story would remain the same.

To answer your question:

why is fps alone no longer good enough

a) It was never good enough
b) It was never the real current FPS, but an averaged one (usually across 1 sec, as in FRAPS), or just the average FPS (across the whole benchmark run)

Then what is the purpose of a frametime graph if FPS is not good enough? What does it show us? You have already said that a frametime graph just shows us FPS, and then proceeded to say that FPS wasn't good enough.

I am well aware there is a relation between framerate and frametime, but you do not seem to be aware of what they are attempting to show us with these frametime graphs. The purpose of those graphs is to show how evenly the frames are being distributed, which is all about the perception of how smooth the output is. Which brings us to the missing and most important piece of the puzzle: at what point does a spike in frametime become a visible issue to even the most sensitive viewer? Without knowing what point that is, we just have a pretty graph that offers a lot of information without much to draw conclusions from. Looking at it and saying, "Well, this one is all spiky and this one isn't" doesn't cut it when trying to draw actual conclusions based upon fact.
 

railven

Diamond Member
Mar 25, 2010
Nobody is calling you a liar. If somehow you are offended by what I've suggested, then you want to have your feelings hurt, and that's not my fault. I am not trying to personally attack anyone.

Hurt my feelings? Really? Interesting that you think you could hurt my feelings. Haha. Actually rather amusing, but let's get to the point of my comment, since you didn't even take the time to answer.

When you were asked who you would blind test, your response was:

I would blind test the people who claim they see it first.

When asked why, your response was:

The whole point is to verify that it's visible during normal gameplay. Until it's verified it's not a fact.

What isn't a fact? That people saw it during gameplay? So, are you implying they are lying? The content of these two posts has absolutely nothing to do with a comparison of which is smoother. It is outright questioning those that saw it.

Clearly they are lying until you can verify it. By your own wording.

Perhaps you got hurt feelings, I dunno; I'm not even sure why you are questioning the validity of those who have seen it. Unless of course you are trying to claim that we don't see anything.

And because I'm bored:

You forgot one important aspect of your study: the blind testing.

What study? I gave the perspective of two individuals. We saw it. You claim we didn't until it can be verified - by whom, well I dunno since you're the one calling our experiences into question.

I've been involved with blind listening tests for audio equipment. It's really revealing how much difference people can perceive when they know which equipment they are listening to. When they then have to reproduce those perceptions and listening experiences without knowing whether they are listening to amplifier A or amplifier B, they can't. I'm not saying that with similar testing you wouldn't still be able to accurately assess which is "smoother". All I'm saying is that I've seen people who were absolutely convinced that amplifier A was far superior to amplifier B, using all kinds of superlatives to describe the immensely superior musicality, and then, when they didn't know which one they were listening to, couldn't identify one from the other with any more accuracy than if they flipped a coin. These people were not liars. These people were sincere that these differences existed.

Just in case there are "audiophiles" amongst us (I consider myself one): I'm not saying all amplifiers are equal or even sound alike. With a revealing enough speaker in a well-designed listening environment, differences can be heard and identified. But assuming two well-engineered amplifiers of basically similar design (not a 35-watt tube amp compared to a Krell reference amplifier driving a fatally low-impedance load), they sound very, very similar.

Interesting, you clearly point out the faults of blind testing yet continually push for blind testing.

So if a person perceives the stutter at 17 ms and a person doesn't at 34 ms, where are we? Or do we aim for the majority who cannot see it at 24 ms? Sucks for that guy with the 17 ms hawk eyes, cuz instead of pushing for no latency issues, a bunch of you are crowd-surfing for the status quo.
 

railven

Diamond Member
Mar 25, 2010
Then why is fps alone no longer good enough?

It's just 1000/frametime.

I got an honest question for you: have you seen the stutter?

I've seen it, now in two games (woof, but one...well it's SW:TOR and I actually think it is more related to the game engine/latency than anything.)

I get 85 FPS with v-sync off and a mountain of tearing in this particular spot that I've been testing. It stutters. It is visible (very.) I can't explain why it does it, but I can tell you right now - 85 FPS, should that be stuttering to you?

And no, it isn't like the game slows down to a crawl, it's just a hiccup. Well a few. 85 FPS? Sure doesn't feel like it.

I'll post a benchmark of my test and the average FPS is 85. The minimum will be in the 20s and the maximum somewhere in the 100s. You'll say "them good results."

Then I'll post a frametime timeline where it shows these odd divots into the 20s followed by spikes into the 100s. And you should say "that's kind of weird."
 

f1sherman

Platinum Member
Apr 5, 2011
The old FPS (averaged) is not good enough - hence the new approach initiated by TR.
The new one (real current FPS) is calculated from frametimes, so it's just as good.

Look at a FRAPS output file to get a better hang of it.
 

BrightCandle

Diamond Member
Mar 15, 2007
It's important to keep with frame time rather than FPS, because frames per second measured within the second based on frame times isn't really right; FPS is, after all, an average over 1 second. Reporting FPS per frame doesn't really make any sense, and it would confuse people in FRAPS: were they to log FPS in the trace instead of frame time, they would see the averages and not the individual frame times.

By all means convert them into FPS yourself if you want, but it's easier to just learn the timings and convert mentally.
 

BrightCandle

Diamond Member
Mar 15, 2007
It's worth mentioning we do have some broad-strokes information based on a misbehaving driver here. That gives a general impression of good versus bad in regard to microstutter.

Since January last year I have been collecting frame traces and people's impressions of the amount of microstutter involved. Given that, I do now have an impression of where the targets broadly need to be. It's not scientific enough for my liking, which is why I am working on software to do this, not least because it's hard to recreate the problem in a game.

We don't know to the millisecond yet what the threshold is, but we do know that sub-5 ms of microstutter doesn't seem detectable to most people and that >10 ms is unplayable to some. I would argue right now that a 5 ms maximum is definitely the first target. It might really need to be 3 ms or even 1 ms, but it's certainly not above 5 ms. That is a good place to draw the first bounding lines, and I would do so on the derivative of the frame time graph (take the difference again, which shows the amount the frame time differed between frames; the maximum you want is +2.5 varying to -2.5).

I have not done the same study on stutters as on microstutters, so I can't comment on those in the same way.
 

Imouto

Golden Member
Jul 6, 2011
You can't see a 10 ms deviation on a 60 Hz monitor. Nor 5 ms, for that matter. You can only see deviations in 16.6 ms steps.

I said this before, but rendering frames slower than 60 FPS and displaying them on a 60 Hz monitor will show a 33.3 ms frame at least once per second. If the frame rate goes below 30 FPS you will get 50 ms frames.

TR tested Sleeping Dogs at slightly above 30 FPS and set the threshold at 50 ms, when a 60 Hz monitor would be messing with the frame times you can actually see even more than what is shown in the graphs.

The only test that may actually be of any use is the one planned by PCPer, and it will be a hell of a lot of work to make it readable enough to get any conclusion.
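A minimal sketch of that quantization (assuming plain vsync and ignoring buffering subtleties):

```python
# With vsync on a 60 Hz monitor, a finished frame is held until the next
# refresh, so displayed durations come in multiples of ~16.7 ms.
import math

REFRESH_MS = 1000.0 / 60.0  # ~16.7 ms per refresh

def displayed_ms(render_ms: float) -> float:
    """Round a render time up to the next whole refresh interval."""
    return math.ceil(render_ms / REFRESH_MS) * REFRESH_MS

for ft in (10.0, 18.0, 25.0, 35.0):
    print(f"rendered in {ft} ms -> shown for {displayed_ms(ft):.1f} ms")
# 10 ms -> 16.7 ms; 18 and 25 ms -> 33.3 ms (the once-a-second 33.3 ms frame
# when running just under 60 FPS); 35 ms -> 50.0 ms (below 30 FPS).
```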
 

f1sherman

Platinum Member
Apr 5, 2011
@BrightCandle 3, 5, 10 ms... you are talking about the frametime standard deviation across the whole benchmark run?

Let's assume you come to the conclusion that 3 ms is the safe zone.
Then imagine a benchmark run where the game runs pretty smooth for 80% of the duration, but there is a 20% part where microstuttering goes ape.

This method will under-report the microstuttering, showing perhaps 2 ms across the whole run - therefore suggesting we are good.
Yet 20% of the time we are far from good.

If microstuttering is not either 100% absent or 100% present throughout your whole control run - you are already having to lower the threshold below what's needed.
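To make the 80/20 point concrete (made-up numbers):

```python
# Made-up run: steady frame times 80% of the time, alternating fast/slow
# frames for the remaining 20%. The whole-run stdev looks tame.
import statistics

run = [16.7] * 480 + [10.0, 23.4] * 60  # 600 frames total

print(round(statistics.stdev(run), 1))  # ~3.0 ms over the whole run

# A per-window stdev exposes the bad stretch.
window = 60
worst = max(statistics.stdev(run[i:i + window])
            for i in range(0, len(run), window))
print(round(worst, 1))  # ~6.8 ms inside the stuttering 20%
```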
Also, where is FPS in all this ;)

I am somewhat pessimistic about reaching general conclusions that are reliable and precise enough to be used later without the need to ponder over the FRAPS output (and even then :hmm:),

but HEY! I appreciate the effort and it's REALLY nice to see some numbers for a change.
 

Granseth

Senior member
May 6, 2009
You can't see a 10 ms deviation on a 60 Hz monitor. Nor 5 ms, for that matter. You can only see deviations in 16.6 ms steps.

(...)

I am also curious as to how you can see a stutter of 5 ms when the screen won't update for another 11.7 ms?
If so, can somebody please explain to me how?
Or am I misunderstanding, and a 5 ms stutter means a frametime of 21.7 ms?
 

Lepton87

Platinum Member
Jul 28, 2009
Knowledge? What is the definitive "answer" you are looking for in these blind tests? When the said tests are all done and gone, what is the final answer you wish to have? More importantly, what is the question you want answered? I've read the thread and yes I have seen the reasons given for the testing, but what will it mean after the tests?

1. If the majority cannot see or detect frame latency (hitching, roughness), then it isn't a concern for gamers.
2. If the majority can see or detect frame latency (hitching, roughness), then don't worry, it will be fixed soon with newer drivers.

Look, we know how this forum works and I'm anticipating why this test is so vehemently requested. Because in all actuality, it ISN'T scientific. No two persons' eyes, or brains, are alike. Some will see it, some will not, as is evident from most AMD users here. They can't see a thing. :D
Seriously though, TR, ABT, PCPER, all have the right idea. Scientifically reproducible results.
So we can't do science with human subjects?
At first I thought that this was the most idiotic and absurd idea I had ever heard. But then you really got me thinking; I saw a true understanding of the issue at hand in your post. You are truly my hero. I can't believe I wasted so much time reading crap such as
http://www.sciencedaily.com/news/mind_brain/
http://www.sciencedaily.com/news/living_well/
I can't believe I wasted so much of my life on neuroscience.
Thanks for the enlightenment. This will change my life forever.
 

BrightCandle

Diamond Member
Mar 15, 2007
You can't see a 10 ms deviation on a 60 Hz monitor. Nor 5 ms, for that matter. You can only see deviations in 16.6 ms steps.

Ummm, of course it is. Microstutter has nothing to do with vsync and when it happens to the screen. It's to do with the game-world "moment" being taken at uneven points. Games use the system time to determine how far the world has moved since the last frame, so they can update your view based on your inputs and the animations. It's that which we are measuring as unsmooth, not the frames delivered to the monitor, which is happily happening every 16 ms regardless of what is in them or whether they are the same one after the other.

If you don't make the frame in time then you are right, it gets delayed to 33 ms, and that in itself can cause stutter, but the game will continue on and render into the other buffer. There are a lot more buffers in the graphics pipelines today than we realise: they are often at the DX level into the GPU, then from the GPU out to the monitor, and often game engines add an additional one from the game world to a render thread. So it's perfectly possible to get even and smooth frames to the monitor from the GPU but have the game-world moments be taken at uneven timings. This is what appears to be happening with the 7950 in many of the instances we see. Some of it might actually be caused later in the pipeline; it's hard to tell with our frame time measure, as it just records time points at the Present call. But neither microstutter nor stuttering has to have any relationship to the buffering being used to control the monitor.

@BrightCandle 3, 5, 10 ms... you are talking about the frametime standard deviation across the whole benchmark run?

No, I am not. Why would I take a very granular trace and then average it all out?! What I am saying is you take the derivative of the frame time graph, draw lines at +2.5 and -2.5, and anything that swings above then below those two points is noticeable microstuttering. If you want to process the graph to determine how badly, then I suggest you take the absolute value of all the values, sort them, and graph that. In that case it's 2.5 that is the threshold, but it doesn't directly show that microstuttering is happening; the fact that one frame was +2.5 and the next is -2.5 is what matters, not the absolute number of +2.5 and -2.5 frame deviations. A string of -2.5's in a row is not stuttering or noticeably a problem at all.

I have tried quite a few signal-processing techniques and I haven't yet found anything better than this approach.
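In code, that approach looks something like this (a sketch with a made-up trace):

```python
# Sketch of the approach: take frame-to-frame deltas (the derivative of the
# frametime graph) and flag a swing above +2.5 followed by one below -2.5.
THRESHOLD_MS = 2.5

def microstutter_swings(frametimes):
    """Indices where the delta jumps past +2.5 ms and then past -2.5 ms."""
    deltas = [b - a for a, b in zip(frametimes, frametimes[1:])]
    return [i for i in range(len(deltas) - 1)
            if deltas[i] > THRESHOLD_MS and deltas[i + 1] < -THRESHOLD_MS]

trace = [16.7, 16.7, 16.7, 22.0, 16.0, 22.0, 16.0, 16.7]  # made-up trace (ms)
print(microstutter_swings(trace))  # [2, 4]: the 16.7 -> 22 -> 16 swings

# The "how bad is it" processing: absolute the deltas, sort, and graph.
print(sorted((abs(b - a) for a, b in zip(trace, trace[1:])), reverse=True))
```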
 

Imouto

Golden Member
Jul 6, 2011
Stop there.

Now you're not just saying that you have super-vision and can tell apart differences at the 6 ms level.

You're saying that you can tell apart differences in frame times that your monitor can't even show.

I told you about the emperor's clothes back on page 7, but it looks like we didn't move an inch in 23 pages.

Talk about brand suggestion.
 

Keysplayr

Elite Member
Jan 16, 2003
Probably because for the last 23 pages, people have been trying to think up new ways to discredit, delay, or destroy anything to do with the TR testing. Even you promised to go back to lurking, and here you are. The latest attempt is to say that it's impossible to see because the ms latency is too fast for the monitor's refresh to show.
Did you not see the comparison video? :D