Triple Buffering


DerekWilson (Platinum Member)
So ... here's the deal ...

A programmer can do whatever they want with the buffers they are given.

That said, it would be very unwise to do anything different than what TheSnowman suggests.

Also, if you look at it in this light, triple buffering is not useful without vsync ... there's no need to render to a third buffer when you can just flip every time your back buffer fills up.

If I were doing triple buffering, I would bounce back and forth between the two back buffers, constantly rendering the scene in its current state. When one frame was done rendering I'd start overwriting the other. As soon as we hit vblank, I'd swap in the most recently finished frame.
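As a rough sketch of that flip policy (plain Python rather than real graphics code, with made-up names, just to pin down the bookkeeping): the renderer bounces between two back buffers, always overwriting the stale one, and each vblank promotes whichever back buffer holds the most recently finished frame.

```python
# Sketch of the "always overwrite, flip the newest" policy described above.
# Real page flipping exchanges pointers rather than copying references,
# but the buffer bookkeeping is the same idea.

class TripleBuffering:
    def __init__(self):
        self.front = None           # frame the display is currently showing
        self.back = [None, None]    # two back buffers to render into
        self.draw = 0               # which back buffer the renderer writes next
        self.latest = None          # back buffer holding the newest finished frame

    def frame_finished(self, frame):
        """Renderer finished a frame: store it and keep rendering immediately."""
        self.back[self.draw] = frame
        self.latest = self.draw
        self.draw ^= 1              # bounce: overwrite the stale buffer next time

    def vblank(self):
        """At vblank, flip in the most recently finished frame, if any."""
        if self.latest is not None:
            self.front = self.back[self.latest]
        return self.front

# If frames 1, 2 and 3 all finish between two vblanks, only frame 3 is shown:
tb = TripleBuffering()
for f in (1, 2, 3):
    tb.frame_finished(f)
print(tb.vblank())   # -> 3
```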

This does NOT create a situation where movement is jerky like BFG suggests.

It isn't "dropping frames" in the sense that we think of for video playback -- these frames never needed to be displayed because they represented the state of the game at an earlier time than when the frame flip happened. Displaying only the most recently completed frame delivers the most current image that reflects the state of the game at a time nearest to the time it is seen by the user's eyes.

In other words, if we do what TheSnowman suggests, the front buffer (what the gamer sees) will always be the nearest reflection of the state of the game.

If we have a game that renders REALLY quickly (say infinity frames per second) here's what we get --

double-buffered + novsync = infinity FPS with no input lag and probably lots of tearing

double-buffered + vsync = 60fps with 1/60 second of input lag -- the next frame is finished rendering with the input state that existed as soon as the frame was flipped -- it must wait a full 1/60 second to display the updated frame. from a page flipping perspective this is the scenario that has the potential to cause the most input lag.

triple-buffered (vsync implied) = 60 fps with zero input lag

...

i'm sure a programmer could implement it in a different way. it might be that forcing triple-buffering from a driver causes problems with input lag depending on how the driver interprets what the game wants to do. it could be that some game designers were trying to use triple buffering to render complex multi-frame effects using both back buffers to render one frame.

Furthermore if you get to the state where both buffers are full and there's still no refresh cycle then a triple buffering system will stall just like a double buffered one.

this really would defeat the purpose of triple buffering ... it would be unwise to stall rendering and not overwrite the old buffer.
 

DerekWilson (Platinum Member)
just to add something ...

as above, if frames render much FASTER than 1/60 seconds, triple-buffering does provide an image more reflective of the current game state than double-buffering with vsync, and no tearing. triple-buffering is definitely the best option in games with very high framerate potential.

if it takes almost 1/60th of a second to render each frame, vsync or not, double-buffering or triple-buffering won't matter (except that vsync could reduce the chance of tearing). there really isn't much of a difference in this case, and you'd probably want to go with double-buffering with vsync in order to reduce the amount of graphics memory used while preventing tearing.

if frames take about 1/45th of a second (well, less than 60fps and directly between two even multiples gives the best illustration, but it applies to other situations as well) things get interesting.

double-buffering with vsync would always need to start drawing the next frame after a vblank. we swap buffers, and then it takes a cycle and a half to finish the render and we can't start drawing again until the swap. double-buffering with vsync must wait for the second cycle to finish -- giving us 30fps instead of the potential 45 we could get without vsync.

let's look at triple buffering if we start drawing a new frame right after a swap. It will still take a frame and a half to finish the frame, but we can start drawing the next frame right away. that means we wait two cycles to draw the first new frame, but the second frame is ready at the end of the next cycle because of the additional time the card was able to spend rendering. if we are looking at an ideal situation, we are actually getting the full 45fps we would expect from a no-vsync situation (we skip one frame for every 2 we render).

on top of the added performance, frames are less outdated with triple buffering.

that means that triple-buffering, in addition to providing higher frame rates, will put a less outdated image on the screen more often than double-buffering with vsync ... if we look at frame times, in the above situation, double-buffering with vsync would always draw images that are based on a game state 0.0333 seconds in the past. by contrast, triple-buffering offers up frames that are always only 0.0222 seconds old.

if all other things are equal (which they rarely are), triple-buffering has the potential in cases like this to also reduce input lag.
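A rough timeline simulation of this 1/45-second example (a sketch under idealized assumptions: frames take exactly 1/45 s, flips are free, and a frame's "age" is measured from the moment it started rendering) reproduces those numbers: 30 new frames per second for double buffering with vsync versus 45 for triple buffering, with newer frames on screen in the triple-buffered case.

```python
# Idealized page-flip timeline: 60 Hz display, frames that take 1/45 s to
# render, vsync on. Double buffering stalls the GPU while its finished frame
# waits for the flip; triple buffering keeps rendering and lets a newer
# finished frame supersede an older pending one. Fractions keep timing exact.
from fractions import Fraction as F

REFRESH = F(1, 60)   # one vsync every 1/60 s
RENDER = F(1, 45)    # one frame takes 1/45 s to draw
VSYNCS = 900         # simulate 15 seconds of refreshes

def simulate(triple):
    gpu_free = F(0)     # when the GPU may start another frame
    in_flight = None    # (finish_time, start_time) of the frame being drawn
    pending = None      # newest finished frame waiting for the flip
    shown = []          # (display_time, start_time) of each new frame shown
    for k in range(1, VSYNCS + 1):
        vsync = k * REFRESH
        while True:
            if in_flight is None:
                # Double buffering can't start a frame while a finished one
                # still occupies the back buffer; triple buffering always can.
                if gpu_free < vsync and (triple or pending is None):
                    in_flight = (gpu_free + RENDER, gpu_free)
                else:
                    break
            if in_flight[0] <= vsync:
                pending = in_flight       # a newer finished frame supersedes
                gpu_free = in_flight[0]   # any older pending one (the "drop")
                in_flight = None
            else:
                break
        if pending is not None:
            shown.append((vsync, pending[1]))
            pending = None
            if not triple:
                gpu_free = vsync          # back buffer only frees at the flip
    ages = [d - s for d, s in shown]
    return len(shown) / float(VSYNCS * REFRESH), float(sum(ages) / len(ages))

for triple in (False, True):
    fps, age = simulate(triple)
    print(f"{'triple' if triple else 'double'} buffering + vsync: "
          f"{fps:.0f} new frames/s, mean frame age {age * 1000:.1f} ms")
```

In this run double buffering shows every frame 33.3 ms after it started rendering, while triple buffering averages about 27.8 ms; the age of each triple-buffered frame varies with where it lands in the refresh cycle, so the idealized 0.0222-second figure is its best case.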

but there really are other factors at work and it's never that simple. depending on how the game developer implemented triple-buffering or how the driver handles things or how game state is updated or how input is processed ... i just wouldn't blame input lag ON triple-buffering specifically. If it does cause input lag in the game you're playing, I'd be more inclined to blame the developer or how the driver+hardware handle a triple-buffer situation.


It is also important to separate the idea of double and triple-buffering from render ahead.

render ahead is different from the type of buffering used. in dx, you can render ahead 0, 1, 2, or 3 frames using either double or triple buffering ...
 

BFG10K (Lifer)
I'm talking about dropping [Y] after [Z] has been completed in the other backbuffer to make room for rendering the frame after [Z]
But again this would be dropping frames which would cause jerky input.

Here is the DX triple buffering guide:

http://msdn2.microsoft.com/en-us/library/ms796537.aspx

Note the quote there:

It is preferable to have at least three flippable surfaces (some games use five or more).

If triple buffering works like you say in that it always overwrites frames and never stalls, what possible reason would a game have to use five or more flippable buffers?

Furthermore, you're KyleB on the B3D forums, aren't you? Why then, in threads like this, do people talk about input lag with not even a peep out of you to contest them?

these frames never needed to be displayed because they represented the state of the game at an earlier time than when the frame flip happened.
Uh, what? Of course the frames need to be displayed, given you've provided input for the game and the frames contain an interpolation of the game's tick based on that input.

Let me put it this way: if I implemented a game that accepts input for 5 frames, drops the first 4 frames and only displays the latest one, would that be smooth?

Or how about for every 10 frames I drop the first 9 and only display the last one? Do you think the input response would be fine if I did that?

With 3 buffers dropping frames, that's up to 1/3 of the frames that should have been displayed but aren't.

These are legitimate frames that contain legitimate input and legitimate game engine states.

this really would defeat the purpose of triple buffering ... it would be unwise to stall rendering and not overwrite the old buffer.
Then again what possible reason would a game have to use more than three buffers, as quoted above from MSDN?

it might be that forcing triple-buffering from a driver causes problems with input lag depending on how the driver interprets what the game wants to do.
The issue of input lag with triple buffering is almost universal and you'll see it posted time and time again, especially by people playing competitive FPS games.

If I force triple buffering + vsync through the driver there is noticeable input lag in just about every game I try while the lag is slightly less with just vsync.

I suppose it's possible that application vsync + application triple buffering could be different but honestly why bother? Dropped frames could be just as bad as input lag plus most games don't even have both settings anyway.
 

DerekWilson (Platinum Member)
first ....

let's look at triple buffering if we start drawing a new frame right after a swap. It will still take a frame and a half to finish the frame, but we can start drawing the next frame right away. that means we wait two cycles to draw the first new frame, but the second frame is ready at the end of the next cycle because of the additional time the card was able to spend rendering. if we are looking at an ideal situation, we are actually getting the full 45fps we would expect from a no-vsync situation (we skip one frame for every 2 we render).

on top of the added performance, frames are less outdated with triple buffering.

that means that triple-buffering, in addition to providing higher frame rates, will put a less outdated image on the screen more often than double-buffering with vsync ... if we look at frame times, in the above situation, double-buffering with vsync would always draw images that are based on a game state 0.0333 seconds in the past. by contrast, triple-buffering offers up frames that are always only 0.0222 seconds old.

even if you do triple buffering as that msdn page seems to indicate you should do it, you would still end up with the same result I have outlined here. this example has no "dropped" frames, provides images closer to game state, and has less input lag than if double-buffering with vsync was used.

you should also note that the msdn article doesn't say that you can't do it the way I said I would do it -- just that it provides the game with the ability not to wait on a flip to complete because you've got a spare buffer to draw into that's not involved in the flip.

...

and it is still not dropped frames. it's not anything like dropped frames.

each frame is a snapshot of game state at a specific point in time. if you are using vsync, you can only see one time slice every 1/60th of a second ... there is an infinite number of frames that COULD have been rendered in that time. each one would represent a "legitimate input and legitimate game engine state", but that doesn't mean we have to render every POSSIBLE frame between vertical refreshes. We can only see one and we only need one.

It's like our eyes ... light is constantly pouring into them, but we can only really "see" ~30fps ... Things are still happening and hitting our eye in between things we "see", but it doesn't matter that our eyes discard them even though they are legitimate photons representing legitimate images. In the same way, we can throw out frames that we render between refreshes if we have a more recent image without affecting the state of the game.

Let's look at the movie example -- if you watch a movie at 24fps, there were still things happening in between those frames that aren't rendered into video. if a film maker had recorded the movie at 48fps and then thrown out half the frames and played them back every 1/24th of a second, you would still get the same video as if the film maker had recorded the movie at 24fps.


::EDIT:: in rereading your post, it seems like you are assuming that if a frame is thrown out that the input that generated that frame is discarded ... this doesn't make sense.

Let's say I have two machines and each starts rendering at the same time. One draws frames 1/3 of a second apart for one second; the other draws frames 1/2 a second apart for one second. Each will have EXACTLY the same first and last frame (provided they are given the same start state and input).

::EDIT 2:: I'm not trying to argue that you aren't seeing what you think you are seeing ... I would suggest not forcing triple-buffering in the driver though as I think game developers should have full control over how their games are rendered. I would encourage you to experiment with games that allow triple buffering to be set in game and see what happens though. Games that do it "right" will improve responsiveness and smoothness.
 

kylebisme (Diamond Member)
Originally posted by: BFG10K
I'm talking about dropping [Y] after [Z] has been completed in the other backbuffer to make room for rendering the frame after [Z]
But again this would be dropping frames which would cause jerky input.
It is smoother than having the GPU sit idle waiting for vsync.

Originally posted by: BFG10K
Here is the DX triple buffering guide:

http://msdn2.microsoft.com/en-us/library/ms796537.aspx

Note the quote there:

It is preferable to have at least three flippable surfaces (some games use five or more).

If triple buffering works like you say in that it always overwrites frames and never stalls, what possible reason would a game have to use five or more flippable buffers?
Because extra buffers are used to do effects like motion blur and depth of field; it has nothing to do with triple buffering.

Originally posted by: BFG10K
Furthermore. you're KyleB on the B3D forums, aren't you? Why then in threads like this people talk about input lag but there's not even a peep out of you to contest them?
Because often times I just read the last few posts of a thread and comment on those without ever looking at the rest of it.

Originally posted by: DerekWilson
even if you do triple buffering as that msdn page seems to indicate you should do it, you would still end up with the same result I have outlined here. this example has no "dropped" frames, provides images closer to game state, and has less input lag than if double-buffering with vsync was used.
The MSDN page does mention the dropping of frames, just not in those terms:

The DirectDraw front buffer object can now change fpVidMem (the display memory pointer) to make the surface at 22 the primary surface.
That means skipping 33 in favor of making 22 the frontbuffer on vsync, because 22 is the more recently completed frame.

In other words, DirectDraw can write to surface pixel memory 33 because no flip is pending.
Since 22 was completed before vsync, 33 is outdated and that space is cleared to become the new backbuffer for the GPU to continue rendering to.
 

BFG10K (Lifer)
::EDIT:: in rereading your post, it seems like you are assuming that if a frame is thrown out that the input that generated that frame is discarded ... this doesn't make sense.
No, I'm saying the feedback contained in that frame is lost because you never see any of it.

Let's take four frames A, B, C and D. Each frame contains an update of state of the engine (AI, physics, input, etc) so it's direct feedback to the user.

A is displayed as normal, but B and C both come earlier than the refresh cycle which means B has to be dropped to make room for D. C and D are later displayed as normal.

What that means is all the feedback (i.e. the game state displayed to the user) from B is lost. It also means that the feedback gap for that particular situation increases from the usual single frame to two frames because the gap between B and D is two frames.

So if your mouse was in position 5, 10, 15 and 20 for each frame respectively, you're going to see the mouse jump from 5 to 15 and then go to 20 because you never saw B which contains the mouse at position 10.


Let's look at the movie example -- if you watch a movie at 24fps, there were still things happening in between those frames that aren't rendered into video. if a film maker had recorded the movie at 48fps and then thrown out half the frames and played them back every 1/24th of a second, you would still get the same video as if the film maker had recorded the movie at 24fps.
This is a poor example given film has no interaction with the user. Also film doesn't have an engine tick independent of the framerate tick like games, nor is it generating anything in real-time like a game.

::EDIT 2:: I'm not trying to argue that you aren't seeing what you think you are seeing ... I would suggest not forcing triple-buffering in the driver though as I think game developers should have full control over how their games are rendered. I would encourage you to experiment with games that allow triple buffering to be set in game and see what happens though. Games that do it "right" will improve responsiveness and smoothness.
I may find something that works but again why bother? The whole concept is inferior to a double buffered system running without vsync.
 

BFG10K (Lifer)
It is smoother than having the GPU sit idle waiting for vsync.
Not if it's dropping frames and not displaying things that should be displayed. But I don't think it's dropping them.

Becuase extra buffers are used to do effects like motion blur and depth of field, it has nothing to do with triple buffering.
They're talking about performance, not extra effects.

Furthermore they are talking about flippable buffers (i.e. having five or more instead of three with triple buffering), so unless you think motion blur and depth of field buffers are flipped and displayed like regular buffers, your answer is nonsensical.

Again in order to use more than 3 flippable buffers for performance reasons there must be scenarios where triple buffering isn't enough, like I was saying earlier.

The MSDN page does mention the dropping of frames, just not in those terms:
Look closely again. Look at this quote:

One buffer is almost always writable (because it is not involved in a flip) so the driver does not have to wait for the display scan to finish before allowing the back buffer to be written to again.

Almost always writeable, as in there will be situations where even a triple buffered system cannot write to any buffers.

This would back my claim that frames aren't overwritten if they haven't been displayed yet.
 

DerekWilson (Platinum Member)
Originally posted by: BFG10K
::EDIT:: in rereading your post, it seems like you are assuming that if a frame is thrown out that the input that generated that frame is discarded ... this doesn't make sense.
No, I'm saying the feedback contained in that frame is lost because you never see any of it.

Let's take four frames A, B, C and D. Each frame contains an update of state of the engine (AI, physics, input, etc) so it's direct feedback to the user.

A is displayed as normal, but B and C both come earlier than the refresh cycle which means B has to be dropped to make room for D. C and D are later displayed as normal.

What that means is all the feedback (i.e. the game state displayed to the user) from B is lost. It also means that the feedback gap for that particular situation increases from the usual single frame to two frames because the gap between B and D is two frames.

So if your mouse was in position 5, 10, 15 and 20 for each frame respectively, you're going to see the mouse jump from 5 to 15 and then go to 20 because you never saw B which contains the mouse at position 10.

wrong wrong wrong.

let's take your frames and look at what really happens over time.

Let's say these frames each take 0.00833 seconds to render (120fps).

let's say over these frames, we move our mouse from "position 5" to "position 20" perfectly smoothly ...

After frame A is rendered, we get game state and input in order to start rendering frame B. When we get the input, we look at elapsed time and input. Mouse position isn't absolute; it's based on distance moved since the last update. In this case, we know that the mouse has moved "5" units over a period of 0.00833 seconds. this gives us velocity for the move and we can determine the impact that has on the scene accordingly.

After frame B is rendered, we look at C which happens 0.00833 seconds after B. Again, the mouse has moved another "5" units over 0.00833 seconds. Everything is updated based on the time slice that has passed ... whether we just render frame C 0.0166 seconds after frame A or we render frames B and C 0.00833 seconds after their preceding frames, we end up with the same state at frame C -- 10 units of movement with 0.0166 seconds passed.

if we skip frame B and render frame C 0.0166 seconds after A, we get an accurate reflection of game state -- 0.0166 seconds has passed since A with the mouse having moved 10 units in 0.0166 seconds ... this has the same impact as moving 5 units in 0.00833 seconds two times ... Frame C is the same whether we render it 0.0166 seconds after frame A based on an input of moving 10 units or if C renders 0.00833 seconds after frame B with 5 units of movement which followed 0.00833 seconds after frame A with 5 units of movement.

physics calculations are the same as well -- let's say an object starts falling from frame A to frame D. let's say deltaT is the time since the last update (0.00833). At frame B, the object has position = position + velocity * deltaT + 9.8 * deltaT * deltaT / 2 ... and velocity = velocity + 9.8 * deltaT ... frame C happens 0.00833 seconds after frame B ...

you can do some math yourself and see that if you render frame A and then frame C 0.0166 seconds later you will get the same output as if you render frame A and then frame B 0.00833 seconds later and then frame C 0.00833 seconds after that. well ... as long as you don't round too much :)
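That equivalence is easy to check numerically with the update equations above (a quick sketch; constant acceleration, so two half-steps and one full step agree up to floating-point rounding):

```python
# Two deltaT steps of the falling-object update versus one 2*deltaT step:
# with constant acceleration both land on the same position and velocity,
# which is why skipping frame B doesn't change what frame C looks like.

def step(position, velocity, dt, g=9.8):
    position = position + velocity * dt + g * dt * dt / 2
    velocity = velocity + g * dt
    return position, velocity

dt = 0.00833

# Render A, then B, then C: two small steps.
p_two, v_two = step(*step(0.0, 0.0, dt), dt)

# Render A, skip B, then C: one double-length step.
p_one, v_one = step(0.0, 0.0, 2 * dt)

print(p_two, v_two)   # same numbers either way,
print(p_one, v_one)   # up to floating-point rounding
```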

Both frame C outputs will be the same even though the game state and input was updated in between in one case and not another.

In my case here, you'd get less input lag with triple-buffering than with double-buffering + vsync because in the former you'd see frame A, then frame C which represents the entire input movement and game state over the 0.0166 seconds that has elapsed. If you used double buffering with vsync, you would only see frame A and frame B (frame C would wait to be rendered and would represent a different, later state) and frame B would lag in input 0.00833 seconds (it represents a game and input state 0.00833 seconds old).

so ... i mean, these are idealized numbers and assume a flip and getting/updating game state and input take zero time and all that ... but really conceptually this is accurate.

throwing out frames is equivalent to never having rendered them (generally). if you are still displaying frames at 60 fps, you won't see stuttering, and proper triple-buffering will make things smoother and reflect input better.

and you still didn't address the case I presented above where no frames were lost and triple-buffering could provide higher framerates and less input lag than double-buffering with vsync ... even if you still don't get what i'm saying here, that example has to show the potential of triple-buffering to reduce input lag and improve framerate.
 

BFG10K (Lifer)
Both frame C outputs will be the same even though the game state and input was updated in between in one case and not another.
Yes, but you don't see the in-between updates the same way in both which is my point.

Let's build on your example and create a more extreme case.

I start at mouse position 0 and I move 5 units on the mouse for every 0.00833 seconds for five frames. At the end that puts me at mouse position 25 after a total time of 0.00833 x 5 = 0.04165 seconds.


(1) If I display all of those frames I see the mouse at position 5, 10, 15, 20, and 25.

(2) If I drop the in-between frames I see the mouse at position 5, 15 and 25.

(3) If I drop everything except the first and last frame I see the mouse at position 5 and then it'll suddenly jump to position 25.


Does this pattern remind you of anything? It's painfully obvious that in terms of displaying data:


(1) is displaying at 5 / 0.04165 = ~120 FPS.

(2) is displaying at 3 / 0.04165 = ~72 FPS.

(3) is displaying at 2 / 0.04165 = ~48 FPS.


Yes, the math proves it'll be at position 25 whether I see 5 frames, 3 frames or 2, but are you honestly suggesting the visual feedback will be the same in all three cases?

If you are then you're basically saying the concept of framerate is meaningless since the game engine always updates internally at the same speed no matter what's displayed.

It's quite obvious that the more frames there are, the more subdivisions there are with which to display movement and the smoother everything is.

Now before you say "but the framerate counter is showing 120 FPS in all three cases", that's only a half-truth since the framerate counter isn't factoring in dropped frames.

So if the counter is reading 120 FPS but I only see two of the five frames that were rendered, that's effectively displaying things at 48 FPS regardless of what it's doing internally.

and you still didn't address the case I presented above where no frames were lost and triple-buffering could provide higher framerates and less input lag than double-buffering with vsync ... even if you still don't get what i'm saying here, that example has to show the potential of triple-buffering to reduce input lag and improve framerate.
I never argued that triple buffering doesn't improve framerate over vsync + double buffer because of course it does.

The point you two still don't understand is that framerate is not necessarily an indicator of lag. It can be but it isn't always.

If a triple buffered system isn't dropping frames and two frames are rendered faster than a single refresh cycle, it'll render one frame ahead of what is being displayed and that introduces lag.
 

kylebisme (Diamond Member)
Originally posted by: BFG10K
It is smoother than having the GPU sit idle waiting for vsync.
Not if it's dropping frames and not displaying things that should be displayed. But I don't think it's dropping them.
It is dropping them because it can only display as many frames as the monitor refreshes when waiting for vsync to swap frames.

Originally posted by: BFG10K
Because extra buffers are used to do effects like motion blur and depth of field; it has nothing to do with triple buffering.
They're talking about performance, not extra effects.
They are talking about the fact performance is best when you use at least 3 buffers, and the fact that it takes more than 3 when you are using other buffers for effects.

Originally posted by: BFG10K
Furthermore they are talking about flippable buffers (i.e. having five or more instead of three with triple buffering), so unless you think motion blur and depth of field buffers are flipped and displayed like regular buffers, your answer is nonsensical.
To do depth of field, the rendered frame is flipped from the backbuffer to the accumulation buffer to be blurred based on the values in the depth buffer, and then flipped to the frontbuffer. For motion blur, one or more previously rendered frames is saved to blend together with the most current frame for display, and those frames are flipped down the line as they are replaced by newer frames.

Originally posted by: BFG10K
Almost always writeable, as in there will be situations where even a triple buffered system cannot write to any buffers.
Almost always, as in the buffer holding the older frame isn't writable until the moment after the newer frame is completed and flagged to become the frontbuffer on vsync.
 

BFG10K (Lifer)
It is dropping them because it can only display as many frames as the monitor refreshes when waiting for vsync to swap frames.
I don't believe it's dropping them, otherwise you'd have the problems I was outlining to Derek.

They are talking about the fact performance is best when you use at least 3 buffers,
Yep, because 3 isn't enough sometimes.

and the fact that it takes more than 3 when you are using other buffers for effects.
They didn't mention that anywhere. The entire article was discussing the context of rendering tied to vsync and not slowing down, not for the performance of extra effects.

To do depth of field, the rendered frame is flipped from the backbuffer to the accumulation buffer to be blurred based on the values in the depth buffer, and then flipped to the frontbuffer. For motion blur, one or more previously rendered frames is saved to blend together with the most current frame for display, and those frames are flipped down the line as they are replaced by newer frames.
I fail to see how any of this fits into the performance equation. I mean about the only other way to do this stuff would be rendering to textures but I sorely doubt that's what they're alluding to when they mention using multiple buffers for performance.

Furthermore if you force triple buffering through the driver or a utility, how exactly are you expecting it to have an effect on those other buffers?

"Almost always" as in the buffer holding the older frame isn't writable until the moment after the newer frame is completed and flaged to become the frontbuffer on vsync..
This is a disingenuous response and you know that's not what they meant. Why would you be writing to another framebuffer if you haven't finished the current frame?

Or to put it another way, if frame A is being displayed and B is still being rendered, why would you even try to start rendering C?
 

BFG10K (Lifer)
One other thing, to quote the article:

Increasing the number of buffers that can hold a primary surface increases display performance. It is preferable to have at least three flippable surfaces (some games use five or more).

They're talking about the number of buffers that can hold a primary surface (i.e. a frame that is currently being displayed), something which an accumulation or depth buffer will never be.
 

DerekWilson (Platinum Member)
Why don't you go try writing a game, a video driver, or building graphics hardware and see what happens?

Originally posted by: BFG10K
"Almost always" as in the buffer holding the older frame isn't writable until the moment after the newer frame is completed and flaged to become the frontbuffer on vsync..
This is a disingenuous response and you know that's not what they meant. Why would you be writing to another framebuffer if you haven't finished the current frame?

Or to put it another way, if frame A is being displayed and B is still being rendered, why would you even try to start rendering C?

That last case is not what TheSnowman meant. And it is not disingenuous because the reason it is "almost" always available is not explained. These guys are worried about how long it takes to lock the back buffer, flip it to the front and the front to the back, and then to unlock the old front buffer for drawing ... it seems within reason to consider that the "almost" has to do with the fact that pointers do need to shift even if you always have one buffer that doesn't need to worry about being locked/unlocked. the game dev might also want to clear the buffer before drawing to it, though they have the option of not clearing buffers after a swap (or they used to anyway).

.... and now on to your counter example to mine ...

Now before you say "but the framerate counter is showing 120 FPS in all three cases", that's only a half-truth since the framerate counter isn't factoring in dropped frames.

So if the counter is reading 120 FPS but I only see two of the five frames that were rendered, that's effectively displaying things at 48 FPS regardless of what it's doing internally.

I would not say that the game ran at 120fps if the game was doing triple-buffering on a 60Hz display -- it would be running at 60fps if frames were rendered faster.

Aside from the fact that TheSnowman answered the question aptly with "it can only display as many frames as the monitor refreshes when waiting for vsync to swap frames," I think we need to go deeper ... this is gonna be painful, but I guess I'll be done after that ...

I would not say that lower framerates are indicative of higher quality -- if your monitor supports a 120Hz refresh rate and you get that out of your game it will look and feel smoother than if you've only got 60Hz and you're only swapping every other frame for a 30fps result.

But, if I can even remember back that far, the original question was whether you get better performance from triple-buffering over double-buffering with vsync enabled.

If you don't mind tearing and you want to run without vsync you will get the most up-to-date rendering with as little input lag as possible (due to page flipping anyway -- aggressive render ahead can still cause issues).

If you are going to run with vsync, triple-buffering is better than double-buffering for performance, smoothness, and the reduction of input lag if it is implemented well by the developer and everything else is equal.

The concept of framerate translates directly to how many images are displayed. This is absolutely important. They are images of a time slice, and if you take more images of time slices closer together your eyes have more of a chance of seeing a perfectly smooth image (your eyes never "sync" with what they "see", so displaying more frames than we can see is useful).

But even what you said doesn't imply jerky input or motion -- these are evenly spaced time slices that happen no matter whether we see them or not. Reducing the frame rate increases the amount of difference between two frames. If you or your scene is moving fast enough to cause very large differences in images rendered 0.00833 seconds apart, you are going to be moving very very fast. If the differences are small (as they normally will be at that rate) then viewing frames at 60fps rather than 120fps won't make a huge difference. And it's likely that less tearing will improve visual fidelity.

The bottom line is that even if you could somehow actually "see" every single frame in a 120fps sequence that lasted 5 seconds, you would not be physically capable of reacting to the output of every frame every 0.00833 seconds. Do you think you could adjust your movements based on output 600 times in 5 seconds? How about 300 times?

and here is the kicker ... if you don't display the frame right when it's rendered, you are shifting the image further away from when the input happened. In a no vsync situation this is not an issue. But INPUT LAG IS REDUCED WITH TRIPLE-BUFFERING over double-buffering because you don't need to hold frames as long before display. both reduce framerate to 60fps if frames can render at 120fps. the difference is input lag.

from your example in situation (2) with double-buffering and vsync, frame 1 is displayed in position 1, frame 2 is displayed in position 3, frame 3 is displayed in position 5 ... THIS IS INPUT LAG.

if you look at your situation (2) with triple buffering, frame 1 is displayed in position 1, frame 3 is displayed in position 3, frame 5 is displayed in position 5 ... this is achieved BY "dropping" (not displaying) outdated frames in time slices in which they don't belong.

now tell me which situation you would prefer. both display fewer frames than no vsync, both eliminate tearing. triple-buffering is more accurate than double-buffering.

...

I was gonna outline what would happen over a 5 second period if each frame took 0.01111 seconds in the case of triple-buffering and double-buffering with vsync ... but that's a lot of work ...

let me sum it up for you and point you to my previous examples:

double-buffering with vsync provides 60fps with outdated frames, while triple-buffering provides 60fps with more recent frames.

again, which would you rather have if you are going to lock frame rate to 60fps?

and then there's that pesky case of the 45fps game rendering at 30fps with double-buffering with outdated frames but at 45fps with both triple-buffering and no vsync.

I am not saying everyone does triple-buffering well or "right"; you will need to play with the setting in different games, and you shouldn't treat forced driver settings as a testament to how well the feature can work.

::EDIT:: just saw your most recent post ... msdn is a good starting point, but playing outside the "rules" is one of the best ways to advance technology. I wouldn't take Microsoft's word as law.
 

BFG10K (Lifer)
But, if I can even remember back that far, the original question was whether you get better performance from triple-buffering over double-buffering with vsync enabled.
I think we've been arguing two different things here. I was never contesting that because I agree.

If I ever run with vsync I always use triple buffering if I can because I don't like the fractional framerate problem.

My comment was that I have found triple buffering can add a bit of extra lag over just regular vsync, but this is still better than having to deal with 37 FPS on something that should be running at 73 FPS (or whatever).

Furthermore I would also agree with the point you made earlier that triple buffering could be implementation dependent, where some situations behave like Snowman says (dropped frames) while in others they exhibit input lag like I've seen in certain cases.

But even what you said doesn't imply jerky input or motion -- these are evenly spaced time slices that happen no matter whether we see them or not.
Again, it doesn't matter if the engine tick is running at the same speed if the frames you see have big changes between them.

Try one of the ball/bar demos where it's possible to split the screen into different framerates, set one side to 30 FPS and the other to 120 FPS, and you'll see exactly what I mean.

The 30 FPS side of the object, despite keeping the same onscreen position as the 120 FPS side, will jerk badly because the subdivisions between each frame are much larger than on the 120 FPS side.
 

kylebisme (Diamond Member)
Originally posted by: BFG10K
It is dropping them because it can only display as many frames as the monitor refreshes when waiting for vsync to swap frames.
I don't believe it's dropping them, otherwise you'd have the problems I was outlining to Derek.
The problem you imagined to Derek shows a fundamental misunderstanding of the process as does your refusal to accept that when two frames are rendered between refreshes the older one is dropped. That is because you have yet to understand the diagram on the page you linked. What do you think happens to 33 when, as the illustration shows, 22 is set to become the frontbuffer after the flip? As I have explained, 33 is dropped so that the memory can be made available to render the next frame.

Originally posted by: BFG10K
They are talking about the fact performance is best when you use at least 3 buffers,
Yep, because 3 isn't enough sometimes.
That "sometimes" being when you are already using a third or more for effects.

Originally posted by: BFG10K
and the fact that it takes more than 3 when you are using other buffers for effects.
They didn't mention that anywhere. The entire article was discussing the context of rendering tied to vsync and not slowing down, not for the performance of extra effects.
And they mention more buffers being used because, if you are already using a third or more buffers for effects, it takes yet another one to achieve triple buffering.

Originally posted by: BFG10K
To do depth of field, the rendered frame is flipped from the backbuffer to the accumulation buffer to be blurred based on the values in the depth buffer, and then flipped to the frontbuffer. For motion blur, one or more previously rendered frames is saved to blend together with the most current frame for display, and those frames are flipped down the line as they are replaced by newer frames.
I fail to see how any of this fits into the performance equation. I mean about the only other way to do this stuff would be rendering to textures but I sorely doubt that's what they're alluding to when they mention using multiple buffers for performance.
It fits in because, if you are using a backbuffer, an accumulation buffer, and a front buffer, that gives you 3 buffers but it doesn't mean you are using triple buffering or doing anything to improve performance. Another buffer has to be added in for that.

Originally posted by: BFG10K
Furthermore if you force triple buffering through the driver or a utility, how exactly are you expecting it to have an effect on those other buffers?
Because it doesn't really affect them; it simply places another buffer before the frontbuffer, regardless of how many buffers are being used, allowing a place to hold a finished frame so that the GPU can continue rendering directly to the next one instead of having to wait for vsync. And again, it throws that stored frame away if the GPU finishes that next frame before vsync, that buffer becoming the new backbuffer which makes room to start rendering yet another frame.

Originally posted by: BFG10K
"Almost always" as in the buffer holding the older frame isn't writable until the moment after the newer frame is completed and flaged to become the frontbuffer on vsync..
This is a disingenuous response and you know that's not what they meant.
I know it is exactly what was meant by the statement, as I know exactly why they didn't bother including the "almost" in this statement a bit higher up the page:

With triple buffering, the third surface is always writable because it is a back buffer and available to draw on immediately (as shown in the following figure).
Always, generally speaking anyway, aside from the tiny moment it takes to unlock that third surface after a newer frame has been completed.

Originally posted by: BFG10K
One other thing, to quote the article:

Increasing the number of buffers that can hold a primary surface increases display performance. It is preferable to have at least three flippable surfaces (some games use five or more).

They're talking about the number of buffers that can hold a primary surface (i.e. a frame that is currently being displayed), something which an accumulation or depth buffer will never be.
Once the effects are completed in the accumulation buffer, yes, that is a finished frame ready to be the primary surface. As for the depth buffer, I only mentioned it in explaining how depth of field is accomplished.
 

BFG10K (Lifer)
The problem you imagined to Derek shows a fundamental misunderstanding of the process as does your refusal to accept that when two frames are rendered between refreshes the older one is dropped.
Uh, what? The problem I described is exactly that: dropped frames.

The problem with you accepting dropped frames shows a fundamental misunderstanding of what I was explaining to Derek.

A dropped frame (i.e. a frame that is rendered but never displayed) is not an "imagined" problem. If you think it is, turn off your monitor while playing games and see how you go.

In any case I'm not sure why you're arguing this anymore as I've already accepted some implementations of triple buffering can drop frames.

Your problem is your continual refusal to understand and accept the ramifications of what dropping frames entails.

Your other problem is your refusal to accept triple buffering can introduce input lag when this has been observed many times.
 

nullpointerus (Golden Member)
Originally posted by: BFG10K
A dropped frame (i.e. a frame that is rendered but never displayed) is not an "imagined" problem. If you think it is, turn off your monitor while playing games and see how you go.
With triple buffering, those dropped frames are due to the game rendering faster than the monitor's refresh cycle allows.

If the monitor runs at 60Hz, that's a constant 60fps where each frame takes exactly the same time to draw, and there's exactly the same waiting period between each frame.

[ 1 ][ 2 ][ 3 ][ 4 ][ 5 ][ 6 ][ 7 ][ 8 ][ 9 ] ( <-- time intervals of unspecified units )
[ x ][ _ ][ _ ][ _ ][ x ][ _ ][ _ ][ _ ][ x ] ( <-- x=frame, _=wait )

Right?

Game fps is a crude approximation that loses information. In reality, the game's rendering is speeding up and slowing down, varying many times per second. For example, the game might render 4 frames (a,b,c,d) between time intervals 1 and 4, yet it might manage to render only 1 frame (e) between intervals 5 and 9. This would look like:

[ 1 ][ 2 ][ 3 ][ 4 ][ 5 ][ 6 ][ 7 ][ 8 ][ 9 ] ( <-- time intervals of unspecified units )
[ x ][ _ ][ _ ][ _ ][ x ][ _ ][ _ ][ _ ][ x ] ( <-- x=frame, _=wait )
[ a ][ b ][ c ][ d ][ e ][ _ ][ _ ][ _ ][ f ] ( <-- a,b,c,d,e,f=frame; _=wait )

a --> monitor displays current front buffer, so frame 'a' goes into back buffer #1
b --> monitor is still waiting, so frame 'b' goes into back buffer #2
c --> monitor is still waiting, so frame 'c' goes into back buffer #1; 'a' is lost
d --> monitor is still waiting, so frame 'd' goes into back buffer #2; 'b' is lost
...
...

So we lose 'a' and 'b' while 'c' and 'd' are still present.

However, when the time comes at time interval '5', it only makes sense to display the most recently rendered frame (i.e. the most in 'sync' with game's internal time), so 'a', 'b', and 'c' are discarded while 'd' is set as the front buffer.

The result is:

front buffer --> freed
back buffer #1 --> freed
back buffer #2 --> 'd'

No matter how many frames are discarded -- we could, in theory, have twenty between time intervals '1' and '4' -- only the most current frame is worth displaying, given the fact that the monitor and the game both have a constant frame rate that is decoupled from the rendering rate.

It's not possible for the game to somehow speed the monitor up and make frames 'a', 'b', and 'c' displayable. (Tearing w/o vsync gets close to this effect, but there's NO way to do this with vsync on regardless of whether double or triple buffering is used.)

With vsync on, when 'd' is completely rendered, 'a', 'b', and 'c' become meaningless. The monitor will have its most current frame 'd' by the next refresh cycle. There's no point in ever displaying 'a', 'b', or 'c' when 'd' is completed because to do so the game would have to run its output asynchronously from game time (i.e. display a, b, c, d, e, f -- no matter how outdated they have become or how far ahead game input processing has gone).
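A toy version of that timeline (hypothetical finish times adapted from the diagram above): each refresh simply shows whichever frame finished most recently before it, and anything older is discarded unseen.

```python
# Frames a..f finish at these (made-up) times; the display refreshes at 5
# and 9. Each refresh picks the newest finished frame and drops the rest.
completed = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 8, "f": 9}
refreshes = [5, 9]

for r in refreshes:
    ready = {frame: t for frame, t in completed.items() if t <= r}
    print(f"refresh at {r}: show {max(ready, key=ready.get)!r}")
# -> shows 'd' at 5 ('a', 'b', 'c' are never displayed), then 'f' at 9
```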

Since triple buffering has the effect of indefinitely increasing the number of frames that may be kept in waiting, the buffers will hold a frame closer to the monitor's next drawing time interval. Therefore, triple buffering decreases the delay between rendering and viewing. And the faster the rendering, relative to the monitor's refresh rate, the more triple buffering will reduce input lag.

BFG10K, your theories make no sense to me, but your stated experience makes sense to me in the real world. But I would posit that you are not getting increased input lag by enabling triple buffering + vsync (vs. double-buffering + vsync); you are merely seeing the existing vsync input lag being rendered more smoothly. And so, for the frames lost due to double buffering + vsync (vs. triple buffering + vsync) -- the frames you do NOT see -- your mind is interpolating and updating what OUGHT to be there, and, in effect, perceiving less input lag.

...

I have no idea why I just spent time on this... I need rest.

:Q


I hope what I wrote makes some sense in the morning...
 

BFG10K (Lifer)
However, when the time comes at time interval '5', it only makes sense to display the most recently rendered frame (i.e. the most in 'sync' with game's internal time), so 'a', 'b', and 'c' are discarded while 'd' is set as the front buffer.
Yes that's true, but like I've said repeatedly you lose the feedback contained in the frames that were dropped so the subdivisions are higher between the frames you do see.

So if your mouse starts at position 0 and moves 5 each increment the frames have mouse positions: A= 5, B=10, C=15 and D=20.

So when you see frame D your mouse has jumped from position 0 to position 20 because you never saw A, B or C or the mouse positions contained therein.
 

nullpointerus (Golden Member)
Originally posted by: BFG10K
However, when the time comes at time interval '5', it only makes sense to display the most recently rendered frame (i.e. the most in 'sync' with game's internal time), so 'a', 'b', and 'c' are discarded while 'd' is set as the front buffer.
Yes that's true, but like I've said repeatedly you lose the feedback contained in the frames that were dropped so the subdivisions are higher between the frames you do see.

So if your mouse starts at position 0 and moves 5 each increment the frames have mouse positions: A= 5, B=10, C=15 and D=20.

So when you see frame D your mouse has jumped from position 0 to position 20 because you never saw A, B or C or the mouse positions contained therein.
So...

vsync + 1 back buffer(s) --> displays frame closest to previous refresh
vsync + 2 back buffer(s) --> displays frame closest to current refresh

And you are saying that the former has less input lag.

Now we need a (specific) definition of input lag... I was thinking of it as the delay between what is seen on the screen and what the game's current state is. In this case, triple-buffering is best. But if you define input lag as the delay between what the user input and what his feedback was, double-buffering is best. I'm assuming vsync in both cases, of course.

Which is the most accurate/useful definition?

I'd love to see someone make a little app w/ high contrast (e.g. white on black background, or vice versa) polygons or images and provide settings and logging for frame rendering order, time intervals, etc. The "game engine" would just be something simple that requires coordination, like stacking boxes. Then everybody could see for themselves which buffering modes make input easier, and there would be no endless debate on this subject.

Ah, well...
 

kylebisme (Diamond Member)
Originally posted by: BFG10K
The problem you imagined to Derek shows a fundamental misunderstanding of the process as does your refusal to accept that when two frames are rendered between refreshes the older one is dropped.
Uh, what? The problem I described is exactly that: dropped frames.


Sure, but like nullpointerus explained, you aren't describing triple buffering. You are just dropping frames without demonstrating any understanding of why triple buffering drops frames, then claiming dropping frames wouldn't affect a framerate counter when in fact frame counters count the front buffer swaps, so it does, all while accomplishing nothing more than pointing out the obvious fact that displaying more frames each second results in smoother motion than displaying fewer. On top of that you somehow figured the first frame to be displayed would predict the position of your mouse at 5, the position it is at when that first frame is completed rather than the position the mouse was at when rendering started on the frame. Here, I'll rewrite your example to make it relevant to the subject at hand:

You start at mouse position 0 and move 5 units on the mouse for every ~0.00833 seconds, which puts you at mouse position 25 after a total time of 0.00833 x 5 = ~0.04167 seconds.


(1) If you display all of those frames, it is because you are running at a refresh rate of at least 120Hz, as anything lower can't display 120 full frames each second. If that refresh rate is 120Hz, you will see the mouse at position 0, 5, 10, 15, and 20, because each refresh is displaying a new frame.

(2) If instead you are running at a 72Hz refresh rate, you will see the mouse at positions 0, 10 and 20. Each frame takes 3/5 of a refresh to complete, meaning 5 frames are completed during every 3 refreshes, with the frames showing positions 5 and 15 being discarded because newer frames are finished before the next vsync.

(3) If you are running at a 48Hz refresh rate, then you will see the mouse at 5 and then 20. Each frame takes 2/5 of a refresh to complete, meaning 5 frames are completed every 2 refreshes, with the frames showing 0, 10, and 15 being discarded because newer frames are finished before the next vsync.

Now we will look at what happens when using the same ~0.00833 render time with vsync, but without using triple buffering to discard outdated frames. Instead, rendering cannot continue on to the next frame until vsync:

(120Hz) You get the same results: one frame is completed for every refresh and the positions displayed are 0, 5, 10, 15, 20.

(72Hz) The first frame rendered shows where you started at 0, but instead of having a third buffer to start rendering a second frame immediately, the GPU has to wait for vsync before the backbuffer becomes available. At the moment of that next vsync, the mouse pointer has reached ~8.33, which is the position shown on the frame drawn then and displayed on the next vsync. Only then can rendering start on the third frame, at which point the mouse is at ~16.67.

(48Hz) The first frame rendered again has to show 0, as instead of being able to render a more current frame the GPU has to hold that first frame to display on vsync. When the backbuffer is finally unlocked on vsync, the pointer has reached 12.5, and that position is rendered on the second frame, which is displayed on the next vsync.

So, with double buffering rather than triple buffering we get:

(120Hz) Same results for both; 0, 5, 10, 15, 20.

(72Hz) Double buffering shows 0, 8.33 and 16.67 while triple buffering shows 0, 10 and 20.

(48Hz) Double buffering shows 0 and 12.5 while triple buffering shows 5 and 20.

And there you have it, when rendering frames quicker than refreshing them, triple buffering reduces latency in comparison to double buffering by allowing rendering to continue on to a new frame directly after completing the last one rather than having the GPU sit idle while waiting for vsync.
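Those numbers also fall out of a small simulation (a sketch under the same idealized assumptions: frames take exactly 1/120 s, the mouse position is sampled the instant a frame starts rendering, and flips are free):

```python
# Reproduce the worked example: frames take 1/120 s, the mouse moves 5 units
# per frame (600 units/s), and we list the position shown at each refresh
# for double versus triple buffering. Fractions keep the timing exact.
from fractions import Fraction as F

RENDER = F(1, 120)   # ~0.00833 s per frame
SPEED = 5 / RENDER   # 5 units per frame = 600 units/s

def shown_positions(hz, triple, refreshes=3):
    gpu_free = F(0)   # when the GPU may start another frame
    in_flight = None  # (finish_time, mouse_position_when_started)
    pending = None    # newest finished frame waiting for the flip
    shown = []
    for k in range(1, refreshes + 1):
        vsync = F(k, hz)
        while True:
            if in_flight is None:
                # double buffering stalls while a finished frame waits
                if gpu_free < vsync and (triple or pending is None):
                    in_flight = (gpu_free + RENDER, SPEED * gpu_free)
                else:
                    break
            if in_flight[0] <= vsync:
                pending = in_flight      # newer finished frame supersedes
                gpu_free = in_flight[0]  # the older one (the "drop")
                in_flight = None
            else:
                break
        if pending is not None:
            shown.append(round(float(pending[1]), 2))
            pending = None
            if not triple:
                gpu_free = vsync         # back buffer only frees at the flip
    return shown

for hz in (120, 72, 48):
    print(f"{hz} Hz  double: {shown_positions(hz, False)}"
          f"  triple: {shown_positions(hz, True)}")
# 120 Hz: both show [0.0, 5.0, 10.0]; 72 Hz: [0.0, 8.33, 16.67] for double
# vs [0.0, 10.0, 20.0] for triple; 48 Hz: [0.0, 12.5, 25.0] vs [5.0, 20.0, 30.0]
```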

Originally posted by: BFG10K
In any case I'm not sure why you're arguing this anymore as I've already accepted some implementations of triple buffering can drop frames.
Because it isn't a "some implementations" thing; you can use 3 buffers and not drop the older frame when the newer one is completed if you like, but then you aren't doing triple buffering.
 

BFG10K (Lifer)
then claiming dropping frames wouldn't affect a framerate counter when in fact frame counters count the front buffer swaps, so it does,
If it counted framebuffer swaps then it would display a framerate that exceeded the monitor's refresh rate whenever frames were being dropped, but clearly that never happens.

all while accomplishing nothing more than pointing out the obvious fact that displaying more frames each second results in smoother motion than displaying fewer
I thought it was obvious but apparently it isn't to some.

On top of that you somehow figured the first frame to be displayed would predict the position of your mouse at 5, the position it is at when that first frame is completed rather than the position the mouse was at when rendering started on the frame.
Uh, no. I started at position 0, moved the mouse to position 5 which the game tick registered then generated the frame rendered at position 5.

And there you have it, when rendering frames quicker than refreshing them, triple buffering reduces latency in comparison to double buffering by allowing rendering to continue on to a new frame directly after completing the last one rather than having the GPU sit idle while waiting for vsync.
In theory. However in practice it can add visible input lag which again has been documented numerous times.

Because it isn't a "some implementations" thing; you can use 3 buffers and not drop the older frame when the newer one is completed if you like, but then you aren't doing triple buffering.
Triple buffering means using 3 flippable buffers; it states nothing about what replacement policy will be used, something the game can be free to choose assuming it's not being forced externally.

If a game has its own framerate cap so it never renders frames faster than the refresh rate (e.g. a 60 FPS cap on an 85 Hz display) you don't need to drop any frames but can still be doing triple buffering.
 

BFG10K (Lifer)
Now we need a (specific) definition of input lag...
When I refer to lag I mean a delay between user input and screen response.

When I move the mouse around quickly I expect the screen to "snap" into place and keep up with my movements without delay, instead of "sliding" around with resistance and momentum generated by a virtual weight.

For this reason double buffering without vsync is my preference at all times, except for compatibility reasons.
 

Hauk (Platinum Member)
Phew, I have a migraine. You guys are smart.

When you guys are finished, let us know which method is best for someone who doesn't like screen tearing. :)
 

BFG10K (Lifer)
When you guys are finished, let us know which method is best for someone who doesn't like screen tearing.
I think we all agree the answer to that is vsync + triple buffering. :p
 

taltamir (Lifer)
sooo... now that crossfire X is upon us and you can get two 3870x2 cards... how about QUAD buffering for you? silky smooth frame rates and quad buffering input lag. Any takers?