7950GX2 Quad, the Second Coming.

Wesleyrpg

Junior Member
Aug 22, 2001
19
0
0
Hey there,

About 18 months ago, when quad SLI was in its infancy, enabling quad SLI on the 7950GX2 usually resulted in slower scores than running a single card on its own!

It's been a while since then, and drivers have matured greatly... but I'm curious: did Nvidia ever get quad SLI working properly, i.e. with scores that actually beat a single card, or did they just give up on the whole quad SLI thing?

Has anyone got a quad SLI setup running with recent drivers?
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.
 

TanisHalfElven

Diamond Member
Jun 29, 2001
3,512
0
76
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

Isn't that kind of a waste? I mean, what if the user turns or fires? All those extra frames are wasted unless the game can predict what the gamer will do.
 

CP5670

Diamond Member
Jun 24, 2004
5,660
762
126
I don't think quad SLI was ever officially supported. It was an experimental feature that worked in some drivers, but I never heard anything about it after the 8 series launch so I guess it was abandoned. We may see it returning with the next Nvidia chipset, but for the 8 series cards only.
 

GundamSonicZeroX

Platinum Member
Oct 6, 2005
2,100
0
0
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

I thought that in SLI, the screen is split into two (or in this case four) and one card renders the top half of the screen (or in this case, two) and the other renders the bottom half of the screen (again, two in this case.)
 

nitromullet

Diamond Member
Jan 7, 2004
9,031
36
91
Originally posted by: GundamSonicZeroX
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

I thought that in SLI, the screen is split into two (or in this case four) and one card renders the top half of the screen (or in this case, two) and the other renders the bottom half of the screen (again, two in this case.)

It depends on what type of rendering is being used. What you described is SFR (split frame rendering). What taltamir described is AFR (alternate frame rendering), where the cards take turns rendering whole frames. IIRC, DX9 had the limitation of only allowing 3 frames in flight at once, so you really could never fully use four GPUs in quad SLI. With DX10, that number has been increased, so you can theoretically have all four GPUs working at the same time.

NVIDIA actually has two types of AFR; I'm not sure what the difference between them is.
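
To put the AFR / queue-depth point in concrete terms, here's a minimal C++ sketch (purely illustrative, not driver code; the DX9 figure is the commonly cited 3-frame render-ahead cap, and the DX10 depth here is just an assumed larger value):

// Toy illustration: AFR hands out whole frames round-robin, but a GPU can
// only be busy if a frame has actually been queued for it, so the number of
// GPUs working at once is capped by the render-ahead queue depth.
#include <algorithm>
#include <cstdio>

int main() {
    const int numGpus = 4;        // quad SLI
    const int dx9QueueLimit = 3;  // max frames the CPU may queue ahead under DX9
    const int dx10QueueLimit = 4; // assumed deeper render-ahead under DX10

    // AFR: the driver assigns whole frames round-robin.
    for (int frame = 0; frame < 8; ++frame)
        std::printf("frame %d -> GPU %d\n", frame, frame % numGpus);

    std::printf("GPUs busy at once under DX9:  %d of %d\n",
                std::min(numGpus, dx9QueueLimit), numGpus);
    std::printf("GPUs busy at once under DX10: %d of %d\n",
                std::min(numGpus, dx10QueueLimit), numGpus);
    return 0;
}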
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,002
126
It's been a while since then, and drivers have matured greatly... but I'm curious: did Nvidia ever get quad SLI working properly
Quad SLI still doesn't work under Vista and I don't think it ever will, hence nVidia's move to tri-SLI.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: tanishalfelven
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

isn't that kind of a waste. i mean what if the user turns or fires. all those extra frame are wasted unless the game can predict what the gamer will do.

Consider, if you will, that a normal playable framerate is 30 frames per SECOND. And that the ideal (and max on most monitors) is either 60 or 75.

So if you are 3 frames ahead and fire a weapon then frames aren't wasted, you just experience a 1/10 of a second delay before the animation kicks in.
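
For anyone who wants to check the arithmetic, a trivial C++ sketch (the frame rates are just the ones mentioned above):

// Extra input lag from rendering ahead is simply frames_ahead / framerate.
#include <cstdio>

int main() {
    const int framesAhead = 3;
    const double rates[] = {30.0, 60.0, 75.0};
    for (double fps : rates)
        std::printf("%d frames ahead at %.0f fps = %.0f ms of extra lag\n",
                    framesAhead, fps, framesAhead / fps * 1000.0);
    return 0;
}

At 30 fps that works out to 100 ms, i.e. the 1/10 of a second mentioned above.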
 

nullpointerus

Golden Member
Apr 17, 2003
1,326
0
0
Originally posted by: taltamir
Originally posted by: tanishalfelven
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

isn't that kind of a waste. i mean what if the user turns or fires. all those extra frame are wasted unless the game can predict what the gamer will do.

Consider, if you will, that a normal playable framerate is 30 frames per SECOND. And that the ideal (and max on most monitors) is either 60 or 75.

So if you are 3 frames ahead and fire a weapon then frames aren't wasted, you just experience a 1/10 of a second delay before the animation kicks in.

You and tanishalfelven appear to have two different definitions for the phrase "rendering frames ahead": 1) the extra frames are what the game guesses will happen at some point in the future, so they are discarded (i.e. never shown on the screen) whenever the game guesses wrong; and 2) the extra frames are synchronized with the game engine, so they will be rendered whenever the monitor gets around to it (if v-sync is on), but input is desynchronized so that the keyboard/mouse commands to fire the weapon will not actually be considered until the latent frames have been rendered.

Either case would appear to have diminishing returns. If you had 20 cards, each rendering ahead, then, when a weapon is fired that would either be [case 1]: 19/20 (95%) of your graphics processing power wasted, or [case 2]: input lag of 19 frames (1/3 of a second). But both these cases are for AFR.
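
To put some assumed numbers on those two cases (toy C++ sketch; the 60 fps target and one queued frame per extra card are my assumptions):

// Illustrative only: the two AFR cases above against card count.
#include <cstdio>

int main() {
    const double fps = 60.0;
    const int counts[] = {2, 4, 20};
    for (int cards : counts) {
        double wastedPct = 100.0 * (cards - 1) / cards; // case 1: mispredicted frames thrown away
        double lagMs = (cards - 1) / fps * 1000.0;      // case 2: queued frames become input lag
        std::printf("%2d cards: up to %.0f%% of the work wasted, or ~%.0f ms of lag\n",
                    cards, wastedPct, lagMs);
    }
    return 0;
}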

SFR would appear to have diminishing returns, too. With 20 cards, each of which is told to render 1/20th of the screen, some of the cards would be rendering the top portion of the screen, where there is typically little or no work to do, while other cards would be rendering the center of the screen, where most of the work occurs.

The real question is, of course, the benefit of 2 vs. 4 cards (not 20, a number which was picked for illustrative purposes only). However, since there are diminishing returns in all cases, quad SLi would never approach 400% efficiency in a real game: each additional card will make less and less of an improvement. The information required to render a frame is still tied to input, a fact which will never change. Expecting 4x 8600GTS to equal 2x 8800GT would probably be a pipe dream, even if the theoretical maximum rendering rates were comparable.

IOW, one could easily create a contrived example in which quad SLi will hit 400% efficiency over a single card of the same type, but the contrived example would never apply to games where input must be tied to frame rendering. Conversely, I can see how quad or octal-SLi would be extremely effective in a render farm because the frames are being rendered as fast as possible from predefined scripts, not dynamic human input.

The biggest problem is compatibility. nVidia and ATI have enough trouble getting (and keeping!) two-GPU solutions working with the most popular games. Quad-SLi would be worse off (in terms of driver problems) even if it became as officially-supported as two-card SLi.

Changing the DX/OGL specifications (or the hardware implementation) to make games more compatible with multi-GPU rendering would be a great first step, IMO. That's why I am interested in the new multi-GPU approaches, which could allow for greatly increased compatibility (since frame element distribution and composition would be directed at the hardware level rather than at the software level).

EDIT: Changed wording to make this post readable. :p
 

Dream Operator

Senior member
Jan 31, 2005
344
0
76
Originally posted by: taltamir
Originally posted by: tanishalfelven
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

isn't that kind of a waste. i mean what if the user turns or fires. all those extra frame are wasted unless the game can predict what the gamer will do.

Consider, if you will, that a normal playable framerate is 30 frames per SECOND. And that the ideal (and max on most monitors) is either 60 or 75.

So if you are 3 frames ahead and fire a weapon then frames aren't wasted, you just experience a 1/10 of a second delay before the animation kicks in.



LOL. That was funny. 30 fps is normal? Maybe for you. Normal should only be defined by one's setup and the settings in the games they play. I might keep my details down so that my norm is 60 (my 7800GT can't keep up with high settings any more!). If someone owns two 7950GX2s, I doubt they are getting 30 fps in any game, except Crysis of course (let's not even get started on that).

Monitors do not work in FPS. They do have a refresh rate, and you can sync to that at your discretion. I have two 22" CRTs that go up to 120 Hz. I can have a vsync'd rate of 120 fps, if everything is in order for that.
 

aka1nas

Diamond Member
Aug 30, 2001
4,335
1
0
Originally posted by: GundamSonicZeroX
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

I thought that in SLI, the screen is split into two (or in this case four) and one card renders the top half of the screen (or in this case, two) and the other renders the bottom half of the screen (again, two in this case.)

In SFR, the individual frame is analyzed for complexity and portions are assigned to the cards in an attempt to load-balance equally. In theory, this should give you better scaling than AFR, but the analysis algorithm has much higher overhead and many games don't handle SFR well.

IME, most of the time when a game has SLI issues, Nvidia's fix is to issue a new profile that forces AFR2, which is AFR with some game-specific compatibility flags.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
Originally posted by: nullpointerus
Originally posted by: taltamir
Originally posted by: tanishalfelven
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

isn't that kind of a waste. i mean what if the user turns or fires. all those extra frame are wasted unless the game can predict what the gamer will do.

Consider, if you will, that a normal playable framerate is 30 frames per SECOND. And that the ideal (and max on most monitors) is either 60 or 75.

So if you are 3 frames ahead and fire a weapon then frames aren't wasted, you just experience a 1/10 of a second delay before the animation kicks in.

You and tanishalfelven appear to have two different definitions for the phrase "rendering frames ahead": 1) the extra frames are what the game guesses will happen at some point in the future, so they are discarded (i.e. never shown on the screen) whenever the game guesses wrong; and 2) the extra frames are synchronized with the game engine, so they will be rendered whenever the monitor gets around to it (if v-sync is on), but input is desynchronized so that the keyboard/mouse commands to fire the weapon will not actually be considered until the latent frames have been rendered.

Either case would appear to have diminishing returns. If you had 20 cards, each rendering ahead, then, when a weapon is fired that would either be [case 1]: 19/20 (95%) of your graphics processing power wasted, or [case 2]: input lag of 19 frames (1/3 of a second). But both these cases are for AFR.

SFR would appear to have diminishing returns, too. With 20 cards, each of which is told to render 1/20th of the screen, some of the cards would be rendering the top portion of the screen, where there is typically little or no work to do, while other cards would be rendering the center of the screen, where most of the work occurs.

The real question is, of course, the benefit of 2 vs. 4 cards (not 20, a number which was picked for illustrative purposes only). However, since there are diminishing returns in all cases, quad SLi would never approach 400% efficiency in a real game: each additional card will make less and less of an improvement. The information required to render a frame is still tied to input, a fact which will never change. Expecting 4x 8600GTS to equal 2x 8800GT would probably be a pipe dream, even if the theoretical maximum rendering rates were comparable.

IOW, one could easily create a contrived example in which quad SLi will hit 400% efficiency over a single card of the same type, but the contrived example would never apply to games where input must be tied to frame rendering. Conversely, I can see how quad or octal-SLi would be extremely effective in a render farm because the frames are being rendered as fast as possible from predefined scripts, not dynamic human input.

The biggest problem is compatibility. nVidia and ATI have enough trouble getting (and keeping!) two-GPU solutions working with the most popular games. Quad-SLi would be worse off (in terms of driver problems) even if it became as officially-supported as two-card SLi.

Changing the DX/OGL specifications (or the hardware implementation) to make games more compatible with multi-GPU rendering would be a great first step, IMO. That's why I am interested in the new multi-GPU approaches, which could allow for greatly increased compatibility (since frame element distribution and composition would be directed at the hardware level rather than at the software level).

EDIT: Changed wording to make this post readable. :p

Excellent and very accurate analysis.

Which is why SLI / Xfire are such bad things... no matter how you go about it, it is hugely inefficient. It makes much more sense to have MCM chips, each one contributing its stream processors and so on to render the same frame: rendering one frame at a time with multiple chips WITHOUT splitting the frame OR the RAM. It would basically be a multi-core process... which it already is... 320 stream processors are basically 320 cores, each one a very simple core, but nonetheless each one performs individual calculations.

As far as those different "possibilities" go: all are used to some degree; those are the different "modes" in which SLI / Xfire can work.
The best way would probably be a combination of discarding frames and introducing lag, where a permissible lag of 1/30th of a second is allowed and any frames beyond it are discarded and re-rendered. That is the most sensible, balanced approach... but it is still wasteful, AND it requires much more programming than a simple, straightforward mode, meaning it is much less likely to exist.
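
A rough C++ sketch of that hybrid policy as I read it (the render rate and queue depth are assumed values, just to show how the lag budget splits kept vs. discarded frames):

// Toy sketch: queued frames are kept while they fit inside a 1/30 s lag
// budget; anything older gets discarded and re-rendered.
#include <cstdio>

int main() {
    const double lagBudget = 1.0 / 30.0; // permissible lag in seconds
    const double fps = 60.0;             // assumed render rate
    const int queued = 4;                // frames sitting in the AFR queue

    int kept = 0, discarded = 0;
    for (int i = 1; i <= queued; ++i) {
        double age = i / fps;            // how stale this queued frame would be when shown
        if (age <= lagBudget) ++kept; else ++discarded;
    }
    std::printf("keep %d queued frame(s), discard and re-render %d\n", kept, discarded);
    return 0;
}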


@Dream Operator

I thought it was perfectly clear that I am saying 30 is the minimum, not the exact rate at which people play.

Also, normal = average, and most people DO play at around 30 fps... some people play at 15 fps just to get extra eye candy, some people lower the settings until they get 60...
But the average person plays at 30-40 fps.
 
Dec 30, 2004
12,553
2
76
Originally posted by: nullpointerus
Originally posted by: taltamir
Originally posted by: tanishalfelven
Originally posted by: taltamir
From what I read quad SLI is not practical before DX10 due to limitations on how many frames ahead you can render. With DX10 you can render more frames ahead, thus allowing each card to work on a different frame without having to wait for other cards to finish.

isn't that kind of a waste. i mean what if the user turns or fires. all those extra frame are wasted unless the game can predict what the gamer will do.

Consider, if you will, that a normal playable framerate is 30 frames per SECOND. And that the ideal (and max on most monitors) is either 60 or 75.

So if you are 3 frames ahead and fire a weapon then frames aren't wasted, you just experience a 1/10 of a second delay before the animation kicks in.

You and tanishalfelven appear to have two different definitions for the phrase "rendering frames ahead": 1) the extra frames are what the game guesses will happen at some point in the future, so they are discarded (i.e. never shown on the screen) whenever the game guesses wrong; and 2) the extra frames are synchronized with the game engine, so they will be rendered whenever the monitor gets around to it (if v-sync is on), but input is desynchronized so that the keyboard/mouse commands to fire the weapon will not actually be considered until the latent frames have been rendered.

Either case would appear to have diminishing returns. If you had 20 cards, each rendering ahead, then, when a weapon is fired that would either be [case 1]: 19/20 (95%) of your graphics processing power wasted, or [case 2]: input lag of 19 frames (1/3 of a second). But both these cases are for AFR.

SFR would appear to have diminishing returns, too. With 20 cards, each of which is told to render 1/20th of the screen, some of the cards would be rendering the top portion of the screen, where there is typically little or no work to do, while other cards would be rendering the center of the screen, where most of the work occurs.

The real question is, of course, the benefit of 2 vs. 4 cards (not 20, a number which was picked for illustrative purposes only). However, since there are diminishing returns in all cases, quad SLi would never approach 400% efficiency in a real game: each additional card will make less and less of an improvement. The information required to render a frame is still tied to input, a fact which will never change. Expecting 4x 8600GTS to equal 2x 8800GT would probably be a pipe dream, even if the theoretical maximum rendering rates were comparable.

IOW, one could easily create a contrived example in which quad SLi will hit 400% efficiency over a single card of the same type, but the contrived example would never apply to games where input must be tied to frame rendering. Conversely, I can see how quad or octal-SLi would be extremely effective in a render farm because the frames are being rendered as fast as possible from predefined scripts, not dynamic human input.

The biggest problem is compatibility. nVidia and ATI have enough trouble getting (and keeping!) two-GPU solutions working with the most popular games. Quad-SLi would be worse off (in terms of driver problems) even if it became as officially-supported as two-card SLi.

Changing the DX/OGL specifications (or the hardware implementation) to make games more compatible with multi-GPU rendering would be a great first step, IMO. That's why I am interested in the new multi-GPU approaches, which could allow for greatly increased compatibility (since frame element distribution and composition would be directed at the hardware level rather than at the software level).

EDIT: Changed wording to make this post readable. :p

SFR really could be a solution. All you have to do is apply a load-balancing algorithm that looks at which parts of the screen are taking the longest to render and gives a smaller section of the screen to that card. Give the cards dealing with the top part of the screen more to chew on, since sky isn't hard to render. From a programming perspective, you could probably guess with ~90% accuracy (made-up stat) which part of the screen on the next frame is going to take the longest to render.
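
Something along those lines could look like this toy C++ sketch (not a real driver algorithm; the per-band costs are made up):

// Take each band's measured cost per scanline from the previous frame and
// hand out lines in inverse proportion, so every GPU gets roughly equal
// predicted work. A real balancer would re-measure and refine every frame.
#include <cstdio>
#include <vector>

int main() {
    const int screenHeight = 1200;
    // ms per scanline measured in each GPU's band last frame, top band first:
    // the sky band is cheap, the middle of the scene is expensive.
    const std::vector<double> costPerLine = {0.001, 0.004, 0.006, 0.003};

    double weightSum = 0.0;
    for (double c : costPerLine) weightSum += 1.0 / c;

    for (int i = 0; i < static_cast<int>(costPerLine.size()); ++i) {
        int lines = static_cast<int>(screenHeight * (1.0 / costPerLine[i]) / weightSum);
        std::printf("GPU %d: %4d lines, predicted %.2f ms\n",
                    i, lines, lines * costPerLine[i]);
    }
    return 0;
}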
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
The real problem is that things "throw" their effects across the picture... if you have a light source at the bottom and a reflective surface at the top, then the cards still somehow have to know about each other's half... either by communicating it or by rendering parts of the other half anyway.
So the video card rendering the bottom has to plot trajectories of things passing through the top, reflections coming off the top, and light from sources in the top... thus there is a lot of duplicated work.

With alternate frames you don't get any duplication of work... but you get discarded frames, lag, or BOTH.

The only thing that makes sense is a modular chip... where you don't have extra GPUs, but dies made of nothing but stream processors and the like, all controlled by a single GPU.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,002
126
In SFR, the individual frame is analyzed for complexity and portions are assigned to the cards in an attempt to load-balance equally. In theory, this should give you better scaling than AFR,
No, AFR always has theoretically better performance because it's the only mode that doubles vertex performance. SFR style rendering duplicates the vertex load across all GPUs.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
In SFR, the individual frame is analyzed for complexity and portions are assigned to the cards in an attempt to load-balance equally. In theory, this should give you better scaling than AFR,

So, let me see if I understand... rather than having each frame rendered by a different card, having the cards perform calculations to analyze the frame and decide which card performs what calculations gives you BETTER performance? What is this amazing analysis algorithm that takes a negative amount of processing power to run? Can it be used to make a time machine?