VirtualLarry
No Lifer
- Aug 25, 2001
There's nothing intrinsically wrong with a multi-GPU solution. In fact, the older, higher-end 3DLabs workstation cards, the full-length PCI ones with insane (at the time) amounts of memory (like 128MB, when most PCI desktop cards had 2MB or 4MB), had multiple rendering units (implementing most of the OpenGL pipeline in hardware), and some had a separate geometry processor besides (and you thought that hardware T&L was something new, with DirectX 7 and the introduction of the NVidia GeForce). They were a fairly successful multi-GPU accelerator solution, if only for OpenGL applications.
The problem with the MAXX was that ATI used the two GPUs to render successive full frames, one frame per chip. That introduces significant latency and has "render ahead" issues. In fact, in order to obtain higher benchmark scores on modern triple-buffered, vsync-enabled cards, most drivers these days also implement render-ahead. However, that can also create wasted work if the scene changes significantly and the advance-rendered frames have to be discarded. Plus, this is ATI that we are talking about, with their Rage128 chips and their, uhm, "stellar" drivers.
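Just to illustrate the point (this is a toy sketch, not anything resembling ATI's actual driver code), here's roughly where the latency and the wasted work come from when you alternate whole frames between two chips and let the driver queue frames ahead; the RENDER_AHEAD depth and the frame count are made-up numbers:

```python
# Toy model of alternate-frame rendering with a render-ahead queue.
# Successive frames go to alternating GPUs; a frame is only displayed
# once everything queued ahead of it has drained, which is the latency.
from collections import deque

RENDER_AHEAD = 2          # frames the driver is allowed to queue in advance (illustrative)
gpus = [0, 1]             # two chips, MAXX-style

queue = deque()
for frame in range(8):
    gpu = gpus[frame % len(gpus)]      # successive full frames alternate chips
    queue.append((frame, gpu))
    if len(queue) > RENDER_AHEAD:
        shown, on_gpu = queue.popleft()
        print(f"frame {shown} shown (rendered on GPU {on_gpu}), "
              f"{len(queue)} frame(s) still in flight")

# If the scene changes abruptly at this point, every frame still sitting
# in the queue was rendered against stale input and gets thrown away.
print(f"discarded on scene change: {[f for f, _ in queue]}")
```

The frames still "in flight" are exactly the ones you eat as input lag in the best case, and as discarded work in the worst case.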
I think it says a lot that they were never able to create a working MAXX driver for NT-based OSes. Either that, or they realized it was an interim stop-gap solution, and that by the time an NT driver that could take advantage of the hardware was ready, such a multi-GPU design would have been obsolete and worthless, and thus a waste of precious software R&D funds for ATI.
I think that ATI would be much better off implementing some sort of screen-splitting/tiling arrangement, especially if they can implement hardware region-clipping somehow early in the pipeline. That way you could send the same geometry to both GPUs and have each render half of the screen (a left/right split would probably equalize the load the most), in half of the time, for nearly double the overall throughput. Whether they could share a single unified geometry/texture memory pool would probably depend more on memory cost vs. memory bandwidth. Needless to say, unless they implemented separate duplicate texture-memory pools for each GPU (significantly raising costs), I don't see high levels of AA/AF being useful on a multi-GPU card either. You gain some, you lose some, really.
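Here's the idea in toy form (again, just an illustrative sketch; the region bounds, triangle extents, and "GPU" labels are all made up, not any real ATI interface): both chips get the full geometry stream, but each one clips early to its half of the screen, so the pixel work is split while the geometry work is duplicated.

```python
# Toy model of split-frame rendering: same scene to both GPUs,
# each scissored to half the screen width.

WIDTH, HEIGHT = 1024, 768

def render_region(gpu_id, triangles, x0, x1):
    """Pretend-render: count triangles whose horizontal extent touches [x0, x1)."""
    work = sum(1 for (xmin, xmax) in triangles if xmax >= x0 and xmin < x1)
    print(f"GPU {gpu_id}: columns {x0}-{x1}, {work} triangles touched")
    return work

# Each triangle reduced to its horizontal extent for this toy example.
scene = [(0, 200), (300, 700), (500, 1023), (900, 1000)]

# Left/right split: both chips receive the whole scene but clip early,
# so each does roughly half the pixel work in the same wall-clock time.
left  = render_region(0, scene, 0, WIDTH // 2)
right = render_region(1, scene, WIDTH // 2, WIDTH)

# Note the duplicated effort: a triangle spanning the split line,
# like (300, 700), is processed by both chips.
```

The catch, as above, is that anything crossing the split line gets transformed and textured twice, and without a shared memory pool every texture has to live in both chips' memory.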
Btw, whatever happened to ATI supporting a triple-head display setup, where you used a dual-head ATI AGP card together with one of their recent core-logic chipsets with integrated video? Last I heard, that was scrapped.
PS. Rollo, that AT article that you linked to shows that ATI utterly failed to hit their market target with the MAXX board, because it cost nearly as much as the superior GeForce DDR cards while providing only slightly better performance than the GeForce SDR card it was meant to compete with. The cost of dual GPUs plus duplicated memory pools was just too much. (One more reason that the decision to undertake an SLI-style design should not be made lightly by a company in a highly-competitive market space.) Not to mention that it was last-gen technology that lacked T&L, while the GeForce was a new-gen design that supported it. Interestingly, the SLI-like Voodoo 5 series boards based on the VSA-100, IIRC, also lacked hardware T&L support. Clearly, the market chose which technology would prosper in the next generation of video accelerators. I wonder if SM3.0 or the DX-Next unified-SM will play out the same way?