
NVIDIA NV30 & NV40 Scoopage: Details

I was thinking of three full-fledged GPUs (similar to the Voodoo 5000), which would be crazy, so I assumed he meant only one. If you're talking about spreading a single GPU across many chips, I guess that would work. I don't know why you would, though; it makes the board harder to design, and there would be more overhead... it just doesn't make any sense TO ME. My $0.02

It sounds like the speculation is that portions of the GPU would benefit from parallelism while others would not. So you could realize a massive reduction in transistors by not duplicating something six times when you really only need one.

This sounds reasonable, TO ME: it's a lot simpler to design a board when you increase the number of chips by 25% (from 4 to 5, for example) yet reduce the complexity of EACH chip by 25% (or 33%, or 50%, or whatever you save by breaking out the non-choke-point elements).

But what do we know? I definitely don't have hands-on experience with chip or board design; do you? All I know is that increasing processing power forever will occasionally require innovative shifts in design paradigms... Obviously multi-chip designs *are* a well-accepted design evolution, as evidenced by the fact that Cray can hardly sell any supercomputers, while purchasers are jumping on designs involving thousands of off-the-shelf Pentium III processors.
 
Let's just continue with that line of reasoning: there is probably no reason why all three cores would have to be identical in a 3-chip design. The main core could be a centralized control center for sharing memory and duplicating data between cores, plus it could be the sole renderer.

Let's look at a 3-chip design as follows:

Chip "A" = Central control point and the sole output chip
Chip "B" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
Chip "C" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering

Scene "1" could be prepped by chip "B", then sent back to be rendered by the main chip "A". The next scene in line, scene "2", is simultaneously prepped for rendering by chip "C". As scene "1" hits the output, scene "2" is already in the chip "A" buffer. Chips "B" and "C" would then grab scenes "3" and "4" and continue the process. Chip "A" would never prep any scene; it would only render scenes after they are prepped by the auxiliary cores.

A 5-chip design would be as follows:

Chip "A" = Central control point and the sole output chip
Chip "B" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
Chip "C" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
Chip "D" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
Chip "E" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
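The round-robin handoff described above can be sketched in a few lines of Python. This is purely illustrative: the chip letters and alternating scheme come from the post, while the function and its parameters are invented here, not anything from an actual design.

```python
# A toy sketch of the scheduling described above: peripheral cores take
# turns prepping scenes while chip "A" renders them in order.
def pipeline(num_prep_chips, num_frames):
    """Return (scene, prep_chip, render_chip) tuples for each frame."""
    prep_chips = [chr(ord("B") + i) for i in range(num_prep_chips)]
    schedule = []
    for scene in range(1, num_frames + 1):
        # Scenes rotate round-robin among the peripheral cores;
        # chip "A" never preps, it only renders.
        prepper = prep_chips[(scene - 1) % num_prep_chips]
        schedule.append((scene, prepper, "A"))
    return schedule

# The 3-chip design: chips "B" and "C" alternate prep duty.
for scene, prep, render in pipeline(2, 4):
    print(f'scene {scene}: prepped by "{prep}", rendered by "{render}"')
```

Swapping `pipeline(2, ...)` for `pipeline(4, ...)` gives the 5-chip schedule, with chips "B" through "E" rotating in turn.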
 
Originally posted by: rahvin
Rampage was going to be a two-chip production, with the use of any combination of chips to generate a board. 3dfx designed Rampage to be a pixel/texel engine (Rampage) plus a programmable geometry engine (Sage). The board combinations would supposedly have been Rampage, Rampage + Sage, and 2x Rampage + Sage. 3dfx hinted they could run this out almost ad infinitum, such as a board with X Rampages and Y Sages.

That sounds an awful lot like the 3DLabs GLINT boards of five years ago: one (or more) rendering processors, with an additional geometry processor onboard. (Plus an optional VGA core, to boot and run DOS with.)


 
Originally posted by: 7757524
Take a look at a video card. Try to imagine 3 GPUs on it. That's insane. The amount of power, heat, and noise would be unmanageable, not to mention the price. When he says that it will be an odd number lower than 8, he's letting us know that there will "only" be 1 GPU. 3 is just impractical. Common sense. Since NVIDIA is going to .13 and ATI is sticking with .15, NVIDIA has no reason to go up to two GPUs. They can easily hit higher clock frequencies.

You've obviously never had a professional-level 3DLabs card installed, then. Multiple "GPUs", an onboard dedicated geometry processor, scads of memory: VRAM for the frame/Z/stencil buffer, some form of DRAM for the textures and other data. The one I used (briefly) was a full-length PCI card. If you can imagine, yes, it was huge, with arrays of memory chips on both sides of the board. Drivers were only available for NT; no Win9x support at all. Not practical for a consumer-level 3D card, but boy did it scream on OpenGL apps.
 
Originally posted by: MadRat
Let's just continue with that line of reasoning: there is probably no reason why all three cores would have to be identical in a 3-chip design. The main core could be a centralized control center for sharing memory and duplicating data between cores, plus it could be the sole renderer.

Let's look at a 3-chip design as follows:

Chip "A" = Central control point and the sole output chip
Chip "B" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
Chip "C" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering

Scene "1" could be prepped by chip "B", then sent back to be rendered by the main chip "A". The next scene in line, scene "2", is simultaneously prepped for rendering by chip "C". As scene "1" hits the output, scene "2" is already in the chip "A" buffer. Chips "B" and "C" would then grab scenes "3" and "4" and continue the process. Chip "A" would never prep any scene; it would only render scenes after they are prepped by the auxiliary cores.

A 5-chip design would be as follows:

Chip "A" = Central control point and the sole output chip
Chip "B" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
Chip "C" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
Chip "D" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering
Chip "E" = Peripheral core; preps scene and then sends it back to Chip "A" for rendering

That's pretty much right, except exactly backwards. The "prep scene" chip would be the geometry processor, performing T&L and clipping, and it would pass the triangle data to the cluster of rendering chips; rendering is the job that can really be done in parallel, memory bandwidth allowing.

There are three ways to handle parallel rendering, from best to worst: tile-based, scanline interleaving, and frame-sequential interleaving. (The last being the method employed by ATI's ill-fated Rage 128 MAXX card. The reason it didn't work well, other than the driver complexity, is that the method introduces a frame of latency that the other methods avoid.) Tile-based rendering is used by the PowerVR/Kyro chips and in Sega's Dreamcast and Naomi arcade systems. In the arcade systems, they can run multiple graphics boards in parallel and link them all together to generate a display; each board can have multiple rendering processors as well. Generally, tile-based renderers each process the entire triangle list individually and clip to their own 2D tile viewport. Scanline-interleave renderers effectively process the same geometry list but only render every Nth line, and they can all work in parallel as well.
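The scanline-interleaving scheme above is easy to sketch in code. This is a toy illustration; the function name and parameters are made up here, not taken from any real hardware.

```python
# Toy sketch of scanline interleaving: with N rendering chips, chip k
# handles every Nth scanline starting at line k.
def scanlines_for_chip(chip_index, num_chips, frame_height):
    """Scanlines a given rendering chip is responsible for."""
    return list(range(chip_index, frame_height, num_chips))

# Two chips splitting an 8-line frame: chip 0 takes the even lines,
# chip 1 the odd ones, and together they cover every line exactly once.
print(scanlines_for_chip(0, 2, 8))  # [0, 2, 4, 6]
print(scanlines_for_chip(1, 2, 8))  # [1, 3, 5, 7]
```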

 
<<The "prep scene" chip, would be the geometry processor, performing T&L and clipping, and it would pass the triangle data to the cluster of rendering chips, which is the job that can really be done in parallel, memory bandwidth allowing.>>

Are the geometry, T&L, and clipping stages all that "processor intensive", such that multiple cores would be an efficient way to increase overall performance?
 
For the most part, it'd be more work than it's worth to run the T&L on more than one processor. The two would have to figure out a way to split the scene, and stay synced not only with each other but with the CPU. And god knows how it would get into memory after that.

Although I'm no graphics chip designer, so I could be wrong.

What is easy to split up are the rendering units.

Although, how can a chip do proper AA when it's only rendering half the lines? To get the data for the AA'd pixels, won't it have to render the whole scene anyway?
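The concern above can be illustrated with a toy 2x vertical supersampling filter: each output row averages two adjacent sample rows, so a chip holding only every other scanline would be missing half the samples it needs. Purely illustrative; no real chip's AA works exactly like this.

```python
# Toy 2x vertical supersampling downfilter: averages each pair of
# adjacent sample rows into one output row. A chip that rendered only
# every other line would lack one row of each pair.
def downsample_2x_vertical(sample_rows):
    """Average each pair of adjacent sample rows into one output row."""
    return [
        [(a + b) / 2 for a, b in zip(sample_rows[i], sample_rows[i + 1])]
        for i in range(0, len(sample_rows), 2)
    ]

samples = [[0, 0], [2, 4], [6, 6], [8, 10]]
print(downsample_2x_vertical(samples))  # [[1.0, 2.0], [7.0, 8.0]]
```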
 
All I have to say is that 3dfx's engineering didn't die with them; for that, I'm glad.

The Voodoo5 5500 AGP was a victim of its late release; with a bigger driver team and faster R&D, it woulda stomped everything that was out if it had dropped on time.

As for the 6000 AGP, they shoulda released it, but the chip shortage killed them.
 