Video card productivity

AdvancedRobotics

Senior member
Jul 30, 2002
I'd like to keep things on topic here. So if you don't fully understand what I am asking, do not answer.

Now it is one thing to look at a processor's specs, its clock speed, its cache sizes, etc., and then look at some SYSmark tests and others to see which is faster. With the whole AMD vs. Intel thing, clock speed means nothing. (I'm not going any further into this, because this isn't my point; if it were, I wouldn't be posting in the video forum).

Now there are obviously some very distinct differences between your processor and your video card. First, your processor controls everything and, to put it simply, tells the other components what to do. It controls the bus speeds and how fast applications will run. The CPU works with the motherboard, the RAM (DDR, SDR, or Rambus), and the hard drive. The video card, on the other hand, handles the graphics. The CPU gives the instructions; the graphics card renders the scenes and the extras (lighting, filtering, anti-aliasing, shading, textures, etc.). It runs on its own board, with its own independent RAM and its own processor clock. Two different components with two completely different jobs. The CPU is general-purpose hardware, whereas the video card is dedicated to rendering scenes (and the sound card does sound, the ethernet card does networking, etc.). You may be wondering where I am going with this, but it'll hopefully make sense.

The video card, unlike the processor, has unique capabilities and aspects that control the graphical side of the whole computer. It has occlusion culling, pixel/vertex shaders, lighting, pixel pipelines, bandwidth, fillrate, etc. But bandwidth and clock speed aren't everything; technology plays a big role in performance. A good example is the Kyro II. It has a clock speed similar to the MX, but performs near the GTS/Pro level. A good reason is the fact that it uses a form of deferred rendering (tile-based rendering, TBR) and not the immediate-mode rendering (IMR) the GeForce cards utilize. So then we get into the comparison between the 9000 and the 8500. Looking at specs alone, the 9000 should be better. But the 9000 is basically an 8500MX: an 8500 with a stripped feature set.

You can base CPU performance on clock speed and clock-cycle effectiveness (IPC), but with video cards it is much more complicated. So here is where I ask (and quit talking :)). Obviously, feature set has a lot to do with things, but what do all these numbers and features really do? How do these pixel pipelines and texture units really work? How many passes are made, and how quickly do they act? How many textures are in a game, and how does it relate to how many texture units the video card has? And where does the RAM come in? Does it store information (textures, etc.) for whatever applications it is doing? And what about fillrate and memory bandwidth? Memory bandwidth == how fast information travels? Fillrate == Mpixels? What do these numbers refer to, and how do they play a part in games?

Now for technology. Why do no other cards utilize the tile-based rendering method that the Kyro II uses? Why does the Radeon use its hierarchical-Z buffer? And what about vertex/pixel shaders? Obviously, they shade, but how fast and how much, and how do they calculate what they are supposed to shade? (The T in T&L): why is there a whole engine behind this, and why do they group the T and the L together? How does manipulating objects in 3D space (T) relate to lighting (L)? And, like shading, how does the graphics card determine what it has to light up in the scene?

And another thing: why does no one use RGSS (like the Voodoo 4 and 5)? I read the FAQ, and it states, "2X MSAA utilizes a rotated grid like RGSS which results in edge anti aliasing comparable to that of RGSS". Now, that is somewhat like RGSS, but it isn't the real deal. Will any cards use RGSS fully, or will it be forgotten along with the Voodoo cards?

Well, that seems to be it. If you can't give me a straight, in-depth answer to this, please... don't. Links are good, but just keep in mind, I don't feel like reading 50 pages of stuff (who would :D).
 

Goi

Diamond Member
Oct 10, 1999
That's a lot of questions! I'll just answer those I know and leave those I'm not so sure of to others.

How do these pixel pipelines and texture units really work?
A pixel pipeline is much like a CPU pipeline. It does operations on pixels in multiple stages, and the final product is a rendered pixel on the screen. Texture units are closely related to pixel pipelines in that they are used in the pixel pipeline to apply a texture to a pixel.

How many textures are in a game, and how does it relate to how many texture units the video card has?
Most early 3D games were single-textured or had no textures at all. Then, as more and more effects (lighting, bump mapping, etc.) were introduced, dual texturing came to stay. It relates to the video card in the sense of speed/performance. A dual-pipeline card with 1 texturing unit per pipeline would be able to render a dual-textured pixel, or 2 single-textured pixels, in a single clock cycle. A single-pipeline card with 2 texturing units would also be able to render a dual-textured pixel in a single clock cycle, but only 1 single-textured pixel per clock cycle, because in this case it's unable to utilize its 2nd texturing unit. So, theoretically, while both above-mentioned designs have the same texel fillrate, they have different pixel fillrates, and the first design is arguably more efficient. However, very few games are single-textured nowadays, so performance would be about the same in both designs, all other things being equal.
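
To make that concrete, here's a toy Python model of the two designs (my numbers, not anything from a real chip; real hardware has loopback and scheduling quirks this ignores):

```python
import math

def clocks_for_batch(pixels, textures_per_pixel, pipelines, tmus_per_pipe):
    """Clocks to render `pixels` pixels, each sampling `textures_per_pixel`
    textures, on a chip with `pipelines` pixel pipelines that each have
    `tmus_per_pipe` texture units (extra textures cost extra passes)."""
    passes = math.ceil(textures_per_pixel / tmus_per_pipe)
    return math.ceil(pixels / pipelines) * passes

# Design A: 2 pipelines x 1 TMU each; Design B: 1 pipeline x 2 TMUs
for label, pipes, tmus in [("A (2x1)", 2, 1), ("B (1x2)", 1, 2)]:
    single = clocks_for_batch(1000, 1, pipes, tmus)
    dual = clocks_for_batch(1000, 2, pipes, tmus)
    print(f"{label}: {single} clocks single-textured, {dual} clocks dual-textured")

# A (2x1): 500 clocks single-textured, 1000 clocks dual-textured
# B (1x2): 1000 clocks single-textured, 1000 clocks dual-textured
```

Same dual-textured throughput, but design A finishes single-textured work in half the clocks, which is the "arguably more efficient" point above.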

And where does the RAM come in? Does it store information (textures, etc.) for whatever applications it is doing?
Sort of. The video RAM stores the framebuffer (the final rendered image), the Z-buffer (per-pixel depth information), and the textures, among other things. There are also small on-chip caches to help out, just like the L1 and L2 caches that help out main memory on a CPU.
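
Just to give a feel for the sizes involved, here's a quick estimate of what the framebuffer and Z-buffer alone eat (my assumptions: 32-bit color, 32-bit Z, double buffering; textures come on top of this):

```python
def buffer_vram_mb(width, height, color_bytes=4, z_bytes=4, buffers=2):
    """Rough VRAM cost of the color buffers plus one Z-buffer, in MB."""
    framebuffers = width * height * color_bytes * buffers
    zbuffer = width * height * z_bytes
    return (framebuffers + zbuffer) / (1024 * 1024)

print(round(buffer_vram_mb(1024, 768), 1))   # ~9.0 MB
print(round(buffer_vram_mb(1600, 1200), 1))  # ~22.0 MB of a 64 MB card
```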

And what about fillrate and memory bandwidth? Memory bandwidth == how fast information travels? Fillrate == Mpixels? What do these numbers refer to, and how do they play a part in games?
Fillrate is traditionally measured 2 ways: texel fillrate and pixel fillrate. Texel fillrate loosely measures the number of textured pixels rendered per second. Pixel fillrate measures the number of pixels rendered per second, regardless of the number of textures applied per pixel.
So, take the GF2 through GF4 architecture for example. They have 4 pixel pipelines, each with 2 texturing units. Therefore, if you clock them at 200MHz, the texel fillrate will be 4x2x200MHz = 1600MTexels/s, and the pixel fillrate will be 4x200MHz = 800MPixels/s. The original Radeon had 2 pixel pipelines, each with 3 texturing units. A similarly clocked Radeon will therefore have a texel fillrate of 2x3x200MHz = 1200MTexels/s and a pixel fillrate of 2x200MHz = 400MPixels/s.
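The arithmetic is simple enough to put in a few lines of Python (same numbers as above):

```python
def fillrates(pipelines, tmus_per_pipe, core_mhz):
    pixel_fill = pipelines * core_mhz                  # MPixels/s
    texel_fill = pipelines * tmus_per_pipe * core_mhz  # MTexels/s
    return pixel_fill, texel_fill

print(fillrates(4, 2, 200))  # GF2-style: (800, 1600)
print(fillrates(2, 3, 200))  # original Radeon-style: (400, 1200)
```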
Memory bandwidth is the rate at which data can be transferred to and from the memory, i.e. the video RAM. It is measured in MB/s or GB/s. Take for example the GF2 through GF4 architectures. They have 128-bit DDR SDRAM. 128-bit is the memory width, i.e. the number of lines to the memory. 128 bits is 16 bytes (8 bits = 1 byte). DDR SDRAM is simply SDRAM that can be accessed on both the rising and falling edges of the memory clock. So, a card using this memory architecture with the memory running at 200MHz would yield a memory bandwidth of (16 bytes) x (2 transfers/clock cycle) x 200MHz = 6.4GB/s.
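Same deal in code (again, just the arithmetic from above):

```python
def mem_bandwidth_gb_s(bus_bits, mem_mhz, transfers_per_clock):
    bytes_per_transfer = bus_bits / 8        # 128 bits -> 16 bytes
    mb_s = bytes_per_transfer * transfers_per_clock * mem_mhz
    return mb_s / 1000                       # MB/s -> GB/s

print(mem_bandwidth_gb_s(128, 200, 2))  # 128-bit DDR @ 200MHz -> 6.4 GB/s
```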
The way they play their part in games is obviously in the performance department. The higher the fillrate, the more pixels you can render, all other things being equal. The same goes for memory bandwidth. Of course, if you hold one constant and keep increasing the other, you're going to hit a saturation point, since they're dependent on each other. For example, as you keep increasing fillrate, you can render more and more Gpixels/s, i.e. you can also increase your resolution, increase texture quality, apply anti-aliasing, etc., while maintaining the same speed. However, once you start doing this, you need more and more memory bandwidth to keep up with the transfer of larger textures, higher resolutions, and multiple samples. The other dependent factor is CPU speed. The CPU must be able to keep up by sending information to the GPU at a fast enough rate for it to render, or else the GPU will just sit there waiting.
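
To see roughly where the saturation point lands, here's a very crude demand estimate. The per-pixel byte counts, overdraw factor, and sample counts are my guesses for illustration; it ignores texture traffic and caching entirely:

```python
def frame_bandwidth_gb_s(width, height, fps, overdraw=2.5, aa_samples=1):
    """Rough framebuffer traffic: color write + Z read/write per pixel."""
    bytes_per_pixel = 4 + 4  # 32-bit color + 32-bit Z (simplified)
    per_frame = width * height * aa_samples * overdraw * bytes_per_pixel
    return per_frame * fps / 1e9

print(round(frame_bandwidth_gb_s(1024, 768, 60), 2))    # ~0.94 GB/s
print(round(frame_bandwidth_gb_s(1600, 1200, 60), 2))   # ~2.3 GB/s
print(round(frame_bandwidth_gb_s(1600, 1200, 60, aa_samples=4), 2))  # ~9.2 GB/s

# The 4x-supersampled case blows past the 6.4 GB/s computed earlier:
# at that point extra fillrate can't help, because memory is the bottleneck.
```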

Why do no other cards utilize the tile-based rendering method that the Kyro II uses?
TBR is an interesting and innovative way to reduce memory bandwidth requirements; however, it's not the be-all and end-all. There are several limitations to the method, many of which the Kyro faces. The Kyro's architecture doesn't allow for high clock speeds, which is why you still see it clocked so low. Also, API support wasn't great in the beginning, even though it has received more support recently. Other GPU manufacturers would also need to completely redesign their chips in order to utilize a similar Kyro-like deferred rendering architecture, and that takes time, money, and manpower. The potential benefits probably don't outweigh the costs/disadvantages.
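
In case it helps, here's a toy illustration of the tile-based idea in Python (not the Kyro's actual pipeline, just the binning concept): instead of rendering each triangle straight to the framebuffer in VRAM like an IMR, the scene is first sorted into small screen tiles, and each tile is then rendered completely in fast on-chip memory, so external memory only sees one final write per pixel.

```python
TILE = 32  # hypothetical tile size in pixels

def bin_triangles(triangles):
    """triangles: list of screen-space bounding boxes (x0, y0, x1, y1)."""
    tiles = {}
    for tri in triangles:
        x0, y0, x1, y1 = tri
        for ty in range(y0 // TILE, y1 // TILE + 1):
            for tx in range(x0 // TILE, x1 // TILE + 1):
                tiles.setdefault((tx, ty), []).append(tri)
    return tiles  # each tile's list is then rendered entirely on-chip

print(sorted(bin_triangles([(10, 10, 50, 40)]).keys()))
# [(0, 0), (0, 1), (1, 0), (1, 1)] -- this triangle touches four tiles
```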

Why does the Radeon use its hierarchical-Z buffer?
Because they came up with it and it works. It's one of their own ways to decrease memory bandwidth requirements, and it works with their existing IMR architecture.
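
A sketch of the idea (my simplification, not ATI's exact hardware): alongside the full Z-buffer, keep one "farthest depth" value per block of pixels. If an incoming triangle's nearest depth over a block is still behind that value, the whole block is rejected without ever reading the per-pixel Z values from memory, which saves bandwidth.

```python
def block_may_be_visible(block_farthest_z, incoming_nearest_z):
    # Convention: smaller z == closer to the camera.
    return incoming_nearest_z <= block_farthest_z

print(block_may_be_visible(0.40, 0.75))  # False: fully hidden, skip the block
print(block_may_be_visible(0.40, 0.10))  # True: may be visible, test per pixel
```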

And what about vertex/pixel shaders? Obviously, they shade, but how fast and how much, and how do they calculate what they are supposed to shade?
Pixel and vertex shaders are another technology that trickled down from the high-end graphics market segment, AFAIK. They're exposed in DX8 and newer. Basically, they do more than shade pixels and vertices. They are small programs that perform operations on pixel and vertex data, such as changing color, position, etc. One reason to do this is to reduce the number of polygons and triangles needed to represent an object while maintaining the same final rendered image fidelity. Other than that, I'm not too familiar with them, so perhaps someone with more knowledge can add on if you want more detailed information.
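
If it helps, here's a toy version in plain Python (not real shader assembly) of the kind of per-vertex work a vertex shader does. It also hints at why T and L get grouped together: both the transform and the lighting happen per vertex, in the same stage.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec_mul(m, v):
    # m is a 4x4 matrix as a list of rows, v a length-4 vector
    return [dot(row, v) for row in m]

def toy_vertex_shader(position, normal, mvp, light_dir):
    """position/normal: length-3 lists; mvp: 4x4 transform matrix."""
    clip_pos = mat_vec_mul(mvp, position + [1.0])  # the "T": move into view
    brightness = max(0.0, dot(normal, light_dir))  # the "L": diffuse term
    return clip_pos, brightness

identity = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(toy_vertex_shader([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], identity, [0.0, 1.0, 0.0]))
# ([1.0, 2.0, 3.0, 1.0], 1.0) -- untransformed position, fully lit vertex
```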

And another thing: why does no one use RGSS (like the Voodoo 4 and 5)? I read the FAQ, and it states, "2X MSAA utilizes a rotated grid like RGSS which results in edge anti aliasing comparable to that of RGSS". Now, that is somewhat like RGSS, but it isn't the real deal. Will any cards use RGSS fully, or will it be forgotten along with the Voodoo cards?
RGSS is a form of supersampling (SSAA): the scene is sampled several times per pixel, with the sample positions arranged in a grid rotated relative to the screen axes, and the samples are then blended into the final anti-aliased image. The rotated pattern covers near-horizontal and near-vertical edges more finely than an ordered grid, which is why it looks so good. MSAA (multisample anti-aliasing), by contrast, takes extra coverage samples per pixel but shades each pixel only once, which makes it much cheaper but means it only smooths polygon edges, not textures.
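
To show why the rotated grid helps, compare the sub-pixel sample positions for 4x ordered-grid vs. 4x rotated-grid supersampling (the offsets below are the commonly cited pattern; the exact positions vary by card). Against a near-vertical edge, what matters is how many distinct horizontal positions the samples occupy:

```python
ordered_4x = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
rotated_4x = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

def distinct_x(pattern):
    # Distinct x-offsets = coverage levels available on a near-vertical edge.
    return sorted({x for x, _ in pattern})

print(distinct_x(ordered_4x))  # [0.25, 0.75] -> only 2 coverage levels
print(distinct_x(rotated_4x))  # [0.125, 0.375, 0.625, 0.875] -> 4 levels
```

More coverage levels means the edge is graded more finely, so the rotated grid looks smoother on the near-axis edges where aliasing is most visible.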

Well, that seems to be it.
*phew* That's all? Thank god! :)