Confusing NV40 article on pipelines

Marsumane

Golden Member
Mar 9, 2004
1,171
0
0
Can someone explain anything in this article about the NV40 pipeline? I really don't get what all of these components within the pipeline do, and the authors assume you already know certain things in order to follow their explanations of other things. If anyone at all (probably VERY few of you) knows what these components of the pipeline do, or what some of these non-trivial terms mean (except for obvious things like texture units), then please explain. If this is too vague, tell me and I'll be more specific.
Link:
http://www.3dcenter.org/artikel/nv40_pipeline/index_e.php
 

Gamingphreek

Lifer
Mar 31, 2003
11,679
0
81
They should have written things out and not used the acronyms. However, in order for someone to explain this to you, we need more information. What don't you understand about it? We need a more specific area, not just the article in general.

-Kevin
 

CaiNaM

Diamond Member
Oct 26, 2000
3,718
0
0
a "pipeline" is not a physical pipe, rather just a series of things that happen (processing instructions) in order to produce an image. thus, the more "pipelines", the more information that can be processed at the same time, in a single "clock cycle".

this alone, however, doesn't determine how "fast" a card is. some cards have faster clocks, so they get through more "clock cycles" in the same amount of time: a card may process fewer instructions per cycle but run more cycles per second.

there are also other parts of the gfx architecture which affect overall rendering speed (such as the memory architecture, among other things). one also has to consider the "power" within each pipeline when comparing different architectures.
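
a rough sketch of the pipelines-versus-clock trade-off described above. the function name and both example cards are made up for illustration, and it assumes the idealized case of one pixel per pipeline per clock, ignoring memory bandwidth and per-pipeline "power":

```python
# Illustrative only: the two example configurations are hypothetical, not real cards.
def peak_pixels_per_second(pipelines: int, clock_mhz: int) -> int:
    """Theoretical peak, assuming one pixel per pipeline per clock cycle."""
    return pipelines * clock_mhz * 1_000_000

# Many pipelines at a low clock versus few pipelines at a high clock.
wide_and_slow = peak_pixels_per_second(pipelines=16, clock_mhz=400)
narrow_and_fast = peak_pixels_per_second(pipelines=8, clock_mhz=800)

print(wide_and_slow)    # 6400000000
print(narrow_and_fast)  # 6400000000
```

both hypothetical cards land on the same theoretical peak, which is why neither pipeline count nor clock speed alone says how fast a card is.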
 

VIAN

Diamond Member
Aug 22, 2003
6,575
1
0
A pipeline, in its simplest explanation, is a way of doing a few things at once: namely fetching data, executing on it, and storing the result, all at the same time on different pieces of work. An example is an assembly line. You can divide those 3 basics into many smaller stages, and this lowers your instructions per clock (IPC). I'm not exactly sure how these divisions are made, however. To get a roundabout figure of performance, IPC is multiplied by cycles per second, or Hz. IPC is usually unknown to regular people, but it is related somehow to how many stages are in the pipeline.

This was just one pipeline. To get more performance, you can add more pipelines. A pipeline can be considered a chip, so a multi-pipeline chip can be considered a multi-core chip, just like what CPUs are starting to do now.

The article looks at some stages, or things that are performed within the pipeline. It gets very technical, so probably not made for many people to understand.
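
The assembly-line idea above can be sketched with a bit of arithmetic. This is an idealized model, assuming every stage takes exactly one clock and nothing ever stalls; the function names are made up for the example:

```python
def cycles_unpipelined(instructions: int, stages: int) -> int:
    """Each instruction runs through every stage before the next one starts."""
    return instructions * stages

def cycles_pipelined(instructions: int, stages: int) -> int:
    """Assembly line: once the line is full, one instruction finishes per clock."""
    if instructions == 0:
        return 0
    # The first instruction needs `stages` clocks; each later one adds a single clock.
    return stages + (instructions - 1)

# Three stages (fetch, execute, store) and 100 instructions.
print(cycles_unpipelined(100, 3))  # 300
print(cycles_pipelined(100, 3))    # 102
```

Under these assumptions the pipelined version approaches one finished instruction per clock no matter how many stages there are; real hardware loses some of that to stalls.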
 

Marsumane

Golden Member
Mar 9, 2004
1,171
0
0
Originally posted by: VIAN
A pipeline, in its simplest explanation, is a way of doing a few things at once: namely fetching data, executing on it, and storing the result, all at the same time on different pieces of work. An example is an assembly line. You can divide those 3 basics into many smaller stages, and this lowers your instructions per clock (IPC). I'm not exactly sure how these divisions are made, however. To get a roundabout figure of performance, IPC is multiplied by cycles per second, or Hz. IPC is usually unknown to regular people, but it is related somehow to how many stages are in the pipeline.

This was just one pipeline. To get more performance, you can add more pipelines. A pipeline can be considered a chip, so a multi-pipeline chip can be considered a multi-core chip, just like what CPUs are starting to do now.

The article looks at some stages, or things that are performed within the pipeline. It gets very technical, so probably not made for many people to understand.

Yes, I know about pipelines. It's just what they are composed of, and how each unit works together to get out a shaded pixel (for example), that I don't get. Basically, things like the acronyms (ALU, FPU, NRM, etc.) and how they actually do their individual jobs within the pipeline are what I'm not getting. Also "dual issue" and "co-issue", along with how shader and texture units are utilized (besides the fact that they shade a pixel, but more so how they do it).
Also, this paragraph:
"The traditional co-issued pipeline can work as a single vector4 unit, or as separated vector3 and scalar units. NV40's pipelines can also work in a vector2 + vector2 configuration. Dual- and co-issue combined, an NV40 pipe can execute up to four instructions - while having a single all-purpose arithmetic unit only. For explanatory reasons, we will continue to talk about Unit 1 and 2."
As well as all of this:
"From top to bottom: Two different sources can be used as inputs for the pixel pipeline (the rasterizer or the pipeline loopback). A crossbar transmits the required values to the appropriate interpolators in Unit 1. We don't know how many interpolators are implemented in the hardware. (Shader Model 3.0 logically offers 10 interpolators, instead of 8 in version 2.0 / 2.X). Shader Unit 1 has an SFU built in (yellow), and four multiply channels (shown in blue-green). A dedicated unit for texture operations follows (orange).

Actually, the special functions RCP and RSQ are two different units in the hardware, but to keep it simple, we abstract them to a single unit we call SFU#1. Now, the whole shader unit can execute up to two instructions per clock, either SFU+MUL3 ("3" stands for up to three components) or SFU+TEX. If only MUL is needed, up to two independent MULs can be executed, but still any clock cycle is limited to four data channels at maximum, meaning MUL2 and MUL3 cannot be computed in a single cycle. Since some data paths are shared, any unit can also just hand over the data, which means this unit is effectively blocked for this cycle. The result of the special functions RCP and RSQ can be used as input for any of the four multiply-channels.

The TEX unit needs texture coordinates as input. Because this unit does not have access to the input registers, Shader Unit 1 has to channel them through. The input for the TEX operations goes through the MUL channels. It is possible to modify the input with the MUL operation right before the TEX operation in the same clock. In most cases, at least the scalar special function can still be used while a TEX operation is performed.

Subsequently, TEX calculates the LOD (level of detail). Both coordinates and LOD are then transmitted to the TMU located near the memory controller (which is not included in our picture). The TMU performs the actual sampling of the texture and returns the sample to an input register for Unit 2. Because any texture sampling comes with a latency of some clocks, the pipeline executes instructions for other quads meanwhile."
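
As far as the issue rules for Shader Unit 1 can be read off the quoted text, they might be sketched like this. It is only one reading of the article, not a description of the real hardware: the function name and the (op, channels) tuples are made up, and the case where a MUL modifies the TEX input in the same clock is left out.

```python
from collections import Counter

VALID_OPS = {"SFU", "MUL", "TEX"}

def fits_in_one_clock(instrs):
    """instrs: list of (op, channels) tuples for one clock on Shader Unit 1."""
    if any(op not in VALID_OPS for op, _ in instrs):
        raise ValueError("unknown op")
    # The quote says up to two instructions per clock.
    if len(instrs) > 2:
        return False
    # There are only four multiply channels in total.
    if any(op == "MUL" and not 1 <= ch <= 4 for op, ch in instrs):
        return False
    if len(instrs) < 2:
        return True
    ops = Counter(op for op, _ in instrs)
    mul_channels = sum(ch for op, ch in instrs if op == "MUL")
    if ops == Counter({"SFU": 1, "MUL": 1}):
        return mul_channels <= 3   # SFU + MUL3
    if ops == Counter({"SFU": 1, "TEX": 1}):
        return True                # SFU + TEX
    if ops == Counter({"MUL": 2}):
        return mul_channels <= 4   # two MULs share four data channels
    # Any other pairing is not listed in the quote, so reject it conservatively.
    return False

print(fits_in_one_clock([("SFU", 1), ("MUL", 3)]))  # True
print(fits_in_one_clock([("MUL", 2), ("MUL", 3)]))  # False
```

The second call is the MUL2 + MUL3 case the article says cannot be computed in a single cycle, since it would need five channels.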

And finally, on a side note, I was wondering why more pipelines aren't just added to graphics cards to beat the competition. I know they are individual units, unlike processors, where it's really one long "pipe" in many stages (or so I assume). But do they need a certain clock speed to gain speed from more pipes, which would put a limit on this?
 

VIAN

Diamond Member
Aug 22, 2003
6,575
1
0
ALU is probably Arithmetic Logic Unit.
FPU is probably Floating Point Unit.

Originally posted by: Marsumane
And finally, on a side note, I was wondering why more pipelines aren't just added to graphics cards to beat the competition. I know they are individual units, unlike processors, where it's really one long "pipe" in many stages (or so I assume). But do they need a certain clock speed to gain speed from more pipes, which would put a limit on this?

More pipelines means a bigger chip, which means lower yields, which suck monkey nuts, because then they won't be able to meet demand and prices will soar.

CPUs have an extremely short pipeline compared to VPUs: 31 stages for the P4, hundreds of stages for VPUs.

No, they don't need more clock speed to take advantage of more pipelines.
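
The yield point above can be illustrated with a simple Poisson yield model, which is one common textbook simplification and not how any particular fab actually calculates it. The die areas and defect density below are made-up numbers:

```python
import math

def poisson_yield(die_area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to have zero defects under a Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)

# Hypothetical: doubling the pipelines roughly doubles the die area.
small_die = poisson_yield(die_area_cm2=1.5, defects_per_cm2=0.5)
large_die = poisson_yield(die_area_cm2=3.0, defects_per_cm2=0.5)

print(f"small die yield: {small_die:.0%}")
print(f"large die yield: {large_die:.0%}")
```

In this model, doubling the area squares the yield fraction, so the bigger chip loses good dies much faster than it gains pipelines.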
 

Rage187

Lifer
Dec 30, 2000
14,276
4
81
Beyond3D is worse than Driver Heaven.

At least point to an unbiased site like Firing Squad or Anandtech.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,998
126
Originally posted by: Marsumane
Basically, things like the acronyms (ALU, FPU, NRM, etc.) and how they actually do their individual jobs within the pipeline are what I'm not getting. Also "dual issue" and "co-issue", along with how shader and texture units are utilized (besides the fact that they shade a pixel, but more so how they do it).
Your question is far too broad and technical to be answered in a thread like this. You're going to have to get some books and/or articles on CPU architecture and study the concepts contained in them.
 

Marsumane

Golden Member
Mar 9, 2004
1,171
0
0
Originally posted by: VIAN
ALU is probably Arithmetic Logic Unit.
FPU is probably Floating Point Unit.

And finally, on a side note, I was wondering why more pipelines aren't just added to graphics cards to beat the competition. I know they are individual units, unlike processors, where it's really one long "pipe" in many stages (or so I assume). But do they need a certain clock speed to gain speed from more pipes, which would put a limit on this?

More pipelines means a bigger chip, which means lower yields, which suck monkey nuts, because then they won't be able to meet demand and prices will soar.

CPUs have an extremely short pipeline compared to VPUs: 31 stages for the P4, hundreds of stages for VPUs.

No, they don't need more clock speed to take advantage of more pipelines.

Yes, this is a good point. I'm sure it would also add to the heat output as well as the transistor count.

BFG: Yeah, I guess it is quite the technical question. It's kind of the nitty gritty of the nitty gritty when it comes down to it.