ATI Reveals Radeon X1900 Details Internally

Apr 17, 2003
37,622
0
76
Originally posted by: Ackmed
Originally posted by: peleejosh
We all know how reliable the ATI folks are :) I will believe it when I see one in a store.


Because there are 512MB GTX's in any store? We all know how reliable the NV folks are...

In all fairness, the 512 GTX has been the only nVidia blunder (as far as availability) in recent memory
 

WelshBloke

Lifer
Jan 12, 2005
32,886
11,029
136
Originally posted by: dunno99
Hrm, seems like everyone is arguing over semantics. I guess I'll take a shot at this, but don't flame me if I get it wrong...

Reading chapter 30 of GPU Gems 2 (available for free on nVidia's developer site...or Google for a direct link), it says that the NV45 (GF 6800) has exactly 16 pixel pipelines (and 6 vertex pipelines, but we don't care about that right now). However, the 16 pipelines are divided into units of four (which is also why all the NV4x cards have multiples of 4 pixel pipelines), and fragments from each primitive are processed in adjacent groups of 4 at a time (which would mean from the same primitive). What I think this implies is, at border conditions, the pipeline units are less than fully efficient...as in, each unit may process fewer than 4 pixels (I'm willing to assume that the hardware is smart enough to align the pixels such that it will result in one fewer "batch" per horizontal scan, if possible). So this means that it isn't really a fully 16-pipeline GPU (although the performance penalty is probably minimal, and it can probably make up for it if derivatives are used, compared to other cards).

Furthermore, each pixel pipeline has two fp32 shader units (ALUs, I presume) in series, of which the first shader unit can have its result substituted by a texture fetch instead. Both shader units process instructions in parallel per clock (assuming no hazards or dependencies). From the looks of it, both shader units are full ALUs. Since they probably both can do vec4, vec3 + scalar, or vec2 + vec2, this would mean that, at most, 4 parallel instructions (two coissues per shader unit) are processed at each tick of the clock. This is why latency hiding is especially important, and I suppose why each texture fetch should be followed by either more texture fetches or non-dependent instructions. (Note: Because of the deep pipeline structure of these GPUs, branching is basically done via a brute force approach. I believe this is the reason why the NV4x GPUs can only perform 4 nested if/else statements, because by taking all branches, 4 nested branches would equate to 2^4 = 16 different paths...but I'm guessing here.)
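[Editor's note: the branch arithmetic at the end of that paragraph can be sketched in a couple of lines. This is a toy model of the brute-force scheme the poster is guessing at, not confirmed NV4x behavior.]

```python
# Toy model: if the GPU evaluates both sides of every branch and masks
# out the untaken results per pixel, the number of distinct execution
# paths to cover doubles with each nesting level.
def execution_paths(nesting_depth: int) -> int:
    return 2 ** nesting_depth

print(execution_paths(4))  # 4 nested if/else levels -> 16 paths, as above
```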

On the other hand, I'm guessing that the X1900 will have 16 separate (although they might be able to work together) pipelines, each able to process 3 separate fragments in parallel. This seems to me to be like the fragment "units" above (I'm guessing the two companies are using the terminology a little differently). So each pipeline is a unit itself, and each unit is composed of 3 individual fragment processors (each able to work on one fragment at a time). The 16 texture units would mean that each cycle, only 16 of these fragment processors will get to retrieve data from memory. Given that NV45 has 16 texture units and 16 pixel processors with two full ALUs each, that means a 16*2:16 = 2:1 ratio. On the other hand, ATi has 16 texture units and 48 pixel processors with either 1 or 1.5 ALUs each (I don't know which)...this would translate to a 3:1 or a 4.5:1 ratio.
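[Editor's note: the ratio arithmetic in that paragraph checks out with a quick sketch. All unit counts here are the post's own guesses, not confirmed specs.]

```python
# ALU-to-texture-unit ratio, following the reasoning above.
def alu_tex_ratio(pixel_processors: int, alus_each: float, texture_units: int) -> float:
    return (pixel_processors * alus_each) / texture_units

nv45 = alu_tex_ratio(16, 2.0, 16)        # two full ALUs per pipe -> 2:1
x1900_low = alu_tex_ratio(48, 1.0, 16)   # one ALU each -> 3:1
x1900_high = alu_tex_ratio(48, 1.5, 16)  # 1.5 ALUs each -> 4.5:1
print(nv45, x1900_low, x1900_high)
```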

If anyone notices anything wrong, feel free to correct, not flame. :)

Correct?? I've got to bloody understand it first! :confused:

J/k good post :thumbsup:

post more
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
What I am worried about is the texture units. For example, the X1600XT is a 4-pipe card, as some of you might know. It is a 4-1-3-2 architecture, which means that each 'pipeline' is composed of 3 fragment shaders/shader processors. It has 8 geometry shaders, BUT is crippled with 4 texture units.

Now, I know for a fact (99%) that the RV580 (X1900) is based on the RV530 (X1600). Hence, it might be a 16-1-3-2 (could be 16-1-3-1) architecture. BUT it has only 16 texture units.

So right now, I think the RV580 will perform really well in shader-heavy games, but there won't be any significant performance increase in texture-heavy games. So we might once again see Nvidia winning in some, and ATi in others. But I could be wrong. I have yet to see another card like the 9700 Pro that could bring total domination in the benchmarks. :D
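[Editor's note: the dash notation used in this thread can be unpacked mechanically. A small sketch, reading the fields the way the posts use them (pipes, texture units per pipe, shader processors per pipe); the field meanings are an assumption from context, and the last field is left uninterpreted.]

```python
def decode(config: str) -> dict:
    """Unpack a 'P-T-S-x' config string as used in this thread."""
    pipes, tex_per_pipe, shaders_per_pipe, _last = (int(n) for n in config.split("-"))
    return {
        "pipes": pipes,
        "texture_units": pipes * tex_per_pipe,
        "shader_processors": pipes * shaders_per_pipe,
    }

print(decode("4-1-3-2"))   # X1600XT: 4 texture units, 12 shader processors
print(decode("16-1-3-2"))  # X1900 guess: 16 texture units, 48 shader processors
```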
 

Stoneburner

Diamond Member
May 29, 2003
3,491
0
76
The 6800 Ultra was just as nonexistent as the X800 PEs. And I've waited long enough; I'll see how my C3D X1800XL does, otherwise I'll just get one of those eVGA 7800GTs. And if the G71 is 32 pipes at 700MHz, then I think Nvidia will win.
 

5150Joker

Diamond Member
Feb 6, 2002
5,549
0
71
www.techinferno.com
Originally posted by: Cookie Monster
What I am worried about is the texture units. For example, the X1600XT is a 4-pipe card, as some of you might know. It is a 4-1-3-2 architecture, which means that each 'pipeline' is composed of 3 fragment shaders/shader processors. It has 8 geometry shaders, BUT is crippled with 4 texture units.

Now, I know for a fact (99%) that the RV580 (X1900) is based on the RV530 (X1600). Hence, it might be a 16-1-3-2 (could be 16-1-3-1) architecture. BUT it has only 16 texture units.

So right now, I think the RV580 will perform really well in shader-heavy games, but there won't be any significant performance increase in texture-heavy games. So we might once again see Nvidia winning in some, and ATi in others. But I could be wrong. I have yet to see another card like the 9700 Pro that could bring total domination in the benchmarks. :D


I think it'll probably be 16-1-3-1, and if it's faced against an nVidia card with 32 pixel pipelines and 24 ROPs, then it's going to have a tough time beating the G71.
 

nRollo

Banned
Jan 11, 2002
10,460
0
0
Originally posted by: Cookie Monster
But i could be wrong. I have yet to see a similiar card like the 9700 pro, that could bring total domination in the benchmarks. :D

I don't know, the 7800GTX pretty much dominated all ATI till the X1800XT came out?

 

Wreckage

Banned
Jul 1, 2005
5,529
0
0
Originally posted by: Rollo
Originally posted by: Cookie Monster
But i could be wrong. I have yet to see a similiar card like the 9700 pro, that could bring total domination in the benchmarks. :D

I don't know, the 7800GTX pretty much dominated all ATI till the X1800XT came out?

3dfx voodoo
GeForce 2, 3, 4

Just to name a few ;)
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
Originally posted by: dunno99
Hrm, seems like everyone is arguing over semantics. I guess I'll take a shot at this, but don't flame me if I get it wrong...

Reading chapter 30 of GPU Gems 2 (available for free on nVidia's developer site...or Google for a direct link), it says that the NV45 (GF 6800) has exactly 16 pixel pipelines (and 6 vertex, but we don't care about that right now). However, the 16 pipelines are divided into units of four (which is also why all the NV4x cards have multiples of 4 pixel pipelines), and fragments from each primitive are processed in adjacent groups of 4 at a time (which would mean from the same primitive). What I think this implies is, at border conditions, the pipeline units are less than fully efficient...as in, each unit may process less than 4 pixels (I'm willing to assume that the hardware is smart enough to align the pixels such that it will result in one less "batch" per horizontal scan, if possible). So this means that it isn't really a fully 16 pipeline GPU (although, the performance penalty is probably minimal, and it can probably make up for it if derivatives are used, compared to other cards).

Furthermore, each pixel pipeline has two fp32 shader units (ALUs, I presume) in series, of which the first shader unit can have its result substituted by a texture fetch instead. Both shaders units process instructions in parallel per clock (assuming no hazards or dependencies). From the looks of it, both shader units are full ALUs. Since they probably both can do vec4, vec3 + scalar, or vec2 + vec2, this would mean, at most, 4 parallel instructions (two coissues per shader unit) are processed at each tick of the clock. This is why latency hiding is especially important, and I suppose which is why each texture fetch should be followed by either more texture fetches or non-dependent instructions. (Note: Because of the deep pipeline structure of these GPUs, branching is basically done via a brute force approach. I believe this is the reason why the NV4x GPUs can only perform 4 nested if/else statements, because by taking all branches, 4 nested branches would equate to 2^4 = 16 different paths...but I'm guessing here.)

On the other hand, I'm guessing that the X1900 will have 16 separate (although they might be able to work together) pipelines, each being able to process 3 separate fragments in parallel each. This seems to me to be like the fragment "units" above (I'm guessing the two companies are using the terminology a little differently). So each pipeline is a unit itself, and each unit is composed of 3 individual fragment processors (each being able to work on one fragment at a time). The 16 texture units would mean that each cycle, only 16 of these fragment processors will get to retrieve data from memory. Given that NV45 has 16 texture units and 16 pixel processors with two full ALUs each, that means a 16*2:16 = 2:1 ratio. On the other hand, ATi has 16 texture units and 48 pixel processors with either 1 or 1.5 ALUs each (I don't know which)...this would translate to a 3:1 or 4.5:1.

If anyone notices anything wrong, feel free to correct, not flame. :)

Nice post, but what you said just went over the heads of 99.9% of the flamers around here. I have a few things to add too. Right now ATi cards have 1 full ALU and one mini ALU per shader, and I'm guessing this will also be the case for the R580. We also know that in the R520 the pipes are no longer combined texture/shader units - the texture units are detached from the shaders and can be scheduled independently, somewhat. This should improve the efficiency of the card, and while there may be situations where the R580 will be texture-limited, those situations are not likely to occur often in modern games, since games are becoming more dependent on shaders.
 

nts

Senior member
Nov 10, 2005
279
0
0
Originally posted by: dunno99
Hrm, seems like everyone is arguing over semantics. I guess I'll take a shot at this, but don't flame me if I get it wrong...

Reading chapter 30 of GPU Gems 2 (available for free on nVidia's developer site...or Google for a direct link), it says that the NV45 (GF 6800) has exactly 16 pixel pipelines (and 6 vertex, but we don't care about that right now). However, the 16 pipelines are divided into units of four (which is also why all the NV4x cards have multiples of 4 pixel pipelines), and fragments from each primitive are processed in adjacent groups of 4 at a time (which would mean from the same primitive). What I think this implies is, at border conditions, the pipeline units are less than fully efficient...as in, each unit may process less than 4 pixels (I'm willing to assume that the hardware is smart enough to align the pixels such that it will result in one less "batch" per horizontal scan, if possible). So this means that it isn't really a fully 16 pipeline GPU (although, the performance penalty is probably minimal, and it can probably make up for it if derivatives are used, compared to other cards).
AFAIK the batches can't be re-aligned; think of them as tiles, like a 4x4 pixel grid on the screen/framebuffer. This is to do with how hierarchical Z-buffering and other compression techniques work.

AFAIK on all hardware the units are divided into groups of 4.
Furthermore, each pixel pipeline has two fp32 shader units (ALUs, I presume) in series,
I believe they are and operate in parallel.

of which the first shader unit can have its result substituted by a texture fetch instead.

When a texture fetch occurs, the pipe will stall until the result arrives. I am not sure, but I believe on NVIDIA hardware the ALUs will be used for the texture addressing (meaning you can't throw in an ALU op when the next op is a TMU op). Not sure about threading on NVIDIA hardware :( will the pipe stall, or can you swap threads?

Both shader units process instructions in parallel per clock (assuming no hazards or dependencies). From the looks of it, both shader units are full ALUs. Since they probably both can do vec4, vec3 + scalar, or vec2 + vec2, this would mean, at most, 4 parallel instructions (two coissues per shader unit) are processed at each tick of the clock.

Yup, ATi has a full unit and a simple unit instead of two full units.

This is why latency hiding is especially important, and I suppose which is why each texture fetch should be followed by either more texture fetches or non-dependent instructions.

On ATi's R5xx hardware, latency hiding is done by the programmable threading processor with the help of the shader compiler. On NVIDIA hardware it's only the compiler.

Both compilers are generally very good at reorganizing code for maximum efficiency, especially when using a higher level language like GLSL or HLSL. So you don't have to manually order the instructions in most circumstances.

(Note: Because of the deep pipeline structure of these GPUs, branching is basically done via a brute force approach. I believe this is the reason why the NV4x GPUs can only perform 4 nested if/else statements, because by taking all branches, 4 nested branches would equate to 2^4 = 16 different paths...but I'm guessing here.)
On NVIDIA hardware the batch sizes for rendering are 256x256 (AFAIK NV40) and 64x64 (AFAIK G70). When a branch occurs, both paths are computed and the results are then selectively kept or thrown away for each pixel in the batch.

On ATi R5xx hardware the batch size is 4x4, which is extremely efficient. There is also a dedicated unit for branch execution to eliminate overhead.
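[Editor's note: why the small batch matters can be seen with a toy divergence model. The 4x4 and 64x64 batch sizes come from these posts; the screen size and boundary position below are made up for illustration.]

```python
# Toy model: a vertical branch boundary at x = split crosses a
# size x size screen. Any tile (batch) the boundary passes through
# must execute both sides of the branch for all of its pixels.
def divergent_fraction(size: int, tile: int, split: int) -> float:
    tiles_per_side = size // tile
    straddles = 1 if split % tile != 0 else 0  # boundary inside a tile column?
    return straddles * tiles_per_side / (tiles_per_side ** 2)

print(divergent_fraction(256, 4, 130))   # 4x4 batches:  1/64 pay for both paths
print(divergent_fraction(256, 64, 130))  # 64x64 batches: 1/4 pay for both paths
```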

The R5xx can also have 512+ threads in flight per unit; anybody know how many the G70 can have? (I've heard very low numbers here, 64 or 128, but I am not sure.)

On the other hand, I'm guessing that the X1900 will have 16 separate (although they might be able to work together) pipelines, each being able to process 3 separate fragments in parallel each.

The pipes are still grouped in units of 4.

They should be able to process 48 fragments in parallel, and they should be able to communicate with one another. If one isn't used and another has a high thread count waiting, then it should be possible to swap the thread to another processor, but I am not sure if this is the case atm.

This seems to me to be like the fragment "units" above (I'm guessing the two companies are using the terminology a little differently). So each pipeline is a unit itself, and each unit is composed of 3 individual fragment processors (each being able to work on one fragment at a time).

Again grouped in 4 units.

All work is done on 4 fragments at a time; with 3 processors per pipe, that means up to 48 fragments.

The 16 texture units would mean that each cycle, only 16 of these fragment processors will get to retrieve data from memory.

Yup, but with the nature of ATi's R5xx hardware, a thread waiting for a memory read can be swapped out and another swapped in to process ALU instructions until it stalls or the first read is finished.
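[Editor's note: a steady-state sketch of that latency-hiding idea. Each thread alternates a few ALU cycles with a long texture fetch, and other threads' ALU work fills the wait. The cycle counts are illustrative, not R5xx measurements.]

```python
# Toy latency-hiding model: each thread runs `alu_ops` cycles of math,
# then waits `fetch_latency` cycles on a texture fetch. With enough
# threads in flight, the other threads' math fills the wait completely.
def alu_utilization(n_threads: int, alu_ops: int, fetch_latency: int) -> float:
    useful = n_threads * alu_ops   # math available per round, all threads
    span = alu_ops + fetch_latency # one thread's round on its own
    return min(1.0, useful / span)

print(alu_utilization(1, 4, 100))   # single thread: pipe almost entirely idle
print(alu_utilization(32, 4, 100))  # many threads: pipe fully busy
```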

Given that NV45 has 16 texture units and 16 pixel processors with two full ALUs each, that means a 16*2:16 = 2:1 ratio. On the other hand, ATi has 16 texture units and 48 pixel processors with either 1 or 1.5 ALUs each (I don't know which)...this would translate to a 3:1 or 4.5:1.

It's one full ALU and one simple ALU. I am not sure what instructions the simple ALU can do, but the ratio ranges from 3:1 up to 5:1, depending a lot on the compiler and the generated code.

If anyone notices anything wrong, feel free to correct, not flame. :)

Not many corrections, some additions though :)

EDIT: On second thought, I am not sure if they are actually processing 48 fragments in parallel or 16 with just wider ALUs; gonna lean toward 16 in parallel for now. So replace the 48 in parallel above with 16 :)

EDIT: If I made a mistake then someone please correct me.

 

dunno99

Member
Jul 15, 2005
145
0
0
Originally posted by: munky
Nice post, but what you said just went over the heads of 99.9% of the flamers around here. I have a few things to add too. Right now ATi cards have 1 full ALU and one mini ALU per shader, and I'm guessing this will also be the case for the R580. We also know that in the R520 the pipes are no longer combined texture/shader units - the texture units are detached from the shaders and can be scheduled independently, somewhat. This should improve the efficiency of the card, and while there may be situations where the R580 will be texture-limited, those situations are not likely to occur often in modern games, since games are becoming more dependent on shaders.

Ah. Yeah, I wasn't too sure about the 1 ALU or 1.5 ALU thing, which is why I gave a 3:1 and a 4.5:1. And I agree, I believe the texture units are detached -- it could be limiting, or it could increase efficiency.

Originally posted by: nts
AFAIK the batches can't be re-aligned; think of them as tiles, like a 4x4 pixel grid on the screen/framebuffer. This is to do with how hierarchical Z-buffering and other compression techniques work.

AFAIK on all hardware the units are divided into groups of 4.

Ah, ok. That would make sense then. No shifting to the left/right by one pixel to realign, due to hierarchical Z.

Also, do you mean that the groups of four pipelines are just bundled up that way on silicon? I'm not too familiar with that part.

I believe they are and operate in parallel.

I meant that they're connected in series, but operate in parallel. As in, results of unit 1 gets fed into unit 2, but they both process during the same clock (at least this is what it seems to look like on the block diagram).

When a texture fetch occurs the pipe will stall until the result arrives. I am not sure but I believe on NVIDIA hardware the ALUs will be used for the texture addressing (meaning you can't throw in an ALU op when the next op is TMU op). Not sure about threading on NVIDIA hardware will the pipe stall or can you swap threads?

Hrm, the chapter wasn't too clear on that. But from an out-of-order perspective, there shouldn't be any problems with going ahead with the computation as long as there aren't any hazards. And from the wording (although I'm sure marketing has had its touch on it...), it seems like the nVidia GPU could actually go ahead and process the fragment as long as no hazard occurs (I remember reading somewhere that the pipe will stall if subsequent operations are dependent on the texture fetch).

On ATi's R5xx hardware latency hiding is done by the programmable threading processor with the help of the shader compiler. On NVIDIA hardware its only the compiler.

Ah, hence the entire article about the threading and stuff.

Both compilers are generally very good at reorganizing code for maximum efficiency, especially when using a higher level language like GLSL or HLSL. So you don't have to manually order the instructions in most circumstances.

Ok, good...then that means I wouldn't have to hack at the assembly post-compile. :thumbsup:

On NVIDIA hardware the batch sizes for rendering are 256x256 (AFAIK NV40) and 64x64 (AFAIK G70). When a branch occurs, both paths are computed and the results are then selectively kept or thrown away for each pixel in the batch.

Ok, this is the part where it starts getting confusing for me, because I don't know what the terminology means or what it applies to. What exactly is a "batch?" Is it the set of adjacent fragments of a polygon (technically a triangle, since they're all broken down to that anyway) that the rasterizer generates? If that's the case, then 256x256 DOES seem a little large and inefficient. And if this is the case, I think I can assume that these polygons would also have their respective shader code attached.

The R5xx can also have 512+ threads in flight per unit, anybody know how many G70 can have (i've heard very low numbers here, 64, 128 but I am not sure).

I'm guessing each thread equates to one batch? That would seem like the cleanest way to do it (or one thread to one fragment). If it's the case that one thread equates to one batch, then I'm wondering why there's a need for 512+ outstanding threads per unit...isn't the point of the threading just to make sure that the pipe doesn't stall? If that's the case, one would probably only need enough threads to cover the longest latency (worst-case scenario). Again, I'm just speculating here. However, if it was 512+ outstanding threads for the entire GPU, then I can understand, since these threads will get passed around and split over the 16 units.
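[Editor's note: dunno99's "enough threads to cover the longest latency" intuition works out to a simple back-of-envelope count. The latency and per-thread work figures below are made up for illustration.]

```python
import math

# Threads needed so that, while one thread waits out a fetch, the others
# supply enough ALU work to keep the unit busy for the whole wait.
def threads_to_hide(fetch_latency: int, alu_ops_per_thread: int) -> int:
    return math.ceil(fetch_latency / alu_ops_per_thread) + 1  # +1 for the waiter

print(threads_to_hide(400, 4))  # hypothetical 400-cycle fetch, 4 ALU ops each
```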

They should be able to process 48 fragments in parallel and they should be able to communicate with one another. If one isn't used and one has a high thread count waiting then it should be possible to swap the thread to another processor but i am not sure if this is the case atm.

By "one," do you mean pipelines, or a unit of pipelines (as in, 4 pipelines per unit)? This is the part that I'm also confused about. However, I think you're right about the swapping...I think the pipelines/units should be able to swap threads with one another.

Its one full ALU and one simple ALU, I am not sure what instructions the simple ALU can do but the ratio is ranging from 3:1 upto 5:1. A lot depending on the compiler and the generated code.

Yup, I was just referring to some of the articles that were stating a 2:1 for nVidia and 3:1 for ATi...just clarifying how I interpret it. And at this point, it looks like a best-case 4.5:1.

On second thought I am not sure if they are actually processing 48 fragments in parallel or 16 with just wider ALUs, gonna lean toward 16 in parallel for now. So replace the 48 in parallel above with 16

I would think that it's 48 fragments for the X1900, since the X1800 should be 16 (otherwise I don't think it's possible for the X1800 to match up to the 7800GTX at all -- since if the X1900 is 16, then the X1800 would be something like 5.3... :confused: ).
 

toyota

Lifer
Apr 15, 2001
12,957
1
0
Originally posted by: Cookie Monster
What i am worried is the texture units. For example the X1600XT is a 4 pipe card as some of you might know. It is a 4-1-3-2 architecture which means that each 'pipline' is composed of 3 fragment shaders/shader processors. It has 8 geometrical shaders, BUT crippled with 4 texture units.

Now, i know for a fact (99%) that the RV580(X1900) is based on RV530(X1600). Hence, it might be 16-1-3-2 (could be 16-1-3-1) architecture. BUT it has only 16 texture units.
I'm confused. The X1600XT and Pro have 12 pipes, not 4. Maybe I'm just not understanding what you are trying to say.

 

nts

Senior member
Nov 10, 2005
279
0
0
Originally posted by: dunno99
Also, do you mean that the groups of four pipelines are just bundled up that way on silicon? I'm not too familiar with that part.

Yup, on silicon they are bundled up with 4 pipes making one unit. All 4 pipes of the unit will also be processing the same shader at the same time, just on different fragments.

I meant that they're connected in series, but operate in parallel. As in, results of unit 1 gets fed into unit 2, but they both process during the same clock (at least this is what it seems to look like on the block diagram).

I haven't taken a look at the diagram, but I'm not sure that you can feed the results of unit 1 into unit 2; of course, I could be very wrong. They could operate like the Pentium 4, where each ALU operates at twice the clock speed.

NVIDIA was also saying how they optimized the MAD instructions, so this should actually be possible. Hmmm, very interesting.

Hrm, the chapter wasn't too clear on that. But from an Out-of-Order perspective, there shouldn't be any problems with going ahead with the computation as long as there aren't any hazards. And from the wording (although I'm sure marketing has had its touch on it...), it seems like the nVidia GPU could actually go ahead and process the fragment as long as no hazard occurs (I remember reading somewhere that the pipe will stall if subsequent operations are dependent on the texture fetch).

I know ATi can swap in different threads when one stalls, but can NVIDIA do this too?

Ok, this is the part where it starts getting confusing for me because I don't know what the terminology mean and what they apply to. What exactly is a "batch?" Is it the set of adjacent fragments of a polygon (technically triangle, since they're all broken down to that anyways) that the rasterizer generates? If that's the case, then 256x256 DOES seem a little large and inefficient. And if this is the case, I think I can assume that these polygons would also have their respective shader code attached.

This part is actually a bit confusing for me too :p

Think of a batch as a group of fragments, with multiple batches forming a triangle, that gets passed/mapped to the pipes along with a mask (marking where no fragments should be drawn). At least that's what it looks like to me.


I'm guessing each thread equates to one batch?

Yup, in its initial state it should be. With branching and looping they will get split into more threads.

*wonders how batches are processed by pipes* Since each pipe does 4 fragments at a time, would they do the same operation on different fragments in the whole batch (SIMD), or is it something like MIMD... something to check on later.

That would seem like the cleanest way to do it (or one thread to one fragment).

All processing is done on 4 fragments at a time.

If it's the case that one thread equates to one batch, then I'm wondering why there's a need for 512+ outstanding threads per unit...isn't the point of the threading just to make sure that the pipe doesn't stall?

If that's the case, one would probably only need enough threads to cover the longest latency (worst case scenario). Again, I'm just speculating here. However, if it was 512+ outstanding threads for the entire GPU, then I can understand, since these threads will get passed around and split over the 16 units.

Well, think about it: what happens when a full-screen triangle is drawn? It generates a lot of batches and threads.

On the R5xx, when one thread stalls on a pipe (texture op), another can be swapped in. You could be doing ALU operations on another thread while the first one is waiting for a texture, meaning the pipes could be 100% busy at all times (until all threads are waiting for a texture or something).

When a thread is swapped in and out, you need to keep track of its whole state; the more of these you can track, the more threads can be in flight in parallel, meaning the pipe is less likely to be fully stalled (not in use).

BTW, I'm not sure if it's 512+ threads per unit or one shared pool (between the pipes); I'll try to look that up later, but it's most likely for the entire GPU instead of each unit. Sorry about the confusion.

By "one," do you mean pipelines? or a unit of pipelines (as in, 4 pipelines per unit)? This is the part that I'm also confused about. However, I think you're right about the swapping...I think the pipelines/units should be able to swap threads with one another.

I mean a unit of pipes (4 pipe group).

Yup, I was just referring to some of the articles that were stating a 2:1 for nVidia and 3:1 for ATi...just clarifying as how I interpret it. And at this point, it looks like a best-case 4.5:1.

Yup, but in the extreme best case you could hit up to 6:1; probably only one or two instructions could hit that.

I would think that it's 48 fragments for the X1900, since the X1800 should be 16 (o/w I don't it's possible for the X1800 to match up to the 7800GTX at all -- since if the X1900 is 16, then the X1800 would be something like 5.3...:confused: ).

There are two ways to think about this, as I see it.

One is that each of the 3 processors is doing a separate set of fragments; the other is that they are working on the same set at the same time.

I am currently leaning toward working on the same set at the same time. On the diagram you saw with the 2 ALUs in series, in a very simple representation you can think of these 3 "processors" as 4 more ALUs in that series (up to 6, but instead of 2 full ones you have one full and one simple). How they would actually be mapped, we'll have to wait for reviews to find out.

The X1800 is doing 16 fragments at a time; what the X1900 should be doing is keeping the same pipes and simply tripling the ALU "shader processor" power. At least this is the way I see it now. Until the hardware and reviews are out, we can't be sure either way, but the above looks like the most likely case.
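[Editor's note: the "same pipes, triple the ALU power" reading reduces to simple multiplication. This is a sketch of the thread's guess, not confirmed specs.]

```python
# Shader math issue rate per clock under this thread's guess: pipe count
# stays fixed while shader processors per pipe triple.
def shader_ops_per_clock(pipes: int, processors_per_pipe: int) -> int:
    return pipes * processors_per_pipe

x1800 = shader_ops_per_clock(16, 1)  # 16 fragments' worth of math per clock
x1900 = shader_ops_per_clock(16, 3)  # same 16 pipes, 3x the shader processors
print(x1800, x1900)  # texture rate stays at 16 per clock either way
```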
 

nts

Senior member
Nov 10, 2005
279
0
0
Originally posted by: toyota
I'm confused. The X1600XT and Pro have 12 pipes, not 4. Maybe I'm just not understanding what you are trying to say.

They actually have 4; it's just that each of the 4 pipes has 3 shader processors (for ALU math operations).

4x3 = 12

But each pipe only has one texture unit, and this is what is crippling the X1600's performance: really fast at math but slow at texture operations. I don't see this as a problem on the X1900, though.
 

lopri

Elite Member
Jul 27, 2002
13,310
687
126
Originally posted by: Ackmed
Originally posted by: peleejosh
We all know how reliable the ATI folks are :) I will believe it when I see one in a store.


Because there are 512MB GTX's in any store? We all know how reliable the NV folks are...

I'm so glad that I got 2 GTX 512s on launch day. I've been flying through my games. :D