I was looking at some spec comparisons and saw that the 3870 has 320 stream processors while the 8800 GT has 112. Why doesn't this translate into a performance advantage? Same question for memory bus width, 256-bit vs. 512-bit?
Originally posted by: Soulkeeper
effective memory bandwidth has pretty much been the single most important factor in graphics performance over the last several years imo

No, shader performance has. Witness how 256-bit derivatives (3870, 8800 GT, 8800 GTS 512) perform so well compared to their older cousins with much more memory bandwidth.
Originally posted by: Cookie Monster
Firstly, "stream processor" is a vague term. Let me break this down for you in an easy-to-understand way.
ATi has 320 ALUs (Arithmetic Logic Units), or shaders, grouped into clusters of 5. They are therefore sometimes described as "Vec5" shaders, although the term differs slightly from ATi's actual ALU setup. In RV670/R600, each Vec5 shader (one cluster) consists of 4 plain scalar ALUs plus one beefier special-function ALU (which can also perform other special ops). The RV670, i.e. the HD 38x0 series, has 64 of these Vec5 shaders, clocked at the same frequency as the rest of the chip, i.e. the core clock.
OK, now that we've got this far, let's talk about how they work. A Vec5 shader has the advantage IF it is fed vec5 instructions, i.e. one big fat instruction (requiring all 5 ALUs at once) is handled by a single Vec5 shader in one go, completing the operation much faster. The downside is that the instructions must be arranged in a way that takes advantage of all 5 ALUs. If a Vec5 shader is fed a scalar instruction (i.e. one requiring just a single ALU), the other 4 ALUs sit idle while it completes the task. So the utilization of a Vec5 shader architecture can be very low, but in pure arithmetic crunching power it is faster than its nVIDIA counterparts.
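To picture the utilization difference, here's a toy sketch (the instruction mix is made up purely for illustration, and the real shader compiler packs independent scalar ops into one bundle where it can, so actual utilization sits somewhere in between):

```python
# Sketch: how well a 5-wide (Vec5) unit is filled by different instruction mixes.
# Each instruction needs some number of ALU lanes (1 = scalar, 5 = full vec5);
# the unit issues one instruction per cycle, and unused lanes do nothing.

def vec5_utilization(lane_counts):
    """Fraction of ALU lanes doing useful work on a 5-wide unit."""
    cycles = len(lane_counts)          # one instruction issued per cycle
    useful = sum(lane_counts)          # lanes actually doing work
    return useful / (cycles * 5)

scalar_heavy = [1, 1, 2, 1, 1]        # mostly scalar ops: poor fit
vector_heavy = [5, 5, 4, 5, 5]        # wide vector ops: near-perfect fit

print(vec5_utilization(scalar_heavy)) # 0.24 -> only ~24% of lanes busy
print(vec5_utilization(vector_heavy)) # 0.96 -> ~96% of lanes busy
```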
Now let's compare G80/G9x. They have scalar ALUs, 128 of them to be exact. Compared to the 320 of R600/RV670 that looks pretty low, but the difference in how they run is significant. Firstly, nVIDIA clocks different parts of the chip at different frequencies: the shader core (the ALUs) runs as fast as 1650MHz, compared to roughly 775MHz on an RV670. Quite the difference. Not only that, but because these are scalar ALUs, even a vec5 instruction (which would be slow on a single scalar ALU) is simply broken up into 5 scalar instructions and issued across the units. So the utilization of a scalar shader architecture is roughly ~100%. In terms of pure arithmetic power it trails its competition, although not as badly as the raw unit counts suggest.
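To put rough numbers on "pure arithmetic power", here's a back-of-the-envelope sketch (stock HD 3870 and 8800 GTS 512 clocks, counting each ALU as one MADD = 2 flops per cycle and ignoring nVIDIA's co-issued MUL; the 30% packing figure is purely illustrative):

```python
# Peak shader throughput = ALUs x clock x flops per ALU per cycle.
# A MADD (multiply-add) counts as 2 flops.

def peak_gflops(alus, clock_mhz, flops_per_cycle=2):
    return alus * clock_mhz * flops_per_cycle / 1000.0

print(peak_gflops(320, 775))   # HD 3870 (RV670):    ~496 GFLOPS on paper
print(peak_gflops(128, 1625))  # 8800 GTS 512 (G92): ~416 GFLOPS on paper

# Paper numbers assume every lane is busy. At, say, 30% Vec5 packing the
# RV670's effective rate falls toward ~150 GFLOPS, while the scalar G92
# stays close to its peak.
print(peak_gflops(320, 775) * 0.3)   # ~149 GFLOPS effective
```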
IMO the G80/G9x architectures handle those instructions fairly well, seeing as their shader core is clocked around twice as high as their ATi counterparts' and the utilization of those scalar ALUs is ~100%. This basically means nVIDIA cards will be consistent across games, while the ATi counterparts may suffer in places because of their "vec5" nature. There are other major variables involved too, such as texturing power, where the nVIDIA cards have two to three times as much as ATi. AA and AF performance are worth mentioning as well, and both are lacking on ATi's GPUs.
It's been years, and even now bandwidth has never been the major factor in game performance. One good example is the R600 and its 512-bit bus (I still think this was more of a checkbox feature than something implemented for practical use). Another is the 7800 GTX 512 and its monstrous 1800MHz memory clock back in the day (yet the 7900 GTX outperformed it thanks to a higher core clock, with only a 1600MHz memory clock for reference). There are relatively rare occasions where a scene is bandwidth-limited; these are situations where SSAA and very high resolutions come into play, and even then there are shader/texture/fillrate variables to think about. IMO, cards were never fast enough to make use of overwhelming bandwidth, so bandwidth ended up last on the list. (Maybe it's a bit different with the G92s, given their lack of bandwidth relative to their overwhelming shader/texturing performance.)
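For reference, bandwidth is just bus width times effective memory clock; a quick sketch with the commonly quoted reference clocks (board vendors vary, so treat these as approximate):

```python
# Bandwidth (GB/s) = (bus width in bits / 8) bytes x effective clock (MHz) / 1000

def bandwidth_gbs(bus_bits, effective_mhz):
    return bus_bits / 8 * effective_mhz / 1000.0

print(bandwidth_gbs(512, 1656))  # 2900 XT (R600):  ~106.0 GB/s
print(bandwidth_gbs(256, 1800))  # 7800 GTX 512:     ~57.6 GB/s
print(bandwidth_gbs(256, 1600))  # 7900 GTX:         ~51.2 GB/s
print(bandwidth_gbs(256, 1800))  # 8800 GT:          ~57.6 GB/s
# The R600's huge lead on paper never showed up in games, which is the point.
```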
Hope this answers your question. Bear in mind I've left out a lot of other details.
Can anyone explain what a stream processor actually does in the GPU?
Originally posted by: Soulkeeper
effective memory bandwidth has pretty much been the single most important factor in graphics performance over the last several years imo
the memory bus width is an important factor in this
Originally posted by: superbooga
Then why was the 2900xt, with its massive memory bandwidth, slower than both the 8800GTX and 3870?
Originally posted by: BFG10K
No, shader performance has. Witness how 256-bit derivatives (3870, 8800 GT, 8800 GTS 512) perform so well compared to their older cousins with much more memory bandwidth.
I'm not sure what you're saying here. In anything modern (say the last three years) shader performance is more important than memory bandwidth, and overclocking the core will yield more performance than overclocking memory.

that's why i said "IMO", because i don't look at the same niche junk as the next guy
but if you take the same GPU and put it on boards with different memory bandwidth configurations, the results will be noticeable, more so than an OC of the GPU
this has been the case for at least 10 years
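One way to see both sides is a toy bottleneck model (all per-frame numbers here are made up purely for illustration):

```python
# Toy model: frame time is set by whichever is slower, compute or memory traffic.

def frame_time_ms(flops_per_frame, bytes_per_frame, gflops, gb_per_s):
    compute_ms = flops_per_frame / (gflops * 1e9) * 1000
    memory_ms = bytes_per_frame / (gb_per_s * 1e9) * 1000
    return max(compute_ms, memory_ms)   # the bottleneck wins

flops, traffic = 2e9, 2e8   # hypothetical per-frame workload

# Same GPU core on two memory configurations:
print(frame_time_ms(flops, traffic, 400, 25))  # starved board: ~8 ms, memory-bound
print(frame_time_ms(flops, traffic, 400, 60))  # well-fed board: ~5 ms, compute-bound
# Past the crossover, extra bandwidth buys nothing; short of it, a memory OC
# scales almost linearly. Both observations in this thread fit that curve.
```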
Originally posted by: BFG10K
I'm not sure what you're saying here. In anything modern (say the last three years) shader performance is more important than memory bandwidth, and overclocking the core will yield more performance than overclocking memory.
Again I refer to the examples of the 3870 and 8800 GT/GTS 512 being competitive with, and even better than, previous offerings with vastly more memory bandwidth.
Sure, if you cripple anything enough it'll start becoming the bottleneck.

Well, that's because they've got enough. On the other hand, take a G73 with a 128-bit bus and overclocking the memory makes a big difference. Indeed, a 7600 GT outperforms a 7800 GS despite the latter's 256-bit bus.
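Rough reference numbers back that up (a sketch; board clocks vary and these are approximate stock figures, with fillrate counted per pixel pipeline as a crude proxy for core power):

```python
# Approximate stock specs: (core MHz, pixel pipelines, bus bits, effective mem MHz)
cards = {
    "7600 GT": (560, 12, 128, 1400),
    "7800 GS": (375, 16, 256, 1200),
}

for name, (core, pipes, bus, mem) in cards.items():
    fillrate = core * pipes / 1000.0     # Gpixels/s, a crude core-power proxy
    bandwidth = bus / 8 * mem / 1000.0   # GB/s
    print(f"{name}: {fillrate:.1f} Gpix/s on {bandwidth:.1f} GB/s")

# 7600 GT: ~6.7 Gpix/s on ~22.4 GB/s -> bandwidth-starved, memory OC pays off
# 7800 GS: ~6.0 Gpix/s on ~38.4 GB/s -> bandwidth to spare, core is the limit
```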