9800 GTX+/GTS 250

ArchAngel777

Diamond Member
Dec 24, 2000
5,223
61
91
I remember participating in a thread quite a while back that mentioned the 9800GTX+ had almost the same shader power as the GTX 260 (192 SP), and I agreed and did the simple math behind it. Here is a basic breakdown; NVIDIA reference clocks are being compared. For the shaders I included the 216SP version as well, because the benchmarks will be using that version.


9800 GTX+ - 128 SP x 1836 MHz = 235,008 - 100%

GTX 260 (192SP) - 192 SP x 1242 MHz = 238,464 - 101%

GTX 260 (216SP) - 216 SP x 1242 MHz = 268,272 - 114%

This is quite astonishing. I am not sure many people realize how much shader power the 9800 GTX+ has; it nearly matches the GTX 260.


9800 GTX+ - 738 MHz x 16 ROPs = 11,808 MPixels/s fill rate - 100%

GTX 260 (192SP) - 576 MHz x 28 ROPs = 16,128 MPixels/s fill rate - 137%

GTX 260 (216SP) - 576 MHz x 28 ROPs = 16,128 MPixels/s fill rate - 137%

Looks like fill rate is a different story. The GTX 260 has quite a bit more.


9800 GTX+ - 70.4 GB/s - 100%

GTX 260 (192SP) - 111.2 GB/s - 158%

GTX 260 (216SP) - 111.2 GB/s - 158%

No competition for memory bandwidth: the GTX 260 destroys the 9800 GTX+.
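To make the arithmetic explicit, here is a minimal sketch (Python) that reproduces all three comparisons from the reference specs quoted above:

```python
# Sketch: reproduce the shader, fill-rate, and bandwidth comparisons above.
# Specs are the NVIDIA reference numbers quoted in this post.
cards = {
    "9800 GTX+":     {"sp": 128, "shader_mhz": 1836, "rops": 16, "core_mhz": 738, "bw_gbs": 70.4},
    "GTX 260 (192)": {"sp": 192, "shader_mhz": 1242, "rops": 28, "core_mhz": 576, "bw_gbs": 111.2},
    "GTX 260 (216)": {"sp": 216, "shader_mhz": 1242, "rops": 28, "core_mhz": 576, "bw_gbs": 111.2},
}

base = cards["9800 GTX+"]
base_shader = base["sp"] * base["shader_mhz"]
base_fill = base["rops"] * base["core_mhz"]
for name, c in cards.items():
    shader = c["sp"] * c["shader_mhz"]  # shader throughput proxy: SP count x shader clock
    fill = c["rops"] * c["core_mhz"]    # pixel fill rate in MPixels/s
    print(f"{name:14s} shader {shader:6d} ({shader / base_shader:.0%})  "
          f"fill {fill:5d} ({fill / base_fill:.0%})  "
          f"bw {c['bw_gbs']:.1f} GB/s ({c['bw_gbs'] / base['bw_gbs']:.0%})")
```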

With these numbers out in the open, I decided to check AT's review again to see how well the GTS 250 fared, and to my surprise it held its own in most titles (even against the 216SP version), which would seem to indicate that my initial thoughts from about a year ago were not correct. I don't think memory bandwidth is as huge an issue for G92 as I had originally thought.

If memory bandwidth were the constant bottleneck on the 9800 GTX+, we would see the GTX 260 pull ahead by 50%+ every time, but if you look through the benchmarks, they don't back that up.

Benchmark results are taken from AT's GTS 250 review found here, using the 1GB card for reference. Keep in mind this is also the faster version of the GTX 260, since it has 216 SPs.


Age of Conan

2560x1600

GTS 250 = 22.1

GTX 260 = 19.5

Well, this is interesting... The GTS 250 is actually faster. I wonder if they accidentally switched the graphs around? Even presuming they did, the GTX 260 would still only be around 13% faster (22.1 vs 19.5).


Call of Duty

2560x1600

GTS 250 = 34.2

GTX 260 = 41.7

Here we do have a rather large performance increase: 22%. But that is still a far cry from the 58% advantage the GTX 260 holds in memory bandwidth.


Crysis Warhead

2560x1600

GTS 250 = 13.7

GTX 260 = 17.7

The largest difference yet - The GTX 260 is 29% faster. I would be willing to bet that this is a title where memory bandwidth matters a bit more.


Fallout 3

2560x1600

GTS 250 = 30.4

GTX 260 = 33.7

Here we have only an 11% performance advantage.
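Putting the four results side by side against the 58% bandwidth gap, a minimal sketch using the frame rates quoted above (a negative value means the GTS 250 was ahead):

```python
# Sketch: compare the measured GTX 260 advantage in each game against its
# 58% memory-bandwidth advantage, using the frame rates quoted above.
results = {  # game: (GTS 250 fps, GTX 260 fps) at 2560x1600
    "Age of Conan":   (22.1, 19.5),
    "Call of Duty":   (34.2, 41.7),
    "Crysis Warhead": (13.7, 17.7),
    "Fallout 3":      (30.4, 33.7),
}

BW_ADVANTAGE = 0.58  # 111.2 vs 70.4 GB/s
for game, (gts250, gtx260) in results.items():
    speedup = gtx260 / gts250 - 1  # negative means the GTS 250 was ahead
    print(f"{game:15s} GTX 260 advantage {speedup:+5.1%} (bandwidth gap: {BW_ADVANTAGE:.0%})")
```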

I think, for the most part, the memory bandwidth was never a problem for G92. There may be certain scenes or specific games where it becomes the bottleneck, but overall, I think G92 has enough to go around.

Thoughts?
 

Zap

Elite Member
Oct 13, 1999
22,377
2
81
Interesting observation. I'm not sure they are exactly comparable, though, because of architectural differences. However, it's very thought provoking.
 

betasub

Platinum Member
Mar 22, 2006
2,677
0
0
Originally posted by: ArchAngel777
I think, for the most part, the memory bandwidth was never a problem for G92. There may be certain scenes or specific games where it becomes the bottleneck, but overall, I think G92 has enough to go around.

:cookie: Nice set of data.

The "certain scenes or specific games" is why both Nvidia and ATI have moved to far greater memory bandwidth on their current high-end cards. And of course developers are only going to demand more and more.

For G92, the imbalance becomes evident with multi-GPU/SLI, as my 9800GX2 shows with a mere 64GB/s per GPU. Somewhat ironically, I upgraded from a 2900 XT, where memory bandwidth was not the issue (>100GB/s).

 

ArchAngel777

Diamond Member
Dec 24, 2000
5,223
61
91
Originally posted by: betasub
Originally posted by: ArchAngel777
I think, for the most part, the memory bandwidth was never a problem for G92. There may be certain scenes or specific games where it becomes the bottleneck, but overall, I think G92 has enough to go around.

:cookie: Nice set of data.

The "certain scenes or specific games" is why both Nvidia and ATI have moved to far greater memory bandwidth on their current high-end cards. And of course developers are only going to demand more and more.

For G92, the imbalance becomes evident with multi-GPU/SLI, as my 9800GX2 shows with a mere 64GB/s per GPU. Somewhat ironically, I upgraded from a 2900 XT, where memory bandwidth was not the issue (>100GB/s).

I don't know much about SLI, as I have never used it, nor wanted it. But I was relatively certain that bandwidth doubles with two cards, while the memory size itself does not. So two cards with 512MB of memory still only have 512MB to use when SLI'd, but the bandwidth would double; it would have gone from 64GB/s to 128GB/s. Someone who knows this for sure, feel free to comment.
 

toyota

Lifer
Apr 15, 2001
12,957
1
0
You have the numbers wrong for Fallout 3. Also, maybe with Age of Conan at 2560 4xAA the memory usage exceeded the 896MB the GTX 260 had.
 

ArchAngel777

Diamond Member
Dec 24, 2000
5,223
61
91
Originally posted by: toyota
You have the numbers wrong for Fallout 3. Also, maybe with Age of Conan at 2560 4xAA the memory usage exceeded the 896MB the GTX 260 had.

Thanks, I reversed the Fallout 3 numbers.

You might be right on Age of Conan, but I have a really, really hard time believing that the game eats up 896MB of memory. Crysis doesn't even come close to that with those settings, and that game is a flat-out monster.
 

toyota

Lifer
Apr 15, 2001
12,957
1
0
Originally posted by: ArchAngel777
Originally posted by: toyota
You have the numbers wrong for Fallout 3. Also, maybe with Age of Conan at 2560 4xAA the memory usage exceeded the 896MB the GTX 260 had.

Thanks, I reversed the Fallout 3 numbers.

You might be right on Age of Conan, but I have a really, really hard time believing that the game eats up 896MB of memory. Crysis doesn't even come close to that with those settings, and that game is a flat-out monster.

Well, a game can eat up video memory while not necessarily being stunning looking. There are other games, like Oblivion and Fallout 3 with high-res texture mods, and Far Cry 2, that use more video memory than Crysis.
 

cusideabelincoln

Diamond Member
Aug 3, 2008
3,268
11
81
Of course the memory bandwidth isn't quite so important for the 9800. If you had looked at the results of the 8800GTX vs. 9800GTX when the 9800 debuted, you would have noticed the same thing: the 9800GTX was just as fast or faster despite having less memory bandwidth (and less memory). You can even compare the HD4850 to the HD4870; the 4870 has almost twice the bandwidth but only holds a 25-30% performance advantage at stock speeds, and part of that advantage is also thanks to the HD4870's higher core clock.
 

toyota

Lifer
Apr 15, 2001
12,957
1
0
Originally posted by: cusideabelincoln
Of course the memory bandwidth isn't quite so important for the 9800. If you had looked at the results of the 8800GTX vs. 9800GTX when the 9800 debuted, you would have noticed the same thing: the 9800GTX was just as fast or faster despite having less memory bandwidth (and less memory). You can even compare the HD4850 to the HD4870; the 4870 has almost twice the bandwidth but only holds a 25-30% performance advantage at stock speeds, and part of that advantage is also thanks to the HD4870's higher core clock.

Wasn't there a comparison of the 4850 and 4870 at the same core clock speed? Seems like 10% was about the advantage the 4870's GDDR5 memory had over the 4850's GDDR3.
 

geokilla

Platinum Member
Oct 14, 2006
2,012
3
81
I believe the G92's higher shader clocks more than make up for the extra shaders in the GT200 GPUs. The shaders on the GT200 cores run much slower than those on the G92 cores.

Yes, I know this isn't directly about the memory bandwidth the OP was talking about, but to me it's related.
 

nismotigerwvu

Golden Member
May 13, 2004
1,568
33
91
Wouldn't it be much better if we just had someone with a 9800GTX+ or GTS 250 run a few benches?
Stock clocks, 10% core OC, 10% memory OC, and 10% on each (the 10% is just a random number I pulled out of nowhere; I've really got no idea how these cards overclock).
If it were memory-bandwidth starved, it would scale much better with a memory OC than with a core OC (and the OC on both together would look very similar to just a memory OC); see the sketch below.
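For what it's worth, here is a minimal sketch of how such a test could be read once the numbers are in; the fps values below are hypothetical placeholders, not measured results:

```python
# Sketch: interpret a core-OC vs memory-OC scaling test. The fps values
# below are hypothetical placeholders, not measured results.
def scaling(base_fps: float, oc_fps: float, oc_frac: float) -> float:
    """Fraction of the overclock that shows up as extra fps (1.0 = perfect scaling)."""
    return (oc_fps / base_fps - 1) / oc_frac

stock = 50.0                             # stock fps (placeholder)
core_scale = scaling(stock, 52.0, 0.10)  # +10% core -> 52.0 fps (placeholder)
mem_scale = scaling(stock, 54.5, 0.10)   # +10% memory -> 54.5 fps (placeholder)

if mem_scale > core_scale:
    print(f"looks memory-bound: mem scaling {mem_scale:.0%} vs core {core_scale:.0%}")
else:
    print(f"looks core/shader-bound: core scaling {core_scale:.0%} vs mem {mem_scale:.0%}")
```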
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
Originally posted by: nismotigerwvu
Wouldn't it be much better if we just had someone with a 9800GTX+ or GTS 250 run a few benches?
Stock clocks, 10% core OC, 10% memory OC, and 10% on each (the 10% is just a random number I pulled out of nowhere; I've really got no idea how these cards overclock).
If it were memory-bandwidth starved, it would scale much better with a memory OC than with a core OC (and the OC on both together would look very similar to just a memory OC).

i got my Galaxy GTS 250 - 512MB version in today for testing
- it does overclock well

but what good will it do as i do not have a 9800GTX+
- i just have 8800-GTX and 9600GT :p
 

SSChevy2001

Senior member
Jul 9, 2008
774
0
0
Originally posted by: toyota
Originally posted by: cusideabelincoln
Of course the memory bandwidth isn't quite so important for the 9800. If you had looked at the results of the 8800GTX vs. 9800GTX when the 9800 debuted, you would have noticed the same thing: the 9800GTX was just as fast or faster despite having less memory bandwidth (and less memory). You can even compare the HD4850 to the HD4870; the 4870 has almost twice the bandwidth but only holds a 25-30% performance advantage at stock speeds, and part of that advantage is also thanks to the HD4870's higher core clock.

Wasn't there a comparison of the 4850 and 4870 at the same core clock speed? Seems like 10% was about the advantage the 4870's GDDR5 memory had over the 4850's GDDR3.
Here's a comparison of a 4850 1GB at 750/2400MHz vs a 4870 1GB at 750/3600MHz. The problem is that the min FPS take a bigger hit from the loss in bandwidth.

http://www.xbitlabs.com/articl...d4850-1024mb-gs_7.html

CoD WaW 19x12 4xAA 16xAF MIN / AVG
4850 (750/2400) 33/52.6
4870 (750/3600) 42/63.4 increase of 27/20%

Fallout 3 4xAA 16xAF MIN / AVG
4850 (750/2400) 42/56.3
4870 (750/3600) 52/68.4 increase of 24/21%

FarCry 2 19x12 4xAA 16xAF MIN / AVG
4850 (750/2400) 25/40.8
4870 (750/3600) 32/44.8 increase of 28/10%

Prince of Persia 19x12 4xAA 16xAF MIN / AVG
4850 (750/2400) 20/42.8
4870 (750/3600) 27/52.6 increase of 35/23%
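A minimal sketch that recomputes those min/avg increases from the raw numbers (rounding may differ by a point from the figures quoted above):

```python
# Sketch: recompute the min/avg FPS increases quoted above (xbitlabs numbers,
# 4850 1GB at 750/2400 vs 4870 1GB at 750/3600, i.e. identical core clocks).
data = {  # game: ((4850 min, 4850 avg), (4870 min, 4870 avg))
    "CoD WaW":          ((33, 52.6), (42, 63.4)),
    "Fallout 3":        ((42, 56.3), (52, 68.4)),
    "FarCry 2":         ((25, 40.8), (32, 44.8)),
    "Prince of Persia": ((20, 42.8), (27, 52.6)),
}

for game, ((min_a, avg_a), (min_b, avg_b)) in data.items():
    print(f"{game:17s} min {min_b / min_a - 1:+4.0%}  avg {avg_b / avg_a - 1:+4.0%}")
# In every title the minimum fps gains more than the average, i.e. the
# worst-case frames are the most bandwidth-sensitive ones.
```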
 

lopri

Elite Member
Jul 27, 2002
13,207
593
126
Bandwidth and frame buffer are two different animals, and IIRC the 9800 GTX suffered on both counts, plus ForceWare's strange overhead on memory management (which seems to be improving with every new release).
 

apoppin

Lifer
Mar 9, 2000
34,890
1
0
alienbabeltech.com
i'll be damned . . . [anyway] :p

i just tested a 9800GT at stock speeds at 16x10 and 14x9 before i sent it back to another editor
- and i got the GTS 250 in exchange

i can do the testing tomorrow - 15 benchmarks - but they will be at stock; i can overclock the GTS 250 - but not the 9800GT - in the games
- i CAN, however, give you the percentage by which it overclocks in a few games on another system


IF you are interested
 

Munky

Diamond Member
Feb 5, 2005
9,372
0
76
You aren't going to see 50% scaling from 50% more bandwidth in any modern game, because those rely heavily on shader performance.
 

ArchAngel777

Diamond Member
Dec 24, 2000
5,223
61
91
Originally posted by: munky
You aren't going to see 50% scaling from 50% more bandwidth in any modern game, because those rely heavily on shader performance.

Of course, because bandwidth isn't a constant; it just needs to be 'available'. Even if games did not rely on shader performance, you still would never see a 50% performance increase from 50% more memory bandwidth, because the rendering process would never fully saturate it at all times.

Memory bandwidth has diminishing returns, and my own testing (8800GTS 512MB) found that increasing bandwidth 10% resulted in a 5% gain in performance. The same applied to underclocking the memory: lowering it 10% resulted in a 5% performance loss. Cutting it in half didn't cut performance in half either, but by about 25%. Bandwidth is something that just needs to be available.
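As a rough sanity check, here is a sketch fitting a power law to those 10% data points; the exponent is just a curve fit to the numbers above, not a model of the hardware:

```python
import math

# Sketch: fit a power law (perf ~ bandwidth**k) to the +/-10% results above
# (+10% bandwidth -> +5% fps) and see what it predicts for halved bandwidth.
k = math.log(1.05) / math.log(1.10)  # ~0.51, fitted from the 10% data point
predicted_loss = 1 - 0.5 ** k        # performance lost when bandwidth is halved
print(f"fitted exponent k = {k:.2f}")
print(f"predicted loss at half bandwidth = {predicted_loss:.0%}")  # ~30% vs ~25% observed
```

The half-bandwidth prediction (~30%) lands close to the ~25% loss observed, which is consistent with the diminishing-returns picture.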


 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: cusideabelincoln
Of course the memory bandwidth isn't quite so important for the 9800. If you had looked at the results of the 8800GTX vs. 9800GTX when the 9800 debuted, you would have noticed the same thing: the 9800GTX was just as fast or faster despite having less memory bandwidth (and less memory). You can even compare the HD4850 to the HD4870; the 4870 has almost twice the bandwidth but only holds a 25-30% performance advantage at stock speeds, and part of that advantage is also thanks to the HD4870's higher core clock.

But then again, the 9800GTX has a whole lot more texture fillrate and processing power to compensate for the lower bandwidth.
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: toyota
Wasn't there a comparison of the 4850 and 4870 at the same core clock speed? Seems like 10% was about the advantage the 4870's GDDR5 memory had over the 4850's GDDR3.

About 15% but with more AA and higher resolutions it could be more.
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: ArchAngel777
I remember participating in a thread quite a while back that mentioned the 9800GTX+ had almost the same shader power as the GTX 260 (192 SP), and I agreed and did the simple math behind it. Here is a basic breakdown; NVIDIA reference clocks are being compared. For the shaders I included the 216SP version as well, because the benchmarks will be using that version.


9800 GTX+ - 738 MHz x 16 ROPs = 11,808 MPixels/s fill rate - 100%

GTX 260 (192SP) - 576 MHz x 28 ROPs = 16,128 MPixels/s fill rate - 137%

GTX 260 (216SP) - 576 MHz x 28 ROPs = 16,128 MPixels/s fill rate - 137%

Looks like fill rate is a different story. The GTX 260 has quite a bit more.


Thoughts?

Don't forget texture fill: the GTS 250 has more texture fill than the GTX 260 216SP.

GTS250 47232 MTexels/sec
GTX260 216SP 41472 MTexels/sec

What it lacks in some areas it makes up for in others. The GTS 250 is definitely hindered by memory bandwidth to a greater degree than the GTX 260. It does great in raw frame rates without AA, coming within about 10% of the GTX 260. With AA, the GTX 260 beats the GTS 250 by as much as 50% at times, but usually by around 30%. There might be some errors in some of those tests, though; some frame rates are exactly the same for the GTS 250 and GTX 260, which seems a little odd.
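Those texel figures fall out of core clock times texture-unit count; a quick sketch, assuming the commonly quoted TMU counts (64 on G92, 72 on the 216SP GTX 260):

```python
# Sketch: texel fill = core clock (MHz) x texture units, giving MTexels/s.
# TMU counts are the commonly quoted specs: 64 on G92 (GTS 250),
# 72 on the 216SP GTX 260.
for name, core_mhz, tmus in [("GTS 250", 738, 64), ("GTX 260 216SP", 576, 72)]:
    print(f"{name:14s} {core_mhz * tmus} MTexels/s")  # 47232 and 41472
```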
 

BFG10K

Lifer
Aug 14, 2000
22,477
2,399
126
Bandwidth has largely been a non-issue on most DX10 parts from ATi and nVidia.

The most interesting example is probably the 4850. It has around half the bandwidth of an 8800 Ultra, yet it's about the same speed. It also has less bandwidth than a 3870, yet it smashes it in performance.

Bandwidth being a non-issue is an extremely prevalent paradigm among DX10 parts.
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: BFG10K
Bandwidth has largely been a non-issue on most DX10 parts from ATi and nVidia.

The most interesting example is probably the 4850. It has around half the bandwidth of an 8800 Ultra, yet it's about the same speed. It also has less bandwidth than a 3870, yet it smashes it in performance.

Bandwidth being a non-issue is an extremely prevalent paradigm among DX10 parts.

You are citing a card that has relatively maxed-out fillrate compared to its bandwidth. Another would be the 2900 XT. The Ultra just doesn't have the texture fillrate or processing power to compete with the 4850. What it does have is high color fill and bandwidth for better AA performance.

The more fillrate a GPU has, the more bandwidth it needs. Not to mention bandwidth has tremendous impact on AA performance. To say bandwidth is a non-issue is like saying SPs are a non-issue.
 

BFG10K

Lifer
Aug 14, 2000
22,477
2,399
126
Originally posted by: Azn

You are citing a card that has relatively maxed out fillrate compared to it's bandwidth.
Whatever theory you choose to put forward, the fact is the Ultra is but one of many DX10 cards that show the same traits, namely memory bandwidth not being the primary limiting factor on performance.

What it does have is high color fill and bandwidth for better AA performance.
Not to mention bandwidth has tremendous impact on AA performance.
Right, but the 4850 still competes well against the 8800 Ultra with 4xAA, and is often faster at 8xAA. I should know, because I actually benchmarked both parts.

So yet again we see the 4850 not being hindered when running AA despite having around half the bandwidth of the Ultra.

To say bandwidth is a non-issue is like saying SP are non issue.
No it's not, not when it's readily demonstrated that SP clocks generally have a bigger impact on performance than memory clocks. Again this is something that I've tested repeatedly with several parts.
 

ArchAngel777

Diamond Member
Dec 24, 2000
5,223
61
91
Thanks for chiming in, BFG. In fact, BFG was one of the few who got it right from the get-go: in the thread from around a year ago, he stated that we were shader limited, not memory bandwidth limited. I sided with Azn at the time because I truly believed we were bandwidth limited. However, testing and some logical reasoning based on other parts and their performance have caused me to rethink and recant.
 

AzN

Banned
Nov 26, 2001
4,112
2
0
Originally posted by: BFG10K
Whatever theory you choose to put forward, the fact is the Ultra is but one of many DX10 cards that show the same traits, namely memory bandwidth not being the primary limiting factor on performance.

It's not a theory. It's been proven with the release of every GPU over time. All cards have strengths and weaknesses. The Ultra is one GPU out of many DX10 cards, and it shows different traits: namely, memory bandwidth having a big impact on AA performance, and a lack of fillrate to take advantage of all that bandwidth.

Right, but the 4850 still competes well against the 8800 Ultra with 4xAA, and is often faster at 8xAA. I should know, because I actually benchmarked both parts. So yet again we see the 4850 not being hindered when running AA despite having around half the bandwidth of the Ultra.

It only competes well with the 8800 Ultra at 4xAA or 8xAA because ATI upgraded their AA implementation by doubling pixels per clock in the render back-end, something the Ultra can't do.

http://techreport.com/r.x/rade...ender-backend-ppcs.gif


No it's not, not when it's readily demonstrated that SP clocks generally have a bigger impact on performance than memory clocks. Again this is something that I've tested repeatedly with several parts.

You might have tested SP and memory bandwidth on separate occasions, on bandwidth-happy cards, to determine which makes the bigger impact, but you did not test how bandwidth affects fillrate performance. Fillrate is hindered by memory bandwidth. Again, something that I've tested repeatedly with GPUs.