9800 GTX+/GTS 250


AzN

Banned
Nov 26, 2001
Originally posted by: ArchAngel777
Thanks for chiming in BFG. In fact, BFG was one of the few who got it right from the get go. In the thread from around a year ago, he had stated that we were shader limited, not memory bandwidth limited. I sided with Azn at that time, because I truly believed we were bandwidth limited. However, testing and some logical reasoning based on other parts and their performance has caused me to rethink and recant.

Kind of early to determine who is right from the get-go, don't you think? Just because you agree with BFG "now" doesn't make it right. :p

At the time of that argument the 8800GT had just been released and the 8800GTS was about to launch. Back then there weren't many games that were shader limited. Now there are quite a few games that depend more on shader performance, and I said this a few months back as well when arguing with Chizow.

G92 is bandwidth limited and could catch up to GTX 260 level performance if it had the bandwidth, much like the 4850 can catch up to the 4870 if it had more bandwidth. The GTS250 has a higher texture fillrate than the GTX260 216SP. Again, bandwidth is the carrier that moves the fillrate in and out of the GPU's memory subsystem. Expecting 58% better frame rates just because a card has 58% more memory bandwidth is grossly exaggerated, because you won't get 58% more frame rates even if the shader is clocked 58% higher. The GTX260 is one of those cards that have more than enough bandwidth to cover their fillrate, while the GTS250 does not. All cards are not created equal, and all games are not created equal. If they were, we could pretty much point to any card or game and make general statements like "bandwidth does not matter" or "shader does not matter".
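For reference, here's a rough back-of-the-envelope comparison, assuming the reference specs (GTS250 at 738 MHz core, 64 TMUs, 1100 MHz GDDR3 on a 256-bit bus; GTX260 216 at 576 MHz core, 72 TMUs, 999 MHz GDDR3 on a 448-bit bus); it shows where that ~58% bandwidth gap comes from:

```python
# Back-of-the-envelope spec comparison; clocks and unit counts are the
# assumed reference specs, not measured values.
def mem_bandwidth_gbs(bus_bits, mem_mhz, transfers_per_clock=2):
    # GDDR3 is double data rate: bytes/s = (bus width / 8) * effective transfer rate
    return bus_bits / 8 * mem_mhz * transfers_per_clock * 1e6 / 1e9

def tex_fillrate_gtexels(core_mhz, tmus):
    return core_mhz * tmus / 1000.0

gts250 = (mem_bandwidth_gbs(256, 1100), tex_fillrate_gtexels(738, 64))
gtx260 = (mem_bandwidth_gbs(448, 999),  tex_fillrate_gtexels(576, 72))

print(f"GTS 250    : {gts250[0]:.1f} GB/s, {gts250[1]:.1f} GTexels/s")
print(f"GTX 260 216: {gtx260[0]:.1f} GB/s, {gtx260[1]:.1f} GTexels/s")
print(f"GTX 260 bandwidth advantage: {gtx260[0] / gts250[0] - 1:.0%}")  # ~59%
print(f"GTS 250 texture advantage  : {gts250[1] / gtx260[1] - 1:.0%}")  # ~14%
```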
 

Keysplayr

Elite Member
Jan 16, 2003
Originally posted by: Azn
Originally posted by: ArchAngel777
Thanks for chiming in BFG. In fact, BFG was one of the few who got it right from the get go. In the thread from around a year ago, he had stated that we were shader limited, not memory bandwidth limited. I sided with Azn at that time, because I truly believed we were bandwidth limited. However, testing and some logical reasoning based on other parts and their performance has caused me to rethink and recant.

Kind of early to determine who is right from the get-go, don't you think? Just because you agree with BFG "now" doesn't make it right. :p

At the time of that argument the 8800GT had just been released and the 8800GTS was about to launch. Back then there weren't many games that were shader limited. Now there are quite a few games that depend more on shader performance, and I said this a few months back as well when arguing with Chizow.

G92 is bandwidth limited and could catch up to GTX 260 level performance if it had the bandwidth, much like the 4850 can catch up to the 4870 if it had more bandwidth. The GTS250 has a higher texture fillrate than the GTX260 216SP. Again, bandwidth is the carrier that moves the fillrate in and out of the GPU's memory subsystem. Expecting 58% better frame rates just because a card has 58% more memory bandwidth is grossly exaggerated, because you won't get 58% more frame rates even if the shader is clocked 58% higher. The GTX260 is one of those cards that have more than enough bandwidth to cover their fillrate, while the GTS250 does not. All cards are not created equal, and all games are not created equal. If they were, we could pretty much point to any card or game and make general statements like "bandwidth does not matter" or "shader does not matter".

I'd easily take BFG's word on this, given that he has done extensive benchmarking on these very parts. I really don't know what you're arguing at this point, though.
Subsequent game releases will always get more and more shader intensive.

A G92 is a less powerful GPU compared to say a GTX260. Memory bandwidth being equal between the two would require the G92 to be clocked into orbit (much as they are now at 738 MHz core and 18xx shader). Clock that G92 down to GTX260 core and shader speeds and watch it crawl in comparison even if memory bandwidth was equal.

Picture tossing a G92 GPU onto a GTX260 board. Update the memory controller on the G92 to 448 bit. Clock it the same as a GTX260, and what do you think we will have? Something probably a bit slower than an 8800GTX/Ultra

And in contrast, just clock the core and shaders on a GTX260 to 738/18xx. Raped ape.
 

BFG10K

Lifer
Aug 14, 2000
Originally posted by: Azn

It's not a theory. It's been proven over time with the release of every GPU. All cards have strengths and weaknesses. The Ultra is one of many DX10 cards, and each shows different traits.
I'm not attempting to argue what the reasons are, I'm merely pointing out that memory bandwidth isn't a factor on most DX10 parts because they have enough of it.

We've seen this time and time again with nVidia's 9xxx series, ATi's 2xxx series vs the 3xxx series, and the 4770 being faster than the 4830 despite having less bandwidth. I also have my own extensive tests of the 4850, GTX260+ and the 285 to fall back on.

Also Derek tested the GTX275 in a similar fashion:

http://www.anandtech.com/video/showdoc.aspx?i=3575&p=2

With a 14% memory overclock, the highest gain was 5.37%.
With an 11% core overclock, the highest gain was 9.73%.
With an 18% shader overclock, the highest gain was 7.72%.

The biggest difference there is clearly from the core clock, which is something I observed with my 8800 Ultra testing.

This is really not hard; just move the sliders individually and see which shows the most benefit. Again, there's a plethora of evidence from multiple DX10 parts demonstrating memory bandwidth isn't the primary limiting factor.
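For what it's worth, normalizing each of those gains by its overclock gives a rough per-percent scaling; it's only a sketch since it ignores any interaction between the three clock domains:

```python
# Gain per percent of overclock, from Derek's GTX 275 numbers above.
# Ignores interactions between the clock domains, so treat it as a rough sketch.
results = {
    "memory": (14, 5.37),  # (% overclock, best-case % gain)
    "core":   (11, 9.73),
    "shader": (18, 7.72),
}
for domain, (oc, gain) in results.items():
    print(f"{domain:>6}: {gain / oc:.2f}% fps per 1% overclock")
# core ~0.88, shader ~0.43, memory ~0.38 -> the core clock dominates here
```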

It only competes well with the 8800 Ultra even with 4xAA or 8xAA because ATI upgraded their AA implementation by doubling pixels per clock, something the Ultra can't do.
Hang on now, how can the ROPs possibly show a benefit if there's a bottleneck with the memory like you claimed? ROPs are fillrate, not memory. If the 4850 was limited by memory bandwidth like you claim then beefing up the ROPs would make no difference.

Again this proves the 4850 isn't memory bandwidth limited because doubling ROP performance basically doubles AA performance over the 3xxx series. Remember that the 4850 has less bandwidth than the 3870, yet it still shows a 2x performance gain in AA situations. This tells me that neither the 3870 nor the 4850 is limited by memory bandwidth.

Not only that, but the 4870 has basically double the bandwidth of a 4850 but is only around 20%-30% faster in most AA situations. This incidentally is around the same difference as the core clock is between the cards. That's not a coincidence by any stretch of the imagination.

And I'm aware of ATi's new ROPs given I covered them in my Radeon 4000 AA series investigation.

You might have tested SP and memory bandwidth on separate occasions on bandwidth-happy cards to determine which makes the bigger impact, but you did not test how the bandwidth affects fillrate performance.
"Bandwidth happy cards"? Are you claiming the 4850 is bandwidth happy? What about the 4770? Because if you are, you're basically agreeing with me that memory bandwidth isn't a factor on most DX10 parts given they have around half the bandwidth of top-end offerings.
 

toyota

Lifer
Apr 15, 2001
but the 4770 has a much higher clock on those ROPs too.

BFG10K

Lifer
Aug 14, 2000
Originally posted by: toyota

but the 4770 has a much higher clock on those ROPs too.
Right, thereby proving memory bandwidth isn't the bottleneck given the 4770 is still faster than the 4830 with less of it.

It's obvious that in this instance the AA bottleneck comes from the ROPs as clocking them higher increases AA performance, even while reducing memory bandwidth.

That's my point.
 

AzN

Banned
Nov 26, 2001
Originally posted by: BFG10K
I'm not attempting to argue what the reasons are, I'm merely pointing out that memory bandwidth isn't a factor on most DX10 parts because they have enough of it.

We've seen this time and time again with nVidia's 9xxx series, ATi's 2xxx series vs the 3xxx series, and the 4770 being faster than the 4830 despite having less bandwidth. I also have my own extensive tests of the 4850, GTX260+ and the 285 to fall back on.

Also Derek tested the GTX275 in a similar fashion:

http://www.anandtech.com/video/showdoc.aspx?i=3575&p=2

With a 14% memory overclock, the highest gain was 5.37%.
With an 11% core overclock, the highest gain was 9.73%.
With an 18% shader overclock, the highest gain was 7.72%.

The biggest difference there is clearly from the core clock, which is something I observed with my 8800 Ultra testing.

This is really not hard; just move the sliders individually and see which shows the most benefit. Again, there's a plethora of evidence from multiple DX10 parts demonstrating memory bandwidth isn't the primary limiting factor.

Exactly my point. Those results are for the 8800 Ultra or GTX275, which are bandwidth happy. Again, all cards do not behave like this, as each has a different configuration. G92 is a different beast with fewer ROPs, less bandwidth, and more texture fill. I've done tests on my G92 8800gts and the same can't be said about this card: raising core or shader had very little impact while increasing memory clocks had a bigger impact. I went through all of this with Chizow. There should be a thread here somewhere with benchmark numbers if you search.

Hang on now, how can the ROPs possibly show a benefit if there's a bottleneck with the memory like you claimed? ROPs are fillrate, not memory. If the 4850 was limited by memory bandwidth like you claim then beefing up the ROPs would make no difference. Again this proves the 4850 isn't memory bandwidth limited because doubling ROP performance basically doubles AA performance over the 3xxx series. Remember that the 4850 has less bandwidth than the 3870, yet it still shows a 2x performance gain in AA situations. This tells me that neither the 3870 nor the 4850 is limited by memory bandwidth. Not only that, but the 4870 has basically double the bandwidth of a 4850 but is only around 20%-30% faster in most AA situations. This incidentally is around the same difference as the core clock is between the cards. That's not a coincidence by any stretch of the imagination. And I'm aware of ATi's new ROPs given I covered them in my Radeon 4000 AA series investigation.

Back to my original point: fillrate is hindered by the memory subsystem. How much? I don't know the exact numbers, because there are too many variables and they differ for each card. If a card can push 10,000 Mpixels/s, the bandwidth might hinder it by 20%, 25%, or whatever... Pixel fill has everything to do with AA, as you already know. How fast or efficiently it gets there is controlled by the memory subsystem. So if the 4850 can push 20,000 Mpixels/s while doing AA, it might be hindered by its memory subsystem by 20%, while the Ultra at 14,688 Mpixels/s might not be hindered at all, because it has enough bandwidth to cover all the fillrate going in and out of the GPU and back to the memory subsystem.

Same with the 4870. That card has more than enough bandwidth for all its fillrate; perhaps 50% of the bandwidth sits idle doing nothing most of the time, much like it did with the 2900XT.
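As a very rough illustration of what I mean, counting only 4 bytes of colour write per pixel and ignoring Z traffic, texture reads and framebuffer compression (so this is a sketch of the accounting, not real traffic), the numbers work out like this:

```python
# Rough illustration only: bandwidth consumed by raw colour writes alone,
# assuming 4 bytes per pixel and no compression. Real traffic differs a lot.
def color_write_gbs(mpixels_per_s, bytes_per_pixel=4):
    return mpixels_per_s * 1e6 * bytes_per_pixel / 1e9

cases = [
    ("8800 Ultra, peak fill", 14688, 103.7),  # 384-bit GDDR3 @ 2160 MT/s
    ("HD 4850, peak fill",    10000, 63.6),   # 256-bit GDDR3 @ ~2000 MT/s
    ("HD 4850, AA resolve",   20000, 63.6),   # doubled per-clock AA resolve
]
for name, mpix, available in cases:
    print(f"{name}: ~{color_write_gbs(mpix):.0f} GB/s wanted vs {available} GB/s available")
```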

There are many techreport articles that discuss this issue.

http://techreport.com/articles.x/14524/5

Performance in the single-textured fill rate test tends to track more closely with memory bandwidth than with peak theoretical pixel fill rates, which are well beyond what the graphics cards achieve.


"Bandwidth happy cards"? Are you claiming the 4850 is bandwidth happy? What about the 4770? Because if you are, you're basically agreeing with me that memory bandwidth isn't a factor on most DX10 parts given they have around half the bandwidth of top-end offerings.

I implied that your Ultra, which you did most of your testing on, was bandwidth happy, and so is the new GTX series, which follows the Ultra's tradition. The 4850 is a pretty efficient card, although it could probably use another 25% more bandwidth. The 4870, on the other hand, is not.
 

AzN

Banned
Nov 26, 2001
Originally posted by: BFG10K
Originally posted by: toyota

but the 4770 has a much higher clock on those ROPs too.
Right, thereby proving memory bandwidth isn't the bottleneck given the 4770 is still faster than the 4830 with less of it.

It's obvious that in this instance the AA bottleneck comes from the ROPs as clocking them higher increases AA performance, even while reducing memory bandwidth.

That's my point.

Then again, the 4850 beats the 4770 even with AA and a lower pixel fillrate.

Not only is the 4830 clocked much lower than the 4770, its shader clocks are much lower as well. Again, back to my explanation above: the 4830 might be hindered by bandwidth by, say, 5%, while the 4770 might be hindered by 10%, and the numbers still favor the 4770. Efficiency is key.
 

AzN

Banned
Nov 26, 2001
Originally posted by: Keysplayr
A G92 is a less powerful GPU compared to say a GTX260. Memory bandwidth being equal between the two would require the G92 to be clocked into orbit (much as they are now at 738 MHz core and 18xx shader). Clock that G92 down to GTX260 core and shader speeds and watch it crawl in comparison even if memory bandwidth was equal.

Picture tossing a G92 GPU onto a GTX260 board. Update the memory controller on the G92 to 448 bit. Clock it the same as a GTX260, and what do you think we will have? Something probably a bit slower than an 8800GTX/Ultra

G92 is not less powerful. With equal bandwidth as GTX260 it would easily be as fast as GTX260.

You are just exaggerating what a G92 could be if it followed GTX traditions. Screw the 448-bit memory controllers and raising the ROP count. Think simplicity: stick GDDR5 on it like the 4870. An 8800gts is as fast as an 8800gtx, and a 9800gtx+ or GTS250 is as fast as an Ultra with GDDR3. What makes you think it will be slower than an 8800gtx/ultra with a 448-bit memory controller?

And in contrast, just clock the core and shaders on a GTX260 to 738/18xx. Raped ape

This is besides the point. You can clock anything and rape ape.
 

Keysplayr

Elite Member
Jan 16, 2003
Originally posted by: Azn
Originally posted by: Keysplayr
A G92 is a less powerful GPU compared to say a GTX260. Memory bandwidth being equal between the two would require the G92 to be clocked into orbit (much as they are now at 738 MHz core and 18xx shader). Clock that G92 down to GTX260 core and shader speeds and watch it crawl in comparison even if memory bandwidth was equal.

Picture tossing a G92 GPU onto a GTX260 board. Update the memory controller on the G92 to 448 bit. Clock it the same as a GTX260, and what do you think we will have? Something probably a bit slower than an 8800GTX/Ultra

G92 is not less powerful. With equal bandwidth as GTX260 it would easily be as fast as GTX260.

You are just exaggerating what a G92 could be if it followed GTX traditions. Screw the 448-bit memory controllers and raising the ROP count. Think simplicity: stick GDDR5 on it like the 4870. An 8800gts is as fast as an 8800gtx, and a 9800gtx+ or GTS250 is as fast as an Ultra with GDDR3. What makes you think it will be slower than an 8800gtx/ultra with a 448-bit memory controller?

And in contrast, just clock the core and shaders on a GTX260 to 738/18xx. Raped ape

This is besides the point. You can clock anything and rape ape.

You are either missing, or ignoring, certain parts of the post.
At any rate, why say screw 448-bit and ROPs? You claim G92 is bandwidth starved. So, I gave it more and you say screw it? Instead slap GDDR5 on there? Either way, dude, whether you widen the bus or slap faster memory on there, it still has the same effect: more bandwidth. I don't see a difference.

In a roundabout way, you're kind of trying to say that the extra 64 shaders on the standard GTX 260 just don't get used. Tell that to the GTX260 216: same card, more shaders on the core turns out faster performance. Wonder why.
 

AzN

Banned
Nov 26, 2001
Originally posted by: Keysplayr
You are either missing, or ignoring, certain parts of the post.
At any rate, why say screw 448-bit and ROPs? You claim G92 is bandwidth starved. So, I gave it more and you say screw it? Instead slap GDDR5 on there? Either way, dude, whether you widen the bus or slap faster memory on there, it still has the same effect: more bandwidth. I don't see a difference.

Oh, I get your point, but with Nvidia's configuration you cannot widen the memory controller without raising the ROP count. To have a 448-bit memory controller you need 28 ROPs. You get it now? GDDR5 is the only solution that doesn't raise the ROP count on G92; that's why I mentioned it.
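If it helps, the ROP/bus coupling on these chips works out like this, assuming the usual 4 ROPs per 64-bit memory partition on G8x/G9x/GT200:

```python
# NVIDIA G8x/G9x/GT200 tie ROPs to memory partitions: each 64-bit memory
# channel brings 4 ROPs with it, so bus width and ROP count move together.
ROPS_PER_PARTITION, BITS_PER_PARTITION = 4, 64
for bus_bits in (256, 384, 448, 512):
    partitions = bus_bits // BITS_PER_PARTITION
    print(f"{bus_bits}-bit bus -> {partitions} partitions -> {partitions * ROPS_PER_PARTITION} ROPs")
# 256-bit -> 16 ROPs (9800GTX+), 384-bit -> 24 (8800GTX), 448-bit -> 28 (GTX260), 512-bit -> 32 (GTX280)
```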


In a roundabout way, you're kind of trying to say that the extra 64 shaders on the standard GTX 260 just don't get used. Tell that to the GTX260 216: same card, more shaders on the core turns out faster performance. Wonder why.

You seem to have made up a story about what I'm implying. When did I say those 64 shaders on the standard GTX260 are not being used? Lay down the pipe. ;)
 

SickBeast

Lifer
Jul 21, 2000
It only makes sense that current graphics cards are being held back by a 256-bit interface. Even the 9700Pro was 256-bit, and that was in 2002! It is probably time for a 512-bit bus at this point.
 

toyota

Lifer
Apr 15, 2001
Originally posted by: SickBeast
It only makes sense that current graphics cards are being held back by a 256-bit interface. Even the 9700Pro was 256-bit, and that was in 2002! It is probably time for a 512-bit bus at this point.

you do realize how slow the actual memory was back then don't you? there is no need for 512bit bus when you have memory that can hit insane speeds like gddr5 can. the next gen Nvidia card may stick to the 512bit bus though and just use some lower cost slower speed gddr5 instead of the fastest speeds that will be available at the time.
 

AzN

Banned
Nov 26, 2001
Originally posted by: toyota
Originally posted by: SickBeast
It only makes sense that current graphics cards are being held back by a 256-bit interface. Even the 9700Pro was 256-bit, and that was in 2002! It is probably time for a 512-bit bus at this point.

you do realize how slow the actual memory was back then don't you? there is no need for 512bit bus when you have memory that can hit insane speeds like gddr5 can. the next gen Nvidia card may stick to the 512bit bus though and just use some lower cost slower speed gddr5 instead of the fastest speeds that will be available at the time.

GDDR5, or 512-bit with GDDR3: it does the same thing. I think that's what SickBeast was implying.
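Quick arithmetic, assuming the same 900 MHz memory clock in both cases:

```python
# Bandwidth = (bus width in bytes) * effective transfer rate.
# GDDR5 transfers 4 bits per pin per clock vs GDDR3's 2, so at the same
# memory clock a 256-bit GDDR5 bus matches a 512-bit GDDR3 bus.
def bandwidth_gbs(bus_bits, mem_mhz, transfers_per_clock):
    return bus_bits / 8 * mem_mhz * transfers_per_clock * 1e6 / 1e9

print(bandwidth_gbs(256, 900, 4))  # 256-bit GDDR5 @ 900 MHz -> 115.2 GB/s (4870-style)
print(bandwidth_gbs(512, 900, 2))  # 512-bit GDDR3 @ 900 MHz -> 115.2 GB/s
```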
 

toyota

Lifer
Apr 15, 2001
Originally posted by: Azn
Originally posted by: toyota
Originally posted by: SickBeast
It only makes sense that current graphics cards are being held back by a 256-bit interface. Even the 9700Pro was 256-bit, and that was in 2002! It is probably time for a 512-bit bus at this point.

you do realize how slow the actual memory was back then don't you? there is no need for 512bit bus when you have memory that can hit insane speeds like gddr5 can. the next gen Nvidia card may stick to the 512bit bus though and just use some lower cost slower speed gddr5 instead of the fastest speeds that will be available at the time.

GDDR5, or 512-bit with GDDR3: it does the same thing. I think that's what SickBeast was implying.

well he didn't mention anything but the bus width. so many people see 128 or 256bit and think it's not enough, but they rarely think about the actual speed of the memory. just like the 4770 is decent on a 128bit bus with 3200 memory.

 

SickBeast

Lifer
Jul 21, 2000
Originally posted by: toyota
Originally posted by: SickBeast
It only makes sense that current graphics cards are being held back by a 256-bit interface. Even the 9700Pro was 256-bit, and that was in 2002! It is probably time for a 512-bit bus at this point.

you do realize how slow the actual memory was back then don't you? there is no need for 512bit bus when you have memory that can hit insane speeds like gddr5 can. the next gen Nvidia card may stick to the 512bit bus though and just use some lower cost slower speed gddr5 instead of the fastest speeds that will be available at the time.

Point taken, however NV has still not adopted GDDR5 and there are cases where their cards are indeed bottlenecked. There are also AMD cards that are as well (4850, plus a few others).

Actually, I'm pretty sure that the entire reason why the 4850 is slower than the 4870 is a memory bandwidth bottleneck. The performance differential there should give you an idea as to how a GTS250 would perform with a 512-bit bus or else GDDR5 memory, in terms of a percentage.
 

toyota

Lifer
Apr 15, 2001
Originally posted by: SickBeast
Originally posted by: toyota
Originally posted by: SickBeast
It only makes sense that current graphics cards are being held back by a 256-bit interface. Even the 9700Pro was 256-bit, and that was in 2002! It is probably time for a 512-bit bus at this point.

you do realize how slow the actual memory was back then don't you? there is no need for 512bit bus when you have memory that can hit insane speeds like gddr5 can. the next gen Nvidia card may stick to the 512bit bus though and just use some lower cost slower speed gddr5 instead of the fastest speeds that will be available at the time.

Point taken, however NV has still not adopted GDDR5 and there are cases where their cards are indeed bottlenecked. There are also AMD cards that are as well (4850, plus a few others).

Actually, I'm pretty sure that the entire reason why the 4850 is slower than the 4870 is a memory bandwidth bottleneck. The performance differential there should give you an idea as to how a GTS250 would perform with a 512-bit bus or else GDDR5 memory, in terms of a percentage.

so what Nvidia cards are bottlenecked because they don't have gddr5? also the gts250 would see little to no benefit with a 512bit bus.

the MAIN reason the 4870 is faster than a 4850 is the clockspeeds. the 4870 has almost double the memory bandwidth but when they run at the same core clocks there is only a 10-15% performance difference. did you really think that no one had ever thought of that before?

are you feeling okay tonight because I really thought you were more knowledgeable than this?
 

SickBeast

Lifer
Jul 21, 2000
10-15% is still a bottleneck and that is what I was getting at.

Pretty much every midrange NV card since the 8800GT was released is hindered by a 256-bit bus to one extent or another (if it has one of course).

At the same clock speeds, the 4870 will be faster than a 4850, especially at higher resolutions.

I'm feeling fine tonight. You seem to be ready to have a nasty argument. Enjoy having it with someone else! :moon:
 

toyota

Lifer
Apr 15, 2001
Originally posted by: SickBeast
10-15% is still a bottleneck and that is what I was getting at.

Pretty much every midrange NV card since the 8800GT was released is hindered by a 256-bit bus to one extent or another (if it has one of course).

At the same clock speeds, the 4870 will be faster than a 4850, especially at higher resolutions.

I'm feeling fine tonight. You seem to be ready to have a nasty argument. Enjoy having it with someone else! :moon:

no argument here but you do keep changing your wording. first it was the "entire reason" the 4870 was faster than the 4850 but now it's just a small bottleneck that you are referring to. and no, the 256bit bus hasn't really held back any of the G92 cards. okay, maybe a very little bit, but certainly not enough to matter. the 9800gtx+ 1gb or gts250 1gb do just fine against and usually easily beat the 8800gtx which had more memory bandwidth. of course the core and shader clocks are much higher on the 9800gtx+ 1gb and/or gts250 1gb so it's not a perfect comparison.
 

AzN

Banned
Nov 26, 2001
Originally posted by: SickBeast
Actually, I'm pretty sure that the entire reason why the 4850 is slower than the 4870 is a memory bandwidth bottleneck. The performance differential there should give you an idea as to how a GTS250 would perform with a 512-bit bus or else GDDR5 memory, in terms of a percentage.

Obviously, when you consider they are the exact same chip, just using faster memory.

The GTS250, on the other hand, has more texture fillrate than the GTX260 216SP. Hell, it might even be a little faster, depending on the games of course.
 

Keysplayr

Elite Member
Jan 16, 2003
It's all just compensation with core and shader clocks.
Anything can be as fast or slow as another as long as the clocks can be manipulated. It's all relative. Create a 9770 card using a G92 with 1GB of 128-bit GDDR3, clock the memory as high as possible, then clock the core to 1100 and the shaders to 3000. Voila! You'll have a 4770. ;)
 

BFG10K

Lifer
Aug 14, 2000
Originally posted by: Azn

Exactly my point. Those results are for the 8800 Ultra or GTX275, which are bandwidth happy. Again, all cards do not behave like this, as each has a different configuration. G92 is a different beast with fewer ROPs, less bandwidth, and more texture fill.
What you call "bandwidth happy" is actually the norm for DX10 parts because they generally have enough bandwidth so that it's not the primary limiting factor. In essence you agree with me, but try to word it in a way so that it appears you disagree.

Again I'll ask whether you consider the 4770 to be bandwidth happy. It must be according to your definition, because it's faster than the 4830 while experiencing a reduction in bandwidth.

I've done tests on my G92 8800gts and the same can't be said about this card: raising core or shader had very little impact while increasing memory clocks had a bigger impact. I went through all of this with Chizow. There should be a thread here somewhere with benchmark numbers if you search.
While I'm not disputing the accuracy of your figures, they're clearly an outlier. This is very obvious from the fact that the 8800 GTS 640 MB has more bandwidth than the 8800 GT, but it's also slower overall, even with AA. If the 8800 GT was primarily limited by bandwidth, there's no way it could be faster:

http://www.behardware.com/arti...a-geforce-8800-gt.html

Back to my original point: fillrate is hindered by the memory subsystem. How much? I don't know the exact numbers, because there are too many variables and they differ for each card. If a card can push 10,000 Mpixels/s, the bandwidth might hinder it by 20%, 25%, or whatever... Pixel fill has everything to do with AA, as you already know. How fast or efficiently it gets there is controlled by the memory subsystem.
You appear to be typing responses that have nothing to do with what you quoted. Let's go back to the beginning:

I stated the 4850 isn't constrained by bandwidth because it beats the Ultra with AA despite having around half the bandwidth, and also that it's about three times faster than the 3870 with AA, again with less bandwidth. Therefore, the 4850 is not hamstrung by bandwidth.

You then posted a slide showing ROP resolve being doubled and told us that is why the card does so well with AA.

I acknowledged this but then pointed out that since improvements in ROPs are dramatically raising AA performance, bandwidth isn't the limiting factor given it's decreased from either the 3870 or the 8800 Ultra, yet the card is still faster with AA.

So again I'll ask how an improvement to ROPs can show such performance gains with AA if the 4850 is limited by bandwidth like you claim. How can the reduction of bandwidth, the aspect you claim the card is primarily limited by and which influences AA performance, increase AA performance?

It's a very simple question, so please address it instead of typing multiple sentences of irrelevant rhetoric. Thanks.

So if the 4850 can push 20,000 Mpixels/s while doing AA, it might be hindered by its memory subsystem by 20%, while the Ultra at 14,688 Mpixels/s might not be hindered at all, because it has enough bandwidth to cover all the fillrate going in and out of the GPU and back to the memory subsystem. Same with the 4870. That card has more than enough bandwidth for all its fillrate; perhaps 50% of the bandwidth sits idle doing nothing most of the time, much like it did with the 2900XT.
Why are you talking about made up numbers when we have actual benchmarks proving you wrong?

There are many techreport articles that discuss this issue.

http://techreport.com/articles.x/14524/5

Performance in the single-textured fill rate test tends to track more closely with memory bandwidth than with peak theoretical pixel fill rates, which are well beyond what the graphics cards achieve.
Oh yes, the amazing 3DMark, which told us the 2900XT was faster than the 8800 GTX. Oh wait, that's the complete opposite of reality when running games. Much like your theoretical figures which you made up.

I implied that your Ultra, which you did most of your testing on, was bandwidth happy, and so is the new GTX series, which follows the Ultra's tradition.
You have no idea what tests I've done given I haven't released all of the results to the public. Furthermore there's a plethora of tests I can link to that were done by other reviewers that back my claims.

The problem is that you don't acknowledge such a broad range of tests disproving you, but instead cling to your own tests as the sole form of evidence. Again, my tests simply confirm what other reviewers are saying, so that makes your tests the outlier, not mine.

The 4850 is a pretty efficient card, although it could probably use another 25% more bandwidth. The 4870, on the other hand, is not.
How do you know it could use 25% more memory? Where did this figure come from?

Again it's not hard; just overclock the 4850's core and witness an almost linear performance increase without touching the memory at all. Heck, ATi have done it for you with the 4870, where the performance gain in games basically matches the core performance improvement, thereby proving most of the extra bandwidth over the 4850 is simply not needed.

Then again, the 4850 beats the 4770 even with AA and a lower pixel fillrate.

Not only is the 4830 clocked much lower than the 4770, its shader clocks are much lower as well. Again, back to my explanation above: the 4830 might be hindered by bandwidth by, say, 5%, while the 4770 might be hindered by 10%, and the numbers still favor the 4770. Efficiency is key.
The 4850 beating the 4770 could be due to lots of reasons, but you certainly can't automatically infer it's from more bandwidth. Actually the three cards slot in the order of their shader performance, which is similar to what we see with the G80 and G92 line.

OTOH, given that the 4770 is still faster than the 4830 after its reduction in bandwidth, you can safely conclude that the 4770 isn't primarily limited by bandwidth.
 

BFG10K

Lifer
Aug 14, 2000
Originally posted by: SickBeast

It only makes sense that current graphics cards are being held back by a 256-bit interface.
No, they really aren't. Again there are numerous examples of DX10 parts with less bandwidth beating other DX10 parts with more bandwidth because they have higher processing capability (e.g. 4850 vs 3870, 8800 GT vs 8800 GTS 640, 4770 vs 4830, etc).

Actually, I'm pretty sure that the entire reason why the 4850 is slower than the 4870 is a memory bandwidth bottleneck.
How do you figure that? The 4870 has a 20% higher core clock and around double the bandwidth of a 4850, but it's only around 20%-30% faster in most situations. If memory bandwidth were the primary limitation then we should see much higher performance gains than that.

Again it's not hard to test; just overclock the 4850's core without touching the memory and witness an almost linear performance increase. Now do the same with memory and you'll see very little performance change.
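Comparing the spec ratios directly makes the point (reference clocks assumed here):

```python
# 4870 vs 4850 at reference clocks: which spec ratio does the typical
# 20-30% performance gap actually track?
core_4850, core_4870 = 625, 750      # MHz
bw_4850, bw_4870 = 63.6, 115.2       # GB/s (256-bit GDDR3 vs GDDR5)

print(f"core clock ratio: +{core_4870 / core_4850 - 1:.0%}")  # +20%
print(f"bandwidth ratio : +{bw_4870 / bw_4850 - 1:.0%}")      # +81%
# The observed 20-30% gains track the core clock ratio, not the bandwidth ratio.
```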

The performance differential there should give you an idea as to how a GTS250 would perform with a 512-bit bus or else GDDR5 memory, in terms of a percentage.
Based on current and past trends with DX10 parts, it would likely add very little performance.
 

Denithor

Diamond Member
Apr 11, 2004
Originally posted by: Azn
G92 is not less powerful. With equal bandwidth as GTX260 it would easily be as fast as GTX260.

And in contrast, just clock the core and shaders on a GTX260 to 738/18xx. Raped ape

This is besides the point. You can clock anything and rape ape.

The original point of this thread was that a 9800GTX+ (128SP/738/1836) can nearly match a GTX 260/216 (216SP/576/1242) in some games. Think about that for a second. The 9800GTX+ is the fastest-clocked G92 core by far and it can barely match a much much lower clocked GTX 260/216 in certain games. Keys was saying if you set the core/shader clockspeeds equal (downclock 9800 or upclock 260) the higher SP count GTX 260 would rape the G92 card. Bandwidth has nothing to do with this - simply the amount of shader power available.

RE bandwidth - I see bandwidth like the hose on a pump. If you use a hose too small the pump isn't going to be able to do its job efficiently and flow rates will be lower than the pump is capable of producing (remember the old 7600GT anyone? severely bandwidth limited). Using a hose too big for pump capacity won't improve performance by itself - the extra space goes to waste because the pump cannot fill the pipeline (think 2900XT here - twice the bandwidth of any other card of its generation and still suck ass performance).

For memory bandwidth more is generally better but you obtain diminishing returns at some point. Once you have enough to keep the core(s) saturated and working constantly you don't need any more.
 

SlowSpyder

Lifer
Jan 12, 2005
Originally posted by: Denithor
Originally posted by: Azn
G92 is not less powerful. With equal bandwidth as GTX260 it would easily be as fast as GTX260.

And in contrast, just clock the core and shaders on a GTX260 to 738/18xx. Raped ape

This is besides the point. You can clock anything and rape ape.

The original point of this thread was that a 9800GTX+ (128SP/738/1836) can nearly match a GTX 260/216 (216SP/576/1242) in some games. Think about that for a second. The 9800GTX+ is the fastest-clocked G92 core by far and it can barely match a much much lower clocked GTX 260/216 in certain games. Keys was saying if you set the core/shader clockspeeds equal (downclock 9800 or upclock 260) the higher SP count GTX 260 would rape the G92 card. Bandwidth has nothing to do with this - simply the amount of shader power available.

I think you can look at that from a few different angles though. On one hand you can see the potential of the GTX260: it's clocked much lower but is obviously a faster part overall than the GTS250. On the other hand, it's pretty impressive that the G92, which is pretty much based on the now fairly old G80 used in the 8800 cards, can still get very close to Nvidia's current cards. If you bought an 8800GTX back around its launch, or a 9800GTX/GTX+, you really got a decent card for the money, with a little staying power. Obviously the 8800GTX is dated now, but you could still game on it and play most anything, I would think.

I have a feeling Nvidia initially wanted higher clocks than what the GTX2x0 cards launched at, but they just couldn't get there, or at least couldn't get enough of those parts to have decent yields.