fermi.. 320bit interface.. again really?


toyota

Lifer
Apr 15, 2001
12,957
1
0
The numbers just don't add up to it being turned off, or at the very least to recovering the bus lines needed for ECC.

Let's assume it uses the same 4.8 Gbps GDDR5 that Cypress uses. Cypress gets 153.6 GB/s with a 256-bit bus, so Fermi should get around 230 GB/s with a 384-bit bus, but all the specs we're getting so far show a 177.4 GB/s data rate. That equates to a bus somewhere in the 290-bit range, or roughly 13% wider than Cypress's. (Which would also be consistent with a 25% loss from bus lines dedicated to Hamming code.)

We'll have to see when the product hits the streets.
Did you not see the final spec? The GTX 480 is using GDDR5 at just over 3600 MHz effective, so there's your 177.
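As a sanity check, here is the arithmetic both posts are using, as a minimal sketch. The only formula involved is effective data rate (Gbps) x bus width (bits) / 8; the 3.696 Gbps figure is the "just over 3600 MHz" spec mentioned above.

```python
# Peak GDDR5 bandwidth: effective data rate (Gbps) x bus width (bits) / 8 bits per byte.

def bandwidth_gbs(effective_gbps: float, bus_bits: int) -> float:
    """Peak memory bandwidth in GB/s."""
    return effective_gbps * bus_bits / 8

def implied_bus_bits(bandwidth: float, effective_gbps: float) -> float:
    """Invert the formula: what bus width would explain a given bandwidth?"""
    return bandwidth * 8 / effective_gbps

print(bandwidth_gbs(4.8, 256))       # Cypress: 153.6 GB/s
print(bandwidth_gbs(4.8, 384))       # Fermi if it also ran 4.8 Gbps: 230.4 GB/s
print(implied_bus_bits(177.4, 4.8))  # ~296 bits, the "290-bit range" inference above
print(bandwidth_gbs(3.696, 384))     # the actual spec: ~3.7 Gbps x 384-bit = 177.4 GB/s
```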
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Actually, one of the benefits of tessellation is lower memory bandwidth consumption. It allows you to create geometry without having to store or read much more complex models in memory, which saves a lot of memory bandwidth.

So a low-poly base model tessellated out to a high-poly model should render faster than an equivalent high-poly base model when bandwidth is limited. That may be one of the reasons ATI felt comfortable with the lower memory bandwidth on the 5770? Who knows.

Thanks for the very good explanation, but how widely used are high-poly models in games today? Based on the Unigine Heaven benchmark, I would say most games use low-poly models?

Does tessellating a low-poly base model into a high-poly model use more memory bandwidth than using the low-poly base model by itself?
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
Thanks for the very good explanation, but how widely used are high-poly models in games today? Based on the Unigine Heaven benchmark, I would say most games use low-poly models?

Does tessellating a low-poly base model into a high-poly model use more memory bandwidth than using the low-poly base model by itself?

http://www.anandtech.com/showdoc.aspx?i=1476&p=4
Once again, since the tessellation process occurs in the TRUFORM T&L segment of the ATI chip, no performance is lost. Also, since the texture information for this new, super triangle is exactly the same texture information needed for the original triangle, no additional information needs to be passed over the memory bus.

It might be slightly different now, but as an indication of how tessellation compares to a high-detail model:

[Image: tess2chart.png, tessellation vs. high-detail model benchmark chart]

http://www.anandtech.com/video/showdoc.aspx?i=2988&p=12
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
In fact, the HD5870's memory bandwidth to computational power ratio is actually worse than the HD4850's.

That's because the 5870's architecture is actually more GPU limited than memory bandwidth limited. In other words, you may get a 3% performance improvement from a 10% memory bandwidth increase, but a 10% GPU clock increase will likely translate into a 7-8% performance increase. For as long as I can remember, most graphics cards have benefited significantly more from a GPU core clock increase than from a memory bandwidth increase. I overclocked my 4890's memory from 4000 MHz to 4600 MHz and it didn't do much.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
In other words, you may get a 3% performance improvement from a 10% memory bandwidth increase, but a 10% GPU clock increase will likely translate into a 7-8% performance increase.

When BFG10K did his HD5770 tests he got an 11% decrease in performance from a 20% core speed reduction. Dropping memory bandwidth by 20% cost him slightly more than 8%.

So the HD5870 is more dependent on memory than you think... and that is going by average frame rates.
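One quick way to read BFG10K's numbers, as a rough sketch: divide the performance lost by the clock reduction to get a scaling sensitivity for each domain. These are simple ratios, not a model of the card:

```python
# Rough scaling sensitivity from BFG10K's HD 5770 underclocking results:
# performance lost divided by clock reduction (1.0 would be perfect scaling).
core_sensitivity = 0.11 / 0.20       # 0.55: 11% perf loss from a 20% core cut
bandwidth_sensitivity = 0.08 / 0.20  # 0.40: ~8% perf loss from a 20% memory cut
print(core_sensitivity, bandwidth_sensitivity)
```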
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
http://www.anandtech.com/showdoc.aspx?i=1476&p=4


It might be slightly different now, but as an indication of how tessellation compares to a high-detail model:

[Image: tess2chart.png, tessellation vs. high-detail model benchmark chart]

http://www.anandtech.com/video/showdoc.aspx?i=2988&p=12

I am sure I am misreading what you have given me (I have no IT background).

No doubt tessellation is more efficient, right? Otherwise, why would they be using it?

However, my question pertains to how much memory bandwidth it consumes compared to a lower-quality standard.

In the Unigine benchmark, does switching on tessellation increase memory bandwidth consumption? Yes, I am aware tessellation would save memory bandwidth if the Unigine benchmark could somehow duplicate the tessellated image quality using other, non-tessellated methods.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
When BFG10K did his HD5770 tests he got an 11% decrease in performance from a 20% core speed reduction. Dropping memory bandwidth by 20% cost him slightly more than 8%.

So the HD5870 is more dependent on memory than you think... and that is going by average frame rates.

I was not discussing the 5770, which has a mere 76.8 GB/sec of bandwidth. But you can use it as an example if you want to show that the 5xxx series is not bandwidth bottlenecked. The 5770 has almost the same specs as the 4870 other than GPU clock and memory bandwidth:

Both have 40 texture units
Both have 16 ROPs
Both have 800 SPs
Memory bandwidth of 76.8 GB/sec vs. 115.2 GB/sec for the 4870 (+50%)
GPU clock of 850 MHz (+13%) vs. 750 MHz for the 4870

So a 13% GPU clock increase is enough to overcome a 50% memory bandwidth disadvantage, as the 5770 is nearly as fast as a 4870.

But let's get back to the 5870. Check out Crysis:
http://img9.imageshack.us/img9/8168/5870overclocking.jpg

850/1200 (stock) = 37.225 FPS
900 (+5.8%) / 1230 (+2.5%) = 39.055 FPS (+4.9% increase)
900 (+5.8%) / 1300 (+8.3%) = 39.33 FPS (+5.7% increase)

39.33 / 39.055 = +0.7% more frames for 1300 vs. 1230 memory (+5.7% more memory bandwidth)

You get a 0.7% increase in frames for a 5.7% increase in memory bandwidth at the same 5.8% GPU clock increase.
You get a 4.9% increase in frames over stock for a 2.5% increase in memory bandwidth and the same 5.8% GPU clock increase.

This implies that the 5870 is more GPU clock speed limited.

Also, lowering the memory bandwidth of the 5770 from 76.8 to 61.4 GB/sec (20% less) may actually make it memory bandwidth limited. That testing methodology is flawed, however: it may be that above 75 GB/sec the bandwidth is sufficient, but at 60 GB/sec you hit a massive limitation. When you test memory bandwidth, you should start from stock and increase the clock speed, since you aren't trying to find the lowest memory bandwidth at which the card tanks in performance.
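Spelling out the percentage math in that post (FPS figures come from the linked 5870 overclocking screenshot):

```python
# The Crysis scaling math above, spelled out.
stock, oc_2_5pct_mem, oc_8_3pct_mem = 37.225, 39.055, 39.33

pct = lambda new, old: (new / old - 1) * 100
print(f"{pct(oc_2_5pct_mem, stock):.1f}%")          # +4.9%: +5.8% core, +2.5% memory
print(f"{pct(oc_8_3pct_mem, stock):.1f}%")          # +5.7%: +5.8% core, +8.3% memory
print(f"{pct(oc_8_3pct_mem, oc_2_5pct_mem):.1f}%")  # +0.7% for the extra memory alone
```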
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
I was not discussing the 5770, which has a mere 76.8 GB/sec of bandwidth. But you can use it as an example if you want to show that the 5xxx series is not bandwidth bottlenecked.

I used the HD5770 as an example because it is exactly half an HD5870. Half the shaders, half the bandwidth... therefore it has the same computational power to memory bandwidth ratio.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
My understanding is that tessellation itself doesn't use any bandwidth at all, because all the computation happens on the GPU and you just transmit the standard model over the bus.

So you can either use a lower polygon count for your models (i.e. you need to transfer less data over the bus), or you can use the same models and just make them look better with tessellation on. Because a rather large user base doesn't have, and won't have, a DX11-capable GPU, you can't lower the polygon count of your models too much; they still have to look good without tessellation. So probably more of the second approach.
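A back-of-the-envelope sketch of the first approach. The mesh sizes, vertex format, and tessellation factor below are all hypothetical; the point is just that only the base mesh crosses the bus, while the amplified geometry is generated on-chip.

```python
# Vertex data that must cross the memory bus: a low-poly base mesh that is
# tessellated on-chip vs. an equivalent pre-built high-poly mesh.
# All sizes below are hypothetical, for illustration only.

BYTES_PER_VERTEX = 32  # e.g. position + normal + UV, packed

def mesh_bytes(triangles: int, unique_verts_per_tri: float = 0.5) -> int:
    # Indexed meshes share vertices; ~0.5 unique vertices per triangle is typical.
    return int(triangles * unique_verts_per_tri * BYTES_PER_VERTEX)

base_tris = 5_000   # low-poly base model stored in memory
tess_factor = 16    # each patch amplified ~16x by the tessellator
high_poly_tris = base_tris * tess_factor

print(mesh_bytes(base_tris))       # ~80 KB read over the bus, then amplified on-chip
print(mesh_bytes(high_poly_tris))  # ~1.28 MB if the high-poly mesh were stored instead
```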
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
But let's get back to the 5870. Check out Crysis:
http://img9.imageshack.us/img9/8168/5870overclocking.jpg

850/1200 (stock) = 37.225 FPS
900 (+5.8%) / 1230 (+2.5%) = 39.055 FPS (+4.9% increase)
900 (+5.8%) / 1300 (+8.3%) = 39.33 FPS (+5.7% increase)

39.33 / 39.055 = +0.7% more frames for 1300 vs. 1230 memory (+5.7% more memory bandwidth)

You get a 0.7% increase in frames for a 5.7% increase in memory bandwidth at the same 5.8% GPU clock increase.
You get a 4.9% increase in frames over stock for a 2.5% increase in memory bandwidth and the same 5.8% GPU clock increase.

This implies that the 5870 is more GPU clock speed limited.

Yeah, but the examples you are using involve overclocking the memory.

Evergreen has a new type of error-correcting mechanism that keeps retransmitting the same signals when an error occurs. The major difference between this and the old system used on the HD48xx is that it can produce higher-looking overclocks with little gain in performance.

Let me see if I can find the article page where this was discussed. EDIT: Here it is ---> http://www.anandtech.com/video/showdoc.aspx?i=3643&p=12

From the AnandTech HD5870 article:
Like the changes to VRM monitoring, the significant ramifications of this will be felt with overclocking. Overclocking attempts that previously would push the bus too hard and lead to errors now will no longer do so, making higher overclocks possible. However this is a bit of an illusion as retransmissions reduce performance. The scenario laid out to us by AMD is that overclockers who have reached the limits of their card’s memory bus will now see the impact of this as a drop in performance due to retransmissions, rather than crashing or graphical corruption. This means assessing an overclock will require monitoring the performance of a card, along with continuing to look for traditional signs as those will still indicate problems in memory chips and the memory controller itself.

P.S. That very reason is why some sites began testing memory bandwidth performance changes from an underclocking rather than an overclocking perspective.
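To make the article's point concrete, here is a toy model. The retransmission curve below is invented purely for illustration (real behavior depends on the card and the memory chips); it just shows why a memory overclock can look "stable" while effective bandwidth stops improving or even drops.

```python
# Toy model of GDDR5 error detection + retransmission (EDC). The error-rate
# curve is invented for illustration; real behavior varies per card.

def effective_bandwidth(mem_clock_mhz: float, stable_limit_mhz: float = 1250.0) -> float:
    raw = mem_clock_mhz * 4 * 256 / 8 / 1000    # GB/s on a 256-bit GDDR5 bus
    over = max(0.0, mem_clock_mhz - stable_limit_mhz)
    retransmit_fraction = min(0.5, over / 200)  # errors ramp up past the bus limit
    return raw * (1 - retransmit_fraction)

for clk in (1200, 1250, 1300, 1350):
    print(clk, round(effective_bandwidth(clk), 1))
# Past ~1250 MHz the card still "works", but retransmissions eat the gains:
# an overclock that looks stable while performance quietly falls.
```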
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
My understanding is that tessellation itself doesn't use any bandwidth at all, because all the computation happens on the GPU and you just transmit the standard model over the bus.

So you can either use a lower polygon count for your models (i.e. you need to transfer less data over the bus), or you can use the same models and just make them look better with tessellation on. Because a rather large user base doesn't have, and won't have, a DX11-capable GPU, you can't lower the polygon count of your models too much; they still have to look good without tessellation. So probably more of the second approach.

Thank you.

If this is true, then DX11 doesn't result in lower memory bandwidth requirements (for all practical purposes).
 
Last edited:

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
P.S. That very reason is why some sites began testing memory bandwidth performance changes from an underclocking rather than an overclocking perspective.

Ya, I know what you are saying, but this still doesn't prove that having more bandwidth is more beneficial for the 5870 than having faster GPU clock speeds. Here is a more extensive analysis comparing a ~9% change in each across a variety of games:

GPU = 930 MHz (+9.4% overclock)
Memory = 1318 MHz (+9.8% overclock)
http://firingsquad.com/hardware/ati_radeon_5870_overclocking/page3.asp

I won't summarize the results for each game since Firingsquad already does that at the bottom of each graph. Here is the final summary:

"Based on the benchmarks we just saw, it’s pretty safe to say that when OC’ing the Radeon 5870 you’ll get the best gains from GPU rather than memory OC’ing. We setup both components to be OC’ed by the same 9% ratio, and in most cases the GPU OC scenario came out on top in performance, generally by about 5%. Only in Batman: Arkham Asylum did we finally see memory overclocking deliver greater gains than GPU OC’ing."

This is precisely why ATI chose to go with the 'performance-inferior' but cheaper-to-manufacture 256-bit GDDR5 setup for the current generation. I am sure the HD6000 series will again bump the bandwidth, but right now it's not needed yet.
 
Last edited:

NoQuarter

Golden Member
Jan 1, 2001
1,006
0
76
If this is true, then DX11 doesn't result in lower memory bandwidth requirements.

Right, it doesn't yet in current games, but eventually it will. Right now even games with tessellation use high-quality models and then tessellate to add just a little more detail. The end goal is to use low-quality models to save lots of memory bandwidth and tessellate them out later, since hardware is already having a tough time keeping up with memory demands, along with using DX11's new texture compression methods to preserve more bandwidth.

This won't really happen until DX11 is prominent.
 
Last edited:

cbn

Lifer
Mar 27, 2009
12,968
221
106
Ya, I know what you are saying, but this still doesn't prove that having more bandwidth is more beneficial for the 5870 than having faster GPU clock speeds. Here is a more extensive analysis comparing a ~9% change in each across a variety of games:

GPU = 930 MHz (+9.4% overclock)
Memory = 1318 MHz (+9.8% overclock)
http://firingsquad.com/hardware/ati_radeon_5870_overclocking/page3.asp

I won't summarize the results for each game since Firingsquad already does that at the bottom of each graph. Here is the final summary:

"Based on the benchmarks we just saw, it’s pretty safe to say that when OC’ing the Radeon 5870 you’ll get the best gains from GPU rather than memory OC’ing. We setup both components to be OC’ed by the same 9% ratio, and in most cases the GPU OC scenario came out on top in performance, generally by about 5%. Only in Batman: Arkham Asylum did we finally see memory overclocking deliver greater gains than GPU OC’ing."

Yes, but did you see the edit in post #35 (with the reference to the Anandtech article about the error-correcting mechanism)?

Because of this, I am not sure the memory overclocking results you have are accurate.
 

digitaldurandal

Golden Member
Dec 3, 2009
1,828
0
76
I own an 8800 GTS 320, with its 320-bit interface. I am very surprised Nvidia is doing this again. It didn't take long for Nvidia to drop the 320-bit interface and go straight back to the standard 256-bit with the 8800 GT, which, by the way, performed far better than the GTS versions. Furthermore, no games or programs have ever recognized the 320 MB of RAM on this card; all of them report it as 256.

I am wondering what the thinking is here. Will the next refresh again see them jumping back to conventional settings? Probably. History repeats itself.

The later 8800 GT was a revision chip, so it is not right to compare it to the first 8800 GTS. Also, the 8800 GT did not perform better than the 8800 GTS 640MB (320-bit) with AA and AF cranked up; only the 320MB version was toasted. I wonder if you meant MB and not bus width in your post. Between the first versions of both cards there is no contest.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
the end goal is to use low-quality models to save lots of memory bandwidth and tessellate them out later, since hardware is already having a tough time keeping up with memory demands, along with using DX11's new texture compression methods.

This won't really happen until DX11 is prominent.

That makes sense, but this implies tessellation doesn't really add much to image quality then?

However, I am sure users of AMD APUs would be happy if tessellating out low-quality models became the standard. The Llano APU has considerable GPU power but shares system memory with the CPU.
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
Yes, but did you see the edit in post #35 (with the reference to the Anandtech article about the error-correcting mechanism)?

Because of this, I am not sure the memory overclocking results you have are accurate.

The best way to test would be by underclocking, I agree. Use stock clocks and vary the memory down and the core up and/or down, just to be on the safe side.
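A sketch of what that test matrix might look like, with hypothetical clock steps (the HD 5870 stock clocks are 850/1200 MHz):

```python
# One way to set up the underclocking sweep described above: hold one domain
# at stock while stepping the other, so you see where each bottleneck bites.
# The step sizes are arbitrary examples.

STOCK_CORE, STOCK_MEM = 850, 1200  # HD 5870 stock clocks (MHz)

core_sweep = [(c, STOCK_MEM) for c in range(850, 649, -50)]   # vary core only
mem_sweep = [(STOCK_CORE, m) for m in range(1200, 899, -75)]  # vary memory only

for core, mem in core_sweep + mem_sweep:
    print(f"bench at core={core} MHz, mem={mem} MHz")  # run the game benchmark here
```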
 

zebrax2

Senior member
Nov 18, 2007
977
70
91
There was a big discussion on this forum some time ago about the 5xxx series being memory starved, and IIRC the majority of the evidence pointed to it not being memory starved. I believe BFG10K also has an article about that in which he used a 5770?
 
Last edited:

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
If I did my math correctly, based on VR-Zone's specs for the GTX 470 and GTX 480, the GTX 480's memory is 1848 MHz, which is dual data rate (3696 MHz effective). Therefore:

GTX 480:
924 MHz clock x 4 (quad-pumped GDDR5) x 384-bit bus / 8 bits per byte = 177,408 MB/sec
177,408 MB/sec / 1000 = 177.4 GB/sec

Therefore, NV's top-of-the-line card will have only a 15% advantage over the 5870's 153.6 GB/sec. If memory bandwidth were such a crucial factor this generation, NV would likely have used 4.8 Gbps GDDR5 x 384-bit = 230 GB/sec (unless there is some limitation, or am I missing something?)
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
If I did my math correctly, based on VR-Zone's specs for the GTX 470 and GTX 480, the GTX 480's memory is 1848 MHz, which is dual data rate (3696 MHz effective). Therefore:

GTX 480:
924 MHz clock x 4 (quad-pumped GDDR5) x 384-bit bus / 8 bits per byte = 177,408 MB/sec
177,408 MB/sec / 1000 = 177.4 GB/sec

Therefore, NV's top-of-the-line card will have only a 15% advantage over the 5870's 153.6 GB/sec. If memory bandwidth were such a crucial factor this generation, NV would likely have used 4.8 Gbps GDDR5 x 384-bit = 230 GB/sec (unless there is some limitation, or am I missing something?)

It could also be that the absolute bandwidth number is what matters most, rather than the GPU-to-memory-bandwidth ratio.

If this is true, then using the HD5770 as an example wouldn't be entirely analogous to what the HD5870 requires to render at a given resolution.

What we need are some underclocking results using the HD5870.
 

NoQuarter

Golden Member
Jan 1, 2001
1,006
0
76
That makes sense, but this implies tessellation doesn't really add much to image quality then?

I think that rests a lot on the art team: how many polys they use for their models, and how well they can use tessellation to expand them out. I think with good use of the technique you could end up with better performance and IQ at the same time.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
http://alienbabeltech.com/main/?p=12474&page=2

I was just looking at the HD5770 results again.

With Crysis Warhead (which I assume uses high-quality models), decreasing memory bandwidth had a greater effect on performance than decreasing the core. Same with BioShock.

This raises the question: what sort of models will be the basis for games of the future? Does the GPU-to-memory-bandwidth ratio matter, or not?
 
Last edited:

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
Memory bus width is only one part of the equation. The fact that the 8800 GTS also had a 320-bit bus is meaningless by itself when you factor in the GDDR5 on the GF100.

As for bandwidth in general, the 5870 is primarily limited by its core, not by bandwidth. We’ve seen this proven by Firingsquad and B3D. By extension, the 5770 is the same because everything is cut in half in perfect proportion to that of the 5870. My 5770 bottleneck investigation (linked above) confirms this.

According to the leaked specs, the GTX480 has almost double the shading power of the GTX285, but only 11% more memory bandwidth. Like ATi, it appears nVidia have figured out where the real bottleneck is in performance, and it’s clearly not memory bandwidth.

As for bandwidth use in general, you simply cannot infer that model detail alone is responsible for it. The situation is far more complicated than that.
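The back-of-the-envelope behind that comparison. The GTX 285 figures are its shipping specs; the GTX 480 figures are the leaked specs being discussed in this thread, so treat them as assumptions:

```python
# Back-of-the-envelope for the shading-power vs. bandwidth claim above.
# GTX 285 numbers are shipping specs; GTX 480 numbers are the leaked specs.

gtx285_shader = 240 * 1476   # SPs x shader clock (MHz)
gtx480_shader = 480 * 1400   # leaked: 480 SPs at ~1400 MHz

gtx285_bw = 2.484 * 512 / 8  # 2484 MHz effective GDDR3, 512-bit: ~159 GB/s
gtx480_bw = 3.696 * 384 / 8  # 3696 MHz effective GDDR5, 384-bit: ~177.4 GB/s

print(gtx480_shader / gtx285_shader)  # ~1.9x the shading power
print(gtx480_bw / gtx285_bw)          # ~1.11x the bandwidth (+11%)
```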
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
As for bandwidth in general, the 5870 is primarily limited by its core, not by bandwidth. We’ve seen this proven by Firingsquad and B3D.

Those sites did tests involving underclocking the memory and core?

According to the leaked specs, the GTX480 has almost double the shading power of the GTX285, but only 11% more memory bandwidth. Like ATi, it appears nVidia have figured out where the real bottleneck is in performance, and it’s clearly not memory bandwidth.

I am having a hard time understanding why the memory is clocked so low on the GTX 480.
 
Last edited:

BFG10K

Lifer
Aug 14, 2000
22,709
3,003
126
They both overclocked IIRC.

I underclocked, and avoiding retransmissions with GDDR5 is one of the main reasons why I did so.