fermi.. 320bit interface.. again really?

cbn · Mar 21, 2010

http://www.beyond3d.com/content/reviews/54/6

In this article, it looks like underclocking on both the core and memory was done.

"be aware that Cypress is a cheeky and tricky beast when it comes to its memory interface, due to its re-training capacity, which probably tightens timings for lower memory clocks, and EDC, which may interfere at higher clocks, where transmission errors may occur, forcing re-transmission and thus lowering performance. We'd expect EDC to be a non-issue in our case, since we're not going out of spec with the clocking, but retraining will make the waters a tad murkier...sadly, that's one parameter we can't mess with for the time being."

cbn · Mar 21, 2010

RussianSensation said:
Therefore, NV's top of the line card will have only a 15% advantage over 5870's 153.6GB/sec. If memory bandwidth was such a crucial factor this generation, NV would have likely used DDR 4800 x 384 = 230GB/sec (Unless there is some limitation or am I missing something?)

Maybe Nvidia wanted a wider bus coupled to lower clocked GDDR5 so it could achieve tighter timings for the bandwidth target?

The Beyond3D article I linked in post #51 got me thinking about this.

RussianSensation · Mar 21, 2010

I wonder what the impact on power consumption at load would be running 1.5GB of GDDR5 at 4800 mhz vs. 3700 mhz (GTX 480). I would imagine they can run the ram at a lower voltage when using 3700 speeds.

cbn · Mar 21, 2010

RussianSensation said:
If I did my math correctly, based on VRZone's specs of GTX 470 and GTX 480, GTX 480 memory is 1848MHz, which is Dual Data Rate. Therefore,

http://vr-zone.com/articles/nvidia-geforce-gtx-480-final-specs--pricing-revealed/8635.html

According to this 470 GTX's memory is clocked even slower than 480 GTXs.

dookulooku · Mar 21, 2010

BFG10K said:
According to the leaked specs, the GTX480 has almost double the shading power of the GTX285, but only 11% more memory bandwidth. Like ATi, it appears nVidia have figured out where the real bottleneck is in performance, and its clearly not memory bandwidth.

As for bandwidth use in general, you simply cannot infer its just the model detail responsible for it. The situation is far more complicated than that.

Correct. Increasing memory bandwidth is just a brute-force approach to solving the efficiency problem.

Think about it. Even if each frame had 1 GB of textures, geometry, and frame buffer data to transfer, 128 GB/s would give you 128 frames/second. We're not getting 128 frames/sec at the highest settings in many games, so clearly that bandwidth, when used optimally, it not needed.

There are usually improvements from one generation to the next generation that improve efficiency, such as:

1) Improved/larger caches: Increased hit rate reduces unnecessary reads/writes to the main video memory.

2) Improved compression: This increases effective bandwidth without actually increasing bandwidth.

3) Improved z-culling: The better you are at getting rid of things that don't appear on the screen, the less bandwidth is needed.

VirtualLarry · Mar 21, 2010

Computer Bottleneck said:
In fact, HD5870's memory bandwidth to computational power ratio is actually worse than HD4850's.

Perhaps this is the reason that they do not perform 2x as fast as the 4xxx series.

SlowSpyder · Mar 21, 2010

It's not that there isn't more performance that can be squeezed out by increasing the memory bandwidth, because there are some improvements. It's that the current cards are pretty well balanced, they have enough memory bandwidth in general to perform well. Increasing the core clockspeed gives you greater gains then increasing the memory bandwidth. The 4870 had an overabundance of memory bandwidth, so did the GTX2xx cards most likely. That's why Nvidia and AMD feel they don't have to use faster memory than they chose.

RussianSensation · Mar 21, 2010

VirtualLarry said:
Perhaps this is the reason that they do not perform 2x as fast as the 4xxx series.

I thought so too 6 months ago when 5870 launched. I glanced at AnandTech's review last night and more or less 5870 was outperforming 4890 by as much as 45%. But, once more intensive games are going to come out with superior textures and more complex shaders in the scenes, the current generation will begin to pull away in line with specs.

Metro 2033 - http://www.pcgameshardware.de/aid,7...tX-11-und-GPU-PhysX/Action-Spiel/Test/?page=2

1920x1200
5870 = 33 min (+94%) / 41.4 avg (+90%)
4890 = 17 min / 21.8 avg

It is actually more than 2x faster than 4870 series. If 5870 was memory bandwidth limited, we wouldn't see 2x the performance increase over 4xxx series, but we do.

I think with newer games, 5870 and GTX470/480 will deliver close to 80-90% performance increase over previous gen.

evolucion8 · Mar 21, 2010

Computer Bottleneck said:
Even in DX10.1 hd5770 (exactly half a hd5870) is already showing memory bandwidth limitations and this was discovered looking purely at average frame rates. See BFG10K's tests over at Alien babel tech.

P.S. Back in the original Fermi white paper Nvidia mentioned increasing the bus to 512 bit during the optical shrink to 28nm. So I am already thinking they realize they are slightly short on bandwidth in certain circumstances. If this is true, then surely ATI is even more bandwidth starved under the same circumstances.

I think you confused the article;

http://alienbabeltech.com/main/?p=12474&all=1

Underclocking the core 20% results in a 11.20% performance loss overall, while doing the same to the memory results in a smaller 8.46% performance hit. This makes the 5770 a reasonably balanced card, with a small bias towards its core. This is unlike 8800 Ultra for example, which showed a massive reliance on its core across the board.

Conclusion

The results generated today seem to disprove the commonly accepted idea that the 5770 is primarily held back by memory bandwidth. In actual fact ATi seems to have equipped the card with enough bandwidth to make it a reasonably balanced part overall.

Also based on the results generated today, if you’re trying to get more performance from your 5770, you should clock the core as high possible.

Computer Bottleneck said:
I remember back when folks said HD4850 was bandwidth limited with 256 bit and GDDR3. That part was only running 625 Mhz on 800 stream processors.

Now we are talking double the stream processors on Cypress running 36% faster (Bumping computational power to over 2.7 TFLOPs). Has memory bandwidth increased sufficiently to match that? No.

In fact, HD5870's memory bandwidth to computational power ratio is actually worse than HD4850's.

Actually isn't the bandwidth the issue, during the HD 5x00 development, the die got so big that they had to remove lots of optimization stuff in the GPU, and that's why an HD 5450 with 50MHz higher core clock, plus having the same bandwidth spec and RAM amount as the HD 4550, it still loosing against the HD 4550, or that the HD 5830 fails to outperform the HD 4890 which has 320 less stream processors. I can say for sure that the RV770 is faster in a per clock basis compared to any of the HD 5x00 family.

dookulooku said:
3) Improved z-culling: The better you are at getting rid of things that don't appear on the screen, the less bandwidth is needed.

Actually ATi has an agressive Z-Culling technology, even an nVidia representative admited it in an interview saying "our competitor has an agressive z culling approach" , I can't seem to find the article.

cbn · Mar 21, 2010

dookulooku said:
Correct. Increasing memory bandwidth is just a brute-force approach to solving the efficiency problem.

Think about it. Even if each frame had 1 GB of textures, geometry, and frame buffer data to transfer, 128 GB/s would give you 128 frames/second. We're not getting 128 frames/sec at the highest settings in many games, so clearly that bandwidth, when used optimally, it not needed.

There are usually improvements from one generation to the next generation that improve efficiency, such as:

1) Improved/larger caches: Increased hit rate reduces unnecessary reads/writes to the main video memory.

2) Improved compression: This increases effective bandwidth without actually increasing bandwidth.

3) Improved z-culling: The better you are at getting rid of things that don't appear on the screen, the less bandwidth is needed.

Thanks for this good explanation.

Does anyone know why Nvidia chose a wider bus combined with much slower memory? Improvement in Latency? Or something else?

Voo · Mar 21, 2010

Computer Bottleneck said:
Does anyone know why Nvidia chose a wider bus combined with much slower memory? Improvement in Latency? Or something else?

Lower TDP for one I'd assume

SHAQ · Mar 21, 2010

You all are saying the cards aren't bandwidth starved but what about with 8XQ or higher AA levels? It seems you all are describing situations with little or no AA. I imagine that is easy to do for Nvidia/ATI because I don't imagine many people use much AA at all especially if they have to force it through the drivers. Most consumers no doubt don't know how to do so unless it is in the game settings.

cbn · Mar 21, 2010

Voo said:
Lower TDP for one I'd assume

How many watts could be saved by undervolting the memory?

Or did Nvidia purposely use the wider bus because they thought latency was more important than saving manufacturing costs?

Search

fermi.. 320bit interface.. again really?

cbn

Lifer

cbn

Lifer

RussianSensation

Elite Member

cbn

Lifer

dookulooku

Member

VirtualLarry

No Lifer

SlowSpyder

Lifer

RussianSensation

Elite Member

evolucion8

Platinum Member

cbn

Lifer

Voo

Golden Member

SHAQ

Senior member

cbn

Lifer

TRENDING THREADS