Fermi... 320-bit interface... again, really?

Xarick

Golden Member
May 17, 2006
1,199
1
76
I own an 8800GTS 320, with its 320-bit interface, so I am very surprised Nvidia is doing this again. It didn't take long for Nvidia to drop their 320-bit interface and go right back to the standard 256-bit in the 8800GT, which, by the way, performed far better than the GTS versions. Furthermore, no games or programs have ever recognized the 320MB of RAM on this card; all of them report it as 256MB.

I am wondering what the thinking is here. Will the next refresh again see them jumping back to conventional widths? Probably. History repeats itself.
 

SunnyD

Belgian Waffler
Jan 2, 2001
32,675
146
106
www.neftastic.com
lol. Games/programs don't give a crap how much memory is actually on the card. They allocate based on need until the driver says "No more memory."

Fermi, if it does what it's supposed to do, will need all the memory bandwidth they can give it. 320/384-bit buses provide that.
 

Xarick

Golden Member
May 17, 2006
1,199
1
76
I heard those same arguments when the 8800GTS was released, which was why I bit.
 

Zillatech

Senior member
Jul 25, 2006
213
0
76
Isn't it just a way to create more bandwidth? I mean, software doesn't actually have to be aware of it, does it? Doesn't the hardware handle everything behind the scenes?

Also, a 512-bit interface would require too much circuitry and drive the die size and power requirements up even more. Maybe Nvidia feels that a 256-bit interface is not enough anymore. I think they were ahead of their time using it on the 8800 series and it cost them, so they quickly reverted to 256-bit.

512-bit is the future, but it's not viable yet from what I understand.
 

crisium

Platinum Member
Aug 19, 2001
2,643
615
136
320-bit is a way to provide 25% more bandwidth than 256-bit. But it also increases the cost to manufacture the chip, so I too think it is ridiculous. Increasing memory bandwidth affects performance less than the core or shaders, but whatever. If Nvidia wants to manufacture a more expensive product to get a piecemeal performance boost, they can.

And I totally expect the revision to have 256-bit, just like G92 did.

ATi's highest bus widths: 512-bit, 256-bit, 128-bit, 64-bit
Nvidia's highest bus widths: 512-bit, 448-bit, 384-bit, 320-bit

448-bit, when 384 ain't cutting it but 512 would cut into your flagship product's performance. 0_o

It's Nvidia after all.
 

scooterlibby

Senior member
Feb 28, 2009
752
0
0
I own an 8800GTS 320, with its 320-bit interface, so I am very surprised Nvidia is doing this again. It didn't take long for Nvidia to drop their 320-bit interface and go right back to the standard 256-bit in the 8800GT, which, by the way, performed far better than the GTS versions. Furthermore, no games or programs have ever recognized the 320MB of RAM on this card; all of them report it as 256MB.

I am wondering what the thinking is here. Will the next refresh again see them jumping back to conventional widths? Probably. History repeats itself.

Fermi uses GDDR5, not GDDR3, and there's a gig and a half on the 480, so it really is not comparable to the bandwidth on your 8800GTS 320.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
It doesn't matter if it's a 64-bit or 1024-bit memory interface. All that matters is how much bandwidth the memory has to the GPU core. Faster memory vs. a wider path are just different ways to achieve a bandwidth goal.

It doesn't matter if you have 1GB of 4000MHz memory over a 128-bit connection or 2000MHz memory over a 256-bit connection; they will have the same bandwidth. Games don't have anything to do with it; they don't recognize how much bandwidth the video memory has to the core, how fast it is, or how wide the path is.

A GPU core will need a certain amount of bandwidth to perform at its potential; the more powerful the GPU core, the more bandwidth it needs to the video memory so it can achieve that performance level. 256-bit, 320-bit, 384-bit, 512-bit don't matter, just the total bandwidth that is available to the GPU, regardless of how it's achieved.
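To put numbers on that, here's a minimal back-of-the-envelope sketch in Python; the two configurations are just the hypothetical examples from the paragraph above, not real cards.

```python
# Rough sketch: peak bandwidth = (bus width in bytes) x (effective transfer rate).

def bandwidth_gb_s(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak theoretical memory bandwidth in GB/s."""
    bytes_per_transfer = bus_width_bits / 8        # bits -> bytes moved per transfer
    transfers_per_second = effective_clock_mhz * 1e6
    return bytes_per_transfer * transfers_per_second / 1e9

print(bandwidth_gb_s(128, 4000))   # 64.0 GB/s -- 4000MHz effective over 128-bit
print(bandwidth_gb_s(256, 2000))   # 64.0 GB/s -- 2000MHz effective over 256-bit
# Same total bandwidth either way, which is the whole point above.
```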
 
Last edited:

crisium

Platinum Member
Aug 19, 2001
2,643
615
136
Who knows, we might see a 192-bit, 768MB GDDR5 Fermi to compete with the 5770. Or maybe for the first time we'll see 160-bit or 224-bit! Anything is possible with Huang!
 

SunnyD

Belgian Waffler
Jan 2, 2001
32,675
146
106
www.neftastic.com
Xarick - if you were disappointed with your GTS, the reason isn't the bus; it's the fact that you had what amounted to a castrated card with only 320MB of RAM on it. I had a 640, and was quite happy with it for quite some time.
 

NoQuarter

Golden Member
Jan 1, 2001
1,006
0
76
The reason the 8800GT and 8800GTS 512MB were so much better than the 8800GTS 320MB and 640MB wasn't because of the memory bus but because they were G92 chips while the 320/640MB variants were G80 chips.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Increasing memory bandwidth affects performance less than the core or shaders, but whatever.

Where does higher memory bandwidth affect performance the most? Wouldn't it have the most effect on minimum frame rates?
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
How much of a die size increase would occur with a 512 bit bus?

Well, quite a lot at a guess.
Not exactly the same but... http://www.anandtech.com/video/showdoc.aspx?i=3151

Internally, the ring bus dropped from 1024-bit to 512-bit. This cut in bandwidth contributed to a significant drop in transistor count from R600's ~720M. RV670 is made up of 666M transistors, and this includes the addition of UVD hardware, some power saving features, the necessary additions for DX 10.1 and the normal performance tuning we would expect from another iteration of the architecture.
The external bus also went from 512-bit to 256-bit.

But not necessarily a lot... http://www.anandtech.com/video/showdoc.aspx?i=3140&p=2
The G92 is fabbed on a 65nm process, and even though it has fewer SPs, less texturing power, and not as many ROPs as the G80, it's made up of more transistors (754M vs. 681M). This is partly due to the fact that G92 integrates the updated video processing engine (VP2), and the display engine that previously resided off chip. Now, all the display logic including TMDS hardware is integrated onto the GPU itself.
G92 increased the transistor count compared to G80 despite moving from a 384-bit bus to 256-bit, but they also added other things, so you can't attribute anything in particular to the bus width.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
Where does higher memory bandwidth affect performance the most? Wouldn't it have the most effect on minimum frame rates?

Not necessarily. If the core is already getting all the memory bandwidth it needs to perform up to its potential, then adding more isn't going to help games at all. I know you've mentioned a few times that you think Fermi will have better minimum frame rates than the 5870; it could, if the 5870 didn't have enough memory bandwidth. But from what I've seen, it has plenty. You get far better performance increases by overclocking the GPU, not the memory.
 

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
Not necessarily. If the core is already getting all the memory bandwidth it needs to perform up to its potential, then adding more isn't going to help games at all. I know you've mentioned a few times that you think Fermi will have better minimum frame rates than the 5870; it could, if the 5870 didn't have enough memory bandwidth. But from what I've seen, it has plenty. You get far better performance increases by overclocking the GPU, not the memory.

Or you might just need more RAM if you are swapping data in and out because the VRAM is full, and that's causing frame rate drops.
There are many things that might improve minimum frame rate, and it'll likely depend on the game, the settings, the rest of the system, and the balance of a particular card.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Not necessarily. If the core is already getting all the memory bandwidth it needs to perform up to its potential

Wouldn't memory bandwidth be stressed the most in scenes where lots of tessellation, explosions, and action are occurring? If so, I would expect it to have the most effect on minimum frame rates.

P.S. I realize Fermi has a different core and a different way of processing tessellation, so I do not think its frame rate performance in the Heaven benchmark is purely down to its bandwidth.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Or you might just need more RAM if you are swapping data in and out because the VRAM is full, and that's causing frame rate drops.
There are many things that might improve minimum frame rate, and it'll likely depend on the game, the settings, the rest of the system, and the balance of a particular card.

Yep, VRAM too.

In any event, the way I see things, some component of the video card becomes the limiting factor when the frame rate drops.

In the case of tessellation-heavy scenes, it could very well be the tessellator itself that becomes the bottleneck first? Or maybe the tessellator is sufficient, but it stresses VRAM or bandwidth too much?
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
if the 5870 didn't have enough memory bandwidth. But from what I've seen, it has plenty. You get far better performance increases by overclocking the GPU, not the memory.

Even in DX10.1, the HD5770 (exactly half an HD5870) is already showing memory bandwidth limitations, and this was discovered looking purely at average frame rates. See BFG10K's tests over at Alien Babel Tech.

P.S. Back in the original Fermi white paper, Nvidia mentioned increasing the bus to 512-bit during the optical shrink to 28nm. So I am already thinking they realize they are slightly short on bandwidth in certain circumstances. If this is true, then surely ATI is even more bandwidth starved under the same circumstances.
 
Last edited:

toyota

Lifer
Apr 15, 2001
12,957
1
0
Even in DX10.1, the HD5770 (exactly half an HD5870) is already showing memory bandwidth limitations, and this was discovered looking purely at average frame rates. See BFG10K's tests over at Alien Babel Tech.

P.S. Back in the original Fermi white paper, Nvidia mentioned increasing the bus to 512-bit during the optical shrink to 28nm. So I am already thinking they realize they are slightly short on bandwidth in certain circumstances. If this is true, then surely ATI is even more bandwidth starved under the same circumstances.
With 320- and 384-bit buses, the GTX 470/480 are not going to be bandwidth limited with GDDR5.
 

extra

Golden Member
Dec 18, 1999
1,947
7
81
Who knows, we might see a 192-bit, 768MB GDDR5 Fermi to compete with the 5770. Or maybe for the first time we'll see 160-bit or 224-bit! Anything is possible with Huang!

Haha..God let's hope so. Prices are teh suck right now. I want the 5770 to drop more so I can grab another one to crossfire o_O.
 

Schmide

Diamond Member
Mar 7, 2002
5,753
1,046
126
With 320- and 384-bit buses, the GTX 470/480 are not going to be bandwidth limited with GDDR5.

It won't be limited, but considering ECC takes 25% more data if they use a Hamming code, it will behave more like a 240/288-bit bus respectively, which would put it near parity with Cypress and the 448/512-bit GDDR3 GT200(b).

Does anyone know if they can turn ECC off? I doubt it, since it's often implemented in hardware.
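For what it's worth, here's a tiny sketch of the arithmetic behind those 240/288-bit figures, taking the 25% Hamming-code overhead above as an assumption rather than a confirmed spec:

```python
# Assumption from above: ECC (Hamming code) consumes 25% of the bus,
# so the effective data width is 75% of the physical width.
ECC_OVERHEAD = 0.25  # assumed, not a confirmed Fermi spec

for physical_bits in (320, 384):
    effective_bits = physical_bits * (1 - ECC_OVERHEAD)
    print(f"{physical_bits}-bit physical -> {effective_bits:.0f}-bit effective")
# 320-bit physical -> 240-bit effective
# 384-bit physical -> 288-bit effective
```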
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I remember back when folks said the HD4850 was bandwidth limited with 256-bit GDDR3. That part was only running 625MHz on 800 stream processors.

Now we are talking double the stream processors on Cypress, running 36% faster (bumping computational power to over 2.7 TFLOPS). Has memory bandwidth increased enough to match that? No.

In fact, HD5870's memory bandwidth to computational power ratio is actually worse than HD4850's.
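As a rough check on that ratio, here's a small sketch using the commonly quoted specs; treat the clocks and the 2-FLOPs-per-shader-per-clock assumption as approximations, not official figures.

```python
# Bandwidth per TFLOP for HD4850 vs HD5870, using commonly quoted (approximate) specs.

def tflops(shaders: int, core_mhz: float) -> float:
    # Assumes 2 FLOPs (one MAD) per shader per clock for these parts
    return shaders * 2 * core_mhz * 1e6 / 1e12

cards = {
    # name: (bus width in bits, effective memory rate in GT/s, shaders, core MHz)
    "HD4850": (256, 1.986, 800, 625),   # ~63.6 GB/s, ~1.0 TFLOPS
    "HD5870": (256, 4.8, 1600, 850),    # 153.6 GB/s, ~2.72 TFLOPS
}

for name, (bus_bits, rate_gt_s, shaders, mhz) in cards.items():
    bandwidth = bus_bits / 8 * rate_gt_s          # GB/s
    ratio = bandwidth / tflops(shaders, mhz)      # GB/s per TFLOP
    print(f"{name}: {ratio:.1f} GB/s per TFLOP")
# HD4850: ~63.6 GB/s per TFLOP, HD5870: ~56.5 GB/s per TFLOP -- the ratio did get worse
```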
 
Last edited:

Lonyo

Lifer
Aug 10, 2002
21,938
6
81
It won't be limited, but considering ECC takes 25% more data if they use a Hamming code, it will behave more like a 240/288-bit bus respectively, which would put it near parity with Cypress and the 448/512-bit GDDR3 GT200(b).

Does anyone know if they can turn ECC off? I doubt it, since it's often implemented in hardware.

I think it can be turned off, since it uses some of the RAM present for the ECC bits, so you lose some RAM capacity.
A 3GB Fermi card would actually have something like 2.7GB available (or 8/9ths of 3GB, whatever that is exactly). Since the GTX480 looks like it will have a full 1.5GB, that means it can't be reserving some RAM for the ECC functionality.
Therefore I would conclude that it doesn't have to be active, and possibly it's BIOS related (e.g. you need a Tesla BIOS to be able to use ECC functionality, since it reapportions the RAM).

That means no need to worry about it.
 

NoQuarter

Golden Member
Jan 1, 2001
1,006
0
76
Wouldn't memory bandwidth be stressed the most in scenes where lots of tessellation, explosions, and action are occurring? If so, I would expect it to have the most effect on minimum frame rates.

P.S. I realize Fermi has a different core and a different way of processing tessellation, so I do not think its frame rate performance in the Heaven benchmark is purely down to its bandwidth.

Actually, one of the benefits of tessellation is lowered memory bandwidth consumption. It allows you to create geometry without having to store or read the much more complex models in memory, which saves a lot of memory bandwidth.

So a low-poly base model tessellated out to a high-poly model should render faster than an equivalent high-poly base model if bandwidth is limited. Maybe that's one of the reasons ATI felt comfortable with the lower memory bandwidth on the 5770? Who knows.
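A toy illustration of that bandwidth saving; the vertex size, vertex count, and tessellation factor below are made up purely for illustration.

```python
# Why tessellation can save bandwidth: the GPU fetches a small base mesh from VRAM
# and amplifies it on-chip, instead of fetching an equivalent pre-built dense mesh.

BYTES_PER_VERTEX = 32                  # e.g. position + normal + UV; illustrative only
base_vertices = 10_000                 # low-poly base model stored in memory
tess_factor = 16                       # on-chip amplification, no extra VRAM reads
dense_vertices = base_vertices * tess_factor

fetched_with_tess = base_vertices * BYTES_PER_VERTEX
fetched_prebuilt = dense_vertices * BYTES_PER_VERTEX
print(f"Fetched with tessellation:   {fetched_with_tess / 1e6:.2f} MB")   # 0.32 MB
print(f"Fetched with pre-built mesh: {fetched_prebuilt / 1e6:.2f} MB")    # 5.12 MB
# Roughly the same on-screen geometry, ~1/16th the vertex data pulled over the bus.
```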
 

Schmide

Diamond Member
Mar 7, 2002
5,753
1,046
126
I think it can be turned off, since it uses some of the RAM present for the ECC bits, so you lose some RAM capacity.
A 3GB Fermi card would actually have something like 2.7GB available (or 8/9ths of 3GB, whatever that is exactly). Since the GTX480 looks like it will have a full 1.5GB, that means it can't be reserving some RAM for the ECC functionality.
Therefore I would conclude that it doesn't have to be active, and possibly it's BIOS related (e.g. you need a Tesla BIOS to be able to use ECC functionality, since it reapportions the RAM).

That means no need to worry about it.

The numbers just don't add up to it being turned off, or at the very least to recovering the bus lines needed for ECC.

Let's assume it uses the same 4.8Gbps GDDR5 that Cypress uses. Cypress gets 153.6GB/s with a 256-bit bus, so Fermi should get around 230GB/s with a 384-bit bus, but all the specs we're getting so far show a 177.4GB/s data rate. This equates to a bus somewhere in the 290-bit range, or roughly 15% more than Cypress, which is also in the ballpark of a 25% loss from bus lines dedicated to Hamming code (384 x 0.75 = 288-bit).
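Re-running that arithmetic as a quick sketch, assuming (as above) 4.8Gbps GDDR5 and the 177.4GB/s figure from the leaked specs:

```python
# Back out the "effective" bus width implied by the quoted bandwidth, assuming
# Fermi used the same 4.8Gbps GDDR5 as Cypress (an assumption, not a confirmed spec).
DATA_RATE_GBPS = 4.8           # per-pin data rate assumed above
QUOTED_BANDWIDTH = 177.4       # GB/s, from the leaked GTX480 specs mentioned above

effective_bits = QUOTED_BANDWIDTH / (DATA_RATE_GBPS / 8)
print(f"Implied effective bus: {effective_bits:.0f}-bit")             # ~296-bit
print(f"vs Cypress 256-bit:    {effective_bits / 256 - 1:.0%} more")  # ~15% more
# In the same ballpark as the ~290-bit / 25%-Hamming-loss (384 x 0.75 = 288-bit) estimate.
```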

We'll have to see when the product hits the streets.