WCCFtech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVIDIA response

Status
Not open for further replies.

Erenhardt

Diamond Member
Dec 1, 2012
Does anyone remember the GTX 680/770 2GB defense line? When you showed games running on a 7970 using more than 2GB of VRAM, it went something like this:
It takes so much because it can, not because it needs it.

The same applies here. If you play a game and see VRAM usage go above 3.5GB, that doesn't mean it will turn into a 5 fps stutterfest. The same rule applies: it took that much because it could. Does that make it run faster or slower? Nope. The memory is just populated, not necessarily used enough to impact performance.

The benchmark is a whole other matter, as it shows the memory performance of each chunk, whereas a game could run in 3.5GB and have the rest populated with less important data.
 

jj109

Senior member
Dec 17, 2013
One oddity from this test I haven't really seen addressed is the bandwidth discrepancy between the 970 and 980.

Also, it's been argued that the 970 has a 208-bit bus, but mathematically it looks like a 216-bit bus. Bear with me.

Anything to address here, or is this all irrelevant?
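The bus-width argument comes down to simple arithmetic: peak GDDR5 bandwidth is the bus width in bits divided by 8, times the per-pin data rate. A minimal sketch, assuming NVIDIA's listed 7 Gbps memory clock and assuming bandwidth scales linearly with bus width (a simplification that ignores controller and caching effects); the function names are hypothetical:

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer x per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


def implied_bus_width(measured_gbs: float, reference_gbs: float,
                      reference_width_bits: int = 256) -> float:
    """Scale a known-width card's bus by the ratio of measured bandwidths.

    Assumes both cards run the same memory clock and that bandwidth scales
    linearly with bus width.
    """
    return reference_width_bits * measured_gbs / reference_gbs


# What each candidate bus width would give at 7 Gbps.
for width in (256, 216, 208):
    print(f"{width}-bit @ 7 Gbps -> {peak_bandwidth_gbs(width, 7.0):.0f} GB/s")
```

Plugging two cards' measured figures into `implied_bus_width` shows which width their ratio corresponds to; it says nothing about why the ratio differs.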



It's irrelevant. Look at the benchmark's source code. It's a floating-point addition benchmark. The GTX 970 is pulling 3.8 GFLOPS and the GTX 980 is pulling ~5 GFLOPS. The only issue with the benchmark and the GTX 970 is that it isn't using the 500MB partition at all, and is constantly swapping from system RAM instead of loading into VRAM.
 
Feb 19, 2009
Does anyone remember the GTX 680/770 2GB defense line? When you showed games running on a 7970 using more than 2GB of VRAM, it went something like this:

The same applies here. If you play a game and see VRAM usage go above 3.5GB, that doesn't mean it will turn into a 5 fps stutterfest. The same rule applies: it took that much because it could. Does that make it run faster or slower? Nope. The memory is just populated, not necessarily used enough to impact performance.

The benchmark is a whole other matter, as it shows the memory performance of each chunk, whereas a game could run in 3.5GB and have the rest populated with less important data.

It will definitely depend on the game engine; allocated VRAM may not be "needed".

It seems NV's drivers prioritize game assets into the 3.5GB of VRAM, as noticed by reviewers testing VRAM usage in games: in situations where the 980 goes above that, the 970 seems stuck at 3.5GB (I noticed that a few times in GameGPU's VRAM benchmarks but thought nothing of it at the time!). The drivers are doing their job, and when the assets cannot fit, they put unimportant ones into the last 500MB so as not to hinder game performance.

Obviously that doesn't work for CUDA, or for some games without those optimizations, as seen with Skyrim and Arma 3 (both quite old and possibly not optimized for).
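One way to picture the driver behaviour described above is a greedy placement: rank resources by how often they are touched and fill the fast 3.5GB segment first. This is a hypothetical sketch only; NVIDIA hasn't published how its driver decides, and the `Resource` type, access counts and sizes are made up for illustration:

```python
from dataclasses import dataclass

FAST_SEGMENT_MB = 3584  # the 3.5GB segment
SLOW_SEGMENT_MB = 512   # the last 0.5GB


@dataclass
class Resource:
    name: str
    size_mb: int
    accesses_per_frame: int  # hypothetical "hotness" metric


def place(resources):
    """Greedy placement: hottest resources go to the fast segment first."""
    fast_free, slow_free = FAST_SEGMENT_MB, SLOW_SEGMENT_MB
    placement = {}
    for res in sorted(resources, key=lambda r: r.accesses_per_frame, reverse=True):
        if res.size_mb <= fast_free:
            placement[res.name] = "fast"
            fast_free -= res.size_mb
        elif res.size_mb <= slow_free:
            placement[res.name] = "slow"
            slow_free -= res.size_mb
        else:
            placement[res.name] = "system RAM"  # would spill over PCIe
    return placement


demo = [
    Resource("render targets", 600, 50),
    Resource("textures (nearby)", 2800, 20),
    Resource("textures (distant)", 400, 1),
]
print(place(demo))
```

Here the rarely used distant textures end up in the slow segment. A CUDA program that simply allocates and streams through all of its memory gives a driver no such ranking to exploit.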

Edit: Had a quick look at the numbers NV gave to PCPer, and it's not a 1% difference. Comparing the 980's and 970's results, the 970 takes up to a ~6% extra hit relative to the 980 when VRAM usage goes above 3.5GB! In single-card configs these frame rates aren't playable anyway, but such settings could impact SLI setups, where the 970 falls off worse. Also, these results from NV would be the best-case scenario (games their drivers are optimized for), so it stands to reason that in games without driver optimizations, the users reporting stuttering or major spikes could well be right.
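The "extra hit" comes from comparing how much performance each card keeps, not from subtracting the percentage drops. A small worked check using the drops NVIDIA gave PCPer (Shadow of Mordor 24%/25%, Battlefield 4 47%/50%, CoD: AW 41%/44% for the 980/970):

```python
# Percentage FPS drops from NVIDIA's statement: (GTX 980, GTX 970)
drops = {
    "Shadow of Mordor": (24, 25),
    "Battlefield 4": (47, 50),
    "CoD: AW": (41, 44),
}


def extra_hit_pct(drop_980: float, drop_970: float) -> float:
    """Extra loss on the 970 relative to the 980, in percent.

    Each card keeps (100 - drop)% of its own baseline FPS; the ratio of what
    the 970 keeps to what the 980 keeps gives the 970's relative loss.
    """
    kept_980 = 1 - drop_980 / 100
    kept_970 = 1 - drop_970 / 100
    return (1 - kept_970 / kept_980) * 100


for game, (d980, d970) in drops.items():
    print(f"{game}: {extra_hit_pct(d980, d970):.1f}% extra hit on the 970")
```

This works out to roughly 1.3% for Shadow of Mordor, 5.7% for Battlefield 4 and 5.1% for CoD: AW, so the ~6% figure is the worst of the three.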
 

96Firebird

Diamond Member
Nov 8, 2010
YES the issue DOES exist and Nvidia knows that!

The just-released "damage control comparison picture" is meaningless. It shows average FPS.

THE ISSUE is that the 970 incurs stuttering and a massive increase in frame time once 3.2GB-3.5GB of VRAM usage is exceeded. The damage control image does not show what the actual issue is.

The problem is real, and it doesn't exist on the 980.

How did I not experience stuttering in FC4 at 3800MB VRAM usage? Do I have a magical card?! Sweet, it should sell for millions!
 

HurleyBird

Platinum Member
Apr 22, 2003
How did I not experience stuttering in FC4 at 3800MB VRAM usage? Do I have a magical card?! Sweet, it should sell for millions!

Because, as many others have pointed out already, the data in that last 300MB was not being accessed very often. If it had been, you'd notice a significant degradation in performance.

Depending on the application, the GTX 970 may take a subtle performance hit or a significant one above 3500MB. Theoretically, if you have between 3500 and 4000MB of *intensively* used data stored in VRAM, the GTX 970 will fall off a performance cliff that the GTX 980 will avoid.
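A back-of-the-envelope way to see why it depends on how often the last segment is touched: if some fraction of the traffic goes to a slower region, the total time is the sum of the time spent in each region, so the effective bandwidth is a traffic-weighted harmonic mean. The bandwidth figures below are placeholders, not measured GTX 970 numbers:

```python
def effective_bandwidth(fast_gbs: float, slow_gbs: float, slow_fraction: float) -> float:
    """Effective bandwidth when `slow_fraction` of the bytes hit the slow region.

    Time per byte is fraction / bandwidth for each region, summed; the
    effective bandwidth is the reciprocal of that average time per byte.
    """
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    time_per_byte = (1 - slow_fraction) / fast_gbs + slow_fraction / slow_gbs
    return 1 / time_per_byte


# Placeholder figures: a fast main segment and a much slower last segment.
FAST, SLOW = 200.0, 25.0
for frac in (0.0, 0.01, 0.05, 0.125):
    print(f"{frac:6.1%} of traffic in slow segment -> "
          f"{effective_bandwidth(FAST, SLOW, frac):6.1f} GB/s")
```

A 12.5% share is what you'd get if every byte of a full 4GB were touched equally; rarely touched data in the last segment keeps the effective figure close to the fast one.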
 

96Firebird

Diamond Member
Nov 8, 2010
Because, as many others have pointed out already, the data in that last 300MB was not being accessed very often. If it had been, you'd notice a significant degradation in performance.

Depending on the application, the GTX 970 may take a subtle performance hit or a significant one above 3500MB. Theoretically, if you have between 3500 and 4000MB of *intensively* used data stored in VRAM, the GTX 970 will fall off a performance cliff that the GTX 980 will avoid.

How do we know this?
 

HurleyBird

Platinum Member
Apr 22, 2003
Common sense and general tech literacy. Keep in mind that even then I still throw in the word theoretically.
 

96Firebird

Diamond Member
Nov 8, 2010
Is the memory bandwidth the same between the 3.5GB and .5GB partitions? Neither of the "tests" used in this thread touches the other .5GB; instead they run the extra .5GB from system RAM. Sounds like a benchmark that can't allocate the last .5GB correctly.
 

cmdrdredd

Lifer
Dec 12, 2001
Is the memory bandwidth the same between the 3.5GB and .5GB partitions? Neither of the "tests" used in this thread touches the other .5GB; instead they run the extra .5GB from system RAM. Sounds like a benchmark that can't allocate the last .5GB correctly.

I think it's a CUDA limitation with the 970, myself.
 

Spanners

Senior member
Mar 16, 2014
I think it's a CUDA limitation with the 970, myself.

I see no definitive proof either way that the 0.5GB is using system RAM in this benchmark, i.e. that the result doesn't represent the true speed of that block of VRAM. The bandwidth numbers seem too consistent to account for people running different system memory speeds and architectures.

Additionally, isn't ~22GB/s pushing it bandwidth-wise for PCIe? (v3.0: 15.75 GB/s in each direction in an x16 slot, and that's just theoretical)
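The 15.75 GB/s figure follows from PCIe 3.0's 8 GT/s per lane and its 128b/130b line encoding. A quick check that ignores packet and protocol overhead, which lowers real throughput further:

```python
def pcie_bandwidth_gbs(gt_per_s: float, lanes: int,
                       payload_bits: int, line_bits: int) -> float:
    """Per-direction PCIe bandwidth in GB/s after line-encoding overhead."""
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / line_bits
    return bits_per_s / 8 / 1e9


gen3_x16 = pcie_bandwidth_gbs(8.0, 16, 128, 130)
print(f"PCIe 3.0 x16: {gen3_x16:.2f} GB/s per direction")
```

So a sustained ~22 GB/s would exceed what an x16 Gen3 link can carry in one direction even before protocol overhead.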
 

rgallant

Golden Member
Apr 14, 2007
GTX 980 Memory Specs:
Memory Clock: 7.0 Gbps
Standard Memory Config: 4 GB
Memory Interface: GDDR5
Memory Interface Width: 256-bit
Memory Bandwidth (GB/sec): 224

GTX 970 Memory Specs:
Memory Clock: 7.0 Gbps
Standard Memory Config: 4 GB
Memory Interface: GDDR5
Memory Interface Width: 256-bit
Memory Bandwidth (GB/sec): 224

NV states the memory config is the same in the specs, so the 970 and 980 should work the same.
Then NV states, well, it's not really the same, because the GTX 970 has some parts cut, so we used 3.5 + 0.5GB. Is that a "4 GB Standard Memory Config"????

If this is handled in the drivers, will GTX 970 owners see a larger drop-off in benchmarks when new cards and games come out than the Kepler cards show in new games today???
 

96Firebird

Diamond Member
Nov 8, 2010
I see no definitive proof either way that the 0.5GB is using system RAM in this benchmark, i.e. that the result doesn't represent the true speed of that block of VRAM. The bandwidth numbers seem too consistent to account for people running different system memory speeds and architectures.

HRW-AIDA64.jpg


I just ran it, and the end dropped down to 22.35GB/s, with DDR3-1600 RAM.
 

Abwx

Lifer
Apr 2, 2011
The OP's point has still not been rebutted and still holds: granted, there's a 3.5GB main partition, but if the 970 had a 256-bit bus it should still extract the same bandwidth as the 980 in AIDA, yet that's not the case.
 

96Firebird

Diamond Member
Nov 8, 2010
The OP's point has still not been rebutted and still holds: granted, there's a 3.5GB main partition, but if the 970 had a 256-bit bus it should still extract the same bandwidth as the 980 in AIDA, yet that's not the case.

My guess is that it averages all the "chunks", and since the last .5GB for the 970 is from RAM, it will average out lower.
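How far an averaged figure drops depends on how it's averaged, which is the crux of this guess. A sketch, assuming 128MB chunks (so the 3.5GB segment is 28 chunks and the last 0.5GB is 4) and placeholder per-chunk bandwidths; AIDA's actual method isn't documented in this thread, so both kinds of average are shown:

```python
from statistics import harmonic_mean, mean

CHUNK_MB = 128
fast_chunks = 3584 // CHUNK_MB   # 28 chunks in the 3.5GB segment
slow_chunks = 512 // CHUNK_MB    # 4 chunks in the last 0.5GB

# Placeholder per-chunk results in GB/s (not measured values).
per_chunk = [150.0] * fast_chunks + [22.0] * slow_chunks

print(f"arithmetic mean of chunks: {mean(per_chunk):.1f} GB/s")
# Equal bytes per chunk -> total time is the sum of per-chunk times,
# so the overall rate is the harmonic mean of the per-chunk rates.
print(f"bytes / total time:        {harmonic_mean(per_chunk):.1f} GB/s")
```

With these placeholders the simple average lands around 134 GB/s, while bytes divided by total time lands around 87 GB/s: four slow chunks out of 32 pull a time-based figure down much harder.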
 

cmdrdredd

Lifer
Dec 12, 2001
The OP's point has still not been rebutted and still holds: granted, there's a 3.5GB main partition, but if the 970 had a 256-bit bus it should still extract the same bandwidth as the 980 in AIDA, yet that's not the case.

Not if the cut resulted in CUDA performance being crippled.
 

Spanners

Senior member
Mar 16, 2014
HRW-AIDA64.jpg


I just ran it, and the end dropped down to 22.35GB/s, with DDR3-1600 RAM.

Could you (or somebody) run the CUDA test with DDR3-1333 (or 1866) and see if it deviates from 22-ish? I would be surprised if nobody who has run this test so far had memory faster or slower than DDR3-1600; even the differences in channels/architecture should have shown up.

Also, as I edited into my previous post, can the PCIe bus support this much bandwidth transferring from RAM?

It seems unlikely, unless I have something very wrong, that this is accessing system RAM; the cache results tied to this block of memory wouldn't make sense in that case either.
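For reference, peak system-memory bandwidth scales with the DDR3 transfer rate and the number of 64-bit (8-byte) channels, so if system memory itself were the limiting factor, results should track these theoretical peaks:

```python
def ddr_peak_gbs(mega_transfers: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (decimal units)."""
    return mega_transfers * 1e6 * bus_bytes * channels / 1e9


for speed in (1333, 1600, 1866):
    print(f"DDR3-{speed} dual channel: {ddr_peak_gbs(speed, 2):.1f} GB/s")
```

That spans roughly 21.3 to 29.9 GB/s for dual-channel setups, a spread that runs with different memory would be expected to reveal.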
 

Vesku

Diamond Member
Aug 25, 2005
NVIDIA's statement demonstrates there is a performance hit when using the 0.5GB section. They are just saying it's pretty small (a few percent). The most likely source of the performance hit is the 0.5GB section having lower throughput than the other 3.5GB.

On GTX 980, Shadows of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference. On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference. On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference. As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.

https://forums.geforce.com/default/...tx-970-3-5gb-vram-issue/post/4432672/#4432672

That is FPS, though; we'll have to wait and see if there is an impact on frame delivery. Or perhaps this is the source of the dropped-frames issue with 970 SLI?
 

96Firebird

Diamond Member
Nov 8, 2010
From my understanding, they raised the settings to push VRAM usage above 3.5GB. The 970 could just be falling behind in performance at the more demanding settings.

Could you (or somebody) run the CUDA test with DDR3-1333 (or 1866) and see if it deviates from 22-ish? I would be surprised if nobody who has run this test so far had memory faster or slower than DDR3-1600; even the differences in channels/architecture should have shown up.

Also, as I edited into my previous post, can the PCIe bus support this much bandwidth transferring from RAM?

It seems unlikely, unless I have something very wrong, that this is accessing system RAM; the cache results tied to this block of memory wouldn't make sense in that case either.

Honestly, I don't even know how to overclock/underclock RAM. I've always just left it at stock...

Not sure about the PCI-E bandwidth.
 

Vesku

Diamond Member
Aug 25, 2005
When they fix CUDA to see the 0.5GB section it should be trivial to test the throughput. That is, if they fix it.
 

Abwx

Lifer
Apr 2, 2011
Not if the cut resulted in CUDA performance being crippled.


I said the AIDA memory test...

My guess is that it averages all the "chunks", and since the last .5GB for the 970 is from RAM, it will average out lower.

Quite possible, but what matters is the final bandwidth, and in this respect the 970 has lower bandwidth despite an allegedly similar bus. It's possible that this is an effective 192-208-bit bus. This could be due to a simplified memory controller that has fixed RAM addresses for each SM, so it must swap the data on the 0.5GB partition over to the 3.5GB one, either through PCIe or from a cache. That can't be done faster than the data in the primary partition is exhausted, hence the collapsing bandwidth in the CUDA test. With a game it depends on the executed dataflow: if the 3.5GB of data isn't exhausted fast enough, or if too much of the data in the 0.5GB is needed, the game will stutter. Hence the driver behaviour that tries to keep as much as possible below 3.5GB.
 

96Firebird

Diamond Member
Nov 8, 2010
I don't think you understand that these tests aren't even touching the .5GB partition on the 970. Games do access that extra .5GB of VRAM.
 

Abwx

Lifer
Apr 2, 2011
NVIDIA's statement demonstrates there is a performance hit when using the 0.5GB section. They are just saying it's pretty small (a few percent). The most likely source of the performance hit is the 0.5GB section having lower throughput than the other 3.5GB.



https://forums.geforce.com/default/...tx-970-3-5gb-vram-issue/post/4432672/#4432672

That is FPS, though; we'll have to wait and see if there is an impact on frame delivery. Or perhaps this is the source of the dropped-frames issue with 970 SLI?

There are two things closely related to this 970: first, there's a bandwidth limitation on the remaining 0.5GB of RAM; second, the 3.5GB partition's bandwidth doesn't match a 256-bit bus using 7 Gbps RAM when compared with the 980, let alone the theoretical numbers. As pointed out above, this could be due to averaging, but that hasn't been accurately answered so far either.

I don't think you understand that these tests aren't even touching the .5GB partition on the 970. Games do access that extra .5GB of VRAM.

Actually, it's not a problem of memory access: the whole 4GB can be accessed, it's just that the last 0.5GB is dead slow. This is demonstrated by the CUDA software, and possibly by AIDA if its memory copy figure is an average; if it's not, then the card has a 192-208-bit bus....
 