WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

Status
Not open for further replies.

96Firebird

Diamond Member
Nov 8, 2010
5,742
340
126
Looks like TweakGuides got the same results that I did in FC4, but I'm sure people will still say that isn't enough...

And I'm bolding this just to fit in with the people who like to bold random sentences!
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
Does any other video card advertise VRAM as a set amount but access it the way the GTX 970 does?
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
guskline said:
Does any other video card advertise VRAM as a set amount but access it the way the GTX 970 does?

I bet the 192-bit 2GB cards do, for example. And there may be countless examples of cut-down AMD or Nvidia GPUs in history doing the same.
 

amenx

Diamond Member
Dec 17, 2004
4,521
2,857
136
I think the big elephant in the room is the 3GB 780 Ti. Why has this card, which has been out for almost 2 years, not had any prominent issue when pushed over its 3GB limit in similar conditions? And here comes a weaker card (970), just with more VRAM, that supposedly fumbles at 3.5GB.

I think that CUDA VRAM bandwidth program was not the best sort of tool to determine the issue to any proper degree. It tested the 970 under its own specific conditions, not in actual gaming. But that's what threw this non-story into overdrive, the idea that it was some sort of "definitive proof" of a problem.
 

Hitman928

Diamond Member
Apr 15, 2012
6,696
12,373
136
96Firebird said:
Looks like TweakGuides got the same results that I did in FC4, but I'm sure people will still say that isn't enough...

And I'm bolding this just to fit in with the people who like to bold random sentences!

The problem is, it's really not. I'm not saying this from a "the 970 has issues" perspective; I'm saying this from a proper testing perspective. He provides a video on YouTube where his fps is well below 30 to begin with; the whole video is a stutter fest. He also provides no performance data whatsoever, which makes the endeavor wholly insufficient.

With that said, I honestly don't know if any 3rd party has the proper tools right now to test for this, even FCAT. Nvidia should have the right profiling tools, etc., but I doubt they'd go through the trouble. To me, there are three main problems with testing for this from a 3rd party perspective.

1) The card as designed tries to allocate everything to the first 3.5 GB section on its own if at all possible, so going over that limit is more difficult than on other 4 GB cards.

2) Hitting the window where the resources required are above 3.5 GB but don't completely max out the 4 GB, and the game is still playable, is very difficult. It seems once you hit the 3.5 GB wall, you need ridiculous, unplayable settings to make it use more, and at that point you will most likely start swapping with system RAM on any 4 GB card. Even then, how much of an effect will roughly 10% of your VRAM being slower have on game performance? That's a very difficult question to answer because, as far as I know, this is the first time the question has ever come up. What about games like BF4 where the VRAM can be filled more "dynamically": will you get slightly less LOD? Will there be any effect?

3) To do a proper test you need a "clean sample" or at least expected behavior to compare against. Perhaps someone could downclock a 980 to roughly 970 performance and then compare. Even that wouldn't be exact because the architectures are in fact different, but maybe between that and a 290 for comparison, you could start to get some kind of picture as to whether or not it's a problem.

I will say that it seems like for current games, it isn't a problem, but that very well could be because, for current games, 4 GB is unnecessary unless you run unplayable settings to begin with. Perhaps 970 SLI vs 980 SLI could hold some answers as well, but then you run into #2 above, where it is hard to find a sweet spot of memory use. Is there really a difference between a 3.5 GB and 4 GB card in today's games (again, only assuming there is a problem for testing purposes, not acknowledging one)? I doubt it. Could it be a problem in the future? Who knows.

If it were me and I upgraded every couple of years, I wouldn't worry about it at all. If I wanted to keep a card for 4+ years, it would give me pause until more testing was done. What intrigues me most is the "bandwidth" test people love/hate to show. I think the biggest question there is: does it go to system RAM after 3.5 GB, or is this really the second section's performance? I have never done anything in CUDA so I have no idea, but if you can get a firm answer on that, I'd say you at least have a first step in profiling the 970's VRAM performance. Sorry for the long post, I've done verification work before and that side came out a little bit.
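
A rough way to frame the "how much would a slower slice of VRAM hurt" question from point 2 is the back-of-the-envelope sketch below, in Python. Only the 3.5 GB + 0.5 GB split comes from this thread. The bandwidth figures in the demo are made-up placeholders, not measurements or Nvidia specs, and the model assumes the two segments are read one after the other so that transfer times simply add up. The function names are hypothetical.

```python
FAST_GB = 3.5  # size of the first segment, as discussed in the thread
SLOW_GB = 0.5  # size of the second segment, as discussed in the thread


def slow_share(fast_gb=FAST_GB, slow_gb=SLOW_GB):
    """Fraction of total VRAM that sits in the slow segment."""
    return slow_gb / (fast_gb + slow_gb)


def effective_bandwidth(fast_bw, slow_bw, slow_fraction):
    """Effective bandwidth when `slow_fraction` of the bytes read
    come from the slow segment.

    Assumption: the segments are accessed one after the other, so the
    time per byte is a weighted sum (a weighted harmonic mean overall).
    """
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    time_per_byte = (1.0 - slow_fraction) / fast_bw + slow_fraction / slow_bw
    return 1.0 / time_per_byte


if __name__ == "__main__":
    # Placeholder numbers for illustration only (GB/s), NOT real specs.
    fast_bw, slow_bw = 200.0, 25.0
    print(f"slow segment share of VRAM: {slow_share():.1%}")
    for frac in (0.0, 0.05, slow_share()):
        bw = effective_bandwidth(fast_bw, slow_bw, frac)
        print(f"{frac:.1%} of bytes from slow segment -> {bw:.1f} GB/s effective")
```

The point of the sketch is only that the answer depends heavily on how often the slow segment is actually touched, which is exactly what is hard to measure from outside. A game that parks rarely-read data there would sit near the `0.0` case, while one that streams evenly from all 4 GB would sit near the last case.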
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
It would also be a good idea for thoroughness to test some non-AAA titles that can go 3.5+GB.
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
amenx said:
I think the big elephant in the room is the 3GB 780 Ti. Why has this card, which has been out for almost 2 years, not had any prominent issue when pushed over its 3GB limit in similar conditions? And here comes a weaker card (970), just with more VRAM, that supposedly fumbles at 3.5GB.

I think that CUDA VRAM bandwidth program was not the best sort of tool to determine the issue to any proper degree. It tested the 970 under its own specific conditions, not in actual gaming. But that's what threw this non-story into overdrive, the idea that it was some sort of "definitive proof" of a problem.

Well, a 3GB card can't go over 3GB. I've had a pair of 670s for a long time and often they would sit right around 2GB in use and you would get some slight stuttering as it swaps in and out of memory. In those same games I can use over 2GB at the same settings now that I'm using 970s. Each game is different. Some use everything they can while others only take what is necessary but none of them will use more than there is available. If it happens you will have large stutters and a borderline slideshow.
 

amenx

Diamond Member
Dec 17, 2004
4,521
2,857
136
cmdrdredd said:
Well, a 3GB card can't go over 3GB. I've had a pair of 670s for a long time and often they would sit right around 2GB in use and you would get some slight stuttering as it swaps in and out of memory. In those same games I can use over 2GB at the same settings now that I'm using 970s. Each game is different. Some use everything they can while others only take what is necessary but none of them will use more than there is available. If it happens you will have large stutters and a borderline slideshow.

I know, but the point I was trying to make is that if a card is hobbled due to VRAM not being sufficient, i.e., past 3.5GB like the 970 (supposedly) in certain game scenarios, how would the 3GB 780 Ti fare in the same situation?
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
amenx said:
I know, but the point I was trying to make is that if a card is hobbled due to VRAM not being sufficient, i.e., past 3.5GB like the 970 in certain game scenarios, how would the 3GB 780 Ti fare in the same situation?

Probably what happens when you have a 2GB card. It either doesn't use the full amount or it stops at or around the max and it's not a big deal. The 780 is a lot faster than the 670 so maybe it can use raw performance to push through. Where my 670s would have some stuttering, the 780 may not exhibit the same.
 

bazilizk

Junior Member
Jan 25, 2015
1
0
0
Hi guys.
Subscribed here just to say hi :)
And here are some tests with the G1 Gaming GTX 980 using the vramBandwidthTest (from guru3d),
run 6 times with different results:

[Six attached screenshots of the vramBandwidthTest runs]
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
We have had ample evidence that 4GB is usable on the 970 and that performance doesn't tank. We had this evidence before TweakGuides' investigation, and I expect we will see a ton more from valid sources.

There has been a lot of effort put into making this a big deal when it really hasn't been.

Here is data I posted pages back, and it was completely ignored.

[image]


Here is more data:

[image]


https://www.youtube.com/watch?v=yUi...4688&x-yt-cl=84503534&feature=player_embedded

source: users on the OCN thread http://www.overclock.net/t/1535502/gtx-970s-can-only-use-3-5gb-of-4gb-vram-issue/400

I honestly think Nvidia will release a driver that has totally revamped memory management for the 970. This is just my guess.

There was something going on: the driver only allocates 3.5GB unless the settings are so high that it simply can't avoid going over. I think Nvidia will completely revamp the method they took. I just don't think it will make a huge difference in performance at all, but with so many flipping out now with crazy allegations, I think Nvidia will change how the memory is managed.

See, I think Nvidia already had the most efficient method. But changes will be made, I would bet on it.
As for the claims, it's just not true that the card cannot access the whole 4GB, and it's really not killing performance in games as some people are trying to make it out to be.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
ocre said:
As for the claims, it's just not true that the card cannot access the whole 4GB, and it's really not killing performance in games as some people are trying to make it out to be.

On the contrary, that's quite possible when looking at the GPU diagram. The memory controllers that are connected to the clusters have 16 bits of addresses, out of a total of 64, that are indeed not used since they would feed disabled units, and the simplified MCs can't swap the data from the 0.5GB partition's RAM to feed a nearby unit. Actually, it's quite possible that this 0.5GB is not functional at all even if it can be addressed; that is, you can fill it with data but you have no means to send that data to functional units because the design lacks the relevant crossbars.
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
Abwx said:
On the contrary, that's quite possible when looking at the GPU diagram. The memory controllers that are connected to the clusters have 16 bits of addresses, out of a total of 64, that are indeed not used since they would feed disabled units, and the simplified MCs can't swap the data from the 0.5GB partition's RAM to feed a nearby unit. Actually, it's quite possible that this 0.5GB is not functional at all even if it can be addressed; that is, you can fill it with data but you have no means to send that data to functional units because the design lacks the relevant crossbars.

So you're saying you have memory that can't be used, again? We already debunked that claim.
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
cmdrdredd said:
So you're saying you have memory that can't be used, again? We already debunked that claim.

Yes, you can load 4GB, but only 3.5GB can be sent to the SMMs. The data in the remaining 0.5GB can't be sent for execution to the SMMs used for executing the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retrieved from the relevant address space; if that SMM is disabled, there's no means to send the data to another SMM. So, as said, the whole 4GB is addressable but only 3.5GB can be executed by the GPU's computing units...
 

flexy

Diamond Member
Sep 28, 2001
8,464
155
106
Let's shove the 200MB-300MB which Windows/DWM allocates into the "slow" part (in case this doesn't happen yet)... problem *almost* solved.
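
The idea can be illustrated with a toy placement policy in Python. This is only a sketch of the suggestion, not how Nvidia's driver or Windows actually manages memory; the `Segment` class, the `place` function, and the `low_priority` flag are all hypothetical names. Low-priority allocations (such as the desktop compositor) try the slow segment first, everything else tries the fast segment first, and either kind spills into the other segment when its preferred one is full.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One VRAM segment in this toy model (sizes in MB)."""
    name: str
    capacity_mb: int
    used_mb: int = 0

    def free_mb(self) -> int:
        return self.capacity_mb - self.used_mb


def place(alloc_mb, low_priority, fast, slow):
    """Place an allocation and return a list of (segment name, MB) pairs.

    Toy policy: low-priority data prefers the slow segment, everything
    else prefers the fast one; overflow spills into the other segment.
    """
    order = [slow, fast] if low_priority else [fast, slow]
    if alloc_mb > sum(seg.free_mb() for seg in order):
        raise MemoryError("allocation does not fit in VRAM")
    placements = []
    remaining = alloc_mb
    for seg in order:
        take = min(remaining, seg.free_mb())
        if take > 0:
            seg.used_mb += take
            placements.append((seg.name, take))
            remaining -= take
    return placements


if __name__ == "__main__":
    fast = Segment("fast", 3584)  # 3.5 GB
    slow = Segment("slow", 512)   # 0.5 GB
    print(place(300, True, fast, slow))    # compositor-style allocation
    print(place(3500, False, fast, slow))  # game data
    print(place(200, False, fast, slow))   # game data that has to spill
```

In this toy model the game keeps the whole fast segment to itself until it is full, and only the overflow lands next to the compositor's data in the slow segment.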
 

tential

Diamond Member
May 13, 2008
7,348
642
121
cmdrdredd said:
Well, a 3GB card can't go over 3GB. I've had a pair of 670s for a long time and often they would sit right around 2GB in use and you would get some slight stuttering as it swaps in and out of memory. In those same games I can use over 2GB at the same settings now that I'm using 970s. Each game is different. Some use everything they can while others only take what is necessary but none of them will use more than there is available. If it happens you will have large stutters and a borderline slideshow.

Isn't this the fundamental flaw people are making in testing the GTX 970? Rather than ensuring the game actually needs 4GB of VRAM, as opposed to just loading up all available VRAM, we're simply testing whether we can get the GTX 970 to utilize 4GB of VRAM (which is a silly test in the first place and shows nothing).

What you need is to find a game that doesn't just load up VRAM, but is actually in need of using a full 4GB of VRAM at the time, then see if you encounter any hitching during that gameplay.
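
Checking for hitching can be made a bit less subjective with a frame-time log. Below is a minimal Python sketch, assuming you already have per-frame times in milliseconds from some capture tool. The function names, the `factor=2.5` threshold, and the sample numbers are all made up for illustration. It flags frames that take much longer than the median frame of the run.

```python
from statistics import median


def find_hitches(frame_times_ms, factor=2.5):
    """Return indices of frames slower than `factor` x the median frame time."""
    if not frame_times_ms:
        return []
    threshold = factor * median(frame_times_ms)
    return [i for i, t in enumerate(frame_times_ms) if t > threshold]


def hitch_rate(frame_times_ms, factor=2.5):
    """Fraction of frames in the run that count as hitches."""
    if not frame_times_ms:
        return 0.0
    return len(find_hitches(frame_times_ms, factor)) / len(frame_times_ms)


if __name__ == "__main__":
    # Invented sample data: nine smooth frames and one long one.
    run = [16.7] * 9 + [80.0]
    print("hitch frames:", find_hitches(run))
    print("hitch rate:", hitch_rate(run))
```

Comparing the hitch rate of the same scene at settings just below and just above 3.5GB of actual need, and against another 4GB card, would say more than a VRAM usage counter alone.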
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
flexy said:
Let's shove the 200MB-300MB which Windows/DWM allocates into the "slow" part (in case this doesn't happen yet)... problem *almost* solved.

If what I'm pointing out is true, then Windows DWM will use 200-300MB within the 3.5GB and not within the 0.5GB, since the latter can be filled with whatever data comes in over PCIe, but once filled, that 0.5GB can't be addressed by the functional SMMs, only by the disabled SMMs, which is useless.

I suspect that Nvidia didn't release this info not only because a 4GB card is much more marketable than a 3.5GB one, but also because the info indirectly discloses their memory controller design, which seems to be much simplified for power efficiency purposes.
 

wand3r3r

Diamond Member
May 16, 2008
3,180
0
0
tential said:
Isn't this the fundamental flaw people are making in testing the GTX 970? Rather than ensuring the game actually needs 4GB of VRAM, as opposed to just loading up all available VRAM, we're simply testing whether we can get the GTX 970 to utilize 4GB of VRAM (which is a silly test in the first place and shows nothing).

What you need is to find a game that doesn't just load up VRAM, but is actually in need of using a full 4GB of VRAM at the time, then see if you encounter any hitching during that gameplay.

Exactly, since new games going forward will actually require and be able to utilize more VRAM. 4K would seem like a good testing resolution, but the games still have to be the type that actually uses the complete VRAM rather than simply caching something in the extra 500 MB.

I really don't see why they couldn't have simply made it a 3.5GB card, since it seems to be using some hack to access the remaining 500 MB. People would have certainly still bought them. Strange.
 

amenx

Diamond Member
Dec 17, 2004
4,521
2,857
136
Abwx said:
Yes, you can load 4GB, but only 3.5GB can be sent to the SMMs. The data in the remaining 0.5GB can't be sent for execution to the SMMs used for executing the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retrieved from the relevant address space; if that SMM is disabled, there's no means to send the data to another SMM. So, as said, the whole 4GB is addressable but only 3.5GB can be executed by the GPU's computing units...

As much as I hate to admit it, it does make sense.
 

96Firebird

Diamond Member
Nov 8, 2010
5,742
340
126
Abwx said:
Yes, you can load 4GB, but only 3.5GB can be sent to the SMMs. The data in the remaining 0.5GB can't be sent for execution to the SMMs used for executing the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retrieved from the relevant address space; if that SMM is disabled, there's no means to send the data to another SMM. So, as said, the whole 4GB is addressable but only 3.5GB can be executed by the GPU's computing units...

Are you assuming each SMM is saturated and cannot handle the memory coming from the .5GB?
 

amenx

Diamond Member
Dec 17, 2004
4,521
2,857
136
Abwx said:
Yes, you can load 4GB, but only 3.5GB can be sent to the SMMs. The data in the remaining 0.5GB can't be sent for execution to the SMMs used for executing the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retrieved from the relevant address space; if that SMM is disabled, there's no means to send the data to another SMM. So, as said, the whole 4GB is addressable but only 3.5GB can be executed by the GPU's computing units...

On second thought, what happens when gaming data is in the 0.5GB segment if it can't be 'executed'? Can't it be shuffled back (albeit less efficiently) to the 3.5GB 'executing' section when called upon?
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
96Firebird said:
Are you assuming each SMM is saturated and cannot handle the memory coming from the .5GB?

No, what I'm assuming is that each SMM has a dedicated space in the RAM that is always the same. The memory controller that is linked to a given SMM has access only to that SMM's dedicated RAM space; it can't retrieve data from other dedicated spaces and feed it to this SMM. As such, the data in the 0.5GB can be retrieved by the MC, but the MC can't distribute that retrieved data to functional SMMs. What made me think about this is Nvidia's statement that there was a crossbar limitation. The 0.5GB could literally be ghost memory; I don't know how the GPU uses it, if it ever does.


amenx said:
On second thought, what happens when gaming data is in the 0.5GB segment if it can't be 'executed'? Can't it be shuffled back (albeit less efficiently) to the 3.5GB 'executing' section when called upon?

As said above, I don't know. They could use it if the data can be loaded into the L2 cache and then loaded into another SMM's L1 cache, but then why the need for a second partition if it was possible this way?

They could possibly send it through PCIe, but this wouldn't make sense; it would be better to have it directly in system RAM.
 

rgallant

Golden Member
Apr 14, 2007
1,361
11
81
wand3r3r said:
Exactly, since new games going forward will actually require and be able to utilize more VRAM. 4K would seem like a good testing resolution, but the games still have to be the type that actually uses the complete VRAM rather than simply caching something in the extra 500 MB.

I really don't see why they couldn't have simply made it a 3.5GB card, since it seems to be using some hack to access the remaining 500 MB. People would have certainly still bought them. Strange.

Marketing.
Most people buying new cards today would be coming from low-VRAM cards, so with AMD at 4GB, no way would NV put 3.5GB VRAM on the box.

Also, what will happen to the cut-down GM200 cards? Same Maxwell arch.
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
96Firebird said:
Are you assuming each SMM is saturated and cannot handle the memory coming from the .5GB?

In a perfect scenario with fully saturated SMMs from cache to memory, the extra 500MB would be useless. And I think that is why it is kept separate anyway, because Nvidia tries to have the most efficient flow for maximum throughput. Basically, it's such an efficient setup that ideally there is no gap or scrambling for data. The idea is to keep the SMMs as busy as absolutely possible without any wasted cycles. Nvidia would have to have this working seamlessly and in perfect order.

This is a great reason to try to use the 3.5GB first, because Maxwell has such efficient flow from memory to cache to SMM.

The other 500MB is there and it is usable. Completely usable. It's just that you only have so many SMMs and they can saturate the entire system already. It's all flowing for maximum throughput. The 500MB is not needed most of the time. There is no SMM attached to it.

This doesn't mean it can't be used for anything, because it can. And it is accessible. It's just not really necessary for keeping all the cores saturated.

I believe Nvidia is already using the memory as efficiently as they felt they should. But now, after all this has blown up, I am willing to bet that Nvidia makes some changes and will force that segment of RAM to be used much more frequently, whether it's useful, needed, or not. I don't think we will gain anything as far as performance goes. But who knows.
I believe we will see allocation of that 500MB just to please people. I don't see how it would help the other SMMs when they are totally saturated anyway, but maybe it can help to some degree.
Maxwell throughput is very solid. Data flows in a very organized manner, in a line. But having data on the chip, even in the 500MB space, should be better than having to grab it from the HD or pagefile. It's just that in a perfect world, the SMMs wouldn't ever need it. They have plenty to keep them saturated already.

Disclaimer: just my view. Could be totally wrong.
 