WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

ShintaiDK · Jan 25, 2015

amenx said:
Meanwhile, Tweakguides has a bit on it and tests his own 970 on FC4 over the 3.5gb point.

http://www.tweakguides.com/

:thumbsup:

It was already tested earlier as well. People dismissed it back then too, in search of drama. It's quite clear that there is no issue, neither in performance nor in 4GB usage.

96Firebird · Jan 25, 2015

Looks like TweakGuides got the same results that I did in FC4, but I'm sure people will still say that isn't enough...

And I'm bolding this just to fit in with the people who like to bold random sentences!

guskline · Jan 25, 2015

Does any other video card advertise VRAM as a set amount but access it the way the GTX 970 does?

ShintaiDK · Jan 25, 2015

guskline said:
Does any other video card advertise VRAM as a set amount but access it the way the GTX 970 does?

I bet the 192-bit 2GB cards do, for example. And there may be countless examples of cut-down AMD or nVidia GPUs in history doing the same.

amenx · Jan 25, 2015

I think the big elephant in the room is the 3GB 780 Ti. Why has this card, which has been out for almost 2 years, not had any prominent issue when pushed over its 3GB limit in similar conditions? And here comes a weaker card (the 970), just with more VRAM, that supposedly fumbles at 3.5GB.

I think that CUDA VRAM bandwidth program was not the best sort of tool to determine the issue to any proper degree. It tested the 970 under its own specific conditions, not in actual gaming. But that's what threw this non-story into overdrive: the idea that it was some sort of "definitive proof" of a problem.

Hitman928 · Jan 25, 2015

96Firebird said:
Looks like TweakGuides got the same results that I did in FC4, but I'm sure people will still say that isn't enough...

And I'm bolding this just to fit in with the people who like to bold random sentences!

The problem is, it's really not. I'm not saying this from a "the 970 has issues" perspective; I'm saying it from a proper-testing perspective. He provides a video on YouTube where his frame rate is well below 30 fps to begin with; the whole video is a stutter fest. He also provides no performance data whatsoever, which makes the endeavor wholly insufficient.

With that said, I honestly don't know if any 3rd party has the proper tools right now to test for this, even FCAT. Nvidia should have the right profiling tools, etc., but I doubt they'd go to the trouble. To me, there are three main problems with testing for this from a 3rd-party perspective.

1) The card as designed tries to allocate everything to the first 3.5 GB section on its own if at all possible, so going over that limit is more difficult than on other 4 GB cards.

2) It is very difficult to hit the window where the resources required are above 3.5 GB but don't completely max out the 4 GB, and where the game is still playable. It seems that once you hit the 3.5 GB wall, you need ridiculous, unplayable settings to make it use more, and at that point you will most likely start swapping with system RAM on any 4 GB card. Even then, how much of an effect will roughly 10% of your VRAM being slower have on game performance? That's a very difficult question to answer because, as far as I know, this is the first time the question has ever come up. What about games like BF4, where the VRAM can be filled more "dynamically": will you get slightly less LOD? Will there be any effect?

3) To do a proper test you need a "clean sample" or at least expected behavior to compare against. Perhaps someone could downclock a 980 to roughly 970 performance and then compare. Even that wouldn't be exact because the architectures are in fact different, but maybe between that and a 290 for comparison, you could start to get some kind of picture as to whether or not it's a problem.

I will say that it seems like, for current games, it isn't a problem, but that could very well be because, for current games, 4 GB is unnecessary unless you run unplayable settings to begin with. Perhaps 970 SLI vs 980 SLI could hold some answers as well, but then you run into #2 above, where it is hard to find a sweet spot of memory use. Is there really a difference between a 3.5 GB and a 4 GB card in today's games (again, only assuming there is a problem for testing purposes, not acknowledging one)? I doubt it. Could it be a problem in the future? Who knows.

If it were me and I upgraded every couple of years, I wouldn't worry about it at all. If I wanted to keep a card for 4+ years, it would give me pause until more testing was done. What intrigues me most is the "bandwidth" test people love/hate to show. I think the biggest question there is: does it go to system RAM after 3.5 GB, or is this really the performance of the second section? I have never done anything in CUDA so I have no idea, but if you can get a firm answer on that, I'd say you at least have a first step in profiling the 970's VRAM performance. Sorry for the long post; I've done verification work before and that side came out a little bit.
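To put a rough shape on the "how much will a slower slice of VRAM matter" question in point 2, here is a back-of-the-envelope sketch, not a measurement. It assumes the two sections are read one after the other, so the time per unit of traffic is simply the sum of bytes divided by bandwidth for each section. The function name and both bandwidth figures are made-up placeholders for illustration; nothing in this thread establishes the real numbers.

```python
def effective_bandwidth(bytes_fast, bytes_slow, bw_fast, bw_slow):
    """Average bandwidth when traffic is split across two memory sections.

    Assumes the sections are accessed serially, so total time is the sum
    of the time spent in each section. Units only need to be consistent
    (e.g. GB and GB/s).
    """
    total_bytes = bytes_fast + bytes_slow
    if total_bytes <= 0:
        raise ValueError("need a positive amount of traffic")
    total_time = bytes_fast / bw_fast + bytes_slow / bw_slow
    return total_bytes / total_time


# Hypothetical placeholder figures, NOT measured GTX 970 values.
FAST_GBPS = 200.0
SLOW_GBPS = 25.0

for slow_share in (0.0, 0.05, 0.125):
    bw = effective_bandwidth(1.0 - slow_share, slow_share, FAST_GBPS, SLOW_GBPS)
    print(f"{slow_share:.1%} of traffic in the slow section -> {bw:.1f} GB/s average")
```

Note that the share of traffic hitting the slow section is not the same thing as its share of capacity: if the driver parks rarely touched data there, the traffic share could be far smaller than the capacity share, which is one reason game results and synthetic tests could disagree.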

Vesku · Jan 25, 2015

It would also be a good idea for thoroughness to test some non-AAA titles that can go 3.5+GB.

cmdrdredd · Jan 25, 2015

amenx said:
I think the big elephant in the room is the 3GB 780 Ti. Why has this card, which has been out for almost 2 years, not had any prominent issue when pushed over its 3GB limit in similar conditions? And here comes a weaker card (the 970), just with more VRAM, that supposedly fumbles at 3.5GB.

I think that CUDA VRAM bandwidth program was not the best sort of tool to determine the issue to any proper degree. It tested the 970 under its own specific conditions, not in actual gaming. But that's what threw this non-story into overdrive: the idea that it was some sort of "definitive proof" of a problem.

Well, a 3GB card can't go over 3GB. I've had a pair of 670s for a long time and often they would sit right around 2GB in use and you would get some slight stuttering as it swaps in and out of memory. In those same games I can use over 2GB at the same settings now that I'm using 970s. Each game is different. Some use everything they can while others only take what is necessary but none of them will use more than there is available. If it happens you will have large stutters and a borderline slideshow.

amenx · Jan 25, 2015

cmdrdredd said:
Well, a 3GB card can't go over 3GB. I've had a pair of 670s for a long time and often they would sit right around 2GB in use and you would get some slight stuttering as it swaps in and out of memory. In those same games I can use over 2GB at the same settings now that I'm using 970s. Each game is different. Some use everything they can while others only take what is necessary but none of them will use more than there is available. If it happens you will have large stutters and a borderline slideshow.

I know, but the point I was trying to make is this: if a card is hobbled due to VRAM not being sufficient, i.e., past 3.5GB like the 970 (supposedly) in certain game scenarios, how would the 3GB 780 Ti fare in the same situation?

cmdrdredd · Jan 25, 2015

amenx said:
I know, but the point I was trying to make is this: if a card is hobbled due to VRAM not being sufficient, i.e., past 3.5GB like the 970 in certain game scenarios, how would the 3GB 780 Ti fare in the same situation?

Probably what happens when you have a 2GB card. It either doesn't use the full amount or it stops at or around the max and it's not a big deal. The 780 is a lot faster than the 670 so maybe it can use raw performance to push through. Where my 670s would have some stuttering, the 780 may not exhibit the same.

bazilizk · Jan 25, 2015

Hi guys.
Subscribed here just to say hi,

and to post some tests of the G1 Gaming GTX 980 with the vramBandwidthTest (from guru3d):
6 runs, with different results.

ocre · Jan 25, 2015

We have had ample evidence that 4GB is usable on the 970, and that performance doesn't tank. We had this evidence before TweakGuides' investigation, and I expect we will see a ton more from valid sources.

There has been a lot of effort put into making this a big deal when it really hasn't been one.

Here is data I posted pages back, and it was completely ignored.

Here is more data:

https://www.youtube.com/watch?v=yUi...4688&x-yt-cl=84503534&feature=player_embedded

source: users on the OCN thread http://www.overclock.net/t/1535502/gtx-970s-can-only-use-3-5gb-of-4gb-vram-issue/400

I honestly think Nvidia will release a driver that has totally revamped memory management for the 970. This is just my guess.

There was something going on: the driver only allocates 3.5GB unless the settings are so demanding that it simply can't avoid going over. I think Nvidia will completely revamp the method they took. I just don't think it will make a huge difference in performance, but with so many people flipping out now with crazy allegations, I think Nvidia will change how the memory is managed.

See, I think Nvidia already had the most efficient method. But changes will be made, I would bet on it.
As for the claims, it's just not true that the card cannot access the whole 4GB, and it's really not killing performance in games the way some people are trying to make out.
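The behavior described in this thread (fill the 3.5GB section first, touch the last 0.5GB only when there is no other choice) can be pictured with a toy placement policy. This is purely an illustration of the described behavior under my own assumptions: the function name, the "system" fallback, and the idea of a single per-allocation decision are hypothetical, and this is not Nvidia's actual driver logic.

```python
FAST_CAP_MB = 3584  # 3.5 GB section
SLOW_CAP_MB = 512   # 0.5 GB section


def place_allocation(size_mb, used_fast_mb, used_slow_mb):
    """Toy policy: prefer the fast section, then the slow one, then system RAM.

    Returns the name of the pool a new allocation of size_mb would land in,
    given how much of each section is already in use.
    """
    if used_fast_mb + size_mb <= FAST_CAP_MB:
        return "fast"
    if used_slow_mb + size_mb <= SLOW_CAP_MB:
        return "slow"
    return "system"  # hypothetical fallback: spill over PCIe to system RAM


print(place_allocation(256, 0, 0))       # plenty of room in the fast section
print(place_allocation(256, 3400, 0))    # fast section nearly full
print(place_allocation(600, 3400, 0))    # fits in neither VRAM section
```

Under a policy like this, a monitoring tool would show usage sticking near 3.5GB until a workload genuinely forces more, which matches what people in the thread report seeing.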

Abwx · Jan 25, 2015

ocre said:
As for the claims, it's just not true that the card cannot access the whole 4GB, and it's really not killing performance in games the way some people are trying to make out.

On the contrary, that's quite possible when looking at the GPU diagram. The memory controllers that are connected to the clusters have 16 bits of addresses, out of a total of 64, that are indeed not used, since they would feed disabled units, and the simplified MCs can't swap the data from the 0.5GB partition's RAM to feed a nearby unit. Actually, it's quite possible that those 0.5GB are not functional at all even if they can be addressed; that is, you can fill them with data, but you have no means of sending that data to functional units because the design lacks the relevant crossbars.

cmdrdredd · Jan 25, 2015

Abwx said:
On the contrary, that's quite possible when looking at the GPU diagram. The memory controllers that are connected to the clusters have 16 bits of addresses, out of a total of 64, that are indeed not used, since they would feed disabled units, and the simplified MCs can't swap the data from the 0.5GB partition's RAM to feed a nearby unit. Actually, it's quite possible that those 0.5GB are not functional at all even if they can be addressed; that is, you can fill them with data, but you have no means of sending that data to functional units because the design lacks the relevant crossbars.

So you're saying, again, that you have memory that can't be used? We already debunked that claim.

Abwx · Jan 25, 2015

cmdrdredd said:
So you're saying, again, that you have memory that can't be used? We already debunked that claim.

Yes, you can load 4GB, but only 3.5GB can be sent to the SMMs; the data in the remaining 0.5GB can't be sent for execution to the SMMs that process the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retrieved from the relevant address space. If that SMM is disabled, there's no means of sending the data to another SMM. So, as said, the whole 4GB is addressable, but only 3.5GB can be executed by the GPU's computing units...

flexy · Jan 25, 2015

Let's shove the 200MB-300MB which Windows/DWM allocates into the "slow" part (in case this doesn't happen yet)... problem *almost* solved.

tential · Jan 25, 2015

cmdrdredd said:
Well, a 3GB card can't go over 3GB. I've had a pair of 670s for a long time and often they would sit right around 2GB in use and you would get some slight stuttering as it swaps in and out of memory. In those same games I can use over 2GB at the same settings now that I'm using 970s. Each game is different. Some use everything they can while others only take what is necessary but none of them will use more than there is available. If it happens you will have large stutters and a borderline slideshow.

Isn't this the fundamental mistake people are making in testing the GTX 970? Rather than ensuring the game actually needs 4GB of VRAM, as opposed to just loading up all available VRAM, we're simply testing whether we can get the GTX 970 to utilize 4GB of VRAM (which is a silly test in the first place and shows nothing).

What you need is to find a game that doesn't just load up VRAM, but is actually in need of using a full 4GB of VRAM at the time, then see if you encounter any hitching during that game play.
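For the "see if you encounter any hitching" part, a minimal sketch of how hitching could be quantified from a frame-time log instead of being judged by eye from a video. It assumes you already have per-frame render times in milliseconds from some capture tool; the function names, the 2.5x-median spike threshold, and the sample numbers are all hypothetical choices for illustration.

```python
from statistics import median


def find_hitches(frame_times_ms, factor=2.5):
    """Return (frame_index, frame_time) pairs for frames slower than factor x median."""
    if not frame_times_ms:
        return []
    typical = median(frame_times_ms)
    return [(i, t) for i, t in enumerate(frame_times_ms) if t > factor * typical]


def summarize(frame_times_ms):
    """Average fps, 99th-percentile frame time (nearest rank), and hitch count."""
    if not frame_times_ms:
        raise ValueError("no frames recorded")
    ordered = sorted(frame_times_ms)
    n = len(ordered)
    rank = -(-99 * n // 100)  # ceiling of 0.99 * n using integer math
    return {
        "avg_fps": 1000.0 * n / sum(ordered),
        "p99_ms": ordered[rank - 1],
        "hitches": len(find_hitches(frame_times_ms)),
    }


# Made-up sample: mostly ~60 fps frames with two long spikes.
sample = [16.7] * 97 + [55.0, 16.7, 80.0]
print(find_hitches(sample))
print(summarize(sample))
```

Running the same scene on a 970 and on a comparison card, then comparing the hitch counts and 99th-percentile frame times rather than the average fps, would give the kind of performance data the TweakGuides video was criticized for lacking.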

Abwx · Jan 25, 2015

flexy said:
Let's shove the 200MB-300MB which Windows/DWM allocates into the "slow" part (in case this doesn't happen yet)... problem *almost* solved.

If what I'm pointing out is true, then Windows DWM will use 200-300MB within the 3.5GB and not within the 0.5GB, since the latter can be filled with whatever data comes in over PCIe, but once filled, those 0.5GB can't be addressed by the functional SMMs, only by the disabled SMMs, which is useless.

I suspect that Nvidia didn't release this info not only because a 4GB card is much more marketable than a 3.5GB one, but also because the info indirectly discloses their memory controller design, which seems to be much simplified for power-efficiency purposes.

wand3r3r · Jan 25, 2015

tential said:
Isn't this the fundamental mistake people are making in testing the GTX 970? Rather than ensuring the game actually needs 4GB of VRAM, as opposed to just loading up all available VRAM, we're simply testing whether we can get the GTX 970 to utilize 4GB of VRAM (which is a silly test in the first place and shows nothing).

What you need is to find a game that doesn't just load up VRAM, but is actually in need of using a full 4GB of VRAM at the time, then see if you encounter any hitching during that game play.

Exactly, since new games going forward will actually require and be able to utilize more VRAM. 4k would seem like a good testing resolution but the games still have to be types which actually use the complete VRAM rather than simply cache something in the extra 500 MB.

I really don't see why they couldn't have simply made it a 3.5GB card, since it seems to be using some hack to access the remaining 500 MB. People would have certainly still bought them. Strange.

amenx · Jan 25, 2015

Abwx said:
Yes, you can load 4GB, but only 3.5GB can be sent to the SMMs; the data in the remaining 0.5GB can't be sent for execution to the SMMs that process the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retrieved from the relevant address space. If that SMM is disabled, there's no means of sending the data to another SMM. So, as said, the whole 4GB is addressable, but only 3.5GB can be executed by the GPU's computing units...

As much as I hate to admit it, that does make sense.

96Firebird · Jan 25, 2015

Abwx said:
Yes, you can load 4GB, but only 3.5GB can be sent to the SMMs; the data in the remaining 0.5GB can't be sent for execution to the SMMs that process the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retrieved from the relevant address space. If that SMM is disabled, there's no means of sending the data to another SMM. So, as said, the whole 4GB is addressable, but only 3.5GB can be executed by the GPU's computing units...

Are you assuming each SMM is saturated and cannot handle the memory coming from the .5GB?

amenx · Jan 25, 2015

Abwx said:
Yes, you can load 4GB, but only 3.5GB can be sent to the SMMs; the data in the remaining 0.5GB can't be sent for execution to the SMMs that process the data within the 3.5GB. It's only a theory, but the GPU design and the presence of a separate partition point to this implementation. What transpires is that each SMM has a fixed address space in the RAM, and data meant to be executed by a given SMM must be retrieved from the relevant address space. If that SMM is disabled, there's no means of sending the data to another SMM. So, as said, the whole 4GB is addressable, but only 3.5GB can be executed by the GPU's computing units...

On second thought, what happens when gaming data is in the 0.5GB segment if it can't be "executed"? Can't it be shuffled back (albeit less efficiently) to the 3.5GB "executing" section when called upon?

Abwx · Jan 25, 2015

96Firebird said:
Are you assuming each SMM is saturated and cannot handle the memory coming from the .5GB?

No, what I'm assuming is that each SMM has a dedicated space in the RAM that is always the same. The memory controller linked to a given SMM has access only to that SMM's dedicated RAM space; it can't retrieve data from other dedicated spaces and feed it to this SMM. As such, the data in the 0.5GB can be retrieved by the MC, but the MC can't distribute that data to functional SMMs. What made me think about this is Nvidia's statement that there was a crossbar limitation. The 0.5GB could literally be ghost memory; I don't know how the GPU uses it, if it ever does.

amenx said:
On second thought, what happens when gaming data is in the 0.5GB segment if it can't be "executed"? Can't it be shuffled back (albeit less efficiently) to the 3.5GB "executing" section when called upon?

As said above, I don't know. They could use it if the data can be loaded into the L2 cache and then into another SMM's L1 cache, but then why the need for a second partition if it were possible this way?

They could possibly send it through PCIe, but this wouldn't make sense; it would be better to have it directly in system RAM.

rgallant · Jan 25, 2015

wand3r3r said:
Exactly, since new games going forward will actually require and be able to utilize more VRAM. 4k would seem like a good testing resolution but the games still have to be types which actually use the complete VRAM rather than simply cache something in the extra 500 MB.

I really don't see why they couldn't have simply made it a 3.5GB card, since it seems to be using some hack to access the remaining 500 MB. People would have certainly still bought them. Strange.

Marketing.
Most people buying a new card today would be coming from low-VRAM cards, so with AMD at 4GB, no way would NV put 3.5GB of VRAM on the box.

Also, what will happen to the GM200 cut-down cards? Same Maxwell ack.

ocre · Jan 25, 2015

96Firebird said:
Are you assuming each SMM is saturated and cannot handle the memory coming from the .5GB?

In a perfect scenario, with the SMMs fully saturated from cache to memory, the extra 500MB would be useless. And I think that is why it is kept separate anyway, because Nvidia tries to have the most efficient flow for maximum throughput. Basically, it's such an efficient setup that ideally there is no gap or scrambling for data. The idea is to keep the SMMs as busy as absolutely possible without any wasted cycles. Nvidia would have to have this working seamlessly and in perfect order.

This is a great reason to try to use the 3.5GB first: Maxwell has such an efficient flow from memory to cache to SMM.

The other 500MB is there and it is usable, completely usable. It's just that you only have so many SMMs, and they can already saturate the entire system. It's all flowing for maximum throughput. The 500MB is not needed most of the time. There is no SMM attached to it.

This doesn't mean it can't be used for anything, because it can. And it is accessible. It's just not really necessary for keeping all the cores saturated.

I believe Nvidia is already using the memory as efficiently as they felt they should. But now, after all this has blown up, I am willing to bet that Nvidia makes some changes and will force that segment of RAM to be used much more frequently, whether it's useful, needed, or not. I don't think we will gain anything as far as performance goes. But who knows.
I believe we will see that 500MB being allocated just to please people. I don't see how it would help the other SMMs when they are totally saturated anyway, but maybe it can help to some degree.
Maxwell's throughput is very solid. Data flows in a very organized manner, in a line. But having data on the card, even in the 500MB space, should be better than having to grab it from the HD or pagefile. It's just that, in a perfect world, the SMMs wouldn't ever need it. They have plenty to keep them saturated already.

Disclaimer: just my view. Could be totally wrong.
