WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

Abwx · Jan 25, 2015

ocre said:
This is a great reason to try to use the 3.5GB first, because Maxwell has such efficient flow from memory to cache to SMM.

That's not logical at all; the real reason is below, in your own post:

ocre said:
The 500MB is not needed most of the time. There is no SMM attached to it.

That's my point: the SMMs cannot address the full RAM space. But I do not agree with the following:

ocre said:
The other 500MB is there and it is usable. Completely usable.

This doesn't mean it can't be used for anything, because it can. And it is accessible. It's just not really needed to keep all the cores saturated.

And how can it be usable? What would the data path be?

Load the data into GPU RAM, then send it back over PCIe to system RAM, then back over PCIe again into the usable 3.5GB space?

How is the data retrieved and distributed to the functional SMMs once it has been loaded into that 0.5GB?

KaRLiToS · Jan 25, 2015

http://www.anandtech.com/show/8931/nvidia-publishes-statement-on-geforce-gtx-970-memory-allocation

Being a high level statement, NVIDIA’s focus is on the performance ramifications – mainly, that there generally aren’t any – and while we’re not prepared to affirm or deny NVIDIA’s claims, it’s clear that this only scratches the surface. VRAM allocation is a multi-variable process; drivers, applications, APIs, and OSes all play a part here, and just because VRAM is allocated doesn’t necessarily mean it’s in use, or that it’s being used in a performance-critical situation. Using VRAM for an application-level resource cache and actively loading 4GB of resources per frame are two very different scenarios, for example, and would certainly be impacted differently by NVIDIA’s split memory partitions.

For the moment with so few answers in hand we’re not going to spend too much time trying to guess what it is NVIDIA has done, but from NVIDIA’s statement it’s clear that there’s some additional investigating left to do. If nothing else, what we’ve learned today is that we know less than we thought we did, and that’s never a satisfying answer. To that end we’ll keep digging, and once we have the answers we need we’ll be back with a deeper answer on how the GTX 970’s memory subsystem works and how it influences the performance of the card.

cmdrdredd · Jan 25, 2015

tential said:
Isn't this the fundamental mistake people are making in testing the GTX 970? Rather than making sure the game actually needs 4GB of VRAM, as opposed to simply loading up all available VRAM, we're just testing whether we can get the GTX 970 to utilize 4GB of VRAM (which is a silly test in the first place and shows nothing).

What you need is a game that doesn't just load up VRAM but actually needs a full 4GB of VRAM at that moment; then see whether you encounter any hitching during gameplay.

What I meant is that some games won't go much over 3GB at what I consider pretty high settings at 1440p, while others will start creeping up to 3.7GB or more.

Abwx · Jan 25, 2015

KaRLiToS said:
http://www.anandtech.com/show/8931/nvidia-publishes-statement-on-geforce-gtx-970-memory-allocation

Since no accurate information has been released we can only speculate, but I think Ryan Smith will be in an awkward position having to announce to everybody what the recent claims seem to show: that the 970 has an effective 208-bit bus and 3328MB of effective RAM, with bandwidth reduced to 182GB/s compared to the 980's 224GB/s. The remaining 768MB would be a kind of ghost RAM that can store non-critical data, but probably not data the game depends on critically.

This implementation is a direct consequence of the uarch used, and I would be surprised if a fix were possible, whether through drivers or firmware; if it were, Nvidia wouldn't have relied on such a tricky scheme.
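The figures in this post are at least arithmetically consistent with each other. A minimal Python sketch, assuming the 980's 224GB/s comes from a 256-bit bus at 7 Gbps effective per pin and that capacity scales linearly with the usable share of the bus (the 208-bit figure is the claim under discussion, not something NVIDIA has confirmed here):

```python
# Sketch of the arithmetic behind the claimed "208-bit / 3328MB / 182GB/s" figures.
# Assumptions: 7 Gbps effective GDDR5 per pin, a full 256-bit bus with 4096MB, and
# capacity scaling linearly with usable bus width. The 208-bit value is the forum
# claim being checked, not a confirmed NVIDIA number.
DATA_RATE_GBPS = 7.0   # effective data rate per pin, Gbit/s
FULL_BUS_BITS = 256
FULL_VRAM_MB = 4096

def bandwidth_gbs(bus_bits: int, data_rate_gbps: float = DATA_RATE_GBPS) -> float:
    """Peak bandwidth in GB/s: bus width x per-pin data rate / 8 bits per byte."""
    return bus_bits * data_rate_gbps / 8

def split_capacity(bus_bits: int) -> tuple[float, float]:
    """(MB behind the given bus width, MB left over), assuming linear scaling."""
    usable_mb = FULL_VRAM_MB * bus_bits / FULL_BUS_BITS
    return usable_mb, FULL_VRAM_MB - usable_mb

print(bandwidth_gbs(FULL_BUS_BITS), bandwidth_gbs(208))  # 224.0 182.0
print(split_capacity(208))                               # (3328.0, 768.0)
```

Linear scaling is only the simplifying assumption behind the claim; it says nothing about how the driver actually maps or uses the leftover 768MB.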

railven · Jan 25, 2015

Still reading through this whole thread; interesting that the OP got some validation.

This video shows exactly the issue I was encountering with Skyrim when I was using GTX 660 Ti SLI:

http://www.dailymotion.com/video/x2ejqt1_skyrim-gtx970-8xmsaa

The hitching was definitely memory related and completely went away when I switched to a GTX 780. I should have tested a GTX 680, but I was just pissed and swapped to a new card altogether.

Kudos to the OP for not caving. I wonder whether this tool would work on a GTX 660 Ti and produce similar findings. It would make me feel less crazy about what I experienced back when I basically got told off.

Silverforce11 · Jan 25, 2015

It's going to depend on whether the game actually needs >3.5GB or just allocates it. In that Skyrim run we can see the drivers trying to enforce the 3.5GB limit, and when it's exceeded it stutters; there are definitely some VRAM swaps happening. What's needed is an identical run on a 980 to see whether it stutters and how the VRAM behaves.

When I used to play Skyrim, the texture mods of the time only pushed toward 2GB, so 3GB was fine; I guess whatever mods people are using these days are pushing it to 4GB.

flexy · Jan 25, 2015

Do NOT depend on this tool!

The programmer himself has explained that it doesn't even measure DRAM performance. It measures CUDA memory performance from an allocated pool, which in some circumstances can also take memory from system RAM. I read his description and realized that he doesn't even really know why the memory performance drops there... there are too many unknown variables at play, and the developer of the tool even mentions this!

I think a much better way to test is looking at REAL performance (frame times, not avg. FPS) in games, rather than using this tool.

ocre · Jan 25, 2015

Abwx said:
That's not logical at all; the real reason is below, in your own post:

That's my point: the SMMs cannot address the full RAM space. But I do not agree with the following:

And how can it be usable? What would the data path be?

Load the data into GPU RAM, then send it back over PCIe to system RAM, then back over PCIe again into the usable 3.5GB space?

How is the data retrieved and distributed to the functional SMMs once it has been loaded into that 0.5GB?

You're really taking it far to claim that the data will have to go all the way back to system RAM over the PCIe bus, then turn around and come back up to be loaded into the 3.5GB segment. This is totally invented and doesn't have to be the case.

The fact that there is a cut SM is enough on its own to explain why the RAM is segmented. We can imagine all sorts of things, and I have no problem with a discussion that involves invented scenarios.

I do expect Nvidia to take some sort of action. I think they will let that RAM fill up; perhaps it will be stuffed with Aero and other Windows junk. The very fact that Windows always has some of the VRAM allocated makes me think that RAM can be accessed across segments without going back down and up the PCIe bus. If it couldn't, you would have SMs starving because of the chunk of VRAM Windows takes up.

Nvidia stated that there were fewer crossbars to the segmented 500MB section, not that there were no crossbars. Why are we assuming that none exist at all? It makes perfect sense that each SM is fed by the cache and RAM that come before it, and that Nvidia can keep each SM saturated most of the time anyway. Nvidia says there are fewer crossbars to move data across these segments, i.e. the data stored in the RAM with no SM. That is reason enough not to let apps fill it up unless there is no other option.

That is exactly how the 970 behaves. People have reported that it likes to stay at or under 3500MB unless they force it to use more. And then we have examples of people running games at over 3500MB; I have posted several with frame times. Before that, at the beginning of this thread, we had people contesting the OP because they play Far Cry using almost 4GB of VRAM and the performance didn't tank.

We have those examples, and then Nvidia comes out with its explanation. It gives a good reason why games seem to avoid going over 3500MB unless they have to, for anyone who is really trying to look at the situation. They also compare the 980 at over 3500MB vs the 970 at over 3500MB, and show a penalty of 3% when they force the 970 to use that extra VRAM.

The point is, there are examples of people running at over 3500MB. It's hard to imagine that the data in the 500MB is going all the way back down the PCIe bus to system RAM, then back up the PCIe bus to be allocated elsewhere. There would be a heck of a lot more than a 1-3% penalty. It would be a total mess.

I believe that most likely there are crossbars that allow data to flow between segments, that each SM has its own RAM allocated, and that the GPU performs best when the flow is straight through. Can't wait to see what comes of this, but I expect there will be some update to how the 970 uses its extra 500MB. I feel like Nvidia will make some changes and the card will no longer avoid going over 3500MB. I also believe that there will be no performance increase from this at all.

Silverforce11 · Jan 25, 2015

ocre said:
We have those examples, and then Nvidia comes out with its explanation. It states why games seem to avoid going over 3500MB unless they have to. And they also compare the 980 at over 3500MB vs the 970 at over 3500MB. They show a penalty of 3% when they force the 970 to use that extra RAM.

It's ~3% relative to itself below 3.5GB but ~5% relative to the 980, in the sense that the 970 suffers more in the AVERAGE FPS results NV posted. Note that even severe min-FPS spikes over a long bench will only affect the average FPS by a small amount.

The Shadow of Mordor bench clearly shows major frame-time stutters when VRAM usage is above 3.5GB; the results you posted from NeoGAF show that. You need to re-examine them: look at the time axis and compare VRAM usage against frame time. When VRAM usage is >3.5GB, you get lots of bad frame-latency spikes. It's linked.

The Skyrim gameplay video also shows the stutter behavior.

I hope this makes sense: if you take two bench runs that each last a few minutes, where one has stutters or low min-FPS spikes and the other does not, their overall average FPS will differ by only a small margin.
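To make the average-vs-stutter point concrete, here is a small Python sketch on made-up frame times (hypothetical numbers, not measurements from any card): two runs with the same frame count, one perfectly smooth and one with fifty 100 ms hitches.

```python
# Hypothetical, made-up frame times (not measurements): two runs with the same
# number of frames, one perfectly smooth and one with fifty 100 ms hitches.
def summarize(frame_times_ms: list[float], spike_ms: float = 50.0) -> dict:
    """Average FPS over the whole run, worst single frame, and count of frames over spike_ms."""
    total_s = sum(frame_times_ms) / 1000.0
    return {
        "avg_fps": len(frame_times_ms) / total_s,
        "worst_ms": max(frame_times_ms),
        "spikes": sum(1 for t in frame_times_ms if t > spike_ms),
    }

smooth = [16.0] * 10_000                 # ~62.5 FPS for about 160 s
stutter = [16.0] * 9_950 + [100.0] * 50  # same run, 50 frames replaced by hitches

for name, run in (("smooth", smooth), ("stutter", stutter)):
    s = summarize(run)
    print(f"{name}: {s['avg_fps']:.1f} avg FPS, worst {s['worst_ms']:.0f} ms, "
          f"{s['spikes']} frames > 50 ms")
```

The average only falls from 62.5 to 60.9 FPS (under 3%), while the worst frame goes from 16 ms to 100 ms, which is why frame-time data shows what an average-FPS table can hide.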

Abwx · Jan 25, 2015

ocre said:
You're really taking it far to claim that the data will have to go all the way back to system RAM over the PCIe bus, then turn around and come back up to be loaded into the 3.5GB segment. This is totally invented and doesn't have to be the case.

I didn't say that, although it's a possibility, since the bandwidth of the last memory chunks seems close to PCIe bandwidth. But either way, the data in the 768MB pool has to pass through somewhere, and on that point you didn't provide an answer.

ocre said:
I do expect Nvidia to take some sort of action. I think they will let that RAM fill up; perhaps it will be stuffed with Aero and other Windows junk. The very fact that Windows always has some of the VRAM allocated makes me think that RAM can be accessed across segments without going back down and up the PCIe bus. If it couldn't, you would have SMs starving because of the chunk of VRAM Windows takes up.

I don't think it can be populated by Aero or any other data that is ready to be processed by the GPU; apparently this secondary pool has no direct access to computation resources. Actually, I think it's ghost RAM, for the reasons I'm pointing out.

ocre said:
Nvidia stated that there were fewer crossbars to the segmented 500MB section, not that there were no crossbars. Why are we assuming that none exist at all? It makes perfect sense that each SM is fed by the cache and RAM that come before it, and that Nvidia can keep each SM saturated most of the time anyway. Nvidia says there are fewer crossbars to move data across these segments, i.e. the data stored in the RAM with no SM. That is reason enough not to let apps fill it up unless there is no other option.

They said that there are fewer crossbar resources, which is a euphemism. The 980 doesn't have more crossbar resources; it's just that the 768MB in question can be addressed by the SMMs that are missing in the 970. In that sense, yes, more of its crossbars are functional...

ocre said:
I believe that most likely there are crossbars that allow data to flow between segments, that each SM has its own RAM allocated, and that the GPU performs best when the flow is straight through.

That's what I'm saying, but unlike you I think there is no flow between segments. That's why the 768MB partition can't be addressed by the functional SMMs. It's the opposite of a unified memory space: each SMM has a dedicated address space and can't access the address spaces of other SMMs.

stahlhart · Jan 26, 2015

FCAT and frame delivery are NOT ON TOPIC IN THIS THREAD ANY LONGER. Drop it, NOW.

Either have the discussion in a separate thread or, better yet, take it to PM, since you've demonstrated yet again that you aren't capable of having a discussion about anything here without turning it into another personal confrontation, and ruining the important and relevant examination of the GTX 970 memory issue that others are trying to have in this thread.

-- stahlhart

Silverforce11 · Jan 26, 2015

Proof. Pretty conclusive actually.
https://www.youtube.com/watch?v=wgRir5JwKyU

That high bus activity above 3.5GB indicates swapping over the PCIe bus. It seems @Abwx may be right that data needs to go out over the bus and come back to enter the usable 3.5GB segment. Because that's a very slow process compared to direct VRAM access, it's the cause of the stutters and min-FPS drops.

The only good thing is that in games NV optimizes for, if they can store non-required assets (some games may allocate more than *needed*) in the 0.5GB segment, it won't be a mess. But when a game isn't optimized, or simply can't be because of its engine (as shown in prior examples), frame latency and min-FPS spikes go bonkers.

And here: PCPER
"UPDATE 1/26/15 @ 12:10am ET: I now have a lot more information on the technical details of the architecture that cause this issue and more information from NVIDIA to explain it. I spoke with SVP of GPU Engineering Jonah Alben on Sunday night to really dive into the questions everyone had. Expect an update here on this page at 10am PT / 1pm ET or so. Bookmark and check back!"

HurleyBird · Jan 26, 2015

stahlhart said:
FCAT and frame delivery are NOT ON TOPIC IN THIS THREAD ANY LONGER. Drop it, NOW

You mean the "TR and others stopped using FCAT when Maxwell came out" nonsense, right? FCAT and frame delivery above 3.5GB for the GTX 970 vs. GTX 980 are absolutely on topic, so I'm assuming that's not what you mean.

Grooveriding · Jan 26, 2015

Are any review sites doing independent testing on this yet? This thread is massive and I haven't seen it mentioned anywhere yet. A credible and impartial site needs to develop a methodology and test this properly. It's great that users on the forums discovered and exposed the issue with the 970, but a lot of what I see here and on OCN is just spin on the data, trying to play it up or down.

At this point there are just these few comments from Nvidia, which are worth nothing until an independent site tests it for itself. If there is a real issue here, Nvidia is not going to divulge it unless it gets exposed by a third party first. PCPER just seems to be relaying Nvidia's PR on it, which is not surprising from that site. Hopefully one of the tech sites will actually test this and write an article on it.

Going on Nvidia's statements alone is foolish. If the issue is significant enough that the card's memory bandwidth is lower than advertised once this 500MB area is in use, they won't come out with it until it's exposed by a third party, because of possible liabilities.

jj109 · Jan 26, 2015

Silverforce11 said:
Proof. Pretty conclusive actually.
https://www.youtube.com/watch?v=wgRir5JwKyU

That high bus activity above 3.5GB indicates swapping over the PCIe bus. It seems @Abwx may be right that data needs to go out over the bus and come back to enter the usable 3.5GB segment. Because that's a very slow process compared to direct VRAM access, it's the cause of the stutters and min-FPS drops.

The only good thing is that in games NV optimizes for, if they can store non-required assets (some games may allocate more than *needed*) in the 0.5GB segment, it won't be a mess. But when a game isn't optimized, or simply can't be because of its engine (as shown in prior examples), frame latency and min-FPS spikes go bonkers.

And here: PCPER
"UPDATE 1/26/15 @ 12:10am ET: I now have a lot more information on the technical details of the architecture that cause this issue and more information from NVIDIA to explain it. I spoke with SVP of GPU Engineering Jonah Alben on Sunday night to really dive into the questions everyone had. Expect an update here on this page at 10am PT / 1pm ET or so. Bookmark and check back!"

It's not going over the bus. Someone on OCN just ran a comparison of a pinned copy of 1.5GB in VRAM against 2GB in VRAM. The last 500MB is getting a little over 25GB/s, which is way too fast for the PCIe bus.
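A quick sanity check on "too fast for the PCIe bus", sketched in Python. It assumes the card sits in a PCIe 3.0 x16 slot (8 GT/s per lane, 128b/130b encoding) and ignores packet and protocol overhead, which would only push the real ceiling lower; the 25GB/s figure is the one reported in the post, not a new measurement.

```python
# Theoretical one-direction PCIe throughput, before packet/protocol overhead
# (real transfers land noticeably lower). Assumption: the card sits in a PCIe 3.0
# x16 slot, i.e. 8 GT/s per lane with 128b/130b line encoding.
def pcie_gbs(gt_per_s: float, lanes: int, payload_bits: int = 128, line_bits: int = 130) -> float:
    """Peak GB/s for a link: transfers/s x lanes x encoding efficiency / 8 bits per byte."""
    return gt_per_s * lanes * payload_bits / line_bits / 8

ceiling = pcie_gbs(8.0, 16)   # PCIe 3.0 x16, roughly 15.75 GB/s
reported = 25.0               # "a little over 25 GB/s", as quoted in the post
print(f"PCIe 3.0 x16 ceiling ~{ceiling:.2f} GB/s vs reported {reported:.0f} GB/s")
print("too fast to be PCIe traffic" if reported > ceiling else "consistent with PCIe")
```

Even the theoretical ceiling of about 15.75GB/s sits well below the reported figure, so on these assumptions the argument holds.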

Erenhardt · Jan 26, 2015

Silverforce11 said:
Proof. Pretty conclusive actually.
https://www.youtube.com/watch?v=wgRir5JwKyU

That high bus activity above 3.5GB indicates swapping over the PCIe bus. It seems @Abwx may be right that data needs to go out over the bus and come back to enter the usable 3.5GB segment. Because that's a very slow process compared to direct VRAM access, it's the cause of the stutters and min-FPS drops.

The only good thing is that in games NV optimizes for, if they can store non-required assets (some games may allocate more than *needed*) in the 0.5GB segment, it won't be a mess. But when a game isn't optimized, or simply can't be because of its engine (as shown in prior examples), frame latency and min-FPS spikes go bonkers.

And here: PCPER
"UPDATE 1/26/15 @ 12:10am ET: I now have a lot more information on the technical details of the architecture that cause this issue and more information from NVIDIA to explain it. I spoke with SVP of GPU Engineering Jonah Alben on Sunday night to really dive into the questions everyone had. Expect an update here on this page at 10am PT / 1pm ET or so. Bookmark and check back!"

Wouldn't it be faster to get the data from system RAM to begin with, rather than swapping from the unusable 0.5GB via PCIe to system RAM and back again to usable VRAM?

Maybe the 970's address space is 3.5GB of VRAM + 0.5GB of system RAM, with a VRAM allocation bias.

Final8ty · Jan 26, 2015

https://www.youtube.com/watch?v=bvG12uQiB_E
Noticed that the 980 would use 4GB but the 970 would not go over 3.5GB.

hawtdawg · Jan 26, 2015

The only good way anyone is going to see how this affects things is to play something like Shadow of Mordor with textures on Ultra, use FRAPS to record FRAMETIMES, and then throw the log into FRAFS. FPS benchmarks aren't going to show anything useful.
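For anyone who wants to look at the raw numbers instead of (or before) FRAFS, here is a minimal Python sketch for turning a frame-time log into per-frame durations and a high percentile. It assumes a FRAPS-style "frametimes" CSV with a "Frame, Time (ms)" header and cumulative timestamps; check the header of your own log first, and note that the sample log below is made up for illustration.

```python
import csv
import io
import math

# Assumption: a FRAPS-style "frametimes" log, i.e. a CSV with a "Frame, Time (ms)"
# header where Time is a cumulative timestamp from the start of the capture.
# Check the header of your own log before relying on this.
def frame_times_ms(csv_text: str) -> list[float]:
    """Turn cumulative timestamps into per-frame durations in milliseconds."""
    rows = list(csv.reader(io.StringIO(csv_text), skipinitialspace=True))
    stamps = [float(row[1]) for row in rows[1:] if len(row) >= 2]  # rows[0] is the header
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]

def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical five-frame log with one ~88 ms hitch (made-up numbers, not a capture).
sample = "Frame, Time (ms)\n1, 0.000\n2, 16.000\n3, 32.000\n4, 120.000\n5, 136.000\n"
times = frame_times_ms(sample)
print(times)                  # [16.0, 16.0, 88.0, 16.0]
print(percentile(times, 99))  # 88.0
```

A real capture has thousands of frames, so the 99th or 99.9th percentile frame time, plotted against VRAM usage over time, is the kind of comparison that would show whether hitches line up with crossing 3.5GB.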

dacostafilipe · Jan 26, 2015

VRAM swapping (moving data between the first memory pool and the second one based on how frequently it is used, like CUDA's unified memory) should be possible here and should not impact performance in a big way.

Still, I wanted to buy a GTX 970 this weekend, but now I will wait for those who are going to sell their "broken 970" on eBay.

amenx · Jan 26, 2015

Not going to sell mine. I'd still buy one if I were in the market for a card, even if it were a 3GB card. The 3GB 780 Ti held up pretty well against its 4GB counterparts, and anything that cripples a 970 VRAM-wise should also affect a 780 Ti, but you see no complaints from Ti owners.

Atreidin · Jan 26, 2015

Do Nvidia drivers rely on the file name of the .exe for game optimizations?

Maybe this has been discussed already, but has anybody who has a 970 and sees 4GB usage in a game tried renaming the game's .exe to see whether VRAM usage is then limited to a lower value? I am curious whether the game is limited to 3.5GB maximum unless the driver specifically allows more due to targeted optimization.

hawtdawg · Jan 26, 2015

amenx said:
Not going to sell mine. I'd still buy one if I were in the market for a card, even if it were a 3GB card. The 3GB 780 Ti held up pretty well against its 4GB counterparts, and anything that cripples a 970 VRAM-wise should also affect a 780 Ti, but you see no complaints from Ti owners.

This is a stupid statement. 780 Ti owners paid for a 3GB video card. 970 owners paid for a 4GB video card. One set of owners got what they paid for; the other did not. Therefore they should, and do, have different expectations for the product they purchased.

ShintaiDK · Jan 26, 2015

Atreidin said:
Do Nvidia drivers rely on the file name of the .exe for game optimizations?

Maybe this has been discussed already, but has anybody who has a 970 and sees 4GB usage in a game tried renaming the game's .exe to see whether VRAM usage is then limited to a lower value? I am curious whether the game is limited to 3.5GB maximum unless the driver specifically allows more due to targeted optimization.

With DirectX, the driver handles the allocation and is aware of it, so it shouldn't matter which game it is, optimized or not.

With CUDA you are in control.

dacostafilipe · Jan 26, 2015

ShintaiDK said:
With DirectX, the driver handles the allocation and is aware of it, so it shouldn't matter which game it is, optimized or not.

With CUDA you are in control.

Something that crossed my mind:

What about DirectCompute? Does it have the same problem CUDA has? What about games that use a lot of DirectCompute?

amenx · Jan 26, 2015

hawtdawg said:
This is a stupid statement. 780 Ti owners paid for a 3GB video card. 970 owners paid for a 4GB video card. One set of owners got what they paid for; the other did not. Therefore they should, and do, have different expectations for the product they purchased.

It would only be a stupid statement if the 780 Ti suffered the same issues as the 970 under the same settings and circumstances. If the Ti cruised smoothly under conditions where the 970 faltered at 3.5GB+ of VRAM, then it's not a VRAM issue but simply GPU power that is slowing the 970 down. Did you not even consider that possibility, ffs?
