WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

cmdrdredd · Jan 24, 2015

garagisti said:
Wrong. People not only here, but elsewhere are discussing, and have been discussing about how tech works, and not that it just works. More importantly, it may not meet advertising standards of certain countries and may be considered to be fraudulent advertising/ behaviour.

For what it is worth, i agree that gaming performance is what it is, and would remain unaffected for the benchmarks ran as they were at launch. Had this limitation been more public, more investigations into limitations and performance against competition could be evaluated in light of that information. It may also have affected the perceived value of the product. So yes, dodgy behaviour is being reported for that which it is. You don't care, well good for you, but then do not speak for everyone as you just tried to. Several other owners of the cards here and elsewhere, they are rather bemused, and ranging to dismayed.

There's nothing false about it. It is a 4GB card, can use all 4GB, they didn't misrepresent the number of SMM, ROPS, or the capabilities.

You also missed the part where I said forums like this one... so I wasn't talking about just AT. What I was getting at is that none of the major forums had anyone talking about gaming performance issues or stuttering with the 970 and it released back in September. You would think that if there was some major problem someone would have said something long before. We didn't see that.

Also don't get caught up in the hoopla created by new users on Nvidia's forums and people who may not even own a 970 or any Nvidia card calling for the pitchforks. Undoubtedly there's lots of people jumping on this with no personal stake in the situation.

Lastly I was not speaking for anyone whatsoever. I was saying and have always said that nobody said anything about it being a problem and then suddenly out of left field there are these wild claims about the card not being able to use 4GB which was proven false. Then goalposts kept moving until we're now at the point we are in this thread.

Vesku · Jan 24, 2015

cmdrdredd said:
There's nothing false about it. It is a 4GB card, can use all 4GB, they didn't misrepresent the number of SMM, ROPS, or the capabilities.

You also missed the part where I said forums like this one... so I wasn't talking about just AT. What I was getting at is that none of the major forums had anyone talking about gaming performance issues or stuttering with the 970 and it released back in September. You would think that if there was some major problem someone would have said something long before. We didn't see that.

Regular forum posters don't have access to FCAT frame time monitoring. I didn't notice any stuttering with my 7950 but that doesn't mean the FCAT data was a lie back when AMD's drivers weren't as "smooth". People might just chalk it up to the game engine itself since they won't be comparing extremely precise frame time results from multiple GPU SKUs.

Review sites should step up to the plate and do in depth FCAT analysis.

RampantAndroid · Jan 24, 2015

cmdrdredd said:
I have no idea how to do that lol. I know you asked that earlier too.

If you have another PC - even a macbook will work.

If on Win8/8.1, open the start menu and search "remote" - select "Allow remote connections"

If win7 (going from memory here) open file explorer. On the left side, right click my computer and select properties. A window with system info like PC name, processor and such will come up. On the left is the option "Remote Settings" - hit that. Select "Allow remote connections to this PC".

Verify you can remote in now, while on the same network (don't open ports!) by opening an RDP client on your other machine. If you have another windows machine, run the command "mstsc"; if you have a mac, install the Microsoft Remote Client from the app store.

With the remote client open, type in the name of your gaming PC (or the IP address of it) and hit enter. Enter your logon credentials.

If that works, go ahead and disconnect your monitor entirely, reboot and then remote in and run the test.

cmdrdredd · Jan 24, 2015

Vesku said:
Regular forum posters don't have access to FCAT frame time monitoring. I didn't notice any stuttering with my 7950 but that doesn't mean the FCAT data was a lie back when AMD's drivers weren't as "smooth". People might just chalk it up to the game engine itself since they won't be comparing extremely precise frame time results from multiple GPU SKUs.

No but when you can't feel or see it, does it really matter that much? I'd say no and I don't remember specifically my attitude about it when the whole thing blew up a few years ago. I think I largely ignored it because I was using 670s anyway. It's like microstutter that you get with any mGPU configuration, I don't really notice it so I don't call it a problem. Just like some people don't have problems with minor input lag and I am sensitive to that. It's not something that is cut and dry to me where one is bad and another is good. As for game engines, some engines are going to stutter on any configuration at some points. UE3 has always done this for me even way back when it released. Games like Batman and Bioshock that use that engine also exhibit this problem. I used to think it was dropping frames so I tried to change settings and it never really went away no matter what card I was using at the time.

sontin · Jan 24, 2015

Leadbox said:
https://www.youtube.com/watch?v=ZQE6p5r1tYE
This was posted in the pcper article's comments section, probably card specific.

It must be the VRAM and not one of the extra settings he used. :|
The system RAM is increasing from 6GB to 14GB.

NomanA · Jan 24, 2015

Cloudfire777 said:
Told you the benchmark thats been used had no bearing and was a fluke

The benchmark isn't a fluke. What an ignorant thing to say, when Nvidia themselves have admitted about the memory partitioning.

The benchmark basically allocates memory in chunks (128MB default). These all come from VRAM. If you think the system RAM can give you even those reduced bandwidth rates, then you are seriously mistaken. While the CUDA application can't control where the allocated memory resides on the VRAM, the drivers do seem to be giving these chunks in order. For a headless display, the first 25-26 chunks come out of the 3.5GB high region, and the remaining from the other 0.5 GB. Test runs in two ways. In one, it accesses different four bytes of one chunk repeatedly, and in the other, it accesses the same location five times before moving on. The latter way is to test the cache rates.

There is no point denying the fact that access to the second region is very slow. Even the L2 cache access to that region is about half that of regular VRAM access in 3.5GB region. The tool is correct and allocates VRAM (from both regions, it's up to the driver anyway to handle the arbitration, as CUDA memory allocation call is same)

So then the question is how will it affect games, once the game assets are loaded into this lower bandwidth region. It really is a very hard thing to quantify. Drivers tweaked for certain games (or game engines) can be intelligent enough to shuffle the data between the two regions to keep the impact minimum. Remember that even this lower bandwidth is orders of magnitude better than accessing data in system RAM. It's also possible that memory usage which just goes over a little bit into this other region (or if it's a game engine that hasn't been profiled), the driver may decide it's not worth it to do the arbitration logic, and instead limit the VRAM to region 1. It's a classical software resource optimization issue.

It's not ideal, and something that I'd consider a limitation, considering GTX 980M doesn't suffer from it, and it's cut down even further. At the very least, it needs careful management in drivers, unlike 980 or 980M, to reduce the impact.

The bandwidth reduction is real and the tool shows that correctly.

amenx · Jan 24, 2015

Personally I think some credible tech site will do a study of the issue sooner or later. Its generated too much chatter to ignore and would make a good story if proper tests and studies were conducted.

cmdrdredd · Jan 24, 2015

RampantAndroid said:
If you have another PC - even a macbook will work.

If on Win8/8.1, open the start menu and search "remote" - select "Allow remote connections"

If win7 (going from memory here) open file explorer. On the left side, right click my computer and select properties. A window with system info like PC name, processor and such will come up. On the left is the option "Remote Settings" - hit that. Select "Allow remote connections to this PC".

Verify you can remote in now, while on the same network (don't open ports!) by opening an RDP client on your other machine. If you have another windows machine, run the command "mstsc"; if you have a mac, install the Microsoft Remote Client from the app store.

With the remote client open, type in the name of your gaming PC (or the IP address of it) and hit enter. Enter your logon credentials.

If that works, go ahead and disconnect your monitor entirely, reboot and then remote in and run the test.

Ok so I remote in and get rec.exe has stopped working. Nvidia control panel says "Nvidia display settings are unavailable under remote desktop"

I dont think this will work. I tried loading up a game and got an error about DX device unavailable.

RampantAndroid · Jan 24, 2015

cmdrdredd said:
Ok so I remote in and get rec.exe has stopped working. Nvidia control panel says "Nvidia display settings are unavailable under remote desktop"

I dont think this will work. I tried loading up a game and got an error about DX device unavailable.

What is rec.exe?

I expect the control panel to not work - I however also expect a CUDA program to run fine, the GPU is still there, just not as a display device (which is why I think this might be the best way to run this.)

sontin · Jan 24, 2015

NomanA said:
The bandwidth reduction is real and the tool shows that correctly.

No, it doesnt show it. The tool cant write into the second segment of the 4GB unlike Games.

It is flawed in a way that it has only access to the 3.5GB pool.

cmdrdredd · Jan 24, 2015

RampantAndroid said:
What is rec.exe?

I expect the control panel to not work - I however also expect a CUDA program to run fine, the GPU is still there, just not as a display device (which is why I think this might be the best way to run this.)

rec.exe is the cuda benchmark app people have been using. Nai's Benchmark. It opens and says it's allocating memory and you have to "press any key to continue..." when I do it stops working. When I open BOINC Manager and try to run a work unit that uses cuda it tells me the GPU is missing.

Udgnim · Jan 24, 2015

amenx said:
Personally I think some credible tech site will do a study of the issue sooner or later. Its generated too much chatter to ignore and would make a good story if proper tests and studies were conducted.

Tech Report thought something was off with the 970 back in 10/1/2014

http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980

amenx · Jan 24, 2015

Udgnim said:
Tech Report thought something was off with the 970 back in 10/1/2014

http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980

Yes but no mention of the vram issue.

NomanA · Jan 24, 2015

sontin said:
No, it doesnt show it. The tool cant write into the second segment of the 4GB unlike Games.

It is flawed in a way that it has only access to the 3.5GB pool.

It is allocating memory from both regions. Read my post again.

RampantAndroid · Jan 24, 2015

NomanA said:
It is allocating memory from both regions. Read my post again.

From what Virge has said, it should be allocating memory from System RAM...right?

NomanA · Jan 24, 2015

RampantAndroid said:
From what Virge has said, it should be allocating memory from System RAM...right?

No, the drivers do give the applications access to the whole 4GB, otherwise it'd be a much bigger problem.

The first region is preferred, and a draw command limited to 3.5GB will just be treated internally as if there's only 3.5GB available, so the entirety of the rendering will be based on this regular speed region.

This CUDA tool allocates memory in chunks. Some come from regular region. Some from the low speed one. It has no control over it. It's the driver which is arbitrating this logic. And from the results you can see, that it's allocating from the 3.5GB region first.

Cloudfire777 · Jan 24, 2015

So we have gone from full alarm mode, to confirmation from Nvidia that the "issue" only existed in a heavily flawed benchmark that wasnt remotely accurate in accessing and utilizing the VRAM like any real software (gaming, GPGPU etc) would do, to the usual spinning of facts and not able to accept that there wasnt a catastrophe after all which some people asked for.
Thats where we are now?

Nothing unusual in other words?

Oh well, atleast there was something semi interesting that came out of this, and that was how Nvidia devided two memory banks when disabling some of the GM204 chip.

NomanA · Jan 24, 2015

Cloudfire777 said:
So we have gone from full alarm mode, to confirmation from Nvidia that the "issue" only existed in a heavily flawed benchmark that wasnt remotely accurate in accessing and testing the VRAM like any real software (gaming, GPGPU etc) would do, to the usual spinning of facts and not able to accept that there wasnt a catastrophe after all which some people asked for.
Thats where we are now?

Internet drama as usual.

The benchmark isn't flawed at all. You can spin it anyway you want but the fact remains that it shows you the difference in bandwidth between the two regions (second region being eight times slower). The drivers will always have to manage VRAM usage for 970 carefully, as the access to second region is quite crippled. By the way, I have a GTX 970 and am perfectly happy with it.

ocre · Jan 24, 2015

Udgnim said:
Tech Report thought something was off with the 970 back in 10/1/2014

http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980

There are parts cut, the gtx970 is cut down. Its performance is a result of how they cut down the chip. Most everyone bought a gtx970 knowing it wasn't a full gm204 and knowing it wasn't as fast as the 980. This color test exploits the way it has been cut down and shows the 970 has a huge disadvantage in this synthetic test. It has always been the case.

Luckily and perhaps by design, this disadvantage in color fill doesn't show up in games so much. But it doesn't change the fact that the 970 is cut down. There could be more ways to exploit its disadvantage and surely people could write cuda apps that could do just that.

But the original claim that the the 970 isn't a 3gb card is false. It can and there are plenty of examples now showing performance doesn't tank.

Even though the original claim has been busted, there is still an effort to try to make this some terrible thing. We can see in this color test that the way the 970 has been cut down can cause interesting things to happen in specific synthetic scenario but we also so the mountains of reviews on how the card performs in games. It's either not that much of a penalty in the real world or everyone lied and made up fake benchmarks.

I have run over 3.5gb with DSR and haven't noticed anything crazy going on.. To be honest, my gtx 970 performs exactly to my expectations which were based on the mountains of reviews I looked at before my purchase.

Maybe people didn't really know that the 970 was cut down....maybe they didn't see the 3d mark color test, idk. But the claims of the 970 not being able to use 4gb were false and there is no an effort to try to make this into something else completely different.

Cloudfire777 · Jan 24, 2015

NomanA said:
The benchmark isn't flawed at all. You can spin it anyway you want but the fact remains that it shows you the difference in bandwidth between the two regions (second region being eight times slower). The drivers will always have to manage VRAM usage for 970 carefully, as the access to second region is quite crippled. By the way, I have a GTX 970 and am perfectly happy with it.

There is a 3% drop in performance from less than 3.5GB to more than 3.5GB VRAM usage which is neglible, and could very well be from increased textures and the fact that 970 have less TMU`s or was just a coincidence. Its certainly within margin of error

Its certainly isnt a DOOM situation where its 22GB/s bandwidth that drags the entire card down with it like many had hoped for because the benchmark showed a massive drop in bandwidth. And why did it show 22GB/s? Because the benchmark could only access 3.5GB and used the system RAM for the remaining 500MB. 1600MHz DDR3 have a bandwidth of 22GB/s. Flawed benchmark.

Its a non issue and it really doesnt exist

sontin · Jan 24, 2015

The driver doesnt use the second pool. The performance is as bad as is it with a GTX980 and windows overhead. There is no free vram available so it is transfered to the system memory.

I posted GTX980 numbers from MSI Kombuster at Guru3d:
http://forums.guru3d.com/showpost.php?p=4992814&postcount=173

It reflects what nVidia said.

flexy · Jan 24, 2015

sontin said:
No, it doesnt show it. The tool cant write into the second segment of the 4GB unlike Games.

It is flawed in a way that it has only access to the 3.5GB pool.

Well..yes it "has only access to 3.5G". The core issue we're talking about here, isn't it?

cmdrdredd · Jan 24, 2015

NomanA said:
The benchmark isn't flawed at all. You can spin it anyway you want but the fact remains that it shows you the difference in bandwidth between the two regions (second region being eight times slower). The drivers will always have to manage VRAM usage for 970 carefully, as the access to second region is quite crippled. By the way, I have a GTX 970 and am perfectly happy with it.

How do you know this for a fact though? Do you have some inside info that cuda functions properly on the 970? That may very well be the sole problem, but I don't know. I'm wondering how you know.

garagisti · Jan 24, 2015

amenx said:
Personally I think some credible tech site will do a study of the issue sooner or later. Its generated too much chatter to ignore and would make a good story if proper tests and studies were conducted.

Like they did of bumpgate? IIRC, only Charlie was speaking about it, and he was being flamed left, right and center. I wouldn't be surprised to learn that some Charlie shaped effigies were burned, and Pinatas were smashed. Even now, whenever it comes up, you could see someone suggest that it was not that big a deal.

DiogoDX · Jan 24, 2015

So was realy because of the SMMs cut.

The statment confirms that Nvidia knew all the time as they write the driver to priorize the 3.5GB but hide this information of the customers.:thumbsdown:

WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Diamond Member

Diamond Member

Member

Diamond Member

Member

Golden Member

Member

Golden Member

Golden Member

Diamond Member

Diamond Member

Lifer

Senior member

Senior member