WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

ViRGE · Jan 24, 2015

RampantAndroid said:
But given that VRAM usage DOES show the same results we're seeing...this makes me suspicious. Can anyone comment on what section of VRAM Windows would use first? Would it allocate off the bottom of the VRAM stack?

VRAM allocation at the application level is virtualized. The GPU drivers will give Windows (or any other application) its own memory space, and then allocate physical RAM based on their own algorithms. So Windows can (and does) end up anywhere.

The reason that the program in question always shows memory bandwidth falling near the end is because it's filling up its memory allocation chunk by chunk. It has to fill the 3GB+ before the physical VRAM is maxed out and spills over to system RAM.

jj109 · Jan 24, 2015

This benchmark is adding ten ones to a large contiguous array of 4x1 floating point vectors which are initialized to 0. This benchmark is SMM bottlenecked until the GTX 970 hits the upper quarter of its address space.
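For anyone who wants to see the shape of that workload without reading the CUDA source, here is a rough CPU-side sketch in Python. It is not the benchmark's actual code: the function name `run_chunk`, the per-chunk timing, and the read-plus-write byte accounting are all assumptions made for illustration.

```python
import time
from array import array


def run_chunk(num_vec4: int, passes: int = 10):
    """Zero-initialise num_vec4 four-float vectors, add 1.0 to every
    component `passes` times, and return (buffer, effective GB/s)."""
    # 4 floats per vector, 4 bytes per float; zero bytes decode to 0.0.
    buf = array("f", bytes(4 * 4 * num_vec4))
    start = time.perf_counter()
    for _ in range(passes):
        for i in range(len(buf)):
            buf[i] += 1.0
    elapsed = time.perf_counter() - start
    # Assumption: every pass reads and writes each byte once.
    bytes_moved = 2 * passes * buf.itemsize * len(buf)
    return buf, bytes_moved / elapsed / 1e9


buf, gbps = run_chunk(256)
print(f"{len(buf)} floats, every value is {buf[0]}, about {gbps:.3f} GB/s")
```

The absolute number from a Python loop means nothing for a GPU. The point is only that the reported "GB/s" is derived from how long a fixed amount of arithmetic over a chunk takes.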

The GB/s designation from the benchmark should really be GFLOPS. The GTX 970 gets ~3.75 GFLOPS until the upper quarter, and the GTX 980 gets ~4.8 GFLOPS. This is close enough to the theoretical maximum that we can conclude that it's shader bottlenecked.

The best way to figure out what's going on is to look at a CUDA profiler to see why the addition is taking so much more time for the upper GB of the address space. I also don't have a GTX 970, so if any of you want to install the CUDA toolkit and run a profiler on the benchmark, that'd be great. Either way I think we'll get an answer soon since some poor CUDA dev at Nvidia is probably working weekends because of this :hmm:

Cloudfire777 · Jan 24, 2015

A) People don't know what this benchmark does. Does it test the VRAM like a game would? Can it overload the bus? Why is memory bandwidth lower than what the cards (not just the 970) should have in the early stages of the test? Why does the L2 cache suffer as well when it's not part of the VRAM and memory bus at all and sits isolated on the die? Tons of unknowns here.

B) No reviewers found anything wrong with the GTX 970 when testing the card at various resolutions that would easily go past 3GB usage. We are talking many, many reviews. Why didn't they notice anything? They undoubtedly would have if the bandwidth went down to 22GB/s.

C) No users experienced any problems gaming with the GTX 970 until this rumor started and people started LOOKING for it, with a benchmark they don't know anything about.

D) GTX 770, GTX Titan, GTX 980, GTX 970, many cards, not just GTX 970, get bad "results" with this benchmark. Which means many Nvidia cards are broken?
Right....

RampantAndroid · Jan 24, 2015

ViRGE said:
VRAM allocation at the application level is virtualized. The GPU drivers will give Windows (or any other application) its own memory space, and then allocate physical RAM based on their own algorithms. So Windows can (and does) end up anywhere.

The reason that the program in question always shows memory bandwidth falling near the end is because it's filling up its memory allocation chunk by chunk. It has to fill the 3GB+ before the physical VRAM is maxed out and spills over to system RAM.

So are you suggesting the slowdown is due to system memory being allocated as virtual VRAM (and hence the slowdown)? Is there a tool from Nvidia (or anyone) to view the current VRAM that has been allocated?

Abwx · Jan 24, 2015

Cloudfire777 said:
B) No reviewers found anything wrong with the GTX 970 when testing the card at various resolutions that would easily go past 3GB usage. We are talking many, many reviews. Why didn't they notice anything? They undoubtedly would have if the bandwidth went down to 22GB/s.

And yet they knew but kept silent while waiting for Nvidia's answer. Once someone posted about the issue at hardware.fr, the site's CPU reviewer stated that he knew about it and that some time ago he had asked their GPU reviewer to check it with Nvidia. So far they have got no answer...

Cloudfire777 · Jan 24, 2015

Abwx said:
And yet they knew but kept silent while waiting for Nvidia's answer. Once someone posted about the issue at hardware.fr, the site's CPU reviewer stated that he knew about it and that some time ago he had asked their GPU reviewer to check it with Nvidia. So far they have got no answer...

That would show up in their review regardless, but it didn't. Bandwidth down to 22GB/s would mean a massive drop in performance.

Erenhardt · Jan 24, 2015

Cloudfire777 said:
That would show up in their review regardless, but it didn't. Bandwidth down to 22GB/s would mean a massive drop in performance.

Not if you stick to the guidelines for reviewing the card

ViRGE · Jan 24, 2015

RampantAndroid said:
So are you suggesting the slowdown is due to system memory being allocated as virtual VRAM (and hence the slowdown)? Is there a tool from Nvidia (or anyone) to view the current VRAM that has been allocated?

Correct. If this tool is trying to allocate 4GB but only 3.5GB of physical VRAM is available for any given reason, then the last 512MB would need to spill out. It's clearly not blocked by physical VRAM size, as otherwise the program would hard fail on cards smaller than 4GB*.

* There are ways in CUDA to disallow a program from spilling over. NVIDIA's BandwidthTest sample does this, for example
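A toy model of that spill-over, in Python. This is only a sketch of the reasoning, not how the driver actually places memory: the helper `place_chunks`, the 128 MiB chunk size, and the first-come placement rule are assumptions for illustration.

```python
def place_chunks(request_mib: int, free_vram_mib: int, chunk_mib: int = 128):
    """Return (chunk_index, location) pairs for a chunk-by-chunk allocation.

    Chunks land in VRAM while enough of it is free; anything after that
    is marked as spilled to system RAM instead of failing outright.
    """
    placements = []
    remaining_vram = free_vram_mib
    for index in range(request_mib // chunk_mib):
        if remaining_vram >= chunk_mib:
            remaining_vram -= chunk_mib
            placements.append((index, "vram"))
        else:
            placements.append((index, "system"))
    return placements


# 4GB requested with only 3.5GB of physical VRAM free (both in MiB).
placements = place_chunks(4096, 3584)
spilled = [index for index, where in placements if where == "system"]
print(len(placements), spilled)  # 32 [28, 29, 30, 31]
```

Under those assumptions only the last four chunks end up outside VRAM, which is the pattern of a run that is fast until the very end.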

Cloudfire777 · Jan 24, 2015

Typical internet bandwagon sheep mentality. Here:

http://www.reddit.com/r/pcmasterrace/comments/2tfybe/investigating_the_970_vram_issue/

RaulF · Jan 24, 2015

Cloudfire777 said:
That would show up in their review regardless, but it didn't. Bandwidth down to 22GB/s would mean a massive drop in performance.

I would not bet on it.

You know there are plenty of reviews out there that will put a product (either game or hardware) on a pedestal. But when the end user gets their hands on it, it is far from what the review stated. And most of the time they are just testing FPS numbers and image quality. Very rarely will they cover instability.

ShintaiDK · Jan 24, 2015

Remember on a normal GTX980, running in normal mode. Block 27-29 also "fails".

A fresh run here on GTX980 using 347.09 driver:

Cloudfire777 · Jan 24, 2015

ShintaiDK said:
Remember on a normal GTX980, running in normal mode. Block 27-29 also "fails".

A fresh run here on GTX980 using 347.09 driver:

Yep, the benchmark is a fluke.

Why are you not using 347.25 WHQL btw?

caswow · Jan 24, 2015

If my GPU says 4GB at 255GB/s, then I want it to be 255GB/s all the way up to 4GB. Of course, things have to be proven right.

psolord · Jan 24, 2015

Can I use this test on my 570s to see if it works correctly, if it hasn't been done already?

What must I download?

Deders · Jan 24, 2015

Has anyone tried the 'net stop Themes' command to disable Aero completely? I did and on my 780 it gave me a complete run of over 333+ GB/s whereas before the last 2 chunks were massively limited.

net start Themes will get Aero back. This way frees up much more memory than simply disabling Glass.

You can either make a batch file for each command or run it in CMD prompt.

ShintaiDK · Jan 24, 2015

psolord said:
Can I use this test on my 570s to see if it works correctly, if it hasn't been done already?

What must I download?

It won't tell you in any way whether it works correctly or not. That's the entire problem with running this benchmark for confirmation.

Instructions are here:
http://forums.anandtech.com/showpost.php?p=37107343&postcount=134

cmdrdredd · Jan 24, 2015

Erenhardt said:
Not if you stick to the guidelines for reviewing the card

Geez, people, remove the tinfoil hat. If memory bandwidth were dropping like a rock during gaming, the fps results would clearly show a problem. Reviewers didn't hide the numbers like you're suggesting. Hell, one site did framepacing tests (I forget which, I was browsing around) and noted that in SLI the 970 actually was smoother than other single cards at times. If there was a memory issue that wouldn't happen either, because you'd have fps drops out the wazoo.

ShintaiDK · Jan 24, 2015

Cloudfire777 said:
Yep, the benchmark is a fluke.

Why are you not using 347.25 WHQL btw?

Upgraded just now, same results.

I do manual upgrades, not always right away if there is nothing new in the release notes I "must have".

mikk · Jan 24, 2015

ShintaiDK said:
Remember on a normal GTX980, running in normal mode. Block 27-29 also "fails".

A fresh run here on GTX980 using 347.09 driver:

Your ignorance is annoying. If you don't get full speed over the entire VRAM on a GTX 980 then you tested it wrong. Use your iGPU for goodness' sake and stop your nonsense. GTX 980 is not affected, my goodness...

Warning issued for personal attack.
-- stahlhart

ShintaiDK · Jan 24, 2015

mikk said:
Your ignorance is annoying. If you don't get full speed over the entire VRAM on a GTX 980 then you tested it wrong. Use your iGPU for goodness' sake and stop your nonsense. GTX 980 is not affected, my goodness...

Take your bad attitude somewhere else.

Even with IGP I can still get up to 2 blocks "failed" depending on the testrun.

You can try to run the process explorer using the GPU graph on the rec.exe process while you run it.

Cloudfire777 · Jan 24, 2015

ShintaiDK said:
Upgraded just now, same results.

I do manual upgrades, not always right away if there is nothing new in the release notes I "must have".

You have to upgrade again.
347.26 was just leaked.

Adds support for GTX 960 and Call of Duty Online
http://forums.laptopvideo2go.com/topic/31320-nvidia-icafe-geforce-34726-released/

No point testing with the benchmark though. It doesn't matter what driver you use if the benchmark is broken/invalid.

stahlhart · Jan 24, 2015

Keep the debate in this thread civil. Agree to disagree otherwise.
-- stahlhart

Spanners · Jan 24, 2015

Cloudfire777 said:
A) People don't know what this benchmark does. Does it test the VRAM like a game would? Can it overload the bus? Why is memory bandwidth lower than what the cards (not just the 970) should have in the early stages of the test? Why does the L2 cache suffer as well when it's not part of the VRAM and memory bus at all and sits isolated on the die? Tons of unknowns here.

B) No reviewers found anything wrong with the GTX 970 when testing the card at various resolutions that would easily go past 3GB usage. We are talking many, many reviews. Why didn't they notice anything? They undoubtedly would have if the bandwidth went down to 22GB/s.

C) No users experienced any problems gaming with the GTX 970 until this rumor started and people started LOOKING for it, with a benchmark they don't know anything about.

D) GTX 770, GTX Titan, GTX 980, GTX 970, many cards, not just GTX 970, get bad "results" with this benchmark. Which means many Nvidia cards are broken?
Right....

A) The post above yours explains what the benchmark does pretty concisely. The source code is here.

B) The bandwidth drop off is around 3300MiB which is roughly 3.5GB. Maybe the drivers are allocating ram differently for the 970. It would explain why people were seeing different usage at the same settings/resolution vs the 980. Maybe it has little real-world consequences besides some very limited scenarios.

C) I've seen a number of posts in the Nvidia forums stating that stuttering occurs at higher ram usage. Can't say if these are accurate or related though.

D) People have been running this benchmark incorrectly; it's easy to create a false result. I've yet to see a single 970 that doesn't exhibit this behavior though. If the benchmark was that erratic, surely we'd see some results that were "good" for the 970? Nobody said the cards were broken.

Lots of potential outcomes from the extremely mundane to some juicy drama.
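For reference, the unit conversion behind the 3300MiB figure in point B can be checked in a couple of lines of Python. The variable names here are just for illustration.

```python
MIB = 1024 ** 2  # bytes in one mebibyte
GIB = 1024 ** 3  # bytes in one gibibyte
GB = 1000 ** 3   # bytes in one decimal gigabyte

drop_off_bytes = 3300 * MIB
print(round(drop_off_bytes / GB, 2))   # 3.46
print(round(drop_off_bytes / GIB, 2))  # 3.22
```

So 3300MiB comes to about 3.46 decimal GB, or about 3.22GiB if the "GB" on the box is read as binary gigabytes.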

Enigmoid · Jan 24, 2015

DiogoDX said:
Speculation:

GCN decoupled the ROPs from the memory controller, but Nvidia didn't make that change.

The 970 can't use the performance of all 64 ROPs efficiently because of the cut SMMs, so the same occurs with the memory.

http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980

I posted this in the beginning of the thread.

sontin · Jan 24, 2015

And it has nothing to do with the problem.

A GTX 980M can only process 48 pixels/clock, less than a GTX 970. However, in the CUDA benchmark all memory partitions are available at full speed.
