WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response


TechFan1

Member
Sep 7, 2013
97
3
71
So the OP was right and Nvidia admitted their issue.

http://www.lazygamer.net/general-news/nvidias-gtx970-has-a-rather-serious-memory-allocation-bug/



It's funny that everyone accused the OP when what is happening is really an issue on every GTX 970.

If you follow the source in that article, the author misread the forum post. The Nvidia moderator was quoting a post and only said, "We are still looking into this and will have an update as soon as possible."
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
146
106
It's funny that the lazygamer article even shows the exact same numbers for a GTX 970 as a regular GTX 980 gives, and the author shows that the claims are bogus. Unless we are now talking about above 3.6GB for the GTX 970 with a new goalpost shift ;)

GTX-970-Memory-Flaw.png


As you can see, the bencher even has a browser running, something that quickly ensures you will never reach full speed in all tests of the bench.
 
Last edited:

sontin

Diamond Member
Sep 12, 2011
3,273
149
106
Can somebody with a GTX 970 install the first Maxwell driver, 344.11, and test it again?
On the overclock.net forum there is a guy who posted a picture with that driver and he has no problem...
 

NomanA

Member
May 15, 2014
134
46
101
The sample size of 970s is not large enough.

And yes there are false positives on the titan and 980.

Look at post #131 by ShintaiDK

We have covered his 980 tests before. They don't prove or disprove anything.

100% of 970s have at least six 128 MiB chunks that are bandwidth starved. That's out of dozens of tests posted on various sites. If the test is run properly (headless dGPU), then 100% of these tests show that the bandwidth gets reduced to an eighth at around 3200-3328 MiB. If you are not running the test properly, the wall comes sooner.

100% of 980 tests that are run properly show that the 980 isn't affected by this issue at all.

Given the way the test can be affected by VRAM usage, you can get false positives (positive meaning the issue is detected), but you can't get a false negative. That's why there's no 970 test result so far that shows normal bandwidth over the full 4GB range. The problem is real, so let's move past the debate about whether it's made up or being misrepresented.

The next, and more interesting, questions are why this issue is happening on the 970 and not the 980M, what the impact is on real-world gaming performance, and how gaming frame-time variance compares between the 980, 980M and 970 when VRAM usage is 3700MB.
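For anyone who hasn't looked at the tool itself: the idea is simply to allocate VRAM in 128 MiB chunks and time a read on each one, so a bandwidth cliff in the upper chunks shows up directly in the per-chunk numbers. A rough sketch along those lines (not Nai's actual benchmark code, just an illustration of the approach):

Code:
// Rough sketch of the chunk test idea (not Nai's actual code): allocate VRAM
// in 128 MiB chunks and time a simple read kernel on each one, so any
// bandwidth cliff in the upper chunks shows up as a drop in GB/s.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void readChunk(const float* data, size_t n, float* sink) {
    float acc = 0.0f;
    for (size_t i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += (size_t)gridDim.x * blockDim.x)
        acc += data[i];
    if (acc == 12345.0f) *sink = acc;  // keeps the compiler from removing the loads
}

int main() {
    const size_t chunkBytes = 128ull << 20;            // 128 MiB per chunk
    const size_t n = chunkBytes / sizeof(float);
    std::vector<float*> chunks;
    float* sink = nullptr;
    cudaMalloc(&sink, sizeof(float));

    for (int c = 0; ; ++c) {
        float* p = nullptr;
        if (cudaMalloc(&p, chunkBytes) != cudaSuccess)  // stop when VRAM is exhausted
            break;
        chunks.push_back(p);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        readChunk<<<1024, 256>>>(p, n, sink);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("Chunk %2d: %6.1f GB/s\n", c, (chunkBytes / 1e9) / (ms / 1e3));
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }
    for (float* p : chunks) cudaFree(p);
    cudaFree(sink);
    return 0;
}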
 

NomanA

Member
May 15, 2014
134
46
101
Can somebody with a GTX 970 install the first Maxwell driver, 344.11, and test it again?
On the overclock.net forum there is a guy who posted a picture with that driver and he has no problem...

He has the problem. Look at his results again. Both VRAM and L2 bandwidth are getting hit for at least 512 MiB worth of chunks.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Nothing strange is going on with the 970 in this 4K Ultra Shadow of Mordor test.

http://www.pcper.com/reviews/Graphi...rmance-Testing/4K-Testing-and-Closing-Thought

However, there is the strange fact that all the sites that were using FCAT to champion smoothness for gamers suddenly started using FRAPS instead with the launch of the 970 and 980. Very suspicious and annoying, but more likely due to AMD actually being "smoother" atm, at least in multi-GPU.

To really dig into this would require running Shadow of Mordor Ultra 4K tests with FCAT while observing VRAM usage, and possibly other Ultra-texture games known to use 3.8GB+ of VRAM. Perhaps even take up ~500MB of 970 VRAM with another program and run the tests again. This would be to ensure no driver trickery was used to disguise a memory weakness, i.e. not actually running the settings properly.
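Occupying a fixed amount of VRAM from a second process is easy to do with a few lines of CUDA. A minimal sketch of that kind of ballast program (the ~500MB figure is just the one suggested above):

Code:
// Minimal VRAM "ballast" sketch: grab ~500 MB on the 970 from a separate
// process and hold it while the game benchmark runs, so the driver can't
// quietly keep the game's working set below the suspect region.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 500ull << 20;   // ~500 MB
    void* p = nullptr;
    if (cudaMalloc(&p, bytes) != cudaSuccess) {
        printf("Allocation failed\n");
        return 1;
    }
    cudaMemset(p, 0xA5, bytes);          // touch it so it is actually resident
    printf("Holding ~%zu MB of VRAM; press Enter to release...\n", bytes >> 20);
    getchar();                           // keep the allocation alive during the test
    cudaFree(p);
    return 0;
}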
 

Spanners

Senior member
Mar 16, 2014
325
1
0
I showed you a GTX980 test that did both. In this thread a person also showed perfectly fine 4GB usage in Far Cry 4.

But it's all dismissed, isn't it?

Everyone already understands it's possible to run this test incorrectly on a 980 and to run it correctly, thereby getting both results. What would be interesting is a 970 result with no bandwidth drop-off. I haven't seen that yet.

Nobody is currently saying that the 970 doesn't have the ability to allocate 4GB (the CUDA test and Far Cry prove it can), just that the performance drops off hugely after a certain point. I know this may seem like shifting goalposts, but that's what happens when the knowledge about an issue evolves. If, and I acknowledge it's still a big if, this turns out to be a hardware issue (or an issue at all), then having massively less bandwidth for that portion of the memory is not far from not having it at all.

The allocation and the bandwidth issues must be tied together, though. The reason people started investigating this in the first place was the discrepancy in memory usage between the 970 and 980 in the same games with the same settings. If Nvidia was aware of the reduced bandwidth after 3300MiB, it makes sense that the drivers would allocate memory differently for a 970 than for a 980.

Not drawing any conclusions yet, want to make that clear. Seems like it's worth looking into and Nvidia are.
 
Last edited:
Feb 19, 2009
10,457
10
76
NV is looking into it; just wait and see rather than throwing your weight behind the "no problems, you guys are full of crap" or the "there's a problem and you shills are trying to hide it" camp.

For years people on forums have said NV GPUs had worse IQ across all games, but it turned out to be a problem with their HDMI -> PC monitor implementation in the drivers. So they fixed it recently.
 
Feb 19, 2009
10,457
10
76
Headless too? (As in, disconnect the monitor, reboot and remote into the machine using RDP. That should effectively stop Windows from loading anything into the VRAM?)

You can run Windows off the iGPU on Intel.

I did it when I used to mine bitcoins, because running Windows on the card reduced mining output by ~20%.
 

mikk

Diamond Member
May 15, 2012
4,327
2,408
136
Headless too? (As in, disconnect the monitor, reboot and remote into the machine using RDP. That should effectively stop Windows from loading anything into the VRAM?)


While using my iGPU, of course. The GTX 970/980 screenshot is from me; the GTX 970 bench was run headless there.
 

mikk

Diamond Member
May 15, 2012
4,327
2,408
136
This is interesting. Now Chunk 25 is running at full speed. Best GTX 970 result so far. I'm going to install 347.25 now and will do a re-test with this driver.


mieftp5g.png
 

DiogoDX

Senior member
Oct 11, 2012
757
336
136
Speculation:

GCN decoupled the ROPs from the memory controllers, but Nvidia didn't make that change.

The 970 can't use all 64 ROPs efficiently because of the cut SMMs, so the same occurs with the memory.


3dm-color.gif


Despite having superior or equal numbers on paper, the Asus Strix 970 couldn't come close to matching the GTX 980's delivered pixel throughput. I promptly raised an eyebrow upon seeing these results, but I didn't have time to investigate the issue any further.

Then, last week, an email hit my inbox from Damien Triolet at Hardware.fr, one of the best GPU reviewers in the business. He offered a clear and concise explanation for these results—and in the process, he politely pointed out why our numbers for GPU fill rates have essentially been wrong for a while. Damien graciously agreed to let me publish his explanation:

For a while, I've thought I should drop you an email about some pixel fillrate numbers you use in the peak rates tables for GPUs. Actually, most people got those numbers wrong, as Nvidia is not crystal clear about those kinds of details unless you ask very specifically.

The pixel fillrate can be linked to the number of ROPs for some GPUs, but it has been limited elsewhere for years on many Nvidia GPUs. Basically there are 3 levels that might have a say in what the peak fillrate is:

The number of rasterizers
The number of SMs
The number of ROPs
On both Kepler and Maxwell each SM appears to use a 128-bit datapath to transfer pixel color data to the ROPs. Those appear to be converted from FP32 to the actual pixel format before being transferred to the ROPs. With classic INT8 rendering (32-bit per pixel) it means each SM has a throughput of 4 pixels/clock. With HDR FP16 (64-bit per pixel), each SM has a throughput of 2 pixels/clock.

On Kepler each rasterizer can output up to 8 pixels/clock. With Maxwell, the rate goes up to 16 pixels/clock (at least with the currently released Maxwell GPUs).

So the actual pixels/cycle peak rate when you look at all the limits (rasterizers/SMs/ROPs) would be :

GTX 750 : 16/16/16
GTX 750 Ti : 16/20/16
GTX 760 : 32/24/32 or 24/24/32 (as there are 2 die configuration options)
GTX 770 : 32/32/32
GTX 780 : 40/48/48 or 32/48/48 (as there are 2 die configuration options)
GTX 780 Ti : 40/60/48
GTX 970 : 64/52/64
GTX 980 : 64/64/64

Extra ROPs are still useful to get better efficiency with MSAA and so. But they don’t participate in the peak pixel fillrate.

That’s in part what explains the significant fillrate delta between the GTX 980 and the GTX 970 (as you measured it in 3DMark Vantage). There is another reason, which seems to be that unevenly configured GPCs are less efficient at splitting huge triangles (as is usually the case with fillrate tests).
http://techreport.com/blog/27143/here-another-reason-the-geforce-gtx-970-is-slower-than-the-gtx-980
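To make the arithmetic concrete, here is a tiny host-side calculation of the min() Damien describes, using the per-clock numbers from his table (the 1.2GHz clock is just an illustrative assumption, not a figure from the article):

Code:
// Peak INT8 pixel rate = min(rasterizer limit, SM limit, ROP limit),
// using the pixels-per-clock numbers from Damien's table above.
#include <algorithm>
#include <cstdio>

int main() {
    struct Gpu { const char* name; int raster, sm, rop; };
    const Gpu gpus[] = { { "GTX 970", 64, 52, 64 }, { "GTX 980", 64, 64, 64 } };
    for (const Gpu& g : gpus) {
        int peak = std::min({ g.raster, g.sm, g.rop });   // pixels per clock
        printf("%s: %d px/clk -> ~%.1f Gpix/s at an assumed 1.2 GHz\n",
               g.name, peak, peak * 1.2);
    }
    return 0;
}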
 
Last edited:

mikk

Diamond Member
May 15, 2012
4,327
2,408
136
b4psnqxh.png



344.11 is indeed faster in Chunk 25. Tested while the iGPU is in use.
 

NomanA

Member
May 15, 2014
134
46
101
Speculation:

GCN decoupled the ROPs from the memory controllers, but Nvidia didn't make that change.

The 970 can't use all 64 ROPs efficiently because of the cut SMMs, so the same occurs with the memory.

980M has even fewer SMMs and shows no problem accessing all of 4GB.
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Next come the typical "It doesn't matter" responses.
It's not a life or death matter. But it is something that NVIDIA will need to respond to in a timely manner. In the meantime, unless anyone figures out more about what's going on, everything else is pretty much hot air.
 

amenx

Diamond Member
Dec 17, 2004
4,613
2,930
136
My results.

1. Headless / IGP mode:

nai-ipg.jpg


2. Running Metro 2033 in the background:

26_chunk.jpg


Gained an extra 2 chunks. Edit: scratch that, it peters out at chunks 17, 18 and 19 before recovering.
 
Last edited:

RampantAndroid

Diamond Member
Jun 27, 2004
6,591
3
81
Just for the sake of having a control group, could someone run this ACTUALLY headless? As in, NO monitor hooked up, and an RDP session into the PC to run the tool? Having VRAM in use shows the same slowdowns, so I would like to see the RDP result, as well as someone with SLI running the tool on the second GPU.
 

amenx

Diamond Member
Dec 17, 2004
4,613
2,930
136
Just for the sake of having a control group, could someone run this ACTUALLY headless? As in, NO monitor hooked up, and an RDP session into the PC to run the tool? Having VRAM in use shows the same slowdowns, so I would like to see the RDP result, as well as someone with SLI running the tool on the second GPU.
Running in IGP mode is actually 'headless' as far as the main GPU is concerned. It would make no difference imo if you connected to it in RDP mode.
 

RampantAndroid

Diamond Member
Jun 27, 2004
6,591
3
81
Running in IGP mode is actually 'headless' as far as the main GPU is concerned. It would make no difference imo if you connected to it in RDP mode.

On the off chance Windows is for some reason shoving a bunch of stuff into the VRAM...why not try?

I worked (I guess I kinda still do) in software testing for a long time. It's in my nature to think of various scenarios to try. Worst case, you run this and get the same results...

But given that having VRAM in use DOES show the same results we're seeing... this makes me suspicious. Can anyone comment on what section of VRAM Windows would use first? Would it allocate off the bottom of the VRAM stack?
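One thing that would help with the headless question: have the tool (or a separate run) report how much VRAM is already claimed before the chunk test starts. It won't say which part of the address space Windows grabs first, but it would show whether a "headless" run really starts with an essentially empty card. A quick sketch:

Code:
// Report how much VRAM is already in use (driver/Windows reservations included)
// before the chunk test runs, to verify a run really is headless.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t freeB = 0, totalB = 0;
    if (cudaMemGetInfo(&freeB, &totalB) != cudaSuccess) {
        printf("cudaMemGetInfo failed\n");
        return 1;
    }
    printf("VRAM already in use: %zu MiB of %zu MiB\n",
           (totalB - freeB) >> 20, totalB >> 20);
    return 0;
}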
 