WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

Status
Not open for further replies.

NomanA

Member
May 15, 2014
134
46
101
Check the link above: some ROPs on the 980M are useless anyway since there's a full cluster disabled. The real number of ROPs is 48 if we count only those that are functional.

Well, for the 980M the ROPs are still 64, and all eight connections to the crossbar exist. You'll get the same VRAM bandwidth across the whole 4GB of VRAM, as shown by the CUDA test tool too. Functionally, of course, having 12 SMMs limits the output, but 12SMM/64ROP is still better than 12SMM/48ROP.
 

Head1985

Golden Member
Jul 8, 2014
1,867
699
136
From Ryan's analysis:
This in turn is why the 224GB/sec memory bandwidth number for the GTX 970 is technically correct and yet still not entirely useful as we move past the memory controllers, as it is not possible to actually get that much bandwidth at once on the read side. GTX 970 can read the 3.5GB segment at 196GB/sec (7GHz * 7 ports * 32-bits), or it can read the 512MB segment at 28GB/sec, but not both at once; it is a true XOR situation. Furthermore because the 512MB segment cannot be read at the same time as the 3.5GB segment, reading this segment blocks accessing the 3.5GB segment for that cycle, further reducing the effective memory bandwidth of the card. The larger the percentage of the time the crossbar is reading the 512MB segment, the lower the effective memory bandwidth from the 3.5GB segment.
So the GTX 970 is effectively a 224-bit 3.5GB card plus a 32-bit 512MB card.
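
The numbers in that quote can be checked with a little arithmetic. Below is a minimal sketch, assuming the 7 Gbps effective data rate and 32-bit ports given in the quote. The function names are made up for illustration, and the time-sliced blend in `effective_read_gbs` is only a simplified reading of the "XOR" behaviour described, not a model NVIDIA has published.

```python
# Back-of-the-envelope check of the GTX 970 bandwidth figures quoted above.
# Assumptions (both taken from the quote): 7 Gbps effective data rate per pin
# and 32-bit wide memory controller ports.

DATA_RATE_GBPS = 7     # effective GDDR5 data rate per pin
PORT_WIDTH_BITS = 32   # width of one memory controller port


def port_bandwidth_gbs(ports: int) -> float:
    """Peak bandwidth in GB/s when `ports` 32-bit ports are read in parallel."""
    return DATA_RATE_GBPS * PORT_WIDTH_BITS * ports / 8  # bits -> bytes


def effective_read_gbs(slow_fraction: float) -> float:
    """Hypothetical time-sliced model of the XOR behaviour.

    `slow_fraction` is the share of time the crossbar spends reading the
    512MB segment; during that time the 3.5GB segment is assumed idle.
    """
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    fast = port_bandwidth_gbs(7)  # 3.5GB segment
    slow = port_bandwidth_gbs(1)  # 512MB segment
    return (1.0 - slow_fraction) * fast + slow_fraction * slow


if __name__ == "__main__":
    print("advertised (8 ports):", port_bandwidth_gbs(8))   # 224.0
    print("3.5GB segment (7 ports):", port_bandwidth_gbs(7))  # 196.0
    print("512MB segment (1 port):", port_bandwidth_gbs(1))   # 28.0
    for share in (0.0, 0.1, 0.25, 0.5):
        print(f"{share:.0%} of time on slow segment:",
              round(effective_read_gbs(share), 1), "GB/s")
```

Under this simplified model, any time spent in the 512MB segment pulls the average read bandwidth below 196GB/sec, which is the point the quote makes.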
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to try to best determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level identifying the type of resource and when it was last used are good ways to figure out where to send a resource. Frame buffers, render targets, UAVs, and other intermediate buffers for example are the last thing you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. The way NVIDIA describes the process we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.

From Ryan Smith's AnandTech analysis. I guess Nvidia is implying that they do driver-level management of the segmented memory.
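
To make the idea concrete, here is a toy placement routine. NVIDIA's actual heuristics are undisclosed, so this is only a sketch built from the two hints in the quote (resource type and time since last use). Every name in it (`Resource`, `place`, `HOT_KINDS`) is hypothetical, and the greedy fill order is my own assumption.

```python
from dataclasses import dataclass

# Resource kinds the quote says should be kept out of the slow segment.
HOT_KINDS = {"frame_buffer", "render_target", "uav", "intermediate_buffer"}

FAST_CAPACITY_MB = 3584  # 3.5GB segment
SLOW_CAPACITY_MB = 512   # 512MB segment


@dataclass
class Resource:
    name: str
    kind: str              # e.g. "texture", "render_target"
    size_mb: int
    last_used_frame: int
    app_active: bool = True


def place(resources, current_frame):
    """Toy greedy placement (an assumption, not NVIDIA's algorithm).

    Hot kinds are placed first, then resources of active applications,
    then the most recently used. Each resource goes to the fast segment
    if it fits there, otherwise to the slow segment.
    Returns a dict mapping resource name to "fast" or "slow".
    """
    def priority(r):
        # Lower tuples sort first: hot kinds, active apps, recent use.
        return (r.kind not in HOT_KINDS,
                not r.app_active,
                current_frame - r.last_used_frame)

    free = {"fast": FAST_CAPACITY_MB, "slow": SLOW_CAPACITY_MB}
    placement = {}
    for r in sorted(resources, key=priority):
        for segment in ("fast", "slow"):
            if r.size_mb <= free[segment]:
                free[segment] -= r.size_mb
                placement[r.name] = segment
                break
        else:
            raise MemoryError(f"{r.name} does not fit in either segment")
    return placement


if __name__ == "__main__":
    demo = [
        Resource("old_texture", "texture", 400, last_used_frame=10),
        Resource("main_rt", "render_target", 512, last_used_frame=100),
        Resource("recent_textures", "texture", 3000, last_used_frame=100),
    ]
    print(place(demo, current_frame=100))
```

In the demo, the render target and the recently used textures fill most of the 3.5GB segment, so the stale texture is the one that ends up in the 512MB segment.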
 

Abwx

Lifer
Apr 2, 2011
11,885
4,873
136
Well, for the 980M the ROPs are still 64, and all eight connections to the crossbar exist. You'll get the same VRAM bandwidth across the whole 4GB of VRAM, as shown by the CUDA test tool too. Functionally, of course, having 12 SMMs limits the output, but 12SMM/64ROP is still better than 12SMM/48ROP.

Yes, the MC is still functional, but the data can't be transferred over the crossbar since it lacks the relevant links; otherwise they would have done it in the 970. In the latter case, according to their diagram, the data coming from an address that has no functional SMMs is merged with the data of another MC at the output of said MCs, that is, at the L2 cache level, before the crossbar.

You'll notice that the relevant SMMs' caches will be used for two flows, while in a 980 they are used for a single flow, since each flow has its dedicated SMMs.

That said, I'm not convinced by Nvidia's technical explanations, since they are careful not to disclose information that would shed some light on their uarch. What is sure is that they are not telling the truth and are still hiding a lot of details. The proof is that Ryan Smith was told this:

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980
So Nvidia is just pretending that they are in the same position as a user who bought this card: they don't know exactly how it works, the driver was surely subcontracted...
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Does anyone think this will have actual market impact?

It seems like it will all be forgotten with the next release.

It's probably shenanigans like this that keep AMD afloat. nVidia's dodgy business model is the only reason I won't buy an nVidia product. I will bet that this episode will sway a few others to not buy nVidia out of principle. Overall it's not going to sway the market over to AMD all on its own, but it's not a good thing for nVidia.
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
Why on earth would they? The only legal recourse that would conceivably make sense in this situation would be a class action on behalf of consumers who purchased GTX 970 graphics cards.

They'll all get nothing, but waste time and money over it. This is all a bunch of crap to me. I regularly see above 3.5GB in use during games and am not hit with performance penalties that make the game experience unplayable. So I don't give a crap how it works, it's working.
 

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
They'll all get nothing, but waste time and money over it. This is all a bunch of crap to me. I regularly see above 3.5GB in use during games and am not hit with performance penalties that make the game experience unplayable. So I don't give a crap how it works, it's working.

For now, it's apparently partially driver-managed, which you had some doubts about previously, so well-informed 970 owners will be trusting Nvidia to keep the partitioned-memory penalty low in future games. That is, now that the information is out there for them to know about it.
 

Final8ty

Golden Member
Jun 13, 2007
1,172
13
81
AMD and Nvidia advertise the R9 295 and Titan Z as 8GB and 12GB cards, when that really isn't the case in how the memory is used.
The thing is, we know how the memory is used, and we knew from the get-go with multi-GPU. Yes, there will always be some people who don't know, but that goes for anything.
 

SPBHM

Diamond Member
Sep 12, 2012
5,066
418
126
those 512MB are looking pretty useless at this point...
this + lower ROP/L2 count... that's pretty poor by Nvidia, on what is a pretty awesome card overall...

they should offer the option of a refund for all the 970 owners imo.
but apparently they don't care, because the card was and still is a good option.
 
Feb 19, 2009
10,457
10
76
So everything that some of us here speculated on is EXACTLY right. That 0.5GB is crippled in bandwidth (now we know it's actually routed through the neighboring memory controller's L2 port); if it is in use it has to share the bandwidth and hence runs at 1/7th speed.

We were also correct that NV is FULLY AWARE of the situation, because their drivers were designed to stay within the 3.5GB in most scenarios and would try to enforce it. Where the driver can't, it tries to store infrequently accessed data in the 0.5GB segment. Where it can't do that, in game engines that aren't compatible with that kind of loading, you WILL get stutters and major frame rate spikes. What people have been saying was true all along. But it depends on the game and on how "aware" the driver is. As more games come out that push the VRAM limit, this problem is going to escalate, particularly for SLI owners who enjoy maxing out games.

They had 4 months since launch to disclose it, and they chose not to. They KNEW about it from the start, since that's how 970s behave due to the drivers trying to stay under 3.5GB all that time. They were hoping people wouldn't notice. You can't really escape geeks on the internet; not sure what they were (weren't) thinking.
 

dacostafilipe

Senior member
Oct 10, 2013
805
309
136
Yes, the MC is still functional, but the data can't be transferred over the crossbar since it lacks the relevant links; otherwise they would have done it in the 970. In the latter case, according to their diagram, the data coming from an address that has no functional SMMs is merged with the data of another MC at the output of said MCs, that is, at the L2 cache level, before the crossbar.

There's no real relation between both. NomanA seems to be right on this one.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
They'll all get nothing, but waste time and money over it. This is all a bunch of crap to me. I regularly see above 3.5GB in use during games and am not hit with performance penalties that make the game experience unplayable. So I don't give a crap how it works, it's working.


But it could work better. Nvidia was pretty liberal in labeling this a 224GB/sec card. For how many months now have people thought they had a 64 ROP card? The performance still is what it is; it isn't like the card suddenly got slower with this new information. But the card has been mislabeled, in my opinion. As I said earlier, I just don't believe for a second that up until now no one at Nvidia had noticed that every review and tech site has said this is a GPU with 64 ROPs.

If you're happy with the card, then that's great. But others do have a legit reason to be somewhat dissatisfied, if you ask me. And I have to wonder how much performance above ~3.5GB of memory usage depends on good driver support for this card.
 

SimianR

Senior member
Mar 10, 2011
609
16
81
Question: How would this memory allocation affect a manufacturer that wanted to release an 8GB 970, similar to what Sapphire did with the 290X?
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Did you read all of it? That is how it would be if they implemented it THAT way, but that is precisely what they averted and chose to implement it differently.

But there is no other way if you need more than 3.5GB.

[Image: GM204_arch_575px.jpg]


Let's say a GPU renders a frame. It's high-resolution gaming, with next-gen games running on master race settings (easily double the VRAM requirements of consoles, which have access to 6GB), and it takes the whole 4GB.

To make this frame, the GPU needs data from all DRAM chips. Well, maybe not all, but let's say it NEEDS data from chips 7 and 8 as well as data from some of the other chips.

For the sake of argument, let's say it takes the same amount of data from each DRAM and processes it in an equal amount of time: one clock cycle.

You have 1 clock cycle to take data from chip 1-2-3-4-5-6 and 7.

Well your frame can not be finished as you need data from DRAM #8.

So you have your second clock cycle going just to get data from #8.

If you can't fit it all into L2 in one go, there is no other way but to waste 1 full clock cycle - 50% utilization. This is of course the worst case scenario.

The best case scenario is you keep everything in 3.5GB@224bit bus and 970 performs as it did in release reviews.
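
The worst case above can be written out as a small cycle count. This is a minimal sketch under the same idealized assumptions as the walkthrough (one unit of data per chip per cycle, chips 1-7 read in parallel, chip 8 read only in its own cycle, nothing served from L2). The function names are hypothetical, and real hardware will not be this clean.

```python
# Toy cycle count for the worst case described above.
# Assumptions: each chip delivers one unit of data per cycle, the seven
# "fast" chips are read in parallel, and any cycle spent reading chip 8
# blocks the other seven (no help from L2 caching).


def cycles_needed(units_per_fast_chip: int, units_slow_chip: int) -> int:
    """Cycles to fetch the data when fast and slow reads cannot overlap."""
    return units_per_fast_chip + units_slow_chip


def utilization_vs_full_bus(units_per_fast_chip: int, units_slow_chip: int) -> float:
    """Throughput relative to a hypothetical card that reads all 8 chips at once."""
    actual = cycles_needed(units_per_fast_chip, units_slow_chip)
    if actual == 0:
        return 1.0  # nothing to fetch, nothing wasted
    ideal = max(units_per_fast_chip, units_slow_chip)
    return ideal / actual


if __name__ == "__main__":
    # Same amount of data from every chip: 2 cycles instead of 1.
    print(utilization_vs_full_bus(1, 1))  # 0.5
    # Everything fits in the 3.5GB segment: no penalty.
    print(utilization_vs_full_bus(1, 0))  # 1.0
    # Chip 8 is touched only occasionally.
    print(utilization_vs_full_bus(7, 1))  # 0.875
```

The 50% figure only appears when chip 8 is hit as often as every other chip; the less often the slow segment is read, the closer the result gets to the best case.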

This card will have a worse future than Kepler had. Kepler fell behind without a design flaw. Guess what is going to happen when Maxwell's successor comes out.

PS. Yes, I foresee games using 6+GB of VRAM soon. Just like my GTX8800 320MB suddenly became obsolete with 30% more VRAM than the console.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
The thing is, we know how the memory is used, and we knew from the get-go with multi-GPU. Yes, there will always be some people who don't know, but that goes for anything.


Yes, of course. AMD and Nvidia have never hidden that information. I was just making the point that they've always taken a bit of 'license' with what they advertise. But I think Nvidia was being sort of deceptive here.
 

Genx87

Lifer
Apr 8, 2002
41,091
513
126
Tsk tsk. I suspect that somebody did screw up in the marketing department. But some person or department within Nvidia needs to catch that, even if it has already gone out the door, and correct it after the fact. The internet will figure this stuff out given enough time.
 

Spanners

Senior member
Mar 16, 2014
325
1
0
Question: How would this memory allocation affect a manufacturer that wanted to release an 8GB 970, similar to what Sapphire did with the 290X?

Manufacturers won't release one (imho), but if they did, you'd have to assume you'd end up with a 1GB segment with similarly reduced bandwidth.
 

cmdrdredd

Lifer
Dec 12, 2001
27,052
357
126
those 512MB are looking pretty useless at this point...
this + lower ROP/L2 count... that's pretty poor by Nvidia, on what is a pretty awesome card overall...

they should offer the option of a refund for all the 970 owners imo.
but apparently they don't care, because the card was and still is a good option.

It's not useless when, as has been pointed out so many times and everyone likes to ignore, games can already load up past 3.5GB and don't have problems. Nvidia did not make adjustments for every game and engine in existence. Especially not some older ones I was messing with.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,002
126
Those with 970 SLI could see this problem more often than others because they normally run higher settings/resolution.

:\


Sure, absolutely. SLI users could run resolutions and settings that could benefit from having a full 4GB of memory at 224GB/sec; that's apparently not what the GTX 970 is in actuality. But it seems Kepler driver support has taken a back seat. If and when Maxwell takes a back seat, will performance suffer?

Not trying to spread FUD as we don't know how things will play out, just talking out loud. But, I see lots of potential downside from this, not really any upside. But like I said earlier, it still performs how it did when people were willing to shell out their money, it isn't like it got slower suddenly. But, it seems there are a few limitations that Nvidia didn't talk about until their hand was forced.
 