WCCftech: Memory allocation problem with GTX 970 [UPDATE] PCPer: NVidia response

NomanA · Jan 26, 2015

Abwx said:
Check the link above: some ROPs on the 980M are useless anyway, since there's a full cluster disabled. The real number of ROPs is 48 if we count only those that are functional.

Well, for the 980M the ROPs are still 64, and all eight connections to the crossbar exist. You'll get the same VRAM bandwidth across the whole 4GB of VRAM, as the CUDA test tool shows too. Functionally, of course, having 12 SMMs limits the output, but 12SMM/64ROP is still better than 12SMM/48ROP.

Head1985 · Jan 26, 2015

From Ryan's analysis:

This in turn is why the 224GB/sec memory bandwidth number for the GTX 970 is technically correct and yet still not entirely useful as we move past the memory controllers, as it is not possible to actually get that much bandwidth at once on the read side. GTX 970 can read the 3.5GB segment at 196GB/sec (7GHz * 7 ports * 32-bits), or it can read the 512MB segment at 28GB/sec, but not both at once; it is a true XOR situation. Furthermore because the 512MB segment cannot be read at the same time as the 3.5GB segment, reading this segment blocks accessing the 3.5GB segment for that cycle, further reducing the effective memory bandwidth of the card. The larger the percentage of the time the crossbar is reading the 512MB segment, the lower the effective memory bandwidth from the 3.5GB segment.

GTX970=224bit 3.5GB and 32bit 512MB card.
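The numbers in the quoted analysis can be checked with a few lines of arithmetic. This is only a minimal sketch, assuming an effective data rate of 7 Gbps per pin (the "7GHz" in the quote) and eight 32-bit memory ports; the function name `segment_bandwidth_gbs` is made up for illustration.

```python
def segment_bandwidth_gbs(data_rate_gbps: float, ports: int, bits_per_port: int = 32) -> float:
    """Peak bandwidth in GB/s: per-pin data rate * number of ports * port width in bits / 8."""
    return data_rate_gbps * ports * bits_per_port / 8


# Assumed figures taken from the quoted analysis: 7 Gbps effective, 32-bit ports.
fast = segment_bandwidth_gbs(7.0, ports=7)    # 3.5GB segment, 7 ports
slow = segment_bandwidth_gbs(7.0, ports=1)    # 512MB segment, 1 port
total = segment_bandwidth_gbs(7.0, ports=8)   # advertised figure, all 8 ports

print(f"fast segment: {fast:.0f} GB/s")
print(f"slow segment: {slow:.0f} GB/s")
print(f"advertised:   {total:.0f} GB/s")
print(f"slow/fast ratio: {slow / fast:.4f}")
```

Under these assumptions the slow segment has one seventh of the fast segment's peak bandwidth, and the advertised 224GB/sec is the sum of two figures that, per the analysis, cannot be reached at the same time.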

n0x1ous · Jan 26, 2015

RampantAndroid said:
What standing does AMD have? They need legal standing.

AMD probably bought some 970's for competitive research :biggrin:

HurleyBird · Jan 26, 2015

Head1985 said:
GTX970=224bit 3.5GB XOR 32bit 512MB card.

Fixed that for you 😛

Vesku · Jan 26, 2015

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to try to best determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level identifying the type of resource and when it was last used are good ways to figure out where to send a resource. Frame buffers, render targets, UAVs, and other intermediate buffers for example are the last thing you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. The way NVIDIA describes the process we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.

From Ryan Smith's AnandTech analysis. I guess Nvidia is implying that they manage the segmented memory at the driver level.
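To make the idea in that quote concrete, here is a purely hypothetical sketch of a placement heuristic of the kind described: latency-critical buffers first, then recently used resources, with stale or inactive ones spilling into the slow segment. This is not NVIDIA's algorithm (the analysis itself calls that "secret sauce"); every name, threshold, and category below is an assumption made up for illustration.

```python
from dataclasses import dataclass

FAST_CAPACITY_MB = 3584   # 3.5GB segment
SLOW_CAPACITY_MB = 512    # 512MB segment

# Resource kinds the quoted analysis says should stay out of the slow segment.
LATENCY_CRITICAL = {"frame_buffer", "render_target", "uav", "intermediate_buffer"}


@dataclass
class Resource:
    name: str
    kind: str
    size_mb: int
    last_used_frame: int
    app_active: bool = True


def place(resources, current_frame, stale_after=60):
    """Greedy placement (hypothetical): highest-priority resources claim the fast segment first."""

    def priority(r):
        if r.kind in LATENCY_CRITICAL:
            return 0  # always considered first
        recently_used = current_frame - r.last_used_frame <= stale_after
        return 1 if (r.app_active and recently_used) else 2

    free = {"fast": FAST_CAPACITY_MB, "slow": SLOW_CAPACITY_MB}
    placement = {}
    # Within a priority class, the most recently used resource goes first.
    for r in sorted(resources, key=lambda r: (priority(r), -r.last_used_frame)):
        segment = next((s for s in ("fast", "slow") if r.size_mb <= free[s]), None)
        if segment is None:
            placement[r.name] = "system_memory"  # nothing left in VRAM
        else:
            free[segment] -= r.size_mb
            placement[r.name] = segment
    return placement


resources = [
    Resource("cached_tex", "texture", 300, last_used_frame=10),
    Resource("fb", "frame_buffer", 64, last_used_frame=1000),
    Resource("hot_tex", "texture", 3400, last_used_frame=999),
]
placement = place(resources, current_frame=1000)
print(placement)
```

In this toy run the frame buffer and the hot texture fill the fast segment, and the stale cached texture is the one that lands in the slow 512MB.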

Abwx · Jan 26, 2015

NomanA said:
Well, for the 980M the ROPs are still 64, and all eight connections to the crossbar exist. You'll get the same VRAM bandwidth across the whole 4GB of VRAM, as the CUDA test tool shows too. Functionally, of course, having 12 SMMs limits the output, but 12SMM/64ROP is still better than 12SMM/48ROP.

Yes, the MC is still functional, but the data can't be transferred over the crossbar since it lacks the relevant links; otherwise they would have done the same in the 970. In the 970's case, according to their diagram, the data coming from an address that has no functional SMMs is merged with the data of another MC at the output of said MCs, that is, at the L2 cache level, before the crossbar.

You'll notice that the relevant SMMs' caches will be used for two flows, while in a 980 they are used for a single flow, since each flow has its dedicated SMMs.

That said, I'm not convinced by Nvidia's technical explanations, since they are careful not to disclose info that would shed some light on their uarch. What is sure is that they are not telling the truth and are still hiding a lot of details. The proof is that Ryan Smith was told this:

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980

So Nvidia is just pretending to be in the same position as a user who bought this card: they don't know exactly how it works. The driver was surely subcontracted...

3DVagabond · Jan 26, 2015

jackstar7 said:
Does anyone think this will have actual market impact?

It seems like it will all be forgotten with the next release.

It's probably shenanigans like this that keep AMD afloat. nVidia's dodgy business model is the only reason I won't buy an nVidia product. I will bet that this episode will sway a few others to not buy nVidia out of principle. Overall it's not going to sway the market over to AMD all on its own, but it's not a good thing for nVidia.

amenx · Jan 26, 2015

SteveGrabowski said:
Good, I'll get a $17 check in 2027 then.

Sounds about right, lol.

http://www.slate.com/blogs/future_t...r_a_class_action_lawsuit_about_pentium_4.html

cmdrdredd · Jan 26, 2015

HurleyBird said:
Why on earth would they? The only legal recourse that would conceivably make sense in this situation would be a class action on behalf of consumers who purchased GTX 970 graphics cards.

They'll all get nothing, but waste time and money over it. This is all a bunch of crap to me. I regularly see above 3.5GB in use during games and am not hit with performance penalties that make the game experience unplayable. So I don't give a crap how it works, it's working.

Vesku · Jan 26, 2015

cmdrdredd said:
They'll all get nothing, but waste time and money over it. This is all a bunch of crap to me. I regularly see above 3.5GB in use during games and am not hit with performance penalties that make the game experience unplayable. So I don't give a crap how it works, it's working.

For now, it's apparently partially driver managed, which you had some doubts about previously, so well-informed 970 owners will be trusting Nvidia to keep the partitioned-memory penalty low in future games. That is, now that the information is out there for them to know about it.

Final8ty · Jan 26, 2015

SlowSpyder said:
AMD and Nvidia advertise the R9 295 and Titan Z as 8GB and 12GB cards, when that really isn't the case in how the memory is used.

The thing is, we know how the memory is used, and we knew from the get-go with multi-GPU. Yes, there will always be some people who don't know, but that goes for anything.

SPBHM · Jan 26, 2015

those 512MB are looking pretty useless at this point...
this + lower ROP/L2 count... that's pretty poor by Nvidia, on what is a pretty awesome card overall...

they should offer the option of a refund for all the 970 owners imo.
but apparently they don't care, because the card was and still is a good option.

Silverforce11 · Jan 26, 2015

So everything that some of us here speculated on is EXACTLY right. That 0.5GB is crippled in bandwidth (now we know it's actually routing through the neighbor SM); when both are in use it has to share the bandwidth and hence runs at 1/7th speed.

We're also correct that NV is FULLY AWARE of the situation, because their drivers were designed to stay within the 3.5GB in most scenarios and would try to enforce it. Where the driver can't, it tries to store infrequently accessed data in the 0.5GB segment. Where it can't do that, in game engines that aren't compatible with that kind of loading, you WILL get stutters and major frame time spikes. What people have been saying was true all along. But it's game and driver dependent. As more games come out that push the VRAM limit, this problem is going to escalate, particularly for SLI owners who enjoy maxing out games.

They had 4 months since launch to disclose it; they chose not to. They KNEW about it from the start, since that's how 970s behave due to the drivers, trying to stay under 3.5GB all that time. They were hoping people wouldn't notice. You can't really escape geeks on the internet, not sure what they were (weren't) thinking.

dacostafilipe · Jan 26, 2015

Abwx said:
Yes, the MC is still functional, but the data can't be transferred over the crossbar since it lacks the relevant links; otherwise they would have done the same in the 970. In the 970's case, according to their diagram, the data coming from an address that has no functional SMMs is merged with the data of another MC at the output of said MCs, that is, at the L2 cache level, before the crossbar.

There's no real relation between both. NomanA seems to be right on this one.

SlowSpyder · Jan 26, 2015

cmdrdredd said:
They'll all get nothing, but waste time and money over it. This is all a bunch of crap to me. I regularly see above 3.5GB in use during games and am not hit with performance penalties that make the game experience unplayable. So I don't give a crap how it works, it's working.

But it could work better. Nvidia was pretty liberal with their labeling this a 224GB/sec card. For how many months now have people thought they had a 64 ROP card? The performance still is what it is, it isn't like the card suddenly got slower with this new information. But, the card has been mislabeled in my opinion. As I said earlier, I just don't believe for a second that up until now no one at Nvidia has noticed every review and tech site has said this is a GPU with 64 ROPS.

If you're happy with the card, then that's great. But others do have a legit reason to be somewhat dissatisfied, if you ask me. And I have to wonder how much performance above ~3.5GB of memory usage depends on good driver support for this card.

SimianR · Jan 26, 2015

Question: How would this memory allocation affect a manufacturer that wanted to release an 8GB 970, similar to what Sapphire did with the 290X?

Erenhardt · Jan 26, 2015

destrekor said:
Did you read all of it? That is how it would be if they implemented it THAT way, but that is precisely what they averted and chose to implement it differently.

But there is no other way if you need more than 3.5GB.

Let's say a GPU renders a frame. It's high-resolution gaming, with next-gen games run on master race settings (easily double the VRAM requirements of consoles, which have access to 6GB), and it takes the whole 4GB.

To make this frame the GPU needs data from all DRAM chips. Well, maybe not all, but let's say it NEEDS data from chips 7 and 8 as well as data from some of the other chips.

For the sake of argument, let's say it takes the same amount of data from each DRAM chip and that each read takes an equal amount of time: one clock cycle.

You have 1 clock cycle to take data from chips 1-2-3-4-5-6 and 7.

Well, your frame cannot be finished, as you need data from DRAM #8.

So you have your second clock cycle going just to get data from #8.

If you can't fit it all into L2 in one go, there is no other way but to waste 1 full clock cycle: 50% utilization. This is of course the worst-case scenario.

The best-case scenario is that you keep everything in the 3.5GB on the 224-bit bus and the 970 performs as it did in release reviews.
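The two scenarios above can be written down as a tiny model. This is a rough sketch under the same simplifying assumptions as the post (eight equal 32-bit ports, each cycle reads either the seven fast ports or the single slow port, never both); the function name `bus_utilization` is hypothetical.

```python
from fractions import Fraction

PORTS_TOTAL = 8   # 32-bit memory ports on the full 256-bit bus
PORTS_FAST = 7    # ports serving the 3.5GB segment
PORTS_SLOW = 1    # port serving the 512MB segment


def bus_utilization(fast_cycles: int, slow_cycles: int) -> Fraction:
    """Fraction of the 8-port peak used when each cycle reads one segment or the other."""
    total_cycles = fast_cycles + slow_cycles
    if total_cycles == 0:
        raise ValueError("need at least one cycle")
    ports_used = PORTS_FAST * fast_cycles + PORTS_SLOW * slow_cycles
    return Fraction(ports_used, PORTS_TOTAL * total_cycles)


worst = bus_utilization(fast_cycles=1, slow_cycles=1)  # alternate segments every cycle
best = bus_utilization(fast_cycles=1, slow_cycles=0)   # never touch the slow segment

print(f"worst case: {float(worst):.1%} of the 256-bit peak")
print(f"best case:  {float(best):.1%} of the 256-bit peak")
```

Real workloads would fall somewhere between the two, depending on how often the slow segment is actually read.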

This card will have a worse future than Kepler had. Kepler fell behind without a design flaw. Guess what is going to happen when Maxwell's successor comes out.

PS. Yes, I foresee games using 6+GB of VRAM soon. Just like my GTX8800 320MB suddenly became obsolete with 30% more VRAM than the console.

SPBHM · Jan 26, 2015

SimianR said:
Question: How would this memory allocation affect a manufacturer that wanted to release an 8GB 970, similar to what Sapphire did with the 290X?

7 fast, 1 slow

dacostafilipe · Jan 26, 2015

SlowSpyder said:
And I have to wonder how much performance above ~3.5GB of memory usage depends on good driver support for this card.

Those with 970 SLI could see this problem more often than others because they normally run higher settings/resolution.

:\

SlowSpyder · Jan 26, 2015

Final8ty said:
The thing is, we know how the memory is used, and we knew from the get-go with multi-GPU. Yes, there will always be some people who don't know, but that goes for anything.

Yes, of course. AMD and Nvidia have never hidden that information. I was just making the point that they've always taken a bit of 'license' with what they advertise. But I think Nvidia was being sort of deceptive here.

Genx87 · Jan 26, 2015

Tsk tsk. I suspect that somebody did screw up in the marketing department. But some person or department within Nvidia needs to catch that, even if it has already gone out the door, and correct it after the fact. The internet will figure this stuff out given enough time.

Spanners · Jan 26, 2015

SimianR said:
Question: How would this memory allocation affect a manufacturer that wanted to release an 8GB 970, similar to what Sapphire did with the 290X?

Manufacturers won't release one (imho), but if they did you'd have to assume you'd end up with a 1GB segment with similarly reduced bandwidth.
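As a quick sanity check on that guess, here is a small sketch that scales the split with capacity. It assumes a hypothetical 8GB card keeps the same layout of eight ports with one of them behind the shared link, so only the segment sizes change; `split_segments` is an illustrative name, not anything from Nvidia.

```python
def split_segments(total_gb: float, ports: int = 8, slow_ports: int = 1):
    """Return (fast_gb, slow_gb), assuming capacity is spread evenly across the ports."""
    per_port_gb = total_gb / ports
    return per_port_gb * (ports - slow_ports), per_port_gb * slow_ports


gtx970_4gb = split_segments(4)   # the shipping card
gtx970_8gb = split_segments(8)   # hypothetical 8GB variant

print(f"4GB card: {gtx970_4gb[0]}GB fast + {gtx970_4gb[1]}GB slow")
print(f"8GB card: {gtx970_8gb[0]}GB fast + {gtx970_8gb[1]}GB slow")
```

Under that assumption the ratio stays at 7 fast to 1 slow, which matches both answers above.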

cmdrdredd · Jan 26, 2015

SPBHM said:
those 512MB are looking pretty useless at this point...
this + lower ROP/L2 count... that's pretty poor by Nvidia, on what is a pretty awesome card overall...

they should offer the option of a refund for all the 970 owners imo.
but apparently they don't care, because the card was and still is a good option.

It's not useless when, as has been pointed out so many times and everyone likes to ignore, games can already load past 3.5GB without problems. Nvidia did not make adjustments for every game and engine in existence. Especially not some older ones I was messing with.

SteveGrabowski · Jan 26, 2015

LOL GeForce forums removed the angry thread from their 900 series listings, but surprisingly it still exists if you have the link.

https://forums.geforce.com/default/topic/803518/geforce-900-series/gtx-970-3-5gb-vram-issue/115/

SlowSpyder · Jan 26, 2015

NeoLuxembourg said:
Those with 970 SLI could see this problem more often than others because they normally run higher settings/resolution.

:\

Sure, absolutely. SLI users could run resolutions and settings that would benefit from a full 4GB of memory at 224GB/sec, which is apparently not what the GTX 970 actually is. But it seems Kepler driver support has taken a back seat; if and when Maxwell takes a back seat, will performance suffer?

Not trying to spread FUD as we don't know how things will play out, just talking out loud. But, I see lots of potential downside from this, not really any upside. But like I said earlier, it still performs how it did when people were willing to shell out their money, it isn't like it got slower suddenly. But, it seems there are a few limitations that Nvidia didn't talk about until their hand was forced.
