Did you read all of it? That is how it would be if they implemented it THAT way, but that is precisely what they avoided by choosing to implement it differently.
But there is no other way if you need more than 3.5GB.
Let's say a GPU renders a frame. It's high-resolution gaming with next-gen games running on master race settings (easily double the VRAM requirements of consoles, which have access to 6GB), and the frame takes the whole 4GB.
To make this frame, the GPU needs data from all the DRAM chips. Well, maybe not all, but let's say it NEEDS data from chips 7 and 8 as well as data from some of the other chips.
For the sake of argument, let's say it takes the same amount of data from each DRAM chip and each read takes the same amount of time: one clock cycle.
You have 1 clock cycle to take data from chips 1-2-3-4-5-6 and 7.
But your frame cannot be finished, because you still need data from DRAM #8.
So your second clock cycle goes just to getting data from #8.
If you can't fit it all into L2 in one go, there is no other way but to spend 1 full extra clock cycle on chip #8 alone - 50% utilization. This is of course the worst case scenario.
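To put numbers on the example above, here is a rough sketch of that toy model in Python. It only encodes the assumptions made in this post (chips 1-7 can be read together in one cycle, chip 8 needs a cycle of its own, every chip delivers the same amount of data), not how the real memory controller schedules reads. The function name and the "utilization" definition (ideal cycles divided by actual cycles) are made up for illustration.

```python
def cycles_and_utilization(chips_needed):
    """Toy model from the post, NOT real hardware behaviour.

    Assumption: chips 1-7 share one path and can all be read in the
    same cycle; chip 8 sits on its own path and needs a separate cycle.
    """
    fast_group = set(range(1, 8))   # chips 1..7
    slow_group = {8}                # chip 8

    needed = set(chips_needed)
    if not needed:
        return 0, 1.0

    cycles = 0
    if needed & fast_group:
        cycles += 1                 # one cycle covers every fast chip
    if needed & slow_group:
        cycles += 1                 # chip 8 costs a cycle by itself

    ideal_cycles = 1                # a fully unified bus would need one
    return cycles, ideal_cycles / cycles


# Worst case from the post: the frame touches all 8 chips.
print(cycles_and_utilization(range(1, 9)))   # (2, 0.5)

# Best case: everything stays within chips 1-7 (the 3.5GB part).
print(cycles_and_utilization(range(1, 8)))   # (1, 1.0)
```

Under these assumptions, any frame that touches chip 8 together with at least one other chip lands on the 50% figure, no matter how little data it actually needs from chip 8.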
The best case scenario is that you keep everything within the 3.5GB on the 224-bit bus and the 970 performs as it did in the release reviews.
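For a feel of what the 224-bit figure means in bandwidth terms, here is a back-of-the-envelope calculation. The 7 Gbps per-pin data rate is my assumption for the 970's GDDR5 and is not stated anywhere in this thread, and the helper name is made up. Peak bandwidth is simply bus width times per-pin rate, divided by 8 to go from bits to bytes.

```python
def peak_bandwidth_gb_s(bus_width_bits, gbps_per_pin=7.0):
    """Theoretical peak bandwidth in GB/s.

    gbps_per_pin=7.0 is an ASSUMED effective GDDR5 data rate,
    not a number taken from this thread.
    """
    return bus_width_bits * gbps_per_pin / 8


print(peak_bandwidth_gb_s(256))  # 224.0 -> full 256-bit bus
print(peak_bandwidth_gb_s(224))  # 196.0 -> the 3.5GB part
print(peak_bandwidth_gb_s(32))   # 28.0  -> the last 0.5GB on its own
```

These are theoretical peaks under that assumed data rate; real-world throughput will be lower.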
This card will have a worse future than Kepler had. Kepler fell behind without a design flaw. Guess what is going to happen when Maxwell's successor comes out.
P.S. Yes, I foresee games using 6+GB of VRAM soon. Just like my 8800 GTS 320MB suddenly became obsolete despite having 30% more VRAM than the console.