You have a deeply flawed understanding of how GPUs and CPUs work with their memory.
Why is it that a GDDR5 video card blows the doors off a DDR3 video card, even if both are installed in a system with DDR2-800 memory? Ah yeah, because main system memory doesn't have all that much to do with the GPU once the level/resources/textures have been loaded.
It's ALSO why you see a HUGE difference in GPU performance based on the most important metrics : shaders/GPU clock/GPU memory bandwidth, and why PCI Express x16 2.0 vs. 3.0 is almost meaningless even on GPUs of staggeringly higher performance than the 77xx-78xx class stuff in the new consoles.
Let me explain things more simply :
Process A : Loading. The level gets loaded; the OS/AI/game engine/sound/etc. get sorted and ready to play. 3D resources for the GPU get loaded, plus textures and so on, and all of that gets moved to where the GPU can reach it most quickly (on a GDDR5 PC, that pool is an order of magnitude faster than DDR3 system memory).
Process B : Executing. The CPU works on tasks it's given by the OS and game engine. Various threads stream along taking care of AI, logic, and other tasks that are primarily non-graphical. On the GPU side, the GPU must reflect changes in the game world VERY quickly, and 99% of what it's doing is shuffling stuff that is ALREADY in video memory. What would happen otherwise? A system with DDR2 main memory would slow to a crawl despite having a stout video card.
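Those two phases can be sketched as a toy counter : bus traffic happens once at load time, while per-frame work touches only GPU-local memory. The byte counts and the overdraw factor here are illustrative assumptions, not measurements from any real engine.

```python
# Toy model of the two phases: bus traffic happens once at load,
# local traffic repeats every frame. All numbers are illustrative.

bus_bytes = 0     # CPU -> GPU transfers over the PCIe bus
local_bytes = 0   # GPU reading/writing its own VRAM

def load_level(asset_bytes):
    """Process A: one-time upload of textures/geometry to VRAM."""
    global bus_bytes
    bus_bytes += asset_bytes

def render_frame(resident_bytes, overdraw=3):
    """Process B: each frame re-reads resident data and writes results
    back, modeled crudely as an overdraw multiple of the resident set."""
    global local_bytes
    local_bytes += resident_bytes * overdraw

load_level(1_500_000_000)      # ~1.5 GB level, loaded once
for _ in range(3600):          # one minute of play at 60 fps
    render_frame(1_500_000_000)

print(f"bus traffic:   {bus_bytes / 1e9:.1f} GB (once)")
print(f"local traffic: {local_bytes / 1e9:,.1f} GB (sustained)")
```

Even with these crude assumptions, local traffic outruns bus traffic by four orders of magnitude, which is the whole point of putting the fast memory next to the GPU.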
But you know what happens in the real world? That statistic (CPU-to-GPU memory access) is of only the most minor importance. Proof? Go compare a 7970 running on a 4770K on PCI Express x16 2.0 vs. 3.0. That's a doubling of bus bandwidth (roughly 8 GB/s to ~16 GB/s), but it doesn't affect in-game framerate. Why? Once again, once the game data is loaded and ready to go, there's not a bunch of traffic going to and from the GPU's memory over the bus. It ALREADY HAS THE DATA FOR THE LEVEL LOADED, with only the most minor swapping of data along the way. The reason the card is constantly chewing through massive amounts of local GPU memory bandwidth is that it needs very fast frame-by-frame presentations of the SAME visual data already there, just recalculated by the GPU and moved around.
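The per-frame arithmetic makes this concrete. The PCIe figures below are the standard approximate usable rates for x16 links, and the 264 GB/s figure is the 7970's published local memory bandwidth; the 60 fps target is an assumption.

```python
# Back-of-envelope look at why PCIe generation barely matters once
# the level is resident in video memory. Figures are approximate.

PCIE2_X16_GBPS = 8.0      # ~8 GB/s usable, PCIe 2.0 x16
PCIE3_X16_GBPS = 15.75    # ~16 GB/s usable, PCIe 3.0 x16
HD7970_VRAM_GBPS = 264.0  # 7970 local GDDR5 bandwidth (384-bit @ 5.5 Gbps)
FPS = 60                  # assumed framerate target

# Local memory traffic the GPU can do each frame:
local_per_frame_gb = HD7970_VRAM_GBPS / FPS        # ~4.4 GB per frame

# Bus traffic budget per frame on each PCIe generation:
pcie2_per_frame_mb = PCIE2_X16_GBPS / FPS * 1024   # ~137 MB per frame
pcie3_per_frame_mb = PCIE3_X16_GBPS / FPS * 1024   # ~269 MB per frame

print(f"local VRAM traffic/frame: {local_per_frame_gb:.1f} GB")
print(f"PCIe 2.0 budget/frame:    {pcie2_per_frame_mb:.0f} MB")
print(f"PCIe 3.0 budget/frame:    {pcie3_per_frame_mb:.0f} MB")
```

A game streaming a few MB of new data per frame can't tell the two buses apart; the frame time is dominated by the local bandwidth number.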
And a final question for you that should completely blow your mind :
If you run a game, let's say Witcher 2, on a PC with 8GB of memory and a 3GB 7970, what do you think happens when the game loads and runs? Oh yeah, it loads the appropriate game resources into system memory, and into GPU memory as each level loads.
So, once you run the game, want to hazard how much memory is being used by the Witcher game executables? Ah yes, about 2GB! Oh, and look! A level has loaded completely and you are playing! How much GPU memory for 1080p with high details? Ah, about 1GB! Hard drive access? Ah, almost nothing until the next loading point!
Now, with DDR3 video memory having lower latency, as you note, and with only ~2GB of game data loaded into system memory at any given time anyway, why would any GPU need GDDR5? After all, transferring 2GB of data over an 80GB/sec bus would be nearly instant, right? And then the low latency would enable a great framerate? Ah, but it doesn't work that way. A DDR3 video card would suck for TW2. Hell, they suck for 1080p gaming, period.
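Run the numbers on that thought experiment. The 80 GB/s bus is the hypothetical from above; the ~29 GB/s DDR3-card figure is my assumption for a typical 128-bit DDR3 board, and 264 GB/s is the 7970's local bandwidth.

```python
# Why "load once fast, then run on slow memory" fails: the load is a
# one-time cost, but rendering is a sustained load. Rough assumptions.

ASSET_GB = 2.0           # level data, from the thought experiment
BUS_GBPS = 80.0          # hypothetical transfer bus from the post
DDR3_CARD_GBPS = 29.0    # assumed typical 128-bit DDR3 video card
GDDR5_CARD_GBPS = 264.0  # HD 7970 class local bandwidth
FPS = 60                 # assumed framerate target

one_time_load_ms = ASSET_GB / BUS_GBPS * 1000    # 25 ms, effectively free

# Every frame re-reads textures/geometry/depth and writes the
# framebuffer, so what matters is the per-frame traffic ceiling:
ddr3_frame_gb  = DDR3_CARD_GBPS / FPS            # ~0.48 GB per frame
gddr5_frame_gb = GDDR5_CARD_GBPS / FPS           # ~4.4 GB per frame

print(f"one-time load: {one_time_load_ms:.0f} ms")
print(f"DDR3 card per-frame ceiling:  {ddr3_frame_gb:.2f} GB")
print(f"GDDR5 card per-frame ceiling: {gddr5_frame_gb:.2f} GB")
```

Latency doesn't rescue the DDR3 card : it simply cannot move enough bytes inside each 16.7 ms frame to keep the shaders fed.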
I can't believe you seriously think that sharing limited memory bandwidth between CPU and GPU is useful. You have to be trolling us. The *only* way it would be useful is if GPGPU were very well utilized to perform functions normally handled by the CPU. Even then, the games typically played on consoles are rarely CPU-bound, and performing GPGPU calculations on your GPU doesn't come 'free' : it uses portions of the GPU that could otherwise go to traditional 3D rendering.
Otherwise, we're down to the same formula :
Faster dedicated buses of non-shared memory will always trump slower non-dedicated buses of shared UMA/nUMA/hUMA design.
The shared memory is there because it's cheaper. If the XB1 GPU had its own 4GB of GDDR5, it would be tremendously faster than sharing 8GB with everything else.
And if you can't explain why a GPU goes absolutely bananas with memory usage while playing a game session, even though almost nothing happening during the session requires new data to be written to that memory, then you have failed to make your case. By the time one minute goes by, a card like a 7970 will have turned over well over ten thousand gigabytes of memory traffic from the same ~1-2GB of information initially loaded for the level.
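That one-minute figure falls straight out of the card's published peak bandwidth; the ~1.5 GB resident-data size is my assumption, and a real session won't sustain the theoretical peak, so treat this as an upper-bound sketch.

```python
# Data a 7970-class card can churn through in one minute of play,
# versus the size of the level data actually resident in VRAM.

HD7970_VRAM_GBPS = 264.0   # peak local GDDR5 bandwidth
RESIDENT_GB = 1.5          # assumed ~1-2 GB of level data, loaded once

gb_per_minute = HD7970_VRAM_GBPS * 60      # 15,840 GB in 60 seconds
turnover = gb_per_minute / RESIDENT_GB     # each byte reused ~10,000x

print(f"{gb_per_minute:,.0f} GB of local traffic per minute")
print(f"= the resident level data reused roughly {turnover:,.0f} times")
```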
As an analogy, it's a lot like giving a camera and a box of G.I. Joes and other toys to a particularly helpful person. Let's call him the GPU. Let's call the G.I. Joes and the other toys the video data that gets transferred to video memory when the level loads. Level loaded? Good, let's go :
As the CPU, I tell the guy to arrange them this way and that, to change the perspective on the camera, and to swap this character or that one in and out of the box he was given earlier. He does this as fast as he can.
Bah, anyway, don't rely on my word, as you obviously need to do some research. Go find me ONE SINGLE LINK that says UMA/hUMA/etc. is of particular value outside of hypothetical uses for GPGPU.
Better yet, find me ONE SINGLE EXAMPLE where an otherwise identical CPU/GPU combo works better with LESS memory bandwidth in totality.