Once they become fully integrated, what we now see as the GPU part will be a fancy vector coprocessor, like AltiVec on steroids, and its capabilities will be better matched to the CPU. Right now, and for the near future, it's a bit lopsided: AMD has a weak CPU but a great GPU and good drivers, while Intel is on the opposite side of the spectrum.
Once the latency penalty of going from 'CPU' to 'GPU' is gone, work that the 'CPU' could handle with more efficient code will simply run on the 'CPU', instead of being burned in wasteful loops on the 'GPU'.
As far as bandwidth waste goes, note that there has been a trend for decades now: memory bandwidth simply does not scale up with processor performance. Big iron could add more channels. Everyone else designed their HW and SW to use less DRAM IO, making up for it with caches, with algorithm choices that favored doing much work on small pieces of data, and with data structures that were friendly to HW prefetchers and cache eviction policies, rather than directly working on large data structures (sometimes that's unavoidable, of course).
GPUs are beginning to do the same thing, with caches and local memories. It's not a problem that can be solved overnight, and it won't be. But it is only insurmountable if you assume that the parallel processing engines in CPUs 10+ years from now will be just like your current GPU, except integrated into the CPU. Previously, they were transistor- and power-efficient for graphics; now they are becoming more efficient as generalized IO engines for tons of threads. Over time, they will continue to evolve to be more efficient with respect to memory use, spending more space and power on resource management and IO, and less on the functional units.