If the PS4 design allows the GPU and CPU to use the same memory without shuffling data back and forth over the PCI bus, that's going to be a huge deal.
Some say: PCIe 3.0 gives no improvement in game performance over PCIe 2.0, despite doubling the available bandwidth. Bandwidth is one thing, but latency is another.
The GPU needs data. What it does is:
take data from VGA memory -> process it -> copy the results back to VGA memory.
Now you need input data:
http://www.nvidia.com/content/GTC/documents/1122_GTC09.pdf

Here you can see the GPU working directly on data (via DMA) from a PCI device. The route looks like:
Device --PCI transfer(+latency)--> system memory (DDR3) --PCI transfer(+latency)--> VGA memory --> Processing --> Results back to VGA memory
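A minimal CUDA sketch of that route, just to make the copies explicit (the kernel and buffer size are placeholders, not taken from the linked slides):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in for real processing: doubles every element in device (VGA) memory.
__global__ void process(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* host = (float*)malloc(bytes);          // input sitting in system memory (DDR3)
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* dev = nullptr;
    cudaMalloc(&dev, bytes);                      // VGA (device) memory

    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // PCI transfer (+latency)
    process<<<(n + 255) / 256, 256>>>(dev, n);              // processing in VGA memory
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);  // PCI transfer back (+latency)

    printf("host[0] = %f\n", host[0]);
    cudaFree(dev);
    free(host);
    return 0;
}
```

Every cudaMemcpy here is a trip across the PCIe bus, which is exactly where the bandwidth and latency costs discussed above come in.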
When we talk about feeding a GPU, CPU processing is often added to that route as well (the API and whatnot).
If the GPU in the PS4 has direct access, over the memory bus, to memory holding results from the CPU, that could be huge.
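As an analogy for that shared-memory model, here is a sketch using CUDA's zero-copy mapped host memory, where the GPU dereferences a buffer that stays in system memory and no explicit copy into VGA memory is made. The PS4 is AMD hardware, not CUDA, so this only illustrates the programming model, not the actual PS4 API; on a discrete card each access still crosses PCIe, whereas on an APU with truly unified memory the bus trip disappears.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__global__ void process(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;

    cudaSetDeviceFlags(cudaDeviceMapHost);         // allow mapped pinned allocations
    float* shared = nullptr;
    cudaHostAlloc(&shared, n * sizeof(float), cudaHostAllocMapped);  // one buffer, visible to both sides
    for (int i = 0; i < n; ++i) shared[i] = 1.0f;  // CPU writes the input in place

    float* devView = nullptr;
    cudaHostGetDevicePointer(&devView, shared, 0); // GPU's view of the same memory

    process<<<(n + 255) / 256, 256>>>(devView, n); // no cudaMemcpy anywhere
    cudaDeviceSynchronize();

    printf("shared[0] = %f\n", shared[0]);         // CPU reads the result directly
    cudaFreeHost(shared);
    return 0;
}
```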
http://www.idi.ntnu.no/~elster/master-studs/runejoho/ms-proj-gpgpu-latency-bandwidth.pdf
The chipset proved to give the expected influence over the latency and the bandwidth. However, the two Intel and NVIDIA chipsets we tested probably had different design goals, given that one is mainly a processor manufacturer whereas the other is primarily a graphics cards manufacturer. Our point was not to show that one is better than the other, but that one needs to pay attention to the chipset and one's goal when building a CPU - GPU system for HPC.
However, performance for the two chipsets supports the model in the sense that the chipsets are a key component in the GPU system.
One of our clearer results was the influence of the PCI Express bus.
Both the latency and bandwidth showed good improvements when overclocking the bus. Overclocking is not an exact science, and there might be additional effects from this alteration, but the results show an improvement.
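For a sense of how such numbers are obtained, here is a rough sketch of a host-to-device bandwidth measurement, in the spirit of the thesis above; the 64 MiB transfer size and the use of pinned memory are my choices, not the thesis setup.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 64ull << 20;                   // 64 MiB test transfer
    float *host = nullptr, *dev = nullptr;
    cudaHostAlloc(&host, bytes, cudaHostAllocDefault);  // pinned memory, so the copy is DMA'd
    cudaMalloc(&dev, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);  // the PCIe trip being timed
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Host-to-device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(host);
    cudaFree(dev);
    return 0;
}
```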
http://arstechnica.com/information-technology/2013/04/amds-heterogeneous-uniform-memory-access-coming-this-year-in-kaveri/
As well as being useful for GPGPU programming, this may also find use in the GPU's traditional domain: graphics. Normally, 3D programs have to use lots of relatively small textures to apply textures to their 3D models. When the GPU has access to demand paging, it becomes practical to use single large textures (larger than will even fit into the GPU's memory), loading portions of the texture on an as-needed basis. id Software devised a similar technique using existing hardware for Enemy Territory: Quake Wars and called it MegaTexture. With hUMA, developers will get MegaTexture-like functionality built-in.
The big difficulty for AMD is that merely having hUMA isn't enough. Developers actually have to write programs that take advantage of it. hUMA will certainly make developing mixed CPU/GPU software easier, but given AMD's low market share, it's not likely that developers will be in any great hurry to rewrite their software to take advantage of it. We asked company representatives if Intel or NVIDIA were going to implement HSA. We're still awaiting an answer.
