GTC Info: Unified memory delayed until 2016

NTMBK

Lifer
Nov 14, 2011
[Image: GTC-2014-021.jpg]


Maxwell's unified memory and CPU integration are missing in action...
 

VulgarDisplay

Diamond Member
Apr 3, 2009
They were never confirmed in the first place, were they? I'm sure DX12 really nullified the need for a CPU core on the GPU.
 

janii

Member
Nov 1, 2013
The CPU was for Volta or even later, as far as I remember.
But certainly not Maxwell.

Sad changes, imho. Really sad. Are they holding back again?

I'm kinda holding off on getting Maxwell. I'd rather stick with my GTX 580. It's still going super strong, but I can't resist the extra oomph for downsampling/SGSSAA.
 

NTMBK

Lifer
Nov 14, 2011
Everything starting with Kepler has Unified memory under CUDA 6.

https://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/

Not the same thing at all. That is "language feature" level unified memory: the code looks like both CPU and GPU are working on the same memory, but at the hardware level they have separate address spaces, with the runtime copying memory between the two. There are no performance improvements; the cost is just hidden behind syntax.

Maxwell was meant to bring properly unified address spaces. This has now been pushed back to Pascal, which has mysteriously appeared on the roadmap with half the features Maxwell was meant to have...
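To illustrate the point, here is a rough sketch of what CUDA 6 managed memory hides (kernel and sizes are made up for the example; both versions do the same work):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // CUDA 6 "unified memory": one pointer, no explicit copies in the source.
    // The runtime still migrates the data between CPU and GPU behind the scenes.
    float *managed;
    cudaMallocManaged(&managed, bytes);
    for (int i = 0; i < n; ++i) managed[i] = 1.0f;
    scale<<<(n + 255) / 256, 256>>>(managed, n);
    cudaDeviceSynchronize();  // required before the CPU touches the data again
    printf("managed[0] = %f\n", managed[0]);

    // The traditional equivalent: the copies the runtime was hiding.
    float *host = (float *)malloc(bytes);
    float *device;
    for (int i = 0; i < n; ++i) host[i] = 1.0f;
    cudaMalloc(&device, bytes);
    cudaMemcpy(device, host, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(device, n);
    cudaMemcpy(host, device, bytes, cudaMemcpyDeviceToHost);
    printf("host[0] = %f\n", host[0]);

    cudaFree(managed);
    cudaFree(device);
    free(host);
    return 0;
}
```

Either way the same data crosses the PCIe bus; the managed version just moves the cudaMemcpy calls out of your code and into the runtime.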
 

f1sherman

Platinum Member
Apr 5, 2011
But we don't know for sure what they mean by Unified Memory.
For all we know, under Pascal it might mean physically unified.

But with locality features(!)
Like addressing according to on-chip 3D coords, where the closer you are to (x,y,z) = (0,0,0), the fewer joules are needed to push data :)

I'm just guessing, but that roadmap needs a few explanations.
 

sontin

Diamond Member
Sep 12, 2011
Pascal will allow real unified memory, because NVLink lets the GPU access CPU memory at the same speed as the CPU accesses its RAM. With CUDA 6 it's "unified virtual memory", but it still needs to copy the data over to GPU memory in the background.
 

NTMBK

Lifer
Nov 14, 2011
Pascal will allow real unified memory because the NVLink allows the same speed to the CPU like the CPU to the RAM.

Indeed, but with what CPU? You can be damn sure that AMD and Intel won't play along. The choices will probably be NVidia's ARM processor, or potentially IBM.

With CUDA 6 it's "unified virtual memory" but it still needs to copy the data over to the cpu.

No, that is not what "virtual memory" means. "Unified virtual memory" means the same address space, even though distributed over separate banks of memory. Think of it like NUMA- even though a buffer may be on NUMA node 0, cores on NUMA node 1 can access it just fine within their address space. CUDA 6 does not offer unified virtual memory. It offers syntactic sugar on top of the exact same underlying mechanism.
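For comparison, zero-copy mapped pinned memory (available in CUDA well before version 6) is the closest existing thing to the NUMA-style remote access described above: the GPU reads and writes host memory in place over PCIe, with no staging copy. A minimal sketch (illustrative only; needs a CUDA-capable device to run):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void inc(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;  // each access crosses the PCIe bus
}

int main(void) {
    const int n = 256;
    int *host, *dev;

    // Pinned host allocation, mapped into the GPU's address space.
    cudaHostAlloc(&host, n * sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer(&dev, host, 0);

    for (int i = 0; i < n; ++i) host[i] = i;
    inc<<<1, n>>>(dev, n);      // GPU touches host memory directly, no cudaMemcpy
    cudaDeviceSynchronize();
    printf("host[5] = %d\n", host[5]);  // 6 on a CUDA-capable device

    cudaFreeHost(host);
    return 0;
}
```

The remote access works, but at PCIe speed; the NVLink pitch is essentially making this kind of access fast enough to use routinely.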
 

parvadomus

Senior member
Dec 11, 2012
Indeed, but with what CPU? You can be damn sure that AMD and Intel won't play along. The choices will probably be NVidia's ARM processor, or potentially IBM.



No, that is not what "virtual memory" means. "Unified virtual memory" means the same address space, even though distributed over separate banks of memory. Think of it like NUMA- even though a buffer may be on NUMA node 0, cores on NUMA node 1 can access it just fine within their address space. CUDA 6 does not offer unified virtual memory. It offers syntactic sugar on top of the exact same underlying mechanism.

That's not syntactic sugar, it's just more hardware abstraction via software.
 

f1sherman

Platinum Member
Apr 5, 2011
Not the same thing at all. That is "language feature" level unified memory: the code looks like both CPU and GPU are working on the same memory, but at the hardware level they have separate address spaces, with the runtime copying memory between the two. There are no performance improvements; the cost is just hidden behind syntax.

Maxwell was meant to bring properly unified address spaces. This has now been pushed back to Pascal, which has mysteriously appeared on the roadmap with half the features Maxwell was meant to have...

Right.
Apparently a faster interconnect is needed, i.e. NVLink, hence no full-performance UM until Pascal.

Starting with CUDA 6, Unified Memory simplifies memory management by giving you a single pointer to your data, and automatically migrating pages on access to the processor that needs them. On Pascal GPUs, Unified Memory and NVLink will provide the ultimate combination of simplicity and performance. The full-bandwidth access to the CPU’s memory system enabled by NVLink means that NVIDIA’s GPU can access data in the CPU’s memory at the same rate as the CPU can. With the GPU’s superior streaming ability, the GPU will sometimes be able to stream data out of the CPU’s memory system even faster than the CPU.
http://devblogs.nvidia.com/parallelforall/nvlink-pascal-stacked-memory-feeding-appetite-big-data/
 

NTMBK

Lifer
Nov 14, 2011
Right.
Apparently a faster interconnect is needed, i.e. NVLink, hence no full-performance UM until Pascal.

Starting with CUDA 6, Unified Memory simplifies memory management by giving you a single pointer to your data, and automatically migrating pages on access to the processor that needs them. On Pascal GPUs, Unified Memory and NVLink will provide the ultimate combination of simplicity and performance. The full-bandwidth access to the CPU’s memory system enabled by NVLink means that NVIDIA’s GPU can access data in the CPU’s memory at the same rate as the CPU can. With the GPU’s superior streaming ability, the GPU will sometimes be able to stream data out of the CPU’s memory system even faster than the CPU.
http://devblogs.nvidia.com/parallelforall/nvlink-pascal-stacked-memory-feeding-appetite-big-data/

Apparently NVLink 2.0 will add cache coherency (I guess on Volta?), which will be very nice.
 

dangerman1337

Senior member
Sep 16, 2010
According to TechReport, Volta still exists and comes after Pascal: http://techreport.com/news/26226/nvidia-pascal-to-use-stacked-memory-proprietary-nvlink-interconnect

Turns out Volta remains on the roadmap, but it comes after Pascal and will evidently include more extensive changes to Nvidia's core GPU architecture.

Nvidia has inserted Pascal into its plans in order to take advantage of stacked memory and other innovations sooner. (I'm not sure we can say that Volta has been delayed, since the firm never pinned down that GPU's projected release date.) That makes Pascal intriguing even though its SM will be based on a modified version of the one from Maxwell. Memory bandwidth has long been one of the primary constraints for GPU performance, and bringing DRAM onto the same substrate opens up the possibility of substantial performance gains.

Compared to today's GPU memory subsystems, Huang claimed Pascal's 3D memory will offer "many times" the bandwidth, two and a half times the capacity, and four times the energy efficiency. The Pascal chip itself will not participate in the 3D stacking, but it will have DRAM stacks situated around it on the same package. Those DRAM stacks will be of the HBM type being developed at Hynix. You can see the DRAM stacks cuddled up next to the GPU in the picture of the Pascal test module below.

It seems the next big architecture jump after Maxwell will come with Volta, with Pascal as an interim step.