Maxwell got unified virtual memory.
But in the new slide it doesn't mention unified virtual memory under DX12 right by the Maxwell chip...
Maxwell got unified virtual memory.
*snip*
But in the new slide it doesn't mention unified virtual memory under DX12 right by the Maxwell chip...
Has something gone wrong with Maxwell, à la Fermi?
Everything starting with Kepler has Unified Memory under CUDA 6.
https://devblogs.nvidia.com/parallelforall/unified-memory-in-cuda-6/
Pascal will allow real unified memory, because NVLink lets the GPU access the CPU's memory at the same speed as the CPU can access its own RAM.
With CUDA 6 it's "unified virtual memory", but the data still has to be copied back and forth between the CPU and the GPU.
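To make that concrete, here's a minimal sketch of the CUDA 6 model (the kernel, names, and sizes are my own illustration, not from any NVIDIA sample): one pointer is usable from both host and device code, but on Kepler/Maxwell the runtime is still shuffling the underlying data between the two memories.

```cpp
#include <cuda_runtime.h>

// Illustrative kernel: double every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *data;

    // CUDA 6 Unified Memory: one allocation, one pointer,
    // dereferenceable from both CPU and GPU code.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes
    scale<<<(n + 255) / 256, 256>>>(data, n);    // GPU reads and writes

    // On these GPUs the host must synchronize before touching the
    // buffer again; the runtime moves the data back behind the scenes.
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```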
Indeed, but with what CPU? You can be damn sure that AMD and Intel won't play along. The choices will probably be Nvidia's own ARM processors, or potentially IBM.
No, that is not what "virtual memory" means. "Unified virtual memory" means a single address space, even though it is distributed over separate banks of memory. Think of it like NUMA: even though a buffer may be on NUMA node 0, cores on NUMA node 1 can access it just fine within their address space. CUDA 6 does not offer unified virtual memory. It offers syntactic sugar on top of the exact same underlying mechanism.
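Here's a minimal sketch of that NUMA analogy, assuming a Linux machine with at least two NUMA nodes and libnuma installed (compile with -lnuma); the buffer size and node numbers are illustrative:

```cpp
#include <numa.h>    // libnuma: link with -lnuma
#include <cstdio>
#include <cstring>

int main() {
    if (numa_available() < 0 || numa_max_node() < 1) {
        std::fprintf(stderr, "need a NUMA system with at least two nodes\n");
        return 1;
    }

    // Physically place a buffer in node 0's memory bank.
    char *buf = static_cast<char *>(numa_alloc_onnode(4096, 0));
    std::strcpy(buf, "hello from node 0's memory");

    // Move this thread to a core on node 1...
    numa_run_on_node(1);

    // ...and dereference the very same pointer. The access is slower
    // because it crosses the interconnect, but it needs no copy and no
    // separate address space: that is what "unified" means here.
    std::printf("%s\n", buf);

    numa_free(buf, 4096);
    return 0;
}
```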
That's not syntactic sugar, it's just more hardware abstraction via software.
Not the same thing at all. That is a "language feature" level of unified memory: the code looks like both the CPU and GPU are working on the same memory, but at the hardware level they have separate address spaces, with the runtime copying memory between the two. There is no performance improvement; the cost is just hidden behind syntax.
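In other words, the managed allocation shown earlier is roughly shorthand for the explicit version below (simplified: the real runtime migrates at page granularity and on demand, and error checking is omitted):

```cpp
#include <cuda_runtime.h>

// Same illustrative kernel as in the earlier sketch.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// What the Unified Memory runtime effectively does for you on
// Kepler/Maxwell: separate allocations in two address spaces, with a
// copy in each direction. The copies do not go away; with
// cudaMallocManaged they just stop appearing in the source.
void scale_explicit(float *host, int n) {
    float *dev;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
}
```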
Maxwell was meant to bring properly unified address spaces. This is now pushed back to Pascal, which has mysteriously appeared on the roadmap with half the features Maxwell was meant to have...
Right.
Apparently a faster interconnect is needed, i.e. NVLink, hence no full-performance UM until Pascal.
Starting with CUDA 6, Unified Memory simplifies memory management by giving you a single pointer to your data, and automatically migrating pages on access to the processor that needs them. On Pascal GPUs, Unified Memory and NVLink will provide the ultimate combination of simplicity and performance. The full-bandwidth access to the CPU's memory system enabled by NVLink means that NVIDIA's GPU can access data in the CPU's memory at the same rate as the CPU can. With the GPU's superior streaming ability, the GPU will sometimes be able to stream data out of the CPU's memory system even faster than the CPU.
http://devblogs.nvidia.com/parallelforall/nvlink-pascal-stacked-memory-feeding-appetite-big-data/
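For what it's worth, this migrate-on-access model is how things eventually surfaced in the runtime: cudaMemPrefetchAsync and cudaCpuDeviceId are real additions that shipped with CUDA 8 for Pascal, while the kernel and sizes below are again just illustrative.

```cpp
#include <cuda_runtime.h>

// Same illustrative kernel as in the earlier sketches.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    int device;
    cudaGetDevice(&device);

    // On Pascal, managed pages migrate on demand via GPU page faults.
    // Prefetching hints the migration ahead of time instead of paying
    // a fault on first touch.
    cudaMemPrefetchAsync(data, n * sizeof(float), device);
    scale<<<(n + 255) / 256, 256>>>(data, n);

    // Hint the pages back toward the CPU for the next consumer.
    cudaMemPrefetchAsync(data, n * sizeof(float), cudaCpuDeviceId);
    cudaDeviceSynchronize();

    cudaFree(data);
    return 0;
}
```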
Turns out Volta remains on the roadmap, but it comes after Pascal and will evidently include more extensive changes to Nvidia's core GPU architecture.
Nvidia has inserted Pascal into its plans in order to take advantage of stacked memory and other innovations sooner. (I'm not sure we can say that Volta has been delayed, since the firm never pinned down that GPU's projected release date.) That makes Pascal intriguing even though its SM will be based on a modified version of the one from Maxwell. Memory bandwidth has long been one of the primary constraints for GPU performance, and bringing DRAM onto the same substrate opens up the possibility of substantial performance gains.
Compared to today's GPU memory subsystems, Huang claimed Pascal's 3D memory will offer "many times" the bandwidth, two and a half times the capacity, and four times the energy efficiency. The Pascal chip itself will not participate in the 3D stacking, but it will have DRAM stacks situated around it on the same package. Those DRAM stacks will be of the HBM type being developed at Hynix. You can see the DRAM stacks cuddled up next to the GPU in the picture of the Pascal test module below.