Here’s the problem in a nutshell: Current desktops can scale up to 64GB of RAM, while high-end workstations can theoretically address up to 1,536GB of memory. Graphics cards, by contrast, are limited to a fraction of that: 32GB, as of this writing. Worse, the GPU can’t simply treat system RAM as its own. If you want to run a workload on the GPU, you either have to pull the entire data set into GPU memory first or reach across the comparatively high-latency PCI Express bus for every access.
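To make that tradeoff concrete, here’s a minimal CUDA sketch of the two conventional options (CUDA rather than AMD’s own stack, purely because the API is familiar; the kernel, buffer size, and launch geometry are arbitrary stand-ins): stage the working set in GPU memory with an explicit copy, or map pinned host RAM so the kernel reaches across PCI Express on every access.

```cuda
#include <cuda_runtime.h>

// Trivial stand-in kernel: doubles every element it touches.
__global__ void scale(float *data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    // Allow kernels to dereference mapped host memory (error checks
    // omitted for brevity).
    cudaSetDeviceFlags(cudaDeviceMapHost);

    const size_t n = 1 << 24;                // ~16M floats, arbitrary
    const size_t bytes = n * sizeof(float);

    // Option 1: pull the whole working set into GPU memory up front.
    float *host, *dev;
    cudaMallocHost((void **)&host, bytes);   // pinned staging buffer
    cudaMalloc((void **)&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, n);

    // Option 2: zero-copy -- the kernel dereferences host RAM directly,
    // paying PCI Express latency on every access that misses cache.
    float *mapped, *mappedDev;
    cudaHostAlloc((void **)&mapped, bytes, cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&mappedDev, mapped, 0);
    scale<<<(n + 255) / 256, 256>>>(mappedDev, n);

    cudaDeviceSynchronize();
    cudaFree(dev);
    cudaFreeHost(host);
    cudaFreeHost(mapped);
    return 0;
}
```

Either way, the data crosses the same PCI Express link; the difference is whether you pay that cost once, up front, or on every access.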
Moving workloads into on-card NAND flash addresses the latency problem: AMD claims the GPU can reach that flash over the M.2 interface at much lower latency than it can pull data across the PCI Express bus from system RAM. Based on what we know of the GPU’s path to RAM, that’s probably true. Less clear is whether there’s any bandwidth advantage to this kind of access. The prototype uses a pair of Samsung 950 512GB drives in RAID 0, and since each drive rides a PCIe 3.0 x4 link, the pair has theoretical access to eight lanes of PCI Express connectivity. That’s still just half of a standard x16 PCI Express 3.0 slot (roughly 7.9GB/s versus 15.8GB/s of theoretical throughput), so latency rather than bandwidth may be the distinguishing factor here.
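For anyone who wants to check that math, here’s a quick back-of-the-envelope sketch (plain host code, same toolchain as the snippet above). The per-lane figure is the standard PCIe 3.0 number: 8GT/s signaling with 128b/130b encoding works out to roughly 0.985GB/s per lane, and the lane counts match the prototype as described.

```cuda
#include <stdio.h>

int main(void) {
    // PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 8 bits per byte.
    const double gb_per_lane = 8.0 * (128.0 / 130.0) / 8.0;  // ~0.985 GB/s

    const int ssd_lanes  = 2 * 4;  // two Samsung 950s, x4 each, in RAID 0
    const int slot_lanes = 16;     // standard PCIe 3.0 x16 slot

    printf("SSD array: %2d lanes = %5.2f GB/s theoretical\n",
           ssd_lanes, ssd_lanes * gb_per_lane);
    printf("x16 slot : %2d lanes = %5.2f GB/s theoretical\n",
           slot_lanes, slot_lanes * gb_per_lane);
    return 0;
}
```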