The company presented an impressive demo that compared the performance of a normal workstation GPU with its new SSD-powered Radeon Pro SSG. The standard configuration was only able to provide 17 FPS with 8K video, while the SSG was able to scrub the 8K video in real time at 92 FPS. The quick demo and lack of further details have spawned incredible interest in the new architecture, so we followed up with AMD for more on the specifics on how the system works.
Moving data between storage and the GPU requires the CPU to stage the data in system DRAM, which consumes DRAM capacity and bandwidth, along with CPU cycles. There are techniques to reduce the overhead and additional latency that the extra steps incur, such as peer-to-peer (p2p) messaging techniques between NVMe SSDs and GPUs (see Project Donard). These techniques provide faster access to the adjacent memory/storage pools, but still require traversing the PCIe bus, which always incurs performance penalties. On a side note, Nvidia's NVLINK technology also uses a p2p technique to enhance communication performance between two GPUs, but it differs architecturally from p2p techniques that employ SSDs.
AMD's solution is simple from a hardware standpoint. The company connected two M.2 slots (PCIe 3.0 x4) directly to the GPU with a PEX8747 PCIe bridge chip, which creates a "private" PCIe bus and gives the GPU complete p2p control of the SSDs (and not the CPU/OS). AMD claims this reduces latency up to 10x and reduces PCIe bus contention.
Either the GPU can have its own private storage volume that is hidden from the OS, or it can share the SSD with the operating system, which adds several interesting possibilities for sharing the same data set with the CPU for additional processing. The current developer kit allows the operating system to see the storage pool, but AMD indicated that developers could use it as a private or shared pool in the future. AMD uses two 512GB Samsung 950 PRO SSDs in a RAID 0 configuration, which provides the bandwidth needed to use the storage as a 1TB extended frame buffer.
An interesting side note: AMD currently employs HSA (Heterogeneous System Architecture) in its APUs. The HSA framework provides numerous benefits that allow the disparate components (CPU, GPU, memory) to function as one heterogeneous unit. One of HSA's most compelling features is its ability to allow the CPU and the GPU to simultaneously work on the same sections of the memory, which eliminates the need to shuffle data back and forth, as noted in our recent coverage.
http://www.tomshardware.com/news/amd-radeon-pro-ssg,32365.html
quite good read