- Dec 10, 2010
The article states that this is to be used as video memory, not storage.
Anandtech said: "…with the bridge connecting the two PCIe x4 M.2 slots to the GPU, and allowing both cards to share the PCIe system connection."

It's like adding a bridge on the card for the x16 PCIe slot: give the GPU x8 and each of the two M.2 drives x4. The M.2 SSDs are still regular storage connected to the PCIe bus, but since the bridge is on the card, the GPU has faster access to them.
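Quick lane math on that idea (my own arithmetic, not from the article). Note that a plain lane split has to fit inside the slot's x16, whereas a PCIe switch chip like the PEX8747 the article mentions can, as I understand it, offer more downstream lanes than its upstream link, so the GPU can keep a full x16 behind it:

```python
# Sanity-check the proposed x8 + x4 + x4 split of a x16 slot (assumed
# figures). A plain bifurcation cannot hand out more lanes than the slot
# has; a PCIe switch can oversubscribe its upstream link instead.

def plain_bifurcation_fits(downstream_lanes, upstream=16):
    """True if a simple lane split fits within the upstream link."""
    return sum(downstream_lanes) <= upstream

print(plain_bifurcation_fits([8, 4, 4]))    # True: the split proposed above works
print(plain_bifurcation_fits([16, 4, 4]))   # False: full x16 to the GPU needs a switch
```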
Yeah, for high grade video this card looks awesome. I'm looking at this more from the CAD side, and the time savings should also be substantial. I can't wait to see the final product; saving many engineering work hours per week alone will make this card pay for itself in a matter of weeks or months. Add a backup overnight and it's golden.

Yeah, definitely a game changer for video professionals. Better-than-real-time 8K video rendering in such a small package is monumental. I've worked with some multi-TB lidar point cloud data and CAD files that could certainly benefit from this as well.
I don't see any reason why that wouldn't be technically possible. Sticking the thing on the same board as the GPU is the more elegant solution, though.

Let's be clear: why can't I get the same result using an x8/x8 motherboard? Those have PCIe bridges/switches to divide the CPU's x16 into two slots. Technically you could place a GPU in slot 1 and a PCIe SSD in slot 2 and have the same configuration as that Radeon with its built-in PCIe switch and M.2 slots.
Unless I'm missing something here, that is how it works.
Anandtech said: As AMD explains it, the purpose of going this route is to offer another solution to the workset size limitations of current professional graphics cards. Even AMD's largest card currently tops out at 32GB, and while this is a fair amount, there are workloads that can use more. This is particularly the case for workloads with massive datasets (oil & gas), or, as AMD demonstrated, scrubbing through an 8K video file.
Current cards can spill over to system memory, and while the PCIe bus is fast, it’s still much slower than local memory, plus it is subject to the latency of the relatively long trip and waiting on the CPU to address requests. Local NAND storage, by comparison, offers much faster round trips, though on paper the bandwidth isn’t as good, so I’m curious to see just how it compares to the real world datasets that spill over to system memory.
From: http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-polaris-with-m2-ssds-onboard

Anandtech said: The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec, while reading that same file from the system SSD was only averaging under 900MB/sec, which is lower than what we know 950 Pro can do in sequential reads. After putting some thought into it, I think AMD has hit upon the fact that most M.2 slots on motherboards are routed through the system chipset rather than being directly attached to the CPU. This not only adds another hop of latency, but it means crossing the relatively narrow DMI 3.0 (~PCIe 3.0 x4) link that is shared with everything else attached to the chipset.
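For reference, the link-rate arithmetic behind that DMI observation, as a back-of-the-envelope sketch (nominal rates, my own numbers, not from the article):

```python
# PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, giving roughly
# 0.985 GB/s of usable bandwidth per lane. DMI 3.0 is approximately a
# PCIe 3.0 x4 link shared by everything hanging off the chipset.

def pcie3_bandwidth_gbps(lanes: int) -> float:
    """Approximate usable PCIe 3.0 bandwidth in GB/s for a given lane count."""
    per_lane = 8e9 * (128 / 130) / 8 / 1e9  # 8 GT/s -> GB/s after encoding overhead
    return lanes * per_lane

x4_link = pcie3_bandwidth_gbps(4)    # one M.2 slot, or the whole DMI 3.0 link
x16_link = pcie3_bandwidth_gbps(16)  # the GPU's slot

print(f"x4  ~ {x4_link:.2f} GB/s")   # shared with everything else if routed via DMI
print(f"x16 ~ {x16_link:.2f} GB/s")
```

So even a perfect chipset-attached 950 Pro tops out well under what the SSG's on-card array demonstrated, before the DMI link's other traffic is counted.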
Bandwidth bottlenecks and chipset pci-e slots aside, that's still no reason a GPU couldn't directly access a pci-e drive with the proper software.
Deep Learning and other machine learning methods might be interesting use cases on such a card.
Judging by the back side of the PCB, the card shown seems to contain a Fiji GPU. Future cards might replace it with Vega.
http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-polaris-with-m2-ssds-onboard

Anandtech said: In terms of hardware, the Polaris based card is outfitted with a PCIe bridge chip – the same PEX8747 bridge chip used on the Radeon Pro Duo, I'm told – with the bridge connecting the two PCIe x4 M.2 slots to the GPU, and allowing both cards to share the PCIe system connection. Architecturally the prototype card is essentially a PCIe SSD adapter and a video card on a single board, with no special connectivity in use beyond what the PCIe bridge chip provides.
The SSDs themselves are a pair of 512GB Samsung 950 Pros, which are about the fastest thing available on the market today. These SSDs are operating in RAID-0 (striped) mode to provide the maximum amount of bandwidth. Meanwhile it turns out that due to how the card is configured, the OS actually sees the SSD RAID-0 array as well, at least for the prototype design.
To use the SSDs, applications need to be programmed using AMD’s APIs to recognize the existence of the local storage and that it is “special,” being on the same board as the GPU itself. Ultimately the trick for application developers is directly streaming resources from the SSDs treating it as a level of cache between the DRAM and system storage. The use of NAND in this manner does not fit into the traditional memory hierarchy very well, as while the SSDs are fast, on paper accessing system memory is faster still. But it should be faster than accessing system storage, even if it’s PCIe SSD storage elsewhere on the system. Similarly, don’t expect to see frame buffers spilling over to NAND any time soon. This is about getting large, mostly static resources closer to the GPU for more efficient resource streaming.
That's what I posted in the other thread when people were confused about the benefits of the SSG.
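The "level of cache between the DRAM and system storage" idea from the quote can be sketched in a few lines. To be clear, this is purely my illustration of the concept, not AMD's actual API; the class and names are made up:

```python
# Toy sketch of the tiering idea (hypothetical names, not AMD's API): an
# application keeps hot resources in VRAM, falls back to the on-card SSDs,
# and only crosses the system PCIe link as a last resort.

class TieredResourceStore:
    def __init__(self):
        self.vram = {}       # fastest tier, smallest capacity
        self.local_ssd = {}  # on-card NAND: slower than DRAM, faster than remote storage
        self.system = {}     # system storage, reached over the shared PCIe bus

    def put(self, name, data, tier="system"):
        getattr(self, tier)[name] = data

    def get(self, name):
        """Return (tier, data) from the fastest tier holding the resource."""
        for tier in ("vram", "local_ssd", "system"):
            store = getattr(self, tier)
            if name in store:
                return tier, store[name]
        raise KeyError(name)

store = TieredResourceStore()
store.put("frame_0001", b"...8K frame...", tier="local_ssd")
print(store.get("frame_0001")[0])  # served from the on-card SSD, not over the system bus
```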
"That's what I posted in the other thread when people were confused about the benefits of the SSG."

But as far as we know, the GPU is accessing the SSDs via the PCIe bus, and the SSDs are seen by the OS as a regular storage system.
You bypass the OS/CPU overhead whenever you need to access data in that storage, because the GPU is doing it directly, treating it like its own VRAM partition.
This is why they got ~800 to 900MB/s with the PCIe SSD on the first system, and ~4500MB/s on their setup. Not only is the transfer faster, it's not even using the main system bus, nor wasting CPU cycles on access/seek/transfer.
And that's with just one of these SSG FirePros processing a workload. If you need multiple GPUs working on large data without local storage, your system bus will be swamped and bottlenecked, and everything would be very slow.
For large datasets that normally overflow into system ram and beyond that into SSDs, this is unbeatable and a smart move.
Until RAM becomes dense enough to make 2TB easily doable & cheap, that is...
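A crude model of the multi-GPU point above, with all numbers assumed for illustration: one PCIe 3.0 x16 worth of shared host bandwidth, and ~4 GB/s of streaming demand per card as in the demo:

```python
# Sketch of the "swamped bus" argument (assumed figures, not measurements):
# if every GPU streams its dataset over the shared host link, demand scales
# with GPU count; with on-card SSDs each GPU streams from its own drives.

HOST_LINK_GBPS = 15.75      # roughly one PCIe 3.0 x16 of host bandwidth
PER_GPU_STREAM_GBPS = 4.0   # what one SSG's local array delivered in the demo

def host_link_utilization(n_gpus: int, local_ssd: bool) -> float:
    """Fraction of the shared host link consumed by resource streaming."""
    demand = 0.0 if local_ssd else n_gpus * PER_GPU_STREAM_GBPS
    return demand / HOST_LINK_GBPS

for n in (1, 2, 4):
    print(n, f"{host_link_utilization(n, local_ssd=False):.0%} without local SSDs")
# At four GPUs the streaming demand alone already exceeds the entire x16
# link, before any other system traffic is counted.
```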
"From what I've read, they aren't regular storage, but accessed via AMD's API for it; it's acting as a layer of cache for VRAM overflow. As regular storage, it would make little sense."

You need the API to take advantage of it and use it as cache for the GPU; other than that, they are just regular storage. Hell, in the AnandTech article they even said they are in RAID 0...
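On the RAID 0 point: striping is simple address math, which is why two drives can service one big sequential read in parallel. A toy sketch below; the stripe size is hypothetical, since AMD hasn't published the array's configuration:

```python
def stripe_location(byte_offset: int, stripe_size: int, n_drives: int):
    """Map a logical byte offset in a RAID-0 array to (drive index, offset on that drive)."""
    stripe_index = byte_offset // stripe_size
    drive = stripe_index % n_drives
    # Offset on the drive: full stripes it already holds, plus position in the current stripe
    drive_offset = (stripe_index // n_drives) * stripe_size + byte_offset % stripe_size
    return drive, drive_offset

# With two drives and a (hypothetical) 128 KiB stripe, consecutive stripes
# alternate between the SSDs, so a large sequential read hits both drives at
# once -- which is how two ~2 GB/s-class 950 Pros can sum to over 4 GB/s.
STRIPE = 128 * 1024
assert stripe_location(0, STRIPE, 2) == (0, 0)
assert stripe_location(STRIPE, STRIPE, 2) == (1, 0)
assert stripe_location(2 * STRIPE, STRIPE, 2) == (0, STRIPE)
```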
"But as far as we know, the GPU is accessing the SSDs via the PCIe bus, and the SSDs are seen by the OS as a regular storage system."

There is still more overhead involved, as it still involves going back to the CPU.
The difference seems to be in the software using those SSDs, which are plugged into the same PCIe link, directly as cache.
As I said earlier, if a PCIe bridge/switch is used on the card, there is no difference from using a GPU in slot 1 with a PCIe SSD in slot 2 on an x8/x8 motherboard: the CPU's PCIe lanes are divided by a bridge on the motherboard, only the GPU and SSD sit on that link, and it does exactly the same thing the card does.
"You need the API to take advantage of it and use it as cache for the GPU; other than that, they are just regular storage. Hell, in the AnandTech article they even said they are in RAID 0..."

I think that's the point: you do need an API to use it as local cache for the GPU, and that's what AMD's dev kit and open-source work on their new renderers are all about. These are pro-class GPUs; they only need this special performance in selected pro apps.