We have a SSD attached to a GPU now.

Stuka87 · Jul 26, 2016

Shivansps said:
I dont understand how this will work, its an SSD where only the GPU has access to? Otherwise it will just go over PCI-E like any other M2, latency will be lower trought.

The article states that this is to be used as video memory, not storage.

Shivansps · Jul 26, 2016

Stuka87 said:
The article states that this is to be used as video memory, not storage.

, I’m told – with the bridge connecting the two PCIe x4 M.2 slots to the GPU, and allowing both cards to share the PCIe system connection.

its like, add a bridge on the card for the X16 PCI-E, give the card x8, and 2 x4 for the M2, meaning, the M2 SSD are still regular storage connected to the PCI-E bus, and since the bridge is on the card, the GPU has faster access to them.
-At the cost of adding latency in the access to system ram, and system storage because of the bridge-

It does not seems like much actually, this really makes such a huge difference as show on the example? Unless im getting something wrong, its just a PCI-E bridge that divides the X16 pci-e bus intro x8,x4,x4 for GPU/M2/M2... there are motherboards with bridges that do this already.

Yakk · Jul 26, 2016

therealnickdanger said:
Yeah, definitely a game changer for video professionals. Better than real-time 8K video rendering in such a small package is monumental. I've worked with some multi-TB lidar point cloud data and CAD files that could certainly benefit from this as well.

Yeah, for high grade video this card looks awesome. I'm looking at this more from the CAD side and the time savings should also be substantial. I can't wait to see the final product, saving many engineering work hours per week alone will make this card pay for itself in a matter of weeks/months. Add a backup overnight and it's golden.

Shivansps · Jul 26, 2016

Lets put it clear, why i cant get the same result using a 8x/8x MB? those has pci-e bridges/switch to divide the 16x of the CPU intro 2 slots, tecnically you could place a GPU on slot 1, and a PCI-E SSD on slot 2 and have the same configuration as that radeon with built-in pci-e switch and M2 slots.

Unless im missing something here, that is how it works.

Headfoot · Jul 26, 2016

In theory they would be similar. In practice, its not that easy. Those buses are not empty, there is other traffic traversing them, amongst many other factors.

In practice, they've already demonstrated big speed ups, so clearly there are bottlenecks elsewhere in the pipeline

Bryf50 · Jul 26, 2016

Shivansps said:
Lets put it clear, why i cant get the same result using a 8x/8x MB? those has pci-e bridges/switch to divide the 16x of the CPU intro 2 slots, tecnically you could place a GPU on slot 1, and a PCI-E SSD on slot 2 and have the same configuration as that radeon with built-in pci-e switch and M2 slots.

Unless im missing something here, that is how it works.

I don't see any reason why that wouldn't be technically possible. Sticking the thing on the same board as the gpu is the more elegant solution though.

Headfoot · Jul 26, 2016

Guys, RTFA

Anandtech said:
As AMD explains it, the purpose of going this route is to offer another solution to the workset size limitations of current professional graphics cards. Even AMD’s largest card currently tops out at 32GB, and while this is a fair amount, there are workloads that can use more. This is particular the case for workloads with massive datasets (oil & gas), or as AMD demonstrated, scrubbing through an 8K video file.

Current cards can spill over to system memory, and while the PCIe bus is fast, it’s still much slower than local memory, plus it is subject to the latency of the relatively long trip and waiting on the CPU to address requests. Local NAND storage, by comparison, offers much faster round trips, though on paper the bandwidth isn’t as good, so I’m curious to see just how it compares to the real world datasets that spill over to system memory.

Anandtech said:
The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec, while reading that same file from the system SSD was only averaging under 900MB/sec, which is lower than what we know 950 Pro can do in sequential reads. After putting some thought into it, I think AMD has hit upon the fact that most M.2 slots on motherboards are routed through the system chipset rather than being directly attached to the CPU. This not only adds another hop of latency, but it means crossing the relatively narrow DMI 3.0 (~PCIe 3.0 x4) link that is shared with everything else attached to the chipset.

From: http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-polaris-with-m2-ssds-onboard

Bryf50 · Jul 26, 2016

Bandwidth bottlenecks and chipset pci-e slots aside, that's still no reason a GPU couldn't directly access a pci-e drive with the proper software.

Shivansps · Jul 26, 2016

They also seem to be comparing a RAID 0 SSD on the card to a single system SSD.

Shivansps · Jul 26, 2016

Bryf50 said:
Bandwidth bottlenecks and chipset pci-e slots aside, that's still no reason a GPU couldn't directly access a pci-e drive with the proper software.

trips are gona be shorter for sure, but not enoght to make that big of a difference, the key here is that the software needs to use AMD api to recognise the SSD that are on the card and use them in that way, but they are still regular PCI-E storage.

Dresdenboy · Jul 26, 2016

Deep Learning and other machine learning methods might be interesting use cases on such a card.

The shown card seems to contain a Fiji GPU based on the PCB's back side. Future cards might replace it with Vega.

David_k · Jul 26, 2016

Dresdenboy said:
Deep Learning and other machine learning methods might be interesting use cases on such a card.

The shown card seems to contain a Fiji GPU based on the PCB's back side. Future cards might replace it with Vega.

In terms of hardware, the Polaris based card is outfit with a PCIe bridge chip – the same PEX8747 bridge chip used on the Radeon Pro Duo, I’m told – with the bridge connecting the two PCIe x4 M.2 slots to the GPU, and allowing both cards to share the PCIe system connection. Architecturally the prototype card is essentially a PCIe SSD adapter and a video card on a single board, with no special connectivity in use beyond what the PCIe bridge chip provides.

The SSDs themselves are a pair of 512GB Samsung 950 Pros, which are about the fastest thing available on the market today. These SSDs are operating in RAID-0 (striped) mode to provide the maximum amount of bandwidth. Meanwhile it turns out that due to how the card is configured, the OS actually sees the SSD RAID-0 array as well, at least for the prototype design.

To use the SSDs, applications need to be programmed using AMD’s APIs to recognize the existence of the local storage and that it is “special,” being on the same board as the GPU itself. Ultimately the trick for application developers is directly streaming resources from the SSDs treating it as a level of cache between the DRAM and system storage. The use of NAND in this manner does not fit into the traditional memory hierarchy very well, as while the SSDs are fast, on paper accessing system memory is faster still. But it should be faster than accessing system storage, even if it’s PCIe SSD storage elsewhere on the system. Similarly, don’t expect to see frame buffers spilling over to NAND any time soon. This is about getting large, mostly static resources closer to the GPU for more efficient resource streaming.

http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-polaris-with-m2-ssds-onboard

Silverforce11 · Jul 26, 2016

Headfoot said:
Guys, RTFA

From: http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-polaris-with-m2-ssds-onboard

That's what I posted in the other thread when people were confused about the benefits of the SSG.

You bypass the OS/CPU overhead whenever you need to access data in that storage because the GPU is doing it directly, treating it like it's own vram partition.

This is why they had ~800 to 900MB/s with the PCIe SSD on the first system, and 4500MB/s on their setup. Not only is the extra transfer faster, it's NOT even using the main system bus nor CPU cycles wasted for the access/seek/transfer.

Think about what this means when it's just 1 of these SSG Firepros processing a workload, if you need multiple GPUs working on large data, your system bus will be swamped and bottlenecked, it would be very slow.

For large datasets that normally overflow into system ram and beyond that into SSDs, this is unbeatable and a smart move.

Until RAM becomes dense enough to make 2TB easily doable & cheap that is..

Shivansps · Jul 26, 2016

Silverforce11 said:
That's what I posted in the other thread when people were confused about the benefits of the SSG.

You bypass the OS/CPU overhead whenever you need to access data in that storage because the GPU is doing it directly, treating it like it's own vram partition.

This is why they had ~800 to 900MB/s with the PCIe SSD on the first system, and 4500MB/s on their setup. Not only is the extra transfer faster, it's NOT even using the main system bus nor CPU cycles wasted for the access/seek/transfer.

Think about what this means when it's just 1 of these SSG Firepros processing a workload, if you need multiple GPUs working on large data, your system bus will be swamped and bottlenecked, it would be very slow.

For large datasets that normally overflow into system ram and beyond that into SSDs, this is unbeatable and a smart move.

Until RAM becomes dense enough to make 2TB easily doable & cheap that is..

But as far as we know, the GPU is accessing the SSDs via PCI-E bus, and the SSD are seem by OS as a regular storage system.

Difference seems to be in software using those SSD that are plugged intro the same PCI-E as cache directly.

As i said earlier, if a PCI-E bridge/switch is used on the card, there is no difference from using a GPU on SLOT 1 with a PCIE SSD on SLOT 2, on a motherboard that is 8/8, thus dividing the CPU PCI-E bus by a bridge on the MB and only having the GPU and SSD on that one, and doing the exact same thing as the card does.

Silverforce11 · Jul 26, 2016

From what I've read, they aren't regular storage, but accessed via AMD's API for it, it's acting as a layer of cache for vram overflow. As a regular storage, it would make little sense.

Shivansps · Jul 26, 2016

Silverforce11 said:
From what I've read, they aren't regular storage, but accessed via AMD's API for it, it's acting as a layer of cache for vram overflow. As a regular storage, it would make little sense.

You need the API to take avantage of it and use it as cache for the gpu, other than that, they are just regular storage, hell on anadtech article they even said they are on RAID 0...

If they say there is no special connections it means its just that, a onboard PCI-E bridge connected to GPU and M2 disks. No reason why the OS cant access them, and there is no reason of why we cant do the same on a MB with a PCI-E bridge.

Adul · Jul 26, 2016

Shivansps said:
But as far as we know, the GPU is accessing the SSDs via PCI-E bus, and the SSD are seem by OS as a regular storage system.

Difference seems to be in software using those SSD that are plugged intro the same PCI-E as cache directly.

As i said earlier, if a PCI-E bridge/switch is used on the card, there is no difference from using a GPU on SLOT 1 with a PCIE SSD on SLOT 2, on a motherboard that is 8/8, thus dividing the CPU PCI-E bus by a bridge on the MB and only having the GPU and SSD on that one, and doing the exact same thing as the card does.

There is still more overhead involved as it still involves going back to the CPU.

Silverforce11 · Jul 26, 2016

Shivansps said:
You need the API to take avantage of it and use it as cache for the gpu, other than that, they are just regular storage, hell on anadtech article they even said they are on RAID 0...

I think that's the point, you do need an API to use it as local cache for the GPU and that's AMD's dev kit and open source work on their new renderers are all about. These are pro class GPUs, they only need to have this special performance in selected pro apps.

Bryf50 · Jul 26, 2016

Adul said:
There is still more overhead involved as it still involves going back to the CPU.

Not necessarily. Pci-e devices can directly access each other. For example, XDMA for crossfire.

Silverforce11 · Jul 26, 2016

Bryf50 said:
Not necessarily. Pci-e devices can directly access each other. For example, XDMA for crossfire.

And that's why it's possible for AMD to do this.

Shivansps · Jul 26, 2016

Adul said:
There is still more overhead involved as it still involves going back to the CPU.

No really, thats what the API is for. This is what i mean.

You will still have a little more latency because its a longer trip, but eletrically its the same thing.

Yakk · Jul 26, 2016

One thing to remember; at 4,500 MB/s that information is not only read/written, but completely processed by the GPU and displayed on the screen. Even if all this information is accessed and processed Asynchronously massively in parallel to reduce latency there is still the processing time.

jhu · Jul 26, 2016

boozzer said:
haha, why not

Indeed. LLVM has a GCN backend. There are also a few other GCN assemblers. Here's one of them. I think there's also a GCN C compiler, although you should theoretically be able to output C code to GCN using LLVM.

Elixer · Jul 26, 2016

Dresdenboy said:
The shown card seems to contain a Fiji GPU based on the PCB's back side. Future cards might replace it with Vega.

Yep, it was, and yeah, that is the theory.

sweetusernames · Jul 26, 2016

Does anyone know the full specs of what they compared the Radeon Pro SSG to?

We have a SSD attached to a GPU now.

Diamond Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Golden Member

Member

Lifer

Diamond Member

Lifer

Diamond Member

Elite Member

Lifer

Golden Member

Lifer

Diamond Member

Golden Member

Lifer

Lifer

Member