We have an SSD attached to a GPU now.


Stuka87

Diamond Member
Dec 10, 2010
6,240
2,559
136
I don't understand how this will work. Is it an SSD that only the GPU has access to? Otherwise it will just go over PCIe like any other M.2, though latency will be lower.

The article states that this is to be used as video memory, not storage.
 

Shivansps

Diamond Member
Sep 11, 2013
3,851
1,518
136
The article states that this is to be used as video memory, not storage.

…the same PEX8747 bridge chip used on the Radeon Pro Duo, I’m told – with the bridge connecting the two PCIe x4 M.2 slots to the GPU, and allowing both cards to share the PCIe system connection.

It's like adding a bridge on the card for the x16 PCIe link: give the GPU x8 and two x4 links for the M.2 slots. That means the M.2 SSDs are still regular storage connected to the PCIe bus, but since the bridge is on the card, the GPU has faster access to them.
(At the cost of adding latency to accesses to system RAM and system storage, because of the bridge.)

It does not seem like much, actually. Does this really make such a huge difference, as shown in the example? Unless I'm getting something wrong, it's just a PCIe bridge that divides the x16 PCIe bus into x8/x4/x4 for GPU/M.2/M.2... there are motherboards with bridges that do this already.
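As a rough back-of-the-envelope check (my own numbers, assuming PCIe 3.0's roughly 985 MB/s of usable bandwidth per lane), that x8/x4/x4 split works out like this:

Code:
# Rough PCIe 3.0 bandwidth estimate: 8 GT/s per lane with 128b/130b
# encoding gives roughly 985 MB/s of usable bandwidth per lane.
PCIE3_MBPS_PER_LANE = 985

links = [("GPU link (x8)", 8), ("M.2 slot 1 (x4)", 4),
         ("M.2 slot 2 (x4)", 4), ("full x16, for reference", 16)]
for label, lanes in links:
    print(f"{label}: ~{lanes * PCIE3_MBPS_PER_LANE / 1000:.1f} GB/s")

So each M.2 slot still gets the same ~3.9 GB/s it would get from a motherboard x4 slot; the raw per-slot bandwidth is the same either way.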
 
Last edited:

Yakk

Golden Member
May 28, 2016
1,574
275
81
Yeah, definitely a game changer for video professionals. Better-than-real-time 8K video rendering in such a small package is monumental. I've worked with some multi-TB lidar point cloud data and CAD files that could certainly benefit from this as well.

Yeah, for high-grade video this card looks awesome. I'm looking at this more from the CAD side, and the time savings should also be substantial. I can't wait to see the final product; saving many engineering work-hours per week alone will make this card pay for itself in a matter of weeks or months. Add an overnight backup and it's golden.
 

Shivansps

Diamond Member
Sep 11, 2013
3,851
1,518
136
Let's put it clearly: why can't I get the same result using an x8/x8 motherboard? Those have PCIe bridges/switches to divide the CPU's x16 into two slots. Technically you could place a GPU in slot 1 and a PCIe SSD in slot 2 and have the same configuration as that Radeon with its built-in PCIe switch and M.2 slots.

Unless I'm missing something here, that is how it works.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
In theory they would be similar. In practice, it's not that easy. Those buses are not empty; there is other traffic traversing them, among many other factors.

In practice, they've already demonstrated big speed-ups, so clearly there are bottlenecks elsewhere in the pipeline.
 

Bryf50

Golden Member
Nov 11, 2006
1,429
51
91
Let's put it clearly: why can't I get the same result using an x8/x8 motherboard? Those have PCIe bridges/switches to divide the CPU's x16 into two slots. Technically you could place a GPU in slot 1 and a PCIe SSD in slot 2 and have the same configuration as that Radeon with its built-in PCIe switch and M.2 slots.

Unless I'm missing something here, that is how it works.

I don't see any reason why that wouldn't be technically possible. Sticking the thing on the same board as the GPU is the more elegant solution, though.
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
Guys, RTFA
Anandtech said:
As AMD explains it, the purpose of going this route is to offer another solution to the workset size limitations of current professional graphics cards. Even AMD’s largest card currently tops out at 32GB, and while this is a fair amount, there are workloads that can use more. This is particularly the case for workloads with massive datasets (oil & gas), or as AMD demonstrated, scrubbing through an 8K video file.

Current cards can spill over to system memory, and while the PCIe bus is fast, it’s still much slower than local memory, plus it is subject to the latency of the relatively long trip and waiting on the CPU to address requests. Local NAND storage, by comparison, offers much faster round trips, though on paper the bandwidth isn’t as good, so I’m curious to see just how it compares to the real world datasets that spill over to system memory.

Anandtech said:
The performance differential was actually more than I expected; reading a file from the SSG SSD array was over 4GB/sec, while reading that same file from the system SSD was only averaging under 900MB/sec, which is lower than what we know 950 Pro can do in sequential reads. After putting some thought into it, I think AMD has hit upon the fact that most M.2 slots on motherboards are routed through the system chipset rather than being directly attached to the CPU. This not only adds another hop of latency, but it means crossing the relatively narrow DMI 3.0 (~PCIe 3.0 x4) link that is shared with everything else attached to the chipset.

From: http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-polaris-with-m2-ssds-onboard
 
Last edited:

Bryf50

Golden Member
Nov 11, 2006
1,429
51
91
Bandwidth bottlenecks and chipset PCIe slots aside, that's still no reason a GPU couldn't directly access a PCIe drive with the proper software.
 

Shivansps

Diamond Member
Sep 11, 2013
3,851
1,518
136
Bandwidth bottlenecks and chipset PCIe slots aside, that's still no reason a GPU couldn't directly access a PCIe drive with the proper software.


Trips are going to be shorter for sure, but not by enough to make that big of a difference. The key here is that the software needs to use AMD's API to recognize the SSDs that are on the card and use them that way, but they are still regular PCIe storage.
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Deep Learning and other machine learning methods might be interesting use cases on such a card.

The shown card seems to contain a Fiji GPU based on the PCB's back side. Future cards might replace it with Vega.
 

David_k

Member
Apr 25, 2016
70
1
41
Deep Learning and other machine learning methods might be interesting use cases on such a card.

The shown card seems to contain a Fiji GPU based on the PCB's back side. Future cards might replace it with Vega.

In terms of hardware, the Polaris based card is outfit with a PCIe bridge chip – the same PEX8747 bridge chip used on the Radeon Pro Duo, I’m told – with the bridge connecting the two PCIe x4 M.2 slots to the GPU, and allowing both cards to share the PCIe system connection. Architecturally the prototype card is essentially a PCIe SSD adapter and a video card on a single board, with no special connectivity in use beyond what the PCIe bridge chip provides.

The SSDs themselves are a pair of 512GB Samsung 950 Pros, which are about the fastest thing available on the market today. These SSDs are operating in RAID-0 (striped) mode to provide the maximum amount of bandwidth. Meanwhile it turns out that due to how the card is configured, the OS actually sees the SSD RAID-0 array as well, at least for the prototype design.

To use the SSDs, applications need to be programmed using AMD’s APIs to recognize the existence of the local storage and that it is “special,” being on the same board as the GPU itself. Ultimately the trick for application developers is directly streaming resources from the SSDs treating it as a level of cache between the DRAM and system storage. The use of NAND in this manner does not fit into the traditional memory hierarchy very well, as while the SSDs are fast, on paper accessing system memory is faster still. But it should be faster than accessing system storage, even if it’s PCIe SSD storage elsewhere on the system. Similarly, don’t expect to see frame buffers spilling over to NAND any time soon. This is about getting large, mostly static resources closer to the GPU for more efficient resource streaming.

http://www.anandtech.com/show/10518/amd-announces-radeon-pro-ssg-polaris-with-m2-ssds-onboard
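To make that "level of cache between the DRAM and system storage" idea concrete, here is a purely hypothetical toy sketch (my own code; none of these names are AMD's actual API) of what resource streaming against that hierarchy might look like from an application's point of view:

Code:
# Toy model of the cache hierarchy the article describes. Purely
# illustrative; these classes and names are not AMD's real API.

class Tier:
    """Stand-in for one storage/memory tier (VRAM, on-card SSD, system disk)."""
    def __init__(self, name):
        self.name, self.data = name, {}
    def contains(self, key):     return key in self.data
    def read(self, key):         return self.data[key]
    def write(self, key, value): self.data[key] = value

vram      = Tier("VRAM")          # fastest, but tops out at 32 GB on current pro cards
local_ssd = Tier("on-card SSD")   # ~GB/s, 1 TB, reached over the on-board bridge only
sys_disk  = Tier("system SSD")    # slowest path: chipset/DMI plus the shared system PCIe bus

def load_resource(key):
    """Stream a large, mostly static resource toward the GPU."""
    if vram.contains(key):                # already resident: nothing to stream
        return vram.read(key)
    if local_ssd.contains(key):           # on-card NAND: no round trip through the CPU
        data = local_ssd.read(key)
    else:                                 # fall back to system storage
        data = sys_disk.read(key)
        local_ssd.write(key, data)        # keep a copy on the card for next time
    vram.write(key, data)                 # upload to VRAM for the GPU to work on
    return data

# Example: a large dataset or 8K clip referenced by name
sys_disk.write("big_dataset", b"...")
load_resource("big_dataset")   # slow the first time, served from the card afterwards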
 
Feb 19, 2009
10,457
10
76

That's what I posted in the other thread when people were confused about the benefits of the SSG.

You bypass the OS/CPU overhead whenever you need to access data in that storage, because the GPU is doing it directly, treating it like its own VRAM partition.

This is why they had ~800 to 900 MB/s with the PCIe SSD on the first system, and 4500 MB/s on their setup. Not only is the transfer faster, it's not even using the main system bus, and no CPU cycles are wasted on the access/seek/transfer.

Think about what this means when it's just one of these SSG FirePros processing a workload; if you need multiple GPUs working on large data, your system bus would be swamped and bottlenecked, and it would be very slow.

For large datasets that normally overflow into system RAM and beyond that into SSDs, this is unbeatable and a smart move.

Until RAM becomes dense enough to make 2 TB easily doable and cheap, that is.
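Just to put those two bandwidth figures in perspective, here's my own quick arithmetic (the 10 GB chunk is an arbitrary example, not a number from the demo):

Code:
# Time to stream one chunk of data at the two bandwidths quoted above.
chunk_gb = 10  # arbitrary example size
for label, mb_per_s in [("system PCIe SSD (~900 MB/s)", 900),
                        ("on-card SSG array (~4500 MB/s)", 4500)]:
    seconds = chunk_gb * 1000 / mb_per_s
    print(f"{label}: ~{seconds:.1f} s per {chunk_gb} GB chunk")

Roughly 11 seconds versus just over 2 seconds for the same data, before you even count the CPU/OS overhead saved.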
 

Shivansps

Diamond Member
Sep 11, 2013
3,851
1,518
136
That's what I posted in the other thread when people were confused about the benefits of the SSG.

You bypass the OS/CPU overhead whenever you need to access data in that storage, because the GPU is doing it directly, treating it like its own VRAM partition.

This is why they had ~800 to 900 MB/s with the PCIe SSD on the first system, and 4500 MB/s on their setup. Not only is the transfer faster, it's not even using the main system bus, and no CPU cycles are wasted on the access/seek/transfer.

Think about what this means when it's just one of these SSG FirePros processing a workload; if you need multiple GPUs working on large data, your system bus would be swamped and bottlenecked, and it would be very slow.

For large datasets that normally overflow into system RAM and beyond that into SSDs, this is unbeatable and a smart move.

Until RAM becomes dense enough to make 2 TB easily doable and cheap, that is.

But as far as we know, the GPU is accessing the SSDs via the PCIe bus, and the SSDs are seen by the OS as a regular storage system.

The difference seems to be in the software directly using those SSDs, which are plugged into the same PCIe bus, as a cache.

As I said earlier, if a PCIe bridge/switch is used on the card, there is no difference from using a GPU in slot 1 with a PCIe SSD in slot 2 on an x8/x8 motherboard, thus dividing the CPU's PCIe bus with a bridge on the motherboard, having only the GPU and SSD on it, and doing the exact same thing the card does.
 
Last edited:
Feb 19, 2009
10,457
10
76
From what I've read, they aren't regular storage; they're accessed via AMD's API for this, acting as a layer of cache for VRAM overflow. As regular storage, it would make little sense.
 

Shivansps

Diamond Member
Sep 11, 2013
3,851
1,518
136
From what I've read, they aren't regular storage; they're accessed via AMD's API for this, acting as a layer of cache for VRAM overflow. As regular storage, it would make little sense.

You need the API to take advantage of it and use it as a cache for the GPU; other than that, they are just regular storage. Hell, in the AnandTech article they even said they are in RAID 0...

If they say there are no special connections, it means it's just that: an onboard PCIe bridge connected to the GPU and the M.2 disks. There's no reason why the OS can't access them, and no reason why we can't do the same on a motherboard with a PCIe bridge.
 

Adul

Elite Member
Oct 9, 1999
32,999
44
91
danny.tangtam.com
But as far as we know, the GPU is accessing the SSDs via the PCIe bus, and the SSDs are seen by the OS as a regular storage system.

The difference seems to be in the software directly using those SSDs, which are plugged into the same PCIe bus, as a cache.

As I said earlier, if a PCIe bridge/switch is used on the card, there is no difference from using a GPU in slot 1 with a PCIe SSD in slot 2 on an x8/x8 motherboard, thus dividing the CPU's PCIe bus with a bridge on the motherboard, having only the GPU and SSD on it, and doing the exact same thing the card does.

There is still more overhead involved, as it still means going back to the CPU.
 
Feb 19, 2009
10,457
10
76
You need the API to take advantage of it and use it as a cache for the GPU; other than that, they are just regular storage. Hell, in the AnandTech article they even said they are in RAID 0...

I think that's the point: you do need an API to use it as a local cache for the GPU, and that's what AMD's dev kit and open-source work on their new renderers are all about. These are pro-class GPUs; they only need to deliver this special performance in selected pro apps.
 

Shivansps

Diamond Member
Sep 11, 2013
3,851
1,518
136
There is still more overhead involved, as it still means going back to the CPU.

Not really, that's what the API is for. This is what I mean.

[Image: NsQ8NFh.png]


You will still have a little more latency because it's a longer trip, but electrically it's the same thing.
 
Last edited:

Yakk

Golden Member
May 28, 2016
1,574
275
81
One thing to remember: at 4,500 MB/s that information is not only read/written, but also completely processed by the GPU and displayed on the screen. Even if all this information is accessed and processed asynchronously and massively in parallel to reduce latency, there is still the processing time.
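Even with asynchronous access, the best case is that the transfer hides behind the processing rather than disappearing. A generic double-buffering sketch of that overlap (my own toy code, nothing specific to AMD's API):

Code:
# Generic double buffering: read the next chunk while the current one is
# being processed, so total time per chunk approaches max(read, process)
# rather than read + process.
from concurrent.futures import ThreadPoolExecutor

def read_chunk(i):
    # stand-in for reading chunk i from the on-card SSD array
    return f"chunk {i}"

def process_chunk(chunk):
    # stand-in for the GPU decode/filter/display work
    print("processed", chunk)

def stream(num_chunks):
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_chunk, 0)              # kick off the first read
        for i in range(num_chunks):
            current = pending.result()                  # wait for chunk i
            if i + 1 < num_chunks:
                pending = io.submit(read_chunk, i + 1)  # prefetch chunk i+1 while...
            process_chunk(current)                      # ...chunk i is being processed

stream(4)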