Info DirectStorage 1.1 benchmark


Aapje

Golden Member
Mar 21, 2022
1,385
1,865
106
Seriously, we made fun of Forspoken and it's the only game that used this tech properly if not at all. I did recent runs with the demo and it runs like a champ!
I think that the issue is that they pretty much need to start using it from the beginning, so with development taking multiple years and Direct Storage only being ready in 2022, it's mostly limited to demos for now.

Forspoken was made on the Luminous Engine, which is used for almost nothing else, but whose developers seem to have been working with MS to implement Direct Storage before it was actually released. Meanwhile, Unreal Engine doesn't even have support yet, so it could be a good long while until games on that engine have Direct Storage.
 

soresu

Platinum Member
Dec 19, 2014
2,665
1,865
136
UE5 virtualises most assets like textures and geometry so that they are only streamed as needed. Consequently it doesn't need Direct Storage as much, and Epic's main focus currently seems to be on improving Lumen and Nanite, which are still pretty rough around the edges.
 

Aapje

Golden Member
Mar 21, 2022
1,385
1,865
106
Streaming on demand is actually exactly what Direct Storage is intended to greatly improve. The classic storage APIs have a lot of overhead, but this wasn't a big deal when we had HDDs, because those have a lot of overhead for every request as well. After all, they need to move the head to the right location on the disc. This is a physical operation that takes a relatively long time. So HDDs aren't suitable for reading lots of small things from random locations, which is why games have traditionally read an entire collection of assets and then kept it in memory.

This is not particularly efficient since you often read and store much more than is needed at that time, but it is more efficient than reading all assets separately and having huge seek costs for each asset.

In contrast, NVMe drives have no heads to move, so their access latency is far lower and the entire calculation changes: it now makes sense to just get what you need, which also means that you use the bus and VRAM more efficiently.
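To put rough numbers on that trade-off, here is a back-of-the-envelope model in Python. Every figure in it (latencies, bandwidths, asset sizes) is an illustrative assumption rather than a measurement from any drive in this thread, and the model naively serves requests one after another.

```python
def read_time_s(num_requests, bytes_per_request, latency_s, bandwidth_bps):
    """Time to serve requests back to back: fixed latency plus transfer, per request."""
    return num_requests * (latency_s + bytes_per_request / bandwidth_bps)

# Hypothetical figures, chosen only to show the shape of the trade-off.
HDD = {"latency_s": 10e-3, "bandwidth_bps": 200e6}   # ~10 ms seek, ~200 MB/s
NVME = {"latency_s": 50e-6, "bandwidth_bps": 5e9}    # ~0.05 ms access, ~5 GB/s

ASSET_BYTES = 64 * 1024   # one small asset
PACK_ASSETS = 1000        # assets stored together in one pack file
NEEDED_ASSETS = 100       # assets the scene actually needs right now

def compare(drive):
    """Return (bulk_seconds, on_demand_seconds) for one drive profile."""
    bulk = read_time_s(1, PACK_ASSETS * ASSET_BYTES, **drive)
    on_demand = read_time_s(NEEDED_ASSETS, ASSET_BYTES, **drive)
    return bulk, on_demand

for name, drive in (("HDD", HDD), ("NVMe", NVME)):
    bulk, on_demand = compare(drive)
    print(f"{name}: whole pack {bulk * 1e3:.1f} ms, "
          f"{NEEDED_ASSETS} separate reads {on_demand * 1e3:.1f} ms")
```

Under these assumptions the HDD is better off reading the whole pack in one go, while the NVMe drive is better off fetching only the assets it needs.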

However, the classic storage APIs were designed for HDDs where you only have a few read operations at a time and where each important read operation is a bulk operation with large seek times, so it doesn't matter that much if the storage API is a bit slow or if you store a relatively large amount of data in memory to keep track of the request.

In contrast, with NVMes, you ideally want to be able to just fire off a ton of separate IO requests with low overhead. This is why Direct Storage was developed: it can very efficiently handle a huge number of requests without bogging down.
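The "many small requests" pattern can be illustrated without Direct Storage itself. The sketch below is not the DirectStorage API; it is a plain Python stand-in that issues many independent (offset, length) reads against one file through a thread pool. It assumes a POSIX system where `os.pread` is available, and the name `read_many` and the request layout are made up for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_many(path, requests, workers=8):
    """Serve many (offset, length) reads from one file concurrently.

    Returns the chunks in the same order as `requests`.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # os.pread takes an explicit offset, so threads can share one
            # descriptor without fighting over a shared file position.
            return list(pool.map(lambda r: os.pread(fd, r[1], r[0]), requests))
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Build a throwaway "asset pack" of 256 fixed-size records.
    record = 4096
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for i in range(256):
            f.write(bytes([i]) * record)
        pack = f.name
    try:
        wanted = [(i * record, record) for i in (3, 200, 17, 42)]
        chunks = read_many(pack, wanted)
        print([c[0] for c in chunks])  # prints [3, 200, 17, 42]
    finally:
        os.remove(pack)
```

A real implementation would batch requests into a submission queue rather than spend a thread per read, which is the part Direct Storage is meant to make cheap.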

It is possible for UE5 to stream assets from RAM to the GPU without having Direct Storage, but I don't see how they can stream from the NVMe to the RAM. They will still have to do bulk reads and thus have long load times.

Note that the end goal, which we'll probably only achieve step by step, is for assets to go directly from the NVMe to the GPU. To do that, we need GPU decompression (and compression formats optimized for GPUs) so the GPU can unpack the textures without needing the CPU to do it, and we need GPUs to have their own Direct Storage implementation, so they can retrieve individual textures and such with very low overhead.

This removes a lot of very inefficient data movement, because right now a large collection of textures goes from the NVMe to the CPU which then sends it to the RAM, then when the GPU needs the texture, it asks the CPU for it, which then retrieves it from RAM, decompresses it and sends it to the VRAM. Because it is uncompressed, this takes a long time to send. In the future, the compressed texture will go directly from the NVMe to VRAM, which will mean less load on the CPU and much faster load times.
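As a rough illustration of why the path matters, the sketch below counts the bytes moved per texture on each route as described above. The 2:1 compression ratio and the 16 MiB texture size are assumptions picked for the example, and the function and path names are hypothetical.

```python
MIB = 1024 * 1024

def bytes_moved(compressed_bytes, ratio, path):
    """Bytes moved per hop for one texture.

    ratio is uncompressed_size / compressed_size.
    """
    uncompressed = compressed_bytes * ratio
    if path == "cpu_decompress":
        # Compressed data goes storage -> RAM; the CPU unpacks it, and the
        # uncompressed result then travels RAM -> VRAM.
        return {"storage_to_ram": compressed_bytes, "ram_to_vram": uncompressed}
    if path == "gpu_decompress":
        # Compressed data travels all the way to VRAM; the GPU unpacks it there.
        return {"storage_to_vram": compressed_bytes}
    raise ValueError(f"unknown path: {path}")

for path in ("cpu_decompress", "gpu_decompress"):
    hops = bytes_moved(16 * MIB, ratio=2, path=path)
    total_mib = sum(hops.values()) / MIB
    print(f"{path}: {total_mib:.0f} MiB moved in {len(hops)} hop(s)")
```

The gap grows with the compression ratio, since only the CPU route ever pushes the uncompressed texture across the bus.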
 

soresu

Platinum Member
Dec 19, 2014
2,665
1,865
136
On that subject this recent move from nVidia should alleviate some of the load:

Random Access Neural Texture Compression


 

NikosD

Junior Member
Oct 18, 2014
11
3
71
I am really struggling to find a free site to upload something to share, but I just rebuilt it from the current code which seems even faster so please try this link: https://uploadnow.io/f/YwWxh03

Please let me know if that link works or please suggest another site to try.
The link has expired.

Also the source has been updated to DirectStorage 1.2.1

Maybe it's a good time to recompile it and upload it to a more stable host like OneDrive, with no expiration date.
 

utahraptor

Golden Member
Apr 26, 2004
1,053
199
106
I'll try to reinstall Visual Studio and fight it again after I mow the lawn 🤣
 

AdamK47

Lifer
Oct 9, 1999
15,233
2,852
126
ChatGPT tells me you are getting that error because I compiled it in Debug mode rather than Release mode. I have recompiled it in Release Mode:

BulkLoadDemo 1.2.1
It works now.

About the same for me as the original 1.1 benchmark. Not much room for improvement for me it seems.

Inland 2TB TD510 PCI-E 5.0 NVMe:
1687312058667.png


Three 8TB Sabrent Rocket Q PCI-E 3.0 NVMe in 24TB RAID-0:
1687312151779.png
 

psolord

Golden Member
Sep 16, 2009
1,920
1,194
136
Semi off/on topic, but it seems Ratchet & Clank will be the first Direct Storage 1.2 game for the PC.


No SSD required they say, lol.

I swear to god, some console... fans... were saying that something like R&C would need 8GB/sec storage and a 3950X to play properly, rofl.

And there's a related story about an nvidia driver that improves direct storage performance, something something..
 

CakeMonster

Golden Member
Nov 22, 2012
1,392
501
136
19GB/s on my 7950X + 4090 + SN850X 4TB.
1.8GB/s on my 16TB ST Exos HDD on the same computer.

Does the CPU usage number have any practical use?

Edit: Tried closing and running it again and now it ends up at 13GB/s on every run. Something weird is going on. I might update this when I reboot, in case something is interfering.
 

CakeMonster

Golden Member
Nov 22, 2012
1,392
501
136
Rebooted. Now it gives me consistently 25GB/s. Left it on for 20 minutes, and no degradation. I'm guessing there was something keeping the GPU or SSD busy even though it wasn't showing up in task manager. Possibly some of the AI applications I play with that take up VRAM.
 

BFG10K

Lifer
Aug 14, 2000
22,709
2,972
126
Ratchet & Clank load times, the first game to use Direct Storage 1.2:


Virtually no difference between NVMe and SATA, and even the HDD is okay once the initial long load is done. This is much more realistic than a meaningless synthetic benchmark.

If I was interested in getting the game, it'd be fun to test it on my 10,000 RPM VelociRaptor
 

Makaveli

Diamond Member
Feb 8, 2002
4,723
1,058
136
Only playable on an HDD once you have played through it once and everything is cached. There will still be intermittent pauses.

Even a regular SATA SSD isn't immune to some pausing.

 

BFG10K

Lifer
Aug 14, 2000
22,709
2,972
126
Only playable on an HDD once you have played through it once and everything is cached. There will still be intermittent pauses.
I never said playing on an HDD was the best choice, just that it was "okay". It's completely possible to finish the game, as shown by numerous videos.

Also DirectStorage provides a performance gain on HDDs in at least one game.

Even a regular SATA SSD isn't immune to some pausing.
NVMe pauses as well. If it didn't there'd be no load screens.

Even when it's cached in RAM there are still load screens, which shows that NVMe speed makes no difference there, because RAM is far faster than any of them.
 

Tup3x

Senior member
Dec 31, 2016
965
951
136
NVMe pauses as well. If it didn't there'd be no load screens.
The fact that there's no difference between gen 4 and gen 3 NVMe drives makes me wonder if it's a deliberate delay.

I wouldn't say that an HDD is even okay. It has to be really fast, or the stutters are severe and you might end up falling through the floor, etc. A SATA SSD would offer an okay experience.
 

Makaveli

Diamond Member
Feb 8, 2002
4,723
1,058
136
I just added a Western Digital 2TB SN850X to my system and reran this.

Adrenalin 23.7.2 drivers

Corsair 1TB MP600: Sequential Read 4,950 MBps Sequential Write 4,250 MBps
1691035545564.png


Western Digital 2TB SN850X: Sequential Read 7,300 MBps Sequential Write 6,600 MBps
1691035646652.png
 