
[Nvidia] GameWorks DX12 Do's and Don'ts

Can you at least try to have an objective, unbiased discussion about this, without the usual characters here engaging in the usual agendas?
-- stahlhart
 
Don’t toggle between compute and graphics on the same command queue more than absolutely necessary

OK, so what percentage of async compute should a developer use?

This "don't" is vague.
 
Here are the interesting parts ...

They specifically advise using root signatures as much as possible, while MICROSOFT recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...

They also recommend against using ROVs and that you should look at alternatives like advanced blending operations too ...

I'm guessing ROVs are practically free on Intel architectures ...
 
https://developer.nvidia.com/dx12-dos-and-donts

Nvidia listed a bunch of Do's and Don'ts for DX12.

This one is interesting:


First rule of GameWorks: You do not talk about Async Compute.
Second Rule of GameWorks: You DO NOT talk about Async Compute.

Also, when they mention toggling tessellation on and off, they again say to do it as little as possible, for the same reasons. Nvidia destroys AMD at tessellation performance, so tinfoil hat engaged.

Nvidia is simply posting good coding strategies to maximize their GPUs' potential and minimize wasted execution. It's really nothing exciting to talk about in the context in which you wish it could be applied.
 
Here are the interesting parts ...

They specifically advise using root signatures as much as possible, while MICROSOFT recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...
It's more of a design issue, especially for constant data. I think the GeForce hardware is still only able to use a 64KB register file for storing these, while GCN can use as much as 2GB, because the architecture is memory based. With root-level binding the driver can check the data size, so it can make optimal decisions. Without it the driver doesn't have much information, so in that case the optimal path is to emulate the binding with texture fetches. And this is slow.
D3D12 is heavily designed for GCN, or GCN-like memory-based architectures, and it is not a problem when NV says something different. Even their newest architecture is not designed for this kind of workload, so they have to find optimal solutions for all these problems.
 
Also, when they mention toggling tessellation on and off, they again say to do it as little as possible, for the same reasons. Nvidia destroys AMD at tessellation performance, so tinfoil hat engaged.

Nvidia is simply posting good coding strategies to maximize their GPUs' potential and minimize wasted execution. It's really nothing exciting to talk about in the context in which you wish it could be applied.
Pretty sure AMD gives the same advice; turning large features on/off is not fast on any system.
It's large tessellation factors that AMD didn't like in some cases.
 
Doesn't really seem to be much of an issue; there are going to be optimal ways to code for each architecture, and who better to tell the devs how to get the best performance from the parts than the makers?

I'm wondering though, and maybe zlatan or someone with (way) more knowledge than me can help: would this be a simple task, like just specifying some options (maybe when you compile or something), or would devs need to write significantly different codebases for the different architectures?

I guess what I mean is: would these different optimisations be fairly trivial to accomplish, or would you need to write loads of extra code for each architecture separately?
 
AMD encourages devs to use AMD features, NVidia encourages devs to use NVidia features, sky is still blue.
 
While we're on the topic, how well does DX12 performance scale with memory bandwidth? I'm curious to see what the implications are for APUs, as well as Microsoft's own console, which also lacks bandwidth.
 
Doesn't really seem to be much of an issue; there are going to be optimal ways to code for each architecture, and who better to tell the devs how to get the best performance from the parts than the makers?

I'm wondering though, and maybe zlatan or someone with (way) more knowledge than me can help: would this be a simple task, like just specifying some options (maybe when you compile or something), or would devs need to write significantly different codebases for the different architectures?

I guess what I mean is: would these different optimisations be fairly trivial to accomplish, or would you need to write loads of extra code for each architecture separately?

If you want to build a super-optimized engine for the PC, then it is important to consider some architecture-specific paths. And this shouldn't be a problem if the engine structure is built properly, but it will take additional time and resources. We already spend an awful lot of time trying to understand the drivers, but with the new explicit APIs this will change a lot, because the kernel driver won't affect the performance. In this case all of the code can be profiled and we will be able to solve performance issues more easily. This model is really well known from the consoles. In the end this will make our lives easier, even with some architecture-specific paths.

This post from Nvidia is a good thing for devs. I know GCN really well, primarily because of the consoles, and I know how things work on Intel, because they also provide documents. But I don't really know how to optimize for Nvidia. This is a stepping stone from them. It is not much compared to what we can get from AMD and Intel, but finally it's a start.
 
Maybe those in the know can decipher the do's and don'ts comparing the NVidia vs AMD hardware.

Wondering if it's like the early days of tessellation... as in, don't use it till our hardware is superior.
 
Interesting read. Since I have both Nvidia (GTX 980 Ti SC) and AMD (R9 290s CF) in my rigs, it will be interesting to see how the upcoming DX12 games perform.
 
Maybe those in the know can decipher the do's and don'ts comparing the NVidia vs AMD hardware.

Wondering if it's like the early days of tessellation... as in, don't use it till our hardware is superior.
Most of the post is valid for AMD and Intel too. The only thing that isn't clear to me is this:
Be aware of the fact that there is a cost associated with setup and reset of a command list
  • You still need a reasonable number of command lists for efficient parallel work submission
  • Fences force the splitting of command lists for various reasons (multiple command queues, picking up the results of queries)
They don't really specify the reasons. I never had any problem with this on GCN, but maybe the architecture is much more robust and doesn't need roundtrips between the CPU and the GPU, while the GeForce hardware needs them.
 
Here are the interesting parts ...

They specifically advise using root signatures as much as possible, while MICROSOFT recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...

They also recommend against using ROVs and that you should look at alternatives like advanced blending operations too ...

I'm guessing ROVs are practically free on Intel architectures ...

I think most people ignored this ROV thing. It's a big DX 12_1 feature that they were holding over AMD, and they had to say that about it. Important, IMO.
 