[Nvidia] Gameworks DX12 DOs and DONTs

iiiankiii

Senior member
Apr 4, 2008
759
47
91
https://developer.nvidia.com/dx12-dos-and-donts

Nvidia listed a bunch of Do's and Don'ts for DX12.

This one is interesting:
Don’ts

Don’t toggle between compute and graphics on the same command queue more than absolutely necessary
This is still a heavyweight switch to make
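
For context, the alternative to toggling a single queue between graphics and compute is to put compute work on its own queue (i.e. async compute). A minimal sketch, assuming an existing ID3D12Device*; error handling is omitted:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Instead of switching one queue back and forth between graphics and compute work,
// create a dedicated compute queue and synchronize the two with a fence.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // separate compute queue
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}

// Later, make the graphics queue wait for compute results via a fence instead of
// interleaving compute dispatches into the graphics command stream:
//   computeQueue->Signal(fence.Get(), fenceValue);
//   gfxQueue->Wait(fence.Get(), fenceValue);
```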

First rule of GameWorks: You do not talk about Async Compute.
Second Rule of GameWorks: You DO NOT talk about Async Compute.
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
The sky is falling!

My GTX980ti is a paperweight!

Run in circles....
 

stahlhart

Super Moderator Graphics Cards
Dec 21, 2010
4,273
77
91
Can you at least try to have an objective, unbiased discussion about this, without the usual characters here engaging in the usual agendas?
-- stahlhart
 

PhonakV30

Senior member
Oct 26, 2009
987
378
136
Don’t toggle between compute and graphics on the same command queue more than absolutely necessary

OK, so what percentage of async compute should a developer apply?

This "don't" is vague.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Here are the interesting parts ...

They specifically advise using root signatures as much as possible while MICROSOFT
recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...

They also recommend against using ROVs and that you should look at alternatives like advanced blending operations too ...

I'm guessing ROVs are practically free on Intel architectures ...
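
For anyone wanting to experiment: ROV support is exposed through CheckFeatureSupport, though, as discussed here, support only means the path will run, not that it is cheap on a given architecture. A minimal sketch (the function name is just illustrative):

```cpp
#include <d3d12.h>

// Query whether the device exposes Rasterizer Ordered Views before choosing an
// ROV-based technique over an alternative such as advanced blending.
bool ShouldUseROVPath(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (FAILED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                           &options, sizeof(options))))
        return false;

    // ROVsSupported is TRUE on feature level 12_1 hardware, but per this thread
    // the cost can differ a lot between architectures, so profile before committing.
    return options.ROVsSupported == TRUE;
}
```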
 

tviceman

Diamond Member
Mar 25, 2008
6,734
514
126
https://developer.nvidia.com/dx12-dos-and-donts

Nvidia listed a bunch of Do's and Don'ts for DX12.

This one is interesting:


First rule of GameWorks: You do not talk about Async Compute.
Second Rule of GameWorks: You DO NOT talk about Async Compute.

And also, when they mention toggling tessellation on and off, they again say to do it as little as possible, for the same reasons. Nvidia destroys AMD at tessellation performance, so tinfoil hat engaged.

Nvidia is simply posting good coding strategies to maximize their GPUs' potential and minimize wasted execution. It's really nothing exciting to talk about in the context in which you wish it could be applied.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
Here are the interesting parts ...

They specifically advise using root signatures as much as possible while MICROSOFT
recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...
It's more of a design issue, especially for constant data. I think GeForce hardware is still limited to a 64KB register file for storing these, while GCN can use as much as 2GB because the architecture is memory based. With root-level binding the driver can check the data size and make optimal decisions. Without it the driver doesn't have much information, so in that case the optimal path is to emulate the binding with texture fetches, and that is slow.
D3D12 is heavily designed for GCN, or GCN-like memory-based architectures, so it's not a problem when NV says something different. Even their newest architecture isn't designed for this kind of workload, so they have to find optimal solutions for all these problems.
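
To make the two binding styles in question concrete, here's a minimal sketch of a root signature with both a root-level CBV (the driver sees the binding and its GPU address directly) and a descriptor table (the root signature only holds a pointer into a descriptor heap). The helper function and parameter layout are illustrative, not taken from the Nvidia post:

```cpp
#include <d3d12.h>

void BuildRootSignatureDesc(D3D12_ROOT_SIGNATURE_DESC& rootDesc,
                            D3D12_ROOT_PARAMETER params[2],
                            D3D12_DESCRIPTOR_RANGE& range)
{
    // Parameter 0: root-level CBV; the GPU virtual address lives in the root signature.
    params[0].ParameterType = D3D12_ROOT_PARAMETER_TYPE_CBV;
    params[0].Descriptor.ShaderRegister = 0;   // b0
    params[0].Descriptor.RegisterSpace = 0;
    params[0].ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;

    // Parameter 1: descriptor table; the root signature only points into a heap.
    range.RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
    range.NumDescriptors = 8;                  // t0..t7
    range.BaseShaderRegister = 0;
    range.RegisterSpace = 0;
    range.OffsetInDescriptorsFromTableStart = 0;

    params[1].ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
    params[1].DescriptorTable.NumDescriptorRanges = 1;
    params[1].DescriptorTable.pDescriptorRanges = &range;
    params[1].ShaderVisibility = D3D12_SHADER_VISIBILITY_ALL;

    rootDesc.NumParameters = 2;
    rootDesc.pParameters = params;
    rootDesc.NumStaticSamplers = 0;
    rootDesc.pStaticSamplers = nullptr;
    rootDesc.Flags = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;
}
```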
 

Pottuvoi

Senior member
Apr 16, 2012
416
2
81
And also, when they mention toggling tessellation on and off, they again say to do it as little as possible, for the same reasons. Nvidia destroys AMD at tessellation performance, so tinfoil hat engaged.

Nvidia is simply posting good coding strategies to maximize their GPUs' potential and minimize wasted execution. It's really nothing exciting to talk about in the context in which you wish it could be applied.
Pretty sure AMD gives the same advice; turning large features on/off is not fast on any system.
It's the large tessellation factors that AMD didn't like in some cases.
 

littleg

Senior member
Jul 9, 2015
355
38
91
Doesn't really seem to be much of an issue; there are going to be optimal ways to code for each architecture, and who better to tell the devs how to get the best performance from the parts than the makers?

I'm wondering though, and maybe zlatan or someone with (way) more knowledge than me can help: would this be a simple task, like just specifying some options (maybe when you compile or something), or would devs need to write significantly different codebases for the different architectures?

I guess what I mean is: would these different optimisations be fairly trivial to accomplish, or would you need to write loads of extra code for each architecture separately?
 

NTMBK

Lifer
Nov 14, 2011
10,340
5,410
136
AMD encourages devs to use AMD features, NVidia encourages devs to use NVidia features, sky is still blue.
 

NTMBK

Lifer
Nov 14, 2011
10,340
5,410
136
Then I guess "no async compute" is NV's new feature, because that's what Nvidia recommends.

They know that it won't perform well on their current architecture, and their optimization advice reflects that. *shrug*
 

Zodiark1593

Platinum Member
Oct 21, 2012
2,230
4
81
While we're on the topic, how well does DX12 performance scale with memory bandwidth? I'm curious what the implications are for APUs, as well as Microsoft's own console, which also lacks bandwidth.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
Doesn't really seem to be much of an issue; there are going to be optimal ways to code for each architecture, and who better to tell the devs how to get the best performance from the parts than the makers?

I'm wondering though, and maybe zlatan or someone with (way) more knowledge than me can help: would this be a simple task, like just specifying some options (maybe when you compile or something), or would devs need to write significantly different codebases for the different architectures?

I guess what I mean is: would these different optimisations be fairly trivial to accomplish, or would you need to write loads of extra code for each architecture separately?

If you want to build a super-optimized engine for the PC, then it is important to consider some architecture-specific paths. This shouldn't be a problem if the engine structure is built properly, but it will take additional time and resources. We already spend an awful lot of time trying to understand the drivers, but with the new explicit APIs this will change a lot, because the kernel driver won't affect performance. All of the code can then be profiled, and we will be able to solve performance issues more easily. This model is well known from the consoles. In the end this will make our lives easier, even with some architecture-specific paths.

This post from Nvidia is a good thing for devs. I know GCN really well, primarily because of the consoles, and I know how things work on Intel, because they also provide documents. But I don't really know how to optimize for Nvidia. This is a stepping stone from them. It's not much compared to what we can get from AMD and Intel, but it's a start.
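
As a rough illustration of what an architecture-specific path can look like in practice, an engine can branch on the adapter's PCI vendor ID at startup; the vendor IDs below are the well-known NVIDIA/AMD/Intel values, and the enum and function names are just made up for this sketch:

```cpp
#include <dxgi.h>

enum class GpuVendor { Nvidia, Amd, Intel, Other };

// Identify the GPU vendor from the adapter description so the renderer can
// select a vendor-tuned path while sharing the rest of the codebase.
GpuVendor DetectVendor(IDXGIAdapter1* adapter)
{
    DXGI_ADAPTER_DESC1 desc = {};
    adapter->GetDesc1(&desc);
    switch (desc.VendorId)
    {
    case 0x10DE: return GpuVendor::Nvidia;
    case 0x1002: return GpuVendor::Amd;
    case 0x8086: return GpuVendor::Intel;
    default:     return GpuVendor::Other;
    }
}

// The renderer could then pick, e.g., a root-binding-heavy path on one vendor and a
// descriptor-table-heavy path on another, per the guidance discussed in this thread.
```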
 

Kenmitch

Diamond Member
Oct 10, 1999
8,505
2,250
136
Maybe those in the know can decipher the do's and don'ts comparing the NVidia vs AMD hardware.

Wondering if it's like the early days of tessellation....As in don't use it till our hardware is superior.
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
Interesting read. Since I have both Nvidia (GTX980TI SC) and AMD (R9-290s CF) in my rigs it will be interesting to see the performance of upcoming DX12 games.
 

zlatan

Senior member
Mar 15, 2011
580
291
136
Maybe those in the know can decipher the do's and don'ts comparing the NVidia vs AMD hardware.

Wondering if it's like the early days of tessellation....As in don't use it till our hardware is superior.
Most of the post is valid for AMD and Intel as well. The only thing that isn't clear to me is this:
Be aware of the fact that there is a cost associated with setup and reset of a command list
  • You still need a reasonable number of command lists for efficient parallel work submission
  • Fences force the splitting of command lists for various reasons ( multiple command queues, picking up the results of queries)
They don't really specify the reasons. I never had any problem with this on GCN, but maybe the architecture is much more robust and doesn't need round trips between the CPU and the GPU, while GeForce hardware does.
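
My reading of the fence point, sketched below (variable names are illustrative): Signal and Wait are queue-level operations, not commands recorded into a command list, so any work that has to wait on a fence must go into a separate list submitted after the Wait.

```cpp
#include <d3d12.h>

// Work recorded before and after a fence wait cannot live in the same command list,
// because the Wait happens on the queue between two ExecuteCommandLists calls.
void SubmitWithFence(ID3D12CommandQueue* gfxQueue,
                     ID3D12CommandQueue* computeQueue,
                     ID3D12GraphicsCommandList* beforeWait,  // already recorded and Close()d
                     ID3D12GraphicsCommandList* afterWait,   // already recorded and Close()d
                     ID3D12Fence* fence, UINT64 value)
{
    ID3D12CommandList* first[] = { beforeWait };
    gfxQueue->ExecuteCommandLists(1, first);

    computeQueue->Signal(fence, value);  // compute queue marks its work as complete
    gfxQueue->Wait(fence, value);        // graphics queue stalls until the fence is reached

    ID3D12CommandList* second[] = { afterWait };
    gfxQueue->ExecuteCommandLists(1, second);
}
```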
 

Azix

Golden Member
Apr 18, 2014
1,438
67
91
Here are the interesting parts ...

They specifically advise using root signatures as much as possible while MICROSOFT
recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...

They also recommend against using ROVs and that you should look at alternatives like advanced blending operations too ...

I'm guessing ROVs are practically free on Intel architectures ...

Think most people ignored this ROV thing. It's a big DX12_1 feature that they were holding over AMD, and they had to say that about it. Important, IMO.