
[Nvidia] GameWorks DX12 Do's and Don'ts

Can you at least try to have an objective, unbiased discussion about this, without the usual characters here engaging in the usual agendas?
-- stahlhart
 
Don’t toggle between compute and graphics on the same command queue more than absolutely necessary

OK, so what percentage of async compute should a developer use?

This "don't" is vague.
 
Here are the interesting parts ...

They specifically advise using root signatures as much as possible, while MICROSOFT recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...

They also recommend against using ROVs and that you should look at alternatives like advanced blending operations too ...

I'm guessing ROVs are practically free on Intel architectures ...
 
https://developer.nvidia.com/dx12-dos-and-donts

Nvidia listed a bunch of Do's and Don'ts for DX12.

This one is interesting:


First rule of GameWorks: You do not talk about Async Compute.
Second Rule of GameWorks: You DO NOT talk about Async Compute.

Also, when they mention toggling tessellation on and off, they again say to do it as little as possible, for the same reasons. Nvidia destroys AMD at tessellation performance, so tinfoil hat engaged.

Nvidia is simply posting good coding strategies to maximize their GPUs' potential and minimize wasted execution. It's really nothing exciting to talk about in the context in which you wish it could be applied.
 
Here are the interesting parts ...

They specifically advise using root signatures as much as possible, while MICROSOFT recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...
It's more of a design issue, especially for constant data. I think the GeForce hardware is still only able to use a 64KB register file for storing these, while GCN can use as much as 2GB, because the architecture is memory based. With root-level binding the driver can check the data size, so it can make optimal decisions. Without it the driver doesn't have much information, so in that case the optimal path is to emulate the binding with texture fetches. And this is slow.
D3D12 is heavily designed for GCN, or GCN-like memory-based architectures, and it is not a problem when NV says something different. Even their newest architecture is not designed for this kind of workload, so they have to find optimal solutions for all these problems.
 
Also, when they mention toggling tessellation on and off, they again say to do it as little as possible, for the same reasons. Nvidia destroys AMD at tessellation performance, so tinfoil hat engaged.

Nvidia is simply posting good coding strategies to maximize their GPUs' potential and minimize wasted execution. It's really nothing exciting to talk about in the context in which you wish it could be applied.
Pretty sure AMD gives the same advice; turning large features on/off is not fast on any system.
It's large tessellation factors that AMD didn't like in some cases.
 
Doesn't really seem to be much of an issue; there are going to be optimal ways to code for each architecture, and who better to tell the devs how to get the best performance from the parts than the makers?

I'm wondering though, and maybe zlatan or someone with (way) more knowledge than me can help: would this be a simple task, like just specifying some options (maybe when you compile or something), or would devs need to write significantly different codebases for the different architectures?

I guess what I mean is: would these different optimisations be fairly trivial to accomplish, or would you need to write loads of extra code for each architecture separately?
 
AMD encourages devs to use AMD features, NVidia encourages devs to use NVidia features, sky is still blue.
 
While we're on the topic, how well does DX12 performance scale with memory bandwidth? I'm curious to see what the implications are for APUs, as well as Microsoft's own console, which also lacks bandwidth.
 
Doesn't really seem to be much of an issue; there are going to be optimal ways to code for each architecture, and who better to tell the devs how to get the best performance from the parts than the makers?

I'm wondering though, and maybe zlatan or someone with (way) more knowledge than me can help: would this be a simple task, like just specifying some options (maybe when you compile or something), or would devs need to write significantly different codebases for the different architectures?

I guess what I mean is: would these different optimisations be fairly trivial to accomplish, or would you need to write loads of extra code for each architecture separately?

If you want to build a super-optimized engine for the PC, then it is important to consider some architecture-specific paths. And this shouldn't be a problem if the engine structure is built properly, but it will take additional time and resources. We already spend an awful lot of time trying to understand the drivers, but with the new explicit APIs this will change a lot, because the kernel driver won't affect the performance. In this case all of the code can be profiled and we will be able to solve performance issues more easily. This model is really well known from the consoles. In the end this will make our lives easier, even with some architecture-specific paths.

This post from Nvidia is a good thing for devs. I know GCN really well, primarily because of the consoles, and I know how things work on Intel, because they also provide documents. But I don't really know how to optimize for Nvidia. This is a stepping stone from them. It is not much compared to what we can get from AMD and Intel, but finally it's a start.
 
Maybe those in the know can decipher the do's and don'ts comparing the NVidia vs AMD hardware.

Wondering if it's like the early days of tessellation... as in, don't use it till our hardware is superior.
 
Interesting read. Since I have both Nvidia (GTX 980 Ti SC) and AMD (R9 290s CF) in my rigs, it will be interesting to see how the upcoming DX12 games perform.
 
Maybe those in the know can decipher the do's and don'ts comparing the NVidia vs AMD hardware.

Wondering if it's like the early days of tessellation... as in, don't use it till our hardware is superior.
Most of the post is valid for AMD and Intel too. The only thing that isn't clear to me is this:
Be aware of the fact that there is a cost associated with setup and reset of a command list
  • You still need a reasonable number of command lists for efficient parallel work submission
  • Fences force the splitting of command lists for various reasons (multiple command queues, picking up the results of queries)
They don't really specify the reasons. I never had any problem with this on GCN, but maybe the architecture is much more robust and doesn't need roundtrips between the CPU and the GPU, while the GeForce hardware needs them.
 
Here are the interesting parts ...

They specifically advise using root signatures as much as possible, while MICROSOFT recommends AGAINST binding your resources into the root signatures ...

That wouldn't be an issue if they had a fully bindless architecture ...

They also recommend against using ROVs and that you should look at alternatives like advanced blending operations too ...

I'm guessing ROVs are practically free on Intel architectures ...

I think most people ignored this ROV thing. It's a big DX 12_1 feature that they were holding over AMD, and they had to say that about it. Important, IMO.
 