DX12 + Asynchronous compute = Gigantic leap in physical realism

Carfax83 · Mar 4, 2017

Back when all the hoopla about DX12 first started, I remember I made a post stating that asynchronous compute could very well spell the end of GPU accelerated PhysX (which was restricted to CUDA) as we know it. Not an impressive prediction I will admit, and I can't even find the post anyway. But it definitely looks like my prediction is going to come through.

A few days ago, NVidia announced their DX12 Gameworks initiative, although it really didn't get much press it seems. But it does have the potential to be a big game changer in PC land.

The biggest and most impactful change by far, is that AMD hardware will finally be able to get in on the action. Which means, that developers will no longer need to have conniptions when deciding what IHV to whore themselves out to.......which means, that game technology will no longer be held back by targeted marketing.

Ultimately, this should theoretically result in not only better looking games, but more importantly, games will become much more physically based in terms of simulation. Realistic physics simulation has remained elusive over the years, mostly due to the average gamer's CPU (Ryzen making octacores more mainstream should help) not being powerful enough to run advanced physics simulations whilst actually running the game at the same time, and also because NVidia's GPU accelerated physics tech was locked down to CUDA, a proprietary technology.

Not to mention that for a long time, one needed a dedicated PhysX GPU to really reap the benefits of hardware accelerated physics, which just wasn't feasible for most gamers. Now though, DX12 and asynchronous compute has changed this. Now it will be possible to have an AMD and NVidia system running hair and cloth simulation using DX12 compute, with very little or no performance hit. Water and smoke will be harder, but as GPUs become more powerful, eventually even they will be mastered.

So knowingly or unknowingly, Microsoft has solved the physics problem with DX12!

ThatBuzzkiller · Mar 4, 2017

Async compute + SM6 is key for GPU accelerated physics ...

HLSL should've been standardizing shading language features from CUDA ...

Let's hope the new dxc compiler is launched when the creators update is up and we can finally put a rest to the fxc compiler ...

Bacon1 · Mar 4, 2017

Carfax83 said:
The biggest and most impactful change by far, is that AMD hardware will finally be able to get in on the action. Which means, that developers will no longer need to have conniptions when deciding what IHV to whore themselves out to.......which means, that game technology will no longer be held back by targeted marketing.

I really hope this is the case. Just like the issues coming up with VR headsets and exclusives, it shouldn't matter what brand GPU, CPU, Monitor, etc you own, you should have the exact same experience.

"NVIDIA's commitment to DirectX 12 is clear," said Cam McRae, technical director at the Coalition, developers of Gears of War 4. "Having them onsite during the development of Gears of War 4 was immensely beneficial, and helped us to deliver a game that is fast, beautiful and stable."

The Flex 1.1 demo had some interesting stuff in it, didn't see any download for Flow 1.0 yet. Wish I had some other cards to compare it with.

tamz_msc · Mar 4, 2017

Maybe I missed it, but where is the part that says AMD will also benefit from DX12 Gameworks?

Carfax83 · Mar 4, 2017

ThatBuzzkiller said:
Async compute + SM6 is key for GPU accelerated physics ...

I had no idea SM6 could be used for GPU accelerated physics. I figured that SM6 would be graphics only.

Bacon1 · Mar 4, 2017

tamz_msc said:
Maybe I missed it, but where is the part that says AMD will also benefit from DX12 Gameworks?

I was playing with the Flex 1.1 stuff earlier on my Fury. They haven't released anything else from the article yet though

Carfax83 · Mar 4, 2017

Bacon1 said:
I really hope this is the case. Just like the issues coming up with VR headsets and exclusives, it shouldn't matter what brand GPU, CPU, Monitor, etc you own, you should have the exact same experience.

Porting over their tech to DX12 was a good move by NVidia, similar to when they started porting over physics effects like debris, cloth simulation and some other things into their CPU PhysX starting with version 3.0. Before that, cloth simulation was only for hardware accelerated PhysX.

I think their point is to pull in as many gamers as possible into the PC gaming world by making it as tantalizing as possible and reducing segmentation in terms of gaming experience.

Carfax83 · Mar 4, 2017

tamz_msc said:
Maybe I missed it, but where is the part that says AMD will also benefit from DX12 Gameworks?

They didn't state it explicitly, but AMD will definitely also benefit from this because AMD has DX12 hardware and asynchronous compute capability. Much like how AMD hardware can run Hairworks in the Witcher 3, because it uses DirectCompute and not CUDA, which is IHV specific.

tamz_msc · Mar 4, 2017

Carfax83 said:
They didn't state it explicitly, but AMD will definitely also benefit from this because AMD has DX12 hardware and asynchronous compute capability. Much like how AMD hardware can run Hairworks in the Witcher 3, because it uses DirectCompute and not CUDA, which is IHV specific.

IIRC Withcer 3 required tessellation override in Catalyst for AMD to have comparative performance with NVIDIA cards, because by default Hairworks also added tons of MSAA, which was not properly supported on AMD cards.

It seems to me that things may not change much this time, unless they release more information.

Carfax83 · Mar 4, 2017

tamz_msc said:
IIRC Withcer 3 required tessellation override in Catalyst for AMD to have comparative performance with NVIDIA cards, because by default Hairworks also added tons of MSAA, which was not properly supported on AMD cards.

It seems to me that things may not change much this time, unless they release more information.

The point is, AMD wasn't restricted from running Hairworks in the Witcher 3 because Hairworks utilizes DX11 DirectCompute. Now of course there are other reasons as you say why the Hairworks performance in the Witcher 3 was slower on AMD hardware, most notably because it used high levels of tessellation.

But from a purely simulation perspective, AMD hardware wasn't crippled at all. Hairworks with DX12 DirectCompute can theoretically have little or no performance cost for hair simulation, since it will use asynchronous compute. Now on the rendering side it's a different matter.

GodisanAtheist · Mar 4, 2017

It makes a certain kind of sense: when AMD was more hardware competitive with NV, software was the differentiator.

Now that NV is utterly dominant on the hardware front, their hardware competition is consoles not AMD cards as much.

Time to clean up the software for broad adoption and bring more folks over to the PC master race, odds are they'll be buying NV hardware anyway and a poorly coded game works program might be more of an albatross than a draw if mishandled.

ThatBuzzkiller · Mar 4, 2017

Carfax83 said:
I had no idea SM6 could be used for GPU accelerated physics. I figured that SM6 would be graphics only.

Physics need some compute resources so by extension any sort of programmable units like shaders are used and a powerful shading language like SM6 is useful for writing faster shaders since it exposes more functionality ...

bystander36 · Mar 4, 2017

The way I imagine this going is Nvidia is attempting to get ahead of things, and prevent compute from evolving in a way that their hardware doesn't excel. Creating an easy to use, agnostic API, means they have some control of the future of gaming software, similar to how AMD has had a lot of control due to having all the consoles. Hopefully they don't try and abuse their power, and actually attempt to make this a good working solution for all.

Carfax83 · Mar 4, 2017

bystander36 said:
The way I imagine this going is Nvidia is attempting to get ahead of things, and prevent compute from evolving in a way that their hardware doesn't excel. Creating an easy to use, agnostic API, means they have some control of the future of gaming software, similar to how AMD has had a lot of control due to having all the consoles. Hopefully they don't try and abuse their power, and actually attempt to make this a good working solution for all.

Microsoft develops DirectX with the aid of the IHVs and game developers. NVidia of course is a big part of that so I'm sure that NVidia doesn't need to resort to this just to have a say in how DirectCompute evolves. I think it's really just about increasing the viability of the PC gaming platform by removing segmentations and obstacles to a unified gaming experience. Doing this will help AMD as well, but it will help NVidia more in the long run because the technology will proliferate due to being unified and will bring a lot more people into the fold due to its next gen appeal.

The potential of this technology should not be underestimated. Realistic physics is the last bit of the puzzle we need for a truly immersive experience.

Now that said, it's going to be interesting to see whether Volta has any dedicated asynchronous compute engines like what AMD has, or stick with Pascal's method. I don't know how many transistors NVidia spent on their asynchronous compute technology in Pascal, but I bet it's a lot less than what AMD has spent. And when you look at the overall efficacy, it's not dramatically less capable. Fury X gains the most from asynchronous compute (more than 15% in some cases), but one could argue that's because it has the most unutilized resources at any given time which is a design flaw and not a strength.

Asynchronous compute on my GTX 1080 will increase performance by about 5 or 6% at most in Gears of War 4, but the impact on minimum framerates is much more impressive. With asynchronous compute turned on, I notice that the framerate is MUCH more stable than with it off, and it prevents framerate drops from occurring.

dogen1 · Mar 4, 2017

Carfax83 said:
Asynchronous compute on my GTX 1080 will increase performance by about 5 or 6% at most in Gears of War 4, but the impact on minimum framerates is much more impressive. With asynchronous compute turned on, I notice that the framerate is MUCH more stable than with it off, and it prevents framerate drops from occurring.

Very nice. Have you tested at different resolutions? I'd expect lower resolutions to benefit more, but that card does have insane geometry capability...

Carfax83 · Mar 5, 2017

dogen1 said:
Very nice. Have you tested at different resolutions? I'd expect lower resolutions to benefit more, but that card does have insane geometry capability...

No, I only tested at 1440p. With asynchronous compute off, I would get random framerate drops by as much as 10+ frames at times. The framerate drops weren't really a big deal though because the framerates could hit over 100 FPS at times, but still, I definitely found it remarkable that asynchronous compute could have that kind of effect on the minimum framerates. I'm actually thinking about reinstalling Gears of War 4 once the new DX12 optimized drivers come out (which will likely be this week) so I can test it again.

I wonder if the framerates drops might have been driver related? It's possible I suppose, if NVidia still have some inefficiency issues with their DX12 driver.

bystander36 · Mar 5, 2017

Carfax83 said:
Microsoft develops DirectX with the aid of the IHVs and game developers. NVidia of course is a big part of that so I'm sure that NVidia doesn't need to resort to this just to have a say in how DirectCompute evolves. I think it's really just about increasing the viability of the PC gaming platform by removing segmentations and obstacles to a unified gaming experience. Doing this will help AMD as well, but it will help NVidia more in the long run because the technology will proliferate due to being unified and will bring a lot more people into the fold due to its next gen appeal.

The potential of this technology should not be underestimated. Realistic physics is the last bit of the puzzle we need for a truly immersive experience.

Now that said, it's going to be interesting to see whether Volta has any dedicated asynchronous compute engines like what AMD has, or stick with Pascal's method. I don't know how many transistors NVidia spent on their asynchronous compute technology in Pascal, but I bet it's a lot less than what AMD has spent. And when you look at the overall efficacy, it's not dramatically less capable. Fury X gains the most from asynchronous compute (more than 15% in some cases), but one could argue that's because it has the most unutilized resources at any given time which is a design flaw and not a strength.

Asynchronous compute on my GTX 1080 will increase performance by about 5 or 6% at most in Gears of War 4, but the impact on minimum framerates is much more impressive. With asynchronous compute turned on, I notice that the framerate is MUCH more stable than with it off, and it prevents framerate drops from occurring.

Perhaps I didn't word it the way I should have.

I meant to say, is that Nvidia wants to make sure the way DirectCompute is used evolves in a way which is advantageous to them, or at least, to not be at a disadvantage. You know as well as I do, the way the the developers utilize the the API's have a huge impact on which cards perform best. Having some control in the way developers use DX12 features for compute, is going to be in their best interests.

tential · Mar 5, 2017

Just seems like another way to make Nvidia more money. They're using async which is only on their newest gpus.

I predicted this that they would use async to lock out features for older gpus to entice their users to upgrade

I already missed the boat on Nvidia stock but good to see they are down for whatever when it comes to making money. There may be a lot more growth yet still for this company.

Bacon1 · Mar 6, 2017

Sim Time Sim Latency
CUDA 0.6ms 3.8ms
DirectX 0.3ms 3ms

On a 1070

https://www.reddit.com/r/Amd/commen...ow_cloth_and_destruction_now_support/dekllhq/

So sounds like using DX is faster than CUDA all along

Headfoot · Mar 6, 2017

I'm hesitant to chalk it up to API without more evidence. It seems to me that re-factoring the code for DX12 presented a great opportunity to simply write faster code. I would be surprised if the dx12 version is also a new version altogether.

Carfax83 · Mar 7, 2017

It might be due to the context switch. It's the same reason why NVidia used DirectCompute for Hairworks in the Witcher 3 rather than using CUDA, because DirectCompute is native to DX11. So if the Flex demo was programmed using DX12, then DirectCompute would definitely be faster.

DX12 + Asynchronous compute = Gigantic leap in physical realism

Diamond Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Golden Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member