Nvidia's Performance Under Vulkan API Explored

Deders · Aug 12, 2016

Has anyone else noticed how much longer a Doom level takes to load with Vulkan compared to OpenGL? Even reloading after death.

Carfax83 · Aug 13, 2016

Elixer said:
What?
Vulkan has an API for this; it isn't AMD specific.
That is the whole point of using Vulkan: it is a standard API. This isn't an example of using custom libs (like GameWorks), so what exactly is Doom supposed to change?

Straight from their FAQ page:

Does DOOM support asynchronous compute when running on the Vulkan API?

Asynchronous compute is a feature that provides additional performance gains on top of the baseline id Tech 6 Vulkan feature set.

Currently asynchronous compute is only supported on AMD GPUs and requires DOOM Vulkan supported drivers to run. We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon.


So as you can see, they need to enable asynchronous compute for NVidia GPUs in Vulkan. To support concurrent asynchronous compute, you need three things:

1) Hardware support
2) Driver support
3) API and program support

So what's apparently missing is the API support for concurrent asynchronous compute for NVidia GPUs.. This kind of goes in line with what I was saying earlier about Vulkan inheriting the Mantle codebase, which gave AMD a head start on NVidia when it came to low level support for AMD hardware.

DX12 on the other hand already supports concurrent asynchronous compute on Pascal..

Deders said:
Has anyone else noticed how much longer a Doom level takes to load with Vulkan compared to OpenGL? Even reloading after death.

Not only that. The game itself takes longer to load as well..

Elixer · Aug 13, 2016

Carfax83 said:
So as you can see, they need to enable asynchronous compute for NVidia GPUs in Vulkan. To support concurrent asynchronous compute, you need three things:
1) Hardware support
2) Driver support
3) API and program support
So what's apparently missing is the API support for concurrent asynchronous compute for NVidia GPUs.. This kind of goes in line with what I was saying earlier about Vulkan inheriting the Mantle codebase, which gave AMD a head start on NVidia when it came to low level support for AMD hardware.

I think that is the wrong conclusion.
The API *already* supports async, it is not vendor specific.
The hardware does have some level/form of it (for some nvidia cards), so, that leaves drivers.
I suppose they could be querying the drivers to see which vendor, then doing different code paths, but the original point still stands, if they are using pure Vulkan API calls, then it shouldn't matter which vendor the API is being run on, they should be following the specs of the API (via drivers).
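To make the "different code paths per vendor" idea concrete, here is a minimal sketch of what such a gate could look like. It is not taken from Doom or id Tech 6; the function name, the enum, and the allow list are all hypothetical. The only real-world facts used are the PCI vendor IDs, which a Vulkan application can read from VkPhysicalDeviceProperties::vendorID.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// PCI vendor IDs, as reported in VkPhysicalDeviceProperties::vendorID.
constexpr std::uint32_t kVendorAmd    = 0x1002;
constexpr std::uint32_t kVendorNvidia = 0x10DE;

// Hypothetical names for the two submission strategies.
enum class ComputePath { SingleQueue, AsyncQueue };

// Hypothetical gate: async compute is used only if the user setting allows it
// AND the GPU vendor is on an application-maintained allow list.
// Everything else falls back to submitting compute on the graphics queue.
ComputePath chooseComputePath(std::uint32_t vendorId,
                              bool userEnabled,
                              const std::vector<std::uint32_t>& allowedVendors) {
    const bool vendorAllowed =
        std::find(allowedVendors.begin(), allowedVendors.end(), vendorId) !=
        allowedVendors.end();
    return (userEnabled && vendorAllowed) ? ComputePath::AsyncQueue
                                          : ComputePath::SingleQueue;
}
```

With an allow list containing only kVendorAmd, an NVIDIA GPU would get ComputePath::SingleQueue even though every call involved is plain Vulkan. That would be an application-side decision, not something the API imposes.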

Carfax83 · Aug 13, 2016

Elixer said:
I think that is the wrong conclusion.
The API *already* supports async, it is not vendor specific.
The hardware does have some level/form of it (for some nvidia cards), so, that leaves drivers.
I suppose they could be querying the drivers to see which vendor, then doing different code paths, but the original point still stands, if they are using pure Vulkan API calls, then it shouldn't matter which vendor the API is being run on, they should be following the specs of the API (via drivers).

I posted a quote directly from their FAQ page on the developer's website, and you're still challenging me on this.

Asynchronous compute isn't vendor specific, I agree, but the application/API has to be made aware that the capability exists on the hardware before it can submit the work queues appropriately, and the code itself has to be written to take proper advantage. For example, when the first DX12 update came out for Rise of the Tomb Raider, asynchronous compute was not supported on either Radeon or NVidia GPUs. That only changed with the second DX12 update, despite the fact that DX12 itself supports asynchronous compute.

The truth is, Vulkan is very much a work in progress. AMD got a head start due to their Mantle donation, but that's it.

Bacon1 · Aug 13, 2016

Carfax83 said:
I posted a quote directly from their FAQ page on the developer's website, and you're still challenging me on this.

Asynchronous compute isn't vendor specific, I agree, but the application/API has to be made aware that the capability exists on the hardware before it can submit the work queues appropriately, and the code itself has to be written to take proper advantage. For example, when the first DX12 update came out for Rise of the Tomb Raider, asynchronous compute was not supported on either Radeon or NVidia GPUs. That only changed with the second DX12 update, despite the fact that DX12 itself supports asynchronous compute.

The truth is, Vulkan is very much a work in progress. AMD got a head start due to their Mantle donation, but that's it.

Here is the difference between an async compute call and one without:

void D3D12nBodyGravity::OnRender()
{
    // Wait for graphics fence to finish
    if (AsynchronousComputeEnabled) {
        PIXBeginEvent (m_computeCommandQueue.Get (), 0, L"Simulate");
        m_computeCommandQueue->Wait (m_graphicsCopyFences [m_lastFrameIndex].Get (), m_graphicsCopyFenceValues [m_lastFrameIndex]);
    } else {
        PIXBeginEvent (m_graphicsCommandQueue.Get (), 0, L"Simulate");
    }
    RecordComputeCommandList ();

    // Close and execute the command list.
    ID3D12CommandList* ppCommandLists[] = { m_computeCommandLists[m_frameIndex].Get () };

    if (AsynchronousComputeEnabled) {
        m_computeCommandQueue->ExecuteCommandLists (1, ppCommandLists);
        m_computeFenceValues [m_frameIndex] = m_computeFenceValue;
        m_computeCommandQueue->Signal (m_computeFences [m_frameIndex].Get (), m_computeFenceValue);
        PIXEndEvent (m_computeCommandQueue.Get ());
    } else {
        m_graphicsCommandQueue->ExecuteCommandLists (1, ppCommandLists);
        PIXEndEvent (m_graphicsCommandQueue.Get ());
    }

    ++m_computeFenceValue;

    RecordCopyCommandList ();

    ppCommandLists[0] = { m_graphicsCopyCommandLists[m_frameIndex].Get () };

    // Wait for compute fence to finish
    if (AsynchronousComputeEnabled) {
        m_graphicsCommandQueue->Wait (m_computeFences [m_frameIndex].Get (), m_computeFenceValues [m_frameIndex]);
    }

    // Execute copy
    m_graphicsCommandQueue->ExecuteCommandLists (1, ppCommandLists);
    if (AsynchronousComputeEnabled) {
        m_graphicsCommandQueue->Signal (m_graphicsCopyFences [m_frameIndex].Get (), m_graphicsCopyFenceValue);
    }

    PIXBeginEvent (m_graphicsCommandQueue.Get (), 0, L"Render");
    ++m_graphicsCopyFenceValue;
    RecordRenderCommandList ();

    // Execute the rendering
    ppCommandLists[0] = { m_graphicsCommandLists[m_frameIndex].Get() };
    m_graphicsCommandQueue->ExecuteCommandLists(1, ppCommandLists);
    if (AsynchronousComputeEnabled) {
        m_graphicsCommandQueue->Signal (m_graphicsFences [m_frameIndex].Get (), m_graphicsFenceValue);
    }
    PIXEndEvent (m_graphicsCommandQueue.Get ());
    ++m_graphicsFenceValue;

    // Present the frame.
    ThrowIfFailed(m_swapChain->Present(0, 0));

    MoveToNextFrame();
}

Most of the code is identical or has tiny changes, but it's completely vendor agnostic. The only additional requirement is fencing to sync up what is happening.

The part that tells the code to enable / disable that is either an option in settings, or the driver saying "Yes use it or no I can't so disable it". It is up to Nvidia to provide working drivers. We heard the exact same quote from Oxide regarding Async Compute in Ashes of the Singularity for Maxwell. Will Nvidia actually deliver this time?

Carfax83 · Aug 13, 2016

Bacon1 said:
The part that tells the code to enable / disable that is either an option in settings, or the driver saying "Yes use it or no I can't so disable it". It is up to Nvidia to provide working drivers. We heard the exact same quote from Oxide regarding Async Compute in Ashes of the Singularity for Maxwell. Will Nvidia actually deliver this time?

I'm not a graphics programmer, but I doubt asynchronous compute is that simple. According to IO Interactive, the developer behind Hitman, asynchronous compute is super hard to tune.

That said, if NVidia doesn't have asynchronous compute enabled for Pascal, then how do you explain the consistent 7.5% gain I get in Time Spy when it's turned on?

Now it's possible that their Vulkan driver may not have it enabled currently I'll grant you that, but their DX12 driver definitely has it enabled.

Deders · Aug 13, 2016

The difference with Pascal is that instead of async shaders, they have async compute units. It's down to groups of shaders, whereas before NVidia would partition a whole segment of the GPU off for compute commands if needed.

Bacon1 · Aug 13, 2016

Carfax83 said:
I'm not a graphics programmer, but I doubt asynchronous compute is that simple. According to IO Interactive, the developer behind Hitman, asynchronous compute is super hard to tune.

Well, I'm just showing how you call it. How to best take advantage of it, and how to design your engine so it can handle mixed compute and copy work, is more difficult, but that is engine design. Doing async compute work has nothing to do with vendor-specific calls or anything like that. The reason Maxwell is slower is that if you don't do the checks I was doing above (or the driver tells the game it CAN handle async compute), the fences slightly slow down the pipeline and you don't get the benefit of rendering faster, so you get lower performance overall.

Carfax83 · Aug 13, 2016

Bacon1 said:
The reason Maxwell is slower is that if you don't do the checks I was doing above (or the driver tells the game it CAN handle async compute), the fences slightly slow down the pipeline and you don't get the benefit of rendering faster, so you get lower performance overall.

Maxwell is a special case scenario. While it's capable of asynchronous compute, it's not capable of using asynchronous compute dynamically, unlike Pascal.. Or that's the explanation Ryan gave in his 1080 review. With Maxwell, the driver needs to know beforehand what the asynchronous workload is going to be so it can allocate resources as required. But of course this isn't the way that games are being programmed. Perhaps with a special Maxwell optimized title this could work, but not with most games at all..

And that's why the performance hit was so large for Maxwell when Async compute was turned on..

Elixer · Aug 13, 2016

Carfax83 said:
I posted a quote directly from their FAQ page on the developer's website, and you're still challenging me on this.

I think you are missing the point.
I am challenging your interpretation of what you think that FAQ entry states.

We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon.

You take that to mean an API issue, and I am saying it can't be an API issue; it is a driver (or hardware) issue that they must work around.

Bacon1 · Aug 13, 2016

Carfax83 said:
Maxwell is a special case scenario. While it's capable of asynchronous compute, it's not capable of using asynchronous compute dynamically, unlike Pascal.. Or that's the explanation Ryan gave in his 1080 review. With Maxwell, the driver needs to know beforehand what the asynchronous workload is going to be so it can allocate resources as required. But of course this isn't the way that games are being programmed. Perhaps with a special Maxwell optimized title this could work, but not with most games at all..

And that's why the performance hit was so large for Maxwell when Async compute was turned on..

No, Maxwell still can't do async compute at all. Nvidia has never released async compute "ready" drivers.

The reason it was slower in games is that the games are still set to use async compute, so the fences slightly slow things down to "sync" the parts of the frame. But since nothing is out of sync, because it's already running synchronously, it's just an extra delay for no gain.
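The "extra delay for no gain" argument can be put into numbers with a toy cost model. This is only a back-of-the-envelope sketch: the struct, the function names, and the idea of a single "overlap" factor are assumptions made for illustration, not measurements or anything from a real driver.

```cpp
#include <algorithm>

// Hypothetical per-frame costs in milliseconds.
struct FrameCost {
    double graphicsMs;  // time spent on graphics work
    double computeMs;   // time spent on compute work
    double syncMs;      // overhead of the extra fence waits/signals
};

// Single queue, no cross-queue fences: work simply runs back to back.
double serialFrameMs(const FrameCost& c) {
    return c.graphicsMs + c.computeMs;
}

// Two queues with fences. `overlap` in [0, 1] is the fraction of the shorter
// workload that the hardware really executes concurrently with the longer one.
// overlap == 0 models a GPU/driver that serializes the queues anyway.
double asyncFrameMs(const FrameCost& c, double overlap) {
    const double hidden = overlap * std::min(c.graphicsMs, c.computeMs);
    return c.graphicsMs + c.computeMs - hidden + c.syncMs;
}
```

With made-up costs of 12 ms graphics, 3 ms compute, and 0.25 ms of sync overhead, the serial path takes 15 ms. The async path takes 12.25 ms if the compute work is fully hidden, but 15.25 ms if nothing overlaps, which is slower than not using async at all.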

Carfax83 · Aug 17, 2016

Just letting you guys know, the 372.54 WHQL drivers fixed the stuttering issue in Doom when using the Vulkan pathway with Vsync enabled.

ThatBuzzkiller · Aug 17, 2016

The reason why Nvidia probably can't do async compute in Doom is because the implementation is locked behind AMD shader intrinsic extensions ...

If your async compute code path requires an IHV specific extension then no other IHVs will be able to access that very same path UNTIL they support the said implementation of that extension in their drivers ...
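As a rough illustration of how a code path can end up gated on an IHV extension, here is a small sketch. Whether Doom's async compute path actually depends on AMD extensions is speculation in this thread; the helper names below are hypothetical. In a real application the list of available names would come from vkEnumerateDeviceExtensionProperties, and VK_AMD_shader_ballot and VK_AMD_gcn_shader are used only as examples of vendor extension names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// True if `name` appears in the list of extensions the driver exposes.
bool hasExtension(const std::vector<std::string>& available,
                  const std::string& name) {
    return std::find(available.begin(), available.end(), name) !=
           available.end();
}

// Hypothetical gate: a code path is usable only if the driver exposes
// every extension that path was written against.
bool pathAvailable(const std::vector<std::string>& available,
                   const std::vector<std::string>& required) {
    return std::all_of(required.begin(), required.end(),
                       [&](const std::string& r) {
                           return hasExtension(available, r);
                       });
}
```

If a path lists a VK_AMD_* extension as required, pathAvailable returns false on any driver that does not expose it, regardless of what the hardware could do with core Vulkan calls.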

Krteq · Aug 17, 2016

Carfax83 said:
Maxwell is a special case scenario. While it's capable of asynchronous compute, it's not capable of using asynchronous compute dynamically, unlike Pascal.

Well, Maxwell is capable of "async compute", but only for CUDA (compute). It's not compatible with DX12 multi-engine or Vulkan async compute.

DamZe · Aug 17, 2016

Krteq said:
Well, Maxwell is capable of "async compute", but only for CUDA (compute). It's not compatible with DX12 multi-engine or Vulkan async compute.

It really sucks for us Maxwell owners. It has become all too apparent that Maxwell was a pure DX11-12 stepping stone for nVIDIA, never mind the DX12 hype when I bought my 980 for 570+ bucks back in November 2014. For what it’s worth, the 900-series cards are still very capable pieces of hardware, but they weren't as DX12 "futureproof" as many of us thought they would be.

Bacon1 · Aug 17, 2016

ThatBuzzkiller said:
The reason why Nvidia probably can't do async compute in Doom is because the implementation is locked behind AMD shader intrinsic extensions

Source?

Keysplayr · Aug 17, 2016

ThatBuzzkiller said:
The reason why Nvidia probably can't do async compute in Doom is because the implementation is locked behind AMD shader intrinsic extensions ...

If your async compute code path requires an IHV specific extension then no other IHVs will be able to access that very same path UNTIL they support the said implementation of that extension in their drivers ...

Bacon1 said:
Source?

He said probably, so he may just be speculating?

zlatan · Aug 18, 2016

On D3D12 there is no need to write vendor-specific code to use async compute. DXGI will do all the "magic". The renderer must support multi-engine, with selected pipelines and well-designed barriers/fences for parallel execution, and the job is done. With this, the driver has the right to place the compute pipelines on the independent compute queues. The driver also has the right to refuse this and send the compute pipelines to the OS, which will place them on the graphics queue. So: one code base, many possibilities, with guaranteed compatibility. The problem is how to design the async compute workload, and this is the hard part. For one architecture it's easy, but for two... well, it is not always possible.

But it is not necessary to write code with a conditional construct to manually turn async compute on or off (force single-engine execution). The IHV can use the OS universal graphics scheduler if the specific async compute workload will not run well on the actual hardware for the actual application. This is basically the same as "async off".

On Vulkan this is managed differently. If the driver reports the COMPUTE_BIT flag for the independent compute engines, then there is no chance to refuse the async compute execution. This makes the Vulkan driver less manageable, because if the flag is there then every async compute Vulkan application will run that way, even if it hurts performance. In this case the application must have code with a conditional construct to manually turn async compute on or off.
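A minimal sketch of the application-side check described above, under some assumptions: the QueueFamily struct is a stand-in for the two VkQueueFamilyProperties fields used here, so the example compiles without the Vulkan SDK, and the function names are hypothetical. The bit values match VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_COMPUTE_BIT in the Vulkan headers. A real application would fill the list from vkGetPhysicalDeviceQueueFamilyProperties.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kQueueGraphicsBit = 0x1;  // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kQueueComputeBit  = 0x2;  // VK_QUEUE_COMPUTE_BIT

// Stand-in for the VkQueueFamilyProperties fields this sketch needs.
struct QueueFamily {
    std::uint32_t queueFlags;
    std::uint32_t queueCount;
};

// Returns the index of the first family that supports compute but not
// graphics (an independent compute engine), or -1 if there is none.
int findDedicatedComputeFamily(const std::vector<QueueFamily>& families) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const bool compute  = (families[i].queueFlags & kQueueComputeBit) != 0;
        const bool graphics = (families[i].queueFlags & kQueueGraphicsBit) != 0;
        if (compute && !graphics && families[i].queueCount > 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// The conditional the application has to own: the driver only advertises the
// queue, so switching async compute off is a decision made in application code.
bool shouldUseAsyncCompute(const std::vector<QueueFamily>& families,
                           bool asyncEnabledInSettings) {
    return asyncEnabledInSettings && findDedicatedComputeFamily(families) >= 0;
}
```

When shouldUseAsyncCompute returns false, the application would submit its compute work to the graphics queue instead, which corresponds to the "async off" case.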

I hope this clarifies some misunderstandings.

Carfax83 · Aug 20, 2016

Zlatan, thanks for clearing it up.

Thinker_145 · Aug 20, 2016

Deders said:
Has anyone else noticed how much longer a Doom level takes to load with Vulkan compared to OpenGL? Even reloading after death.

lol if this is true why is it not being talked about? All the talk about more "free" performance kinda goes out the window if the loading times increase.


Red Hawk · Aug 21, 2016

Deders said:
Has anyone else noticed how much longer a Doom level takes to load with Vulkan compared to OpenGL? Even reloading after death.

Nope. Well, I never bothered playing in OpenGL since I got the game after it was updated with Vulkan, but still. Doom's level loading times don't feel particularly long to me. (the time it takes for the game to start is a bit annoying, though).

Atreidin · Aug 21, 2016

Thinker_145 said:
lol if this is true why is it not being talked about? All the talk about more "free" performance kinda goes out the window if the loading times increase.


That's an absurd conclusion.

Thinker_145 · Aug 21, 2016

Atreidin said:
That's an absurd conclusion.

How so? If the loading times are longer then Vulkan mode is inferior no 2 ways about it.


Red Hawk · Aug 21, 2016

Thinker_145 said:
How so? If the loading times are longer then Vulkan mode is inferior no 2 ways about it.


No, not really. The free performance doesn't just go "out the window", it still has smoother gameplay than OpenGL mode.

Thinker_145 · Aug 21, 2016

Red Hawk said:
No, not really. The free performance doesn't just go "out the window", it still has smoother gameplay than OpenGL mode.

Sure but it only matters for those who get poor performance in OpenGL.

