Nvidia's Performance Under Vulkan API Explored


Deders

Platinum Member
Oct 14, 2012
2,401
1
91
Has anyone else noticed how much longer a Doom level takes to load with Vulkan compared to OpenGL? Even when reloading after death.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
What?
Vulkan has an API for this; it isn't AMD-specific.
That is the whole point of using Vulkan: it is a standard API. This isn't an example of using custom libs (like GameWorks), so what exactly is Doom supposed to change?

Straight from their FAQ page:

Does DOOM support asynchronous compute when running on the Vulkan API?

Asynchronous compute is a feature that provides additional performance gains on top of the baseline id Tech 6 Vulkan feature set.

Currently asynchronous compute is only supported on AMD GPUs and requires DOOM Vulkan supported drivers to run. We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon.

Click here for additional information on asynchronous compute.

So as you can see, they need to enable asynchronous compute for NVidia GPUs in Vulkan. To support concurrent asynchronous compute, you need three things:

1) Hardware support
2) Driver support
3) API and program support

So what's apparently missing is the API support for concurrent asynchronous compute for NVidia GPUs. This is in line with what I was saying earlier about Vulkan inheriting the Mantle codebase, which gave AMD a head start on NVidia when it came to low-level support for AMD hardware.

DX12, on the other hand, already supports concurrent asynchronous compute on Pascal.

Has anyone else noticed how much longer a Doom level takes to load with Vulkan compared to OpenGL? Even when reloading after death.

Not only that. The game itself takes longer to load as well.
 

Elixer

Lifer
May 7, 2002
10,376
762
126
So as you can see, they need to enable asynchronous compute for NVidia GPUs in Vulkan. To support concurrent asynchronous compute, you need three things:
1) Hardware support
2) Driver support
3) API and program support
So what's apparently missing is the API support for concurrent asynchronous compute for NVidia GPUs. This is in line with what I was saying earlier about Vulkan inheriting the Mantle codebase, which gave AMD a head start on NVidia when it came to low-level support for AMD hardware.
I think that is the wrong conclusion.
The API *already* supports async; it is not vendor-specific.
The hardware does have some level/form of it (for some Nvidia cards), so that leaves drivers.
I suppose they could be querying the drivers to see which vendor it is and then taking different code paths, but the original point still stands: if they are using pure Vulkan API calls, then it shouldn't matter which vendor's hardware the API is running on. They should be following the specs of the API (via drivers).
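
To make the distinction concrete, here is a minimal C++ sketch of the two approaches: deciding purely on what the device exposes, versus gating the async path on a vendor check. The function names, the enum, and the AMD-only allow-list are hypothetical and only illustrate the idea; nothing here is taken from Doom's actual code. The vendor IDs are the standard PCI values that Vulkan reports in VkPhysicalDeviceProperties::vendorID.

```cpp
#include <cassert>
#include <cstdint>

// Standard PCI vendor IDs.
constexpr std::uint32_t kVendorAmd    = 0x1002;
constexpr std::uint32_t kVendorNvidia = 0x10DE;

enum class ComputePath { GraphicsQueueOnly, SeparateComputeQueue };

// Vendor-agnostic policy: use a separate compute queue whenever one exists.
ComputePath pickByCapability(bool hasSeparateComputeQueue) {
    return hasSeparateComputeQueue ? ComputePath::SeparateComputeQueue
                                   : ComputePath::GraphicsQueueOnly;
}

// Vendor-gated policy (hypothetical): the allow-list below is an assumption
// made for illustration, not a statement about any shipping game.
ComputePath pickByVendor(std::uint32_t vendorId, bool hasSeparateComputeQueue) {
    const bool allowListed = (vendorId == kVendorAmd);
    return (allowListed && hasSeparateComputeQueue)
               ? ComputePath::SeparateComputeQueue
               : ComputePath::GraphicsQueueOnly;
}

// Usage example: the same NVIDIA device gets a different path under each policy.
void selfCheck() {
    assert(pickByCapability(true) == ComputePath::SeparateComputeQueue);
    assert(pickByVendor(kVendorNvidia, true) == ComputePath::GraphicsQueueOnly);
}
```

If an application used something like the second function, a driver that follows the spec perfectly would still not get the async path until the application's list changed.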
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I think that is the wrong conclusion.
The API *already* supports async; it is not vendor-specific.
The hardware does have some level/form of it (for some Nvidia cards), so that leaves drivers.
I suppose they could be querying the drivers to see which vendor it is and then taking different code paths, but the original point still stands: if they are using pure Vulkan API calls, then it shouldn't matter which vendor's hardware the API is running on. They should be following the specs of the API (via drivers).

I posted a quote directly from their FAQ page on the developer's website, and you're still challenging me on this o_O

Asynchronous compute isn't vendor-specific, I agree, but the application/API has to be made aware that the capability exists on the hardware before it can submit the work queues appropriately, and the code itself has to be written to take proper advantage. For example, when the first DX12 update came out for Rise of the Tomb Raider, asynchronous compute was not supported on either Radeon or NVidia GPUs. Only with the second DX12 update did this change. And this is despite the fact that DX12 supports asynchronous compute.

The truth is, Vulkan is very much a work in progress. AMD got a head start due to their Mantle donation, but that's it.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
I posted a quote directly from their FAQ page on the developer's website, and you're still challenging me on this o_O

Asynchronous compute isn't vendor-specific, I agree, but the application/API has to be made aware that the capability exists on the hardware before it can submit the work queues appropriately, and the code itself has to be written to take proper advantage. For example, when the first DX12 update came out for Rise of the Tomb Raider, asynchronous compute was not supported on either Radeon or NVidia GPUs. Only with the second DX12 update did this change. And this is despite the fact that DX12 supports asynchronous compute.

The truth is, Vulkan is very much a work in progress. AMD got a head start due to their Mantle donation, but that's it.

Here is the difference between an async compute call and one without:

void D3D12nBodyGravity::OnRender()
{
    // Wait for graphics fence to finish
    if (AsynchronousComputeEnabled) {
        PIXBeginEvent (m_computeCommandQueue.Get (), 0, L"Simulate");
        m_computeCommandQueue->Wait (m_graphicsCopyFences [m_lastFrameIndex].Get (), m_graphicsCopyFenceValues [m_lastFrameIndex]);
    } else {
        PIXBeginEvent (m_graphicsCommandQueue.Get (), 0, L"Simulate");
    }

    RecordComputeCommandList ();

    // Close and execute the command list.
    ID3D12CommandList* ppCommandLists[] = { m_computeCommandLists[m_frameIndex].Get () };

    if (AsynchronousComputeEnabled) {
        m_computeCommandQueue->ExecuteCommandLists (1, ppCommandLists);
        m_computeFenceValues [m_frameIndex] = m_computeFenceValue;
        m_computeCommandQueue->Signal (m_computeFences [m_frameIndex].Get (), m_computeFenceValue);
        PIXEndEvent (m_computeCommandQueue.Get ());
    } else {
        m_graphicsCommandQueue->ExecuteCommandLists (1, ppCommandLists);
        PIXEndEvent (m_graphicsCommandQueue.Get ());
    }

    ++m_computeFenceValue;

    RecordCopyCommandList ();

    ppCommandLists[0] = { m_graphicsCopyCommandLists[m_frameIndex].Get () };

    // Wait for compute fence to finish
    if (AsynchronousComputeEnabled) {
        m_graphicsCommandQueue->Wait (m_computeFences [m_frameIndex].Get (), m_computeFenceValues [m_frameIndex]);
    }

    // Execute copy
    m_graphicsCommandQueue->ExecuteCommandLists (1, ppCommandLists);
    if (AsynchronousComputeEnabled) {
        m_graphicsCommandQueue->Signal (m_graphicsCopyFences [m_frameIndex].Get (), m_graphicsCopyFenceValue);
    }

    PIXBeginEvent (m_graphicsCommandQueue.Get (), 0, L"Render");
    ++m_graphicsCopyFenceValue;
    RecordRenderCommandList ();

    // Execute the rendering
    ppCommandLists[0] = { m_graphicsCommandLists[m_frameIndex].Get() };
    m_graphicsCommandQueue->ExecuteCommandLists(1, ppCommandLists);
    if (AsynchronousComputeEnabled) {
        m_graphicsCommandQueue->Signal (m_graphicsFences [m_frameIndex].Get (), m_graphicsFenceValue);
    }

    PIXEndEvent (m_graphicsCommandQueue.Get ());
    ++m_graphicsFenceValue;

    // Present the frame.
    ThrowIfFailed(m_swapChain->Present(0, 0));

    MoveToNextFrame();
}

Most of the code is identical or has tiny changes, but it's completely vendor agnostic. The only additional requirement is fencing to sync up what is happening.
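
For anyone who wants to see what that fencing does without a GPU, here is a toy, single-threaded C++ model of two queues synchronized by a monotonically increasing fence value, loosely in the spirit of the Wait/Signal calls on D3D12 command queues. Every type and function name in it (Fence, Command, Queue, runUntilBlocked, run, and the make* helpers) is made up for this sketch; it is not the D3D12 API and it ignores real scheduling entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for a GPU fence: a counter that only ever moves forward.
struct Fence { std::uint64_t completed; };

struct Command {
    enum class Kind { Execute, Signal, Wait };
    Kind kind;
    std::string label;    // used by Execute
    Fence* fence;         // used by Signal and Wait
    std::uint64_t value;  // used by Signal and Wait
};

Command makeExecute(const std::string& label) { return {Command::Kind::Execute, label, nullptr, 0}; }
Command makeSignal(Fence& f, std::uint64_t v) { return {Command::Kind::Signal, "", &f, v}; }
Command makeWait(Fence& f, std::uint64_t v)   { return {Command::Kind::Wait, "", &f, v}; }

struct Queue {
    std::vector<Command> commands;
    std::size_t next;  // index of the next command to process
    bool done() const { return next == commands.size(); }
};

// Processes commands until the queue is empty or hits a Wait whose fence
// has not reached the requested value. Returns true if anything was processed.
bool runUntilBlocked(Queue& q, std::vector<std::string>& log) {
    bool progressed = false;
    while (!q.done()) {
        const Command& c = q.commands[q.next];
        if (c.kind != Command::Kind::Execute) assert(c.fence != nullptr);
        if (c.kind == Command::Kind::Wait && c.fence->completed < c.value) break;
        if (c.kind == Command::Kind::Execute) log.push_back(c.label);
        if (c.kind == Command::Kind::Signal) c.fence->completed = c.value;
        ++q.next;
        progressed = true;
    }
    return progressed;
}

// Alternates between the two queues until neither can make progress.
std::vector<std::string> run(Queue& graphics, Queue& compute) {
    std::vector<std::string> log;
    while (true) {
        const bool g = runUntilBlocked(graphics, log);
        const bool c = runUntilBlocked(compute, log);
        if (!g && !c) break;
    }
    return log;
}
```

If the compute queue holds "Simulate" followed by a signal of value 1, and the graphics queue holds a wait on value 1 followed by "Copy" and "Render", the log comes out as Simulate, Copy, Render even though the graphics queue is polled first. The fence is the only thing enforcing that order.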

The part that tells the code to enable / disable that is either an option in settings, or the driver saying "Yes use it or no I can't so disable it". It is up to Nvidia to provide working drivers. We heard the exact same quote from Oxide regarding Async Compute in Ashes of the Singularity for Maxwell. Will Nvidia actually deliver this time?
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The part that tells the code to enable / disable that is either an option in settings, or the driver saying "Yes use it or no I can't so disable it". It is up to Nvidia to provide working drivers. We heard the exact same quote from Oxide regarding Async Compute in Ashes of the Singularity for Maxwell. Will Nvidia actually deliver this time?

I'm not a graphics programmer, but I doubt asynchronous compute is that simple. According to IO Interactive, the developer behind Hitman, asynchronous compute is super hard to tune.

That said, if NVidia doesn't have asynchronous compute enabled for Pascal, then how do you explain the consistent 7.5% gain I get in Time Spy when it's turned on?

Now, it's possible that their Vulkan driver may not have it enabled currently, I'll grant you that, but their DX12 driver definitely has it enabled.
 

Deders

Platinum Member
Oct 14, 2012
2,401
1
91
The difference with Pascal is that instead of async shaders, they have async compute units. It's down to groups of shaders, whereas before NVidia would partition a whole segment of the GPU off for compute commands if needed.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
I'm not a graphics programmer, but I doubt asynchronous compute is that simple. According to IO Interactive, the developer behind Hitman, asynchronous compute is super hard to tune.

Well, I'm just showing how you call it. How to best take advantage of it, and how to design your engine so it can handle mixed compute and copy work, is more difficult, but that is engine design. Doing async compute work has nothing to do with vendor-specific calls or anything like that. The reason Maxwell is slower is that if you don't do the checks I was doing above (or if the driver tells the application it CAN handle async compute), the fences slightly slow down the pipeline and you don't get the benefit of rendering faster, so you get lower performance overall.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The reason Maxwell is slower is that if you don't do the checks I was doing above (or if the driver tells the application it CAN handle async compute), the fences slightly slow down the pipeline and you don't get the benefit of rendering faster, so you get lower performance overall.

Maxwell is a special case. While it's capable of asynchronous compute, it's not capable of using asynchronous compute dynamically, unlike Pascal. Or that's the explanation Ryan gave in his 1080 review. With Maxwell, the driver needs to know beforehand what the asynchronous workload is going to be so it can allocate resources as required. But of course this isn't the way that games are being programmed. Perhaps with a special Maxwell-optimized title this could work, but not with most games.

And that's why the performance hit was so large for Maxwell when async compute was turned on.
 

Elixer

Lifer
May 7, 2002
10,376
762
126
I posted a quote directly from their FAQ page on the developer's website, and you're still challenging me on this o_O.
I think you are missing the point.
I am challenging your interpretation of what you think that FAQ entry states.

We are working with NVIDIA to enable asynchronous compute in Vulkan on NVIDIA GPUs. We hope to have an update soon.
You take that to mean an API issue, and I am saying it can't be an API issue; it is a driver (or hardware) issue that they must work around.
 

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
Maxwell is a special case. While it's capable of asynchronous compute, it's not capable of using asynchronous compute dynamically, unlike Pascal. Or that's the explanation Ryan gave in his 1080 review. With Maxwell, the driver needs to know beforehand what the asynchronous workload is going to be so it can allocate resources as required. But of course this isn't the way that games are being programmed. Perhaps with a special Maxwell-optimized title this could work, but not with most games.

And that's why the performance hit was so large for Maxwell when async compute was turned on.

No, Maxwell still can't do async compute at all. Nvidia has never released async compute "ready" drivers.

The reason it was slower in games is that the games are still set to use async compute, so the fences add a slight delay to "sync" the parts of the frame. But since nothing is out of sync, because it's already running synchronously, it's just an extra delay for no gain.
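
A back-of-the-envelope way to see this, written as a small C++ sketch. It is an idealized model with made-up names and numbers: it assumes perfect overlap when the hardware can run both queues at once, and it lumps all fence cost into a single per-frame overhead. Real GPUs will not behave this cleanly.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-frame costs in milliseconds.
struct FrameCost {
    double graphicsMs;
    double computeMs;
    double fenceOverheadMs;
};

// Async off: everything goes down the graphics queue, no cross-queue fences.
double serialNoFences(const FrameCost& f) {
    assert(f.graphicsMs >= 0 && f.computeMs >= 0 && f.fenceOverheadMs >= 0);
    return f.graphicsMs + f.computeMs;
}

// Async on, hardware overlaps the two queues (idealized as perfect overlap).
double asyncOverlapped(const FrameCost& f) {
    return std::max(f.graphicsMs, f.computeMs) + f.fenceOverheadMs;
}

// Async on, but the work still ends up running back to back:
// you pay for the fences and get no overlap in return.
double asyncSerialized(const FrameCost& f) {
    return f.graphicsMs + f.computeMs + f.fenceOverheadMs;
}
```

With 12 ms of graphics, 3 ms of compute and 0.25 ms of fence overhead (numbers picked purely for illustration), that gives 15 ms with async off, 12.25 ms with real overlap, and 15.25 ms when the async path is taken but nothing actually overlaps.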
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Just letting you guys know, the 372.54 WHQL drivers fixed the stuttering issue in Doom when using the Vulkan pathway with Vsync enabled :cool:
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
The reason why Nvidia probably can't do async compute in Doom is because the implementation is locked behind AMD shader intrinsic extensions ...

If your async compute code path requires an IHV specific extension then no other IHVs will be able to access that very same path UNTIL they support the said implementation of that extension in their drivers ...
 

Krteq

Senior member
May 22, 2015
991
671
136
Maxwell is a special case. While it's capable of asynchronous compute, it's not capable of using asynchronous compute dynamically, unlike Pascal.
Well, Maxwell is capable of "async-compute", but that applies only to CUDA (Compute). It's not compatible with DX12 Multi-engine/Vulkan Async-Compute.
 

DamZe

Member
May 18, 2016
187
80
101
Well, Maxwell is capable of "async-compute", but that applies only to CUDA (Compute). It's not compatible with DX12 Multi-engine/Vulkan Async-Compute.

It really sucks for us Maxwell owners. It has become all too apparent that Maxwell was a pure DX11-12 stepping stone for nVIDIA, never mind the DX12 hype when I bought my 980 for 570+ bucks back in November 2014. For what it's worth, the 900-series cards are still very capable pieces of hardware, but they weren't as DX12 "futureproof" as many of us thought they would be.
 

Keysplayr

Elite Member
Jan 16, 2003
21,209
50
91
The reason why Nvidia probably can't do async compute in Doom is because the implementation is locked behind AMD shader intrinsic extensions ...

If your async compute code path requires an IHV specific extension then no other IHVs will be able to access that very same path UNTIL they support the said implementation of that extension in their drivers ...


He said probably, so he may just be speculating?
 

zlatan

Senior member
Mar 15, 2011
580
291
136
On D3D12 there is no need to write vendor-specific code to use async compute. The DXGI will do all the "magic". The renderer must support multi-engine, with selected pipelines and well-designed barriers/fences for parallel execution, and the job is done. With this, the driver has the right to place the compute pipelines on the independent compute queues. The driver also has the right to refuse this and send the compute pipelines to the OS, which will place them on the graphics queue. So: one code base, many possibilities, with guaranteed compatibility.

The problem is how to design the async compute workload, and this is the hard part. For one architecture it's easy, but for two... well, it is not always possible. :( But it is not necessary to write code with a conditional construct to manually turn async compute on or off (force single-engine execution). The IHV can use the OS universal graphics scheduler if the specific async compute workload will not run well on the actual hardware for the actual application. This is basically the same as "async off".

On Vulkan this is managed differently. If the driver reports the COMPUTE_BIT flag for the independent compute engines, then there is no chance to refuse async compute execution. This makes the Vulkan driver less manageable, because if the flag is there then every async compute Vulkan application will run that way, even if it hurts performance. In this case the application must have code with a conditional construct to manually turn async compute on or off.
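
As a rough illustration of that application-side switch, here is a C++ sketch that avoids the Vulkan headers so it can be compiled anywhere. The two constants mirror the numeric values of VK_QUEUE_GRAPHICS_BIT and VK_QUEUE_COMPUTE_BIT, and QueueFamily stands in for the fields of VkQueueFamilyProperties that matter here. The function names and the "knownToHelpOnThisGpu" input are hypothetical; how an application would actually decide that is left open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins with the same values as VK_QUEUE_GRAPHICS_BIT / VK_QUEUE_COMPUTE_BIT.
constexpr std::uint32_t kQueueGraphicsBit = 0x1;
constexpr std::uint32_t kQueueComputeBit  = 0x2;

// Minimal stand-in for the relevant part of VkQueueFamilyProperties.
struct QueueFamily {
    std::uint32_t flags;
    std::uint32_t queueCount;
};

// Returns the index of the first family that offers compute without graphics,
// or -1 if the device exposes no such family.
int findDedicatedComputeFamily(const std::vector<QueueFamily>& families) {
    for (std::size_t i = 0; i < families.size(); ++i) {
        const bool compute  = (families[i].flags & kQueueComputeBit) != 0;
        const bool graphics = (families[i].flags & kQueueGraphicsBit) != 0;
        if (compute && !graphics && families[i].queueCount > 0) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// The conditional the application has to own: the driver only reports the
// flag, so the decision to actually use the queue is made here.
bool shouldUseAsyncCompute(const std::vector<QueueFamily>& families,
                           bool userEnabled, bool knownToHelpOnThisGpu) {
    return userEnabled && knownToHelpOnThisGpu &&
           findDedicatedComputeFamily(families) >= 0;
}
```

In a real application the family list would come from vkGetPhysicalDeviceQueueFamilyProperties, and when the function returns false the compute work would simply be submitted to the graphics queue instead.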

I hope this clarifies some misunderstandings.
 

Thinker_145

Senior member
Apr 19, 2016
609
58
91
Has anyone else noticed how much longer a Doom level takes to load with Vulkan compared to OpenGL? Even when reloading after death.
lol if this is true why is it not being talked about? All the talk about more "free" performance kinda goes out the window if the loading times increase.

 

Red Hawk

Diamond Member
Jan 1, 2011
3,266
169
106
Has anyone else noticed how much longer a Doom level takes to load with Vulkan compared to OpenGL? Even when reloading after death.

Nope. Well, I never bothered playing in OpenGL since I got the game after it was updated with Vulkan, but still. Doom's level loading times don't feel particularly long to me. (the time it takes for the game to start is a bit annoying, though).
 

Red Hawk

Diamond Member
Jan 1, 2011
3,266
169
106
How so? If the loading times are longer then Vulkan mode is inferior no 2 ways about it.


No, not really. The free performance doesn't just go "out the window", it still has smoother gameplay than OpenGL mode.