Hitman Developers talk DX12 (Interview)

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
http://wccftech.com/hitman-lead-dev-dx12-gains-time-ditching-dx11/

HITMAN on DirectX 12 only offers approximately a 10% performance boost on AMD cards. Are you planning to improve DirectX 12 performance and/or add any further DX12 features via patch and if so, which ones?

We don’t have any DX12-specific improvements planned for our subsequent releases. While we see a 10% improvement when completely GPU bound, I think there are gains above that when you’re CPU bound.

NVIDIA cards, on the other hand, have basically the same performance under DX11 and DX12. You’ve mentioned during the GDC 2016 talk that you’re working with NVIDIA to improve their Async Compute implementation – how are things looking on that end and do you have any ETA?

I don’t have any news on that front, sorry.

Async Compute in particular has received a lot of attention from PC enthusiasts, specifically in regards to NVIDIA GPUs lacking hardware support for it. However, in the GDC 2016 talk you said that even AMD cards only got a 5-10% boost and furthermore, you described Async Compute as “super hard” to tune because too much work can make it a penalty. Is it fair to say that the importance of Async Compute has been perhaps overstated in comparison to other factors that determine performance? Do you think NVIDIA may be in trouble if Pascal doesn’t implement a hardware solution for Async Compute?

The main reason it’s hard is that every GPU ideally needs custom tweaking – the bandwidth-to-compute ratio is different for each GPU, ideally requiring tweaking the amount of async work for each one. I don’t think it’s overstated, but obviously YMMV (your mileage may vary). In the current state, Async Compute is a nice & easy performance win. In the long run it will be interesting to see if GPUs get better at running parallel work, since we could potentially get even better wins.
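
For anyone curious what “tweaking the amount of async work” looks like at the API level, here is a minimal D3D12 sketch of the basic async-compute pattern (a generic illustration, not IO Interactive's renderer; the function and parameter names are made up). The per-GPU knob the answer describes is simply how much work gets recorded into the compute-queue command list for a given card.

```cpp
// Minimal async-compute sketch: one DIRECT (graphics) queue, one COMPUTE queue,
// and a fence so the graphics frame waits for the async results it consumes.
// Illustrative only; device/command-list creation and recording are assumed.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12CommandQueue> MakeQueue(ID3D12Device* device, D3D12_COMMAND_LIST_TYPE type) {
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = type;                                   // DIRECT, COMPUTE or COPY
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

void SubmitFrame(ID3D12CommandQueue* gfxQueue,          // D3D12_COMMAND_LIST_TYPE_DIRECT
                 ID3D12CommandQueue* computeQueue,      // D3D12_COMMAND_LIST_TYPE_COMPUTE
                 ID3D12CommandList*  gfxList,           // recorded on a DIRECT command list
                 ID3D12CommandList*  asyncComputeList,  // recorded on a COMPUTE command list
                 ID3D12Fence* fence, UINT64 fenceValue) {
    // Kick the compute work first so it overlaps the graphics queue instead of
    // serializing behind it. How much gets recorded into asyncComputeList (only
    // SSAO, or SSAO + shadows + lighting) is the per-GPU tuning knob: too much
    // and it fights the graphics queue for bandwidth/ALUs and becomes a penalty.
    computeQueue->ExecuteCommandLists(1, &asyncComputeList);
    computeQueue->Signal(fence, fenceValue);

    // GPU-side wait before the graphics passes that read the async results;
    // the CPU does not stall here.
    gfxQueue->Wait(fence, fenceValue);
    gfxQueue->ExecuteCommandLists(1, &gfxList);
}
```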

Several DirectX 12 games are out now, but the first outlook isn’t nearly as positive as Microsoft stated (up to 20% more performance from the GPU and up to 50% more performance from the CPU). Do you think that it’s just a matter of time before developers learn how to use the new API, or perhaps the performance benefits have been somewhat overestimated?

I think it will take a bit of time; the drivers & games need to mature and do the right things. Just reaching parity with DX11 is a lot of work. 50% more performance from the CPU is possible, but it depends a lot on your game, the driver, and how well they work together. Improving performance by 20% when GPU bound will be very hard, especially when you have a DX11 driver team trying to improve performance on that platform as well. It’s worth mentioning we did only a straight port; once we start using some of the new features of DX12, it will open up a lot of new possibilities – and then the gains will definitely be possible. We probably won’t start on those features until we can ditch DX11, since a lot of them require fundamental changes to our render code.
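
Some context on where that CPU-side headroom comes from (my own generic illustration, not something the Hitman team describes here): DX12 lets an engine record command lists on many worker threads and submit them together, instead of funnelling nearly all submission work through a single driver thread as DX11 tends to. A rough sketch, with all names hypothetical and per-frame fencing omitted:

```cpp
// Rough sketch of DX12's CPU-side win: command lists recorded in parallel on
// worker threads, then submitted in one call. Illustrative only; real code
// must fence against the GPU before resetting allocators each frame.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

struct WorkerRecorder {
    ComPtr<ID3D12CommandAllocator>    allocator;  // allocators are single-threaded,
    ComPtr<ID3D12GraphicsCommandList> list;       // so each worker gets its own pair
};

std::vector<WorkerRecorder> CreateRecorders(ID3D12Device* device, unsigned count) {
    std::vector<WorkerRecorder> recorders(count);
    for (auto& r : recorders) {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&r.allocator));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  r.allocator.Get(), nullptr, IID_PPV_ARGS(&r.list));
        r.list->Close();  // lists are created open; start closed, Reset() each frame
    }
    return recorders;
}

void RecordAndSubmitFrame(std::vector<WorkerRecorder>& recorders,
                          ID3D12CommandQueue* gfxQueue) {
    std::vector<std::thread> workers;
    for (auto& r : recorders) {
        workers.emplace_back([&r] {
            r.allocator->Reset();
            r.list->Reset(r.allocator.Get(), nullptr);
            // ... each thread records its slice of the frame's draw calls here ...
            r.list->Close();
        });
    }
    for (auto& t : workers) t.join();

    // One cheap submission of everything the workers recorded.
    std::vector<ID3D12CommandList*> lists;
    for (auto& r : recorders) lists.push_back(r.list.Get());
    gfxQueue->ExecuteCommandLists(static_cast<UINT>(lists.size()), lists.data());
}
```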

Do you believe HITMAN’s DirectX 12 performance could have been better if the game had been built entirely on it, rather than having DX11 as min spec?

Yes, it obviously would have. DX12 includes new hardware features, which I think over time will make it possible for us to make games run even better.

Another low level API has been recently released: Vulkan. What do you think of it in terms of performance and features? Do you have any plans to add Vulkan support to HITMAN?

Vulkan is a graphics programmer’s wet dream: a high-performance API, like D3D12, for all platforms. With that said, we don’t have any plans to add Vulkan support to Hitman. Also, unfortunately it looks like Vulkan won’t be supported on all platforms.

Finally, as a developer, what are your thoughts on the Universal Windows Platform?

I’m not qualified to talk about UWP – I don’t know anything about it. I can say I really like the Win32 API: it’s been around forever, is well documented, and it works.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
163
106
NVIDIA cards, on the other hand, have basically the same performance under DX11 and DX12. You’ve mentioned during the GDC 2016 talk that you’re working with NVIDIA to improve their Async Compute implementation – how are things looking on that end and do you have any ETA?

I don’t have any news on that front, sorry.
No major Async compute (r)evolution till Volta I bet D:
 

Adored

Senior member
Mar 24, 2016
256
1
16
Pretty much what I said in a recent video. Async is good, but it's hype right now and requires optimization on a card-per-card basis, which few devs will be arsed with.

Going forward though we can easily imagine a case where it won't need that level of tweaking, or any at all. ;)
 

tential

Diamond Member
May 13, 2008
7,348
642
121
Crap, I guess I should take my 980ti's out of the trash bin.
Too late, already using them and about to add them to my signature.

It's like I've said though: a 980 Ti owner should not care, since they'll upgrade before this matters. Midrange users who are bigger gamers don't upgrade as often; they hold cards for 3-5 years.

I consider people who buy mid range cards regularly to still be enthusiasts by the way.
 

maddie

Diamond Member
Jul 18, 2010
5,149
5,524
136
Pretty much what I said in a recent video. Async is good, but it's hype right now and requires optimization on a card-per-card basis, which few devs will be arsed with.

Going forward though we can easily imagine a case where it won't need that level of tweaking, or any at all. ;)
And yet, we have this.


Async Compute in particular has received a lot of attention from PC enthusiasts, specifically in regards to NVIDIA GPUs lacking hardware support for it. However, in the GDC 2016 talk you said that even AMD cards only got a 5-10% boost and furthermore, you described Async Compute as “super hard” to tune because too much work can make it a penalty. Is it fair to say that the importance of Async Compute has been perhaps overstated in comparison to other factors that determine performance? Do you think NVIDIA may be in trouble if Pascal doesn’t implement a hardware solution for Async Compute?

The main reason it’s hard is that every GPU ideally needs custom tweaking – the bandwidth-to-compute ratio is different for each GPU, ideally requiring tweaking the amount of async work for each one. I don’t think it’s overstated, but obviously YMMV (your mileage may vary). In the current state, Async Compute is a nice & easy performance win. In the long run it will be interesting to see if GPUs get better at running parallel work, since we could potentially get even better wins.
 

Adored

Senior member
Mar 24, 2016
256
1
16
Yes but 1% is a performance win as well. Hitman shows around 3% on average I believe, not 5-10%.

AMD has a lot bigger advantages than async but they've marketed it well, which makes a change for them.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
^ Yes, but it seems for console ports it's most likely going to be a case of developers squeezing the last ounce of performance from the underpowered XB1/PS4. If the games are originally designed for those consoles to take advantage of Async Compute, it directly translates into a win for PC hardware that supports it. OTOH, Steam data shows that most GPUs (NV) don't support this functionality. This means that if an XB1/PS4 game doesn't have Async Compute, the developer has to spend extra resources incorporating it into the PC version. How likely are they to do this when AMD's market share is 20-21% right now? It's up to AMD to work with developers then, or the rest seems to be dictated by the nature of GCN-optimized XB1/PS4 ports.

If Pascal barely improves Async Compute, add another 2+ years of software delay, because it would mean that, hardware-wise, Async Compute still won't be mainstream. It's a shame really, since AC is a performance-boosting feature on more advanced GPU architectures. Who doesn't want another 10-30% boost in performance from hardware that allows parallel processing? I guess the answer to that is obvious...
 

Adored

Senior member
Mar 24, 2016
256
1
16
The key here is standardization. AMD messed around a lot with the first implementation of async but now it looks like they've got it settled. No change in Polaris either. ;)

Async will really start to count when async on the console = async on the PC exactly - with either utterly trivial optimization or none at all, likely involving the new patent that's being discussed as well.

All AMD has to do is provide the standardization. They've got plans going way beyond async though.
 
Feb 19, 2009
10,457
10
76
Yes but 1% is a performance win as well. Hitman shows around 3% on average I believe, not 5-10%.

AMD has a lot bigger advantages than async but they've marketed it well, which makes a change for them.

Hitman is up to 10%. For Hawaii and Fury it's ~10% while for Tahiti it's like 3% or so.

But AFAIK, they only used it for SSAO and shadows, which come “for free” where there are enough ACEs.

In Ashes, most get 10-20%.

[attached benchmark chart]


And it's because they put more compute in the Async queue, for all their unit lighting.

Want to see something mind blowing?

Heavy usage of Async Compute in QB, but not touching a single SP/ALU, all on the DMA engines in GCN. :)

http://forums.anandtech.com/showpost.php?p=38164220&postcount=349
 

Adored

Senior member
Mar 24, 2016
256
1
16
Don't mistake DX12 for async. Only Ashes shows a big difference that is clear and can be toggled.

It's real as soon as there are a bunch of benchmarks showing similar results with async on and off. Until then, Ashes is an AMD poster child.

The consistent wins that AMD is getting without async are by far the more important DX12 story.
 
Feb 19, 2009
10,457
10
76
Sure, but many people misunderstood the purpose of Async Compute. They often attribute it to better shader utilization, and that's just one part of it.

The example of AC in QB is very striking: heavy Copy Queue usage to get their 4-frame temporal reconstruction rendering to work. In DX12, Copy Queues are a subset of Compute Queues, which are a subset of Graphics Queues.

Copy Queues don’t even touch shaders, yet GCN is able to accelerate performance far above NV in QB, just because it supports DX12’s Multi-Engine Rendering and “Async Compute”.
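
To make the Multi-Engine point concrete, here is a small sketch of DX12's three queue types (my own illustration, not from Quantum Break or any shipping engine). A Copy Queue only accepts copy command lists, which GCN services with its dedicated DMA engines, so transfers such as a temporal-reconstruction history copy can overlap graphics and compute without occupying a single shader ALU:

```cpp
// Sketch of D3D12's three engine/queue types. Copy-queue work is pure data
// movement (no shaders); on GCN it runs on the DMA engines, so it can overlap
// both the graphics and compute queues. Illustrative only.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

struct EngineQueues {
    ComPtr<ID3D12CommandQueue> graphics;  // DIRECT:  graphics + compute + copy
    ComPtr<ID3D12CommandQueue> compute;   // COMPUTE: compute + copy
    ComPtr<ID3D12CommandQueue> copy;      // COPY:    copy only (DMA engines on GCN)
};

EngineQueues CreateEngineQueues(ID3D12Device* device) {
    EngineQueues q;
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.graphics));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.compute));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.copy));
    return q;
}

// Example: stream a previous-frame history texture on the copy queue while the
// graphics queue keeps rendering, then gate the consumer with a fence.
void KickAsyncCopy(ID3D12CommandQueue* copyQueue, ID3D12CommandQueue* gfxQueue,
                   ID3D12CommandList* copyList,   // recorded on a COPY-type list
                   ID3D12Fence* fence, UINT64 fenceValue) {
    copyQueue->ExecuteCommandLists(1, &copyList);
    copyQueue->Signal(fence, fenceValue);
    gfxQueue->Wait(fence, fenceValue);  // GPU-side wait before sampling the copied data
}
```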
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
The consistent wins that AMD is getting without async are by far the more important DX12 story.

If it's not Async Compute, why would GCN cards perform much better under DX12? Lower CPU overhead which allows the GPU to become better utilized?

Could it have more to do with latest games becoming so advanced for XB1/PS4 that developers are forced to squeeze/optimize for every last ounce of GCN?

I am inclined to believe this explanation more. How else can we explain Hitman performing so much faster on 290X/Fury X under DX11?

[Hitman 1920 benchmark chart]


It's mind-blowing how much faster R9 290X is over 7970/280X or over 780Ti/970. Console effect imo.
 
Feb 19, 2009
10,457
10
76
@RS
The easy explanation is that AMD sponsored Hitman, and so IO/Square gimps NV GPU performance. ;) The same for Ashes of the Singularity.

But we're seeing even more gimpage in neutral, non-sponsored titles, so I don't agree that is the cause.

It's a combination of console effect and NV's gimped DX12 hardware/drivers all adding up to a storm with Polaris v Pascal.

A few weeks ago I had major doubts a 2,560 SP Polaris 10 could rival a 2,560 SP GP104 that's a much bigger die... but really, as review sites ditch older games and add new ones, NV GPUs need to brute force a lot to catch up.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
A few weeks ago I had major doubts a 2,560 SP Polaris 10 could rival a 2,560 SP GP104 that's a much bigger die... but really, as review sites ditch older games and add new ones, NV GPUs need to brute force a lot to catch up.

Don't be so sure. P100 boosts to 1480MHz. Based on the more conservative GPU clocks of NV's larger-die products over the years vs. their mid-range and lower-end offerings, I am inclined to believe that the 1080 GP104 (980 replacement) will be clocked higher than 1480MHz. Add in after-market versions, and I wouldn't be shocked if GP104 can overclock to 1700MHz. For leaked Polaris clocks, we are seeing 850-1050MHz. That means I expect a 40-60% clock speed disadvantage for Polaris, but who knows. Remember, before the 670/680 launched, early leaks were showing them with 700MHz or so GPU clocks. Either way, since the HD 5870, AMD went from 850MHz to 1050MHz with Fury X. And now look how high NV's cards clock today. Even on 28nm, Maxwell overclocks to 1500-1550MHz.
 
Feb 19, 2009
10,457
10
76
Base clock is meaningless when they can power gate down and turbo boost individual units within a SIMD with Polaris.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
The main reason it’s hard is that every GPU ideally needs custom tweaking – the bandwidth-to-compute ratio is different for each GPU, ideally requiring tweaking the amount of async work for each one.

So basically expect Hawaii to get most of the benefit.
 

Adored

Senior member
Mar 24, 2016
256
1
16
If it's not Async Compute, why would GCN cards perform much better under DX12? Lower CPU overhead which allows the GPU to become better utilized?

Could it have more to do with latest games becoming so advanced for XB1/PS4 that developers are forced to squeeze/optimize for every last ounce of GCN?

I am inclined to believe this explanation more. How else can we explain Hitman performing so much faster on 290X/Fury X under DX11?


It's mind-blowing how much faster R9 290X is over 7970/280X or over 780Ti/970. Console effect imo.

The first game I noticed it in under DX11 was SW:BF. That DICE would be using GCN cards makes complete sense of course, so yes, I think we're simply looking at the fact that more devs are just using GCN to start with. If you think about it, AMD has probably never had this before in their history, yet they still stayed pretty close to Nvidia in most cases...
 
Feb 19, 2009
10,457
10
76
The first game I noticed it in under DX11 was SW:BF. That DICE would be using GCN cards makes complete sense of course, so yes, I think we're simply looking at the fact that more devs are just using GCN to start with. If you think about it, AMD has probably never had this before in their history, yet they still stayed pretty close to Nvidia in most cases...

I first noticed it in Shadow of Mordor.

On Anandtech's bench, a reference R9 290X was keeping up with, or running slightly faster than, a 980. That was very unexpected, given that older titles had a 15% gap.

Basically, without GameWorks, NV can't compete when modern games come GCN-optimized.

Examine The Division: NV-sponsored, with a lot of GameWorks tech, but as soon as you disable those GW features (PCSS & HBAO+)... GCN just powers ahead.

[The Division benchmark charts]


This is repeated for Far Cry 4, Dying Light, Rainbow Six, JC3 and other NV-sponsored titles. Disable GW and, bam, GCN goes ahead in each segment.

What about games that NV doesn't sponsor? The best example is Far Cry Primal, where a 390 is 30% faster than the 970.

NV actually needs to sponsor and get involved with all the games, or else GCN just runs too well. And this is in DX11, where GCN is running crippled.
 

Adored

Senior member
Mar 24, 2016
256
1
16
Shadow of Mordor could have been an edge case due to it being so heavy on memory and bandwidth, but yes that one ran surprisingly well on AMD too.
 

finbarqs

Diamond Member
Feb 16, 2005
3,617
2
81
just need something to drive my 3440x1440 monitor at full res... 980ti isn't enough currently... maybe the radeon pro duo...
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
The first game I noticed it in under DX11 was SW:BF. That DICE would be using GCN cards makes complete sense of course, so yes, I think we're simply looking at the fact that more devs are just using GCN to start with. If you think about it, AMD has probably never had this before in their history, yet they still stayed pretty close to Nvidia in most cases...

I've often said in the past that AMD won't be able to truly compete in the 3D workstation market until they can pry the Quadros out of Autodesk's workstations. When the app is developed on a particular IHV's products, they are going to have an inherent advantage. I never thought about the game devs, but it makes sense there too.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
If it's not Async Compute, why would GCN cards perform much better under DX12? Lower CPU overhead which allows the GPU to become better utilized?

Could it have more to do with latest games becoming so advanced for XB1/PS4 that developers are forced to squeeze/optimize for every last ounce of GCN?

I am inclined to believe this explanation more. How else can we explain Hitman performing so much faster on 290X/Fury X under DX11?

[Hitman 1920 benchmark chart]


It's mind-blowing how much faster R9 290X is over 7970/280X or over 780Ti/970. Console effect imo.

Yes, and here is why there is such a lag with GCN optimizations, even though we are well into the console lifecycle:
https://youtu.be/VysWXsuGPHQ?t=64

TL;DR:
https://youtu.be/VysWXsuGPHQ?t=197

Devs had no hope for the PS4 and Xbone. Nobody expected the amazing sales the consoles have had. Games were not developed for the next gen until it became obvious how big the next gen had become.
Now that they actually try, we see the console effect at play as the PC GPU landscape shifts.