Hitman Developers talk DX12 (Interview)

Bacon1

Diamond Member
Feb 14, 2016
3,430
1,018
91
http://wccftech.com/hitman-lead-dev-dx12-gains-time-ditching-dx11/

HITMAN on DirectX 12 only offers approximately a 10% performance boost on AMD cards. Are you planning to improve DirectX 12 performance and/or add any further DX12 features via patch and if so, which ones?

We don’t have any DX12-specific improvements planned for our subsequent releases. While we see a 10% improvement when completely GPU bound, I think there are gains above that when you’re CPU bound.

NVIDIA cards, on the other hand, have basically the same performance under DX11 and DX12. You’ve mentioned during the GDC 2016 talk that you’re working with NVIDIA to improve their Async Compute implementation – how are things looking on that end and do you have any ETA?

I don’t have any news on that front, sorry.

Async Compute in particular has received a lot of attention from PC enthusiasts, specifically in regards to NVIDIA GPUs lacking hardware support for it. However, in the GDC 2016 talk you said that even AMD cards only got a 5-10% boost and furthermore, you described Async Compute as “super hard” to tune because too much work can make it a penalty. Is it fair to say that the importance of Async Compute has been perhaps overstated in comparison to other factors that determine performance? Do you think NVIDIA may be in trouble if Pascal doesn’t implement a hardware solution for Async Compute?

The main reason it’s hard is that every GPU ideally needs custom tweaking – the bandwidth-to-compute ratio is different for each GPU, ideally requiring tweaking the amount of async work for each one. I don’t think it’s overstated, but obviously YMMV (your mileage may vary). In the current state, Async Compute is a nice & easy performance win. In the long run it will be interesting to see if GPUs get better at running parallel work, since we could potentially get even better wins.
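
For anyone curious what “tweaking the amount of async work” looks like at the API level, here is a minimal D3D12 sketch of the basic async-compute pattern (a generic illustration, not IO Interactive's renderer; the function and parameter names are made up). The per-GPU knob the answer describes is simply how much work gets recorded into the compute-queue command list for a given card.

```cpp
// Minimal async-compute sketch: one DIRECT (graphics) queue, one COMPUTE queue,
// and a fence so the graphics frame waits for the async results it consumes.
// Illustrative only; device/command-list creation and recording are assumed.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12CommandQueue> MakeQueue(ID3D12Device* device, D3D12_COMMAND_LIST_TYPE type) {
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = type;                                   // DIRECT, COMPUTE or COPY
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

void SubmitFrame(ID3D12CommandQueue* gfxQueue,          // D3D12_COMMAND_LIST_TYPE_DIRECT
                 ID3D12CommandQueue* computeQueue,      // D3D12_COMMAND_LIST_TYPE_COMPUTE
                 ID3D12CommandList*  gfxList,           // recorded on a DIRECT command list
                 ID3D12CommandList*  asyncComputeList,  // recorded on a COMPUTE command list
                 ID3D12Fence* fence, UINT64 fenceValue) {
    // Kick the compute work first so it overlaps the graphics queue instead of
    // serializing behind it. How much gets recorded into asyncComputeList (only
    // SSAO, or SSAO + shadows + lighting) is the per-GPU tuning knob: too much
    // and it fights the graphics queue for bandwidth/ALUs and becomes a penalty.
    computeQueue->ExecuteCommandLists(1, &asyncComputeList);
    computeQueue->Signal(fence, fenceValue);

    // GPU-side wait before the graphics passes that read the async results;
    // the CPU does not stall here.
    gfxQueue->Wait(fence, fenceValue);
    gfxQueue->ExecuteCommandLists(1, &gfxList);
}
```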

Several DirectX 12 games are out now, but the first outlook isn’t nearly as positive as Microsoft stated (up to 20% more performance from the GPU and up to 50% more performance from the CPU). Do you think that it’s just a matter of time before developers learn how to use the new API, or perhaps the performance benefits have been somewhat overestimated?

I think it will take a bit of time; the drivers & games need to mature and do the right things. Just reaching parity with DX11 is a lot of work. 50% more performance from the CPU is possible, but it depends a lot on your game, the driver, and how well they work together. Improving performance by 20% when GPU bound will be very hard, especially when you have a DX11 driver team trying to improve performance on that platform as well. It’s worth mentioning we did only a straight port; once we start using some of the new features of DX12, it will open up a lot of new possibilities – and then the gains will definitely be possible. We probably won’t start on those features until we can ditch DX11, since a lot of them require fundamental changes to our render code.
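
Some context on where that CPU-side headroom comes from (my own generic illustration, not something the Hitman team describes here): DX12 lets an engine record command lists on many worker threads and submit them together, instead of funnelling nearly all submission work through a single driver thread as DX11 tends to. A rough sketch, with all names hypothetical and per-frame fencing omitted:

```cpp
// Rough sketch of DX12's CPU-side win: command lists recorded in parallel on
// worker threads, then submitted in one call. Illustrative only; real code
// must fence against the GPU before resetting allocators each frame.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
using Microsoft::WRL::ComPtr;

struct WorkerRecorder {
    ComPtr<ID3D12CommandAllocator>    allocator;  // allocators are single-threaded,
    ComPtr<ID3D12GraphicsCommandList> list;       // so each worker gets its own pair
};

std::vector<WorkerRecorder> CreateRecorders(ID3D12Device* device, unsigned count) {
    std::vector<WorkerRecorder> recorders(count);
    for (auto& r : recorders) {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&r.allocator));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  r.allocator.Get(), nullptr, IID_PPV_ARGS(&r.list));
        r.list->Close();  // lists are created open; start closed, Reset() each frame
    }
    return recorders;
}

void RecordAndSubmitFrame(std::vector<WorkerRecorder>& recorders,
                          ID3D12CommandQueue* gfxQueue) {
    std::vector<std::thread> workers;
    for (auto& r : recorders) {
        workers.emplace_back([&r] {
            r.allocator->Reset();
            r.list->Reset(r.allocator.Get(), nullptr);
            // ... each thread records its slice of the frame's draw calls here ...
            r.list->Close();
        });
    }
    for (auto& t : workers) t.join();

    // One cheap submission of everything the workers recorded.
    std::vector<ID3D12CommandList*> lists;
    for (auto& r : recorders) lists.push_back(r.list.Get());
    gfxQueue->ExecuteCommandLists(static_cast<UINT>(lists.size()), lists.data());
}
```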

Do you believe HITMAN’s DirectX 12 performance could have been better if the game had been built entirely on it, rather than having DX11 as min spec?

Yes, it obviously would have. DX12 includes new hardware features, which I think over time will make it possible for us to make games run even better.

Another low level API has been recently released: Vulkan. What do you think of it in terms of performance and features? Do you have any plans to add Vulkan support to HITMAN?

Vulkan is a graphics programmer’s wet dream: a high-performance API, like D3D12, for all platforms. With that said, we don’t have any plans to add Vulkan support to Hitman. Also, unfortunately it looks like Vulkan won’t be supported on all platforms.

Finally, as a developer, what are your thoughts on the Universal Windows Platform?

I’m not qualified to talk about UWP – I don’t know anything about it. I can say I really like the Win32 API: it’s been around forever, is well documented, and it works.
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
163
106
NVIDIA cards, on the other hand, have basically the same performance under DX11 and DX12. You’ve mentioned during the GDC 2016 talk that you’re working with NVIDIA to improve their Async Compute implementation – how are things looking on that end and do you have any ETA?

I don’t have any news on that front, sorry.
No major Async compute (r)evolution till Volta I bet D:
 

Adored

Senior member
Mar 24, 2016
256
1
16
Pretty much what I said in a recent video. Async is good, but it's hype right now and requires optimization on a card-per-card basis, which few devs will be arsed with.

Going forward though we can easily imagine a case where it won't need that level of tweaking, or any at all. ;)
 

tential

Diamond Member
May 13, 2008
7,348
642
121
Crap, I guess I should take my 980ti's out of the trash bin.
Too late, already using them and about to add them to my signature.

It's like I've said though: a 980 Ti owner should not care, since they'll upgrade before this matters. Midrange users who are bigger gamers don't upgrade as often; they hold cards for 3-5 years.

I consider people who buy mid range cards regularly to still be enthusiasts by the way.
 

maddie

Diamond Member
Jul 18, 2010
5,149
5,524
136
Pretty much what I said in a recent video. Async is good, but it's hype right now and requires optimization on a card-per-card basis, which few devs will be arsed with.

Going forward though we can easily imagine a case where it won't need that level of tweaking, or any at all. ;)
And yet, we have this.


Async Compute in particular has received a lot of attention from PC enthusiasts, specifically in regards to NVIDIA GPUs lacking hardware support for it. However, in the GDC 2016 talk you said that even AMD cards only got a 5-10% boost and furthermore, you described Async Compute as “super hard” to tune because too much work can make it a penalty. Is it fair to say that the importance of Async Compute has been perhaps overstated in comparison to other factors that determine performance? Do you think NVIDIA may be in trouble if Pascal doesn’t implement a hardware solution for Async Compute?

The main reason it’s hard is that every GPU ideally needs custom tweaking – the bandwidth-to-compute ratio is different for each GPU, ideally requiring tweaking the amount of async work for each one. I don’t think it’s overstated, but obviously YMMV (your mileage may vary). In the current state, Async Compute is a nice & easy performance win. In the long run it will be interesting to see if GPUs get better at running parallel work, since we could potentially get even better wins.
 

Adored

Senior member
Mar 24, 2016
256
1
16
Yes but 1% is a performance win as well. Hitman shows around 3% on average I believe, not 5-10%.

AMD has a lot bigger advantages than async but they've marketed it well, which makes a change for them.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
^ Yes, but it seems for console ports it's most likely going to be a case of developers squeezing the last ounce of performance from the underpowered XB1/PS4. If the games are originally designed for those consoles to take advantage of Async Compute, it directly translates into a win for PC hardware that supports it. OTOH, Steam data shows that most GPUs (NV) don't support this functionality. This means that if an XB1/PS4 game doesn't have Async Compute, the developer has to spend extra resources incorporating it into the PC version. How likely are they to do this when AMD's market share is 20-21% right now? It's up to AMD to work with developers then, or the rest seems to be dictated by the nature of GCN-optimized XB1/PS4 ports.

If Pascal barely improves Async Compute, add another 2+ years of software delay, because it would mean that, hardware-wise, Async Compute still won't be mainstream. It's a shame really, since AC is a performance-boosting feature on more advanced GPU architectures. Who doesn't want another 10-30% boost in performance from hardware that allows parallel processing? I guess the answer to that is obvious...
 

Adored

Senior member
Mar 24, 2016
256
1
16
The key here is standardization. AMD messed around a lot with the first implementation of async but now it looks like they've got it settled. No change in Polaris either. ;)

Async will really start to count when async on the console = async on the PC exactly - with either utterly trivial optimization or none at all, likely involving the new patent that's being discussed as well.

All AMD has to do is provide the standardization. They've got plans going way beyond async though.
 
Feb 19, 2009
10,457
10
76
Yes but 1% is a performance win as well. Hitman shows around 3% on average I believe, not 5-10%.

AMD has a lot bigger advantages than async but they've marketed it well, which makes a change for them.

Hitman is up to 10%. For Hawaii and Fury it's ~10% while for Tahiti it's like 3% or so.

But AFAIK, they only used it for SSAO and shadows, which come “for free” where there are enough ACEs.

In Ashes, most get 10-20%.

[attached benchmark chart]


And it's because they put more compute in the Async queue, for all their unit lighting.

Want to see something mind blowing?

Heavy usage of Async Compute in QB, but not touching a single SP/ALU, all on the DMA engines in GCN. :)

http://forums.anandtech.com/showpost.php?p=38164220&postcount=349
 

Adored

Senior member
Mar 24, 2016
256
1
16
Don't mistake DX12 for async. Only Ashes shows a big difference that is clear and can be toggled.

It's real as soon as there are a bunch of benchmarks showing similar results with async on and off. Until then, Ashes is an AMD poster child.

The consistent wins that AMD is getting without async are by far the more important DX12 story.
 
Feb 19, 2009
10,457
10
76
Sure, but many people misunderstood the purpose of Async Compute. They often attribute it to better shader utilization, and that's just one part of it.

The example of AC in QB is very striking: heavy Copy Queue usage to get their 4-frame temporal reconstruction rendering to work. In DX12, Copy Queues are a subset of Compute Queues, which are a subset of Graphics Queues.

Copy Queues don’t even touch shaders, yet GCN is able to accelerate performance far above NV in QB, just because it supports DX12’s Multi-Engine Rendering and “Async Compute”.
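
To make the Multi-Engine point concrete, here is a small sketch of DX12's three queue types (my own illustration, not from Quantum Break or any shipping engine). A Copy Queue only accepts copy command lists, which GCN services with its dedicated DMA engines, so transfers such as a temporal-reconstruction history copy can overlap graphics and compute without occupying a single shader ALU:

```cpp
// Sketch of D3D12's three engine/queue types. Copy-queue work is pure data
// movement (no shaders); on GCN it runs on the DMA engines, so it can overlap
// both the graphics and compute queues. Illustrative only.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

struct EngineQueues {
    ComPtr<ID3D12CommandQueue> graphics;  // DIRECT:  graphics + compute + copy
    ComPtr<ID3D12CommandQueue> compute;   // COMPUTE: compute + copy
    ComPtr<ID3D12CommandQueue> copy;      // COPY:    copy only (DMA engines on GCN)
};

EngineQueues CreateEngineQueues(ID3D12Device* device) {
    EngineQueues q;
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.graphics));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.compute));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&q.copy));
    return q;
}

// Example: stream a previous-frame history texture on the copy queue while the
// graphics queue keeps rendering, then gate the consumer with a fence.
void KickAsyncCopy(ID3D12CommandQueue* copyQueue, ID3D12CommandQueue* gfxQueue,
                   ID3D12CommandList* copyList,   // recorded on a COPY-type list
                   ID3D12Fence* fence, UINT64 fenceValue) {
    copyQueue->ExecuteCommandLists(1, &copyList);
    copyQueue->Signal(fence, fenceValue);
    gfxQueue->Wait(fence, fenceValue);  // GPU-side wait before sampling the copied data
}
```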
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
The consistent wins that AMD is getting without async are by far the more important DX12 story.

If it's not Async Compute, why would GCN cards perform much better under DX12? Lower CPU overhead which allows the GPU to become better utilized?

Could it have more to do with latest games becoming so advanced for XB1/PS4 that developers are forced to squeeze/optimize for every last ounce of GCN?

I am inclined to believe this explanation more. How else can we explain Hitman performing so much faster on 290X/Fury X under DX11?

[Hitman 1920 benchmark chart]


It's mind-blowing how much faster R9 290X is over 7970/280X or over 780Ti/970. Console effect imo.
 
Feb 19, 2009
10,457
10
76
@RS
The easy explanation is that AMD sponsored Hitman, and so IO/Square gimps NV GPU performance. ;) The same for Ashes of the Singularity.

But we're seeing even more gimpage in neutral, non-sponsored titles, so I don't agree that is the cause.

It's a combination of console effect and NV's gimped DX12 hardware/drivers all adding up to a storm with Polaris v Pascal.

A few weeks ago I had major doubts a 2,560 SP Polaris 10 could rival a 2,560 SP GP104 that's a much bigger die... but really, as review sites ditch older games and add new ones, NV GPUs need to brute force a lot to catch up.
 

RussianSensation

Elite Member
Sep 5, 2003
19,458
765
126
A few weeks ago I had major doubts a 2,560 SP Polaris 10 could rival a 2,560 SP GP104 that's a much bigger die... but really, as review sites ditch older games and add new ones, NV GPUs need to brute force a lot to catch up.

Don't be so sure. P100 boosts to 1480MHz. Based on the more conservative GPU clocks of NV's larger-die products over the years vs. their mid-range and lower-end offerings, I am inclined to believe that the 1080 GP104 (980 replacement) will be clocked higher than 1480MHz. Add in after-market versions, and I wouldn't be shocked if GP104 can overclock to 1700MHz. For leaked Polaris clocks, we are seeing 850-1050MHz. That means I expect a 40-60% clock speed disadvantage for Polaris, but who knows. Remember, before the 670/680 launched, early leaks were showing them with 700MHz or so GPU clocks. Either way, since the HD 5870, AMD went from 850MHz to 1050MHz with Fury X. And now look how high NV's cards clock today. Even on 28nm, Maxwell overclocks to 1500-1550MHz.
 
Feb 19, 2009
10,457
10
76
Base clock is meaningless when they can power gate down and turbo boost individual units within a SIMD with Polaris.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
The main reason it’s hard is that every GPU ideally needs custom tweaking – the bandwidth-to-compute ratio is different for each GPU, ideally requiring tweaking the amount of async work for each one.

So basically expect Hawaii to get most of the benefit.
 

Adored

Senior member
Mar 24, 2016
256
1
16
If it's not Async Compute, why would GCN cards perform much better under DX12? Lower CPU overhead which allows the GPU to become better utilized?

Could it have more to do with latest games becoming so advanced for XB1/PS4 that developers are forced to squeeze/optimize for every last ounce of GCN?

I am inclined to believe this explanation more. How else can we explain Hitman performing so much faster on 290X/Fury X under DX11?


It's mind-blowing how much faster R9 290X is over 7970/280X or over 780Ti/970. Console effect imo.

The first game I noticed it in under DX11 was SW:BF. That DICE would be using GCN cards makes complete sense of course, so yes, I think we're simply looking at the fact that more devs are just using GCN to start with. If you think about it, AMD has probably never had this before in their history, yet they still stayed pretty close to Nvidia in most cases...
 
Feb 19, 2009
10,457
10
76
The first game I noticed it in under DX11 was SW:BF. That DICE would be using GCN cards makes complete sense of course, so yes, I think we're simply looking at the fact that more devs are just using GCN to start with. If you think about it, AMD has probably never had this before in their history, yet they still stayed pretty close to Nvidia in most cases...

I first noticed it in Shadow of Mordor.

On Anandtech's bench, a reference R9 290X was keeping up with, or running slightly faster than, a 980. That was very unexpected, given that older titles had a 15% gap.

Basically, without GameWorks, NV can't compete when modern games come GCN-optimized.

Examine The Division: NV-sponsored, with a lot of GameWorks tech, but as soon as you disable those GW features (PCSS & HBAO+)... GCN just powers ahead.

[The Division benchmark charts]


This is repeated for Far Cry 4, Dying Light, Rainbow Six, JC3 and other NV-sponsored titles. Disable GW and, bam, GCN goes ahead in each segment.

What about games that NV doesn't sponsor? The best example is Far Cry Primal, where a 390 is 30% faster than the 970.

NV actually needs to sponsor and get involved with all the games, or else GCN just runs too well. And this is in DX11, where GCN is running crippled.
 

Adored

Senior member
Mar 24, 2016
256
1
16
Shadow of Mordor could have been an edge case due to it being so heavy on memory and bandwidth, but yes that one ran surprisingly well on AMD too.
 

finbarqs

Diamond Member
Feb 16, 2005
3,617
2
81
just need something to drive my 3440x1440 monitor at full res... 980ti isn't enough currently... maybe the radeon pro duo...
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
The first game I noticed it in under DX11 was SW:BF. That DICE would be using GCN cards makes complete sense of course, so yes, I think we're simply looking at the fact that more devs are just using GCN to start with. If you think about it, AMD has probably never had this before in their history, yet they still stayed pretty close to Nvidia in most cases...

I've often said in the past that AMD won't be able to truly compete in the 3D workstation market until they can pry the Quadros out of Autodesk's workstations. When the app is developed on a particular IHV's products, they are going to have an inherent advantage. I never thought about the game devs, but it makes sense there too.
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
If it's not Async Compute, why would GCN cards perform much better under DX12? Lower CPU overhead which allows the GPU to become better utilized?

Could it have more to do with latest games becoming so advanced for XB1/PS4 that developers are forced to squeeze/optimize for every last ounce of GCN?

I am inclined to believe this explanation more. How else can we explain Hitman performing so much faster on 290X/Fury X under DX11?

[Hitman 1920 benchmark chart]


It's mind-blowing how much faster R9 290X is over 7970/280X or over 780Ti/970. Console effect imo.

Yes, and here is why there is such a lag with GCN optimizations, even though we are well into the console lifecycle:
https://youtu.be/VysWXsuGPHQ?t=64

TL;DR:
https://youtu.be/VysWXsuGPHQ?t=197

Devs had no hope for the PS4 and Xbone. Nobody expected the amazing sales the consoles have had. Games were not developed for the next gen until it became obvious how big the next gen had become.
Now that they actually try, we see the console effect at play as the PC GPU landscape shifts.