Game dev talking about DX, Vulkan, Mantle and multi-GPU, amongst others

DownTheSky

Senior member
Apr 7, 2013
Many years ago, I briefly worked at NVIDIA on the DirectX driver team (internship). This was the Vista era, when a lot of people were busy with the DX10 transition, the hardware transition, and the OS/driver-model transition. My job was to take games that were broken on Vista, dissect them at the driver level, and figure out why they were broken. While I am not at all an expert on driver matters (and actually sucked at my job, to be honest), I did learn a lot about what games look like from the perspective of a driver and kernel.

The first lesson is: Nearly every game ships broken. We're talking major AAA titles from vendors who are everyday names in the industry. In some cases, we're talking about blatant violations of API rules - one D3D9 game never even called BeginScene/EndScene. Some are mistakes or oversights - one shipped bad shaders that heavily impacted performance on NV drivers. These things were day-to-day occurrences that went into a bug tracker. Then somebody would go in, find out what the game screwed up, and patch the driver to deal with it. There are lots of optional patches already in the driver that are simply toggled on or off as per-game settings, and then hacks that are more specific to games - up to and including total replacement of the shipping shaders with custom versions by the driver team. Ever wondered why nearly every major game release is accompanied by a matching driver release from AMD and/or NVIDIA? There you go.
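Those per-game toggles can be pictured as an application-profile table the driver consults at runtime. A minimal sketch, with entirely hypothetical names (real drivers match on executables, checksums, and more - nothing here reflects any actual driver's internals):

```python
# Hypothetical illustration of per-game driver workarounds; all names
# here are made up for the sketch.

# Application profile table: executable name -> workaround flags.
APP_PROFILES = {
    "brokengame.exe": {"implicit_begin_end", "replace_shader_07"},
    "slowshaders.exe": {"replace_shader_07"},
}

def active_workarounds(exe_name):
    """Look up which workarounds the driver toggles on for this app."""
    return APP_PROFILES.get(exe_name.lower(), set())

def present_frame(exe_name, called_begin_scene):
    """Validate a Present call, papering over a missing BeginScene if the
    app's profile asks the driver to cover for it."""
    if not called_begin_scene:
        if "implicit_begin_end" in active_workarounds(exe_name):
            return "presented (driver inserted implicit begin/end)"
        return "error: Present outside begin/end"
    return "presented"
```

The point of the sketch: a well-behaved app never hits the workaround branch, while a known-broken title silently gets the fixed-up path - which is exactly why game releases and driver releases travel in pairs.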

The second lesson: The driver is gigantic. Think 1-2 million lines of code dealing with the hardware abstraction layers, plus another million per API supported. The backing function for Clear in D3D 9 was close to a thousand lines of just logic dealing with how exactly to respond to the command. It'd then call out to the correct function to actually modify the buffer in question. The level of complexity internally is enormous and winding, and even inside the driver code it can be tricky to work out how exactly you get to the fast-path behaviors. Additionally the APIs don't do a great job of matching the hardware, which means that even in the best cases the driver is covering up for a LOT of things you don't know about. There are many, many shadow operations and shadow copies of things down there.
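Those thousand lines behind a call like Clear are mostly case analysis: deciding which of several hardware strategies applies to the current state before any buffer is touched. A toy sketch of that dispatch (the path names and conditions are invented, not any real driver's logic):

```python
def select_clear_path(full_surface, scissor_enabled, compressed, multisampled):
    """Pick an implementation strategy for a Clear command based on the
    bound render-target state. Each branch would call out to a different
    hardware-specific routine in a real driver."""
    if full_surface and not scissor_enabled and compressed:
        # Fast path: rewrite compression metadata only, no memory traffic.
        return "fast_clear_metadata_only"
    if multisampled:
        # Every sample of every pixel must be touched.
        return "per_sample_clear"
    if scissor_enabled or not full_surface:
        # Partial clears often fall back to drawing a screen-space quad.
        return "draw_quad_clear"
    return "plain_memset_clear"
```

Multiply this kind of decision tree across every API entry point, plus the shadow state needed to answer the questions at all, and the million-line figure stops sounding surprising.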

The third lesson: It's unthreadable. The IHVs sat down, starting circa 2005, and built tons of multithreading into the driver internally. They had some of the best kernel/driver engineers in the world to do it, and literally thousands of full-blown real-world test cases. They squeezed that system dry, and within the existing drivers and APIs it is impossible to get more than trivial gains out of any application-side multithreading. If Futuremark can only get 5% in a trivial test case, the rest of us have no chance.
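Why the gains cap out so low is just Amdahl's law: once the driver has already parallelized its share internally, only a small fraction of per-frame CPU work is left for the app to thread. The fractions below are illustrative, not measurements:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: overall speedup when only `parallel_fraction` of the
    work can be spread across `workers` threads; the rest stays serial."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)

# Suppose only ~10% of per-frame CPU time remains for the app to thread
# after the driver's internal multithreading has taken its share:
print(round(amdahl_speedup(0.10, 8), 3))  # ~1.096 -- a few percent, tops
```

Even with eight worker threads, the speedup lands in the same few-percent range as the Futuremark result, because the serial (driver-dominated) portion dominates the frame.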

The fourth lesson: Multi GPU (SLI/CrossfireX) is [redacted] complicated. You cannot begin to conceive of the number of failure cases that are involved until you see them in person. I suspect that more than half of the total software effort within the IHVs is dedicated strictly to making multi-GPU setups work with existing games. (And I don't even know what the hardware side looks like.) If you've ever tried to independently build an app that uses multi GPU - especially if, god help you, you tried to do it in OpenGL - you may have discovered this insane rabbit hole. There is ONE fast path, and it's the narrowest path of all. Take lessons 1 and 2, and magnify them enormously.
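One classic multi-GPU failure case makes the "one narrow fast path" concrete: alternate-frame rendering (AFR) round-robins frames across GPUs, and any effect that reads last frame's output (temporal AA, reprojection, motion blur history) forces an inter-GPU transfer and usually a stall. A hypothetical scheduler sketch, not any vendor's actual implementation:

```python
def schedule_afr(num_gpus, reads_previous_frame):
    """Assign frames round-robin across GPUs. reads_previous_frame[i] is
    True when frame i samples a resource rendered during frame i-1.
    Returns a (gpu_index, needs_inter_gpu_transfer) pair per frame."""
    plan = []
    for i, depends in enumerate(reads_previous_frame):
        gpu = i % num_gpus
        prev_gpu = (i - 1) % num_gpus
        # A cross-frame dependency between different GPUs forces a copy
        # (and usually a stall) over the inter-GPU link.
        transfer = i > 0 and depends and gpu != prev_gpu
        plan.append((gpu, transfer))
    return plan

# A post-process that reuses last frame's buffer stalls every frame on 2 GPUs:
print(schedule_afr(2, [False, True, True, True]))
# [(0, False), (1, True), (0, True), (1, True)]
```

With one GPU the dependencies cost nothing; with two, every dependent frame pays for a transfer. The driver's only alternatives are heuristics that guess which resources carry across frames - exactly the guessing game described above.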

Deep breath.

Ultimately, the new APIs are designed to cure all four of these problems.
* Why are games broken? Because the APIs are complex, and validation varies from decent (D3D 11) to poor (D3D 9) to catastrophic (OpenGL). There are lots of ways to hit slow paths without knowing anything has gone awry, and often the driver writers already know what mistakes you're going to make and are dynamically patching in workarounds for the common cases.
* Maintaining the drivers with the current wide surface area is tricky. Although AMD and NV have the resources to do it, the smaller IHVs (Intel, PowerVR, Qualcomm, etc.) simply cannot keep up with the necessary investment. More importantly, explaining to devs the correct way to write their render pipelines has become borderline impossible. There are too many failure cases. It's been understood for quite a few years now that you cannot max out the performance of any given GPU without having someone from NVIDIA or AMD physically grab your game source code, load it on a dev driver, and do a hands-on analysis. These are the vanishingly few people who have actually seen the source to a game, the driver it's running on, the Windows kernel it's running on, and the full specs for the hardware. Nobody else has that kind of access or engineering ability.
* Threading is just a catastrophe and is being rethought from the ground up. This requires a lot of the abstractions to be stripped away or retooled, because the old ones required too much driver intervention to be properly threadable in the first place.
* Multi-GPU is becoming explicit. For the last ten years, it has been AMD and NV's goal to make multi-GPU setups completely transparent to everybody, and it's become clear that for some subset of developers, this is just making our jobs harder. The driver has to apply imperfect heuristics to guess what the game is doing, and the game in turn has to do peculiar things in order to trigger the right heuristics. Again, for the big games somebody sits down and matches the two manually.

Part of the goal is simply to stop hiding what's actually going on in the software from game programmers. Debugging drivers has never been possible for us, which meant a lot of poking and prodding and experimenting to figure out exactly what it is that is making the render pipeline of a game slow. The IHVs certainly weren't willing to disclose these things publicly either, as they were considered critical to competitive advantage. (Sure they are guys. Sure they are.) So the game is guessing what the driver is doing, the driver is guessing what the game is doing, and the whole mess could be avoided if the drivers just wouldn't work so hard trying to protect us.

So why didn't we do this years ago? Well, there are a lot of politics involved (cough Longs Peak) and some hardware aspects but ultimately what it comes down to is the new models are hard to code for. Microsoft and ARB never wanted to subject us to manually compiling shaders against the correct render states, setting the whole thing invariant, configuring heaps and tables, etc. Segfaulting a GPU isn't a fun experience. You can't trap that in a (user space) debugger. So ... the subtext that a lot of people aren't calling out explicitly is that this round of new APIs has been done in cooperation with the big engines. The Mantle spec is effectively written by Johan Andersson at DICE, and the Khronos Vulkan spec basically pulls Aras P at Unity, Niklas S at Epic, and a couple guys at Valve into the fold.

Three out of those four just made their engines public and free with minimal backend financial obligation.

Now there's nothing wrong with any of that, obviously, and I don't think it's even the big motivating raison d'être of the new APIs. But there's a very real message that if these APIs are too challenging to work with directly, well, the guys who designed the API also happen to run very full-featured engines requiring no financial commitments*. So I think that's served to considerably smooth the politics involved in rolling these difficult-to-work-with APIs out to the market, encouraging organizations that would otherwise have been reticent to do so.
[Edit/update] I'm definitely not suggesting that the APIs have been made artificially difficult, by any means - the engineering work is solid in its own right. It's also become clear, since this post was originally written, that there's a commitment to continuing DX11 and OpenGL support for the near future. That also helped the decision to push these new systems out, I believe.

The last piece of the puzzle is that we ran out of new user-facing hardware features many years ago. Ignoring raw speed, what exactly is the user-visible or dev-visible difference between a GTX 480 and a GTX 980? A few limitations have been lifted (notably in compute), but essentially they're the same thing. MS, for all practical purposes, concluded that DX was a mature, stable technology that required only minor work and mostly disbanded the teams involved. Many of the revisions to GL have been little more than API repairs. (A GTX 480 runs full-featured OpenGL 4.5, by the way.) So the reason we're seeing new APIs at all stems fundamentally from Andersson hassling the IHVs until AMD woke up, smelled competitive advantage, and started paying attention. There was essentially a three-year lag from when we got the hardware to the point where compute could be directly integrated into the core of a render pipeline - which is considered normal today but was, bluntly, revolutionary at production scale in 2012. It's a lot of small things adding up to a sea change, with key people pushing on the right people for the right things.


Phew. I'm no longer sure what the point of that rant was, but hopefully it's somehow productive that I wrote it. Ultimately the new APIs are the right step, and they're retroactively useful to old hardware which is great. They will be harder to code. How much harder? Well, that remains to be seen. Personally, my take is that MS and ARB always had the wrong idea. Their idea was to produce a nice, pretty looking front end and deal with all the awful stuff quietly in the background. Yeah it's easy to code against, but it was always a [redacted] and a half to debug or tune. Nobody ever took that side of the equation into account. What has finally been made clear is that it's okay to have difficult to code APIs, if the end result just works. And that's been my experience so far in retooling: it's a pain in the [redacted], requires widespread revisions to engine code, forces you to revisit a lot of assumptions, and generally requires a lot of infrastructure before anything works. But once it's up and running, there's no surprises. It works smoothly, you're always on the fast path, anything that IS slow is in your OWN code which can be analyzed by common tools. It's worth it.

(*See this post by Unity's Aras P for more thoughts. I have a response comment in there as well.)

Interesting read :thumbsup:

Source:
http://www.gamedev.net/topic/666419-what-are-your-opinions-on-dx12vulkanmantle/

Profanity isn't allowed in VC&G, even in a quote.
-Elfear
 
Last edited by a moderator:

imaheadcase

Diamond Member
May 9, 2005
So I don't know much from a programming standpoint, but that last paragraph is what programmers are never going to understand about users. He says MS messed up by making the front end of DX easy to work with but a pain to debug.

My brother made this small 5k-line program to switch sound devices in Windows with a click of an icon; you have to edit a file to put in the names of all the sound devices and the keybinds you want. It took me 10 minutes to get it up and running, but if it had a UI I could just click it and be done in a few seconds.

His rant to me was that he could do that, but it would take a long time and balloon it to 15k lines of code, all for something where you just have to edit an ini file - and if that is so hard, "don't use it."

Now if MS made it easier to write code for but harder to debug... why didn't they do it both ways? One easier version for indie devs, and the raw code for AAA people.

I don't know, my brain hurts thinking about programming stuff. lol
 

Paul98

Diamond Member
Jan 31, 2010
So I don't know much from a programming standpoint, but that last paragraph is what programmers are never going to understand about users. He says MS messed up by making the front end of DX easy to work with but a pain to debug.

My brother made this small 5k-line program to switch sound devices in Windows with a click of an icon; you have to edit a file to put in the names of all the sound devices and the keybinds you want. It took me 10 minutes to get it up and running, but if it had a UI I could just click it and be done in a few seconds.

His rant to me was that he could do that, but it would take a long time and balloon it to 15k lines of code, all for something where you just have to edit an ini file - and if that is so hard, "don't use it."

Now if MS made it easier to write code for but harder to debug... why didn't they do it both ways? One easier version for indie devs, and the raw code for AAA people.

I don't know, my brain hurts thinking about programming stuff. lol

There is no reason it should have taken anywhere close to that many lines of code.
 

BFG10K

Lifer
Aug 14, 2000
There's nothing surprising in that post and it confirms that low level APIs aren't viable. If AAA games are shipping broken now, that's going to increase tenfold when the driver complexities needed to deal with them are shifted onto game developers.

He says the end result "just works". Sure, as long as it's running on the hardware he originally coded it for. But what happens on future hardware? Is he expecting the whole software industry to constantly patch their games whenever new nVidia/AMD/Intel hardware arrives? We've already seen Thief and BF4 broken with Mantle and the 285.

Low level APIs are only viable for fixed/embedded hardware (e.g. consoles) or back in the primitive days of DOS.

I expect DX12 will have more impact than Mantle/Vulkan simply because of its exposure, but DX11 is going to remain overwhelmingly widespread for the reasons above. No customer is going to accept all of their games breaking (which includes running slower than they did before) whenever they buy a new graphics card.
 

monstercameron

Diamond Member
Feb 12, 2013
There's nothing surprising in that post and it confirms that low level APIs aren't viable. If AAA games are shipping broken now, that's going to increase tenfold when the driver complexities needed to deal with them are shifted onto game developers.

He says the end result "just works". Sure, as long as it's running on the hardware he originally coded it for. But what happens on future hardware? Is he expecting the whole software industry to constantly patch their games whenever new nVidia/AMD/Intel hardware arrives? We've already seen Thief and BF4 broken with Mantle and the 285.

Low level APIs are only viable for fixed/embedded hardware (e.g. consoles) or back in the primitive days of DOS.

I expect DX12 will have more impact than Mantle/Vulkan simply because of its exposure, but DX11 is going to remain overwhelmingly widespread for the reasons above. No customer is going to accept all of their games breaking (which includes running slower than they did before) whenever they buy a new graphics card.


AFAIK the APIs and driver models have lifted the abstractions on memory management, among other things. I don't know about old games supporting new uarchs, but shipping broken games shouldn't be happening when the debugging system is much more robust. Hell, this might make fewer games ship broken.

Also it might be possible to programmatically work around new uarchs with heuristics to catch these differences.

Also, don't count Vulkan out yet; we are talking about Linux, Android, iOS, niche embedded OSes, OS X, et cetera.

Also, I read the last paragraph as "I expect Mantle/DirectX 12 will have more impact than Mantle/Vulkan..."
 
Last edited:

NTMBK

Lifer
Nov 14, 2011
AFAIK the APIs and driver models have lifted the abstractions on memory management, among other things. I don't know about old games supporting new uarchs, but shipping broken games shouldn't be happening when the debugging system is much more robust. Hell, this might make fewer games ship broken.

Yes, we can now expect every game to be just as stable and robust as the flagship Mantle game, Battlefield 4 :awe:
 

AnandThenMan

Diamond Member
Nov 11, 2004
Yes, we can now expect every game to be just as stable and robust as the flagship Mantle game, Battlefield 4 :awe:
I know you're trolling, but I'll give a serious response. Seeing as Mantle was a brand-new API that had just come out, the general state of BF4 was relatively excellent. We see to this day D3D games that are a total mess upon release and require multiple patches to get the game even close to proper. And D3D has been around for decades, and devs are extremely familiar with it.

On low level APIs like Mantle and DX12, this is what devs asked for. Yes it will take a different approach and will be "more work" depending on how you look at it but the payoff is well worth it.
 

Pottuvoi

Senior member
Apr 16, 2012
I know you're trolling, but I'll give a serious response. Seeing as Mantle was a brand-new API that had just come out, the general state of BF4 was relatively excellent. We see to this day D3D games that are a total mess upon release and require multiple patches to get the game even close to proper. And D3D has been around for decades, and devs are extremely familiar with it.

On low level APIs like Mantle and DX12, this is what devs asked for. Yes it will take a different approach and will be "more work" depending on how you look at it but the payoff is well worth it.
Yes, it might be more work, but it's also a different kind of work.

A thin API should reduce the number of times features are broken, or work only because the drivers do something they shouldn't.
Now if something doesn't work, developers know that it's their fault.
 
Last edited:

Erenhardt

Diamond Member
Dec 1, 2012
What was the problem with Mantle in BF4? I can't recall... There were issues with netcode, but those have nothing to do with the rendering API.
 

VirtualLarry

No Lifer
Aug 25, 2001
As a former game dev myself, that rant makes perfect sense to me. Sounds about right. I was in the industry as a DOS low-level programmer (assembly language, bit-banging hardware and VGA registers, etc.), and the project I was on was transitioning to the "Games SDK for Windows" (which was eventually renamed DirectX). So pretty much none of my code ended up in the final product, as they shipped for Win9x rather than DOS, but it was a fun experience.
 

PPB

Golden Member
Jul 5, 2013
The problem people don't seem to grasp is that BF4 was already FUBAR when the Mantle patch arrived. So you have an API in alpha stage debuting in a game that, on DX11 alone, makes you BSOD from time to time... yeah, that's what I call a good mix :rolleyes:

The Mantle implementations in the other FB3 games are telling about how badly BF4 sucks (and it sucked even more in the first months after launch).
 

96Firebird

Diamond Member
Nov 8, 2010
One of the major problems with BF4 and Mantle is compatibility with new cards. The only real example of this is the 285, but that has been hit or miss. I hope this doesn't extend to the future APIs.

Edit - Looks like it is still an issue for Hardline as well, so I'm not sure who the blame rests on...
 
Last edited:

biostud

Lifer
Feb 27, 2003
So making a good game engine will be hard work, but once it is there it will be easier to debug?
 

3DVagabond

Lifer
Aug 10, 2009
Yes, we can now expect every game to be just as stable and robust as the flagship Mantle game, Battlefield 4 :awe:

Beta Software: Beta software refers to computer software that is undergoing testing and has not yet been officially released. The beta phase follows the alpha phase, but precedes the final version. Some beta software is only made available to a select number of users, while other beta programs are released to the general public.

Software developers release beta versions of software in order to garner useful feedback before releasing the final version of a program. They often provide web forums that allow beta testers to post their feedback and discuss their experience using software. Some beta software programs even have a built-in feedback feature that allows users to submit feature requests or bugs directly to the developer.

In most cases, a software developer will release multiple "beta" versions of a program during the beta phase. Each version includes updates and bug fixes that have been made in response to user feedback. The beta phase may last anywhere from a few weeks for a small program to several months for a large program.

Each beta version is typically labeled with the final version number followed by a beta version identifier. For example, the fifth beta release of the second version of a software program may have the version number "2.0b5." If a developer prefers not to list the specific version of a beta program, the version number may simply have the term "(beta)" after the program name, e.g. "My New App (beta)." This naming convention is commonly used for beta versions of websites or web applications.

Since beta software is a pre-release version of the final application, it may be unstable or lack features that will be included in the final release. Therefore, beta software often comes with a disclaimer that testers should use the software at their own risk. If you choose to beta test a program, be aware that it may not function as expected.
 

mindbomb

Senior member
May 30, 2013
BF4 isn't broken with the 285. The Ultra Mantle settings were designed for the 4GB R9 290 series, and reviewers were using the same settings on the 2GB 285, causing huge PCI Express-related bottlenecks.
 
Feb 19, 2009
There's nothing surprising in that post and it confirms that low level APIs aren't viable. If AAA games are shipping broken now, that's going to increase tenfold when the driver complexities needed to deal with them are shifted onto game developers.

Agreed, low-level programming is on another level of complexity and potential mess-ups. That's why the post emphasized that these advantages are incorporated into game engines that people/developers can use.

Developers are moving away from being programmers and towards content creators instead, which is great as they can focus on what they are good at. This leaves programmers with the task of making awesome game engines.

All the big engines are going to be DX12 functional with support for Vulkan (for mobiles in particular). That is all that matters.
 

96Firebird

Diamond Member
Nov 8, 2010
BF4 isn't broken with the 285. The Ultra Mantle settings were designed for the 4GB R9 290 series, and reviewers were using the same settings on the 2GB 285, causing huge PCI Express-related bottlenecks.

Do you have anything to back this up? As far as I know, Mantle doesn't add anything visually compared to DX11 in BF4 and BF Hardline. The games run fine in DX11 on the 285, but Mantle performance is hit or miss, and so far I've only seen miss with Hardline.
 

itsmydamnation

Platinum Member
Feb 6, 2011
Do you have anything to back this up? As far as I know, Mantle doesn't add anything visually compared to DX11 in BF4 and BF Hardline. The games run fine in DX11 on the 285, but Mantle performance is hit or miss, and so far I've only seen miss with Hardline.

Nope, that isn't it. Dave on B3D gave details - I can't remember exactly what it was or find the damn post, but I think it was the GPR counts changing with the newer GCN revision and the software not using them. He also said these counts are exposed via Mantle, so I'm guessing that in future DX12/Vulkan these will also be exposed, and the big AAA engines will handle this better than the first generation of Mantle engines.

Seriously, some of you guys are a joke - you look for anything to justify your position, not reality. Let's compare Mantle to DX1 for a "valid" comparison :cool:

Mantle has run perfectly for me on the two games I own that use it (DAI and BF4). What people continually ignore with Mantle is how silky smooth it is; you notice it easily in BF4 when things get nuts.

/290 owner
 
Last edited:

96Firebird

Diamond Member
Nov 8, 2010
Nope, that isn't it. Dave on B3D gave details - I can't remember exactly what it was or find the damn post, but I think it was the GPR counts changing with the newer GCN revision and the software not using them. He also said these counts are exposed via Mantle, so I'm guessing that in future DX12/Vulkan these will also be exposed, and the big AAA engines will handle this better than the first generation of Mantle engines.

I'm not sure what you are trying to say here... Is it the fault of the game developer, the engine developer, or the driver? What needs to be updated for newer cards to work?

For reference of what I am talking about...

Ryan Smith said:
On a tangential note, this does raise the question of how well Direct3D 12 may handle the issue. By its vendor-limited nature Mantle has the opportunity to work even lower than a cross-vendor low level API like Direct3D 12, but D3D12 is still going to be low level and exposed to some of these hazards. For that reason it will be interesting to keep an eye on Direct3D development over the next year to see how Microsoft and its partners handle the issue. We would expect to see Microsoft have a better handle on forward-compatibility – in their position they pretty much have to – but if nothing else we’re curious just what it will take from game developers, API developers, and hardware developers alike to ensure that necessary level of forward-compatibility.

Anandtech 285 Review

That was on launch, so maybe some things needed to be worked out...

But then, for the 960 launch, the 285 still had trouble with BF4 and Mantle according to HardOCP:

Brent Justice said:
When we tested BF4 this round we started off testing under Mantle on the R9 285 GPU. We got our results, and it seemed a little low to us. We tested again but this time under Direct3D 11 and actually found performance to be better under DX11. This happens from time to time in this game, as new patches are released sometimes new drivers have to come along that help Mantle performance. For right now, we stuck with the best performing runs we got, which were under DX11, so all cards are tested under DX11 for this article.

HardOCP 960 Review

And now, BF Hardline is showing performance loss with the 285 and Mantle...

PCLab.pl said:
We tested the AMD cards in both DirectX and Mantle modes. Only in the case of the Radeon R9 285 was performance in Mantle mode significantly worse than in DirectX. The remaining cards behave like the Radeon R9 290X included in the graph.

PCLab.pl BF Hardline benchmark (untranslated)

What I want to know is, can we expect something similar with DX12 and new cards? What is the issue with the 285 and Mantle, who is responsible, and why hasn't it been mitigated? My guess is it lies on the engine developers, or the game developers. I haven't heard of any problems with the 285 and Civ:BE or DAI using Mantle, but I couldn't find many reviews. I wanted to check for an update from Anandtech on this, but they didn't review the 960 and the Titan X review doesn't include the 285.

Sorry if this seems off-topic, but can you imagine a newer, faster card getting released but is shown as slower in DX12 games because "whatever" hasn't been updated yet?
 

dacostafilipe

Senior member
Oct 10, 2013
There's nothing surprising in that post and it confirms that low level APIs aren't viable. If AAA games are shipping broken now, that's going to increase tenfold when the driver complexities needed to deal with them are shifted onto game developers.

A broken game is a broken game; it has nothing to do with the API it's using. If the studio does not have the money to invest in testing, nothing can help here.

For those that have the money, a low level API is a lot easier to use, even if there's more code.
 

Dribble

Platinum Member
Aug 9, 2005
BF4 isn't broken with the 285. The Ultra Mantle settings were designed for the 4GB R9 290 series, and reviewers were using the same settings on the 2GB 285, causing huge PCI Express-related bottlenecks.

That's one of the major problems with moving too much into the game devs' hands. Sure, the game dev might be very keen to have more control while developing a game, but the moment it's out and has been sold for six months, they are done and won't touch it again. They won't fix it for new graphics cards, or new versions of Windows, or anything else; they no longer care.

Now if the GPU company is basically in control, they can hack the drivers so the game works with their latest card. But if they don't have that control, how do they fix it? The 285 is a pretty clear example that it's very hard.
 

ShintaiDK

Lifer
Apr 22, 2012
I'm not sure what you are trying to say here... Is it the fault of the game developer, the engine developer, or the driver? What needs to be updated for newer cards to work?

For reference of what I am talking about...



Anandtech 285 Review

That was on launch, so maybe some things needed to be worked out...

But then, for the 960 launch, the 285 still had trouble with BF4 and Mantle according to HardOCP:



HardOCP 960 Review

And now, BF Hardline is showing performance loss with the 285 and Mantle...



PCLab.pl BF Hardline benchmark (untranslated)

What I want to know is, can we expect something similar with DX12 and new cards? What is the issue with the 285 and Mantle, who is responsible, and why hasn't it been mitigated? My guess is it lies on the engine developers, or the game developers. I haven't heard of any problems with the 285 and Civ:BE or DAI using Mantle, but I couldn't find many reviews. I wanted to check for an update from Anandtech on this, but they didn't review the 960 and the Titan X review doesn't include the 285.

Sorry if this seems off-topic, but can you imagine a newer, faster card getting released but is shown as slower in DX12 games because "whatever" hasn't been updated yet?

Interesting that the 285 is still unfixed. I don't think this translates into DX12, though. But talk about bad customer support in an AMD-sponsored game.