After talking things over with NVIDIA, they've agreed to allow me to discuss the precise changes they made to boost their Civ V performance by so much. So gather around children, crazy uncle Ryan has a story to tell.
In our description of Civ V, I've mentioned that it uses a "slew of DirectX 11 technologies", but I've never gone into great detail on what those are. I'm not going to go into deep detail on that now - there's a good article over at PC Games Hardware that contains an interview with Firaxis about that - but I will quickly explain the ins and outs.
Often from a gamer standpoint it's natural to look at the immediate visual benefits of a new API. With DX11, the big feature is tessellation with a secondary feature of contact hardening shadows. However there's also a great deal of stuff going on in the backend for developers to make things faster - making things faster allows developers to use new graphical effects that may not have been practical before. So for DX11 on top of tessellation and contact hardening shadows there's also things like multithreaded rendering, compute shaders, support for larger textures, and the implementation of a pull model for certain attribute evaluation.
So why do I like Civ V? Because the LORE engine it's based on implements so many of these features. Sure, something like AvP will have tessellation added, or Bad Company 2 will implement contact hardening shadows, but most of the DX11 games today are adding one or two graphical features that improve the look of the game, but only begin to scratch the surface of the API. LORE goes much, much deeper. Firaxis uses multi-threaded rendering, they use compute shaders for texture compression, and they use tessellation. Today it's probably the most extensive AAA DX11 game that has been released so far. This makes it a great GPU benchmark, as it's a real game we can use to test features other games don't touch.
So what then is going on that made Civ V so much faster for NVIDIA? Admittedly I had to press NVIDIA for this - performance practically doubled on high-end GPUs, which is unheard of. Until they told me what exactly they did, I wasn't convinced it was real or if they had come up with a really sweet cheat. It definitely wasn't a cheat.
If you recall from our articles, I keep pointing to how we seem to be CPU limited at times. Now if you go back to the list of DX11 features Civ V uses, a light bulb should light up: multi-threaded rendering. Civ V uses multi-threaded rendering, in fact it uses it quite extensively. Now why do we have multi-threaded rendering in the first place? Half of this is to better mesh with multi-threaded games by enabling additional threads to directly contribute without having to go through a master thread first. But a second purpose is that multi-threaded rendering helps the GPU just as much as it helps the CPU.
Traditionally, rendering is a very serial process. The program needs to set up a bunch of objects and then pass that on to the video drivers and finally to the GPU. There's a high degree of submission overhead, meaning it's possible to choke the CPU while submitting a large number of objects to the GPU. In DirectX 11, multi-threaded rendering is achieved by splitting the D3D pipeline into three pieces: the Device, the Immediate Context, and the Deferred Context. The important bit here is that a deferred context holds commands that have yet to be sent to the GPU, and that you can have a deferred context for each thread. When developers talk about multi-threaded rendering with DX11, this is what they're referring to. When you use DX11's multi-threaded rendering capabilities correctly, you can have several threads record into their own deferred contexts, each producing a command list, and then play those command lists back on the immediate context once it comes time to render the scene.
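To make the record-then-submit pattern concrete, here's a minimal sketch in portable C++. To be clear, this is a toy model and not Direct3D code: `DeferredContext`, `ImmediateContext`, `CommandList` and `RenderFrame` are hypothetical stand-ins I made up for illustration, loosely mirroring the roles that deferred contexts, `FinishCommandList` and `ExecuteCommandList` play in D3D11. It assumes each thread gets a contiguous slice of the scene, which is just one way to split the work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a recorded command list (not the D3D11 type).
using CommandList = std::vector<std::string>;

// Hypothetical stand-in for a deferred context: it only records commands.
class DeferredContext {
public:
    void Draw(const std::string& object) { recorded_.push_back("draw " + object); }

    // Plays the role FinishCommandList has in D3D11: hand back what was
    // recorded and leave the context empty for reuse.
    CommandList Finish() {
        CommandList out;
        out.swap(recorded_);
        return out;
    }

private:
    CommandList recorded_;
};

// Hypothetical stand-in for the immediate context: the one submission point.
class ImmediateContext {
public:
    void Execute(const CommandList& list) {
        submitted_.insert(submitted_.end(), list.begin(), list.end());
    }
    const std::vector<std::string>& Submitted() const { return submitted_; }

private:
    std::vector<std::string> submitted_;
};

// Record the scene on `threadCount` worker threads, then submit the
// resulting command lists in a fixed order from the calling thread.
std::vector<std::string> RenderFrame(const std::vector<std::string>& objects,
                                     std::size_t threadCount) {
    assert(threadCount > 0);
    std::vector<CommandList> lists(threadCount);
    std::vector<std::thread> workers;

    for (std::size_t t = 0; t < threadCount; ++t) {
        workers.emplace_back([&objects, &lists, t, threadCount] {
            DeferredContext ctx;  // one context per thread, so no locking needed
            const std::size_t begin = objects.size() * t / threadCount;
            const std::size_t end = objects.size() * (t + 1) / threadCount;
            for (std::size_t i = begin; i < end; ++i) {
                ctx.Draw(objects[i]);
            }
            lists[t] = ctx.Finish();  // each thread writes only its own slot
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    ImmediateContext immediate;
    for (const auto& list : lists) {
        immediate.Execute(list);  // serial playback, in thread-index order
    }
    return immediate.Submitted();
}
```

The thing to notice is that the expensive part (recording) runs in parallel, while the submission at the end stays serial and ordered, so the frame comes out the same regardless of how many threads did the recording.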
So Civ V uses proper multi-threaded rendering, that's great! So why isn't this the end of the story?
It turns out that you don't actually need to support all these nifty multi-threading features to be DX11 (or rather D3D11) compliant - those features are optional - and that's what happened. And this is what changed my perspective on DX11, as before now I had never realized that anything in the API/spec was optional. Previously we had all the pieces to understand what was going on, but without knowing that AMD and NVIDIA did not fully support multi-threaded rendering, it was never clear what the bottleneck was.
But let's be clear here: multi-threaded rendering is a massive undertaking on the driver and hardware side. You're doing the GPU equivalent of inventing the multi-tasking operating system. NVIDIA and AMD have not until this point supported multi-threaded rendering in their drivers, as they have needed time to implement the feature correctly. If you have the DX SDK installed, this is visible in the DX Caps Viewer, in the D3D11 section under the title "Driver Command Lists".
So in a nutshell, 4 months ago Civ V supported multi-threaded rendering. AMD and NVIDIA did not.
Originally Posted by Firaxis @ PC Games Hardware
Civilization V, as far as we know, is the first fully threaded DX11 game.
Unfortunately, because no other games have used this feature yet, neither Nvidia nor AMD have publically released threaded drivers, so users may not experience all the benefits just yet. We decided to keep threading enabled for Civilization V, however, because we are continuing to work closely with Nvidia and AMD on their support for multi-threading. We expect publically available threaded drivers shortly.
The internal architecture of the Civilization V graphics engine, however, is heavily multi-threaded and users will see multi-processor benefits even with drivers that are not threaded (including DX9). We have developed a series of configurable benchmark modes that we use internally for measuring our threading ability. These are fully described in the readme file. After some discussion, we decided to expose these internal tests on the released version so, if the users view the readme file, they will see that there are detailed instructions of these benchmark modes.
Can you guess then what changed?
With the Release 265 series drivers, NVIDIA enabled partial support for DX11's multi-threaded rendering features. At the time this support was limited to just Civ V; while it was beyond the experimental stage, restricting it to one game allowed NVIDIA to deploy it against a single known program while they collected feedback and finished the other aspects of multi-threaded rendering.
With NVIDIA's drivers now allowing Civ V to use multiple deferred contexts, Civ V's performance shot way up. With high-end GPUs, performance damn near doubled at lower resolutions. Civ V was in fact CPU limited - it was CPU limited because it was only able to use a single thread to assemble its contexts, and that thread was maxing out the single CPU core it could use. This is why drivers played such a big part in Civ V's performance, because how drivers handled D3D11 contexts was the key to unlocking Civ V's performance.
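If it helps to see why the gains show up at lower resolutions in particular, here's a back-of-the-envelope model in C++. It's purely illustrative: `FrameTimeMs` is a hypothetical helper, it assumes the CPU-side recording work divides perfectly across threads (real scaling will be worse), and any numbers you feed it are made up rather than measured on Civ V.

```cpp
#include <algorithm>
#include <cassert>

// Toy bottleneck model: a frame takes as long as the slower of the two sides.
// cpuSubmitMs      - hypothetical CPU time to assemble the frame on one thread
// recordingThreads - threads sharing that work (assumed to scale perfectly)
// gpuMs            - hypothetical GPU time to draw the frame
double FrameTimeMs(double cpuSubmitMs, int recordingThreads, double gpuMs) {
    assert(recordingThreads >= 1);
    const double cpuMs = cpuSubmitMs / recordingThreads;
    return std::max(cpuMs, gpuMs);
}
```

With invented figures, `FrameTimeMs(20.0, 1, 8.0)` is CPU bound at 20 ms, while `FrameTimeMs(20.0, 4, 8.0)` drops to 8 ms because the GPU becomes the limit. Raise the GPU cost to 25 ms (think higher resolution) and both calls return 25 ms, since extra recording threads can't help a frame that's already GPU bound.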
At this point in time we appear to be GPU limited, but we may also be CPU limited. Firaxis says Civ V can scale to 12 threads; this would be a hex-core CPU with hyperthreading. Our testbed is only a quad-core CPU with HT, meaning we probably aren't maxing out Civ V on the CPU side. And even with HT, it's likely that 12 real cores would improve on performance relative to 6 cores + HT. Firaxis believes they're GPU limited, but it's hard to definitively tell which it is.
Image from Firaxis GDC11 presentation
In any case, full support for multi-threaded rendering was finally enabled in NVIDIA's Release 270 drivers, which were released last week. At this point any game or application can take advantage of the feature, and not just Civ V. This is also why NVIDIA has finally allowed me to write about what they've previously told me, as they no longer consider it a secret. Finally, on a side note, the fact that Civ V had this feature enabled in NVIDIA's drivers early is why performance does not appear to have changed between Release 265 and Release 270.
Anyhow, as far as I know, AMD does not currently offer full support for multi-threaded rendering (I don't have an AMD card plugged in right now to run the DX Caps Viewer against). I'm not sure where they are on it, though I doubt they're very far behind.
So in conclusion, the reason NVIDIA beats AMD in Civ V is that NVIDIA currently offers full support for multi-threaded rendering/deferred contexts/command lists, while AMD does not. Civ V uses a massive number of objects and complex terrain, and because the game is capable of multi-threaded rendering, the introduction of multi-threaded rendering support in NVIDIA's drivers means that NVIDIA's GPUs can now rip through the game.
This is the true power of DX11. When properly implemented in both drivers and games, DX11's multi-threaded rendering capabilities are going to allow developers to push a lot more stuff out to the GPU without immediately bottlenecking the CPU.
On a future note, while Civ V is the first game to use DX11 multi-threaded rendering, it is not going to be the last. Battlefield 3 will most likely use it - DICE was lamenting the lack of driver support last month at GDC. The Capcom team responsible for Lost Planet 2 also mentioned how they would have liked to have this feature working before LP2, though I can't find the article at this time.
Coincidentally, last month's interview with AMD's Richard Huddy at Bit-Tech also has a lot in common with this. AMD says DX11 multi-threaded rendering can double object/draw-call throughput, and they want to go well beyond that by bypassing the DX11 API.
Further Reading: AnandTech, Revealing The Power of DirectX 11