Nvidia vs AMD's Driver Approach in DX11

bigboxes · Apr 3, 2017

Carfax83 said:
Blaming gameworks is intellectually lazy to be honest

It's a valid concern.

tamz_msc · Apr 3, 2017

Carfax83 said:
Might be, but it's still relevant because it shows that NVidia has been pursuing multithreaded drivers for a long time. Who knows what form it has taken in modern times? Only the NVidia software engineers know.

No, that article is about offloading vertex shader processing on to the CPU and managing the long pipelines in GPUs of that era. Irrelevant to this present discussion about NVIDIA's DX11 multi-threading.

EDIT: You should be more thorough with the understanding of your own links that you post; here is the last paragraph of that article

Despite the apparent gains offered by multithreading, de Waal expressed some skepticism about the prospects for thread-level parallelism for CPUs. He was concerned that multithreaded games could blunt the impact of multithreaded graphics drivers, among other things.

TLP is inherent in GCN; NVIDIA drivers can extract TLP in DX11 from Kepler onward.

Despoiler · Apr 3, 2017

2is said:
All their effort wasn't much. Mantle was abandoned soon after it was implemented.

Bacon1 said:
Yeah Vulkan and DX12 aren't a thing right?

Not to mention Mantle is the core of LiquidVR which is obviously still being leveraged.

Carfax83 · Apr 4, 2017

bigboxes said:
It's a valid concern.

If you honestly believe that, then you don't really understand what Gameworks is. And then you'd have to explain to me why certain Gameworks titles ran faster on AMD hardware, and vice versa.

Carfax83 · Apr 4, 2017

tamz_msc said:
No, that article is about offloading vertex shader processing on to the CPU and managing the long pipelines in GPUs of that era. Irrelevant to this present discussion about NVIDIA's DX11 multi-threading.

EDIT: You should be more thorough with the understanding of your own links that you post; here is the last paragraph of that article

I think you need to take your own advice and read the article more closely.

The article is about NVidia drivers exploiting multicore/multithreaded CPUs to increase performance. Offloading vertex processing was just ONE example of that. In fact, at the time that article was written, vertex processing was already being offloaded to the CPU when the GPU was busy:

De Waal cited several opportunities for driver performance gains with multithreading. Among them: vertex processing. He noted that NVIDIA's drivers currently do load balancing for vertex processing, offloading some work to the CPU when the GPU is busy. This sort of vertex processing load could be spun off into a separate thread and processed in parallel.

So like I said, you're wrong and it's clear that you're trying to minimize NVidia's longstanding efforts to exploit multicore/multithreaded CPUs and make it seem like it's something that only occurred with the advent of DX11 when it's clear this has been going on for much longer.

TLP is inherent in GCN; NVIDIA drivers can extract TLP in DX11 from Kepler onward.

I have no idea what this means. TLP is inherent in GCN?

Well tell me this, if "TLP is inherent to GCN," then why does GCN have such a hard time coping with DX11 CPU performance?

As for DX11, it inherently has minimal ability to scale rendering across multicore/multithreaded CPUs, which is exactly why NVidia developed a way to partially circumvent that limitation, and why AMD hardware has been so affected by underutilization.

Bacon1 · Apr 4, 2017

Carfax83 said:
so you can't just use deferred contexts and not driver command lists.

Carfax83 said:
No. Deferred contexts requires driver command lists to work properly. The two are complimentary, and need to be used together for the draw call batch submission to work in parallel.

Yes you can. AMD does not support driver command lists but does support deferred context lists. I've tested this myself.

Carfax83 said:
I think the really relevant question is, how many games today are actually draw call bottlenecked? Very few, and even then only in certain circumstances.

Hence why I said: "Obviously the GPUs can't handle that much work to use that many drawcalls in an actual game, but the difference in the APIs is massive and the difference will grow as GPUs are more powerful and limited by DX11."

Carfax83 said:
It scales all the way up to a deca-core CPU, which is incredible.

I never said they can't do multithreading in DX11, just that its much better in Vulkan / DX12. Developers have stated the same thing and that's one of the reasons they want to use it.

Carfax83 said:
task based parallelism

Yeah been a thing for a decade now

Carfax83 said:
Blaming gameworks is intellectually lazy to be honest

It's not lazy when its true. Gameworks titles are optimized for Nvidia first and foremost. Please show me any AAA GW title that doesn't run very poor on AMD hardware at launch and that isn't at least 10% faster within a few weeks.

Bacon1 · Apr 4, 2017

Carfax83 said:
As for DX11, it inherently has minimal ability to scale rendering across multicore/multithreaded CPUs

Didn't you just claim the opposite when pointing out how amazingly well Ghost Recon Wildlands scales??

Carfax83 said:
which is exactly why NVidia developed a way to partially circumvent that limitation, and why AMD hardware has been so affected by underutilization.

I don't think a single person is claiming otherwise. We all know that Nvidia does more work in drivers instead of hardware. That's what the whole post is about.

tamz_msc · Apr 4, 2017

Carfax83 said:
If you honestly believe that, then you don't really understand what Gameworks is. And then you'd have to explain to me why certain Gameworks titles ran faster on AMD hardware, and vice versa.

That's pretty simple. Many of the effects in the Gameworks SDK are based on compute shaders. GCN can handle compute very well; as long as there aren't other bottlenecks.

tamz_msc · Apr 4, 2017

Carfax83 said:
I think you need to take your own advice and read the article more closely. The article is about NVidia drivers exploiting multicore/multithreaded CPUs to increase performance. Offloading vertex processing was just ONE example of that. In fact, at the time that article was written, vertex processing was already being offloaded to the CPU when the GPU was busy:

So like I said, you're wrong and it's clear that you're trying to minimize NVidia's longstanding efforts to exploit multicore/multithreaded CPUs and make it seem like it's something that only occurred with the advent of DX11 when it's clear this has been going on for much longer.

Let's be more specific.

It talks about offloading vertex processing to a separate thread. Again, this is before unified shaders existed.

Some of the driver's other functions don't lend themselves so readily to parallel threading, so NVIDIA will use a combination of fully parallel threads and linear pipelining. We've seen the benefits of linear pipelining in our LAME audio encoding tests; this technique uses a simple buffering scheme to split work between two threads without creating the synchronization headaches of more parallel threading techniques.

That's just regular pipelining.

Lastly,

He was concerned that multithreaded games could blunt the impact of multithreaded graphics drivers, among other things.

tamz_msc · Apr 4, 2017

Carfax83 said:
I have no idea what this means. TLP is inherent in GCN? Well tell me this, if "TLP is inherent to GCN," then why does GCN have such a hard time coping with DX11 CPU performance?

As long as you don't load the primary thread with other stuff, it doesn't.

Carfax83 said:
As for DX11, it inherently has minimal ability to scale rendering across multicore/multithreaded CPUs, which is exactly why NVidia developed a way to partially circumvent that limitation, and why AMD hardware has been so affected by underutilization.

That is simply not true, go through this - straight from Ryan Smith himself.

2is · Apr 4, 2017

Bacon1 said:
Yeah Vulkan and DX12 aren't a thing right?

Please point me to the part of my post where I mention Vulkan and/or DX12. While you're at it, go ahead and explain how Mantle wasn't abandoned shortly after it's inception, which is what my post said.

Bacon1 · Apr 4, 2017

2is said:
Please point me to the part of my post where I mention Vulkan and/or DX12. While you're at it, go ahead and explain how Mantle wasn't abandoned shortly after it's inception, which is what my post said.

Uhh I did, did you not read the links? Vulkan is Mantle. They took what AMD had done and used that to create Vulkan. I mean even the name... Mantle = inside of of the Earth. Vulkan for Volcano, erupting... Even DX12 is influenced by Mantle which is why I mentioned it as well. Its clear because Vulkan and DX12 have many very similar features and capabilities which is why its easier to port between them than from either OpenGL or DX11 to them in the first place.

Also as @Despoiler pointed out Vulkan still exists and is used in LiquidVR and internal projects at AMD for testing new features.

Krteq · Apr 4, 2017

Guys, all these new low(er)-level APIs are based on AMD's command buffer submission and control model. It started with Mantle => Vulkan... and DX12, Metal uses the same model.

GCN is designed to take advantages of this model and Intel was quite interested in Mantle at the beginning - Intel approached AMD about access to Mantle, so Intel can design Gen 9 GPU uarch accordingly.

nV, on other hand, is quite behind due to their focus on DX11.

Peicy · Apr 4, 2017

Since this video is in dispute here and in other places as well, would someone would like to chime in and try to explain what they are actually doing? If its knowledge in the pupic domain of course, maybe no one really knows.

I imagine that its a multitude of little tweaks, tailored to hardware design (see the whole tile-based rendering on Maxwell and above) combined with a huge driver team that has the manpower to put a lot of work into a single title.
The higher CPU utilisation on Nvidia powered rigs with multicore-aware titles is curious though.

Headfoot · Apr 4, 2017

2is said:
Please point me to the part of my post where I mention Vulkan and/or DX12. While you're at it, go ahead and explain how Mantle wasn't abandoned shortly after it's inception, which is what my post said.

Stop intentionally missing the point. It is widely known Vulkan is Mantle + Intel/Nvidia compatibility and DX12 is derivative of it as well in some capacity.

2is · Apr 4, 2017

Bacon1 said:
Uhh I did, did you not read the links? Vulkan is Mantle. They took what AMD had done and used that to create Vulkan. I mean even the name... Mantle = inside of of the Earth. Vulkan for Volcano, erupting... Even DX12 is influenced by Mantle which is why I mentioned it as well. Its clear because Vulkan and DX12 have many very similar features and capabilities which is why its easier to port between them than from either OpenGL or DX11 to them in the first place.

Also as @Despoiler pointed out Vulkan still exists and is used in LiquidVR and internal projects at AMD for testing new features.

The post I quoted said AMD put all their efforts in Mantle, I said AMD abandoned mantle, which they did. Just because other entities use it as a basis for OTHER API's doesn't equate to Mantle not being EOL almost as soon as it started being used.

By your logic Ageia Physx cards are still current and so is Direct X... all of them.

dogen1 · Apr 4, 2017

Bacon1 said:
It's not lazy when its true. Gameworks titles are optimized for Nvidia first and foremost. Please show me any AAA GW title that doesn't run very poor on AMD hardware at launch and that isn't at least 10% faster within a few weeks.

re7, shadow warrior 2, the division, rainbow six siege, the witcher 3, killing floor 2, gta 5, far cry 4?
I think those ran fine on amd, though I'm not 100% sure for all of them.

tamz_msc said:
That is simply not true, go through this - straight from Ryan Smith himself.

Which part is not true? DX11 cannot scale indefinitely? Only submission can be done in multiple threads.

As long as you don't load the primary thread with other stuff, it doesn't.

And you don't bump up against AMDs lower driver throughput. (not that you should lol)

Bacon1 · Apr 4, 2017

2is said:
I said AMD abandoned mantle, which they did

Mantle's core basically had a name change to Vulkan with more support and 3rd party support (Khronos and Nvidia as well). You are trying to paint AMD as abandoning it and wasting their time, when its the complete opposite. They got more support for their project and it was amazing time well spent. Khronos thanked AMD and said it saved them years of development time and allowed them to get "OpenGL Next" out way sooner than they would have been able to.

And again, Mantle itself is still used internally at AMD and in LiquidVR which is used by VR headsets.

Headfoot · Apr 4, 2017

The only thing being "abandoned" by continuing this line of discussion is posters' credibility.

Carfax83 · Apr 4, 2017

Bacon1 said:
Yes you can. AMD does not support driver command lists but does support deferred context lists. I've tested this myself.

It doesn't matter. Deferred contexts must work hand in hand with driver command lists or they're useless.

As this guy found out.

Yeah been a thing for a decade now

Task based parallelism in game engines is actually fairly new. Developers had to learn new and easier ways to parallelize their engines otherwise they would not be able to extract adequate performance from the PS4 and Xbox One which have weak multicore CPUs.

It's not lazy when its true. Gameworks titles are optimized for Nvidia first and foremost. Please show me any AAA GW title that doesn't run very poor on AMD hardware at launch and that isn't at least 10% faster within a few weeks.

Well you set the bar low, so it wasn't very difficult for me to find a AAA GW title that ran well on AMD from the start. And this is with HBAO+ enabled as well.

Carfax83 · Apr 4, 2017

Bacon1 said:
Didn't you just claim the opposite when pointing out how amazingly well Ghost Recon Wildlands scales??

I'm pretty sure that Ghost Recon Wildlands excellent CPU scaling has very little, if anything to do with DX11, and more to do with task based parallelism.

I don't think a single person is claiming otherwise. We all know that Nvidia does more work in drivers instead of hardware. That's what the whole post is about.

Indeed, what is being debated is how NVidia is doing it. It's likely a closely guarded secret and so nobody outside of NVidia knows; not you, not me, nor the guy in that video. To claim otherwise is just foolishness. We can only postulate.

Carfax83 · Apr 4, 2017

tamz_msc said:
That's pretty simple. Many of the effects in the Gameworks SDK are based on compute shaders. GCN can handle compute very well; as long as there aren't other bottlenecks.

It's been shown multiple times that unless tessellation is involved, AMD has a similar performance hit as NVidia when running these Gameworks effects. HardOCP did a fairly good overview of it last year if I recall, using Far Cry 4.

Besides, the last time I checked, most Gameworks effects can be disabled. If you don't like it, turn it off.

Carfax83 · Apr 4, 2017

tamz_msc said:
Let's be more specific.

It talks about offloading vertex processing to a separate thread. Again, this is before unified shaders existed.

Who cares whether it's before unified shaders existed? You're missing the entire point of why I posted that article to begin with. I posted that article to show that NVidia for a LONG time has had an interest in exploiting multicore/multithreaded CPUs to increase their GPU performance.

In that era, vertex processing was just one way of doing it. You have no clue whatsoever how they are doing it in modern times, and neither do I for that matter.

Lastly,

He was talking about thread level parallelism, which was the earliest incarnation of multithreading technology used in games. While it was somewhat effective, it had poor scaling (didn't use more than four threads) and could definitely result in clashes with the driver threads.

Why do you think NVidia has the "threaded optimization" toggle option in the driver control panel? It's a remnant of the days where thread level parallelism was used in game engines, unlike the more efficient task based parallelism that is used today.

Carfax83 · Apr 4, 2017

tamz_msc said:
As long as you don't load the primary thread with other stuff, it doesn't.

The primary thread is always loaded with other stuff, in both DX11 and DX12, so I don't see how this proves your point.

That is simply not true, go through this - straight from Ryan Smith himself.

I think you need to read up on DX11 multithreading. If DX11 multithreading could scale that well, then why wasn't it more prevalent during the DX11 era?

I'll tell you why. Because:

1) Even though its part of the DX11 specification, Microsoft made supporting it optional.

2) AMD never supported driver command lists because they couldn't get it to work in their drivers, and driver command lists are absolutely necessary for DX11 multithreading to work.

2) Even when it does work, it's nowhere near as efficient as a true low level API like DX12 or Vulkan. In fact, scaling to too many threads can easily consume too much CPU power and impact the game's performance.

So back then, dual cores were much more common and quad cores were considered high end rather than the standard. DX11 multithreading likely requires a minimum of a quad core to function with any degree of efficiency. Anything less, and too much CPU ends up being used by the driver, which harms performance.

Bacon1 · Apr 4, 2017

Carfax83 said:
Well you set the bar low, so it wasn't very difficult for me to find a AAA GW title that ran well on AMD from the start. And this is with HBAO+ enabled as well.

Yes, a 290x much slower than a 970 is definitely running well

Carfax83 said:
It doesn't matter. Deferred contexts must work hand in hand with driver command lists or they're useless.

Carfax83 said:
2) AMD never supported driver command lists because they couldn't get it to work in their drivers, and driver command lists are absolutely necessary for DX11 multithreading to work.

Again wrong, I've used them myself and they work without driver command lists (which AMD does not support!)

Carfax83 said:
To claim otherwise is just foolishness. We can only postulate.

Well you seem very hard set in your claims if you are only assuming and don't actually know.

Carfax83 said:
excellent CPU scaling has very little, if anything to do with DX11, and more to do with task based parallelism.

I'm sorry, but that makes absolutely no sense. Task based parallelism isn't some magic thing. It just means structuring your code and game logic so all parts don't conflict and can be run and completed at different times. So you can run your AI separate from weapon collision separate from keyboard input etc. That way you don't bottleneck waiting for one of those to finish. And yes, it has been in game design for a very long time. The difference is being able to make GPU calls not just game engine logic on different threads for DX11 which was further enhanced in DX12/Vulkan.

Nvidia vs AMD's Driver Approach in DX11

Lifer

Diamond Member

Golden Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Golden Member

Member

Diamond Member

Diamond Member

Senior member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member