Someone explain the AMD Nvidia DX12 difference?


24601

Golden Member
Jun 10, 2007
1,683
39
86
What you and AMD are incorrectly labeling "Async Compute" is not Async Compute. What they mean when they say that is "Multiple Command Processor Issuing."

Due to a flaw in GCN's design, the command processor can only issue commands to 1024 (shaders) at a time, as it was designed explicitly for Tahiti (which has 2048 "shaders")

GCN shaders execute each instruction issued by the command processor over 4 cycles; this is how a single command processor can handle more than 1024 shaders.

"Multiple Command Processor Issuing," or what AMD incorrectly calls "Asynchronous Computing" is the implementation of the ability to use more than 1 command processor simultaneously.

The reason "Multiple Command Processor Issuing" is so beneficial for AMD cards is that each command processor can only issue commands to 1024 shaders per clock.

Of course "Multiple Command Processor Issuing" has overhead associated with it, as all unnecessary multi-threading does. The benefits simply usually outweigh the costs when used in GCN due to the abysmal command processor.

The reason you don't see any benefit to "Multiple Command Processor Issuing" for Nvidia cards is that Nvidia cards have command processors sufficient to issue commands to all their "shaders" all by themselves, as they do not have the same design flaw as the GCN command processors.

Thus enabling "Multiple Command Processor Issuing" simply adds overhead with zero possible benefit.
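
For reference, the API-side mechanism being argued over here is simply the ability to submit work on more than one queue. A rough D3D12 sketch (my own illustration, not something either IHV publishes) of creating a graphics queue plus a separate compute queue:

```cpp
// Minimal sketch: one "direct" (graphics) queue plus one compute queue.
// Whether work from the two queues actually overlaps on the GPU is entirely
// up to the hardware front end and the driver; the API only exposes the queues.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    // Direct queue: accepts graphics, compute and copy command lists.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

    // Separate compute queue: work submitted here *may* run concurrently
    // with the direct queue if the GPU's schedulers support it.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Cross-queue ordering is the application's job, via ID3D12Fence, e.g.
    //   computeQueue->Signal(fence, value);  graphicsQueue->Wait(fence, value);
}
```

Nothing in the API says the second queue must buy you anything; that is exactly the hardware-dependent part being debated.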
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
Like comparing a 4.7GHz 9590 vs. a 3GHz 5960X? How does that comparison work out?

Depends on the workload, but for almost everything a Haswell core (1 core / 2 threads) has higher IPC than a Piledriver module (1 module / 2 threads). From my tests with CPU rendering in Blender, the 9590 should be no greater than 1,103,860 samples/sec and the 5960X no greater than 2,257,824 samples/sec. So the 5960X is up to about 2x faster than the 9590 in Blender.
 

dogen1

Senior member
Oct 14, 2014
739
40
91
We'll see. As of now, Nvidia is not performing very well in async games.

Do you mean it doesn't perform where it should relative to Maxwell cards in these games? Or do you just mean AMD cards perform relatively better in these games than usual?

Edit: After checking the Hitman and Ashes benchmarks for the 1080, I'm not sure what you're talking about. The 1080 performs very well in those games and scales extremely well over the 980 Ti: 30-40% in both cases.
 

Cookie Monster

Diamond Member
May 7, 2005
5,161
32
86
I've been thinking about this for some time. It's just a bunch of ideas and theories tangled up, so feel free to rip them apart or build on them!

Anyway, the whole thing starts off with the question: does a low-level API make sense on a broader scale? Is leaving all the middle-man tasks (in layman's terms) to the developers the way to go?

I can see many benefits to a low-level API, the primary one being that it gives developers more control over the hardware, which means more performance can be extracted, etc.

However, we know that GPU architectures are always evolving. How can developers be expected to extract performance from GPUs from different IHVs with different architectures? There are also differences within the same GPU family, where parts might require different optimizations due to, for example, different setup ratios.

Developers are now burdened not just with the game but with the compatibility and optimization work for the many different video cards out there. Perhaps this is where the IHV can help, but why do this in the first place when the previous model was working out OK?

On top of this, GPU architecture diversity is now going to be limited because of said API. Right now it prefers GCN, because GCN is used in the consoles and most studios are developing console games with PC ports planned.

For consoles it makes perfect sense, given that it's the same hardware. But for the PC world, is it the right choice? So far DX12 has shown numerous problems, from compatibility issues to performance regressions in some games. It's supposedly an improvement over DX11, yet most DX12 titles to date are plagued with some kind of issue. Another interesting observation is that there seems to be a pattern where Hawaii-based video cards perform better than their Fiji/Tonga counterparts in some modern titles. Did the developers work on Hawaii-based workstations?

This may sound like fear-mongering, but the API feels rushed and not well thought out, at least at a higher level. In DX11 you had one IHV with minimal CPU overhead and one IHV with very high CPU overhead. The former IHV's architecture could be considered more forward-thinking in terms of power efficiency; the latter isn't, but it has more performance at its disposal if it can be tapped (this could apply to any GPU regardless of IHV, but the % increase may vary). The thing is, one could blame the API for one's shortfalls, but one could also blame the "middleman" behind the GPU.

Now the DX12 API comes out, which favors one architecture over the other, alienating the former IHV while getting rid of several of its own shortfalls like CPU overhead. With the introduction of DX12, SLI/CFX may be heading the way of the dodo (highly dependent on the developer). That could also be the reason we see nVIDIA dropping support for 3/4-way SLI, along with no SLI connector being found on the 1060.

The real issue is that changing something like the graphics API on Windows has far more far-reaching complications and consequences than a new GPU architecture does (or a massive driver overhaul, e.g. ATi's big OpenGL rewrite). It seems pretty clear that, with limited R&D funding (and financial troubles), AMD decided its resources were better spent not on next-generation GPU architectures but on Mantle (with the real target being next-gen APIs), so that their architecture would last a lot longer.

They managed to convince the industry to change its API to one that suits their architecture, and I think it was possible because a) they have all the console business, b) MS is part of the console business, c) most games are developed on consoles these days (multi-platform), and d) MS could in fact end up buying AMD (for future console projects etc., instead of having to deal with Intel and other financially strong companies). Not just that, but I have to think the margins on the GPUs being supplied must be very low (citation/sources required - tinfoil hat).

So when one looks at the GCN architecture from the day of its inception, not much has changed that would let you call it a new architecture. Just look at the power-efficiency gains on Polaris once you subtract the node jump; nothing dramatic, anyway, whereas we've seen clear changes with Kepler -> Maxwell and now Pascal.

Now, this API is good for GCN, at least on a performance level. But has it been developed for ALL types of video cards, regardless of GPU architecture? The PC world does not revolve around one hardware setup; that is the beauty of the PC. Yet this API seems to go against that idea, mainly because of the consoles. I guess if that's where the industry wants the PC world to head then so be it, but it doesn't exactly seem ideal. The API doesn't find a good middle ground, for lack of a better term, but rather feels very political and profit-driven, which again doesn't surprise me.

Then again, everything could work out with a happy ending, so this may be nothing but an old man yelling at clouds.

Just some tinned food for thought.
 

24601

Golden Member
Jun 10, 2007
1,683
39
86
I've been thinking about this for some time. It's just a bunch of ideas and theories tangled up, so feel free to rip them apart or build on them! [...]

AMD has been determined to take GCN to their grave. They couldn't care less if they take down the entire ecosystem with them.
 

boozzer

Golden Member
Jan 12, 2012
1,549
18
81
Depends on the workload, but for almost everything a Haswell core (1 core / 2 threads) has higher IPC than a Piledriver module (1 module / 2 threads). From my tests with CPU rendering in Blender, the 9590 should be no greater than 1,103,860 samples/sec and the 5960X no greater than 2,257,824 samples/sec. So the 5960X is up to about 2x faster than the 9590 in Blender.
That is cool :thumbsup: Damn, is there no way for you to test the GPUs in question? Would be very cool to know.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
That is cool :thumbsup: Damn, is there no way for you to test the GPUs in question? Would be very cool to know.

I personally don't have the programming skills or hardware resources for that. However, I did find HPL-GPU. I'd like to see results for these myself.
 

poofyhairguy

Lifer
Nov 20, 2005
14,612
318
126
On top of this, GPU architecture diversity is now going to be limited because of said API.

I would argue GPU architectures are now limited more by economics. The smaller emphasis on the PC market overall, and the fact that the emphasis in this space is shifting toward GPU compute in general, mean that the PC gaming GPU market isn't nearly as dynamic or competitive as it has been historically.

When you think about it, there is less driving the PC gaming GPU market than ever before. GPU design goes where the money goes, and for Nvidia it's clear the money is in HPC, self-driving cars and the like; that is now their target. Just like Intel would rather make server buyers happy than gamers: there is more money in that market.

Nvidia does advance GPU technology for gamers, but it isn't pushing universal advantages that move the whole market forward, like tessellation did. Instead we got piles of GameWorks console ports that ripped out the console special sauce (like async compute) and instead bolted on Nvidia-specific special sauce so that non-key parts of the gaming experience could be tweaked somewhat. The last Nvidia technology that pushed the boundaries of ALL gaming forward (and not just GameWorks eye candy on top) was PhysX.

So who is really pushing GPUs forward FOR GAMING, for everyone? At this point I would argue that many of the recent advances are due only to CONSOLE R&D, which reaches the PC side via GCN. The reason Hawaii had all that extra power built in that was never used under DirectX 11 (like the ACEs) is that Sony/MS had already paid the R&D for that technology, so it was "free" for AMD to put into its PC GPUs.

If PCs don't begin to follow down the same path as the consoles (via DirectX 12), then a major force for GPU advancement (console R&D) is wasted. If that is wasted, then the only thing pushing gaming forward is Nvidia via GameWorks, which again is basically window dressing. To FUNDAMENTALLY improve PC gaming, the consoles must be leveraged, and that requires doing things differently.
 

seitur

Senior member
Jul 12, 2013
383
1
81
Big architectural changes and low-level APIs like DX12 and Vulkan do not compute.

For low-level APIs to take off you need stable architectures and a smaller number of different chips, because of the amount of work that now falls on game developers. A more stable architecture & fewer different chips = a more economically viable low-level API.

Right now, low-level APIs are needed for AAA games (and some specific cases in AA gaming) because CPUs have stagnated and are unable to 'fuel' ever-stronger GPUs through a high-level API like DX11.


That is probably why both the PS4 Neo and Xbox Scorpio will have the same GPU chip. The PS4 Neo's will be either a bit cut down or underclocked while Scorpio's will be clocked higher, but both of them will use a variant of the chip found in the RX 470/480.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,120
260
136
Consoles were the first to push fully bindless resources with multiple queues exposed in games, and Microsoft had to cave to pressure from their partnered IHV (AMD), along with other developers heavily petitioning for the new API to more closely resemble the consoles ...

Hardware dictates engine-level and API-level optimizations, so the most ubiquitous hardware will try to capitalize on its current features, which happens to be AMD GPUs, and that's no coincidence ...

If AMD is on a tightrope, then they're wagering their future on this chess game of who's going to dominate the API, and they must win it once more to ensure their hardware leadership for AAA development come fall of next year, once the new Xbox releases with the 5th or maybe 6th (?) gen GCN architecture ...
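
To put "fully bindless" in concrete terms, here is a rough D3D12 sketch (my illustration only, heap size made up): one big shader-visible descriptor heap that shaders index into, instead of rebinding a handful of slots per draw.

```cpp
// Illustrative "bindless" setup: allocate one large shader-visible descriptor
// heap up front; shaders then address resources by index.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

ComPtr<ID3D12DescriptorHeap> CreateBindlessHeap(ID3D12Device* device)
{
    D3D12_DESCRIPTOR_HEAP_DESC desc = {};
    desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    desc.NumDescriptors = 100000;  // one big table of SRVs/CBVs/UAVs (size is arbitrary here)
    desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    ComPtr<ID3D12DescriptorHeap> heap;
    device->CreateDescriptorHeap(&desc, IID_PPV_ARGS(&heap));

    // In HLSL the resources are then reached through an indexed array, e.g.
    //   Texture2D textures[] : register(t0);  ...  textures[materialIndex]
    return heap;
}
```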
 

NTMBK

Lifer
Nov 14, 2011
10,241
5,027
136
The thing to remember is that often developers are not writing against the DirectX API; they are writing against the Unity API or the Unreal API.
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136

If you go further back, you could argue there have already been several stages. Originally we had a driver that did optimisation per GPU but was the same driver for all games (i.e. it would act slightly differently depending on whether you had a GeForce 256 or a GeForce 2, but didn't care whether you were running Quake or UT99).

Then we went to the driver doing per-game optimisation. Now we've gone up an order of magnitude in complexity: not only does the driver change depending on whether you are running a GeForce 3 or a GeForce 4, it also changes depending on whether you are running HL2 or Doom 3. This introduced a lot of bugs due to the massive increase in complexity, but increased performance by, say, 20%.

Now with DX12 we are also expecting the devs of each game to do some of the optimisation, so each dev must optimise their game for each GPU. Even within the same class (e.g. GCN), the bottlenecks depend on the particular GPU (e.g. ROPs, memory bandwidth, compute, tessellation, etc.), the particular resolution and the settings chosen. Obviously the console was much easier: one GPU, one resolution and one set of settings. For PCs, I would argue this is another order of magnitude of complexity that will introduce an order of magnitude more bugs and issues, for maybe another 20% performance.

Is it worth it? Only if they can get the games to work reliably; anyone playing would prefer a stable game at slightly lower settings. I would particularly worry about how games and GPUs age. It's one thing to get a brand-new game working with the current popular GPUs it gets reviewed against, but when that game is no longer brand new, are they going to bother fixing it? (GPU companies had a stronger motive, since they want to sell new GPUs that will need to run that older game; game companies less so, since once a game has reached the bargain basement there's no money in it.) And if you have an older, less popular GPU, will the game dev bother making it work properly on it (sales lost by not supporting that old GPU != cost of providing that support)?
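
To make that concrete, here is a rough (entirely hypothetical) sketch of the kind of per-GPU branching that now lands in the engine rather than the driver: query the adapter through DXGI and pick a tuning preset by vendor. The preset names are made up for illustration.

```cpp
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical tuning presets an engine might maintain per vendor/GPU.
enum class GpuPreset { GenericConservative, TunedForNvidia, TunedForAmd };

GpuPreset PickPreset()
{
    ComPtr<IDXGIFactory4> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return GpuPreset::GenericConservative;

    ComPtr<IDXGIAdapter1> adapter;
    if (factory->EnumAdapters1(0, &adapter) != DXGI_ERROR_NOT_FOUND) {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        switch (desc.VendorId) {       // PCI vendor IDs
            case 0x10DE: return GpuPreset::TunedForNvidia;
            case 0x1002: return GpuPreset::TunedForAmd;
        }
    }
    // Unknown or software adapter: fall back to the safe path.
    return GpuPreset::GenericConservative;
}
```

And that is only the coarse split; the point above is that ROP count, bandwidth and so on differ even within one vendor's line-up, which a table like this can't capture.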
 

Pottuvoi

Senior member
Apr 16, 2012
416
2
81
What?!?!

Emm... Primitive Discard works hand in hand with Conservative Rasterization.

It's a bit much to type, but to start at the beginning here is a basic introduction video to Conservative Rasterization and DX12 directly from Microsoft Graphics Education.

https://m.youtube.com/watch?v=zL0oSY_YmDY
Pretty sure he was talking about occlusion culling, which is not what primitive discard is for.

Primitive discard is a culling method for polygons that do not intersect any rasterization sample points in the image (pixel centers, MSAA samples).

I.e., say a polygon is 100 pixels wide in the X dimension but only 0.1 pixels in Y at its widest point. There is a huge amount of screen area where it will not produce a single pixel of output, yet at minimum it will still go through 100 pixels in the ROPs to see if anything is rendered, and activate 50 2x2 quads for shading.

Funnily enough, if the same polygon is put through conservative rasterization it will always create at least 100 pixels of output (every pixel it passes through).

So if you use conservative rasterization, there is a good chance drivers will force a fast bypass of primitive discard, as it cannot cull polygons that must render a pixel if they lie within the screen (or at least be tested by the Z-buffer or other culling methods).
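
For reference, this is roughly how conservative rasterization is switched on in D3D12; just the relevant PSO field, not a full pipeline setup (and support should be checked first).

```cpp
#include <d3d12.h>

// Sketch: enable conservative rasterization on a graphics PSO description.
// Hardware support should be verified beforehand via CheckFeatureSupport
// (D3D12_FEATURE_D3D12_OPTIONS -> ConservativeRasterizationTier).
void EnableConservativeRaster(D3D12_GRAPHICS_PIPELINE_STATE_DESC& psoDesc)
{
    psoDesc.RasterizerState.FillMode = D3D12_FILL_MODE_SOLID;
    psoDesc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;

    // With this mode on, any pixel the triangle touches at all is rasterized,
    // so a thin sliver that would normally miss every sample point (a prime
    // candidate for small-primitive discard) still produces fragments.
    psoDesc.RasterizerState.ConservativeRaster =
        D3D12_CONSERVATIVE_RASTERIZATION_MODE_ON;
}
```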
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
What you and AMD are incorrectly labeling "Async Compute" is not Async Compute. What they mean when they say that is "Multiple Command Processor Issuing." [...]

Source? Have not seen this info before if true
 

Headfoot

Diamond Member
Feb 28, 2008
4,444
641
126
The thing to remember is that often developers are not writing against the DirectX API; they are writing against the Unity API or the Unreal API.

A very good point. The engine vendor is becoming the de facto high-level replacement. This is actually a much better situation than the DX11 era. First, for the people who need a high-level API (small studios, hobby programmers, people without very specific engine needs, etc.), these engines are significantly more helpful and higher level than DX11 was; the level of abstraction is significantly higher. At the same time, a low-level alternative is now available for the people with the time and skill to use it (the engine developers themselves).

Combine this with the fact that these engines have never been more affordable or more competitive, and it's a great time to be a game developer and a gamer.
 

boozzer

Golden Member
Jan 12, 2012
1,549
18
81
This isn't actually true, is it? Will we now need a GPU from each side just to play all DX12 titles? If so, fml.
Don't worry, it is just hyperbole from him. If you check AMD-sponsored games, they actually run really well, even on NV GPUs, while NV-sponsored titles sometimes run like crap even on NV GPUs.

What does that tell you?

And if the future of PC gaming also devolves into exclusives like consoles (a gigantic if, by the way, with a 99.9999999% chance of not coming true), you can always just buy a console for gaming and give NV and AMD the finger. Well, mostly NV :D