[WCCFtech] AMD and NVIDIA DX12 big picture mode

Status
Not open for further replies.

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
When did AMD do any deluding? It was a benchmark released by a game dev that blew this open. Other than making a tweet about it, I haven't seen AMD do anything.

This isn't just about the AotS benchmark, this is about the concept of asynchronous compute itself, and how it fits into the whole DX12 picture at large.

As I said before, asynchronous compute isn't a major feature of DX12, and NVidia had no obligation to support it in hardware. But is it a useful feature? Absolutely, and I can see it gaining more of a prominent role for future games.

But the usefulness likely varies from one architecture to another. Just because GCN gains a significant performance increase doesn't mean that NVidia or Intel architectures will.

The whole point of asynchronous compute is to keep the GPU as occupied as possible. And when you look at the benchmarks for Fiji, it's obvious that it has problems with scaling and utilization.

As Ryan said at the end of his Fury review:

Bringing this video card review to a close, we’ll start off with how the R9 Fury compares to its bigger sibling, the R9 Fury X. Although looking at the bare specifications of the two cards would suggest they’d be fairly far apart in performance, this is not what we have found. Between 4K and 1440p the R9 Fury’s performance deficit is only 7-8%, noticeably less than what we’d expect given the number of disabled CUs.

And here there's only a 20% increase in average performance between a 290X and a Fury X at 1440p, despite a significant 45% increase in shaders and texture units and the benefit of liquid cooling.

[Image: perfrel_2560.gif]
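Working through the arithmetic in the post above as a back-of-the-envelope sketch (shader counts are the public specs: 2816 for the 290X, 4096 for the Fury X), the scaling efficiency comes out to well under half:

```python
# Back-of-the-envelope scaling check using the figures from the post.
# Shader counts: R9 290X = 2816, R9 Fury X = 4096 (public specifications).
shaders_290x = 2816
shaders_fury_x = 4096

unit_increase = shaders_fury_x / shaders_290x - 1   # ~0.45 (45% more shaders)
observed_gain = 0.20                                # ~20% faster at 1440p

# Fraction of the extra hardware that actually translated into performance
scaling_efficiency = observed_gain / unit_increase

print(f"{unit_increase:.0%} more units -> {observed_gain:.0%} faster "
      f"({scaling_efficiency:.0%} scaling efficiency)")
```

In other words, barely 44% of the additional execution resources show up as performance, which is the utilization problem being pointed at.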
 

Erenhardt

Diamond Member
Dec 1, 2012
3,251
105
101
Nonsense.

It isn't a major feature for NV. Actually, it's a useless feature for them, because their architecture can't use it.

For AMD, it is the most important DX12 feature on the GPU side, alongside lower driver overhead and multithreading on the CPU side.

Better CPU utilization will let AMD cards run at full potential without a CPU bottleneck. AC further increases GPU potential by enabling more resources in the form of free compute performance.

20%? 30%? Maybe even 50% faster! A dream come true for any GPU enthusiast to see their old card make a generational leap in performance thanks to the new API, without paying a single penny. I would be salty if I missed that train. Choochoo!


I know DX12 will not be ported to older Microsoft OSes.
The only thing I want now from AMD is to make Mantle and DX12 compatible: make DX12 games run on Mantle on Win7 and Win8(.1) systems. I don't know if it is possible technically and legally, but it would be amazing. A huge win for AMD for everyone not willing to upgrade their system to the new malware from M$

edit. A huge win for amd users
 
Last edited:

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
This isn't just about the AotS benchmark, this is about the concept of asynchronous compute itself, and how it fits into the whole DX12 picture at large.

As I said before, asynchronous compute isn't a major feature of DX12, and NVidia had no obligation to support it in hardware. But is it a useful feature? Absolutely, and I can see it gaining more of a prominent role for future games.

But the usefulness likely varies from one architecture to another. Just because GCN gains a significant performance increase doesn't mean that NVidia or Intel architectures will.

The whole point of asynchronous compute is to keep the GPU as occupied as possible. And when you look at the benchmarks for Fiji, it's obvious that it has problems with scaling and utilization.

As Ryan said at the end of his Fury review:



And here there's only a 20% increase in average performance between a 290X and a Fury X at 1440p, despite a significant 45% increase in shaders and texture units and the benefit of liquid cooling.

[Image: perfrel_2560.gif]

Seems to me that the reason nVidia doesn't use it is their arch is designed for DX11 and not DX12. For AMD it's the opposite. This is what made GCN the best choice for the consoles and why they brought a similar API to the PC.
 
Feb 19, 2009
10,457
10
76
The whole point of asynchronous compute is to keep the GPU as occupied as possible.

Nope, that's a misunderstanding. Keeping the shaders occupied is the point of multi-threaded rendering. AMD suffers lower efficiency in DX11 due to single-threaded rendering.

The fix for that in DX12 is one of the basic features: multi-threaded rendering & lower API overhead.

The point of async compute is to take compute workloads that were running serially (thereby slowing down graphics rendering) and run them in parallel, leading to a major performance gain.

[Image: Async_DX11_575px.png]

[Image: Async_DX12_575px.png]


If a game uses compute, it stands to gain a performance leap on GPUs that have hardware support for async compute.
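The serial-versus-parallel distinction can be sketched as a toy frame-time model (my own illustrative sketch, not anything from a real benchmark; `idle_fraction` is a made-up parameter standing in for however much shader capacity graphics work leaves unoccupied):

```python
def frame_time_serial(gfx_ms, compute_ms):
    """DX11-style: compute blocks the graphics queue, so work runs back to back."""
    return gfx_ms + compute_ms

def frame_time_async(gfx_ms, compute_ms, idle_fraction):
    """DX12-style async compute: compute fills shader capacity left idle
    during graphics work. Only the compute that doesn't fit into that idle
    capacity extends the frame."""
    absorbed = min(compute_ms, gfx_ms * idle_fraction)
    return gfx_ms + (compute_ms - absorbed)

# Illustrative frame: 10 ms of graphics plus 4 ms of compute.
print(frame_time_serial(10.0, 4.0))       # 14.0 ms
print(frame_time_async(10.0, 4.0, 0.30))  # 11.0 ms (3 ms of compute absorbed)
print(frame_time_async(10.0, 4.0, 0.05))  # 13.5 ms (little idle time to fill)
```

Note the model also captures why a GPU that already keeps its units busy (low `idle_fraction`) sees a smaller gain, which is the crux of the Maxwell-versus-GCN argument in this thread.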
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Nope, that's a misunderstanding. Keeping the shaders occupied is the point of multi-threaded rendering.

The point of async compute is to take compute workloads that were running serially (thereby slowing down graphics rendering) and run them in parallel, leading to a major performance gain.

[Image: Async_DX11_575px.png]

[Image: Async_DX12_575px.png]


If a game uses compute, it stands to gain a performance leap on GPUs that have hardware support for async compute.

You mean it's not just some worthless feature that AMD added because they had to? You know, like Fury required liquid cooling. Which of course has since been shown to have been a complete fabrication. Just like this rhetoric will be too.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Nonsense.

[Image: Ij2tIyf.png]

LOL, you show me an image of key DX12 features, of which NONE is asynchronous compute, as proof? o_O

Asynchronous compute is a feature of DX12, but as your own image suggests, it's not a major feature, and support is not required to adhere to the DX12_0 or DX12_1 feature levels.

Neither NVidia nor Intel supports asynchronous compute in hardware, despite having advanced support for many other features such as conservative rasterization and ROVs.
 

Gikaseixas

Platinum Member
Jul 1, 2004
2,836
218
106
My assumption:

AMD worked with both MSFT and Sony to develop close-to-metal APIs that saw decent gains compared to DX11. MSFT then started tweaking it, and based on this they created DX12. Since both the Xbox One and PS4 have similar GCN GPUs, it is only natural that DX12 integrates better with GCN than anything Intel or Nvidia have designed. That doesn't mean that they won't get better; they will, but it will require them to launch new archs.

Hence why Nvidia and Intel are quiet about it

...AMD's masterful plan is unveiled :) Mantle's purpose was to convince partners that such APIs had a future in x86 applications.
 
Last edited:

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Seems to me that the reason nVidia doesn't use it is their arch is designed for DX11 and not DX12. For AMD it's the opposite. This is what made GCN the best choice for the consoles and why they brought a similar API to the PC.

So are you also going to accuse Intel of designing Skylake's IGP for DX11 and not DX12, despite the fact that Intel's Skylake IGP has the most stringent adherence to the DX12 spec of any current GPU?

That makes a lot of sense...
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,991
627
126
LOL, you show me a image of DX12 key features of which NONE is asynchronous compute as proof? o_O
Do you have trouble reading?
but as your own image suggests,
That is not my image it is taken from Nvidia's site.
So are you also going to accuse Intel of designing Skylake's IGP for DX11 and not DX12, despite the fact that Intel's Skylake IGP has the most stringent adherence to the DX12 spec of any current GPU?

That makes a lot of sense...
You might want to learn some basic info like the dates of when said hardware was available.
 
Last edited:

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The point of async compute is to take compute which were running in serial mode (thereby slowing down graphics rendering) and make it parallel, leading to a major performance gain.

OK, I concede your point. But the performance gain is really dependent on the architecture of the GPU, and whether it has dedicated hardware resources or not.

As I've been saying, GCN obviously has a lot to gain from using asynchronous compute. But whether Maxwell can is another matter entirely.

Personally, I think Maxwell does support asynchronous compute; otherwise the beyond3d test wouldn't show the 31-queue dispatch:

[Image: pJqBBDS.png]
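The staircase pattern that test looks for can be modeled with a small helper (a toy model of my own, not the actual benchmark code; `concurrency` is a hypothetical parameter for how many kernels the hardware dispatches simultaneously):

```python
import math

def total_dispatch_time(n_kernels, concurrency, kernel_ms=1.0):
    """Toy model of a beyond3d-style dispatch test: n identical kernels on
    hardware that can run `concurrency` of them at once. Total time rises in
    steps of `concurrency`, which is the staircase such a benchmark plots."""
    return math.ceil(n_kernels / concurrency) * kernel_ms

# Hardware that runs 31 kernels concurrently finishes 1..31 in a single step:
print(total_dispatch_time(31, 31))  # 1.0
print(total_dispatch_time(32, 31))  # 2.0  (first step up)
# Purely serial execution scales linearly instead:
print(total_dispatch_time(31, 1))   # 31.0
```

A flat line up to some queue depth followed by a step is the signature being read as evidence of concurrent dispatch; a linear ramp would suggest serialized execution.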
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,991
627
126
Supported ≠ benefiting. Just about anything can be "supported" in software (read: emulated).
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
You mean it's not just some worthless feature that AMD added because they had to? You know, like Fury required liquid cooling. Which of course has since been shown to have been a complete fabrication. Just like this rhetoric will be too.

Nobody is saying that asynchronous compute is worthless. It obviously has a major impact for GCN.

What I'm saying is that the performance gain for Maxwell will be nowhere near as high as it is for GCN, because:

1) Maxwell lacks dedicated hardware resources.

2) Maxwell's pipeline is much more efficient than what is found in GCN.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Do you have trouble reading?

I read just fine. But apparently you didn't even look at the image you posted. Let me break it down for you since you are clearly having difficulty comprehending.

The image you posted shows Key DX12 features as:

1) Volume Tiled Resource

2) Conservative Raster

3) Raster Order Views

4) Tiled Resource

5) Typed UAV Access

6) Bindless Resources

Then under D3D 12 API it lists:

1) Low Overhead

2) More Control

3) Async Compute

Do you not understand the distinction there? The D3D12 API features are just general low-level API attributes. Async Compute is not listed as a key DX12 feature, unlike the others.
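The distinction being argued here can be captured as a quick lookup (the two lists simply restate the slide's groupings as quoted above; the variable names and structure are mine):

```python
# Restating the slide's two groupings from the post above.
key_dx12_hw_features = [
    "Volume Tiled Resource", "Conservative Raster", "Raster Order Views",
    "Tiled Resource", "Typed UAV Access", "Bindless Resources",
]
d3d12_api_attributes = ["Low Overhead", "More Control", "Async Compute"]

# Async Compute appears only among the general API attributes:
print("Async Compute" in key_dx12_hw_features)  # False
print("Async Compute" in d3d12_api_attributes)  # True
```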
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,991
627
126
Nobody is saying that asynchronous compute is worthless. It obviously has a major impact for GCN.
Why do you think this is? Because Mantle and DX12 are essentially the same? Or did AMD somehow predict what Microsoft would do with DX12 while Nvidia didn't?
2) Maxwell's pipeline is much more efficient than what is found in GCN.
Or more utilized under DX11.
 
Feb 19, 2009
10,457
10
76
2) Maxwell's pipeline is much more efficient than what is found in GCN.

True, only for DX11.

For consoles running close to metal APIs, not true.

Which is where we will be soon with DX12/Vulkan games.

Also, do you remember the ACE thread you started? Back then apparently Async Compute was important.

Now that NV GPUs are found lacking in that feature, you are starting the "Async Compute is just a minor feature, optional, not useful" train... seriously, what is wrong with your appreciation for a very useful feature? Why the sudden flip flop? It's concerning.

Let me break it down.

Async Compute is VERY useful for console developers. They absolutely require it for AAA titles to extract peak performance from the hardware available to them. Removing compute from blocking graphics traffic is important, the console devs are hyping that feature.

FL12.1 could also be VERY useful. I don't deny that at all.

So it comes down to whether games use lots of compute or lots of FL12.1 features.

That's the bottom line. To say Async Compute is not useful is delusional.
 
Last edited:

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
True, only for DX11.

There are only a few DX12 apps so far, and none of them are conclusive, so we shall see. Though I could argue that an API should not make or break a GPU if it's well designed.

Which is where we will be soon with DX12/Vulkan games.

Consoles AFAIK still run much closer to the hardware than DX12 or Vulkan.

Now that NV GPUs are found lacking in that feature, you are starting the "Async Compute is just a minor feature, optional, not useful" train... seriously, what is wrong with your appreciation for a very useful feature? Why the sudden flip flop? It's concerning.

There is no flip flop. It's just an evolution of thought, after finding out that asynchronous compute isn't even a major feature of DX12, and that AMD has very obvious and biased reasons to support it.

That's the bottom line. To say Async Compute is not useful is delusional.

Nowhere have I ever said that async compute is not useful, and I challenge you to quote me. :colbert:

I have said repeatedly that asynchronous compute is in fact useful. The only caveat is that its usefulness has a strong correlation with the architecture of a GPU.
 

AnandThenMan

Diamond Member
Nov 11, 2004
3,991
627
126
I remember that thread:
I've been researching ACEs a lot lately, as I believe they'll play a massive role in reshaping PC gaming for the better; nearly as much as DX12's higher draw call ceiling.
I agree with this for sure.
 

Dribble

Platinum Member
Aug 9, 2005
2,076
611
136
With Bulldozer, AMD had no way to push the software market towards multithreading. They have put GCN and its ACEs in both consoles, meaning all multiplatform games will be developed for that architecture. It's a totally different situation.

lol no it's not. They put 8 cores in each console just like Bulldozer has; if all multi-platform games are developed for consoles, then why don't they all require 8 cores? I distinctly remember a lot of fanboy talk saying how Bulldozer would shine just because of that after the consoles came out, and it came to nothing.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
Nobody is saying that asynchronous compute is worthless. It obviously has a major impact for GCN.

What I'm saying is that the performance gain for Maxwell will be nowhere near as high as it is for GCN, because:

1) Maxwell lacks dedicated hardware resources.

2) Maxwell's pipeline is much more efficient than what is found in GCN.

We know about the lack of support. How do you mean Maxwell's pipeline is more efficient?
 
Feb 19, 2009
10,457
10
76
There is no flip flop. It's just an evolution of thought, after finding out that asynchronous compute isn't even a major feature of DX12, and that AMD has very obvious and biased reasons to support it.

This is your quote: "AMD is making asynchronous compute out to be a major feature of DX12, but in reality it isn't."

My English interpretation of "not a major feature" = therefore a minor, unimportant feature. Or did you have another intention to that statement?

Also, define major. It's not a major feature because you say so?

It's the core of the DX12 API alongside multi-thread rendering, lower API overhead AND multi-adapter.

Everything else is optional, hence the Feature Levels and Tiers. Get it?
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,576
126
By the time we need these features, if we ever do, we will all be on to the next gen cards.

All this fuss is probably only about the longevity of older cards.

Neither AMD nor NV is likely too interested in having people hang on to older cards.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
How do you mean Maxwell's pipeline is more efficient?

What I really meant to say was that Maxwell's architecture is just more efficient; I should not have used the word pipeline. Maxwell GPUs have fewer transistors than comparable GCN GPUs, but still manage to outperform them at lower TDPs.

Maxwell architecture also scales better than GCN 1.2.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
My English interpretation of "not a major feature" = therefore a minor, unimportant feature. Or did you have another intention to that statement?

A few simple questions. Did Microsoft require hardware support for asynchronous compute to meet DX12 specification?

Answer is no.

Is Asynchronous compute mentioned anywhere in the DX12 feature levels?

Answer is no.

It's the core of the DX12 API alongside multi-thread rendering, lower API overhead AND multi-adapter.

Right, asynchronous compute is a function of the GPU that already exists, and is just exposed by the API.

Andrew Lauritzen recently did a good post on this:

Absolutely, and that's another point that people miss here. GPUs are *heavily* pipe-lined and already run many things at the same time. Every GPU I know of for quite a while can run many simultaneous and unique compute kernels at once. You do not need async compute "queues" to expose that - pipelining + appropriate barrier APIs already do that just fine and without adding heavy weight synchronization primitives that multiple queues typically require. Most DX11 drivers already make use of parallel hardware engines under the hood since they need to track dependencies anyways... in fact it would be sort of surprising if AMD was not taking advantage of "async compute" in DX11 as it is certainly quite possible with the API and extensions that they have.

Yes, the scheduling is non-trivial and not really something an application can do well either, but GCN tends to leave a lot of units idle from what I can tell, and thus it needs this sort of mechanism the most. I fully expect applications to tweak themselves for GCN/consoles and then basically have that all undone by the next architectures from each IHV that have different characteristics. If GCN wasn't in the consoles I wouldn't really expect ISVs to care about this very much. Suffice it to say I'm not convinced that it's a magical panacea of portable performance that has just been hiding and waiting for DX12 to expose it.

Source
 