While it's true that there's potential for Nvidia GPUs to idle in graphics/fixed-function heavy tasks like shadow map generation, rasterizing a G-buffer or a depth prepass, I don't believe that's the case the majority of the time in modern games ...
There are many ways to leverage parallelism without having async compute, like games buffering up several frames and pipelining them to keep the GPU fed ...
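To make the pipelining point concrete, here's a minimal sketch of the kind of frame buffering that argument refers to, assuming a D3D12-style renderer. The function names, the three-frames-in-flight depth and the structure are my own illustration, not anything from the posts above; the only point is that the CPU keeps submitting new frames while the GPU chews through old ones.

```cpp
// Minimal sketch of frame pipelining with D3D12 fences (illustrative only).
// The CPU records and submits frame N+1 while the GPU is still busy with
// frame N, so the GPU rarely sits idle waiting for new work.
#include <windows.h>
#include <d3d12.h>

static const UINT kFramesInFlight = 3;            // how many frames the CPU may run ahead
UINT64 frameFenceValues[kFramesInFlight] = {};    // last fence value signalled per slot

void RenderLoop(ID3D12CommandQueue* queue, ID3D12Fence* fence, HANDLE fenceEvent)
{
    UINT64 nextFenceValue = 1;
    for (UINT64 frame = 0; ; ++frame)
    {
        UINT slot = static_cast<UINT>(frame % kFramesInFlight);

        // Block only if the GPU has fallen more than kFramesInFlight frames behind.
        if (fence->GetCompletedValue() < frameFenceValues[slot])
        {
            fence->SetEventOnCompletion(frameFenceValues[slot], fenceEvent);
            WaitForSingleObject(fenceEvent, INFINITE);
        }

        // Record and submit this frame's command lists (omitted), then signal.
        // queue->ExecuteCommandLists(...);
        queue->Signal(fence, nextFenceValue);
        frameFenceValues[slot] = nextFenceValue++;
    }
}
```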
Nvidia also over-engineers their geometry processors and rasterizers to keep idling to a minimum, and it looks like that approach is working for them if you take a look at the market share ...
Memory bandwidth. Cache occupancy/spillage. NVIDIA's Maxwell GPU is beholden to its memory/cache bandwidth. The ROPs on Maxwell are tied to the memory controllers: each group of 16 ROPs is linked to a dedicated 64-bit memory controller. You can even see the bottleneck arise in synthetics. Even though the following synthetic test devotes all of the memory bandwidth to ROP operations, each Maxwell GPU is unable to hit its theoretical peak.
Kepler hits its theoretical peak and GCN is within striking distance of its theoretical peaks. Maxwell, on the other hand, is consistently 10 GPixel/s behind its theoretical peak across GM204 and GM200.
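To put rough numbers on why the ROPs can outrun the memory: a back-of-the-envelope comparison, assuming reference GTX 980 (GM204) figures of 64 ROPs, a ~1216 MHz boost clock and 224 GB/s of GDDR5 bandwidth. These figures are my assumptions for illustration, not measurements from the synthetic above, and the raw-bandwidth bound is pessimistic in practice because Maxwell's color compression stretches that bandwidth further.

```cpp
// Back-of-the-envelope fill-rate estimate (assumed reference GTX 980 specs,
// purely illustrative, not the benchmark numbers discussed above).
#include <algorithm>
#include <cstdio>

int main()
{
    const double rops        = 64.0;     // GM204 ROP count
    const double clockGHz    = 1.216;    // assumed boost clock in GHz
    const double bandwidthGB = 224.0;    // GB/s of raw GDDR5 bandwidth
    const double bytesPerPx  = 4.0;      // 32-bit color write, no compression

    double ropPeak = rops * clockGHz;              // GPixel/s the ROPs could emit (~77.8)
    double bwPeak  = bandwidthGB / bytesPerPx;     // GPixel/s the memory can absorb (~56)

    std::printf("ROP-limited peak:       %.1f GPixel/s\n", ropPeak);
    std::printf("Bandwidth-limited peak: %.1f GPixel/s\n", bwPeak);
    std::printf("Achievable (min of the two): %.1f GPixel/s\n", std::min(ropPeak, bwPeak));
    return 0;
}
```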
The end result is that Maxwell, despite having a theoretically more robust front end than Fiji, is unable to surpass Fiji when gaming at 4K without operating well beyond its designed frequencies. Raising the GPU frequency on Maxwell also raises the cache bandwidth, which helps Maxwell a great deal.
Maxwell's SMMs (Maxwell Streaming Multiprocessors) share their L1 and instruction caches with the texture mapping units and PolyMorph engines. Maxwell thus runs out of immediate caches somewhere between 16 and 32 concurrent warps per SMM and begins to spill into the L2 cache. We can see that here:
Maxwell makes up for these deficiencies by ensuring better compute utilization. This is done by having 4 warp schedulers per SMM, with each warp scheduler having dominion over its own group of 32 CUDA cores. Those 32 CUDA cores map perfectly onto a warp (32 threads), which allows Maxwell to make efficient use of its available compute resources.
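The arithmetic behind that claim, as I understand it (the Kepler comparison in the comments is my own framing, not from the post above):

```cpp
// Quick sanity check of the SMM partitioning described above (illustrative).
#include <cstdio>

int main()
{
    const int warpSize          = 32;   // threads per warp
    const int schedulersPerSMM  = 4;    // warp schedulers per Maxwell SMM
    const int coresPerScheduler = 32;   // CUDA cores owned by each scheduler

    int coresPerSMM = schedulersPerSMM * coresPerScheduler;   // 128
    std::printf("CUDA cores per SMM: %d\n", coresPerSMM);
    std::printf("Cores per scheduler vs warp size: %d vs %d -> 1:1 mapping\n",
                coresPerScheduler, warpSize);
    // A 1:1 mapping means each scheduler can keep its own 32 cores busy by
    // issuing one warp instruction per clock, with no cores shared between
    // schedulers (unlike Kepler's 192-core SMX, where the schedulers had to
    // share the "extra" cores to fill them).
    return 0;
}
```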
It's common knowledge among developers that Nvidia GPUs have better front ends and geometry processing. If we take a look at synthetic benchmarks that measure triangle throughput, Nvidia comes out far ahead of AMD ...
Async compute doesn't buffer those frames faster. Async compute is meant to make it possible to run a separate compute queue in parallel with the graphics queue ...
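For anyone following along, here's roughly what "a separate compute queue" looks like in D3D12. This is a hedged sketch, not anyone's engine code; the function name and the comments about concurrency are mine.

```cpp
// Minimal sketch: a dedicated compute queue alongside the graphics (direct)
// queue in D3D12. Work submitted to computeQueue may overlap with work on the
// graphics queue on hardware/drivers that support concurrent execution.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;      // graphics + compute + copy
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&graphicsQueue));

    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;    // compute + copy only
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    // Whether the two queues actually execute concurrently, rather than being
    // serialized by the driver, is up to the hardware, which is exactly the
    // Maxwell-vs-GCN argument in this thread.
}
```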
Oh, but I do need to bring up the market share, because you seem to have a ton of misunderstanding or a severe lack of perspective. It's certainly not the case that GM200 has inferior performance compared to Fiji when we look at the majority of the games out today. You need to understand that AMD places too much faith in things going their way ...
Maxwell does have superior triangle output over Fiji, but this is mostly a non-issue. Neither is limited by its triangle output.
Geometry-wise, GCN suffers from a lack of primitive discard acceleration. That means GCN doesn't cull triangles smaller than a pixel. This causes issues when using tessellation factors beyond 16x.
ROP-wise, GCN does have the more efficient ROPs due to the dedicated color cache afforded to each group of 4 ROPs. Maxwell attempts to make up for its less robust ROP engineering by doubling Kepler's ratio of ROPs to memory controllers, going from 8 to 16 ROPs per 64-bit controller. This helps Maxwell achieve parity with Fiji at higher resolutions: 96 Maxwell ROPs end up roughly equivalent to 64 Fiji ROPs.
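The ROP counts fall straight out of that ratio. A quick illustration, assuming the usual 384-bit buses on GK110 and GM200 (six 64-bit controllers each); the comparison is mine, not a quote from the post:

```cpp
// ROP-to-memory-controller arithmetic from the paragraph above (illustrative).
#include <cstdio>

int main()
{
    const int controllers_GK110 = 6;     // 384-bit bus = six 64-bit controllers
    const int controllers_GM200 = 6;     // same bus width on GM200
    const int ropsPerMC_Kepler  = 8;     // Kepler: 8 ROPs per controller
    const int ropsPerMC_Maxwell = 16;    // Maxwell: 16 ROPs per controller

    std::printf("GK110 ROPs: %d\n", controllers_GK110 * ropsPerMC_Kepler);    // 48
    std::printf("GM200 ROPs: %d\n", controllers_GM200 * ropsPerMC_Maxwell);   // 96
    // Fiji gets by with 64 ROPs, each group of 4 backed by its own color
    // cache, which is the efficiency argument being made above.
    return 0;
}
```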
Maxwell also throws in delta color compression, which it needs because of its less efficient memory controllers:
Even in the midrange segment Nvidia is still competitive; it's only recently that AMD's equivalent caught up ...
If it's doing graphics, it's not doing compute?! It's clear at this point that you don't have any idea what you're talking about anymore. I have no idea why you or others go into details when you don't know what you're talking about. Question: are you even a developer?! If not, then any posts about GPU microarchitectures from you from this point onwards are NOT credible!
Async compute is just better utilization of the shaders. AMD is literally advertising the idea of FREE compute shaders!
How do you know that Nvidia would benefit? Have you done any logic simulation on the RTL design, or do you have any other evidence that suggests so? Or do you speak out of pure ignorance?! Well, which one is it?
AMD were held back by higher API overhead. Hardware-wise, GCN has been consistently architecturally superior to every NVIDIA architecture pitted against it, except maybe GM200.
This served as the basis for AMD's Mantle project and now Vulkan/DX12. AMD have made quite a bit of headway with their DX11 drivers, and we're starting to see AMD being quite competitive under DX11 scenarios. Even AMD's AotS DX11 driver has improved substantially, so much so that AMD's GCN DX11 performance is now greater than NVIDIA's in this title.
As for asynchronous compute + graphics, it's more like Hyper-Threading than it is about better utilization. Rather than executing a single thread of work sequentially, GCN can execute two in parallel, so to speak. This helps cut down frame time (frame latency), which translates into an FPS boost. You can execute more work per frame if need be, or execute the same work in less time.
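A toy model of why overlapping cuts frame time, with made-up millisecond figures that are purely mine (real games sit somewhere between the two extremes, since compute still competes for ALUs, caches and bandwidth):

```cpp
// Toy frame-time model of overlapping a compute pass with graphics work
// (hypothetical millisecond figures, just to illustrate the point above).
#include <algorithm>
#include <cstdio>

int main()
{
    const double graphicsMs = 12.0;   // assumed time the graphics queue needs per frame
    const double computeMs  = 4.0;    // assumed time the async compute work needs

    double serial     = graphicsMs + computeMs;            // one after the other
    double overlapped = std::max(graphicsMs, computeMs);   // ideal full overlap

    std::printf("Serial frame time:     %.1f ms (%.0f FPS)\n", serial, 1000.0 / serial);
    std::printf("Overlapped frame time: %.1f ms (%.0f FPS)\n", overlapped, 1000.0 / overlapped);
    return 0;
}
```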
As for Maxwell, if an SMM is doing texture work, it isn't doing compute work. The two cannot be done in parallel within an SMM because they share caches. That's what the whole preemption/context switch thing is all about.
Maxwell needs to flush the caches in an SMM before it can switch contexts. This adds a delay, and that delay can be significant if Maxwell is caught behind a long-running draw call. The result is coarse-grained preemption.
Yes you are a liar but hopefully you won't end up like John Fruehe of AMD ...
BTW, better utilization of fixed-function units like ROPs or the rasterizer comes from designing your game around a specific GPU bottleneck, so async compute won't do a damn thing if your rendering workload is already at a perfect balance between graphics and compute, or if the game is heavily geared towards compute ...
Rendering workloads are never at a perfect balance between graphics and compute. The Nitrous engine powering AotS has a 20:80 compute-to-graphics ratio, and that engine is considered compute heavy.
We're a LONG way from having a perfect balance between the two. Also, async compute + graphics would still help in such a scenario: you're still executing two workloads at once and thus reducing frame time.
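An Amdahl-style bound on that 20:80 split, purely as an illustration of the best case (the assumption of perfect overlap is mine; nobody gets the full amount in practice):

```cpp
// Upper bound on the gain from overlap given the 20:80 compute:graphics split
// mentioned above (idealized, assumes compute hides entirely behind graphics).
#include <cstdio>

int main()
{
    const double computeShare  = 0.20;   // fraction of serial frame time spent in compute
    const double graphicsShare = 0.80;   // fraction spent in graphics

    // If all compute hides behind graphics, the frame shrinks to the graphics share alone.
    double idealSpeedup = (computeShare + graphicsShare) / graphicsShare;
    std::printf("Best-case speedup from full overlap: %.2fx (~%.0f%% more FPS)\n",
                idealSpeedup, (idealSpeedup - 1.0) * 100.0);
    return 0;
}
```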
Asynchronous compute + graphics is not what you think it is. Sure, you need the available compute resources to process these extra compute jobs, and sure, GCN is heavily threaded on the compute side and has more dedicated caches throughout the architecture to handle extra compute jobs without spilling into the L2 cache.
I think we may be seeing Maxwell's compute limitations under AotS. It may very well be that the sheer number of parallel compute jobs is pushing Maxwell's SMMs into spilling into the L2 cache.
You're right that Maxwell may not benefit from asynchronous compute + graphics even if it were capable of executing such tasks. There simply isn't enough dedicated cache to handle that many compute threads. Heck, Maxwell is probably already taxed as-is under AotS. In other words, Maxwell will likely regress in performance in upcoming engines and titles, becoming another Kepler.
Fiji will likely have a longer life span.