Scheduling in software, i.e. on the CPU, defeats the point of doing it out of order. The point is to use free CPU cycles to do other things, but that won't happen if the CPU is now taking on the scheduling task itself.
What difference does it make at this point? Are there any DX12 games available right now? AMD had a head start on NVidia driver wise because of Mantle, which is similar to DX12.
Also, the only thing being done in software is the scheduling, which is actually more power efficient than having hardware schedulers like AMD does.
NVidia have some of the best, if not the best, driver engineers in the world, so they have the confidence to implement such a thing in software rather than spending transistors on it. We'll see how effective it is when the driver is released.
As for AMD, they have invested heavily in AC to the point of having multiple hardware schedulers, so they're hoping it will pay big dividends in the long run.
It may be more power efficient, but it also (potentially) comes with significant latency penalties, hence the "potentially catastrophic" remark from Oculus.
More efficient if you're measuring GPU power only and not total system power. If this is true, then it just shifted power draw from the GPU to the CPU and became less flexible in the process. Doesn't sound like a good trade-off to me, but they made bank on it, so maybe I don't know anything.
Latency is a big problem for VR, but not for regular games.
What I don't understand is why you are defending NV in particular. You seem to upgrade GPUs every 1.5-2 years anyway, which makes me think that for you it truly doesn't matter. Logically, that means you shouldn't even care if Maxwell bombs in 2016-2018 DX12 games, since next year you will move on to Pascal; so who cares about Maxwell's performance in DX12? Or am I wrong?
Ashes is still just 1 game on 1 game engine. We need a broader picture.
1. Some people are in the graphics card market right now, and are looking for a card that will last a few years. If they buy Nvidia, they won't get as much performance out of their cards in DX12 as the AMD competition. Futureproofing to last 4 years or more is generally pointless, but lasting 2-3 years? That's a reasonable expectation, and we'll have a fair amount of DirectX 12 games in less than a year.
2. We don't know if Pascal will actually implement a better asynchronous compute method than Maxwell. As good as Nvidia's engineers are, it may not be possible for them to just leap forward to cutting edge AC support without the preexisting experience and IP. And who's to say AMD will sit still while Nvidia closes the gap? I've said it before, but it could be a similar situation to what AMD had with tessellation: they couldn't just go from horrible tessellation to terrific tessellation performance with one generation, and even when they greatly improved tessellation performance, Nvidia had already moved the bar even further.
3. It's important to criticize Nvidia now to best encourage them to focus on asynchronous compute support going forward. If Pascal is lackluster in AC performance at this point and there's still a chance to improve the design, they need to know to do that. And they need to know to prioritize AC performance in Volta as well. The last thing gamers and developers want is for criticism of Nvidia to be silenced, because you can't improve on a design if you don't hear the criticisms.
If that latency stalls your rendering pipeline for x ms, then it most certainly is a problem.
Latency stalls are no problem for a GPU: there are thousands or tens of thousands of threads in flight, so if one stalls, another will simply take its place, because the workload is "embarrassingly parallel."
Latency is a bigger problem for CPUs with their more serial workloads.
The whole point of this thread is that graphics shaders and compute shaders are not "embarrassingly parallel" when run together; they have to be run serially unless you utilize async compute.
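To put that in concrete terms, here's a minimal D3D12 sketch (purely illustrative, not from any shipping engine) of the mechanism async compute builds on: alongside the normal graphics queue you create a separate compute queue, so the two streams of work *can* overlap, if the hardware and driver actually execute them concurrently:

```cpp
// Minimal sketch: creating a graphics (DIRECT) queue and a separate COMPUTE
// queue in D3D12. The API only makes the parallelism *expressible*; whether
// the two queues actually run concurrently is up to the driver/hardware,
// which is exactly what this thread is arguing about.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};

    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}
```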
This is a goalpost switch, and has no bearing on what we were discussing before.
But still, if NVidia is using the CPU for scheduling, then compute tasks will be issued out of order just like with AMD's ACEs, but probably even better, as CPUs are the out-of-order masters. Whatever OoO logic AMD built into their ACEs is nothing compared to what's in a CPU.
And Maxwell 2 has the capability to run 31 compute tasks in parallel with graphics.
Me talking about latency in the context of async compute, in a thread that has been all about async compute, is a goalpost switch? Erm, ok.
And no, if Nvidia uses the CPU for scheduling, compute will not be issued just like with AMD's ACEs, since, once again, going through software (i.e. the CPU) can and will come with hefty latency penalties.
You haven't really been paying attention, have you? This whole thread has been about the fact that Nvidia cannot run compute in parallel with graphics, in spite of what they have so far claimed.
Missing a frame refresh is a problem. It's not like a lot of HPC compute, where a job just reports when it's done; game compute threads need to keep pace with the game.
Nvidia's level of preemption adds an additional layer of difficulty to coordinating async compute. It will be very impressive if Nvidia's software engineers manage to optimize their scheduler to such an extent that it offsets having to coordinate the compute threads with the regular GPU tasks over PCIe rather than on the GPU die.
*facepalm*
Did I not say that VR was an exception? I'm talking about regular gaming here, not VR.
And I told you that latency in regular games is not a big deal, as everything is parallel. Even in DX11, compute shaders that are submitted in serial with rendering are still actually executed in parallel.
Asynchronous compute also uses idle shaders for processing and is heavily dependent on their availability, so of course latency is naturally going to be involved. 🙄
Latency is going to be involved no matter what, but GPUs excel at masking it by keeping tens of thousands of threads in flight, so it's not a big deal unless you're doing VR.
On the contrary, I've been posting in this thread since the second page, so I'm very aware of what it's about.
This whole thing was kicked off by an Oxide developer who first claimed that Maxwell didn't even possess the ability to do asynchronous compute. Now the very same developer says that asynchronous compute was broken in the driver from the beginning, and that NVidia hadn't fully implemented it but is in the process of doing so:
"We actually just chatted with Nvidia about Async Compute, indeed the driver hasn't fully implemented it yet, but it appeared like it was."
So as far as I'm concerned, Maxwell 2 does support asynchronous compute in parallel with rendering, as stated in the CUDA developer toolkit and by multiple sources.
It is most detrimental to VR, but it also means latency may cause async compute tasks to get pushed to the next frame refresh.
So there are obviously some hurdles for Nvidia to overcome and they appear to be quite high.
Wow! Why invent a whole new API? Just offload everything to the CPU. Brilliant!
I can't believe it's actually being put forth that scheduling with software on the CPU is the equivalent of having dedicated hardware onboard the GPU.
And that makes Oxide's statement that Maxwell isn't capable wrong, because they are going to use the CPU for scheduling? Seems to me that confirms his statement that, as far as he could tell, Maxwell couldn't do it.
You don't need hardware schedulers to do asynchronous compute. Only AMD have taken this approach; Intel and NVidia are going to do the scheduling in software.
Obviously we're talking about future games that would make use of async, so no, regular games as they exist today are not affected.
The fact that async compute uses idle shaders really has nothing to do with latency as such. Either way there will always be latency involved in any task; the point is that going through software (the CPU) will generally incur much larger latency penalties.
Latencies are not created equal, so saying that "latency is going to be involved no matter what" is a cop-out. And GPUs being able to run tens of thousands of threads in parallel isn't going to help when you have to stall the entire rendering of a frame to wait for a compute job to finish: a compute job that could have been run in parallel if you had access to async compute.
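For the record, here's roughly what that parallel submission looks like in D3D12. This is a hypothetical sketch (the device, queues, and command lists are assumed to be created elsewhere; it's nobody's actual engine code): the compute job runs on its own queue, and only the graphics work that consumes its results waits, via a GPU-side fence, instead of the whole frame stalling:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Hypothetical sketch: synchronizing an async compute job with the graphics
// queue via a fence, so only the dependent draw waits (on the GPU, not the CPU).
void SubmitFrame(ID3D12Device* device,
                 ID3D12CommandQueue* computeQueue, ID3D12CommandQueue* gfxQueue,
                 ID3D12GraphicsCommandList* computeList,
                 ID3D12GraphicsCommandList* gfxList)
{
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    ID3D12CommandList* compute[] = { computeList };
    computeQueue->ExecuteCommandLists(1, compute);
    computeQueue->Signal(fence.Get(), 1);   // mark the compute job as finished

    ID3D12CommandList* gfx[] = { gfxList };
    gfxQueue->Wait(fence.Get(), 1);         // GPU-side wait; the CPU isn't blocked
    gfxQueue->ExecuteCommandLists(1, gfx);  // this work consumes the compute results
}
```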
The dev was correct when he claimed that Maxwell 2 didn't possess the ability to do async compute (with the current drivers), so I don't see what your problem with that is.
The dev himself never said that asynchronous compute was broken in the driver from the beginning; he simply said that Nvidia had told him that it was broken.
So unless you have a problem with Nvidia's claims (or think that the Oxide dev is lying about talking to Nvidia), I really don't see what the issue is.
Because if there's one thing history has taught us, it's that Nvidia's documentation is the most trustworthy thing out there 🙄
Hell, you don't even need a GPU for DX12 at all; you can do the whole thing in software. And since you apparently think that doesn't carry any performance penalty, I guess we should all sell our GPUs and just use WARP12.
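And WARP12 is a real thing, by the way. Here's a minimal sketch of creating a DX12 device on the software (WARP) adapter; it's functionally complete, and the performance gap versus real hardware makes the point by itself:

```cpp
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Minimal sketch: creating a D3D12 device on the WARP software adapter.
// Everything DX12 offers works here, just entirely on the CPU.
ComPtr<ID3D12Device> CreateWarpDevice()
{
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter> warpAdapter;
    factory->EnumWarpAdapter(IID_PPV_ARGS(&warpAdapter));

    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(warpAdapter.Get(), D3D_FEATURE_LEVEL_11_0,
                      IID_PPV_ARGS(&device));
    return device;
}
```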
Now you're just being hyperbolic.
So in light of that, why is using the CPU a bad thing? The CPU's out-of-order capabilities are far superior to what's found in those ACEs, which makes it a much better candidate for that sort of thing.
The CPU's throughput capabilities may be far superior, but that doesn't mean its latency characteristics are.
Haha, thanks for that. They missed two key points: (1) NV screwing over Titan X owners with an after-market 980 Ti that beat the Titan X out of the box for $650-700, barely months after the TX came out, and the after-market 980 Ti cards have better components and coolers too; (2) "mobile dGPU overclocking is a bug." That's perhaps the most hilarious stunt of 2015.
Only NV could lie about the GTX 970's specs and about mobile dGPU overclocking, gimp Kepler, rip TX owners off, lie about AC shader capabilities for DX12, have atrocious price/performance in low-end cards like the 750 Ti/950/960, and still manage to gain market share quarter after quarter. Holy cow, not even Apple loyalists would likely put up with this type of treatment.
The media is covering this worldwide. Even PCPerspective is covering it, but HardOCP and TechReport haven't posted a single article on this. 😎
I suppose that's one way of improving efficiency, though: just don't have the hardware on your GPU in the first place. You use less power and fewer transistors, then have the CPU perform the tasks. Again, brilliant!