You can reduce the CPU workload and time:
https://developer.nvidia.com/transitioning-opengl-vulkan
NVIDIA cards don't take advantage of DX12, so the CPU overhead should be much lower. But this doesn't happen on a GTX 980 Ti... More proof that the rendering path is not optimized for NVIDIA's hardware.
More proof that you can't read. Their CPU power usage goes down.
980 Ti: from 69 W to ~55 W (average of async and non-async)
Fury X: from 64 W to ~55 W (average of async and non-async)
So the 980 Ti actually benefits more than the Fury X. Also keep in mind that the 980 Ti still has a software scheduler running on the CPU. And yes, the Fury X is a bit faster, so it uses the CPU a bit more.
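In other words, taking the ~55 W averages above at face value, that's roughly 69 - 55 = 14 W shaved off the 980 Ti's CPU power draw versus 64 - 55 = 9 W for the Fury X.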
The CPU power consumption goes slightly down with the low-level API. A true implementation of DX12 would reduce it far more, given that there are no other advantages on NVIDIA hardware. DX12 allows for nearly 9x more draw calls. And in the detailed graph the GTX 980 Ti is well over 60 W with DX12, so the average number for this card is wrong.
Nope, NV itself says the opposite - see the Kepler and Maxwell whitepapers. The majority of scheduling tasks are performed in the driver.

BTW: the GTX 980 Ti has no software scheduler running on the CPU. Stop it, please. This fanfiction is annoying. Scheduling happens on the GPU. Read the AnandTech article from the GTX 680 launch.
When has pclab ever been consistent with the rest of the tech sites?
This is why they are considered a joke site, similar to ABT.
[H] is doing all they can to be added to that list with their forum tirade against AMD too. :/ Not impressed at all.
Well, AT got a 20% perf gain from Fury X with Async on vs off at 4K.
Toms got less. But it's very close in terms of % perf gained and % power use increase. Within margin of error.
Now that 390X at Toms, with no TDP limit, looks as if it's mining coins! lol
NVIDIA are more CPU bound than AMD under DX12. All those CPU threads are now batching work (command buffers), and NVIDIA's scheduler is static. So NVIDIA's driver is taking up more CPU time than AMD's.
DX12 does allow for multi-threaded rendering, but if your hardware is using static scheduling, you're adding extra work for the CPU.
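To picture what "all those CPU threads batching work" looks like in practice, here is a minimal C++/D3D12 sketch (not from any of the benchmarks discussed): several threads each record their own command list, and a single queue submits them. RecordScene is a hypothetical stub, and error handling, fencing, and actual draw state are omitted.

```cpp
// Minimal sketch: one direct command list recorded per worker thread, then
// submitted together on a single queue. HRESULT checks, fences, PSOs, and
// resource barriers are omitted; RecordScene is a hypothetical stub.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

static void RecordScene(ID3D12GraphicsCommandList*, unsigned /*threadIndex*/)
{
    // Hypothetical: set pipeline state, bind resources, and issue the draws
    // for this thread's slice of the scene.
}

int main()
{
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    D3D12_COMMAND_QUEUE_DESC queueDesc = {};
    queueDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&queue));

    const unsigned kThreads = 4;
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(kThreads);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(kThreads);
    std::vector<std::thread>                       workers;

    for (unsigned i = 0; i < kThreads; ++i) {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
    }

    // Each thread records its own command list in parallel -- this is the
    // part DX12 lets the application spread across CPU cores.
    for (unsigned i = 0; i < kThreads; ++i) {
        workers.emplace_back([&, i] {
            RecordScene(lists[i].Get(), i);
            lists[i]->Close();
        });
    }
    for (auto& w : workers) w.join();

    // Submission is still a single, cheap call from one thread. A real app
    // would signal a fence and wait for the GPU before tearing down.
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
    return 0;
}
```

The point of the sketch: recording scales across cores, while whatever per-submission scheduling work the driver still does on the CPU stays serial.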
Same with his bizarre Rise of the Tomb Raider conclusion, as I posted about here. Criticizing only the Fiji cards for having 4GB and then going on to recommend the 4GB 980 over the cheaper, faster, 8GB 390X.
Even when he tries to be fair, it's little illogical leaps like this that slip through the cracks because he cannot help his predisposition. Some of his Nano shenanigans made me think I was on the Huffington Post.
Nope, NV itself says the opposite - see the Kepler and Maxwell whitepapers. The majority of scheduling tasks are performed in the driver.
And it's already explained in those AnandTech articles you are referring to.
More importantly, the scheduling functions have been redesigned with a focus on power efficiency. For example: Both Kepler and Fermi schedulers contain similar hardware units to handle scheduling functions, including (a) register scoreboarding for long latency operations (texture and load), (b) inter-warp scheduling decisions (e.g., pick the best warp to go next among eligible candidates), and (c) thread block level scheduling (e.g., the GigaThread engine).

http://www.nvidia.com/content/PDF/product-specifications/GeForce_GTX_680_Whitepaper_FINAL.pdf

For Kepler, we realized that since this information is deterministic (the math pipeline latencies are not variable), it is possible for the compiler to determine up front when instructions will be ready to issue, and provide this information in the instruction itself. This allowed us to replace several complex and power-expensive blocks with a simple hardware block that extracts the pre-determined latency information and uses it to mask out warps from eligibility at the inter-warp scheduler stage.
No, most of the work happens on the GPU. What they moved back was the scheduling of instructions within a warp:
http://www.nvidia.com/content/PDF/product-specifications/GeForce_GTX_680_Whitepaper_FINAL.pdf
You misunderstand. That's talking about scheduling from within an SMM. What schedules work to the SMM? If you look at a Maxwell block diagram, you'll notice a PCI Express interface, then the GigaThread Engine (a large queue), and then several SMMs. So my question to you is, where's the scheduler?
http://www.nvidia.com/content/pdf/f...dia_fermi_compute_architecture_whitepaper.pdf

GigaThread Thread Scheduler

One of the most important technologies of the Fermi architecture is its two-level, distributed thread scheduler. At the chip level, a global work distribution engine schedules thread blocks to various SMs, while at the SM level, each warp scheduler distributes warps of 32 threads to its execution units.
I am serious, work is scheduled to the GigaThread engine by the NVIDIA driver. The NVIDIA driver re-orders grids, performs shader swaps, etc. at the driver level, and then schedules the work to the GigaThread engine, which holds the work in a queue. The AWS (asynchronous warp schedulers) within each SMM grab work from the GigaThread engine and schedule it for execution by the various units within the SMM.
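For illustration only, here is a tiny C++ toy model of that two-level flow under the assumptions in this thread: a chip-level distributor hands thread blocks to SMs, and each SM carves its block into 32-thread warps. The names (ThreadBlock, StreamingMultiprocessor) and the round-robin pick are invented for the sketch and are not how NVIDIA hardware actually arbitrates.

```cpp
// Toy model of two-level GPU work distribution (illustration only, not
// NVIDIA's implementation): a chip-level distributor hands thread blocks to
// SMs, and each SM splits its block into 32-thread warps for execution.
#include <cstdio>
#include <queue>
#include <vector>

struct ThreadBlock { int id; int threadCount; };

struct StreamingMultiprocessor {
    int id;
    // SM level: carve the block into warps of 32 threads and "issue" them.
    void run(const ThreadBlock& tb) const {
        const int warps = (tb.threadCount + 31) / 32;
        for (int w = 0; w < warps; ++w)
            std::printf("SM %d: block %d, warp %d issued\n", id, tb.id, w);
    }
};

// Chip level: a global distributor (the role the Fermi whitepaper assigns to
// the GigaThread engine) assigns pending blocks to SMs.
static void distribute(std::queue<ThreadBlock>& pending,
                       const std::vector<StreamingMultiprocessor>& sms) {
    unsigned next = 0;
    while (!pending.empty()) {
        const ThreadBlock tb = pending.front();
        pending.pop();
        sms[next % sms.size()].run(tb);  // round-robin stands in for "first free SM"
        ++next;
    }
}

int main() {
    std::queue<ThreadBlock> pending;
    for (int i = 0; i < 6; ++i) pending.push({i, 128});  // 128 threads = 4 warps each

    std::vector<StreamingMultiprocessor> sms = {{0}, {1}, {2}};
    distribute(pending, sms);
    return 0;
}
```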
Maxwell is not hybrid. Stop making things up. Scheduling of Warps happens on the GPU through scheduling units.

AMD GCN is a hardware scheduling architecture.
Maxwell is a Hybrid.
Kepler and Maxwell, too. :\

Fermi had a scheduler:
The GigaThread engine is not a processor. If it were, it would be called the "GigaThread Processor". It does not execute tasks; it waits for an available SMM to signal it for work.

The GigaThread engine doesn't hold work in a queue. It is a pool, and it schedules work from this pool to free compute units/clusters.
Maxwell is not hybrid. Stop making things up. Scheduling of Warps happens on the GPU through scheduling units.
Kepler and Maxwell, too. :\
GF114, owing to its heritage as a compute GPU, had a rather complex scheduler. Fermi GPUs not only did basic scheduling in hardware such as register scoreboarding (keeping track of warps waiting on memory accesses and other long latency operations) and choosing the next warp from the pool to execute, but Fermi was also responsible for scheduling instructions within the warps themselves. While hardware scheduling of this nature is not difficult, it is relatively expensive on both a power and area efficiency basis as it requires implementing a complex hardware block to do dependency checking and prevent other types of data hazards. And since GK104 was to have 32 of these complex hardware schedulers, the scheduling system was reevaluated based on area and power efficiency, and eventually stripped down.

Source: http://www.anandtech.com/show/5699/nvidia-geforce-gtx-680-review/3

The end result is an interesting one, if only because by conventional standards it's going in reverse. With GK104 NVIDIA is going back to static scheduling. Traditionally, processors have started with static scheduling and then moved to hardware scheduling as both software and hardware complexity has increased. Hardware instruction scheduling allows the processor to schedule instructions in the most efficient manner in real time as conditions permit, as opposed to strictly following the order of the code itself regardless of the code's efficiency. This in turn improves the performance of the processor.

However based on their own internal research and simulations, in their search for efficiency NVIDIA found that hardware scheduling was consuming a fair bit of power and area for few benefits. In particular, since Kepler's math pipeline has a fixed latency, hardware scheduling of the instruction inside of a warp was redundant since the compiler already knew the latency of each math instruction it issued. So NVIDIA has replaced Fermi's complex scheduler with a far simpler scheduler that still uses scoreboarding and other methods for inter-warp scheduling, but moves the scheduling of instructions in a warp into NVIDIA's compiler. In essence it's a return to static scheduling.

Ultimately it remains to be seen just what the impact of this move will be. Hardware scheduling makes all the sense in the world for complex compute applications, which is a big reason why Fermi had hardware scheduling in the first place, and for that matter why AMD moved to hardware scheduling with GCN. At the same time however when it comes to graphics workloads even complex shader programs are simple relative to complex compute applications, so it's not at all clear that this will have a significant impact on graphics performance, and indeed if it did have a significant impact on graphics performance we can't imagine NVIDIA would go this way.
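To make the static-scheduling idea in that quote concrete, here is a purely illustrative C++ sketch, assuming a made-up three-instruction program with fixed latencies: an offline "compiler" pass stamps each instruction with the earliest cycle at which its operands are ready, so the "hardware" side only compares a counter instead of doing scoreboard-style dependency checks for math ops. The instruction set, latencies, and field names are invented for the example.

```cpp
// Toy illustration of compile-time (static) instruction scheduling with fixed
// latencies, as opposed to hardware dependency checking. Not real ISA tooling;
// the instruction set, latencies, and field names are invented.
#include <cstdio>
#include <vector>

struct Instr {
    const char* op;
    int dst, src0, src1;   // register indices; -1 means "no source"
    int latency;           // fixed pipeline latency in cycles (known up front)
    int readyCycle = 0;    // filled in by the "compiler" pass below
};

// "Compiler" pass: because latencies are deterministic, the earliest cycle at
// which an instruction can issue is computable offline and embedded with it.
static void scheduleStatically(std::vector<Instr>& prog) {
    std::vector<int> regReadyAt(16, 0);  // cycle at which each register is valid
    int cycle = 0;
    for (auto& in : prog) {
        int ready = cycle;
        if (in.src0 >= 0 && regReadyAt[in.src0] > ready) ready = regReadyAt[in.src0];
        if (in.src1 >= 0 && regReadyAt[in.src1] > ready) ready = regReadyAt[in.src1];
        in.readyCycle = ready;            // this is what gets baked into the instruction
        regReadyAt[in.dst] = ready + in.latency;
        cycle = ready + 1;                // issue at most one instruction per cycle
    }
}

// "Hardware" side: no scoreboard for math dependencies, just a counter compare.
static void issue(const std::vector<Instr>& prog) {
    for (const auto& in : prog)
        std::printf("cycle %2d: issue %s r%d\n", in.readyCycle, in.op, in.dst);
}

int main() {
    std::vector<Instr> prog = {
        {"MUL", 2, 0, 1, 9},   // r2 = r0 * r1, 9-cycle latency
        {"ADD", 3, 2, 1, 9},   // depends on r2 -> must wait for the MUL result
        {"ADD", 4, 0, 1, 9},   // independent -> can issue right after
    };
    scheduleStatically(prog);
    issue(prog);
    return 0;
}
```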