Also, there seems to be an error? From reading the article, AMD says each ACE can handle 8 queues. Hawaii/Tonga has 8 ACEs, so 8x8 = 64 queues. That's 1 graphics or 63 (or 7x8 = 56) compute queues. Actually it's even better than that: the ACEs can execute the compute queues in parallel with the command processor, so it's always at full capacity when operating in mixed mode.
i.e.
1 graphics + 8 compute PER ACE in Hawaii/Tonga.
That's 1 graphics + 64 compute simultaneously.
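For anyone who wants to sanity-check the arithmetic above, here is a minimal sketch. The function name and the "one graphics queue on the command processor" convention are my own assumptions for illustration, not AMD terminology.

```python
# Hypothetical helper: counts queues as described in the post above.
# Assumes the command processor contributes 1 graphics queue and each
# ACE contributes the same number of compute queues.
def queue_totals(aces: int, queues_per_ace: int, graphics_queues: int = 1) -> dict:
    compute = aces * queues_per_ace
    return {
        "graphics": graphics_queues,
        "compute": compute,
        "total": graphics_queues + compute,
    }

# Hawaii/Tonga as described: 8 ACEs x 8 queues each
hawaii = queue_totals(aces=8, queues_per_ace=8)
print(hawaii)  # {'graphics': 1, 'compute': 64, 'total': 65}
```

Swap in other ACE counts to compare parts, e.g. `queue_totals(aces=2, queues_per_ace=8)` for a 2-ACE chip.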
HD 7000 & Rx 240/250/270/280 : command processor x1 queue + 2 ACEs x1 queue + 2 DMA engines
->Graphics/Compute/Copy with limitations
HD 7790 & R7 260 : command processor x1 queue + 2 ACEs x8 queues + 2 DMA engines
->Graphics/Compute/Copy
R9 285/290 : command processor x1 queue + 8 ACEs x8 queues + 2 DMA engines
->Graphics/Compute/Copy
GTX 400/500/600/700 : command processor x1 queue + 1 DMA engine
->No support
GTX 750/780/Titan : command processor x32 queues (limited) + 1 DMA engine
->Compute/Compute
GTX 900/Titan X : command processor x32 queues + 2 DMA engines
->Graphics/Compute/Copy
http://www.hardware.fr/news/14133/gdc-d3d12-amd-parle-gains-gpu.html
AMD has released much more technical PR involving DX12 than Nvidia. They've been more openly informative and excited than Nvidia. This kind of AMD needs to stay around. Stop doing the horrible cheese videos, stop allowing Huddy to spout idiotic comments. Instead, focus on their strengths (perf/$) and future API readiness, and stay more agile in the market, i.e. don't wait so long to drop prices in response to new competitive products.
I hope 390x puts the hurt on Titan X.
Easily. People talk about Nvidia's perf/watt advantage, but at what cost? There is no free lunch, and Nvidia has clearly skimped on features to bring down the transistor count. GCN is such a beast. This uarch might be the most future-proof and flexible one yet.
I think you need to go read the article and comprehend what it's all about.
Yet AMD gets trounced in DX12 performance, so much for having 8 ACEs, 64 queues.
The AMD DX12 driver was not fully ready back then. AMD needs more time to improve its driver. It's not Nvidia with its big money.
GK110 introduced Hyper-Q with support for 32 concurrent queues. At the same time, Tahiti only supported 4 queues. Since then, Hyper-Q has been supported on every new GPU.
Only with Hawaii does AMD support more queues, but even then 32 is enough for Nvidia to fully utilize their compute cores.
On a side note, part of the reason for AMD's presentation is to explain their architectural advantages over NVIDIA, so we checked with NVIDIA on queues. Fermi/Kepler/Maxwell 1 can only use a single graphics queue or their complement of compute queues, but not both at once; early implementations of HyperQ cannot be used in conjunction with graphics. Meanwhile Maxwell 2 has 32 queues, composed of 1 graphics queue and 31 compute queues (or 32 compute queues total in pure compute mode). So pre-Maxwell 2 GPUs have to either execute in serial or pre-empt to move tasks ahead of each other, which would indeed give AMD an advantage.
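To make the quoted either/or rule concrete, here is a small sketch of how the usable queues split depending on whether graphics is active. The function, the architecture strings, and the `compute_complement` parameter are all made up for illustration; the default of 32 only reflects the Hyper-Q figure mentioned in this thread and will not hold for parts without Hyper-Q.

```python
# Hypothetical model of the quoted description, not an NVIDIA API.
# Returns (graphics_queues, compute_queues) that can be active at once.
def nvidia_queue_split(arch: str, graphics_active: bool,
                       compute_complement: int = 32) -> tuple:
    if arch == "maxwell2":
        # Mixed mode: 1 graphics + 31 compute; pure compute mode: 32 compute
        if graphics_active:
            return (1, compute_complement - 1)
        return (0, compute_complement)
    # Fermi / Kepler / Maxwell 1: a single graphics queue OR the
    # compute queues, but never both at the same time
    if graphics_active:
        return (1, 0)
    return (0, compute_complement)

print(nvidia_queue_split("maxwell2", graphics_active=True))   # (1, 31)
print(nvidia_queue_split("maxwell2", graphics_active=False))  # (0, 32)
print(nvidia_queue_split("kepler", graphics_active=True))     # (1, 0)
```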
As discussed in the DX12 threads, Async Compute & Shaders is the big feature of the new API: where CPU overhead reduction solves the CPU bottleneck, Async Compute/Shaders is what's going to reduce the rendering bottleneck.
With consoles being limited in power, techniques that extract efficiency out of each shader (don't let it idle when it can process physics, shadows, lights, compute etc.) are going to matter a lot moving forward. Imagine 2-4 years from now, with these console SoCs aging: developers will tap into any method that can extract the last drop of performance from them.
AMD talked about these two features and how they were going to be the next update to Mantle, to reduce GPU bottlenecks, but it never happened because DX12 & Vulkan have killed Mantle. But it's great that this feature lives on!
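A rough way to picture the "don't let the shaders idle" argument is a toy frame-time model. This is purely a back-of-the-envelope sketch: the function names, the `idle_fraction` parameter, and the millisecond figures are invented for illustration and are not measurements from any GPU.

```python
# Toy model, not a benchmark. Assumes compute work can be slotted into
# whatever fraction of the graphics pass leaves the shaders idle.
def frame_time_serial(graphics_ms: float, compute_ms: float) -> float:
    # Without async compute: compute runs after graphics
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms: float, compute_ms: float,
                     idle_fraction: float) -> float:
    # Compute time absorbed by otherwise idle shader time
    absorbed = min(compute_ms, graphics_ms * idle_fraction)
    return graphics_ms + (compute_ms - absorbed)

# Made-up numbers: 12 ms graphics, 4 ms compute, shaders idle 25% of the pass
print(frame_time_serial(12.0, 4.0))       # 16.0
print(frame_time_async(12.0, 4.0, 0.25))  # 13.0
```

The real gain depends on how much idle time a given workload actually leaves, which this model simply takes as an input.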
DX12 supports three different command lists:
Graphics, Compute and Copy.
Pre-Maxwell v2 architectures can use Graphics + Copy or Compute + Copy.
Every architecture with Hyper-Q supports up to 32 concurrent compute tasks with input from 32 different streams (hosts).
Tahiti, on the other hand, only supported 4 compute queues and 1 graphics queue. But at that time there wasn't an API to support both at the same time.
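Here is a small sketch of the combinations described above. The enum values mirror `D3D12_COMMAND_LIST_TYPE` in the D3D12 headers (where "Graphics" is called DIRECT, and BUNDLE = 1 is left out here); the `PRE_MAXWELL2` and `MAXWELL2` tables and the helper function are hypothetical and only encode what is claimed in this thread, not vendor documentation.

```python
from enum import Enum

class ListType(Enum):
    DIRECT = 0   # graphics
    COMPUTE = 2
    COPY = 3

# Concurrent combinations as claimed in this thread (assumption, not a spec)
PRE_MAXWELL2 = [
    {ListType.DIRECT, ListType.COPY},
    {ListType.COMPUTE, ListType.COPY},
]
MAXWELL2 = [
    {ListType.DIRECT, ListType.COMPUTE, ListType.COPY},
]

def can_run_together(allowed: list, requested: set) -> bool:
    # True if the requested types fit inside any one allowed combination
    return any(requested <= combo for combo in allowed)

mixed = {ListType.DIRECT, ListType.COMPUTE}
print(can_run_together(PRE_MAXWELL2, mixed))  # False
print(can_run_together(MAXWELL2, mixed))      # True
```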