In the Graphics queue (3D queue), NVIDIA executes all the tasks sequentially, like this:
Graphics --> Compute --> Copy --> Graphics = 20ms
AMD, on the other hand, splits the work into 3 queues and executes the first 3 tasks concurrently (at the same time rather than one after the other), then executes the final graphics task. Like this:
Graphics + Compute + Copy --> Graphics = 16ms
That time savings (reduction in latency) = FPS boost.
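To make that arithmetic concrete, here's a tiny C++ sketch of the frame-time model. The per-task durations are made-up numbers, chosen only so the totals match the 20 ms and 16 ms above; real workloads will vary.

```cpp
#include <algorithm>
#include <cstdio>

int main() {
    // Made-up task durations (ms), picked so the totals match the example above.
    const double graphics1 = 10.0, compute = 3.0, copy = 1.0, graphics2 = 6.0;

    // Serial execution: every task waits for the previous one to finish.
    const double serialMs = graphics1 + compute + copy + graphics2;           // 20 ms

    // Async execution: the first three overlap, then the final graphics task runs.
    const double asyncMs = std::max({graphics1, compute, copy}) + graphics2;  // 16 ms

    std::printf("serial: %.1f ms (%.1f FPS)\n", serialMs, 1000.0 / serialMs); // 50.0 FPS
    std::printf("async:  %.1f ms (%.1f FPS)\n", asyncMs, 1000.0 / asyncMs);   // 62.5 FPS
    return 0;
}
```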
The analogy is much easier for folks to understand if we talk about CARS.
Let's say there are 3 types of road vehicles.
Cars = Graphics
Trucks = Compute
Bikes = Copy
They're all driving from A to B. Their goal is to reach B = the frame finishes rendering.
In DX11, the approach is a one-lane road that can accommodate all 3 vehicle types, but only one type at a time can be on the road.
In DX12, with async compute capability, the road gets multiple lanes, and some of those lanes are reserved for Trucks & Bikes only.
Now all the traffic can flow at the same time if you schedule it properly: Cars go in the main lane, while Trucks and Bikes go in the new lanes reserved for Compute and Copy tasks.
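If you want to see what the extra lanes look like in code, here's a minimal D3D12 sketch. It assumes you already have a valid ID3D12Device* called `device` and skips error handling; it only shows queue creation, not command recording or fence synchronization. Each D3D12_COMMAND_LIST_TYPE maps to one lane in the analogy.

```cpp
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical helper: create one queue ("lane") of the requested type.
ComPtr<ID3D12CommandQueue> CreateQueue(ID3D12Device* device, D3D12_COMMAND_LIST_TYPE type)
{
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type  = type;                       // DIRECT = Cars, COMPUTE = Trucks, COPY = Bikes
    desc.Flags = D3D12_COMMAND_QUEUE_FLAG_NONE;

    ComPtr<ID3D12CommandQueue> queue;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&queue));
    return queue;
}

// The main lane plus the two reserved lanes:
//   auto graphicsQueue = CreateQueue(device, D3D12_COMMAND_LIST_TYPE_DIRECT);
//   auto computeQueue  = CreateQueue(device, D3D12_COMMAND_LIST_TYPE_COMPUTE);
//   auto copyQueue     = CreateQueue(device, D3D12_COMMAND_LIST_TYPE_COPY);
```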
Now, on any given day (game), there may not be many Trucks*; it's mostly just Cars. On a day like that, DX12 Async Compute doesn't have much benefit, because the extra lanes go unused.
* Or rather, a game may use compute, but if it isn't queued properly it gets submitted as a graphics workload and ends up in the Car lane as well.
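That footnote comes down to which queue the work is submitted to. Here's a hedged sketch reusing hypothetical queue objects like the ones above, and assuming the compute command list already has its pipeline state and root signature bound. The Dispatch call is identical either way; only the list/queue type decides which lane it rides in.

```cpp
#include <d3d12.h>

// Hypothetical helper. If this same Dispatch were recorded on a DIRECT-type
// list and submitted to the graphics queue, it would ride in the "Car lane"
// and be serialized with the graphics work instead.
void SubmitAsyncCompute(ID3D12GraphicsCommandList* computeList,
                        ID3D12CommandQueue* computeQueue,
                        ID3D12CommandQueue* graphicsQueue,
                        ID3D12Fence* fence, UINT64 fenceValue)
{
    // Record the compute pass on a COMPUTE-type list (the Truck lane).
    computeList->Dispatch(64, 64, 1);   // thread-group counts are placeholders
    computeList->Close();

    // Submit it to the compute queue so it can run alongside graphics work.
    ID3D12CommandList* lists[] = { computeList };
    computeQueue->ExecuteCommandLists(1, lists);

    // If a later graphics pass reads the results, synchronize with a fence.
    computeQueue->Signal(fence, fenceValue);
    graphicsQueue->Wait(fence, fenceValue);
}
```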
You can now visualize why, in scenarios with heavy compute/copy workloads, Async Compute in DX12 is a huge feature to have supported in hardware.