It's one of the few DX12 features that offers a clear performance advantage. Is it the end of the world not having it? No. But it does mean that Nvidia has to work harder for that 5%-10% "free" performance from async compute. Just imagine if the Fury X was 5-10% faster overall. It would have been game-changing for the perception of the card at release.
It may be 5-10% for Fury X, but I'm not sure you'd see the same gains on Nvidia cards even if they had hardware support for async compute. We have to keep in mind that, because of how the cards are designed, you see those async compute gains on AMD hardware because (IIRC) it's harder for AMD cards to reach their maximum throughput, so they have the idle execution resources required to see a gain from async compute (I can't find the source for this, but I believe it's been linked in one of these async threads).
As others have said, async compute is not the only feature that DX12 offers to developers. Furthermore, you can still see gains from using compute shaders even with the non-optimal implementation on Maxwell, so long as they are not interleaved with graphics work and you properly batch the jobs[1].
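To make the batching point concrete, here's a toy model (not real scheduling code; the job labels and the one-flush-per-transition cost are illustrative assumptions based on [1]): if each graphics/compute transition on a Maxwell-style SMM costs a full flush, interleaving jobs multiplies flushes, while batching keeps them to a minimum.

```python
# Toy model: assume each graphics<->compute transition costs a full
# context flush (per the SMM shared-cache argument in [1]).
# 'G' = graphics job, 'C' = compute job; names are illustrative.

def count_flushes(jobs):
    """Count context switches in a stream of 'G'/'C' jobs."""
    return sum(1 for a, b in zip(jobs, jobs[1:]) if a != b)

interleaved = ['G', 'C', 'G', 'C', 'G', 'C', 'G', 'C']
batched     = ['G', 'G', 'G', 'G', 'C', 'C', 'C', 'C']

print(count_flushes(interleaved))  # 7 transitions -> 7 flushes
print(count_flushes(batched))      # 1 transition  -> 1 flush
```

Same work in both streams; only the submission order changes, which is why batching compute jobs is the recommended pattern on Maxwell in [1].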
I look at it this way: it's not so much that Maxwell sucks at DX12 as that the gains are harder to realize, because the architecture is harder to program for as a result of design decisions Nvidia made for efficiency. Couple that with the fact that the best case for AMD is essentially the worst case for Nvidia[1], and the result is likely what's been observed so far in DX12 titles. I still don't think the sample size is large enough to draw conclusions about Nvidia's architecture in the DX12 era, other than that it's definitely more awkward to program for.
It may be worth noting that Maxwell 2 (second-generation Maxwell) does support async compute, although in a more limited fashion than what's required by the DX12 spec. Mahigan explained it well in another async compute thread[2] (the Doom one):
CUDA applications support Asynchronous compute via Hyper-Q and since PhysX is CUDA based, it supports Asynchronous compute + graphics on Maxwell (GM20x).
Hyper-Q isn't compatible with DX12 barriers and/or fences (we're not sure which), which is why GM20x doesn't support async compute + graphics under DX12.
Hyper-Q bypasses the Command Processor in GM20x and is handled by a dedicated ARM processor on the GM20x die. This dedicated ARM processor can feed both 3D jobs and compute jobs concurrently and in parallel to GM20x.
It is more than likely that NVIDIA were caught out by a minor detail of the DX12 API spec.
As for context switches, they occur in two stages.
1. During the execution of work loads.
2. Within the SMMs themselves.
The first can be alleviated by use of Hyper-Q in CUDA applications but the second cannot due to the shared L1 texture/compute caches within an SMM. Basically, an SMM cannot be performing both compute and texture jobs at once due to shared logic. A full flush is required to switch from one context to another within an SMM.
I do wonder whether async compute could be supported without the first context switch in the list above through modifications to just the ARM processor (to support the barriers/fences), or whether a more invasive approach is required.
We can speculate that, if Pascal shares these similarities with Maxwell, it may indeed not be possible to fix hardware async compute, but this is just speculation, and I don't think it should be considered a deal breaker even if it's true. We just have to wait and see at this point.
[1] http://ext3h.makegames.de/DX12_Compute.html
[2] http://forums.anandtech.com/showthread.php?p=38115150#post38115150