Sorry for the late replies, and I think some of you misunderstood me on some parts. Like I said earlier, I have no deep technical knowledge of GPU uArch design. I haven't even tried to code on them yet. All my knowledge comes from Ryan's uArch disclosures/analyses, whitepapers and tech sites. If anyone with experience on both uArchs can shed some light I'd be very grateful. (Like Mahigan, zlatan, or whoever writes shader code for both uArchs)
AMD have produced some really powerful cards that they seem to have had trouble fully loading. Like you say, Hawaii and Fury are very wide designs relative to NV's, and AMD have had issues getting performance out of them.
I don't really view DX12 as bad for NV cards - they're not gaining much in DX12 because they're already doing a decent job of loading their GPUs in DX11, thanks to the work they've done on their DX11 implementation (and you can see a similar pattern with the narrower AMD cards lately too - more than a couple of DX11 vs DX12 benchmarks have been linked where cards like the 380 perform pretty much the same in 11 and 12). It's very good that we're seeing the larger GCN cards now showing performance more in line with how powerful they are on paper - a more competitive sector is good for everyone except NV shareholders.
NV have used the short term performance advantages they've had from a better DX11 implementation to really aggressively increase margins every generation. They're probably not in any danger of being uncompetitive as they can just drop pricing back to more normal levels when they need to, but it'd be a nice change to see them actually have to do this.
You're right on that one. NV's laser focus on DX11 keeps their GPUs busy almost all the time. Before Kepler, Fermi was like GCN (wide, complex, a nuclear power plant some said), focusing on compute and tessellation if I recall correctly (the main features of DX11). Kepler, Maxwell and Pascal streamlined that design and became more serial for efficiency (more work, less idle time), which also included moving scheduling to the driver (software-based, not hardware like Fermi) to save on area and power. It makes NV's uArch pretty dumb, but efficient as hell.
AMD probably knew NV were targeting DX11 full force, and so focused on DX12 (low-level APIs) and compute. Their choice paid off once devs started getting deep GPU access with DX12 and Vulkan. At the end of the day, GCN really looks like Fermi but much more advanced (stateless, virtually unlimited resources). With both console wins, GCN just keeps getting more optimization in games. Sony did push AMD for some features that got absorbed back into GCN (more ACEs were Sony's idea). If you look back at AMD's TeraScale, that's NV's uArch in the present day compared to GCN. NV's uArch just doesn't have any spare execution resources, or the hardware brains, to manage simultaneous compute and graphics workloads (software scheduling is not that smart for real-time decisions)
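To make the simultaneous-workload point concrete, here's a toy model (every number is invented for illustration - real GPUs don't behave this simply): a 64-unit GPU runs a graphics task that only occupies 40 units per cycle, plus an independent compute task that can either wait its turn (serial, one-queue scheduling) or backfill the idle units (async compute, multi-queue):

```python
# Toy model of async compute. All numbers are made up for illustration,
# not taken from any real GPU.

UNITS = 64              # execution units on our imaginary GPU
GFX_OCCUPANCY = 40      # units the graphics task fills each cycle
GFX_CYCLES = 10         # cycles the graphics task runs
COMPUTE_WORK = 200      # total unit-cycles of independent compute work

def serial_schedule():
    """Run graphics to completion, then compute (one-queue style)."""
    compute_cycles = -(-COMPUTE_WORK // UNITS)   # ceiling division
    return GFX_CYCLES + compute_cycles

def async_schedule():
    """Each cycle, fill the units graphics leaves idle with compute."""
    cycles, remaining = 0, COMPUTE_WORK
    for _ in range(GFX_CYCLES):                  # graphics + backfill
        remaining -= (UNITS - GFX_OCCUPANCY)
        cycles += 1
    while remaining > 0:                         # leftover compute, if any
        remaining -= UNITS
        cycles += 1
    return cycles

print(serial_schedule())  # 14 cycles
print(async_schedule())   # 10 cycles: compute hid entirely in idle units
```

With these made-up numbers the compute work disappears completely into the graphics task's idle units - which is the kind of win hardware task schedulers (the ACEs) are there to find in real time.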
And yes, NV's simple uArch gives them better margins, since it's not really complex to manufacture (although still hard from a big-die standpoint)
Seems pretty simple. AMD designed for the consoles, which translated well to Mantle/DX12/Vulkan, and nVidia designed for DX11. For a while now AMD has been suffering with an API that their hardware isn't optimal for. They still managed to hold their own in absolute performance, but nVidia's advantage was enough to tip the playing field commercially. AMD has had to run lean for a while now. I think they really need Zen to be successful, or we might see AMD go purely semi-custom and RTG spin off with very broad cross-licensing between them. The consumer CPU space might go pure Intel if that happens. Assuming the FTC allows that without breaking up Intel.
A lot of ifs here, of course.
More likely, AMD's GCN was just what both Sony and Microsoft were looking for (both wanted a complete solution rather than a custom design). Low-level APIs were a dev request even before GCN (Timothy Lottes and Johan come to mind). Mantle helped push it early; DX12 and Vulkan made it a reality (with games targeting both, mostly DX12). I think Zen will not disappoint
If you think instruction-level scheduling is somehow impacted by DX12... then you have no idea what you are talking about. NVIDIA simplifying their *instruction* scheduling has zero impact/relevance wrt DX12. The latter impacts GPU *task* scheduling, which is a completely different beast!
The number of completely baseless architectural myths and urban legends constantly repeated in this forum is quite staggering (and frankly sad).
I never said that, nor am I claiming to be a uArch guru. You can correct me if I'm that wrong, but don't accuse me of saying what I didn't. I said "Instructions is fixed so can be serialized and mapped well to a SM", and frankly scheduling does impact performance in DX12. Ryan said it himself: NV's scheduling decision could have backfired, and DX12 shows that. In-order vs out-of-order. Static vs real-time.
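For what the in-order vs out-of-order distinction means in practice, here's a toy issue model (the instruction names and latencies are invented - this is not a model of any real GPU or CPU): a strict in-order scheduler stalls everything behind a long-latency load, while a dynamic scheduler issues the independent instructions into the gap:

```python
# Toy in-order vs out-of-order issue model. Instructions are
# (name, latency, deps); all values are invented for illustration.

PROGRAM = [
    ("load_a", 4, []),          # long-latency memory load
    ("add_1", 1, ["load_a"]),   # depends on the load
    ("mul_x", 1, []),           # independent work
    ("mul_y", 1, []),           # independent work
]

def in_order_cycles(program):
    """Issue one instruction per cycle in strict program order,
    stalling until every dependency's result is ready."""
    done, cycle = {}, 0         # done: name -> cycle its result is ready
    for name, lat, deps in program:
        start = max([cycle] + [done[d] for d in deps])
        done[name] = start + lat
        cycle = start + 1
    return max(done.values())

def out_of_order_cycles(program):
    """Issue one instruction per cycle, picking any instruction whose
    dependencies are ready, so independent work fills the load's stall."""
    done, pending, cycle = {}, list(program), 0
    while pending:
        for inst in pending:
            name, lat, deps = inst
            if all(d in done and done[d] <= cycle for d in deps):
                done[name] = cycle + lat
                pending.remove(inst)
                break
        cycle += 1
    return max(done.values())

print(in_order_cycles(PROGRAM))      # 7 cycles: everything stalls behind the load
print(out_of_order_cycles(PROGRAM))  # 5 cycles: the muls fill the stall
```

The same trade-off the thread is arguing about: a static schedule baked in ahead of time is cheap but can't react to stalls, while dynamic hardware scheduling costs area and power to buy that flexibility.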
The number of completely baseless fanboy accusations and urban legends constantly repeated in this forum is quite staggering (and frankly much more sad, I'm afraid)
The architecture differences allowed nvidia to sell smaller, cheaper chips at higher prices.
Just because DX12 allows GCN to perform more like equally sized nvidia cards doesn't mean the architecture was a great forward-looking idea.
I'm probably going to upgrade to Polaris, but because of price/performance and FreeSync, not because I expect magical DX12 gains.
Actually, with DX12 GCN can compete with NV head-on - but only in DX12. In DX11 it's lackluster because it can't keep itself busy long enough for more work to come in. For example, Fury should have buried the 980 Ti, but because it is so wide, it can't stay fed. I'm not saying GCN is that good, but it's still very good.
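The wide-but-underfed point can be put in rough numbers. The shader counts and clocks below are the published specs for Fury X and the 980 Ti; the utilization percentages are purely hypothetical, just to show how a utilization gap can erase a paper-FLOPS lead:

```python
# Back-of-envelope: paper FLOPS vs achieved FLOPS. Card specs are the
# published ones; the utilization figures are invented for illustration.

def peak_tflops(shaders, mhz):
    # 2 FLOPs per shader per clock (fused multiply-add)
    return shaders * 2 * mhz / 1e6

fury_x = peak_tflops(4096, 1050)    # ~8.6 TFLOPS on paper
gtx_980ti = peak_tflops(2816, 1000) # ~5.6 TFLOPS on paper

# A wide design the API can't keep fed loses its paper advantage:
print(fury_x * 0.60)     # hypothetical 60% utilization under DX11 -> ~5.2
print(gtx_980ti * 0.95)  # hypothetical 95% utilization under DX11 -> ~5.4
```

With those (made-up) utilization numbers, a ~3 TFLOPS paper lead turns into a slight deficit - which is roughly the story of Fury vs 980 Ti in DX11, and why closing the utilization gap in DX12 changes the picture.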