Sorry, I'm not talking about benchmarks here but real performance. CUDA devs had to rewrite quite big portions of their software (without tech papers for Maxwell v2, to their dismay) to accommodate the new memory and cache system and to make Maxwell at least competitive with Kepler products. For each piece of software you show benchmarking with "ok" performance, I can show real-world software that either has abysmal Maxwell performance or has to be reworked to make Maxwell competitive. Right now lots of prosumers are still on Kepler because Maxwell performance is inconsistent, and they are still waiting for their entire production software stack to adapt to the changes in Maxwell. Seeing as Apple is all about tight integration between software and hardware, inconsistency or rewriting your software around one generation of products is undesirable. Why bother if the competition has consistent performance in the API you really use (OpenCL)?
Power efficiency is a gaming fad right now. Translating it into the prosumer world and thinking the same argument will hold is just downright silly.