I guess my point is that there's impact beyond the decoder. For example, your load/store unit and fetch unit have to handle self-modifying code. Even without self-modifying code, you can jump into the middle of an instruction (and e.g. skip a prefix), meaning you have to execute the same bytes with a different interpretation. Your integer datapath has to check whether a shift is by 0 bits, because in that case the flags are left unchanged, unlike the same instruction shifting by a nonzero amount. Your scheduler needs to handle special instructions that write results to two registers (e.g. MUL writing EDX:EAX for 32b*32b->64b). Your floating point unit needs to handle an 80-bit format nobody wants. Your address generation units have to pull in extra operands on one of the most important critical timing loops in a processor (load-to-use latency), or add complexity if you want to special-case segbase==0. A little area here, another gate on your critical path there, a little more energy per operation there... it adds up to a less optimal design. Sure, you could handle most of the ugliness with microcode (i.e. only add overhead to the decode), but if you want to make x86 go fast you end up dealing with it all over.
As processors turn into commodities where every option is basically "good enough" for most tasks, and a customer can spend $1 on an ARM core or $1.05 on an x86 core (or $1 on the battery for the ARM-based system and $1.05 on the battery for the x86 system), wouldn't they save the 5%? x86 vendors took over from the old RISC vendors when their enormous volume (and good-enough margins) allowed them to out-invest those vendors, and the "small" designs weren't good enough for YouTube/Call of Duty. As the smaller, cheaper designs reach the "good enough" point, it seems like a lot of the x86 market could disappear. Game developers already port games between x86 PCs and the PowerPC-based hardware in all 3 major consoles, and app developers are getting familiar with ARM for phones and tablets, so I see the porting/"legacy" argument getting weaker every year.
edit: Oh, and I see overlap between the A15 (and possibly the high end of the A9) and the low end of x86 (e.g. Atom) within the next year or two.