It's not that it produces code optimized for Intel. It's that it specifically detects non-Intel processors and produces unoptimized code. Why not just output one binary regardless of CPU brand (or at the very least, dispatch based on the extensions the CPU actually reports, not the vendor string)? Their compiler goes out of its way to produce unoptimized output.
Of course, Intel's compiler is the fastest compiler for both AMD and Intel processors. AMD's own compiler initiative is a failure, and Microsoft's compiler is merely OK.
GCC is about the same as Microsoft's, IIRC, though I think AMD is actively contributing to it now that their own compiler is defunct.
When it comes to scheduling SSE instructions, Intel's compiler is worlds apart from GCC and Microsoft's.
http://terapix.iap.fr/forum/showthread.php?tid=134
It was caught a long time ago. When it detects a non-Intel CPU, the Intel compiler falls back to a code path targeting some ancient x86 baseline. It throws out roughly 20 years of extensions and emits code that would run on a 586.
Also, I think you'll find that more dev houses use ICC than you might expect. When server economics come down to performance per watt, what's a compiler license fee against code that will run on thousands of servers?
There was an instance with Quake 3 a while back, where it was found that the DLLs shipped with it didn't enable SSE on Athlon processors. Granted, Quake 3 predates Athlons with SSE support, but the game did have an SSE code path for Intel CPUs. Once the DLLs were recompiled with SSE enabled for the Athlon XP, AXPs started outperforming Pentium 4s, much the same way Athlon 64s outperformed P4s in most games (I'm not sure about A64s, but it wasn't uncommon for games to leave SSE disabled on the Athlon XP). What's my point? I dunno, but I think there's a lesson about using Gentoo in all of that.