Originally posted by: dmens
Please give an example of x86 actually crippling performance in the targeted segment, one where there is no alternative platform. Also, as for the bloat and/or anemia, please give examples of how it makes life difficult for software engineers. Thanks.
Well, let's look briefly at the differences between x86 and x86-64:
64-bit: no MMX or 3DNow!; 16 64-bit GPRs plus a core set of 16 128-bit SSE/SSE2 registers, along with the effective retirement of the x87 FPU.
32-bit: a collage of MMX, 3DNow!, SSE, SSE2, and the x87 FPU (all of which are tacked on, overlapping, and not strictly necessary), with only 8 32-bit GPRs, 8 FP, and 8 SIMD registers, and no core SSE/SSE2 register set.
So in order to make a fully backwards-compatible x86-64 CPU, you have to waste die space decoding all of the legacy stuff, which has very limited use. Eventually the 32-bit portion will be eliminated altogether (which will take a long time, seeing how even 16-bit legacy mode still exists), but in the meantime you still need to build a processor that is fully compatible for at least the next decade.
Moreover, assembly written for 16 64-bit GPRs plus a core set of 16 128-bit SSE/SSE2 registers is quite different from your mundane 8/8/8-register 32-bit set. There will be issues translating 64-bit code back down for your 32-bit clients, usually resulting in massive performance penalties as well as data-width problems. I won't even get into older MMX/3DNow!/x87 FPU code running on a native 64-bit system and how that'll translate.
Edit: I would like to add that there are *some* programs that are already optimized for x86-64 using the full 16 + 16 register set. These programs see upwards of 50-100% speed increases, obviously depending on what you're doing. I'm curious to know wtf the difference is between that and IA-64, which generally runs emulated legacy 32-bit code at half the speed of its native IA-64 code. Once more programs move to x86-64, I foresee a larger performance discrepancy between it and 32-bit.