why's it so hard to write a good compiler for this chip? don't most chips do this sort of stuff in hardware anyway? so why's it more involved than just copying, using software, what the hardware designers have been doing for years?
First of all, the subject line and the question both seem to assume that the Itanium compilers aren't very good. I'm not sure how anyone could say that when the code they produce beats every other CPU out there in most (nearly all?) of the major benchmarks. They produce stable, fast, high-quality code for multiple OSes (Windows 64, Linux, HP-UX, etc.). So my question is: how much more were you expecting? Better SPECint scores? Better SPECfp scores? Better tpmC scores? Better OS support?
Second, most of the changes that the IA-64 instruction set implements are designed to reduce hardware complexity - and improve performance - by allowing the compiler to decide more things for the CPU. Instruction templates allow the compiler to schedule execution units. Bundles with stop bits allow the compiler to tell the CPU which instructions can be grouped together. Speculative loads and stores allow the compiler to tell the CPU to load things that will probably be used before they are actually needed. Branch hints allow the compiler to tell the CPU whether a branch is likely to be taken and how confident that prediction is. Each of these capabilities is one more thing the compiler has to exploit well, which makes compiler design an even more difficult process.
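To give a feel for the kind of decision that moves from hardware into the compiler, here is a toy sketch of stop-bit placement. It is not how any real Itanium compiler works. It assumes a simplified rule that an instruction cannot share a group with an earlier instruction whose result it reads or overwrites, and it ignores templates, execution-unit limits, predication, and memory dependencies. The names Instr, insert_stops, and DEMO are made up for illustration.

```python
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    text: str               # human-readable form, for printing only
    dest: str               # register written ("" if none)
    srcs: Tuple[str, ...]   # registers read


def insert_stops(instrs: List[Instr]) -> List[List[Instr]]:
    """Greedily split an in-order instruction list into groups.

    A new group (i.e. a stop) starts whenever the next instruction reads
    or writes a register already written in the current group.
    """
    groups: List[List[Instr]] = []
    current: List[Instr] = []
    written: Set[str] = set()
    for ins in instrs:
        reads_pending = any(s in written for s in ins.srcs)
        rewrites_pending = bool(ins.dest) and ins.dest in written
        if current and (reads_pending or rewrites_pending):
            groups.append(current)      # close the group: stop goes here
            current, written = [], set()
        current.append(ins)
        if ins.dest:
            written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


# Hypothetical four-instruction sequence used only for the demo.
DEMO = [
    Instr("add r1 = r2, r3", "r1", ("r2", "r3")),
    Instr("add r4 = r5, r6", "r4", ("r5", "r6")),
    Instr("add r7 = r1, r4", "r7", ("r1", "r4")),  # needs r1 and r4
    Instr("add r8 = r2, r5", "r8", ("r2", "r5")),
]

if __name__ == "__main__":
    for group in insert_stops(DEMO):
        for ins in group:
            print(ins.text)
        print(";;")  # stop between instruction groups
```

Even in this stripped-down form the compiler has to track every dependency itself. A real compiler also has to reorder instructions to make the groups as large as possible, which is where most of the difficulty lies.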
Third, it takes a lot of time to implement high-quality compilers for a given microarchitecture. The CPU is designed around assumptions about which instructions users will be running, and it is optimized for those; the compiler then tries to take advantage of those optimizations. But at the end of the day many of these guesses may be wrong, and the compiler will need to be tweaked to better match instruction streams to CPU features. This is a long-term loop that requires feedback from users and extensive CPU profiling to improve the compiler's performance.
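As a rough illustration of how profiling can feed back into the compiler, here is a minimal sketch that picks a branch hint from measured taken/not-taken counts. The hint names (sptk, spnt, dptk, dpnt) are the IA-64 static and dynamic "whether" hints; the 95% threshold, the fallback for branches with no profile data, and the function name choose_branch_hint are my own assumptions for the example, not anything a particular compiler is known to use.

```python
def choose_branch_hint(taken: int, not_taken: int, strong: float = 0.95) -> str:
    """Pick an IA-64 branch 'whether' hint from profile counts.

    strong is an assumed threshold: branches at least this biased get a
    static hint, everything else is left to the dynamic predictor.
    """
    total = taken + not_taken
    if total == 0:
        # No profile data: assumed fallback, let the hardware predict.
        return "dpnt"
    ratio = taken / total
    if ratio >= strong:
        return "sptk"   # static predict taken
    if ratio <= 1.0 - strong:
        return "spnt"   # static predict not taken
    # Weakly biased: dynamic prediction, seeded with the majority direction.
    return "dptk" if ratio >= 0.5 else "dpnt"


if __name__ == "__main__":
    print(choose_branch_hint(990, 10))
    print(choose_branch_hint(600, 400))
```

The point is that the right threshold, and even whether static hints help at all, can only be learned by profiling real workloads on real silicon, which is why this loop takes years rather than months.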
I think the Itanium family's set of compilers is actually pretty respectable given the relative youth of the architecture. But I also expect that compilers will continue to improve performance incrementally over time. Considering that the Itanium 2 is currently the world's fastest CPU in most of the major benchmarks, these improvements could enable the Itanium to extend its lead without any changes to silicon at all.
Patrick Mahoney
Itanium Microprocessor Design Engineer
Intel Corp.
Fort Collins, CO
PS. I'm still looking up the answer to your question in the SMP thread. I was so busy at work today that I barely even had time to check how far INTC stock was falling.
