<<
How hard would it be to update code for programs to take advantage of hyper-threading? Would it just be better to start from scratch? >>
Not difficult at all...Intel is basically advocating tried-and-true multithreaded programming techniques related to synchronization and locking, many of which I was exposed to in my OS class last year. A lot of these techniques benefit not only SMT but SMP as well (if they aren't already used in a multithreaded app): better data access patterns, don't falsely share cache lines, minimize synchronization, use the pause and halt instructions in spin-wait loops where applicable to minimize wasted cycles, don't keep idle loops spinning, call the OS to free up the resources of idle threads, and pipeline spin locks (a pause-based spin-wait is sketched below). These techniques are easy to implement, especially with good threading libraries...I've done my multithreaded programming in Java, and for the database access and shell programs I had to write, I was already using spin-locking ideas similar to the ones Intel is advocating. Richard Wirt's presentation at IPF showed off some tools (the C++ compiler, OpenMP, and VTune) that work together to automatically analyze, generate, and optimize threads for SMT...the demo showed a 25% improvement for a Photoshop blur filter with SMT enabled (assuming a 60% improvement from SMP).
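For anyone who hasn't seen it, here's roughly what the pause-in-a-spin-wait advice looks like in practice. This is just a minimal sketch in C++ (the names and the atomic flag are mine, not Intel's): the pause intrinsic hints to the processor that the loop is a spin-wait, so it stops burning execution resources that the sibling logical processor could be using.

// Minimal sketch of a pause-friendly spin-wait loop (illustrative names only).
#include <atomic>
#include <thread>
#include <emmintrin.h>   // _mm_pause (SSE2)

std::atomic<bool> ready{false};

void wait_for_ready()
{
    // Spin until another thread sets 'ready'; pause each iteration so the
    // loop wastes as few shared execution resources as possible while waiting.
    while (!ready.load(std::memory_order_acquire))
        _mm_pause();
}

int main()
{
    std::thread waiter(wait_for_ready);
    // ... the main thread does its work here, then releases the waiter ...
    ready.store(true, std::memory_order_release);
    waiter.join();
    return 0;
}

If the wait can get long, the other items above apply instead: stop spinning and call into the OS so the idle thread's resources get freed up.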
<<
Read this: if we had hyper-threading now, the Pentium 4 might be slower than it already is. >>
That's a bit short-sighted....SMT on the P4 with poorly-written threaded programs may hurt performance, but the same is potentially true on an SMP or any other multiprogrammed system. InfoWorld did some testing and found (graphs at the bottom) a 19%-60% performance increase in web serving and SQL database transactions. Remember that SMT is still only officially available on Xeons, so the important thing is to look at enterprise performance, not desktop performance.
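To make "poorly-written" concrete: the classic way a threaded app hurts itself on SMT and SMP alike is false sharing, one of the things Intel warns about above. A rough, hypothetical sketch of the problem and the usual padding fix:

// Sketch of false sharing: two per-thread counters that land on the same
// cache line force that line to bounce between processors on every update.
#include <atomic>
#include <cstdint>
#include <thread>

struct Shared {
    // Both counters live on one cache line -> false sharing.
    std::atomic<uint64_t> counter_a{0};
    std::atomic<uint64_t> counter_b{0};
};

struct Padded {
    // Aligning each counter to 64 bytes gives each thread its own line.
    alignas(64) std::atomic<uint64_t> counter_a{0};
    alignas(64) std::atomic<uint64_t> counter_b{0};
};

template <typename T>
void hammer(T& s)
{
    std::thread t1([&] { for (int i = 0; i < 10000000; ++i) ++s.counter_a; });
    std::thread t2([&] { for (int i = 0; i < 10000000; ++i) ++s.counter_b; });
    t1.join();
    t2.join();
}

int main()
{
    Shared tight;    // typically noticeably slower on SMP/SMT machines
    Padded padded;   // each counter on its own cache line
    hammer(tight);
    hammer(padded);
    return 0;
}

On the unpadded version every increment drags the cache line back and forth between the two logical (or physical) processors; the padded version scales the way you'd expect.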
<<
Anyone briefly tell me what HyperThreading is/does? >>
The idea behind Simultaneous Multithreading (SMT; Hyper-Threading is Intel's name for the concept) is to keep multiple processor states on a single core and execute multiple threads at the same time to improve resource efficiency. Modern out-of-order superscalar MPUs attempt to fetch and issue multiple instructions at a time to multiple execution units, but due to memory latency, data hazards, branch hazards (and a number of other issues), the sustained throughput is less than the maximum number of instructions that could be issued and executed per cycle. By adding another processor state (defined by the program counter, general-purpose registers, stack pointer, memory limit registers, etc.), a second thread can be executed, and the two threads share resources to maximize efficiency. The holy grail of SMT was to be the Alpha EV8, an 8-way fetch/issue, aggressively out-of-order core with 4-way SMT, expected to yield a 2X performance increase in threaded apps (not to mention a significant single-thread performance increase over the EV6 & EV7, which are 4-way fetch superscalar). x86 MPUs, on the other hand, are 3-way superscalar, and the P4 has 2-way SMT...regardless of when we see it on the desktop P4, I think SMT in general is here to stay, and will only evolve and improve in its incarnations.
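From the software side, each extra processor state just shows up as another logical processor for the OS to schedule onto, which is why a well-threaded app can pick up the extra throughput without knowing anything about SMT. A minimal, purely illustrative sketch (the names are mine):

// Sketch: the OS exposes each SMT processor state as a logical processor,
// so a threaded app simply spawns one worker per reported CPU.
#include <iostream>
#include <thread>
#include <vector>

int main()
{
    // On a 2-way SMT P4/Xeon this reports twice the number of physical cores.
    unsigned logical_cpus = std::thread::hardware_concurrency();
    std::cout << "Logical processors: " << logical_cpus << '\n';

    std::vector<std::thread> workers;
    for (unsigned i = 0; i < logical_cpus; ++i)
        workers.emplace_back([i] {
            // ... per-thread work would go here ...
            std::cout << "worker " << i << " running\n";
        });

    for (auto& w : workers)
        w.join();
    return 0;
}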
BurntKooshie wrote an excellent article on SMT a while back.