NTMBK
Lifer
Their hand-crafting of the assembly is obviously not good enough or else it wouldn't show any improvements in hyperthreading over non-hyperthreading. It would probably be easier if the programmers coded to the actual RISC co-processors in the cores (they can't of course).
You're talking complete nonsense. As soon as you need to pull data in from memory, you're not going to keep a core's execution resources 100% busy with a single thread, however well the assembly is tuned. Apart from silly toy benchmarks which fit entirely in cache, your code is going to stall when it needs to go out to main memory to fetch data. Hyperthreading means that while one thread is stalled going out to main memory, the other thread can utilise the resources which would otherwise go idle.
Coding to the "RISC coprocessors" (I can only assume you mean the decoded micro-ops) would make no difference. If your algorithm has a dataset larger than the CPU's cache, you're going to stall on memory whichever instruction format you write it in.
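To put rough numbers on that, here's a toy cycle-by-cycle model in Python. It's only a sketch under made-up assumptions: each hardware thread alternates between a fixed burst of compute and a fixed-length cache miss, the core can issue work from one thread per cycle, and the function name `core_utilisation` and the 10/90 cycle figures are invented for illustration. Real cores are superscalar, out-of-order and prefetch, so treat the output as the shape of the argument, not a measurement.

```python
def core_utilisation(num_threads, compute_cycles, stall_cycles, total_cycles=10_000):
    """Fraction of cycles in which the (simplified) core does useful work.

    Assumed model, not real hardware: every thread runs `compute_cycles`
    of work, then waits `stall_cycles` on main memory, and repeats.
    Only one thread may issue work in any given cycle.
    """
    compute_left = [compute_cycles] * num_threads  # work left before the next miss
    stall_left = [0] * num_threads                 # cycles left on an outstanding miss
    busy = 0

    for _ in range(total_cycles):
        issued = False
        for t in range(num_threads):
            if stall_left[t] > 0:
                # Thread is waiting on memory; the miss makes progress regardless.
                stall_left[t] -= 1
            elif not issued:
                # First ready thread gets the execution resources this cycle.
                compute_left[t] -= 1
                issued = True
                if compute_left[t] == 0:
                    # Burst finished: take a cache miss and queue the next burst.
                    stall_left[t] = stall_cycles
                    compute_left[t] = compute_cycles
        busy += issued

    return busy / total_cycles


if __name__ == "__main__":
    # Hypothetical workload: 10 cycles of work per 90-cycle trip to main memory.
    print("1 thread, misses cache: ", core_utilisation(1, 10, 90))
    print("2 threads, misses cache:", core_utilisation(2, 10, 90))
    # Toy benchmark that fits in cache: no stalls at all.
    print("1 thread, fits in cache:", core_utilisation(1, 10, 0))
```

In this model the single thread leaves the core idle for most cycles no matter how tight its compute burst is, and the second thread simply fills some of that idle time. Only the fits-in-cache case keeps the core fully busy with one thread.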