Originally posted by: Phynaz
How are you reporting cpu usage, task manager?
Task manager in Windows (used only for the development IDE), top in RH5 Linux (X55xx), topas in AIX (P595 with 64 4.2GHz Power6 processors). The comment above was from an 8 core Linux box. But I agree that this doesn't necessarily show that HT isn't fully utilized. The fact that application throughput goes down (quite a lot) as soon as I have more threads than real cores is the best indicator that I am doing well at keeping the CPU pipeline full. A lot also depends on the compiler; this is where Intel's compiler really shines.
A nop loop does calculate a lot (assuming that the optimizer doesn't remove it; any C++ compiler worth using will do that). The CPU doesn't know that you aren't doing anything useful!
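To illustrate the optimizer point, here is a minimal sketch. The function names spin_plain and spin_volatile are made up for this example. The first loop has no side effects, so an optimizing compiler is allowed to delete it. The second uses a volatile counter, which forces every increment to actually happen, so it will keep a core busy and show up as CPU usage.

```cpp
#include <cstdint>

// Hypothetical name. Empty loop with no observable side effects:
// an optimizing compiler is free to remove it entirely.
inline void spin_plain(std::uint64_t n) {
    for (std::uint64_t i = 0; i < n; ++i) {
    }
}

// Hypothetical name. Reads and writes of a volatile object count as
// observable behavior, so the compiler has to keep every iteration.
inline std::uint64_t spin_volatile(std::uint64_t n) {
    volatile std::uint64_t i = 0;
    while (i < n) {
        i = i + 1;  // explicit read-modify-write of the volatile counter
    }
    return i;
}
```

Comparing the assembly for both at -O2 should show the difference, which is the kind of checking I mean.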
But I agree that to get this kind of behavior you have to, at a minimum, study the assembly (and fix it where needed) and program with some knowledge of the caching scheme (padding structs to fit cache lines, ...). Applications written in anything higher level than C++ need not apply.
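A rough sketch of what I mean by padding structs to fit cache lines. This assumes a 64-byte line, which is common on x86 but not universal, so check your target. The names kCacheLine, Counter and PaddedCounter are just for illustration, not from any real code base.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Other architectures may use a different size.
constexpr std::size_t kCacheLine = 64;

// Unpadded: in an array, several of these land on the same cache line,
// so threads updating neighboring counters fight over that line.
struct Counter {
    std::uint64_t value;
};

// Padded: alignas raises the alignment to one cache line, and sizeof is
// rounded up to a multiple of the alignment, so each element of an
// array gets a line to itself.
struct alignas(kCacheLine) PaddedCounter {
    std::uint64_t value;
};

static_assert(sizeof(Counter) == sizeof(std::uint64_t),
              "unpadded counter is just the integer");
static_assert(sizeof(PaddedCounter) == kCacheLine,
              "padded counter fills exactly one cache line");
static_assert(alignof(PaddedCounter) == kCacheLine,
              "padded counter starts on a cache line boundary");
```

With one PaddedCounter per thread in an array, each thread writes to its own line. The cost is memory: 64 bytes per counter instead of 8.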