Cerb
"resources like mem cache misses, data from network / hardware, waiting for multi-threaded program to hit checkpoint to sync..... OS/hardware stuff you can't optimize for in the code, that the OS can optimize via hyperthreading"
OSes have already done about all they can do to optimize for HT. The problem is that so much code either isn't waiting on any of that for long enough to matter, or HT can't make it any faster when it is. When the code isn't waiting, HT has a fair chance of reducing the performance of each thread. Shorter pipelines, wider ALUs and buses, more and larger buffers, etc., decrease the need for SMT to get good performance. It's better to make that sync operation take 100 cycles instead of 500 than it is to have another thread to execute while you wait.
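To put rough numbers on the 100-vs-500-cycle point, here's a back-of-the-envelope sketch. It's a toy model, not a measurement: the function name `core_utilization` and the 100 cycles of useful work between stalls are made up for illustration, and the second hardware thread is assumed to overlap perfectly with zero contention, which is the best case for SMT.

```python
def core_utilization(work_cycles, stall_cycles, hw_threads=1):
    """Fraction of cycles the core is doing useful work in a toy model.

    Each thread alternates `work_cycles` of compute with `stall_cycles` of
    waiting. Extra hardware threads are assumed to fill stall time perfectly
    and never slow each other down (an optimistic assumption for SMT).
    """
    per_thread = work_cycles / (work_cycles + stall_cycles)
    # The core cannot be more than 100% busy, however many threads it has.
    return min(1.0, hw_threads * per_thread)


WORK = 100  # assumed cycles of useful work between stalls (illustrative only)

slow_stall_one_thread = core_utilization(WORK, 500, hw_threads=1)
slow_stall_two_threads = core_utilization(WORK, 500, hw_threads=2)
fast_stall_one_thread = core_utilization(WORK, 100, hw_threads=1)

print(f"500-cycle stall, 1 thread:  {slow_stall_one_thread:.2f}")
print(f"500-cycle stall, 2 threads: {slow_stall_two_threads:.2f}")
print(f"100-cycle stall, 1 thread:  {fast_stall_one_thread:.2f}")
```

Under those assumptions the core is busy about 17% of the time with one thread and a 500-cycle stall, about 33% with a second thread added, and 50% with one thread and a 100-cycle stall. The shorter stall also speeds up the single thread itself (200 cycles per iteration instead of 600), which the second thread never does.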
Today, a bad case for HT usually means performance that's nearly the same as with HT off, so it's not the big deal it once was. But it still can't do much good, if any, when there isn't actually work for the other thread to perform, especially when applications are fairly limited in their thread scaling. As long as our CPUs are designed to let a single thread use most of each core (POWER, for instance, has been moving towards cores too wide for any one thread, plus lots of SMT), SMT will be a fragile performance-enhancing option: +80% here (it happens), +20% there, 0% over yonder, and then plain screwing up QoS in the back room. That's why it's not on every CPU, and why it can be turned off.
