1. Microsoft is the problem with multithreading performance. The OS should be handling those tasks. The application should be oblivious to the core count. Look at the BeOS and Haiku OS websites.
Applications are oblivious to the core count. They spawn threads (they do not request cores or anything like that), and the OS caters to their needs by scheduling CPU time for those threads. Now that CPUs have multiple cores, the threads are scheduled onto whichever cores are available, as is ideal. I can spawn 16 threads in a program I create, and those 16 threads will be handled by the OS even if I only have a quad core, or even a single-core CPU. It's basic multitasking, and OSes have that down pat. But if I create a program that only ever uses one thread, then for that program my quad core will perform no faster than if it were a single-core CPU.
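To make that concrete, here is a minimal sketch in Python (my choice of language for illustration; `run_workers` and `worker` are made-up names). The program asks for 16 threads and never mentions cores; the OS decides when and where each one runs. In CPython these are real OS threads, although the interpreter lock limits how much Python code actually runs at the same instant, so treat this as a demonstration of the scheduling model, not of speedup.

```python
import os
import threading


def run_workers(thread_count=16):
    """Spawn thread_count threads and return the ids of the workers that ran."""
    finished = []
    lock = threading.Lock()

    def worker(worker_id):
        # Do a little work, then record that this worker completed.
        _ = sum(range(10_000))
        with lock:
            finished.append(worker_id)

    # The program only says "I want this many threads"; it never asks for cores.
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(finished)


if __name__ == "__main__":
    print("logical CPUs reported by the OS:", os.cpu_count())
    print("workers that completed:", run_workers(16))
```

All 16 workers complete whether the machine reports 1, 4 or 64 logical CPUs; only the time it takes changes.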
The problem is that applications don't request/spawn/need many threads at all when the processing needs are serial in nature. In fact, in such a scenario, they can only really use one. That is not Microsoft's fault.
It is impossible for the OS to "multi-thread" an application that only ever works on a single thread. The OS has no way to transform a serial workload into a parallel one, especially for a program it knows nothing about. At least, not by non-magic means; if someone actually did it without magic, that would be a major breakthrough in parallel / multi-threaded programming. I would certainly want in on that, because as it is now, I have to go the painstaking route of optimizing my programs to use multiple threads. It is no easy task figuring out the best way to parallelize as much as possible of what used to be, or naturally is, a serial program, if it is possible at all. Sometimes the program is simply 90% serial, and parallelizing it is impossible or impractical given the cost (added code complexity, which drives up development, debugging and maintenance costs) versus the gain.
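For a rough feel of the "90% serial" case, the usual back-of-the-envelope tool is Amdahl's law, which I am bringing in here as an illustration: speedup = 1 / (s + (1 - s) / n), where s is the serial fraction and n is the number of cores. A small sketch (the function name `amdahl_speedup` is my own):

```python
def amdahl_speedup(serial_fraction, cores):
    """Ideal speedup predicted by Amdahl's law, ignoring threading overhead."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part takes the same time no matter how many cores exist;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    for cores in (1, 2, 4, 16):
        print(cores, "cores:", round(amdahl_speedup(0.9, cores), 3))
```

With a 90% serial program, four cores give a best case of roughly 1.08x, and no number of cores can push it past about 1.11x. That is the gain you weigh against the extra code complexity.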
2. They are attacking both fronts. They are implementing SMT ("HT") in hardware vs. in a quasi emulated core sense. It should be faster, all things considered.
I do not know where to start here. Calling Intel's HT implementation a "quasi emulated core" just makes me wonder whether you actually understand the topic (and simply like calling things what they aren't), or you don't (hence the nonsensical descriptions).
3. I am a horrible speller and I can't type at 10% of the speed I think. I could read what I wrote twice and not see the missing words. If you don't like my posts, don't reply.
Sorry, perhaps I should not have brought it up. I hope you are not mad.
I give up on this topic. For one thing, everything we are talking about now is actually off-topic. The real place for this is another thread (and for #1, that thread belongs in the Programming or OS subforums). So have your say if you please, then we'll let it go so as not to continue the derailment.