I'm no expert but I don't think that process switching takes very much time. It happens every time you hit a key or move the mouse, and on real multi-user systems it actually provides the illusion of a dedicated, uninterrupted CPU to many users at the same time (many switches per second). All it would involve is swapping register contents and the OS making a change to a few process queues.
Windows task scheduler operates at thread rather than process level. For example, there are around 490 threads in various states of execution on my system at the moment. The scheduler allocates CPU time to each thread based on the thread priority, thread state, and also your interactions (threads from processes you interact with receive a priority 'boost' if you're at the keyboard!). ALL threads receive attention from the thread scheduler - NOT just those of the interactive application.
Think of a thread scheduler on a single CPU like a guy doing plate spinning with his right hand only (490 plates in my case!)... Context-switching is like the plate spinner moving his hand to apply more 'spins' to the plates.
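To make the "priority, state, and boost" idea concrete, here is a toy sketch of picking the next thread to run. It is NOT how the Windows scheduler is actually implemented; the Thread class, the FOREGROUND_BOOST value and the pick_next function are all made up for illustration.

```python
from dataclasses import dataclass

FOREGROUND_BOOST = 2  # made-up value, purely illustrative


@dataclass
class Thread:
    name: str
    base_priority: int        # higher number = more urgent
    ready: bool               # False means the thread is blocked/waiting
    foreground: bool = False  # belongs to the process the user is interacting with


def effective_priority(t):
    """Base priority plus a small boost for foreground threads."""
    return t.base_priority + (FOREGROUND_BOOST if t.foreground else 0)


def pick_next(threads):
    """Return the ready thread with the highest effective priority, or None."""
    ready = [t for t in threads if t.ready]
    if not ready:
        return None
    return max(ready, key=effective_priority)


threads = [
    Thread("backup", 8, True),
    Thread("editor", 7, True, foreground=True),
    Thread("printer", 10, False),  # highest priority, but blocked
]
print(pick_next(threads).name)  # editor
```

The blocked thread is skipped no matter how high its priority is, and the boost is what lets the foreground "editor" thread jump ahead of "backup".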
On an MP machine (remember Win2K thinks HT processors are multiple processors) the thread scheduler tries to distribute thread execution between multiple CPUs - this is pretty tricky.
A true MP machine is like our plate spinner guy using two hands to spin plates.
When a thread begins execution on a CPU the scheduler will generally try to keep it executing on that same CPU unless something major happens - like the scheduler determining that moving the thread to another less-utilized CPU is 'cheaper' than waiting for the current CPU.
Moving a thread executing on one CPU to another is very expensive in relative terms - hundreds of clock cycles wasted. It's not just a register swap - the OS has to 'package' the thread state to move it, and doing that throws away all the cache and branch-prediction work the processor did.
Moving a thread between processors is like our plate spinner taking time out to pick up a plate, pole 'n' all, and physically move it from one side of his body to the other.
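The "is moving cheaper than waiting?" decision can be sketched as a simple cost comparison. This is only a toy model under my own assumptions: the choose_cpu function, the wait estimates and the 300-cycle MIGRATION_COST are invented for the example, not taken from any real scheduler.

```python
MIGRATION_COST = 300  # hypothetical cycles lost to cold caches, branch predictors, etc.


def choose_cpu(current_cpu, wait_cycles, migration_cost=MIGRATION_COST):
    """Pick the CPU a waiting thread should run on.

    wait_cycles maps cpu id -> estimated cycles until that CPU is free.
    The thread stays where it is unless another CPU is cheaper even after
    paying the migration cost.
    """
    best_cpu = current_cpu
    best_cost = wait_cycles[current_cpu]
    for cpu, wait in wait_cycles.items():
        if cpu == current_cpu:
            continue
        cost = wait + migration_cost
        if cost < best_cost:
            best_cpu, best_cost = cpu, cost
    return best_cpu


# Short wait on the current CPU: staying put beats paying to migrate.
print(choose_cpu(0, {0: 200, 1: 0}))     # 0
# Long wait on the current CPU: the move is worth it.
print(choose_cpu(0, {0: 1000, 1: 100}))  # 1
```

The point is the bias toward staying: an idle CPU next door is not enough on its own, it has to be idle enough to pay back the cost of the move.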
On an HT machine with an unaware scheduler, e.g. Win2K, if CPU0 becomes CPU bound, e.g. in expensive long-running computations, the scheduler is mistaken in thinking that moving threads from CPU0 to CPU1 may be cheaper than waiting. In truth CPU0 and CPU1 are one and the same thing - both are CPU bound - only now you also pay the price of stalling the processor while you context switch. In intensive CPU-bound operations, thread context-switching (thrash) can end up taking more CPU time than actual thread execution.
This is our plate spinner trying to spin two loads of plates with only one hand, thinking that moving plates will make it easier.
Also, as far as scheduling processes to share an HT-based multiprocessor system, wouldn't that sort of prediction require an examination of the program code to see which instructions it's going to execute? Seems to me that's just not feasible; it'd be faster to run the instruction than to decide which CPU resources it'll take.
To make the scheduler HT-aware, the method the scheduler uses to determine whether or not to move blocked threads between processors is changed. Where a thread is waiting on a busy logical processor, the scheduler won't move the thread to another logical processor on the same physical CPU. That prevents the context-thrash path from occurring.
Our plate spinner realizes that he has only one hand and makes the best of a bad situation.
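Here is a rough sketch of what "HT-aware" could mean in code: when deciding whether to move a waiting thread, skip any logical processor that lives on the same physical CPU. Everything here is an assumption for illustration (the choose_cpu_ht_aware function, the core_of mapping, the 300-cycle migration cost); real schedulers are far more involved.

```python
def choose_cpu_ht_aware(current_cpu, wait_cycles, core_of, migration_cost=300):
    """Pick a logical CPU for a waiting thread, ignoring HT siblings.

    wait_cycles maps logical cpu id -> estimated cycles until it is free.
    core_of maps logical cpu id -> physical core id (hypothetical mapping).
    """
    best_cpu = current_cpu
    best_cost = wait_cycles[current_cpu]
    for cpu, wait in wait_cycles.items():
        if core_of[cpu] == core_of[current_cpu]:
            continue  # same physical core: moving buys nothing, only thrash
        cost = wait + migration_cost
        if cost < best_cost:
            best_cpu, best_cost = cpu, cost
    return best_cpu


# One physical core exposing two logical CPUs: the thread stays put,
# even though logical CPU 1 looks idle.
print(choose_cpu_ht_aware(0, {0: 1000, 1: 0}, {0: 0, 1: 0}))  # 0

# Two physical cores, two logical CPUs each: only the other core is a candidate.
core_of = {0: 0, 1: 0, 2: 1, 3: 1}
print(choose_cpu_ht_aware(0, {0: 1000, 1: 0, 2: 100, 3: 500}, core_of))  # 2
```

Note that this never needs to look at the program's instructions - it only needs to know which logical processors share a physical CPU.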
This of course grossly oversimplifies a few decades of research and development on SMP systems.
