hyperthreading

nutxo · Mar 2, 2004

I have it turned on in my BIOS for my P4C800-E Deluxe but Task Manager only shows one CPU. Am I doing something wrong?

OutHouse · Mar 3, 2004

i could be wrong, but do you have to have a xeon proc to do hyperthreading?

n0cmonkey · Mar 3, 2004

Originally posted by: Citrix
i could be wrong, but do you have to have a xeon proc to do hyperthreading?

You're wrong. Newer p4s support hyperthreading too.

Check your HAL. Don't ask me how, but you might have a uniprocessor HAL and it might require an SMP HAL (or an HT HAL?). Or something.

Nothinman · Mar 3, 2004

Yes, you need an SMP HAL, probably the ACPI SMP HAL. But remember that if you pick the wrong HAL, or one that doesn't work with your motherboard, you won't be able to boot up again to fix it.
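As a cross-check on what Task Manager shows, a small script can ask the OS how many logical CPUs it sees. This is only a sketch: `logical_cpu_count` is a helper name made up for this example, and it relies on Python's standard `os.cpu_count()`. On a single HT-enabled P4 whose OS is using an SMP HAL, you would expect it to report 2; with a uniprocessor HAL, 1.

```python
import os


def logical_cpu_count() -> int:
    """Return the number of logical CPUs the OS reports, or 0 if unknown."""
    # os.cpu_count() returns None when the count cannot be determined.
    return os.cpu_count() or 0


if __name__ == "__main__":
    print(f"OS reports {logical_cpu_count()} logical CPU(s)")
```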

Unforgiven · Mar 3, 2004

what operating system are you using?

darktubbly · Mar 3, 2004

Link 1
Link 2

Try booting from your XP CD and do a repair installation, making sure to pick the ACPI Multiprocessor HAL. Also, you have SP1, right?

nutxo · Mar 3, 2004

had to reinstall the OS, thanks for all the suggestions

Sianath · Mar 4, 2004

If you are using Windows 2000, it doesn't matter whether you have it enabled or not, Windows 2000 doesn't support hyper-threading. It will ignore your extra processors.

spyordie007 · Mar 4, 2004

Originally posted by: Sianath
If you are using Windows 2000, it doesn't matter whether you have it enabled or not, Windows 2000 doesn't support hyper-threading. It will ignore your extra processors.

It won't ignore them, it will just think that you have separate physical CPUs (as opposed to a single physical CPU containing multiple logical CPUs).

Sianath · Mar 4, 2004

My wording was poor, sorry. What I meant was that Windows 2000 will not take advantage of a hyperthreaded machine the same way that an XP or a 2003 machine will.

Kadarin · Mar 4, 2004

Originally posted by: Sianath
My wording was poor, sorry. What I meant was that Windows 2000 will not take advantage of a hyperthreaded machine the same way that an XP or a 2003 machine will.

True, in that you can run into licensing issues (i.e. try to install Win2K on a dual-proc machine with HT enabled). However, I do not know what if any performance difference there will be on a single HT P4 between XP and Win2K.

Sianath · Mar 4, 2004

Windows 2000 doesn't know the difference between physical and logical processors, and the later OS's do.
- http://www.microsoft.com/windows2000/docs/hyperthreading.doc

It's a small difference if you never come close to maxing out the processor support for your OS, but it can make a drastic difference when you can afford to get the correct number of physical processors to max out your OS.

That's all I meant.
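The processor-limit point can be illustrated with a little arithmetic. This is a rough sketch, not how either OS is actually implemented: the function `usable_cpus`, the limit of 2, and the assumption that the firmware enumerates logical CPUs package by package are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each logical CPU is (physical_package_id, logical_index_within_package).
LogicalCpu = Tuple[int, int]


def usable_cpus(cpus: List[LogicalCpu], limit: int, ht_aware: bool) -> List[LogicalCpu]:
    """Return the logical CPUs an OS would use under a processor limit.

    ht_aware=False: every logical CPU counts against the limit.
    ht_aware=True:  only physical packages count against the limit.
    """
    if not ht_aware:
        return cpus[:limit]
    allowed = []
    for pkg, _ in cpus:
        if pkg not in allowed:
            allowed.append(pkg)
    allowed = allowed[:limit]
    return [c for c in cpus if c[0] in allowed]


# Hypothetical dual-processor box with HT on: 2 packages x 2 logical CPUs,
# assumed to be enumerated package by package, on an OS limited to 2 processors.
machine = [(0, 0), (0, 1), (1, 0), (1, 1)]
unaware = usable_cpus(machine, limit=2, ht_aware=False)
aware = usable_cpus(machine, limit=2, ht_aware=True)
print(len(unaware), len({p for p, _ in unaware}))  # 2 logical CPUs on 1 package
print(len(aware), len({p for p, _ in aware}))      # 4 logical CPUs on 2 packages
```

Under these assumptions the unaware OS spends its whole limit on the two halves of one physical processor and leaves the second one idle, which is the "drastic difference" case.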

boran · Mar 5, 2004

Originally posted by: Astaroth33

Originally posted by: Sianath
My wording was poor, sorry. What I meant was that Windows 2000 will not take advantage of a hyperthreaded machine the same way that an XP or a 2003 machine will.

True, in that you can run into licensing issues (i.e. try to install Win2K on a dual-proc machine with HT enabled). However, I do not know what if any performance difference there will be on a single HT P4 between XP and Win2K.

well, a hyperthread-aware OS can handle the thread management better. say you got a processor with one FPU and one integer unit (I'm making this stuff up, I'm no CPU designer). if yer OS thinks that one CPU is two CPUs it will maybe assign one FP operation to CPU1 and another to CPU2, whereas it would be wiser to assign em both to CPU1 and let virtual CPU2 handle an INT calculation at the same time. I dunno exactly, but I thought that was the main reason why win2K performs worse with HT than XP does with HT.

maybe someone more technical has a more in-depth explanation.

stephbu · Mar 5, 2004

Boran - your explanation is close enough to land in the ball park - HT processors simulate 2 logical CPUs by dividing the resources (like cache, FP, and Int execution units) of a larger single physical CPU. They work on the premise that maximizing utilization of these resources will improve CPU throughput. (sometimes true, sometimes not)

Where threads contend for the same resource they stop/block execution - something that isn't good when you have several hundred threads running. Especially since switching contexts to start work on a different thread is very expensive too. HT-aware schedulers are generally smarter in that they understand the principles behind the logical processor's contention and can schedule threads such that they can contend less. HT especially becomes an issue if your application performance is CPU-bound.

XP & Win2K3 both have HT-aware schedulers, Win2K doesn't.
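One way to picture "schedule threads such that they contend less" is a placement rule that prefers a physical CPU whose logical CPUs are all idle. The sketch below is a toy model under that assumption, not the actual Windows scheduler; `pick_cpu`, the `siblings` map, and the CPU numbering are invented for illustration.

```python
from typing import Dict, List, Optional, Set


def pick_cpu(siblings: Dict[int, List[int]], busy: Set[int]) -> Optional[int]:
    """Pick an idle logical CPU, preferring a fully idle physical CPU.

    siblings maps a physical CPU id to its logical CPU ids.
    busy is the set of logical CPU ids currently running a thread.
    """
    partly_idle = None
    for logical_ids in siblings.values():
        idle = [c for c in logical_ids if c not in busy]
        if len(idle) == len(logical_ids):
            return idle[0]  # whole physical CPU is idle: no sibling to contend with
        if idle and partly_idle is None:
            partly_idle = idle[0]  # fallback: shares execution units with a busy sibling
    return partly_idle


# Hypothetical dual-processor HT box: logical CPUs 0,1 and 2,3 are siblings.
topology = {0: [0, 1], 1: [2, 3]}
print(pick_cpu(topology, busy={0}))     # 2: avoids sharing a core with CPU 0
print(pick_cpu(topology, busy={0, 2}))  # 1: every core is partly busy, take what is left
```

A scheduler that treats all four logical CPUs as equals could just as easily pick CPU 1 in the first case, putting two threads on one physical processor while the other sits idle.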

Kadarin · Mar 5, 2004

Thanks for the info, guys!

kamper · Mar 6, 2004

Originally posted by: stephbu
Boran - your explanation is close enough to land in the ball park - HT processors simulate 2 logical CPUs by dividing the resources (like cache, FP, and Int execution units) of a larger single physical CPU. They work on the premise that maximizing utilization of these resources will improve CPU throughput. (sometimes true, sometimes not)

Where threads contend for the same resource they stop/block execution - something that isn't good when you have several hundred threads running. Especially since switching contexts to start work on a different thread is very expensive too. HT-aware schedulers are generally smarter in that they understand the principles behind the logical processor's contention and can schedule threads such that they can contend less. HT especially becomes an issue if your application performance is CPU-bound.

XP & Win2K3 both have HT-aware schedulers, Win2K doesn't.

Could you expand on this a bit stephbu? I'm no expert but I don't think that process switching takes very much time. It happens every time you hit a key or move the mouse, and on real multi-user systems it actually provides the illusion of a dedicated, uninterrupted CPU to many users at the same time (many switches per second). All it would involve is a swapping of register contents and the OS making a change to a few process queues.

Also, as far as scheduling processes to share an HT-based multiprocessor system, wouldn't that sort of prediction require an examination of the program code to see which instructions it's going to execute? Seems to me that's just not feasible, it'd be faster to run the instruction than to decide which CPU resources it'll take.

This is just what I think, if I'm wrong I'd love for you to explain to me what's right.

stephbu · Mar 6, 2004

I'm no expert but I don't think that process switching takes very much time. It happens every time you hit a key or move the mouse, and on real multi-user systems it actually provides the illusion of a dedicated, uninterrupted CPU to many users at the same time (many switches per second). All it would involve is a swapping of register contents and the OS making a change to a few process queues.

The Windows task scheduler operates at thread rather than process level. For example, there are around 490 threads in various states of execution on my system at the moment. The scheduler allocates CPU time to each thread based on the thread priority, thread state, and also your interactions (threads from processes you interact with receive a priority 'boost' if you're at the keyboard!) ALL threads receive attention from the thread scheduler - NOT just those of the interactive application.

Think of a thread scheduler on a single CPU like a guy doing plate spinning with his right hand only (490 plates in my case!)... Context-switching is like the plate spinner moving his hand to apply more 'spins' to the plates.

On an MP machine (remember Win2K thinks HT processors are multiple processors) - the thread scheduler tries to distribute thread execution between multiple CPU's - this is pretty tricky.

A true MP machine is like our plate spinner guy using two hands to spin plates

When a thread begins execution on a CPU the scheduler will generally try to keep it executing on that same CPU unless something major happens - like the scheduler determining that moving the thread to another less-utilized CPU is 'cheaper' than waiting for the current CPU.

Moving a thread executing in one CPU to another is very expensive in relative terms - hundreds of clock cycles wasted - no register swaps - the OS has to 'package' the thread-state to move it, doing that discards all the branch prediction and cache work that the processor did.

Moving a thread between processors is like our plate spinner taking time out to pick up a plate, pole 'n' all, and physically move it from one side of his body to the other.

On an HT machine with an un-aware scheduler, e.g. Win2K, if CPU0 becomes processor-bound, e.g. in expensive long-running computations, the scheduler is mistaken in thinking that moving threads from CPU0 to CPU1 may be cheaper than waiting. In truth CPU0 and CPU1 are one and the same thing - both are CPU-bound - only now you also pay the price of stalling the processor while you try context switching. In intensive CPU-bound operations, thread context-switching (thrash) frequently takes more CPU time than actual thread execution.

This is our plate spinner trying to spin two loads of plates with only one hand, thinking that moving plates will make it easier.

Also, as far as scheduling processes to share an HT-based multiprocessor system, wouldn't that sort of prediction require an examination of the program code to see which instructions it's going to execute? Seems to me that's just not feasible, it'd be faster to run the instruction than to decide which CPU resources it'll take.

To make the scheduler HT-aware - the method the scheduler uses to determine whether or not to move blocked threads between processors is changed. Where a thread is waiting on a blocked logical processor the scheduler won't move the thread to another logical processor on the same CPU. It prevents that context-thrash path from occurring.

Our plate spinner realizes that he has only one hand and makes the best of a bad situation.

This of course grossly oversimplifies a few decades of research and development on SMP systems.
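The migrate-or-wait decision described above can be written down as a toy cost comparison. Everything here is an assumption made for illustration: `should_migrate`, the cycle estimates, and the `package_of` map are invented, and the real scheduler's heuristics are certainly more involved.

```python
from typing import Dict


def should_migrate(src: int, dst: int, wait_cost: Dict[int, int],
                   package_of: Dict[int, int], migrate_cost: int,
                   ht_aware: bool) -> bool:
    """Decide whether to move a waiting thread from logical CPU src to dst.

    wait_cost[c] is a made-up estimate (in cycles) of how long the thread
    would wait on logical CPU c; migrate_cost is the assumed price of moving.
    """
    if ht_aware and package_of[src] == package_of[dst]:
        return False  # siblings share one physical CPU: moving cannot help
    return wait_cost[dst] + migrate_cost < wait_cost[src]


# Single HT P4: logical CPUs 0 and 1 live on the same physical package.
# The scheduler's bookkeeping says CPU 1 looks idle while CPU 0 is swamped.
single_p4 = {0: 0, 1: 0}
estimates = {0: 5000, 1: 0}
print(should_migrate(0, 1, estimates, single_p4, 500, ht_aware=False))  # True
print(should_migrate(0, 1, estimates, single_p4, 500, ht_aware=True))   # False

# Two real physical CPUs: the move is still allowed when it looks cheaper.
dual_cpu = {0: 0, 1: 1}
print(should_migrate(0, 1, estimates, dual_cpu, 500, ht_aware=True))    # True
```

The only difference between the two modes is the sibling check on the first line of the function, which is the "one hand" realization in the plate-spinner picture.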

kamper · Mar 6, 2004

Excellent explanation, thank you. It was definitely your point about the cost of switching processes between processors that I didn't understand before.

Our plate spinner realizes that he has only one hand and makes the best of a bad situation.

That sums it up well for me.

stephbu · Mar 7, 2004

Thanks Kamper you're welcome - as an addendum, Pentium 4 performance depends very much on speculative branch prediction/execution and trace/L1 cache hits to keep its execution pipeline filled with work. Context switching can damage or destroy the work that the processor does for these features. To get optimal performance every stage of the execution pipeline needs to be busy doing something.

Tom's Hardware has some pretty in-depth analysis of the Pentium 4 execution core and execution pipeline -
http://www.tomshardware.com/cpu/20001120/p4-09.html#hyper_pipeline It's a little out of date now, the latest P4s have a longer pipeline (if I remember right), but nonetheless it gives you a flavour of how the system works.
