Which apps are actually slowed by hyper-threading?


Verdant

Member
May 8, 2003
83
0
0
Perhaps you should buy an HT-enabled part and test it with and without HT on. But really, you are looking at (in my estimate) a maximum slowdown of 1-2%.
 

Ken90630

Golden Member
Mar 6, 2004
1,571
2
81
Chris: I appreciate your comments, but are you 100% sure about HT having been designed to compensate for a longer pipeline in P4s? This is the first time I've heard of that. In theory, can't HT only speed up execution with more than one thread running? If so, that's a separate issue entirely from pipeline stages and execution with single threaded apps (which are what's gonna be run most of the time by people anyway).

When you say, "By making it handle multiple concurrent threads, they can recover the performance and make up for the low IPC of the processor, to an extent," that throws me a bit because it's not recovering any performance except in multi-threaded apps and/or apps architected to utilize SMP. The rest of the time, with single-threaded apps, HT is not being utilized and thus the longer pipeline would receive no such 'assistance' -- it's just gonna be a longer pipeline, period, and theoretically could only be enhanced by higher clock speeds in a P4. I understand branch prediction misses, algorithms, and the relative merits (or lack thereof) of AMD's pipelines vs. Intel's, etc., but I can't see how a technology like HT can help compensate for a longer pipeline if it's only being utilized a small percentage of the time when multi-threaded tasks are being performed.

I'm not saying you're wrong -- I just don't see how you're arriving at your theory. Am I missing something here? Clue me in, my friend! :)

Oh, and Verdant, unfortunately I don't have any way to test chips. I'm just a user, not a builder or tester. Wish I could test a HT chip w/Photoshop CS though!

Ken
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
I thought Prescott's cache increase was to hide the effects of its longer pipeline.
 

chsh1ca

Golden Member
Feb 17, 2003
1,179
0
0
Originally posted by: Ken90630
Chris: I appreciate your comments, but are you 100% sure about HT having been designed to compensate for a longer pipeline in P4s? This is the first time I've heard of that. In theory, can't HT only speed up execution with more than one thread running? If so, that's a separate issue entirely from pipeline stages and execution with single threaded apps (which are what's gonna be run most of the time by people anyway).
I don't know the motivation behind it, but given how effective it is at that job, I'd be surprised if that WASN'T the intent of its design. You are correct that HT can only speed up execution with more than one thread running. But your modern desktop will have dozens of threads running at any given time.

When you say, "By making it handle multiple concurrent threads, they can recover the performance and make up for the low IPC of the processor, to an extent," that throws me a bit because it's not recovering any performance except in multi-threaded apps and/or apps architected to utilize SMP.
Not true. By allowing two threads to execute concurrently, HT lets instructions from other threads (operating system threads, driver threads, etc.) run alongside the application's own. As an example, certain Creative Labs Sound Blaster cards have relatively notorious performance issues (they put a bit more of a load on the CPU than they probably really need to). On HT-enabled processors, games could actually run smoother because driver-related threads would be able to execute in tandem with the game thread, instead of the OS swapping the threads in and out of the processor's pipe as needed.

The rest of the time, with single-threaded apps, HT is not being utilized and thus the longer pipeline would receive no such 'assistance' -- it's just gonna be a longer pipeline, period, and theoretically could only be enhanced by higher clock speeds in a P4.
Multiprocessing in general affects more than just multithreaded applications. Further to your second issue, yes, the higher clock speeds help the P4 much more for intensive single-threaded applications.

I understand branch prediction misses, algorithms, and the relative merits (or lack thereof) of AMD's pipelines vs. Intel's, etc., but I can't see how a technology like HT can help compensate for a longer pipeline if it's only being utilized a small percentage of the time when multi-threaded tasks are being performed.
It helps compensate for the lower IPC by allowing the use of concurrent threads. If you miss on one thread, the other is still executing, so the loss in processing time is lessened significantly. It's not a perfect Multiprocessing solution, but it certainly does make inroads into performance increases.

Anandtech's article on it discusses it in pretty good detail: http://www.anandtech.com/cpu/showdoc.html?i=1576&p=1
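
To make the "if you miss on one thread, the other is still executing" argument concrete, here is a minimal sketch of a toy model. It is not a description of how the P4 actually works. It assumes a core that issues one instruction per cycle, a made-up `MISS_PENALTY` of 20 cycles, and hypothetical helper names (`make_thread`, `run_group`, `run`) invented for this illustration. With one hardware context the two threads run back to back and every stall is dead time. With two contexts, one thread can issue while the other waits.

```python
MISS_PENALTY = 20  # assumed stall length in cycles; illustrative only


def make_thread(length, miss_every):
    """Build a thread as a list of booleans; True marks an instruction that misses."""
    return [(k + 1) % miss_every == 0 for k in range(length)]


def run_group(group):
    """Cycles needed for the threads in `group` to share one single-issue core."""
    pos = [0] * len(group)        # next instruction index per thread
    ready_at = [0] * len(group)   # first cycle each thread may issue again
    cycle = 0
    while any(pos[i] < len(group[i]) for i in range(len(group))):
        for i in range(len(group)):
            if pos[i] < len(group[i]) and ready_at[i] <= cycle:
                missed = group[i][pos[i]]
                pos[i] += 1
                ready_at[i] = cycle + 1 + (MISS_PENALTY if missed else 0)
                break  # only one instruction issues per cycle
        cycle += 1
    return max(ready_at)  # include the stall after the final instruction


def run(threads, contexts):
    """Total cycles when at most `contexts` threads are resident on the core at once."""
    pending = [list(t) for t in threads]
    total = 0
    while pending:
        total += run_group(pending[:contexts])
        pending = pending[contexts:]
    return total


thread = make_thread(10, miss_every=5)
serial = run([thread, thread], contexts=1)
smt = run([thread, thread], contexts=2)
print(f"one context: {serial} cycles, two contexts: {smt} cycles")
```

In this model the two-context run finishes in fewer cycles because part of each stall overlaps with useful work from the other thread. A single thread gains nothing from the second context, which matches the point that HT only helps when more than one thread is runnable.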



 

rogue1979

Diamond Member
Mar 14, 2001
3,062
0
0
OK, I just can't sit still any longer, gotta jump in here.

I have sitting in front of me an Athlon 2500+ at stock speeds (333MHz fsb) on a Via KT600+ and a P4 2.8GHz HT (800MHz fsb) on an 865 chipset. I am not going to do the Duvie style (no offense intended) scientific benchmarks. Now we all know that the P4 is gonna take most of the usual benchmarks, but I can't emphasize this one point enough. Working in Microsoft Office, and just generally opening desktop applications, the Athlon system is faster and more responsive.

I have noticed this several times when working with decently fast Athlon and P4 systems at the same time. Now I understand that the P4 is gonna benchmark higher, but for casual desktop work it just doesn't feel as fast.
 

Ken90630

Golden Member
Mar 6, 2004
1,571
2
81
Okay, I'm gonna try to keep this shorter than my earlier epic posts, but I do need to continue this a bit further. I see certain aspects of this more clearly now, but other aspects have become fuzzier ....

First, I probably should have mentioned in the beginning that the 2.8GHz chip I'm interested in is a Northwood, not a Prescott. (Does Prescott even go down to 2.8GHz? I think it does but am not positive.)

Rogue 1979: Great post. You're giving me the kind of feedback I want, 'cuz I'm only gonna be running the Adobe Creative Suite (Photoshop, Illustrator, InDesign and Acrobat), Microsoft Office 2000 Premium, and maybe -- maybe -- some music software in the not-too-distant future on this computer. That's it. Nothing else. Zero. Zilch. Nada. I don't care about gaming performance, or video encoding performance, or SETI (whatever that is -- anyone wanna clue me in?) or even Web surfing (I do that on another computer).

That said, is your 2.8GHz P4 a Northwood or Prescott? If it's a Prescott, I can totally see how the Athlon would beat it in office programs due to the longer pipeline and missed-prediction stalls of the P4 when working with branchy-code-heavy apps like Office. If your 2.8 is a Northwood, I would expect the results to be closer but I can still see how even it would be slower (even though its clock speed is higher than the Athlon's). The Northwood's pipeline is still pretty long. You didn't mention, by the way, if both machines have the same amount of RAM.

Okay, now on to Chris: More great info (uh, I think!), and I'm gonna check out the link you provided to the showdown before I discuss too much of what you said in your post. I do wanna say, however, that I was under the impression from the outset -- mainly from the info about Hyper-Threading on Intel's Web site -- that HT only divides a CPU's resources into two concurrent threads at a time. No more than two. If that's the case, then when you say that, "Your modern desktop will have dozens of threads running at any given time," how can that really matter? If that's right, then HT would be "turned on" and running two threads all the time, but only two of the "dozens" would be benefiting from HT. As far as I know, no chip can split its resources into more than 2 threads and run "dozens" of threads at a time.
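
On the "two at a time vs. dozens" question, a rough sketch may help. The operating system time-slices many software threads across however many logical CPUs the chip exposes, so HT does not need to hold dozens of threads at once. The round-robin model below is an assumption made for illustration (the `schedule` function and the workload are hypothetical), and it is idealized: it treats each logical CPU as a full processor, whereas the two HT logical CPUs share one core's execution units, so real gains are much smaller than the doubling shown here.

```python
from collections import deque


def schedule(work, logical_cpus):
    """Round-robin sketch: each tick, up to `logical_cpus` threads run one time slice.

    `work` maps a thread name to the number of slices it needs.
    Returns (total ticks, {thread name: tick at which it finished}).
    """
    queue = deque(work.items())
    ticks = 0
    finished = {}
    while queue:
        ticks += 1
        running = [queue.popleft() for _ in range(min(logical_cpus, len(queue)))]
        for name, remaining in running:
            if remaining > 1:
                queue.append((name, remaining - 1))  # back of the line
            else:
                finished[name] = ticks
    return ticks, finished


# A dozen runnable threads (app, OS, drivers), each needing three slices.
work = {f"thread{i}": 3 for i in range(12)}
ticks_one, _ = schedule(work, logical_cpus=1)
ticks_two, _ = schedule(work, logical_cpus=2)
print(f"1 logical CPU: {ticks_one} ticks, 2 logical CPUs: {ticks_two} ticks")
```

Only two threads are ever on the chip in the same tick, but all twelve get served sooner because the queue drains two at a time.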

As I understand HT and SMP, it's only for people who are running two APPLICATIONS at the same time and executing tasks in both (as opposed to devoting resources to OS or drivers or something running in the background). And the only other reputed benefit is that it is also intended to take advantage of some particular apps that have been re-engineered to make use of SMP by splitting some of their activity into two threads rather than one. (I'm still waiting for that mythical list. Does anyone have one?????) Adobe Photoshop is a prime example of this: Prior to the newest version, CS, no previous version of Photoshop had any optimizations for HT/SMP. It's only this new version that's been re-engineered, and the only optimization is with a small number of digital effects filters. In fact, Apple uses the Photoshop CS optimization as a selling point for its dual-processor G5s and G4 machines, which of course accomplish the same thing (albeit better) that a single HT chip is designed to do.

Whew. So much for a shorter post .... Need to resolve this soon, as it's taking more time than it's worth. And my brain hurts.

Ken

PS: How do you use the "quote" feature on these forums? Every time I try to use it, it erases my entire post. I'm a newbie here, so would appreciate a tip.
 

rogue1979

Diamond Member
Mar 14, 2001
3,062
0
0
Both machines are running WinXP Pro with 512MB of DDR, and the P4 is a Northwood. I have brought this point up several times in other threads, but it seems like everybody just ignores it. How fast your computer opens up common desktop applications is a benchmark in itself, not to be dismissed so readily. This wasn't the first time I have noticed this; I have compared several other higher-speed Athlon and P4 systems and noticed the same thing.
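
For anyone who wants to put a number on "how fast it opens", here is a minimal sketch of a launch timer. It assumes the command exits on its own, so it measures start-to-exit time for a short-lived process, not the time until a GUI application like Word becomes usable. The `time_launch` helper and the example command are hypothetical choices for illustration.

```python
import statistics
import subprocess
import sys
import time


def time_launch(command, runs=5):
    """Return the median wall-clock seconds to start `command` and wait for it to exit."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Placeholder workload: start a Python interpreter that exits immediately.
    cmd = [sys.executable, "-c", "pass"]
    print(f"median launch time: {time_launch(cmd):.3f} s")
```

Running the same script on both machines, with the same command and after a fresh boot, would give a rough like-for-like comparison. The first run is usually slower than later ones because of disk caching, which is why the sketch reports a median.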