When is HyperThreading NOT beneficial to the user?

VirtualLarry

No Lifer
Aug 25, 2001
56,570
10,202
126
I've been noticing that, while crunching BOINC on three CPU threads plus my GT740 GPU, I'm getting pauses. While it could be the GPU, I don't think so, because scrolling is unaffected. But things like minimizing windows (with the corresponding semi-transparent shrinking animation) are lagging.

This doesn't happen on my G4400 dual-core OCed to 4.455 GHz, with my 7950 card, also crunching on both CPU cores and GPU.

Here's a theory. The i3 really only offers "half threads". On my G4400, when my UI thread needs attention from the scheduler, the BOINC CPU threads, which run at a low priority, get out of the way, and the UI thread gets full core performance to do its work.

On an i3, each core is shared by two "half-threads". When a UI thread is scheduled, it pushes one BOINC "half-thread" out of the way, but the OTHER BOINC half-thread is still executing on the same core, and since it is optimized code, it doesn't leave much in the way of execution resources free.

Therefore, user perception suffers, because the UI thread doesn't execute in as timely a manner as it would have had the PC had two FULL cores rather than TWO cores with four "half threads".

My observational conclusion is that using a HyperThreaded PC results in a DEGRADED user experience.
 

VirtualLarry

One proposed solution would be to alter the scheduler so that the two half-threads of each core are grouped together. When a higher-priority thread has to execute, the scheduler would preempt BOTH half-threads, if both have a lower priority, and then run just the one higher-priority thread on that core until the end of its quantum or until it yields.
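
To make the proposal above concrete, here is a minimal toy sketch in Python. It is not how any real OS scheduler works; the `Thread` and `Core` classes and the `schedule_high_priority` function are hypothetical names invented for illustration. It only models the placement decision: find a core whose occupants all have lower priority, evict every one of them, and give the whole core to the incoming thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Thread:
    name: str
    priority: int  # assumption: a larger number means more urgent


@dataclass
class Core:
    # Two SMT siblings ("half-threads"); None means that slot is idle.
    slots: List[Optional[Thread]] = field(default_factory=lambda: [None, None])
    exclusive: bool = False  # True while a single thread owns the whole core


def schedule_high_priority(cores: List[Core], incoming: Thread) -> Optional[List[Thread]]:
    """Give `incoming` a whole core.

    Returns the list of preempted threads (possibly empty), or None if no
    core qualifies and the thread would have to wait.
    """
    # A core qualifies only if every occupant has a lower priority.
    eligible = [
        c for c in cores
        if not c.exclusive
        and all(s is None or s.priority < incoming.priority for s in c.slots)
    ]
    if not eligible:
        return None
    # Prefer the core with the fewest occupants, to evict as little as possible.
    target = min(eligible, key=lambda c: sum(s is not None for s in c.slots))
    preempted = [s for s in target.slots if s is not None]
    # Preempt BOTH siblings and leave the second slot empty.
    target.slots = [incoming, None]
    target.exclusive = True
    return preempted


# The scenario from the first post: 2 cores / 4 hardware threads, 3 BOINC threads.
cores = [
    Core(slots=[Thread("boinc0", 1), Thread("boinc1", 1)]),
    Core(slots=[Thread("boinc2", 1), None]),
]
evicted = schedule_high_priority(cores, Thread("ui", 10))
print([t.name for t in evicted])
```

In this toy model the exclusive flag would be cleared at the end of the quantum or when the thread yields, at which point the evicted low-priority threads could be put back.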
 

hhhd1

Senior member
Apr 8, 2012
667
3
71
Um, turn off hyperthreading on the i3 and see if the experience "improves."
+1

I did that once on a Sandy Bridge dual core, in an effort to reduce heat (and allow the computer to be fanless), but the performance 'feel' suffered badly: small tasks like saving web pages, switching apps, opening large folders, etc. all became significantly laggier.

This was with the processor also limited to a relatively low clock (0.8 GHz ~ 1.2 GHz); I assume that on faster processors it may be less noticeable.
 

cytg111

Lifer
Mar 17, 2008
25,207
14,702
136
Cache thrashing. If for some reason you're running (a) schizophrenic workload(s), the prefetcher may be exhausted, L1 and L2 depleted/thrashed, and you're having stalls all around. In theory (might be why i7s have more cache than i5s too).
I have no idea where you're going with the "altered scheduler"?
 

coercitiv

Diamond Member
Jan 24, 2014
7,112
16,453
136
I've been noticing that, while crunching BOINC on three CPU threads plus my GT740 GPU, I'm getting pauses.

This doesn't happen on my G4400 dual-core OCed to 4.455 GHz, with my 7950 card, also crunching on both CPU cores and GPU.
So you're running 3 threads on i3 6100 and 2 threads on G4400? Why not the same thread count?
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
554
136
citavia.blog.de
Its not as intensive, far from.
Prime95 usually is intensive, as it uses heavily optimized butterfly kernels (likely more optimized than a Linpack that is merely compiler-optimized). I don't know of recent performance comparisons, but in the past it was 3x as fast as one of the fastest available FFT libs with autotuning (FFTW).

BTW, by running a low-priority task I was able to reduce Prime95's throughput significantly, even though Prime95 was running at a higher priority. The threads interfere because the processor doesn't look at priorities, and the OS scheduler might put differently prioritized threads on the same core at the same time.

Edit: Here is the plot:
[Image: prime95_with_backgrouymupt.png]


So it looks like a low priority background task (not too heavy, as it is built with MSVC) increases avg. times by roughly 50% and even best times by ~20%.
 

dark zero

Platinum Member
Jun 2, 2015
2,655
140
106
Virtual machines. They hate HT to the extreme. The host machine lags badly if you open more VMs than the number of real cores available.
 

CropDuster

Senior member
Jan 2, 2014
374
59
91
I've experienced stuttering in several games (Battlefield, Far Cry, Rainbow Six) with both of my rigs. It goes away with HT off.
 

Dresdenboy

Is that with or without c-states being enabled?
Enabled and on AC power with an i5-5600U. It's my $2K work laptop, and I can't change that much there. But I did set max. CPU performance level to 80% in energy settings to avoid thermal throttling somewhat. I might do some more tests while observing the max clock frequencies.
 

coercitiv

I've experienced stuttering in several games (Battlefield, Far Cry, Rainbow Six) with both of my rigs. It goes away with HT off.
But just so we're clear, both your rigs already have 6 physical cores to begin with :)
 

redzo

Senior member
Nov 21, 2007
547
5
81
Virtual machines. They hate HT to the extreme. The host machine lags badly if you open more VMs than the number of real cores available.
My belief is that a more realistic approach would be:
1. As far as VM requirements are concerned, the more hardware threads (HT threads included) the better.
2. Expect double the flexibility in allocating VM threads on HT CPUs.
3. Don't expect the same from a 4-core (8-thread) HT CPU as from an 8-core (8-thread) CPU: keep your expectations/feet on the ground!
 

lakedude

Platinum Member
Mar 14, 2009
2,778
528
126
At one point in my single life I had perhaps a dozen computers running distributed computing, including a PlayStation. Every single one of the devices behaved differently.

The PlayStation was rock solid and never needed a reboot unless there was a power bump (in which case all the computers would have trouble).

Some systems were utterly unusable while crunching while others were surprisingly usable (with some you wouldn't even know they were busy) even with the CPU and GPU running 100% loaded. Unfortunately I don't remember why some systems were usable and some were not, if we ever knew to begin with.

The computers running just CPU tasks were fairly stable but the computers running GPU tasks would generally only run a few days before needing some sort of operator assistance.
 

hhhd1

I've been noticing that, while crunching BOINC on three CPU threads plus my GT740 GPU, I'm getting pauses. While it could be the GPU, I don't think so, because scrolling is unaffected. But things like minimizing windows (with the corresponding semi-transparent shrinking animation) are lagging.
Sounds like the GPU is the problem, check the gpu load in gpu-z.

Virtual machines. They hate HT to the extreme. The host machine lags badly if you open more VMs than the number of real cores available.
If you have a VM with 2 cores allocated:
on 2 cores / 2 threads: the host OS will lag
on 2 cores / 4 threads: the host OS will not lag

Trying to run a VM with 3 or 4 cores on either of the above will slow the VM down to the point of stuttering.
 

MajinCry

Platinum Member
Jul 28, 2015
2,495
571
136
Some games hate hyperthreading.

Think I posted this before, but a guy on the TaleOfTwoWastelands forum was having some horrific stuttering with his game. Said he had an i7, told him to restart, check for updates, etc. Didn't help at all.

Went out on a limb, asked him ta turn off hyperthreading. Ra-ta-ta-ta-dah, no more stuttering.
 

Exophase

Diamond Member
Apr 19, 2012
4,439
9
81
Cache thrashing. If for some reason you're running (a) schizophrenic workload(s), the prefetcher may be exhausted, L1 and L2 depleted/thrashed, and you're having stalls all around. In theory (might be why i7s have more cache than i5s too).
I have no idea where you're going with the "altered scheduler"?

Cache thrashing or thrashing other resources like TLB or BTB could be a factor in theory. In practice it's probably pretty rare, at least on modern CPUs.

Scaling to higher thread counts has some overhead by itself due to thread creation, synchronization, and parts of the algorithm that are replicated redundantly across threads, among other possible causes. So if there's little benefit from HT, going from one to two threads on a single core could result in a net loss.

There could also be issues with suboptimal thread balancing and scheduling. For example, let's say that a task is distributed to three threads that each do exactly one third of the total work, so each does 0.33N units of work out of a total workload of N. Now let's say the scheduler pegs two of these threads to one core and the other thread to another core. If there's no bonus from HT, those first two threads will take 0.66N to complete while the other core takes 0.33N. Since all parts need to complete, this means that the total task takes 0.66N. Now compare with the two-thread version: both threads take 0.5N on their own core, so the total time is only 0.5N.

Ideally the above is helped by the scheduler, which will redistribute the two threads on the first core across both cores once the second core is done and idle. This would ideally result in a total time of 0.33N + 0.165N ≈ 0.5N. But this would only be the case if both threads running on the first core made exactly 50% progress while the other core made 100% progress, and it doesn't include the overhead of thread migration. So in practice the situation will probably be worse. Having many more software threads than hardware threads helps, but then you run into the other scaling problems.
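
The arithmetic in the two paragraphs above can be checked with a short Python sketch. It assumes, as the example does, that HT gives no throughput bonus (two threads on one core each run at half speed) and that migration is free. The function names are made up for illustration, and exact fractions are used so the rounded 0.33N and 0.165N figures become 1/3 and 1/6 of N.

```python
from fractions import Fraction


def pinned_makespan(threads_per_core, work_per_thread):
    """Total time when threads stay where the scheduler first put them.

    With no HT bonus, a core needs (thread count x work per thread) time,
    and the task is done only when the slowest core finishes.
    """
    return max(n * work_per_thread for n in threads_per_core)


def migrated_makespan(work_per_thread):
    """Total time for the 2+1 layout if one thread migrates for free.

    Phase 1: the lone thread finishes while the two sharing a core get
    halfway. Phase 2: each remaining thread has a full core to itself.
    """
    phase1 = work_per_thread
    done_by_each_sharing_thread = phase1 / 2
    phase2 = work_per_thread - done_by_each_sharing_thread
    return phase1 + phase2


N = Fraction(1)  # total workload, normalized to 1
print(pinned_makespan([2, 1], N / 3))  # three threads, two pegged to one core
print(pinned_makespan([1, 1], N / 2))  # two threads, one per core
print(migrated_makespan(N / 3))        # three threads with ideal migration
```

Under these assumptions the three-thread run takes 2/3 N when pinned and only matches the two-thread figure of 1/2 N in the ideal migration case, which is the point of the example: without an HT bonus, the third thread can at best break even.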

HT does also affect absolute latency, so although throughput may be as good or higher, responsiveness or other latency-sensitive activities may suffer, as VirtualLarry described.
A more likely problem is that the task just doesn't scale well at the given thread count when the workload is not well balanced across hardware threads versus hardware cores.
 

VirtualLarry

I fixed the slowdown in minimizing / restoring my windows by telling BOINC to only run on two cores (threads) instead of three.