Is Task Manager lying to me?

Wasps

Junior Member
Apr 12, 2017
2
0
1
Hi
I'm hoping someone here can help.

I have an application with a pretty stable benchmark process whereby I can see how well an environment will perform based on how quickly my benchmark process completes.
I have this application installed on an Amazon EC2 instance and, as expected, the time to complete drops if I change the instance type to one with a newer-generation or faster CPU.

What's confusing me is that at no point does Task Manager show 100% CPU utilisation. I'm seeing somewhere between 40% and 60% utilisation regardless of whether I'm using a slower processor that takes 5 minutes to complete or a faster processor that completes in 4 minutes.
Even if I view individual cores in Resource Monitor or Procmon, none of them goes near 100% utilisation.


So, if I have a Xeon E5-2670 v2 completing the benchmark in 5 minutes and an E5-2686 v4 completing it in 4 minutes, where within Windows can I see that one CPU is working faster than the other if neither is using 100% of the processor time?
And as a bonus question, why would it not use 100% of the processor time if there's 5 minutes' worth of work to do?



Thank you for your help, and I hope I learn something today.
 

dullard

Elite Member
May 21, 2001
25,066
3,415
126
Highly threaded software is extremely difficult to write. It can even be impossible, depending on your task.

Suppose you need to do four complicated math problems:
1) Complex problem #1 gives you variable W which depends on known values. Thus, W can be calculated right away.
2) Complex problem #2 gives you variable X which depends on known values. Thus, X can be calculated right away.
3) Complex problem #3 gives you variable Y which depends on variable X. Thus, this problem cannot be solved until X is known.
4) Complex problem #4 gives you variable Z which depends on variables W and X. Thus, this problem cannot be solved until W and X are known.
  • On a single-thread CPU, it will do step 1, then 2, then 3, then 4, using 100% processor power each time.
  • On a dual-thread CPU, it will do steps 1 and 2 at the same time, each on a different thread (using ~100% processor power). Then it will do steps 3 and 4 at the same time, each on a different thread (using ~100% processor power).
  • On a quad-thread CPU, it will run steps 1 and 2 at the same time, each on a different thread (using ~50% processor power), but steps 3 and 4 cannot start since W and X are not yet known, so the other two threads sit idle (using 0% processor power).
  • Now imagine trying to do that with the 20 threads of your E5-2670 v2 processor. 2 of the 20 threads can operate (~10% of the processor power), but 2 others are idle until W and X are known, and the other 16 threads have absolutely nothing to do at the moment.

Thus having more threads doesn't help in this example. A good programmer might be able to break up step 3 and step 4 into smaller bits that might be partially calculated while step 1 and 2 are being done. But lots of times this just isn't physically possible. Trying to break up all problems into exactly 20 pieces for your 20 thread chip is quite difficult work. Plus, what happens if you can break the problem up into exactly 20 pieces but one piece ends sooner? That thread has to sit idle waiting for the others to catch up.
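The four-step example above can be sketched as a tiny scheduling simulation. This is only an illustration under stated assumptions: every step is assumed to take one time unit, the `simulate` function and `TASKS` table are hypothetical names made up for this sketch, and a real scheduler is far more complicated.

```python
# Hypothetical task table: name -> (duration in arbitrary time units, dependencies).
# Equal durations are an assumption; the example above does not give timings.
TASKS = {
    "W": (1, set()),
    "X": (1, set()),
    "Y": (1, {"X"}),
    "Z": (1, {"W", "X"}),
}


def simulate(tasks, workers):
    """Greedy scheduling of a dependency graph (assumed acyclic) on `workers` threads.

    Returns (total elapsed time, average utilisation between 0 and 1).
    """
    pending = dict(tasks)
    running = {}   # task name -> finish time
    done = set()   # names of finished tasks
    now = 0
    busy_time = 0  # total thread-time spent doing real work

    while pending or running:
        # A task is ready once everything it depends on has finished.
        ready = sorted(
            name for name, (_, deps) in pending.items() if deps <= done
        )
        for name in ready:
            if len(running) >= workers:
                break  # no free thread left
            duration, _ = pending.pop(name)
            running[name] = now + duration
            busy_time += duration

        # Jump forward to the moment the next running task finishes.
        now = min(running.values())
        for name in [n for n, finish in running.items() if finish == now]:
            del running[name]
            done.add(name)

    return now, busy_time / (now * workers)


for threads in (1, 2, 4, 20):
    elapsed, utilisation = simulate(TASKS, threads)
    print(f"{threads:2d} threads: {elapsed} time units, {utilisation:.0%} busy")
```

With these assumed durations the run takes 4 units on one thread and 2 units on two or more, so going from 2 to 4 to 20 threads only lowers the reported utilisation (100%, 50%, 10%) without finishing any sooner.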

And this doesn't even get into the overhead of transferring data from thread to thread (or worse, from CPU to memory). You might be able to perfectly split the problem into 20 exactly equal parts, but the CPU might be faster than the memory if the data set is large enough and the CPU has to just sit there waiting.

Finally, you have a 10-core CPU that runs 20 threads. In an ideal world a core could run two different threads at the same time if they use different resources in the core. But often, if the math is similar, the two threads need the same part of the core. Thus, only 10 threads out of your total of 20 can actually be run simultaneously if that is the case.
 

Wasps

Thank you very much for the reply... that certainly helps.

This is a virtualised environment whereby I only have 2 of the 10 cores from the E5-2670 v2 CPU, and therefore presumably only 4 threads available.
In your example above, are complex problems 3 and 4 assigned to a thread straight away so that it sits idle, or only once they can be processed? I assume it's the latter, otherwise no other processes could run on my 2 idle threads until all of complex problems 1-4 are complete.


Performance therefore presumably comes down to how quickly each thread can complete each complex problem that it's working on.
I know that the published clock speeds of CPUs are largely irrelevant these days. For example, my E5-2670 v2 has a base clock speed of 2.5 GHz and a turbo of 3.3 GHz, whereas the faster E5-2686 v4 has a base clock speed of 2.3 GHz and a turbo of 3.0 GHz (I think).

Is there any way within Windows that I can see how quickly each of the threads is being completed?


There's an extra reason for this whole problem.
We have a different virtual server running a completely different CPU (AMD Opteron 6136).
The benchmark tests run far slower on this environment than on our AWS environment.

I'd therefore like something within Windows to be able to show that the AMD CPU is processing each thread slower.
So far, I've run the PassMark benchmark and can see that the single-thread performance of the AMD CPU is far slower than that of the AWS Intel chips, but it would be nice if we could see what effect this is having on the time each thread takes to complete.
 

sm625

Diamond Member
May 6, 2011
8,172
137
106
On a thread level? Not really. Task Manager can show you per-process CPU time, assuming the process stays running in an idle state after it completes. If the process exits, it just disappears from Task Manager. To get down to a thread level you have to use some other tool, like Procmon. That will give you thread creation and exit times. But it isn't quick to set up, because you have to use a filter. The filter would go something like "If Operation Excludes 'Thread' then Exclude". Considering that simply opening my Gmail in a web browser results in the creation and exit of 565 threads, it would be a lot of work to sift through all that to find what you want.
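If the benchmark can be driven from a script, one rough alternative to filtering Procmon output is to compare wall-clock time with CPU time directly. The sketch below is only an illustration: `measure` and `busy_loop` are hypothetical names, `busy_loop` is a stand-in for the real benchmark, and `time.process_time()` counts CPU time for the current process only, not for child processes or other applications.

```python
import os
import time


def busy_loop(n=2_000_000):
    """Stand-in workload (hypothetical); replace with the real benchmark step."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(workload):
    """Run `workload` and report wall time, CPU time, and overall utilisation."""
    logical_cpus = os.cpu_count() or 1

    wall_start = time.perf_counter()  # elapsed real time
    cpu_start = time.process_time()   # CPU time used by this process
    workload()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start

    return {
        "logical_cpus": logical_cpus,
        "wall_seconds": wall_seconds,
        "cpu_seconds": cpu_seconds,
        # Fraction of all logical processors kept busy by this process.
        "utilisation": cpu_seconds / (wall_seconds * logical_cpus),
    }


result = measure(busy_loop)
print(
    f"wall {result['wall_seconds']:.2f}s, cpu {result['cpu_seconds']:.2f}s, "
    f"{result['utilisation']:.0%} of {result['logical_cpus']} logical CPUs"
)
```

Run on two machines, the utilisation figure can come out much the same while the wall time differs, which is the pattern described in the first post: utilisation says how much of the time the processor was busy, not how much work it got through while busy.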
 

Ketchup

Elite Member
Sep 1, 2002
14,545
236
106
Well, your Intel chip has 2 threads per core, so 40-60 percent utilization is just about perfect for a single-thread app. Your AMD chip is 1:1, but, honestly, it's probably just bad at it. Without knowing anything about your app, a lot of it could come down to raw speed: 2.4 GHz on AMD vs 3.3 with Intel. Not to mention the Intel chip has almost double the cache, which certainly doesn't hurt.
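The arithmetic behind that estimate can be written down in a few lines. This is a rough sketch that assumes Task Manager's overall figure is approximately the average across logical processors (newer Windows versions may also weight it by clock frequency); `expected_overall_percent` is a hypothetical helper, and the count of 4 logical processors is the presumption from earlier in the thread, not a confirmed figure.

```python
def expected_overall_percent(busy_threads, logical_processors):
    """Overall CPU % if `busy_threads` threads are fully busy and the rest idle."""
    busy = min(busy_threads, logical_processors)
    return 100.0 * busy / logical_processors


# One saturated thread on a single hyper-threaded core (2 logical processors).
print(expected_overall_percent(1, 2))
# One and two saturated threads on 2 cores with hyper-threading (4 logical processors).
print(expected_overall_percent(1, 4))
print(expected_overall_percent(2, 4))
```

Under these assumptions, a reading around 50% corresponds to one busy thread on 2 logical processors or about two busy threads on 4.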