of load balancing..

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
one of the things that is bugging me, of late, is the seemingly haphazard way in which computers balance their load across cores..i.e. poorly!

i just had an excellent 2 hour sesh on bc2 (60 kills, 28 pins :D) and left my task manager running. if you check the image link you will see the right-most core (core0? my guess is yes?) being spanked more than the rest?

http://img508.imageshack.us/img508/5315/bc2cpu.jpg

(i linked rather than embedded, as it was a bit wide..)

even the second core in from the left (core1?) has higher usage than the right two?

so clearly load balancing is not working right in bc2!


but if we look at cod, we see excellent load balancing?:

http://img851.imageshack.us/img851/1716/codtm.gif

it's almost perfect!


i think physically, we have gained multi (4) core processing (..and more), but the os's and certainly apps are still miles behind?
 

podspi

Golden Member
Jan 11, 2011
1,965
71
91
Keep in mind the different threads may be doing entirely different things. There is no easy way to separate out any arbitrary operation into an arbitrary amount of threads.

For games, I believe it is possible to separate out rendering, physics, sound, input, but the demands each of these threads have may be wildly different. I do agree that Windows' thread scheduling seems ... lacking. It can't be great for performance to bounce threads around like it does, especially with things like turbo and non-shared caches.
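To put rough numbers on the "wildly different demands" point, here is a minimal Python sketch. The subsystem names come from the post above, but the per-frame costs are made-up figures for illustration only, not measurements from BC2 or any other game. With one thread per subsystem, a frame cannot finish sooner than its slowest thread, so the other cores sit partly idle however the OS places them.

```python
# Hypothetical per-frame cost (milliseconds) of each subsystem thread.
# These numbers are illustrative assumptions, not measurements.
frame_cost_ms = {"rendering": 12.0, "physics": 6.0, "sound": 1.5, "input": 0.5}


def frame_time_ms(costs):
    """One thread per subsystem: the frame lasts as long as the slowest thread."""
    return max(costs.values())


def core_utilisation(costs):
    """Fraction of the frame during which each thread keeps its core busy."""
    longest = frame_time_ms(costs)
    return {name: cost / longest for name, cost in costs.items()}


def speedup_over_one_core(costs):
    """Speedup versus running every subsystem back to back on a single core."""
    return sum(costs.values()) / frame_time_ms(costs)


if __name__ == "__main__":
    for name, busy in core_utilisation(frame_cost_ms).items():
        print(f"{name:10s} {busy:6.1%}")
    print(f"speedup on 4 cores: {speedup_over_one_core(frame_cost_ms):.2f}x")
```

Under these assumed costs, four cores give well under a 2x speedup, and a task manager graph would show one busy core and three lightly loaded ones, much like the BC2 screenshot.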
 

Ben90

Platinum Member
Jun 14, 2009
2,866
3
0
i think physically, we have gained multi (4) core processing (..and more), but the os's and certainly apps are still miles behind?
Your operating system is a decade old. It's like being on DOS and saying that computers have shitty GUIs.
 

Morg.

Senior member
Mar 18, 2011
242
0
0
Yes, most of the stuff is not threaded correctly - but then it's not even coded properly so hey, what do you care?

Also, it's not the problem of your OS, it's an application issue (as obviously your tests show).

Windows doesn't do much shit well, but hey, have you seen 4 threads with two cores full and two cores empty? NO. So stop complaining and talking about random stuff.

On the other hand, if it does find it funny to start a thread on a core and move it on another, that looks quite fishy -- and also .. unlikely but hey I didn't check.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
one of the things that is bugging me, of late, is the seemingly haphazard way in which computers balance their load across cores..i.e. poorly!

Question is - what is the performance detriment of this fact?

And are you personally willing to pay more for a product that required longer development so as to be coded for better load-balancing?

And are there enough people who would be willing to pay more for the product so as to justify the extra development expense to make it so?

It's not like software product managers start their project's development with the plan being "let's just totally screw up the load-balancing on this one, for the fun of it!".

In the world of business there is only one reason why a project manager would elect to invest more of their development budget into load-balancing and that is if it is necessary to do so in order for the product to make the kind of ROI that the company is targeting.

Why do your OS and the game BC2 have such lopsided load-balance? Because the guys at BC2 who developed the game came to the conclusion that there was little to be gained by investing more money in making it any different.

They probably had their reasons, data-driven reasons, for making this business decision.
 

Concillian

Diamond Member
May 26, 2004
3,751
8
81
i think physically, we have gained multi (4) core processing (..and more), but the os's and certainly apps are still miles behind?

Programming to take advantage of the hardware has always been, and probably will always be, the primary limitation due to labor being expensive.

Every app has to be individually programmed with load balancing in mind. This takes time. Not every program takes that time.
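As a toy illustration of what "programmed with load balancing in mind" can mean, here is a small Python sketch that compares two ways of handing tasks to cores. The task costs and both function names are hypothetical; this only models the bookkeeping and is not how any particular game engine or OS does it. The greedy version always gives the next-largest task to the least-loaded core.

```python
import heapq


def naive_split(task_costs, n_cores):
    """Round-robin assignment in arrival order, ignoring how big each task is."""
    loads = [0.0] * n_cores
    for i, cost in enumerate(task_costs):
        loads[i % n_cores] += cost
    return loads


def balance(task_costs, n_cores):
    """Greedy longest-task-first assignment; returns the total load per core."""
    # Min-heap of (current_load, core_index): the least-loaded core pops first.
    heap = [(0.0, core) for core in range(n_cores)]
    heapq.heapify(heap)
    for cost in sorted(task_costs, reverse=True):
        load, core = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, core))
    loads = [0.0] * n_cores
    for load, core in heap:
        loads[core] = load
    return loads


if __name__ == "__main__":
    tasks = [8, 1, 1, 1, 7, 1, 1, 1, 6, 1, 1, 1]  # hypothetical task costs
    print("round-robin:", naive_split(tasks, 4))
    print("greedy     :", balance(tasks, 4))
```

With these assumed costs the round-robin split leaves one core with most of the work, while the greedy split spreads it almost evenly. The catch, as the post says, is that somebody has to break the program into independent tasks in the first place, and that is the part that takes development time.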
 

podspi

Golden Member
Jan 11, 2011
1,965
71
91
What about applications that are single-threaded, but still bounce around? Isn't that the Windows scheduler doing its thing?

Somewhere else on this board somebody (I forget exactly who) said it was to prevent "hot spots" on the CPU, but couldn't the performance detriment be pretty high? Or is it L3 cache to the rescue?
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,346
1,525
136
What about applications that are single-threaded, but still bounce around? Isn't that the Windows scheduler doing its thing?

Yes. You can often gain a lot of performance simply by pinning it to a cpu.
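For anyone who wants to try pinning from code rather than from Task Manager's "Set affinity" menu, here is a minimal Python sketch. One loud caveat: Python's standard library only exposes affinity calls on Linux and some other Unix systems (`os.sched_getaffinity` / `os.sched_setaffinity`), so on Windows this helper simply returns `None`; there the Win32 calls `SetProcessAffinityMask` and `SetThreadAffinityMask` do the equivalent job. The helper name `pin_to_one_cpu` is made up for this example.

```python
import os


def pin_to_one_cpu(pid=0):
    """Pin a process (0 = the current one) to a single CPU it may already use.

    Returns the chosen CPU number, or None when this platform's os module
    has no affinity support (for example Windows or macOS).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)  # set of CPU numbers we may run on
    target = min(allowed)                # arbitrary choice: lowest-numbered CPU
    os.sched_setaffinity(pid, {target})  # from now on, run only on that CPU
    return target


if __name__ == "__main__":
    cpu = pin_to_one_cpu()
    if cpu is None:
        print("no affinity support in the os module on this platform")
    else:
        print(f"pinned to CPU {cpu}; allowed set is now {os.sched_getaffinity(0)}")
```

Whether pinning actually helps depends on the workload, so it is worth timing a run with and without it rather than assuming a gain.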

Somewhere else on this board somebody (I forget exactly who) said it was to prevent "hot spots" on the CPU

Oh that's just bullshit.
 

podspi

Golden Member
Jan 11, 2011
1,965
71
91
So does anyone know why Windows does this? As I'm sure many of you know, the current implementation of TurboCORE is almost useless w/out setting affinity manually :D
 

VirtualLarry

No Lifer
Aug 25, 2001
56,327
10,035
126
It's because of HyperThreading. W2K used to "pin" threads to cores, and only bump them as necessary. Unfortunately, when the P4 with HT came out, that resulted in less than ideal performance, as tasks would often get stuck pinned to HT cores, when they could have been "bounced" to full cores. So now the current scheduler prefers to "bounce" threads around as much as possible.
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
What about applications that are single-threaded, but still bounce around? Isn't that the Windows scheduler doing its thing?

It is true, we have explored the real-world performance ramifications of "thread-thrashing" in the Linpack thread.

http://forums.anandtech.com/showpost.php?p=29008096&postcount=97

http://forums.anandtech.com/showpost.php?p=29008307&postcount=100

That guy saw around 16% performance improvement when he locked the thread affinity versus letting it thrash around across the cores.

Somewhere else on this board somebody (I forget exactly who) said it was to prevent "hot spots" on the CPU, but couldn't the performance detriment be pretty high? Or is it L3 cache to the rescue?

Yep, that topic has also been floated around and we generated some empirical data that speaks to the possibility that this might be the case, but nevertheless it is not true, for fundamental IC validation and lifetime-reliability reasons.

See:
http://forums.anandtech.com/showthread.php?p=27287579#post27287579

and:
http://forums.anandtech.com/showthread.php?p=26903339#post26903339
 

podspi

Golden Member
Jan 11, 2011
1,965
71
91
IDC, very interesting, thanks!

I'm hopeful that Microsoft fixes this in the future. Honestly, I would like a lot more control over everything in the future. Imagine a dual-module BD APU in a laptop that allows you to power down an entire module when the laptop is unplugged. That sort of thing just isn't possible (afaik) today.

IIRC, isn't this thread migration issue also what caused CnQ to be essentially useless originally? CnQ couldn't adjust quickly enough and so Windows would continually move threads to downclocked cores. I wonder how AMD fixed that in Thuban...

Edit: I have to admit, whenever I do something that is single-threaded, I do set the affinity, because that's the only way I can see TurboCORE kicking in. I don't know if it does otherwise and I just can't see it, but I figure it can't hurt.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,346
1,525
136
I'd like to point out that this is one of the reasons the original Phenom launch failed so badly. The most innovative power saving feature in Phenom was the ability to dial down any cores individually. This sounds like a good idea: if the machine is mostly idle, keep one core up, write back the caches on the rest and turn them all off. But because Windows feels like no thread should stay resident on a core more than a couple of seconds, you are saving no power and you are constantly suffering latency on bringing cores up from a low P-state.
 

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
Keep in mind the different threads may be doing entirely different things. There is no easy way to separate out any arbitrary operation into an arbitrary amount of threads.

For games, I believe it is possible to separate out rendering, physics, sound, input, but the demands each of these threads have may be wildly different. I do agree that Windows' thread scheduling seems ... lacking. It can't be great for performance to bounce threads around like it does, especially with things like turbo and non-shared caches.
ok thx :cool:
 

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
Your operating system is a decade old. It's like being on DOS and saying that computers have shitty GUIs.
erm, i was talking about the game, the app..what the os does IS valid for sure..but, correct me if i'm wrong here, it's the app that starts threads?
 

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
Yes, most of the stuff is not threaded correctly - but then it's not even coded properly so hey, what do you care?

Also, it's not the problem of your OS, it's an application issue (as obviously your tests show).

Windows doesn't do much shit well, but hey, have you seen 4 threads with two cores full and two cores empty? NO. So stop complaining and talking about random stuff.

On the other hand, if it does find it funny to start a thread on a core and move it on another, that looks quite fishy -- and also .. unlikely but hey I didn't check.
strange post :confused:
 

podspi

Golden Member
Jan 11, 2011
1,965
71
91
erm, i was talking about the game, the app..what the os does IS valid for sure..but, correct me if i'm wrong here, it's the app that starts threads?


The app starts the threads, but in most cases the OS manages them
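A minimal Python sketch of that division of labour, using only the standard `threading` module. The function name `run_workers` and the way the work is sliced are assumptions made for the example. Notice that nothing in the code says which core a thread runs on: the app creates, starts and waits for threads, and the OS scheduler decides placement. (In CPython the global interpreter lock also stops these threads from running Python code truly in parallel, so treat this as a picture of who does what, not as a speed demo.)

```python
import threading


def run_workers(n_threads, work_items):
    """Start n_threads threads that each sum their own slice of work_items."""
    results = [0] * n_threads

    def worker(index):
        # Each thread handles every n_threads-th item, starting at its index.
        results[index] = sum(work_items[index::n_threads])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()  # the app's part: create and start the thread
    for t in threads:
        t.join()   # wait for it; which core it ran on was the OS's decision
    return results


if __name__ == "__main__":
    partial_sums = run_workers(4, list(range(100)))
    print(partial_sums, "total:", sum(partial_sums))
```

If an app wants a say in placement it has to ask explicitly, for example by setting an affinity mask, and most apps never do.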
 

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
Question is - what is the performance detriment of this fact?
well, if you notice one core was practically bumping up at the top?

surely if one core is at 100% that would impact on what could be achieved with properly balanced loading?
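One standard way to put a number on this is Amdahl's law: if only a fraction of the work can be spread across cores, the part that cannot limits the overall speedup. A minimal Python sketch follows; the parallel fractions used are assumptions for illustration, not figures measured from BC2.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup when only part of the work can be spread out."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    for fraction in (0.25, 0.5, 0.75, 1.0):  # hypothetical parallel fractions
        print(f"{fraction:.0%} parallel on 4 cores: "
              f"{amdahl_speedup(fraction, 4):.2f}x")
```

So a game whose frame is half serial work gets at most 1.6x out of four cores, which is the kind of ceiling a single maxed-out core hints at.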

And are you personally willing to pay more for a product that required longer development so as to be coded for better load-balancing?
why isn't it normal?

why don't ALL apps multi-thread properly?

this is kinda the point of this thread!

And are there enough people who would be willing to pay more for the product so as to justify the extra development expense to make it so?
if it means better app performance, why not?

why isn't this something that can be learnt and adopted as default?

i doubt that we are going to get a two-tier app market, where some apps are coded quickly, so they are cheap, but thread badly and others do the multi-threading properly but cost more? that would be mad..

It's not like software product managers start their project's development with the plan being "let's just totally screw up the load-balancing on this one, for the fun of it!".
well, imo, we are in a transitional period where prior to this even-handed multi-threading was simply not important with 1-core or 2 cores, but now multi-cores are becoming normal it will be a part of development..

In the world of business there is only one reason why a project manager would elect to invest more of their development budget into load-balancing and that is if it is necessary to do so in order for the product to make the kind of ROI that the company is targeting.
i think proper thread balancing will be a feature of newer apps. one day it will simply be expected and ones that do not will be pointed out for their 'primitiveness'..

Why do your OS and the game BC2 have such lopsided load-balance? Because the guys at BC2 who developed the game came to the conclusion that there was little to be gained by investing more money in making it any different.
sure..that doesn't mean to say that it will not be in the future..

bc2 was a console port anyway..idk about xbox/ps3 cpus but are they multi-core?

They probably had their reasons, data-driven reasons, for making this business decision.
let's see how bf3 performs on this front..

what's the betting it will have better thread balancing than bc2???
 

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
Programming to take advantage of the hardware has always been, and probably will always be, the primary limitation due to labor being expensive.

Every app has to be individually programmed with load balancing in mind. This takes time. Not every program takes that time.
sure, this is the way it has been, it's not been an issue until recently..but why have all these lovely cores and use them badly?!

imo, this will be a part of the development of new apps..
 

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
What about applications that are single-threaded, but still bounce around? Isn't that the Windows scheduler doing its thing?

Somewhere else on this board somebody (I forget exactly who) said it was to prevent "hot spots" on the CPU, but couldn't the performance detriment be pretty high? Or is it L3 cache to the rescue?
mm, as far as the os is concerned i could see that being something that someone could implement, but this is about apps, ofc os thread balancing is a valid debate :)
 

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
It's because of HyperThreading. W2K used to "pin" threads to cores, and only bump them as necessary. Unfortunately, when the P4 with HT came out, that resulted in less than ideal performance, as tasks would often get stuck pinned to HT cores, when they could have been "bounced" to full cores. So now the current scheduler prefers to "bounce" threads around as much as possible.
interesting :)
 

ZPIGS!

Member
Aug 21, 2010
62
0
0
eupeople.net
I'd like to point out that this is one of the reasons the original Phenom launch failed so badly. The most innovative power saving feature in Phenom was the ability to dial down any cores individually. This sounds like a good idea: if the machine is mostly idle, keep one core up, write back the caches on the rest and turn them all off. But because Windows feels like no thread should stay resident on a core more than a couple of seconds, you are saving no power and you are constantly suffering latency on bringing cores up from a low P-state.
interesting thx :)