Originally posted by: Argo
Originally posted by: exdeath
Self-task each work slot and have it pull work from the cluster instead of being pushed work. No master needed, no distribution scheme needed, no black-hole problem, and it scales. Each slot stays in an endless loop, looking for work until there are no jobs left. The real work then is keeping the local queues filled ahead of local slot availability, to minimize network latency and slot idle time when a local queue empties and network broadcasts go out sniffing for more work.
That's the most popular method for multi-threading the SPEs on Cell; it should extend over a cluster pretty well.
Interesting idea. Would you have a shared common queue maintained across the cluster that each work slot polls? Or would a member broadcast that it has messages in its queue, and then all work slots poll that member until it runs out of work?
I'm leaning towards the second.
Up to you to decide. My main idea here is self-tasking. Keeping the local queues filled will be the meat of your balancing algorithm, the goal being that the local threads should never have to broadcast over the network because of an empty queue. They should always find something in their local queues thanks to housekeeping that runs in each slot's idle loop, before or after a job has started/completed. The first slot to go idle locks the queue and pulls in more jobs from other machines based on workload stats. You can assume the other slots on that machine will be going idle too in just a few moments, when they complete their current tasks; so if the first one to go idle can lock the queue, import jobs from another machine, and unlock the queue, the other slots will immediately find local work and not have to repeat those steps (they will be blocked briefly only if they happen to go idle before the first one finishes filling the queue).
while (1)
{
    nextjob().execute();
    housekeeping(); // grab jobs from another queue, go into sleep state if all jobs are complete, etc.
}
My thought was a distributed job list where no jobs are duplicated anywhere, so each machine's local list is unique and exclusive, but jobs can be moved between machines before they enter "executing" status.
The idea is that each machine balances itself by requesting work, so there is no master. A slot first checks the local queues; if they're empty, it takes advantage of that momentary free time to go into a 'housekeeping' mode where it polls the status of all participants' queues and, based on some performance stats, picks a donor (for example, the machine that responded with the deepest queue) and moves n jobs from that queue. The machine doing this automatically knows it is the least busy and the best candidate to take on a large workload from another, simply by being the first to enter housekeeping mode in the first place!

This is the automatic load-balancing feature that keeps everything busy. No guessing or predicting which machine should take more work. No burdening a single machine that has to decide when to devote time to balancing. The machine that is least busy at any given time elects itself to balance, making even the balancing algorithm a distributed 'job' in and of itself. This is where self-tasking has its strength: the least busy machine automatically takes on work from a busy one, without any master. Having no master also makes it more generic and scalable.
The pitfall is that you have to be careful about redundant network traffic, as newly idle threads on different machines will go ping-ponging jobs around the network. Instead of a black-hole problem you'll have the opposite: machines being selfish, trying to hoard jobs and pulling them in every which direction. So include some extra stat in your polling/balancing algorithm, such as a timestamp on each job that is reset when it moves to a new machine. Then, in addition to preferring the machine with the most jobs in queue, also look at the average timestamp of its jobs. A machine with a smaller queue but older timestamps has few but long-resident jobs that have been sitting there a while, and it might take priority over a machine with a larger queue full of jobs that just got moved there.
I'm actually making up a lot of this on the fly, thinking about how to extend what I know about multi-core programming to work over a network. Thinking aloud.
😉