Looking at Linux-based clustering. Am feeling a bit overwhelmed....

Elledan

Banned
Jul 24, 2000
8,880
SSI, HA, Heartbeat, MOSIX, Beowulf, Failsafe, DLM... can anyone help me in understanding it?

The nodes will consist of a wide variety of systems, from 486 PCs to P200s and better. I'll use Linux because I'm familiar with this OS and it appears to be quite suitable for clustering.

Thank you for any help you can give me!
 

SCSIRAID

Senior member
May 18, 2001
579
What are you trying to accomplish with your cluster? Failover/Redundancy? What is your external storage solution? FC?
 

Elledan

Banned
Jul 24, 2000
8,880


<< What are you trying to accomplish with your cluster? Failover/Redundancy? What is your external storage solution? FC? >>


I'll use the cluster to run simulations with. I want to use a cluster because it's much more scalable (and cheaper ;) ) than a single system, especially because some simulations require insane amounts of RAM (3+ GB).

From what I understand, with most clusters if one node fails, it's simply removed from the cluster and new nodes can be added during runtime. This makes the cluster quite redundant.
 

Elledan

Banned
Jul 24, 2000
8,880
0
0
Don't tell me no one here has ever built a cluster or even knows anything about it...
 

Silver222

Member
Jun 26, 2001
77
You do realize the forum this is in, don't you?

You might as well post on a hot rod enthusiasts forum about building rocket avionics. Good luck though.
 

Elledan

Banned
Jul 24, 2000
8,880


<< You do realize the forum this is in, don't you? >>

Okay, then help me choose between Highly Technical and Operating Systems.
 

r0tt3n1

Golden Member
Oct 16, 2001
1,086
If you haven't done so, check the Linux Documentation Project (linuxdoc.org); they have an extensive HOWTO on Beowulf. Do a Google search for Beowulf or clustering... hth
 

Elledan

Banned
Jul 24, 2000
8,880
Yeah, I've already looked through the documentation you mentioned and done a search with Google. That's the exact reason why I created this thread :)
 

Armitage

Banned
Feb 23, 2001
8,086


<< SSI, HA, Heartbeat, MOSIX, Beowulf, Failsafe, DLM... can anyone help me in understanding it?

The nodes will consist of a wide variety of systems, from 486 PCs to P200s and better. I'll use Linux because I'm familiar with this OS and it appears to be quite suitable for clustering.

Thank you for any help you can give me!
>>



I do quite a bit of work on clusters & have built a few, so I'll take a shot at it...

SSI: Not familiar with this.
HA: High Availability. I think this is more geared toward servers than computational clusters.
Heartbeat: I'm not very familiar with this, but I think it is basically concerned with very timely notification of system failure.

MOSIX: This is a project that works toward having an operating system span several systems on a network. It can automatically distribute/move tasks (and threads?) between nodes on your cluster. I haven't worked with it personally, but it looks cool.

Beowulf: This is the name usually applied to computational clusters. It is basically an array of stripped-down workstations running Linux and connected by various flavors of network. There are several ways to use it:

The simplest is to run independent jobs on the various nodes in parallel. This is useful for things like running Monte Carlo analyses, etc. There are tools to help automate this (see PBS, the Portable Batch System, etc.).

The other way is to actually write distributed programs using something like PVM or MPI for passing data between the parts of the simulation running on different nodes of the cluster.
This is what I do. It's not as ugly as it sounds.
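The master/slave structure described above can be sketched with Python threads and queues standing in for cluster nodes. This is purely an illustrative assumption: real PVM or MPI code passes messages over the network between machines, but the hand-out-work / collect-results loop has the same shape, and the "job" here (squaring an integer) is invented:

```python
import threading
import queue

def worker(tasks, results):
    # Each slave pulls an independent job, computes it, and sends the
    # result back: the same receive/compute/send loop a PVM slave runs.
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work, shut down
            break
        results.put(item * item)  # stand-in for one simulation job

def run_jobs(n_jobs=10, n_workers=4):
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for i in range(n_jobs):       # the master hands out independent jobs
        tasks.put(i)
    for _ in workers:             # one shutdown sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()
    return sorted(results.get() for _ in range(n_jobs))

print(run_jobs())  # the squares of 0..9
```

In a real Beowulf setup the `tasks.put(...)`/`results.get()` calls would be replaced by network message passing (e.g. `pvm_send`/`pvm_recv`), with each worker running on its own node.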

Failsafe & DLM: I'm not familiar with these.

As for 486 or P200.
This will be useful for learning, but you ought to consider something a bit more modern. Until a few weeks ago I was using a 20-CPU cluster of P200s. By the benchmarks, I could expect to match its performance on a dual Athlon with considerably less complication.
My sim is pure floating point, though. YMMV.

You also need to take a close look at the program you want to use on the cluster. Not everything is amenable to cluster programming. What you want to look for is high "granularity". This means that there are big chunks of the program (time-wise) that can be done independently with little communication with the rest of the cluster, i.e., the ratio between computation and communication should be high, and the communication required should be simple.
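The computation-to-communication ratio can be made concrete with a toy model. The helper function and every number below are invented for illustration; only the idea of the ratio comes from the advice above:

```python
def granularity_ratio(work_items, usec_per_item, messages, usec_per_message):
    # Rough computation-to-communication ratio for one node's share:
    # time spent computing divided by time spent talking to the cluster.
    return (work_items * usec_per_item) / (messages * usec_per_message)

# Fine-grained: one message exchanged per work item.
fine = granularity_ratio(1000, 1000, 1000, 10_000)
# Coarse-grained: the same work batched behind a single send/receive pair.
coarse = granularity_ratio(1000, 1000, 2, 10_000)
print(fine, coarse)  # 0.1 vs 50.0
```

Same total work in both cases, but batching the communication turns a program that is dominated by network overhead into one that fits a cluster well.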
Hope this helps
 

Elledan

Banned
Jul 24, 2000
8,880


<<

<< SSI, HA, Heartbeat, MOSIX, Beowulf, Failsafe, DLM... can anyone help me in understanding it?

The nodes will consist of a wide variety of systems, from 486 PCs to P200s and better. I'll use Linux because I'm familiar with this OS and it appears to be quite suitable for clustering.

Thank you for any help you can give me!
>>



I do quite a bit of work on clusters & have built a few, so I'll take a shot at it...

SSI: Not familiar with this.
>>


Single System Image. Basically it makes the whole cluster look as much like a single system as possible.


<< HA: High Availability. I think this is more geared toward servers than computational clusters. >>

yes, it's meant to make a cluster as reliable as possible, e.g., if a node fails it shouldn't bring down the whole cluster.


<< Heartbeat: I'm not very familiar with this, but I think it is basically concerned with very timely notification of system failure. >>

That's correct.



<< MOSIX: This is a project that works toward having an operating system span several systems on a network. It can automatically distribute/move tasks (and threads?) between nodes on your cluster. I haven't worked with it personally, but it looks cool. >>

yeah, it's one of the cluster-types I've been looking at. I don't think that I'll use it, though.



<< Beowulf: This is the name usually applied to computational clusters. It is basically an array of stripped-down workstations running Linux and connected by various flavors of network. There are several ways to use it:

The simplest is to run independent jobs on the various nodes in parallel. This is useful for things like running Monte Carlo analyses, etc. There are tools to help automate this (see PBS, the Portable Batch System, etc.).

The other way is to actually write distributed programs using something like PVM or MPI for passing data between the parts of the simulation running on different nodes of the cluster.
This is what I do. It's not as ugly as it sounds.
>>

Sounds ugly enough :) I'll have a look at these links you provided.



<< Failsafe & DLM: I'm not familiar with these. >>

Failsafe: similar to Heartbeat. DLM is unfamiliar to me as well.



<< As for 486 or P200.
This will be useful for learning, but you ought to consider something a bit more modern. Until a few weeks ago I was using a 20 CPU cluster of P200s. By the benchmarks, I could expect to match this performance on a dual athlon with considerably less complication.
My sim is pure floating point though. YMMV
>>

Unfortunately I'm on a very tight budget, so I'll have to use whatever systems I can get :(



<< You also need to take a close look at the program you want to use on the cluster. Not everything is amenable to cluster programming. What you want to look for is high "granularity". This means that there are big chunks of the program (time-wise) that can be done independently with little communication with the rest of the cluster, i.e., the ratio between computation and communication should be high, and the communication required should be simple.
Hope this helps
>>

The simulations I'll be running on the cluster will be very suitable for parallel processing (many similar threads can be created). With one simulation, for example, I can create a very large number of threads which can work independently from each other and require about the same time to finish.

Thanks for your help! :)
 

Armitage

Banned
Feb 23, 2001
8,086


<<

<< HA: High Availability. I think this is more geared toward servers than computational clusters. >>


yes, it's meant to make a cluster as reliable as possible, e.g., if a node fails it shouldn't bring down the whole cluster.
>>



In a Beowulf cluster, the nodes are already mostly independent, but you'll have to write your software such that it can tolerate the untimely demise of some of the slave processes.



<<

<< MOSIX: This is a project that works toward having an operating system span several systems on a network. It can automatically distribute/move tasks (and threads?) between nodes on your cluster. I haven't worked with it personally, but it looks cool. >>


yeah, it's one of the cluster-types I've been looking at. I don't think that I'll use it, though.
>>



I agree. It certainly looks more complicated to set up than a typical Beowulf.



<<

<< Beowulf: This is the name usually applied to computational clusters. It is basically an array of stripped-down workstations running Linux and connected by various flavors of network. There are several ways to use it:

The simplest is to run independent jobs on the various nodes in parallel. This is useful for things like running Monte Carlo analyses, etc. There are tools to help automate this (see PBS, the Portable Batch System, etc.).

The other way is to actually write distributed programs using something like PVM or MPI for passing data between the parts of the simulation running on different nodes of the cluster.
This is what I do. It's not as ugly as it sounds.
>>


Sounds ugly enough :) I'll have a look at these links you provided.
>>



I would start with PVM. I've looked at both, and find PVM easier to use.
The MIT Press book on PVM is very good (PVM: Parallel Virtual Machine, by Geist, Beguelin, Dongarra ...). It's available in HTML format from the PVM site. Maybe PostScript also?
Also, the whole PVM API is documented in man-pages as well.

A few tips:
Use macros or constants for message tags. Define them in a single header file that is included by both the master and slave programs. It makes it much easier to keep track of things.

Don't worry about the higher level PVM features (groups, mailboxes, multiple buffers, etc.) to get started. Just work on packing, sending, receiving and unpacking simple messages.
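The message-tag tip above can be sketched like this; Python stands in for the C header both programs would #include, and the tag names, values, and "computation" are all invented:

```python
# The shared "header": master and slave both import these constants,
# so the two sides can never silently disagree on what a tag means.
TAG_WORK   = 1   # master -> slave: here is a job
TAG_RESULT = 2   # slave -> master: here is the answer
TAG_QUIT   = 3   # master -> slave: shut down

def handle_message(tag, payload):
    # A slave's receive loop branches on named tags, never magic numbers.
    if tag == TAG_WORK:
        return (TAG_RESULT, payload * 2)  # stand-in computation
    if tag == TAG_QUIT:
        return (TAG_QUIT, None)
    raise ValueError(f"unknown message tag {tag}")

print(handle_message(TAG_WORK, 21))  # (2, 42)
```

In PVM the equivalent would be a handful of `#define TAG_WORK 1` lines in one `.h` file, with the tag passed to `pvm_send` and matched in `pvm_recv`.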






<<

<< As for 486 or P200.
This will be useful for learning, but you ought to consider something a bit more modern. Until a few weeks ago I was using a 20 CPU cluster of P200s. By the benchmarks, I could expect to match this performance on a dual athlon with considerably less complication.
My sim is pure floating point though. YMMV
>>


Unfortunately I'm on a very tight budget, so I'll have to use whatever systems I can get :(
>>


Yeah, I've been there! Just a thought: if you have any budget at all, you might be better off eBaying those P200s for $100 each and getting an Athlon (or two).



<<

<< You also need to take a close look at the program you want to use on the cluster. Not everything is amenable to cluster programming. What you want to look for is high "granularity". This means that there are big chunks of the program (time-wise) that can be done independently with little communication with the rest of the cluster, i.e., the ratio between computation and communication should be high, and the communication required should be simple.
Hope this helps
>>


The simulations I'll be running on the cluster will be very suitable for parallel processing (many similar threads can be created). With one simulation, for example, I can create a very large number of threads which can work independently from each other and require about the same time to finish.
>>



Cool. Can you talk about what you're doing?
My stuff is generally genetic algorithm based optimization of various astrodynamic problems.

 

Elledan

Banned
Jul 24, 2000
8,880


<< I would start with PVM. I've looked at both, and find PVM easier to use.
The MIT Press book on PVM is very good (PVM: Parallel Virtual Machine, by Geist, Beguelin, Dongarra ...). It's available in HTML format from the PVM site. Maybe PostScript also?
Also, the whole PVM API is documented in man-pages as well.

A few tips:
Use macros or constants for message tags. Define them in a single header file that is included by both the master and slave programs. It makes it much easier to keep track of things.

Don't worry about the higher level PVM features (groups, mailboxes, multiple buffers, etc.) to get started. Just work on packing, sending, receiving and unpacking simple messages.
>>


Okay, I'll check PVM out first, then.

BTW, I noticed that the links don't work. You have to remove the superfluous 'https://' in front of them.



<< Yeah, I've been there! Just a thought: if you have any budget at all, you might be better off eBaying those P200s for $100 each and getting an Athlon (or two). >>

Problem is that I'm not located in the US, and most people who use eBay are from the US.

I think I'll manage, though. I'll just use 20+ P200 PCs and with that have a faster system than the average geek :p



<<
Cool. Can you talk about what you're doing?
My stuff is generally genetic algorithm based optimization of various astrodynamic problems.
>>


I'm currently working on a simulation of the upper part of the CNS (Central Nervous System): the brain, in other words. The program will simulate the development of a biological neural network under the influence of impulses from its 'senses', which exist only in a virtual environment. For that reason I'll be using some kind of game, i.e., a virtual environment in which the 'creature' can develop itself.

Think of 'The Matrix' and you know exactly what I mean :)

I've some fancy theories on the CNS which I would like to test using this cluster.

Regarding the parallel part of the simulation, I'm using 'time-frames', where each time-frame is a moment in time during which certain events take place. During each time-frame neurons fire and evolution algorithms are triggered. Using parallel processing I can process these actions much faster than on a single, non-parallel system.

Feel free to ask more questions :)
 

Armitage

Banned
Feb 23, 2001
8,086


<< BTW, I noticed that the links don't work. You have to remove the superfluous 'https://' in front of them. >>



Oops!
I tried editing them, but it doesn't seem to take!?
Well, as you said, just remove the extra "https://" from the links.



<< Cool. Can you talk about what you're doing?
My stuff is generally genetic algorithm based optimization of various astrodynamic problems.
>>




<< I'm currently working on a simulation of the upper part of the CNS (Central Nervous System): the brain, in other words. The program will simulate the development of a biological neural network under the influence of impulses from its 'senses', which exist only in a virtual environment. For that reason I'll be using some kind of game, i.e., a virtual environment in which the 'creature' can develop itself.

Think of 'The Matrix' and you know exactly what I mean :)

I've some fancy theories on the CNS which I would like to test using this cluster.

Regarding the parallel part of the simulation, I'm using 'time-frames', where each time-frame is a moment in time during which certain events take place. During each time-frame neurons fire and evolution algorithms are triggered. Using parallel processing I can process these actions much faster than on a single, non-parallel system.

Feel free to ask more questions :)
>>



Sounds very interesting!
Are your time-frames really independent such that they could be evaluated in parallel though?
It would seem to me that the evaluation in one time frame would depend on the end state of the previous time frame.
 

Elledan

Banned
Jul 24, 2000
8,880


<<
Oops!
I tried editing them, but it doesn't seem to take!?
Well, as you said, just remove the extra "https://" from the links.
>>

Well, doesn't really matter, I can still extract the URL from them :)



<< Sounds very interesting!
Are your time-frames really independent such that they could be evaluated in parallel though?
It would seem to me that the evaluation in one time frame would depend on the end state of the previous time frame.
>>


Heh, apparently I'm even worse at explaining things than I thought :p

What I tried to say was that all the events which take place during one time-frame could be distributed over the nodes. Very little communication is necessary between the threads.
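That scheme can be sketched as follows. The neuron update rule, the decay factor, and the stimuli are all invented, and Python threads stand in for cluster nodes; only the structure (parallel updates within a frame, a synchronization point between frames) comes from the description above:

```python
from concurrent.futures import ThreadPoolExecutor

def update_neuron(potential, stimulus):
    # Invented per-neuron rule: decay the potential, add this frame's input.
    return potential * 0.9 + stimulus

def run_frames(potentials, stimuli_per_frame, workers=4):
    # Within one time-frame every neuron updates independently, so the
    # updates can be spread across nodes; the next frame starts only
    # after all of them have finished (map() returning is the barrier).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stimuli in stimuli_per_frame:
            potentials = list(pool.map(update_neuron, potentials, stimuli))
    return potentials

final = run_frames([0.0, 0.0], [[1.0, 2.0], [1.0, 2.0]])
print(final)  # approximately [1.9, 3.8] after two frames
```

The sequential dependency Armitage pointed out is still there, but it sits between frames, not inside one, which is exactly why the per-frame work parallelizes.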

Actually, I have to learn to 'think' in parallel, since so far I've only programmed for systems with a single CPU. From what I've heard it's quite difficult, so I'll need all the help I can get :)