Looking at Linux-based clustering. Am feeling a bit overwhelmed....

Elledan

Banned
Jul 24, 2000
8,880
SSI, HA, Heartbeat, MOSIX, Beowulf, Failsafe, DLM... can anyone help me in understanding it?

The nodes will consist of a wide variety of systems, from 486 PCs to P200s and better. I'll use Linux because I'm familiar with this OS and it appears to be quite suitable for clustering.

Thank you for any help you can give me!
 

SCSIRAID

Senior member
May 18, 2001
579
What are you trying to accomplish with your cluster? Failover/Redundancy? What is your external storage solution? FC?
 

Elledan

Banned
Jul 24, 2000
8,880


<< What are you trying to accomplish with your cluster? Failover/Redundancy? What is your external storage solution? FC? >>


I'll use the cluster to run simulations with. I want to use a cluster because it's much more scalable (and cheaper ;) ) than a single system, especially because some simulations require insane amounts of RAM (3+ GB).

From what I understand, with most clusters if one node fails, it's simply removed from the cluster and new nodes can be added during runtime. This makes the cluster quite redundant.
 

Elledan

Banned
Jul 24, 2000
8,880
0
0
Don't tell me no one here has ever built a cluster or even knows anything about it...
 

Silver222

Member
Jun 26, 2001
77
You do realize the forum this is in, don't you?

You might as well post on a hot rod enthusiasts forum about building rocket avionics. Good luck though.
 

Elledan

Banned
Jul 24, 2000
8,880


<< You do realize the forum this is in, don't you? >>

Okay, then help me choose between Highly Technical and Operating Systems.
 

r0tt3n1

Golden Member
Oct 16, 2001
1,086
If you haven't done so, check the Linux Documentation Project (linuxdoc.org); they have an extensive HOWTO on Beowulf. Do a Google search for Beowulf or clustering... hth
 

Elledan

Banned
Jul 24, 2000
8,880
Yeah, I've already looked through the documentation you mentioned and done a search with Google. That's the exact reason why I created this thread :)
 

Armitage

Banned
Feb 23, 2001
8,086


<< SSI, HA, Heartbeat, MOSIX, Beowulf, Failsafe, DLM... can anyone help me in understanding it?

The nodes will consist of a wide variety of systems, from 486 PCs to P200s and better. I'll use Linux because I'm familiar with this OS and it appears to be quite suitable for clustering.

Thank you for any help you can give me!
>>



I do quite a bit of work on clusters & have built a few, so I'll take a shot at it...

SSI: Not familiar with this.
HA: High Availability. I think this is more geared toward servers than computational clusters.
Heartbeat: I'm not very familiar with this, but I think it is basically concerned with very timely notification of system failure.

MOSIX: This is a project that works toward having an operating system span several systems on a network. It can automatically distribute/move tasks (and threads?) between nodes on your cluster. I haven't worked with it personally, but it looks cool.

Beowulf: This is the name usually applied to computational clusters. It is basically an array of stripped-down workstations running Linux and connected by various flavors of network. There are several ways to use it:

The simplest is to run independent jobs on the various nodes in parallel. This is useful for things like running Monte Carlo analyses, etc. There are tools to help automate this (see PBS, the Portable Batch System, etc.).

The other way is to actually write distributed programs using something like PVM or MPI for passing data between the parts of the simulation running on different nodes of the cluster.
This is what I do. It's not as ugly as it sounds.
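The master/slave structure described above can be sketched with Python threads and queues standing in for cluster nodes. This is purely an illustrative assumption: real PVM or MPI code passes messages over the network between machines, but the hand-out-work / collect-results loop has the same shape, and the "job" here (squaring an integer) is invented:

```python
import threading
import queue

def worker(tasks, results):
    # Each slave pulls an independent job, computes it, and sends the
    # result back: the same receive/compute/send loop a PVM slave runs.
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work, shut down
            break
        results.put(item * item)  # stand-in for one simulation job

def run_jobs(n_jobs=10, n_workers=4):
    tasks, results = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for i in range(n_jobs):       # the master hands out independent jobs
        tasks.put(i)
    for _ in workers:             # one shutdown sentinel per worker
        tasks.put(None)
    for w in workers:
        w.join()
    return sorted(results.get() for _ in range(n_jobs))

print(run_jobs())  # the squares of 0..9
```

In a real Beowulf setup the `tasks.put(...)`/`results.get()` calls would be replaced by network message passing (e.g. `pvm_send`/`pvm_recv`), with each worker running on its own node.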

Failsafe & DLM: I'm not familiar with these.

As for 486 or P200.
This will be useful for learning, but you ought to consider something a bit more modern. Until a few weeks ago I was using a 20-CPU cluster of P200s. By the benchmarks, I could expect to match its performance on a dual Athlon with considerably less complication.
My sim is pure floating point, though. YMMV.

You also need to take a close look at the program you want to use on the cluster. Not everything is amenable to cluster programming. What you want to look for is high "granularity". This means that there are big chunks of the program (time-wise) that can be done independently with little communication with the rest of the cluster, i.e., the ratio between computation and communication should be high, and the communication required should be simple.
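The computation-to-communication ratio can be made concrete with a toy model. The helper function and every number below are invented for illustration; only the idea of the ratio comes from the advice above:

```python
def granularity_ratio(work_items, usec_per_item, messages, usec_per_message):
    # Rough computation-to-communication ratio for one node's share:
    # time spent computing divided by time spent talking to the cluster.
    return (work_items * usec_per_item) / (messages * usec_per_message)

# Fine-grained: one message exchanged per work item.
fine = granularity_ratio(1000, 1000, 1000, 10_000)
# Coarse-grained: the same work batched behind a single send/receive pair.
coarse = granularity_ratio(1000, 1000, 2, 10_000)
print(fine, coarse)  # 0.1 vs 50.0
```

Same total work in both cases, but batching the communication turns a program that is dominated by network overhead into one that fits a cluster well.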
Hope this helps
 

Elledan

Banned
Jul 24, 2000
8,880


<<

<< SSI, HA, Heartbeat, MOSIX, Beowulf, Failsafe, DLM... can anyone help me in understanding it?

The nodes will consist of a wide variety of systems, from 486 PCs to P200s and better. I'll use Linux because I'm familiar with this OS and it appears to be quite suitable for clustering.

Thank you for any help you can give me!
>>



I do quite a bit of work on clusters & have built a few, so I'll take a shot at it...

SSI: Not familiar with this.
>>


Single System Image. Basically it makes the whole cluster look as much like a single system as possible.


<< HA: High Availability. I think this is more geared toward servers than computational clusters. >>

yes, it's meant to make a cluster as reliable as possible, e.g., if a node fails it shouldn't bring down the whole cluster.


<< Heartbeat: I'm not very familiar with this, but I think it is basically concerned with very timely notification of system failure. >>

That's correct.



<< MOSIX: This is a project that works toward having an operating system span several systems on a network. It can automatically distribute/move tasks (and threads?) between nodes on your cluster. I haven't worked with it personally, but it looks cool. >>

yeah, it's one of the cluster-types I've been looking at. I don't think that I'll use it, though.



<< Beowulf: This is the name usually applied to computational clusters. It is basically an array of stripped-down workstations running Linux and connected by various flavors of network. There are several ways to use it:

The simplest is to run independent jobs on the various nodes in parallel. This is useful for things like running Monte Carlo analyses, etc. There are tools to help automate this (see PBS, the Portable Batch System, etc.).

The other way is to actually write distributed programs using something like PVM or MPI for passing data between the parts of the simulation running on different nodes of the cluster.
This is what I do. It's not as ugly as it sounds.
>>

Sounds ugly enough :) I'll have a look at these links you provided.



<< Failsafe & DLM: I'm not familiar with these. >>

Failsafe: similar to Heartbeat. DLM is unfamiliar to me as well.



<< As for 486 or P200.
This will be useful for learning, but you ought to consider something a bit more modern. Until a few weeks ago I was using a 20 CPU cluster of P200s. By the benchmarks, I could expect to match this performance on a dual athlon with considerably less complication.
My sim is pure floating point though. YMMV
>>

Unfortunately I'm on a very tight budget, so I'll have to use whatever systems I can get :(



<< You also need to take a close look at the program you want to use on the cluster. Not everything is amenable to cluster programming. What you want to look for is high "granularity". This means that there are big chunks of the program (time-wise) that can be done independently with little communication with the rest of the cluster, i.e., the ratio between computation and communication should be high, and the communication required should be simple.
Hope this helps
>>

The simulations I'll be running on the cluster will be very suitable for parallel processing (many similar threads can be created). With one simulation, for example, I can create a very large number of threads which can work independently from each other and require about the same time to finish.

Thanks for your help! :)
 

Armitage

Banned
Feb 23, 2001
8,086


<<

<< HA: High Availability. I think this is more geared toward servers than computational clusters. >>


yes, it's meant to make a cluster as reliable as possible, e.g., if a node fails it shouldn't bring down the whole cluster.
>>



In a Beowulf cluster, the nodes are already mostly independent, but you'll have to write your software such that it can tolerate the untimely demise of some of the slave processes.



<<

<< MOSIX: This is a project that works toward having an operating system span several systems on a network. It can automatically distribute/move tasks (and threads?) between nodes on your cluster. I haven't worked with it personally, but it looks cool. >>


yeah, it's one of the cluster-types I've been looking at. I don't think that I'll use it, though.
>>



I agree. It certainly looks more complicated to set up than a typical Beowulf.



<<

<< Beowulf: This is the name usually applied to computational clusters. It is basically an array of stripped-down workstations running Linux and connected by various flavors of network. There are several ways to use it:

The simplest is to run independent jobs on the various nodes in parallel. This is useful for things like running Monte Carlo analyses, etc. There are tools to help automate this (see PBS, the Portable Batch System, etc.).

The other way is to actually write distributed programs using something like PVM or MPI for passing data between the parts of the simulation running on different nodes of the cluster.
This is what I do. It's not as ugly as it sounds.
>>


Sounds ugly enough :) I'll have a look at these links you provided.
>>



I would start with PVM. I've looked at both, and find PVM easier to use.
The MIT Press book on PVM is very good (PVM: Parallel Virtual Machine, by Geist, Beguelin, Dongarra ...). It's available in HTML format from the PVM site. Maybe PostScript also?
Also, the whole PVM API is documented in man-pages as well.

A few tips:
Use macros or constants for message tags. Define them in a single header file that is included by both the master and slave programs. It makes it much easier to keep track of things.

Don't worry about the higher level PVM features (groups, mailboxes, multiple buffers, etc.) to get started. Just work on packing, sending, receiving and unpacking simple messages.
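The message-tag tip above can be sketched like this; Python stands in for the C header both programs would #include, and the tag names, values, and "computation" are all invented:

```python
# The shared "header": master and slave both import these constants,
# so the two sides can never silently disagree on what a tag means.
TAG_WORK   = 1   # master -> slave: here is a job
TAG_RESULT = 2   # slave -> master: here is the answer
TAG_QUIT   = 3   # master -> slave: shut down

def handle_message(tag, payload):
    # A slave's receive loop branches on named tags, never magic numbers.
    if tag == TAG_WORK:
        return (TAG_RESULT, payload * 2)  # stand-in computation
    if tag == TAG_QUIT:
        return (TAG_QUIT, None)
    raise ValueError(f"unknown message tag {tag}")

print(handle_message(TAG_WORK, 21))  # (2, 42)
```

In PVM the equivalent would be a handful of `#define TAG_WORK 1` lines in one `.h` file, with the tag passed to `pvm_send` and matched in `pvm_recv`.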






<<

<< As for 486 or P200.
This will be useful for learning, but you ought to consider something a bit more modern. Until a few weeks ago I was using a 20 CPU cluster of P200s. By the benchmarks, I could expect to match this performance on a dual athlon with considerably less complication.
My sim is pure floating point though. YMMV
>>


Unfortunately I'm on a very tight budget, so I'll have to use whatever systems I can get :(
>>


Yeah, I've been there! Just a thought: if you have any budget at all, you might be better off eBaying those P200s for $100 each and getting an Athlon (or two).



<<

<< You also need to take a close look at the program you want to use on the cluster. Not everything is amenable to cluster programming. What you want to look for is high "granularity". This means that there are big chunks of the program (time-wise) that can be done independently with little communication with the rest of the cluster, i.e., the ratio between computation and communication should be high, and the communication required should be simple.
Hope this helps
>>


The simulations I'll be running on the cluster will be very suitable for parallel processing (many similar threads can be created). With one simulation, for example, I can create a very large number of threads which can work independently from each other and require about the same time to finish.
>>



Cool. Can you talk about what you're doing?
My stuff is generally genetic algorithm based optimization of various astrodynamic problems.

 

Elledan

Banned
Jul 24, 2000
8,880


<< I would start with PVM. I've looked at both, and find PVM easier to use.
The MIT Press book on PVM is very good (PVM: Parallel Virtual Machine, by Geist, Beguelin, Dongarra ...). It's available in HTML format from the PVM site. Maybe PostScript also?
Also, the whole PVM API is documented in man-pages as well.

A few tips:
Use macros or constants for message tags. Define them in a single header file that is included by both the master and slave programs. It makes it much easier to keep track of things.

Don't worry about the higher level PVM features (groups, mailboxes, multiple buffers, etc.) to get started. Just work on packing, sending, receiving and unpacking simple messages.
>>


Okay, I'll check PVM out first, then.

BTW, I noticed that the links don't work. You have to remove the superfluous 'https://' in front of them.



<< Yeah, I've been there! Just a thought: if you have any budget at all, you might be better off eBaying those P200s for $100 each and getting an Athlon (or two). >>

Problem is that I'm not located in the US, and most people who use eBay are from the US.

I think I'll manage, though. I'll just use 20+ P200 PCs and with that have a faster system than the average geek :p



<<
Cool. Can you talk about what you're doing?
My stuff is generally genetic algorithm based optimization of various astrodynamic problems.
>>


I'm currently working on a simulation of the upper part of the CNS (Central Nervous System): the brain, in other words. The program will simulate the development of a biological neural network under the influence of impulses from its 'senses', which exist only in a virtual environment. For that reason I'll be using some kind of game, i.e., a virtual environment in which the 'creature' can develop itself.

Think of 'The Matrix' and you know exactly what I mean :)

I've some fancy theories on the CNS which I would like to test using this cluster.

Regarding the parallel part of the simulation, I'm using 'time-frames', where each time-frame is a moment in time during which certain events take place. During each time-frame neurons fire and evolution algorithms are triggered. Using parallel processing I can process these actions much faster than on a single, non-parallel system.

Feel free to ask more questions :)
 

Armitage

Banned
Feb 23, 2001
8,086


<< BTW, I noticed that the links don't work. You have to remove the superfluous 'https://' in front of them. >>



Oops!
I tried editing them, but it doesn't seem to take!?
Well, as you said, just remove the extra "https://" from the links.



<< Cool. Can you talk about what you're doing?
My stuff is generally genetic algorithm based optimization of various astrodynamic problems.
>>




<< I'm currently working on a simulation of the upper part of the CNS (Central Nervous System): the brain, in other words. The program will simulate the development of a biological neural network under the influence of impulses from its 'senses', which exist only in a virtual environment. For that reason I'll be using some kind of game, i.e., a virtual environment in which the 'creature' can develop itself.

Think of 'The Matrix' and you know exactly what I mean :)

I've some fancy theories on the CNS which I would like to test using this cluster.

Regarding the parallel part of the simulation, I'm using 'time-frames', where each time-frame is a moment in time during which certain events take place. During each time-frame neurons fire and evolution algorithms are triggered. Using parallel processing I can process these actions much faster than on a single, non-parallel system.

Feel free to ask more questions :)
>>



Sounds very interesting!
Are your time-frames really independent such that they could be evaluated in parallel though?
It would seem to me that the evaluation in one time frame would depend on the end state of the previous time frame.
 

Elledan

Banned
Jul 24, 2000
8,880


<<
Oops!
I tried editing them, but it doesn't seem to take!?
Well, as you said, just remove the extra "https://" from the links.
>>

Well, doesn't really matter, I can still extract the URL from them :)



<< Sounds very interesting!
Are your time-frames really independent such that they could be evaluated in parallel though?
It would seem to me that the evaluation in one time frame would depend on the end state of the previous time frame.
>>


Heh, apparently I'm even worse at explaining things than I thought :p

What I tried to say was that all the events which take place during one time-frame could be distributed over the nodes. Very little communication is necessary between the threads.
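That scheme can be sketched as follows. The neuron update rule, the decay factor, and the stimuli are all invented, and Python threads stand in for cluster nodes; only the structure (parallel updates within a frame, a synchronization point between frames) comes from the description above:

```python
from concurrent.futures import ThreadPoolExecutor

def update_neuron(potential, stimulus):
    # Invented per-neuron rule: decay the potential, add this frame's input.
    return potential * 0.9 + stimulus

def run_frames(potentials, stimuli_per_frame, workers=4):
    # Within one time-frame every neuron updates independently, so the
    # updates can be spread across nodes; the next frame starts only
    # after all of them have finished (map() returning is the barrier).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stimuli in stimuli_per_frame:
            potentials = list(pool.map(update_neuron, potentials, stimuli))
    return potentials

final = run_frames([0.0, 0.0], [[1.0, 2.0], [1.0, 2.0]])
print(final)  # approximately [1.9, 3.8] after two frames
```

The sequential dependency Armitage pointed out is still there, but it sits between frames, not inside one, which is exactly why the per-frame work parallelizes.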

Actually, I have to learn to 'think' in parallel, since so far I've only programmed for systems with a single CPU. From what I've heard it's quite difficult, so I'll need all the help I can get :)