<<
SSI, HA, Heartbeat, MOSIX, Beowulf, Failsafe, DLM... can anyone help me in understanding it?
The nodes will consist of a wide variety of systems, from 486 PCs to P200 and better. I'll use Linux because I'm familiar with this OS and it appears to be quite suitable for clustering.
Thank you for any help you can give me! >>
I do quite a bit of work on clusters & have built a few, so I'll take a shot at it...
SSI: Not familiar with this.
HA: High Availability. I think this is more geared toward servers than computational clusters.
Heartbeat: I'm not very familiar with this, but I think it is basically concerned with very timely notification of system failure.
MOSIX: This is a project that works toward having an operating system span several systems on a network. It can automatically distribute/move tasks (and threads?) between nodes on your cluster. I haven't worked with it personally, but it looks cool.
Beowulf: This is the name usually applied to computational clusters. It is basically an array of stripped-down workstations running Linux and connected by various flavors of network. There are several ways to use it:
The simplest is to run independent jobs on the various nodes in parallel. This is useful for things like Monte Carlo analysis. There are tools to help automate this (see PBS, the Portable Batch System, etc.).
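To make the independent-jobs idea concrete, here is a minimal sketch in Python. It is only an illustration: the function names (run_job, combine), the pi estimate, and the job and sample counts are all made up for the example, and the plain loop at the bottom stands in for whatever batch scheduler would actually hand the jobs to your nodes.

```python
import random

def run_job(seed, n_samples):
    """One independent job: count random points that land inside the unit quarter circle."""
    rng = random.Random(seed)  # each job gets its own seed, so jobs share no state
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def combine(hit_counts, n_samples):
    """Merge the per-job counts into a single estimate of pi."""
    return 4.0 * sum(hit_counts) / (n_samples * len(hit_counts))

# Stand-in for the scheduler (assumption): on a cluster, each run_job call
# would be submitted as a separate job and would run on its own node.
N_JOBS, N_SAMPLES = 8, 20000
results = [run_job(seed, N_SAMPLES) for seed in range(N_JOBS)]
pi_estimate = combine(results, N_SAMPLES)
print(pi_estimate)
```

The point is that the jobs never talk to each other. The only "communication" is collecting one number per job at the end, which is why this style is the easiest to get working.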
The other way is to actually write distributed programs using something like PVM or MPI for passing data between the parts of the simulation running on different nodes of the cluster.
This is what I do. It's not as ugly as it sounds.
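For a feel of the message-passing pattern, here is a rough sketch in Python. To be clear, this is not PVM or MPI: it uses threads and a queue inside one process purely as a stand-in for cluster nodes and their send/receive calls, and every name in it (worker, distributed_sum_of_squares, to_root) is hypothetical. The shape is the same as in a real distributed program, though: each worker owns a slice of the data, computes on it, and sends one message to a root that combines the partial results.

```python
import queue
import threading

def worker(rank, chunk, to_root):
    """A non-root 'node': compute on its own slice, then send one message to the root."""
    partial = sum(x * x for x in chunk)
    to_root.put((rank, partial))  # plays the role of a send to the root

def distributed_sum_of_squares(data, n_workers):
    to_root = queue.Queue()
    # Deal the data out so that each worker owns one slice.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    threads = [threading.Thread(target=worker, args=(rank, chunk, to_root))
               for rank, chunk in enumerate(chunks)]
    for t in threads:
        t.start()
    # The root blocks on one receive per worker and reduces the partial results.
    total = 0
    for _ in range(n_workers):
        _rank, partial = to_root.get()
        total += partial
    for t in threads:
        t.join()
    return total

print(distributed_sum_of_squares(list(range(10)), 4))  # prints 285
```

In a real PVM or MPI program the queue would be replaced by the library's own send and receive calls and each worker would be a separate process on its own node, but the split, compute, send, reduce structure carries over.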
Failsafe & DLM: I'm not familiar with these.
As for the 486s and P200s: they will be useful for learning, but you ought to consider something a bit more modern. Until a few weeks ago I was using a 20-CPU cluster of P200s. By the benchmarks, I could expect to match its performance on a dual Athlon with considerably less complication.
My sim is pure floating point, though. YMMV.
You also need to take a close look at the program you want to run on the cluster. Not everything is amenable to cluster programming. What you want to look for is high "granularity" (coarse-grained work). This means that there are big chunks of the program (time-wise) that can be done independently, with little communication with the rest of the cluster. In other words, the ratio of computation to communication should be high, and the communication required should be simple.
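A back-of-the-envelope way to see why that ratio matters, sketched in Python. The model is a deliberate simplification and an assumption on my part: computation divides evenly across the nodes, communication time does not shrink at all, and the timings are invented for illustration.

```python
def granularity(compute_s, comm_s):
    """Ratio of computation time to communication time (higher is better)."""
    return compute_s / comm_s

def parallel_time(compute_s, comm_s, nodes):
    """Estimated wall time: computation is split across nodes, communication is not."""
    return compute_s / nodes + comm_s

def speedup(compute_s, comm_s, nodes):
    """Speedup relative to running all of the computation on a single node."""
    return compute_s / parallel_time(compute_s, comm_s, nodes)

# Hypothetical timings in seconds, both on a 20-node cluster.
coarse = speedup(compute_s=1000, comm_s=10, nodes=20)   # ratio 100:1
fine = speedup(compute_s=1000, comm_s=200, nodes=20)    # ratio 5:1
print(round(coarse, 2), fine)  # prints 16.67 4.0
```

With a 100:1 ratio the 20 nodes give roughly a 17x speedup. With 5:1 the same 20 nodes give only 4x, because most of the wall time goes to communication.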
Hope this helps