Need tips about building an experimental cluster

The Linuxator · Oct 5, 2005

OK, ever since I was a kid I have been wondering about clusters. What would I need to build one? I bet it would involve a lot of Cat cabling, a gigabit network, a variant of Linux and whatnot, but what is the missing info here?
And what benefits come from building a cheapo cluster from old PCs lying around here and there?

Mday · Oct 6, 2005

Originally posted by: The Linuxator
OK, ever since I was a kid I have been wondering about clusters. What would I need to build one? I bet it would involve a lot of Cat cabling, a gigabit network, a variant of Linux and whatnot, but what is the missing info here?

Talk to the DC (distributed computing) crowd on the DC forum. They'll teach you something good.

Future Shock · Oct 6, 2005

Actually, the DC crowd here is really focused on distributed client computing, not clusters (at least that's how I read 95% of their posts). The best resource is probably a site like clustermonkey.net, which is an offshoot of clusterworld.com (which was sold to a Linux magazine a few months ago and has since folded).

To briefly answer your question: there are a LOT of reasons to build a cluster, and the details of how you fit it out depend very much on its use. Is it for scientific computing (High Performance Computing, or HPC)? Then you will care a lot about inter-node connectivity (InfiniBand or Gigabit Ethernet) to get the nodes communicating with each other as fast as possible, as many scientific workloads have to pass partial result sets between nodes. Think about weather forecasting: the results for one "cell" of the weather map obviously affect the results in the neighboring cells, so as the forecast runs, each node has to pass that data on to the nodes handling the surrounding cells at every step of the simulation. So high-speed node interconnects are important for a lot of those scientific uses.
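To make the weather-map point concrete, here is a minimal sketch (not from the thread) of the "pass data to the neighbors at every step" pattern, usually called a halo or ghost-cell exchange. Assumptions: a toy 1-D diffusion grid stands in for the weather map, the "nodes" are just chunks of a Python list inside one process, and ALPHA, split(), step_serial() and step_distributed() are hypothetical names. On a real cluster each chunk would live on its own machine and the edge values would travel over the interconnect, typically as MPI messages.

```python
# Toy halo (ghost-cell) exchange: a 1-D diffusion grid split across simulated nodes.
# Assumption: all "nodes" live in one Python process; on a real cluster the
# edge values exchanged in step_distributed() would be network messages.
from typing import List, Optional, Tuple

ALPHA = 0.25  # hypothetical diffusion coefficient (<= 0.5 keeps this scheme stable)


def step_serial(cells: List[float]) -> List[float]:
    """One explicit diffusion step over the whole grid; the two end cells stay fixed."""
    new = cells[:]
    for i in range(1, len(cells) - 1):
        new[i] = cells[i] + ALPHA * (cells[i - 1] - 2 * cells[i] + cells[i + 1])
    return new


def split(cells: List[float], nodes: int) -> List[List[float]]:
    """Give each simulated node one contiguous chunk of cells."""
    if len(cells) % nodes:
        raise ValueError("grid size must be divisible by the number of nodes")
    size = len(cells) // nodes
    return [cells[k * size:(k + 1) * size] for k in range(nodes)]


def step_distributed(chunks: List[List[float]]) -> List[List[float]]:
    """Same step as step_serial, but each node only sees its own chunk
    plus one ghost value from each neighbor."""
    n = len(chunks)
    # 1) Halo exchange: every node receives the edge cell of each neighbor.
    #    This per-step traffic is what a fast interconnect speeds up.
    ghosts: List[Tuple[Optional[float], Optional[float]]] = [
        (chunks[k - 1][-1] if k > 0 else None,
         chunks[k + 1][0] if k < n - 1 else None)
        for k in range(n)
    ]
    # 2) Local update: each node computes new values for its own cells only.
    new_chunks = []
    for chunk, (left, right) in zip(chunks, ghosts):
        new = chunk[:]
        for i in range(len(chunk)):
            lo = chunk[i - 1] if i > 0 else left
            hi = chunk[i + 1] if i < len(chunk) - 1 else right
            if lo is None or hi is None:
                continue  # global end cell: held fixed, as in step_serial
            new[i] = chunk[i] + ALPHA * (lo - 2 * chunk[i] + hi)
        new_chunks.append(new)
    return new_chunks


if __name__ == "__main__":
    grid = [100.0] + [0.0] * 11  # hot left edge, cold everywhere else
    serial, chunks = grid, split(grid, 3)
    for _ in range(10):
        serial = step_serial(serial)
        chunks = step_distributed(chunks)
    merged = [x for chunk in chunks for x in chunk]
    print("distributed result matches serial:", merged == serial)
```

Both versions apply the same update; the only difference is where the neighbor values come from, which is exactly the data that has to cross the network on a real cluster.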

If you are using your cluster as a database server, the emphasis is different. Inter-node communication isn't that important (Gigabit Ethernet is usually more than enough, as the only information transmitted between nodes is usually data lock information), but connectivity from the nodes to the disk arrays is critical. In most database clusters, either high-performance RAID arrays are directly attached to the individual nodes, or the entire cluster is wired to a SAN. The former gives higher performance but is a pain to administer; the latter can have a SAN-connectivity bottleneck but is easier to build and administer.

Some clusters may use iSCSI to connect the nodes to the NAS/SAN, but this is usually restricted to applications that aren't disk-intensive (such as HPC) because of the overhead inherent in iSCSI.

If you roll your own cluster, again depending on what you want to do with it, I would suggest a Linux variant that runs cluster apps well. Many commercial cluster apps (such as Oracle 10g) require Red Hat or SUSE to work out of the box, as they are intended for commercial sites, but I know people who have had success using CentOS (a free rebuild of Red Hat Enterprise Linux) as the basis for Oracle 10g installs. Fedora has just enough differences, such as different directory structures, that many RH-specific applications are a pain to get working.

Lastly, on the software side, I suggest you take a look at globus.org, a commercial/academic effort to standardize grid/cluster middleware. Interesting reading. Also check out the hardware vendors' sites: I know hp.com has a lot of VERY detailed information on how to configure their own kit. Check out their line of blade servers (BL-series) and look for the technical white papers available for download.

If you have specific questions, PM me and I can try to at least point you in the right direction...

Future Shock

The Linuxator · Oct 6, 2005

Thank you for the detailed info, Future Shock. To answer some of your questions, I am considering building a cluster more for the learning experience than for any particular application. I have seen some interesting hardware for such a project (link), and I was thinking that maybe 5-10 of those would form an interesting cluster. If you look at the reviews on the site, you'll see that one of the buyers there had the same idea as I did, only he actually built a cluster with them. This is a nice piece of hardware, and its low power consumption is a big plus when you might run more than 10 of these babies at the same time. Then again, they are VIA C3 processors; they aren't performance monsters, but hey, they do get the job done.

itachi · Oct 7, 2005

Globus isn't really meant for clusters that aren't part of a grid.

Are you sure you want to set up a cluster? And what type of computations are you trying to perform on it? A cluster isn't like a dual-core CPU; you can't just run any software on it. Say you want to simulate the activity of ion channels when an alpha-toxin is introduced. To do that you'd need to run computations on hundreds of macromolecules and the hundreds of thousands of atoms they're composed of. The cluster would have each node perform the computations for a different group; when a node finishes, it sends its result to the master node, which appends it to the output file specified at submission time.
This is generally how all computations are performed. In a cluster there is no shared-memory architecture, so you can't simply execute a single program across several computers; instead, you run the same program on every computer, each with a subset of the data.
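A minimal sketch (not from the thread) of the pattern just described: the master hands each worker a subset of the data, every worker runs the same function, and the master appends each result to an output file as it comes in. Assumptions: a thread pool on one machine stands in for the cluster nodes (so there is no real speedup here), and group_energy(), partition() and run_master() are hypothetical names; group_energy() is a placeholder, not a real molecular simulation. On an actual cluster a batch system or MPI would do the handing out and collecting.

```python
# Master/worker sketch: same program on every "node", each with a subset of the data.
# Assumptions: threads stand in for cluster nodes; group_energy() is a placeholder.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple

Group = Tuple[int, List[float]]


def group_energy(group: Group) -> Tuple[int, float]:
    """Hypothetical per-node work: a sum of squares over one group's values."""
    group_id, values = group
    return group_id, sum(v * v for v in values)


def partition(data: List[float], nodes: int) -> List[Group]:
    """Give each node an interleaved subset of the data (no shared memory needed)."""
    return [(k, data[k::nodes]) for k in range(nodes)]


def run_master(data: List[float], nodes: int, out_path: str) -> List[Tuple[int, float]]:
    """Hand out the subsets, then append each result to out_path as it arrives."""
    results = []
    with ThreadPoolExecutor(max_workers=nodes) as pool, open(out_path, "a") as out:
        futures = [pool.submit(group_energy, g) for g in partition(data, nodes)]
        # as_completed() yields futures in finishing order, like nodes
        # reporting back to the master whenever they are done.
        for fut in as_completed(futures):
            group_id, value = fut.result()
            out.write(f"{group_id}\t{value}\n")
            results.append((group_id, value))
    return results


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "cluster_demo_output.tsv")
    res = run_master([float(x) for x in range(1, 101)], nodes=4, out_path=path)
    print(sorted(res))
```

Note that the lines in the output file are in completion order, not group order, which is why each line carries its group id.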

If you still want to build it, here's some (free) software you may want to look into:
OpenPBS (www.openpbs.org): you have to register before you can download it. Relatively simple to set up, and isn't confined to a specific Linux distribution.
Rocks (www.rocksclusters.org): an all-in-one package; the cluster software is built directly into the operating system. Only compatible with CentOS (a free OS built from the RHEL source files) and RHEL4 (costs money). They have a distribution of CentOS on their site that you can download if you need it.

The hardware you'd need to build a cluster of 5-10 nodes is 5-10 computers (each with a hard drive, video card, keyboard, and anything else necessary to keep the BIOS from complaining) and a network interface in each.

It won't take long, but it'll cost you. If you're familiar with hardware and systems programming, you could really challenge yourself and try to create a shared-memory architecture, either by designing and modifying the hardware or by developing a kernel driver. Just a thought; not realistic by any means, though.

Armitage · Oct 7, 2005

As everyone else has said: what do you want to do with it? If you aren't a programmer, or don't have access to some project that's already designed for a cluster, you're likely to be disappointed. There's lots of software out there to help set up and maintain clusters, and lots of programming tools and libraries for programming them, but not a lot of end-user software to run on them. That's because it's usually developed for in-house analysis tasks by labs and research groups: not the kind of stuff that gets released to the public, and rarely of interest to the general public anyway.

If you want to get a foot in the door with programming clusters, take a look at MPI (e.g. MPICH), PVM, etc. You don't need much hardware to learn the basics. In fact, you don't need a cluster at all; a single machine will do. It's more fun with an actual network, of course. Personally, I'd pick up some dirt-cheap used boxes and throw your favorite Linux distro on them. That's how I got started: five 486 boxes for $100 about 6 years ago.
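Since a single machine is enough to learn the basics, here is a minimal sketch (not from the thread) of a classic first message-passing exercise: a token travels around a ring of ranks, and each rank adds its own rank number before passing it on. Assumptions: threads stand in for MPI processes and a queue.Queue per rank stands in for that rank's inbox, so it runs without MPICH or PVM installed; send(), recv() and run_ring() are hypothetical helpers, not the MPI API (in MPI the rough equivalents are MPI_Send and MPI_Recv).

```python
# Ring token-passing, simulated on one machine.
# Assumption: threads play the role of MPI ranks and each rank's queue.Queue
# plays the role of its network inbox. send()/recv() are NOT MPI calls.
import queue
import threading
from typing import List


def run_ring(size: int, laps: int = 1) -> List[int]:
    """Pass a token around `size` ranks `laps` times; each rank other than 0
    adds its own rank number. Returns the token value rank 0 sees after each lap."""
    inboxes: List[queue.Queue] = [queue.Queue() for _ in range(size)]
    seen_by_root: List[int] = []  # only rank 0 touches this list

    def send(dest: int, msg: int) -> None:
        inboxes[dest].put(msg)

    def recv(rank: int) -> int:
        return inboxes[rank].get()  # blocks until a message arrives

    def rank_main(rank: int) -> None:
        right = (rank + 1) % size
        for _ in range(laps):
            if rank == 0:
                # Rank 0 starts each lap with the previous total (0 on the first lap).
                send(right, seen_by_root[-1] if seen_by_root else 0)
                seen_by_root.append(recv(0))
            else:
                token = recv(rank)
                send(right, token + rank)

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen_by_root


if __name__ == "__main__":
    print(run_ring(size=4, laps=2))  # prints [6, 12]
```

In a real MPI version each rank would be a separate process, possibly on a separate box, started by the MPI launcher; the send/receive structure of the loop would stay much the same.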

And I am currently clusterless.

The RAID card on our head node went TU yesterday.

The Linuxator · Oct 7, 2005

Well, as I have said, I do have a great interest in distributed computing, and after researching it for a while I find clusters very interesting. The reason I want to build a cluster is for educational purposes. What did you guys think I was going to do, put Battlefield 2 on a cluster or something? LOL.
Another thing I like about clusters is that it's possible to turn worthless old computers lying around into something productive, without much cost if you already have the hardware. I do understand the basic concepts of distributed computing, so there is no need to go any deeper into explaining the difference between dual-core and clusters; it's kind of insulting to a CSE student like me. As I said, I am just not experienced when it comes to complex networking and clusters. Keep that input coming.
And what do you guys think about that CHEAP mobo with the integrated C3 processor that I linked to at the beginning?

Armitage · Oct 7, 2005

So what are you going to run on it?

The Linuxator · Oct 7, 2005

I was hoping to run a distro of my liking, perhaps FC4 (Fedora Core 4); if that's going to cause problems, I might choose CentOS.
What do you think ?

Armitage · Oct 7, 2005

Actually, I meant what application(s). I doubt your choice of distribution will make much difference. Our local one runs SUSE, primarily because that's our IT staff's preference. Our corporate cluster runs a BSD.

The Linuxator · Oct 7, 2005

I haven't decided which apps the cluster will run, but since I am building it just to learn how it functions, I might try some very intensive calculations/simulations. Since I am planning to have about 5-10 PCs running at once, each with a PIII-class 800 MHz CPU or better and a fair amount of memory, I might as well push it to its limits. Do you have any suggestions?

smack Down · Oct 7, 2005

There really isn't that much administrative work to do on a cluster. All you have to do for MPI is create a host file (listing all the nodes' IP addresses), set up SSH to log in automatically without a password, and set up a shared file system such as NFS. Then all you need to do is run the MPI launcher (I forgot its name; it's usually mpirun or mpiexec), and it will open connections to each node using SSH and start a process on each node.
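A minimal sketch (not from the post) of those setup steps, written as a helper that prints a host file and the commands you would run. Assumptions: the node IPs and ./my_mpi_program are made up; the one-address-per-line host file and the -np/-machinefile flags follow MPICH's mpirun, and other MPI implementations may spell the options differently, so check your launcher's man page. It only builds commands; it does not set up SSH keys or the shared file system for you. BatchMode=yes is a standard OpenSSH option that makes ssh fail instead of prompting for a password, which is a quick way to confirm auto-login works.

```python
# Build the host file and commands for a small MPI cluster (prints, does not execute).
# Assumptions: hypothetical node IPs and program name; MPICH-style mpirun flags.
import shlex
from typing import List


def hostfile_text(hosts: List[str]) -> str:
    """Host file contents: one node address per line."""
    return "".join(f"{host}\n" for host in hosts)


def ssh_check_cmd(host: str) -> List[str]:
    """ssh command that fails instead of prompting if passwordless login isn't set up."""
    return ["ssh", "-o", "BatchMode=yes", host, "true"]


def mpirun_cmd(hostfile: str, nprocs: int, program: str) -> List[str]:
    """Launch `nprocs` copies of `program` on the machines listed in `hostfile`."""
    return ["mpirun", "-np", str(nprocs), "-machinefile", hostfile, program]


if __name__ == "__main__":
    nodes = ["192.168.0.101", "192.168.0.102", "192.168.0.103"]  # hypothetical IPs
    print("# contents of the host file 'hosts':")
    print(hostfile_text(nodes), end="")
    for node in nodes:
        print(shlex.join(ssh_check_cmd(node)))  # each should exit 0 with no prompt
    print(shlex.join(mpirun_cmd("hosts", len(nodes), "./my_mpi_program")))
```

Once every ssh check returns without a password prompt, the printed mpirun line is roughly what you would run from the head node.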

Any distro will work. The cluster I played around with had Gentoo on one node and Red Hat on the other.

Future Shock · Oct 9, 2005

Originally posted by: smack Down

Any distro will work. The cluster I played around with had Gentoo on one node and Red Hat on the other.

Any distro will work for playing around. I think I might have confused the OP by mentioning that only Red Hat (or CentOS) and SUSE are well supported by commercial cluster-aware applications such as Oracle, Informatica, etc.

Future Shock
