I am building a Supercomputer, and need help!!

WingZero94

Golden Member
Mar 20, 2002
1,130
0
0
OK, here goes. I'm a newbie here, but I thought you guys/gals could help out. We are building a supercomputer for my university. It will have 36 processors doing parallel processing, which I don't totally understand.

Anyway, this is what I need: each computer has to cost approx. $600 (36 of 'em). Each needs an 80 GB HD, 512 MB RAM, and LAN. We have tried an Intel PIII 1000/815, an AMD Athlon MP 1400/SiS K7SEM, a P4 1600/850, and a Celeron 950/810. The two Pentium systems ran identically, the AMD was 10% slower, and the Celeron was 37% slower. What do you think we should use? It needs to be a floating-point monster. TIA,

Paul
 

JameSF

Junior Member
Oct 22, 2001
8
0
0
Um, sounds like you're building a cluster. It's a little more complicated than that. What kind of interconnection topology (network) are you going to use? You have to make sure that your interconnection network has enough bandwidth to support the amount of communication your processing will generate.
 

Locutus4657

Senior member
Oct 9, 2001
209
0
0
A). Why in the world do you need 80 GB hard drives on these computers? I can't imagine any parallel problem that requires that much disk space... If you really need that much disk space in each client, then you might as well not build your "supercomputer": you won't be able to afford the networking hardware to handle the kind of I/O you must be doing.

B). I'm having trouble believing your benchmarks... I don't see how a PIII 1 GHz could possibly be faster than an AMD Athlon MP 1400. Are you sure it wasn't the P4 and Athlon running the same, with the PIII 10% slower and the Celeron 37% slower? In any case, I would go with whatever CPU fits your budget best, and that would probably end up being the AMD chip.

C). Regarding how to configure your computers: the client workstations should have relatively small hard drives, as all they need to hold is the Linux Beowulf OS, the currently running executable, and any locally stored data that the executable requires. The most commonly used Linux Beowulf distro is Scyld, available at http://www.scyld.com.




 

WingZero94

Golden Member
Mar 20, 2002
1,130
0
0
I don't know why they need 80 GB HDs; they just said that's what they need. I also found it hard to believe that the AMD performed worse. Could it be because of the crappy motherboard or the PC133 RAM? What about doing 36 dual-processor systems with less expensive processors?
 

Elledan

Banned
Jul 24, 2000
8,880
0
0


<< OK, here goes. I'm a newbie here, but I thought you guys/gals could help out. We are building a supercomputer for my university. It will have 36 processors doing parallel processing, which I don't totally understand. >>

Parallel processing: the execution of multiple threads at the same time by two or more CPUs.
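
To make that concrete, here's a tiny sketch of my own (hypothetical, but it compiles with gcc -pthread): two threads each sum half of an array, and on a dual-CPU box the two halves genuinely run at the same time. On a cluster, the same split is done across machines with message passing instead of shared memory.

[code]
#include <stdio.h>
#include <pthread.h>

/* Two threads each sum half of the array; on a 2-CPU system the
 * two halves execute simultaneously -- that's the whole idea. */
#define N 1000000

static int data[N];

struct chunk { int lo, hi; long sum; };

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    long s = 0;
    int i;
    for (i = c->lo; i < c->hi; i++)
        s += data[i];
    c->sum = s;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    struct chunk a = { 0, N / 2, 0 }, b = { N / 2, N, 0 };
    int i;

    for (i = 0; i < N; i++)
        data[i] = 1;

    pthread_create(&t1, NULL, sum_chunk, &a);
    pthread_create(&t2, NULL, sum_chunk, &b);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("total = %ld\n", a.sum + b.sum);  /* prints 1000000 */
    return 0;
}
[/code]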



<< Anyway, this is what I need: each computer has to cost approx. $600 (36 of 'em). Each needs an 80 GB HD, 512 MB RAM, and LAN. We have tried an Intel PIII 1000/815, an AMD Athlon MP 1400/SiS K7SEM, a P4 1600/850, and a Celeron 950/810. The two Pentium systems ran identically, the AMD was 10% slower, and the Celeron was 37% slower. What do you think we should use? It needs to be a floating-point monster. >>

The FPU of the Athlon is superior to the FPU of any Intel CPU.

Anyway, take a look at these sites. They will provide you with a rough idea of what you need:

www.beowulf.org
Beowulf Underground : Current Articles
MOSIX


In short, for the hardware you'll need a number of nodes; the specifications you listed will suffice, although the HDs of the nodes only have to hold the OS and nothing else, so 80 GB is quite excessive.
Further, you'll need a fast network to connect the nodes. 100 Mbit will likely not suffice for a cluster of 36 nodes, so you'll have to use either Gigabit Ethernet or two NICs per node (bonded so they function as one connection).

Using SMP systems would increase the efficiency and speed of the cluster, since less data has to be moved around on the LAN. The average SMP system is at best only slightly cheaper than two single-CPU systems, so it might even increase the cost, but it would result in a noticeable gain in performance.

With SMP systems you'll also have to buy less RAM, fewer NICs and other peripherals, and you can decrease the number of individual nodes, making the cluster more manageable.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
[rant]OK, first off, the way you/they are speccing this thing is completely bass-ackwards... "Each computer has to cost approx. $600 (36 of 'em)." That's bullsh|t.

There are basically two ways to spec this kind of system:

1. Specify the performance you need, and minimize the cost to obtain that performance.
2. Specify the money available and maximize the performance for that amount of cash.

Two sides of the same coin of course.

But to say "we want 36 $600 nodes" is next to worthless. What if you could get the same performance with 12 $1,800 nodes? Same cost, easier to administer, and fewer scaling problems.[/rant]

Here's how you should do this:
1. Find out what kind of apps they intend to run on this system.
2. Benchmark the floating-point performance of various CPUs (at least PIII, Athlon, and P4) on representative apps.
2.a. If possible, try it on some big-cache chips (Xeon) and see if that might be important to you.
2.b. See how the representative apps scale in single and simultaneous runs on SMP machines. If you've got huge bandwidth hogs, you may not want to go with SMP boxes.
3. Assess the network requirements. Is 100baseT adequate? For 36 nodes it easily might not be, depending on the apps.
4. Assess the memory requirements per CPU.
5. Design clusters around different combinations of these things to approximate a common level of performance. This could be something like (36 1 GHz PIII SMP, Myrinet) vs. (24 1.2 GHz Athlon MP SMP, 1000baseT) vs. (32 2.2 GHz P4, 2x100baseT), etc.
6. Assess the cost of each of these options (see the rough sketch after this list).
7. Assess non-direct-cost issues (power supply, cooling, space, ease of maintenance, future parts availability, etc.).
8. Pick a system.
9. Hope everybody is happy.
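
For step 6, the comparison is easy enough to script. Here's a rough sketch; every cost and GFLOPS figure in it is a made-up placeholder, so plug in your own prices and benchmark results:

[code]
#include <stdio.h>

/* Back-of-the-envelope cost/performance comparison for candidate
 * cluster configs.  All numbers below are hypothetical placeholders,
 * not benchmarks -- substitute your own. */
struct config {
    const char *name;
    int nodes;
    double cost_per_node;   /* dollars per node                  */
    double flops_per_node;  /* GFLOPS per node on your real apps */
};

int main(void) {
    struct config c[] = {
        { "36x single PIII 1GHz, 100baseT", 36,  600.0, 0.8 },
        { "18x dual Athlon MP, 1000baseT",  18, 1200.0, 2.0 },
        { "24x P4 1.6, 2x100baseT",         24,  750.0, 1.2 },
    };
    int i, n = sizeof c / sizeof c[0];

    for (i = 0; i < n; i++) {
        double cost = c[i].nodes * c[i].cost_per_node;
        double perf = c[i].nodes * c[i].flops_per_node;
        printf("%-34s $%7.0f  %5.1f GFLOPS  $%.0f/GFLOPS\n",
               c[i].name, cost, perf, cost / perf);
    }
    return 0;
}
[/code]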

As for 80 GB/node: I have no idea why they want that, unless they intend to layer some sort of storage system on top of it. 36 * 80 GB = 2.9 TB, not a bad chunk of space if you can use it effectively.
But you'd do well to see if netbooting your cluster is feasible. It would make administration a snap if set up properly.

Also, don't forget your server node. This may be significantly different from your slave nodes.

 

Goosemaster

Lifer
Apr 10, 2001
48,775
3
81
I am confused here... why are so many "n00bies" building supercomputers? Some guy in the networking forum was building a server for his school, and he wanted to use a motherboard with 2, count them, 2 AGP ports... FOR A FREAKIN' server that should have a crappy-ass vid card at most!!!

Seriously, I do not mean to challenge your intelligence or the other guy's, but who hires a neophyte to build a cluster array? Seriously... I guess this sort of relates to the fact that the school wants 36 $600 machines...

Don't take me wrong... I know a lot about building servers... BUT I would find it impossible for a company to assign me such a large project given my background... I am not an IT major (I am EE).


Anyway, now that my desire for bashing has been fulfilled (ahh!), let me give you some advice...

As a few have pointed out, push for the school to concentrate on more dual or quad systems than what they have planned. Unless every comp will serve a dumb terminal somewhere, more performance will see the light of day if it is in one PC.

From experience, the network may get bogged down, and the computers will end up waiting to process data. The key is to organize...

Cluster:

Dual Server:

Processing
Storage
Web Server
Security

This is by no means an adequate schematic, but I hope you get my drift...

Fewer and quicker boxes, with the work divided into operations, just as one would use a mail server, a print server, a web server, etc.



Then again, you have not said what this is for... If you are looking for JUST processing power (scientific calcs, etc.), there are other options...

You could try a Sun server... you're spending a little more than 20 grand anyway...

Or...

Get a lot of dual servers and link them with Gigabit over copper.....


This is a dual-CPU server that I recommend... as does Anand Lal Shimpi:
Bam!
 

Locutus4657

Senior member
Oct 9, 2001
209
0
0
I think the benchmarks just lie... PC133 would affect the Athlon's performance relative to a P4, but not a PIII... But since they are not pairing the Athlon with good hardware, then yes, I can imagine performance would be affected... If I were you, I would go with Athlons, I would push them into getting a DDR board from a respected motherboard OEM, and I would seriously question them about why they need 80 GB; it is most unusual for a Beowulf client to have that much disk space. Are you sure they didn't just mean 80 GB for node 0?




<< I don't know why they need 80 GB HDs; they just said that's what they need. I also found it hard to believe that the AMD performed worse. Could it be because of the crappy motherboard or the PC133 RAM? What about doing 36 dual-processor systems with less expensive processors? >>

 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
Does anybody else find it odd that he is using an Athlon MP on a single-processor mobo?

I am guessing he just searched for the cheapest mobo.
 

WingZero94

Golden Member
Mar 20, 2002
1,130
0
0
Yeah, I found it weird that MPs were used, especially with PC133 and a crappy mobo. We are meeting with the professor at the university to find out exactly what he wants. Perhaps he is running some sort of experiment. I appreciate all your ideas, and once we find out exactly what he is using it for, I will let you know! Actually... if it is for the university, they are probably just finding more ways to bill us... :)

Paul
 

RSMemphis

Golden Member
Oct 6, 2001
1,521
0
0
A cluster running TCP/IP over 100 Mbit Ethernet will just be crippled anyway.
Unless the project allows a lot of parallelization (meaning the results do not have to be shuffled back and forth a lot), you can put almost any CPU in there. It just won't matter.
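
A quick Amdahl's-law sketch (my own illustration, with made-up parallel fractions) shows why: once the serial and communication share of the job is fixed, faster CPUs stop mattering.

[code]
#include <stdio.h>

/* Amdahl's law: if a fraction p of the work parallelizes,
 * speedup on n CPUs is 1 / ((1 - p) + p / n). */
static double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    double p[] = { 0.50, 0.90, 0.99 };  /* parallel fractions    */
    int i, n = 36;                      /* proposed cluster size */

    for (i = 0; i < 3; i++)
        printf("p = %.2f -> speedup on %d CPUs: %.1fx\n",
               p[i], n, speedup(p[i], n));
    /* Prints roughly 1.9x, 8.0x, and 26.7x: unless nearly all of
     * the job parallelizes, the CPUs mostly sit waiting. */
    return 0;
}
[/code]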

You know what is harder than building a cluster? Programming a cluster. People who know how to properly program a cluster know which components to use.

I agree with what most people here said:

*) 80 GB is overkill.
*) I find the processor remarks weird

Additionally:
*) What OS are you going to use? Athlons often perform much better under Unix-based systems than P4s do.
*) What compiler are you going to use? If it is time-critical code, you had better get a very good compiler.
*) You need to drop the TCP/IP transfer protocol. There's just too much overhead for effective parallel computing.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0


<< A cluster running TCP/IP over 100 Mbit Ethernet will just be crippled anyway.
Unless the project allows a lot of parallelization (meaning the results do not have to be shuffled back and forth a lot), you can put almost any CPU in there. It just won't matter.
>>



As I said in the other thread, it depends on the applications and the size of the cluster. Switched 100baseT will be plenty of network for many applications on a 36-node cluster.
Here's a link to a 1000-CPU cluster tied together with 100baseT (hubs & switches). For their application, it's all that was needed.



<< You know what is harder than building a cluster? Programming a cluster. People who know how to properly program a cluster know which components to use. >>



yea verily



<< *) You need to drop the TCP/IP transfer protocol. There's just too much overhead for effective parallel computing. >>



PVM, one of the two message-passing libraries commonly used to program clusters, uses a mix of TCP and UDP (linkage).
UDP is too unreliable by itself; you'd have to layer on all the robustness that TCP gives you by default.
I'm not sure about MPI.
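
For the curious, here's roughly what PVM code looks like. A minimal master/worker sketch from memory (untested; the "square" executable name is made up, and you'd want to check the pvm3 man pages before trusting the details):

[code]
#include <stdio.h>
#include "pvm3.h"

/* Minimal PVM master/worker sketch: the master spawns workers,
 * sends each an integer, and collects the squares.  The library
 * moves these messages over the daemons' TCP/UDP sockets for you. */
#define NWORKERS 4

int main(void) {
    pvm_mytid();  /* enroll this process in the virtual machine */

    if (pvm_parent() == PvmNoParent) {         /* master side */
        int tids[NWORKERS], i, result;
        pvm_spawn("square", NULL, PvmTaskDefault, "", NWORKERS, tids);
        for (i = 0; i < NWORKERS; i++) {
            pvm_initsend(PvmDataDefault);
            pvm_pkint(&i, 1, 1);
            pvm_send(tids[i], 1);              /* msgtag 1 = work */
        }
        for (i = 0; i < NWORKERS; i++) {
            pvm_recv(-1, 2);                   /* msgtag 2 = result */
            pvm_upkint(&result, 1, 1);
            printf("got %d\n", result);
        }
    } else {                                   /* worker side */
        int x;
        pvm_recv(pvm_parent(), 1);
        pvm_upkint(&x, 1, 1);
        x *= x;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&x, 1, 1);
        pvm_send(pvm_parent(), 2);
    }
    pvm_exit();
    return 0;
}
[/code]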
 

RSMemphis

Golden Member
Oct 6, 2001
1,521
0
0


<< PVM, one of the two message-passing libraries commonly used to program clusters, uses a mix of TCP and UDP (linkage).
UDP is too unreliable by itself; you'd have to layer on all the robustness that TCP gives you by default.
I'm not sure about MPI.
>>



PVM does use TCP, but I wouldn't say it is very effective. I have seen both an Ethernet and a Myriad network. Believe me, there is a huge difference.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0


<< PVM does use TCP, but I wouldn't say it is very effective. I have seen both an Ethernet and a Myriad network. Believe me, there is a huge difference. >>



Huh?
A Myriad network? Perhaps you mean Myrinet? Of course there is a huge difference between Ethernet and Myrinet (if that's what you mean). If you need Myrinet performance, you should get Myrinet, but it is fantastically expensive (about $1200/card and $400/port). If your app won't work without Myrinet performance, maybe you should reexamine how you're doing things in your application. Maybe it just isn't a candidate for cluster computing.

In any case, how do you say PVM isn't very effective? What would you suggest?
I'm pretty confident MPI uses TCP also.
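
Either way, the transport is the implementation's problem, not the API's. A minimal MPI ping-pong sketch (from memory, untested); the same source runs whether the bytes travel over TCP, shared memory, or Myrinet:

[code]
#include <stdio.h>
#include <mpi.h>

/* Minimal MPI ping-pong between ranks 0 and 1.  Whatever transport
 * the MPI implementation uses underneath is invisible here. */
int main(int argc, char **argv) {
    int rank, token;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
        printf("rank 0 got %d back\n", token);   /* prints 43 */
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        token++;                                 /* pong it back */
        MPI_Send(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
[/code]

(Launch with something like mpirun -np 2 ./pingpong.)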

 

hoihtah

Diamond Member
Jan 12, 2001
5,183
0
76
Currently the fastest CPU for the $, I'd say, is the P4 1.6A overclocked to 2.4 GHz.
A mobo for that goes for about $100... and another $140-150 for the CPU.
For $250... this would definitely be the choice for my mega computer.
 

Shalmanese

Platinum Member
Sep 29, 2000
2,157
0
0
Hmm... you want to OC a cluster?

With 36 nodes, if you decrease the stability by even 1%, you're going to have a third more crashes than normal. This isn't like single computers, where a crash is just a reboot.
 

Lithium381

Lifer
May 12, 2001
12,452
2
0
80 GB!?!?! Hah, I'd go with Athlon, cheaper... 36 beasts? Man, that'd be nice to have in my garage... you know,
how about 18 dual-CPU systems? Well, have fun!
 

Armitage

Banned
Feb 23, 2001
8,086
0
0


<< Hmm... you want to OC a cluster?

With 36 nodes, if you decrease the stability by even 1%, you're going to have a third more crashes than normal. This isn't like single computers, where a crash is just a reboot.
>>



While I agree that overclocking a cluster is probably a bad idea, I'd be interested in how you came up with those figures (1% stability decrease = 1/3 more crashes).
 

nirgis

Senior member
Mar 4, 2001
636
0
0


<<

<< Hmm... you want to OC a cluster?

With 36 nodes, if you decrease the stability by even 1%, you're going to have a third more crashes than normal. This isn't like single computers, where a crash is just a reboot.
>>



While I agree that overclocking a cluster is probably a bad idea, I'd be interested in how you came up with those figures (1% stability decrease = 1/3 more crashes).
>>



1% * 36 = 36% less stability
 

Armitage

Banned
Feb 23, 2001
8,086
0
0


<< 1% * 36 = 36% less stability >>



No, it doesn't really work that way.
Try this example. Say the average uptime for a node on your cluster is 100 days (just to keep the math simple). For a 36-node cluster, you can then expect 36 node failures per 100 days, or 2.778 days of cluster uptime between node failures.
Now, say you decide to overclock your nodes. Overclocking reduces your stability by 1% (per your argument), so the nodes now average 99 days of uptime. 36 node failures per 99 days gives you 2.75 days of cluster uptime between node failures.

So, by decreasing your node stability by 1%, you've reduced your cluster uptime between failures by (2.778 - 2.75)/2.778 = 1%.
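
The general rule (assuming independent nodes with roughly equal average uptime) is just cluster MTBF = node MTBF / node count. A few lines of C confirm the arithmetic:

[code]
#include <stdio.h>

/* Cluster mean time between node failures, assuming 36 independent
 * nodes with identical average uptime: mtbf_cluster = mtbf_node / n. */
int main(void) {
    int nodes = 36;
    double mtbf_stock = 100.0;  /* days of node uptime, stock  */
    double mtbf_oc    =  99.0;  /* 1% worse when overclocked   */

    printf("stock: %.3f days between node failures\n", mtbf_stock / nodes);
    printf("OC'd:  %.3f days between node failures\n", mtbf_oc / nodes);
    /* 2.778 vs. 2.750 days: a 1% hit per node is a 1% hit for the
     * cluster, not 36%. */
    return 0;
}
[/code]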

Note that the failure of a slave node does not necessarily mean your entire cluster and the apps running on it go down. If you're using PVM, and do some extra work on your apps, you can detect the failure of remote processes and/or nodes and work around the problem. That's been my experience with Linux clusters, anyway. Windows may not be as well behaved; I don't know. Of course, if you lose the master node, typically everything goes down.

Another note: PVM handles this sort of dynamic process management very well. MPI does not, although MPI-2 is supposed to add some of this functionality.
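
The PVM hook for this is pvm_notify(). A rough sketch of the failure-detection idea, from memory (the "worker" executable name is hypothetical, and real code would respawn the lost work rather than just print):

[code]
#include <stdio.h>
#include "pvm3.h"

/* Sketch of PVM failure detection: ask the PVM daemon to send us a
 * TASK_DIED message whenever one of our workers exits, then treat
 * the arrival of that tag as "reschedule that work unit". */
#define NWORKERS  4
#define TASK_DIED 99

int main(void) {
    int tids[NWORKERS], dead_tid, i;

    pvm_mytid();  /* enroll in the virtual machine */
    pvm_spawn("worker", NULL, PvmTaskDefault, "", NWORKERS, tids);

    /* Request one TASK_DIED message per listed task. */
    pvm_notify(PvmTaskExit, TASK_DIED, NWORKERS, tids);

    for (i = 0; i < NWORKERS; i++) {
        pvm_recv(-1, TASK_DIED);        /* blocks until some task dies */
        pvm_upkint(&dead_tid, 1, 1);    /* message body = the dead tid */
        printf("task t%x died; reassign its work here\n", dead_tid);
    }
    pvm_exit();
    return 0;
}
[/code]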
 

schizoid

Banned
May 27, 2000
2,207
1
0
Uhh...

Last time I checked, MPI is based on the PVM macros. It's a supreme bitch to deal with (well, I guess it's not THAT bad, but it ain't fun either). I had to program a distributed shared-memory package in MPI. It was lovely.

 

Armitage

Banned
Feb 23, 2001
8,086
0
0


<< Uhh...

Last time I checked, MPI is based on the PVM macros. It's a supreme bitch to deal with (well, I guess it's not THAT bad, but it ain't fun either). I had to program a distributed shared-memory package in MPI. It was lovely.
>>



You should check again. MPI has nothing to do with PVM; I'm not sure what you mean by "PVM macros."

Actually, MPI is an industry standard of which there are several implementations (LAM, MPICH; many vendors have their own).

And I agree, MPI is a real PITA. I much prefer and recommend PVM. But I learned PVM first, and use it a lot, so I could be somewhat biased.