Need Help! Price estimate on server build

JM Aggie08 · Jul 13, 2010

MotF Bane said:
I came.

Twice.

xSauronx · Jul 13, 2010

zinfamous said:
Alrighty, this is totally last second dumped on me, as she's been away for more than a month and suddenly needs to finish this grant. Nothing unusual here, actually....

I'll have to look over some stuff at home. le sigh... 🙁

so you guys are saying $20-40k won't be able to run MySQL, for example, and handle some 20-40 TB of data?

:hmm:

for an idea...the community college i just graduated from took almost 20k to buy a non-major brand 12TB SAN (mirrored, 6TB usable), plus another 7k to buy a used 3750-24 10/100/1000 with 2 GBICs

we looked at something from hp and dell i think, a similar setup would have cost more like 60 or 70k (i dont remember too well, it was expensive) for a similar amount of storage.

zinfamous · Jul 13, 2010

Platypus said:
It can be done on that price point but it's not going to be that great, probably awful... we need a more realistic sense of how fast it needs to be, how people are connecting to it, what your network is like, how much the data is going to grow over the next 3 years, large files/small files?, etc

well, we run 10gb switches on campus--I could be wrong? fuck if I know, lol. a single sequencing run produces about 15-20gb data. We sequence a lot of genomes, and assembly means taking 1-3 of these 20gb runs and analyzing them together.

Hell, I'll give you a call later. I'm gonna try and pry out some more info.

BrunoPuntzJones · Jul 13, 2010

500GB or 500gb? Big difference.

As noted, even if only 500gb a week, you're well outside of a standard server for hosting, you'll need a SAN.

How much retention/redundancy? What about uptime? If you lose the equipment, how long can you be down before going out of business?

As noted, you really need to talk with enterprise or corp sales at HP, Dell, etc.
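
To put rough numbers on the GB-versus-Gb question, here is a minimal sketch. It assumes the 500-per-week figure stays flat over the three-year horizon asked about earlier in the thread, and it uses decimal units (1 TB = 1000 GB). The function name and the 52-weeks-per-year simplification are illustrative assumptions, not anything from the lab's actual workflow.

```python
WEEKS_PER_YEAR = 52  # simplification; ignores the extra day or two per year


def projected_tb(weekly_amount, unit, years=3):
    """Total data after `years`, in decimal TB, for a flat weekly rate.

    unit is "GB" (gigabytes) or "Gb" (gigabits, 8 bits per byte).
    """
    if unit == "GB":
        weekly_gb = weekly_amount
    elif unit == "Gb":
        weekly_gb = weekly_amount / 8  # gigabits -> gigabytes
    else:
        raise ValueError("unit must be 'GB' or 'Gb'")
    return weekly_gb * WEEKS_PER_YEAR * years / 1000


print(projected_tb(500, "GB"))  # 78.0 TB over three years
print(projected_tb(500, "Gb"))  # 9.75 TB over three years
```

Under those assumptions, the same "500" is either well past the 20-40 TB range being discussed or comfortably under it, which is why the unit matters for sizing.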

Platypus · Jul 13, 2010

zinfamous said:
well, we run 10gb switches on campus--I could be wrong? fuck if I know, lol. a single sequencing run produces about 15-20gb data. We sequence a lot of genomes, and assembly means taking 1-3 of these 20gb runs and analyzing them together.

Hell, I'll give you a call later. I'm gonna try and pry out some more info.

You'll want a 10G NIC then, which is pricey in its own right. I can make recommendations for you on stuff I've worked with personally when you call later if you want. You should also think about a backup solution for this.. how are you going to recover 20TB quickly, what medium are you saving it on? How much of a retention do you need? etc
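
For a feel of what "recover 20TB quickly" means, here is a back-of-the-envelope sketch of restore time limited by the network link alone. The `efficiency` factor is a made-up placeholder for protocol overhead and everything else that keeps a link from running at line rate; real restores are often limited by the disks or the backup medium instead, so treat the results as a lower bound rather than a prediction.

```python
def transfer_hours(terabytes, link_gbps, efficiency=0.7):
    """Hours to move `terabytes` (decimal TB) over a `link_gbps` link.

    efficiency is an assumed fraction of line rate actually achieved
    (hypothetical default; measure your own network to replace it).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = terabytes * 1e12 * 8                      # TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 3600


# Ideal line rate (efficiency=1.0), 20 TB:
print(round(transfer_hours(20, 1, efficiency=1.0), 1))   # 44.4 hours on 1 GbE
print(round(transfer_hours(20, 10, efficiency=1.0), 1))  # 4.4 hours on 10 GbE
```

Even at an ideal 10 Gb/s, a full 20 TB restore is a multi-hour job, which is the point behind asking about the backup medium and retention up front.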

Gigantopithecus · Jul 13, 2010

Why are you storing full genome sequences instead of variants from the reference sequences?

Jeff7 · Jul 13, 2010

MotF Bane said:
I came.

Yes, but will it blend?

So awesome though. 😀

Rubycon · Jul 13, 2010

MotF Bane said:
I came.

Wow talk about easy! I mean they are using the most flaky drives in the world! If you want excitement wire that with 300GB Pliants! 😀

olds · Jul 13, 2010

MagnusTheBrewer said:
Say what? That's like asking how high is up? What's the best car or, how many people does it take to screw in a corporate light bulb?...

I used to love it when I was out doing chain control or plowing. People would ask me: "how far down is it snowing?"
My answer: "all the way to the ground."

We now resume your regularly scheduled off topic programming.

Gigantopithecus · Jul 13, 2010

Rubycon said:
Wow talk about easy! I mean they are using the most flaky drives in the world! If you want excitement wire that with 300GB Pliants! 😀

Yeah, it'd be interesting to find out how many of those 1.5tb Seagates have failed on them, heh.

Lifted · Jul 13, 2010

How in the hell do you plan on building/administering this server/NAS/SAN if you don't know the first thing about the technology?

If you're in a uni, contact somebody in the IT department as they will know how to design a solution for you and know who offers the best bang for the buck based on their purchasing agreements with the major vendors. On items this large you can expect 20 - 40% off list for an edu contract.

zinfamous · Jul 13, 2010

Gigantopithecus said:
Why are you storing full genome sequences instead of variants from the reference sequences?

because they are, well..."our" genomes? We're mostly working with previously un-sequenced critters, so everything we work with, we have to sequence. That's our data.

We have to assemble each genome, of course; mostly using TopHat, Soap...Bowtie, a few other programs developed locally. If any of this is familiar to you, perhaps you could give some tips on what would be needed in a server for multiple people to concurrently run our aligning software over our data? Anyway, I don't work with the analysis; I mostly prepare the libraries for sequencing.

Hehe, currently, our lab took up more Illumina time than any other on campus--we have our own Next Gen sequencing core facility here. Hell, Our closest collaborator has his own Solexa machine in his lab. 😱
yeah, he's loaded.....

zinfamous · Jul 13, 2010

Lifted said:
How in the hell do you plan on building/administering this server/NAS/SAN if you don't know the first thing about the technology?

If you're in a uni, contact somebody in the IT department as they will know how to design a solution for you and know who offers the best bang for the buck based on their purchasing agreements with the major vendors. On items this large you can expect 20 - 40% off list for an edu contract.

yeah, I guess I'm still not that clear. I'm looking for a general price estimate in order to write a grant. We aren't buying anything at the moment. It takes months for grants to be approved.

Again, this task was handed to me about 2 hours ago, out of nowhere. Our previous guy that would jump on this stuff is now somewhere else. 🙁

I'm just looking for a baseline price, so a few details on what I should be looking for...

Dark4ng3l · Jul 13, 2010

If these things take months to approve then you should be fine taking a couple of days to look at options and consult the right people. As an accountant I can tell you that there is nothing worse than making random decisions or just throwing a random number at a project without some kind of real idea what the cost will be both short term and long term.

Tell your boss that you are not sure and can't tell her right now unless you guess and that you can give her a better answer Friday or something. If you don't know then just say you don't know but that you are going to find the solution and then do it.

BoberFett · Jul 13, 2010

zinfamous said:
so you guys are saying $20-40k won't be able to run MySQL, for example, and handle some 20-40 TB of data?

Not if it's important it won't. Backup? Uptime requirements?

mugs · Jul 13, 2010

If you're getting by with a desktop PC as your server right now, I think it's safe to say that most of what has been mentioned in this thread is overkill.

ultimatebob · Jul 13, 2010

mugs said:
If you're getting by with a desktop PC as your server right now, I think it's safe to say that most of what has been mentioned in this thread is overkill.

Yeah... but I'm sure that they're not trying to store 40 TB of data on that old desktop, either.

For that kind of storage, you really need both a server and a SAN. Sure, he could probably daisy chain a few Drobo's to that desktop to get that kind of storage, but the performance would suck.

Gigantopithecus · Jul 13, 2010

zinfamous said:
because they are, well..."our" genomes? We're mostly working with previously un-sequenced critters, so everything we work with, we have to sequence. That's our data.

We have to assemble each genome, of course; mostly using TopHat, Soap...Bowtie, a few other programs developed locally. If any of this is familiar to you, perhaps you could give some tips on what would be needed in a server for multiple people to concurrently run our aligning software over our data? Anyway, I don't work with the analysis; I mostly prepare the libraries for sequencing.

Hehe, currently, our lab took up more Illumina time than any other on campus--we have our own Next Gen sequencing core facility here. Hell, Our closest collaborator has his own Solexa machine in his lab. 😱
yeah, he's loaded.....

Ahh, even if every single organism you're sequencing has never been sequenced, it still has relatives, and you can still use indices. And I doubt your PI works on entirely disparate branches from the tree of life, probably on a bunch of relatively related organisms - IOW your own lab should be building its own indices.

You're also not generating 500gb of data per week that actually needs long term storage unless your lab is sequencing the equivalent of 100 human genomes and annotating them...every week. Once those scaffolds are assembled, they're trashed.

MarkXIX · Jul 13, 2010

Yep, you need a SAN.

I would find a solution that allows you to start small and build out though. There are a lot of smaller storage vendors with solutions out there. They usually advertise in IT related magazines and as utilitarian as storage has become, the barrier to entry is getting pretty low.

SSSnail · Jul 14, 2010

I came late to the party, but if you guys read through this http://web.mclink.it/MC8247/Building-CF-Labs.pdf and look at the storage section, you'll probably... well, judge for yourselves.

The paper is seriously lacking in graphical eye candy, but what it lacks it makes up for in awesomeness.

zinfamous · Jul 14, 2010

well, sorry for all the formatting, but this is what I put together through our campus IT account service, University pricing and such. (Thanks Plat, for reminding me 😛)
Yeah, I had to cut and paste, cause I have no idea what's going on here...but it sounds like it's 90% close to what we need. Consulting and ironing out the details can come later, being that it will be several months before we hear about the grant, and know what there is to spend.

Thanks AT, I actually know a little bit more about this stuff now than I did 6 hours ago (though I'm sure it doesn't show :hmm:)

MD3000 disk storage array

Qty 1 Configured with two single-port controllers
PowerVault MD3000
--Primary Hard Drive Ten 1TB 7.2K RPM Universal SATA 3Gbps
--Server connectivity SAS 5/E HBA, PCI-Express, 2x4 connectors
--5x 500GB 7.2K RPM Universal SATA 3Gbps 3.5-in HotPlug Hard Drive
(12.5 TB is actually 3x our current read usage)
--300GB 15K RPM Serial-Attach SCSI 3Gbps 3.5-in HotPlug Hard Drive, Cust. Kit
TOTAL: $13,139.39

R610 1U 2-socket standard server
Qty 1 Chassis for Up to Six 2.5-Inch Hard Drives and Intel® 56XX Processors, Windows Server 2008 R2, Enterprise Academic Edition,x64, Includes 25 CALs

Unit Price $19,807.76

--Operating System Windows Server 2008 R2, Enterprise Academic Edition,x64, Includes 25 CALs
--96GB Memory (12x8GB), 1333MHz Dual Ranked RDIMMs for 2 Processors, Optimized
--Dual Two-Port Embedded Broadcom® NetXtreme II 5709 Gigabit Ethernet NIC
--2x Intel® Xeon® X5677, 3.46Ghz, 12M Cache,Turbo, HT, 1333MHz Max Mem
--1st Hard Drive 600GB 10K RPM Serial-Attach SCSI 6Gbps 2.5in Hotplug Hard Drive
--Primary Controller PERC 6/i SAS RAID Controller, 2x4 Connectors, Internal, PCIe, 256MB Cache
--Network Adapter Broadcom 57710 10GbE Single Port 10GbE NIC, Copper, PCIe-8
--5x 600GB 10K RPM Serial-Attach SCSI 6Gbps 2.5in Hotplug Hard Drive
--Hard Drive Configuration RAID 10 for H700 or PERC 6/i Controllers
--Power Supply High Output Power Supply, Non-Redundant, 717W
--Host Bus Adapter/Converged Network Adapter Qlogic QLE8152 10Gb CNA/Fibre Channel over Ethernet Adapter
TOTAL: $19,807.76

Total Price $32,947.15

TridenT · Jul 14, 2010

That is a lot of fuckin' money on two servers.

zinfamous · Jul 14, 2010

Gigantopithecus said:
Ahh, even if every single organism you're sequencing has never been sequenced, it still has relatives, and you can still use indices. And I doubt your PI works on entirely disparate branches from the tree of life, probably on a bunch of relatively related organisms - IOW your own lab should be building its own indices.

You're also not generating 500gb of data per week that actually needs long term storage unless your lab is sequencing the equivalent of 100 human genomes and annotating them...every week. Once those scaffolds are assembled, they're trashed.

yes, we have our common outgroups. we're interested in critters with funky sex chromosomes, so not only do we need the various configurations of sex chromosomes out there, but also a decent portion of the autosomes so that we can track selection. --many of these have newly-evolving x chromosomes (or z, or whatever), so we can track selection from autosomes to the sex chromosomes (well, that's the hope).

I honestly don't know how much is tossed after it's processed and assembled, sure--there's plenty of junk that gets tossed once the reads come off the machine before you even start assembly. But after that, depending on who is doing what, I really couldn't say what we need to keep temporarily or long-term as projects tend to take strange directions throughout their lifespan.

But again, I'm a molecular kind of guy. This stuff is really not my bag. :\

ultimatebob · Jul 14, 2010

96 GB of memory?!? Holy hell. Why do you need that much RAM for a single database server? I've built VMWare host systems running 8 servers each that have less total memory than that.

Elbryn · Jul 14, 2010

random thoughts while reading the thread. what's your i/o profile going to look like? that'll drive the decision on what kind of disks you need.

why 5 500gb disks and 10 1tb disks? cost consideration? you wont be able to create a single raid group out of the set as raid will use the smallest disk in the array to calculate size.

is 10gb nic really necessary? do you have enough machines that will be sending data concurrently to demand that sort of pipe at the same time? if you do, then you may really want to go back to the first question and quantify your i/o needs because i dont think your backend sata disk is going to keep up with a 10gb pipe filling it. decide what you need performance or space.

you also got some 10k sas drives in that server. going multiple tier storage? your 10k is going to be your higher performing disk, the md3000 with sata is long term?

Plan your raid- raid levels will reduce your total disk amount. dont plan on raw data size, plan with size after raid creation. higher-protection raid levels eat up drives. raid 5 gets you n-1 times drive size in usable space, whereas raid 10 will eat up considerably more, since it mirrors and stripes.
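
To make the usable-space point concrete, here is a minimal sketch using the textbook formulas only: raid 5 gives (n-1) times the smallest disk, raid 10 gives n/2 times the smallest disk. It ignores hot spares, filesystem overhead, and whatever the actual controller reserves, and the function name is made up for illustration. Disk counts come from the quote earlier in the thread.

```python
def usable_tb(disk_sizes_tb, level):
    """Textbook usable capacity of one raid group, in TB.

    Every member is treated as being the size of the smallest disk,
    which is why mixing 500GB and 1TB drives in one group wastes space.
    """
    n = len(disk_sizes_tb)
    smallest = min(disk_sizes_tb)
    if level == "raid5":
        if n < 3:
            raise ValueError("raid 5 needs at least 3 disks")
        return (n - 1) * smallest   # one disk's worth goes to parity
    if level == "raid10":
        if n < 4 or n % 2:
            raise ValueError("raid 10 needs an even number of disks, at least 4")
        return (n // 2) * smallest  # half the disks are mirror copies
    raise ValueError("unsupported level: " + level)


ten_1tb = [1.0] * 10
five_500gb = [0.5] * 5

print(usable_tb(ten_1tb, "raid5"))               # 9.0
print(usable_tb(ten_1tb, "raid10"))              # 5.0
print(usable_tb(five_500gb, "raid5"))            # 2.0
print(usable_tb(ten_1tb + five_500gb, "raid5"))  # 7.0, all 15 in one group
```

On these formulas the 12.5 TB raw in the quote comes to 11 TB usable as two separate raid 5 groups, but only 7 TB if all fifteen disks are lumped into one group.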

that single server you added is bringing quite a bit of memory. is it going to be running calc's and apps in addition to being storage?

who's gonna be supporting the rig? if its you or someone in the lab, i'd get the simplest solution possible that meets the requirements. a tower server with as many 1-2tb drives as you can fit into it and acting as purely storage is the easiest route. that same tower can be upgraded to 2 quad core procs and a boatload of ram to also run calcs.

my suggestion is to make it simple and standalone unless you have support to take care of a more complicated setup.
