JM Aggie08
Diamond Member
I came.
Twice.
I came.
Alrighty, this is totally last second dumped on me, as she's been away for more than a month and suddenly needs to finish this grant. Nothing unusual here, actually....
I'll have to look over some stuff at home. le sigh... 🙁
so you guys are saying $20-40k won't be able to run MySQL, for example, and handle some 20-40 TB of data?
:hmm:
It can be done at that price point, but it's not going to be that great, probably awful... We need a more realistic sense of how fast it needs to be, how people are connecting to it, what your network is like, how much the data is going to grow over the next 3 years, whether it's large files or small files, etc.
Well, we run 10 Gb switches on campus--I could be wrong? fuck if I know, lol. A single sequencing run produces about 15-20 GB of data. We sequence a lot of genomes, and assembly means taking 1-3 of these 20 GB runs and analyzing them together.
Hell, I'll give you a call later. I'm gonna try and pry out some more info.
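For a rough sense of scale, the figures quoted above (15-20 GB per run, 1-3 runs per assembly, a 10 Gb/s campus link) can be turned into a back-of-the-envelope estimate. This is only a sketch: the function names are made up for illustration, and the `efficiency` factor is an assumption standing in for protocol overhead and disk speed, which nobody in the thread has measured.

```python
def transfer_seconds(size_gb, link_gbps, efficiency=1.0):
    """Seconds to move size_gb gigabytes over a link_gbps gigabit/s link.

    efficiency is an assumed fraction of line rate actually achieved
    (1.0 = ideal; real-world throughput is normally lower).
    """
    return size_gb * 8 / (link_gbps * efficiency)


def assembly_working_set_gb(run_gb, runs):
    """Raw input size for one assembly that combines `runs` sequencing runs."""
    return run_gb * runs


# Range implied by the thread: 1 run of 15 GB up to 3 runs of 20 GB.
smallest = assembly_working_set_gb(15, 1)
largest = assembly_working_set_gb(20, 3)

# Ideal-case time to pull the largest working set over a 10 Gb/s link.
ideal_seconds = transfer_seconds(largest, 10)

print(f"working set per assembly: {smallest}-{largest} GB")
print(f"ideal transfer time for {largest} GB: {ideal_seconds:.0f} s")
```

Even in the ideal case the network is unlikely to be the bottleneck for a single job; what matters more is how many people run assemblies at once and how fast the disks can feed them.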
I used to love it when I was out doing chain control or plowing. People would ask me: "how far down is it snowing?" Say what? That's like asking how high is up? What's the best car, or how many people does it take to screw in a corporate light bulb?...
Wow talk about easy! I mean they are using the most flaky drives in the world! If you want excitement wire that with 300GB Pliants! 😀
Why are you storing full genome sequences instead of variants from the reference sequences?
How in the hell do you plan on building/administering this server/NAS/SAN if you don't know the first thing about the technology?
If you're in a uni, contact somebody in the IT department as they will know how to design a solution for you and know who offers the best bang for the buck based on their purchasing agreements with the major vendors. On items this large you can expect 20 - 40% off list for an edu contract.
so you guys are saying $20-40k won't be able to run MySQL, for example, and handle some 20-40 TB of data?
If you're getting by with a desktop PC as your server right now, I think it's safe to say that most of what has been mentioned in this thread is overkill.
because they are, well..."our" genomes? We're mostly working with previously un-sequenced critters, so everything we work with, we have to sequence. That's our data.
We have to assemble each genome, of course; mostly using TopHat, Soap...Bowtie, a few other programs developed locally. If any of this is familiar to you, perhaps you could give some tips on what would be needed in a server for multiple people to concurrently run our aligning software over our data? Anyway, I don't work with the analysis; I mostly prepare the libraries for sequencing.
Hehe, currently, our lab takes up more Illumina time than any other on campus--we have our own Next Gen sequencing core facility here. Hell, our closest collaborator has his own Solexa machine in his lab. 😱
yeah, he's loaded.....
Ahh, even if every single organism you're sequencing has never been sequenced, it still has relatives, and you can still use indices. And I doubt your PI works on entirely disparate branches from the tree of life, probably on a bunch of relatively related organisms - IOW your own lab should be building its own indices.
You're also not generating 500 GB of data per week that actually needs long-term storage unless your lab is sequencing the equivalent of 100 human genomes and annotating them...every week. Once those scaffolds are assembled, they're trashed.
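To put that growth rate next to the 20-40 TB mentioned earlier in the thread, here is a quick sketch of how long it would take to fill that much storage. It assumes the disputed worst case, i.e. that 500 GB per week really is all kept, and it uses decimal units (1 TB = 1000 GB); `weeks_to_fill` is just an illustrative helper, not anything from a real tool.

```python
def weeks_to_fill(capacity_tb, weekly_gb, gb_per_tb=1000):
    """Weeks until capacity_tb terabytes is full at weekly_gb gigabytes/week.

    Assumes nothing is ever deleted and growth is constant, which is the
    worst case being questioned above.
    """
    return capacity_tb * gb_per_tb / weekly_gb


for capacity_tb in (20, 40):
    weeks = weeks_to_fill(capacity_tb, 500)
    print(f"{capacity_tb} TB at 500 GB/week: {weeks:.0f} weeks (~{weeks / 52:.1f} years)")
```

If most intermediate data gets trashed after assembly, the real retention rate is a fraction of that and the same capacity lasts proportionally longer.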