I work for a cloud services company. Our bread and butter is online backup/DR, but we also colocate and host servers. Dell blades running ESXi are our preferred setup for hosting.
My boss is intrigued by the success others have had using commodity desktop PCs as worker nodes in vast Linux clusters.
I, on the other hand, am skeptical that the commodity approach is as cost-efficient as our current storage hardware.
I'm looking for someone with solid experience setting up and administering Hadoop/HDFS, FhGFS, Gluster, QFS, or any other open source distributed file system, so that we can have a conference call during our next meeting and get some conventional wisdom. Show us what you've set up. Name your price, and hopefully I can just put you on speaker and you'll blow our minds. We can set up screen-sharing software, and I can put the display on the projector as well if you want to show us something.
I do not have the experience to back up my claim that HPC-style storage clusters are not a good fit for bulk archival storage. Yes, an HDFS/FhGFS cluster offers high aggregate bandwidth, but we are not bandwidth limited, and the cost is still higher than our current setup for less storage overall. Plus, systems such as HDFS default to a replication factor of 3, so for every block you write, 2 replicas get pushed to other nodes and the usable storage in your cluster is the raw capacity divided by 3! That's really not cheap storage at all; it just offers high aggregate bandwidth and reasonable fault tolerance.
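To put numbers on the replication overhead I'm describing, here is a minimal Python sketch. The function names and the example capacities are my own, purely for illustration, and it assumes plain N-way replication with no erasure coding, compression, or filesystem overhead.

```python
def usable_capacity_tb(raw_tb, replication_factor=3):
    """Usable capacity when every block is stored replication_factor times."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return raw_tb / replication_factor


def raw_needed_tb(usable_tb, replication_factor=3):
    """Raw disk you have to buy to end up with usable_tb of usable space."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return usable_tb * replication_factor


if __name__ == "__main__":
    # Hypothetical example figures, not measurements from any real cluster.
    print(usable_capacity_tb(900))   # 900 TB raw at 3x replication
    print(raw_needed_tb(1000))       # raw TB needed for 1 PB usable at 3x
```

In other words, matching one petabyte usable at 3x replication means buying roughly three petabytes of raw disk.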
We aren't bandwidth starved, and our current setup is pretty dense (384 drives per rack, about one petabyte of usable storage per rack after parity). I have tried pricing out little barebones systems with cheap 2 TB drives. It still comes out to over $400 per node and nowhere near one petabyte per rack, not even half a petabyte. With so many nodes, you also waste far too many rack units on 48-port switches rather than disks.
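Here is a rough Python sketch of the per-rack arithmetic behind that. Only the 2 TB drive size and the 48-port switches come from my pricing exercise; the 42U rack, 1U nodes, one drive per node, 1U switches, and 3x replication are assumptions I picked for illustration, so plug in your own values.

```python
import math


def rack_estimate(rack_units=42, node_units=1, drives_per_node=1,
                  drive_tb=2.0, switch_ports=48, switch_units=1,
                  replication=3):
    """Estimate how many nodes fit in one rack and the resulting capacity.

    Every node needs one switch port, so the switches take rack units
    away from the nodes. All defaults are illustrative assumptions.
    """
    nodes = 0
    while True:
        candidate = nodes + 1
        switches = math.ceil(candidate / switch_ports)
        used = candidate * node_units + switches * switch_units
        if used > rack_units:
            break
        nodes = candidate

    switches = math.ceil(nodes / switch_ports) if nodes else 0
    raw_tb = nodes * drives_per_node * drive_tb
    return {
        "nodes": nodes,
        "switches": switches,
        "raw_tb": raw_tb,
        "usable_tb": raw_tb / replication,
    }


if __name__ == "__main__":
    print(rack_estimate())                    # one 2 TB drive per 1U node
    print(rack_estimate(drives_per_node=4))   # denser hypothetical node
```

Even with four drives per node under these assumptions, the raw capacity per rack stays well short of half a petabyte, and that is before dividing by the replication factor.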
What it all boils down to is that we are all helpless little babies when it comes to distributed storage or computing, but we want to explore it and are looking for someone to share their experience. What do you guys think? Am I in the wrong forum?
