
Options for Enterprise class, redundant storage?

KingGheedora

Diamond Member
I'm not an IT person; I'm a developer/manager. We recently had an outage for part of our system at work, caused by the hardware failure of a computer that acts as a network share. The share is used by a bunch of other servers to store and retrieve a large amount of data (both in total size and in number of files).

The machine that went down did have RAID, but multiple drives failed, the motherboard failed, and some other piece of equipment failed as well. At least that's what the IT guys are telling us. We also had backups of the machine that went down, but the data was a couple of weeks old.

It's not my area of expertise, or even my responsibility, but I'd really like to have some kind of redundancy for this. What options are available? Is it possible to cluster a file server the same way SQL databases are clustered, and have the data actually stored on a SAN or something? (This is the way our SQL DBs are clustered for failover.) The sense I get is that this is just a Windows machine with a big RAID array. The amount of data is 300-500GB.

What other solutions are possible to provide redundancy for file servers (besides RAID, obviously, since that would not have prevented the type of outage we had)? I'm sure you guys know of some devices I've never even heard of that might be right for the job.

I'd like to suggest a solution to the tech VP, for the IT team to implement next year.
 
The question you should ask is why the backup is a couple of weeks old. All the redundancy in the world wouldn't save you if the data gets corrupted. That is where the backup comes in.
 
The question you should ask is why the backup is a couple of weeks old. All the redundancy in the world wouldn't save you if the data gets corrupted. That is where the backup comes in.
Yeah, there's lots of good backup software that'll make nice image backups and update the image every hour or less if you want. If they are Windows servers, you can even back them up to a Windows Home Server, which will make daily backups and keep every version of every file for months. Although I don't recommend that as an enterprise solution, it'd be better than what you have now.

It's really silly to lose significant file server data when a 1 TB hard disk is $70.

Yeah, you can also cluster file servers. You can virtualize them. You can virtualize clusters. You can put the data on a SAN. File servers themselves can be restored very quickly and data can be restored quickly from a directly attached tape or SATA or SAS disk.
 
First off, backups are important! What good is running them once a month when you have constantly changing data? Secondly, you could look into some sort of SAN or NAS device for better redundancy.
 
Yeah, get a pair of HP LeftHand units. They're clustered, including network RAID-1, and come with all the snapshot goodies, so you can do hourly snaps. If you add one more unit and a WAN link, you can do N:1 replication off-site using periodic snapshots.
 
Take a look at Dell EqualLogic. Be ready to pay the price, though, but that's enterprise class.

If you want something on a budget that will still have decent performance, you could just build a Linux software RAID and use Samba or NFS, or OpenNAS (never used the last two, but they're fairly standard). Either way, you should also have nightly backups to go with it.
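A minimal sketch of the budget Linux route described above: a two-disk software mirror exported over NFS. The device names, mount point, and subnet are placeholders; adjust them to the actual hardware and network.

```shell
# Sketch of the Linux software-RAID + NFS idea above.
# Assumes two spare disks at /dev/sdb and /dev/sdc (placeholders).

# Create a two-disk RAID-1 (mirror) array
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# Put a filesystem on it and mount it
mkfs.ext4 /dev/md0
mkdir -p /srv/share
mount /dev/md0 /srv/share

# Persist the array config so it assembles on boot
mdadm --detail --scan >> /etc/mdadm/mdadm.conf

# Export the share over NFS (append to /etc/exports, then reload)
echo '/srv/share 192.168.1.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra
```

Note that this only protects against a single-disk failure; as the thread points out, you still need backups and ideally a second box for the kind of whole-machine outage described in the original post.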

Though for true enterprise use you'll want something supported by a major company like Dell. There's also IBM, but be ready to overpay 100x if a part breaks in 3 years when the support is over. We were on an IBM SAN, an ESM card failed, and they wanted $26k to replace that one card. It's insane.

Oh, and since you mentioned something other than RAID to prevent other hardware failures, look at virtualization. Set up at least two ESX servers; that way, if one goes down, the other takes up the slack. I won't get into details here, but plenty of resources are available online.
 
ESX DRS or SRM wouldn't save your ass either if the OS, application or data are corrupted. They are mainly for host redundancy/load-balancing, not VM protection. You still need backup.
 
Yes, you could use MSCS and set up a clustered Windows file server (assuming you need Windows/CIFS stuff). You would need some sort of shared storage (iSCSI would work, as would Fibre Channel) and a couple of servers that see the same volumes.
Alternatively, there are fancier solutions that do nothing but NAS, like NetApp filers, EMC Celerra, and the like. My personal experience is with Celerra, and the ones we've owned have been pretty solid. They do multiprotocol (CIFS/NFS/iSCSI) and have the ability to replicate and take snapshots of file systems, making them very flexible.
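The shared-storage part of the MSCS setup above can be sketched with the built-in Windows iSCSI initiator CLI. This is only an illustration: the portal IP and target IQN below are placeholders, and MSCS configuration itself is a separate step.

```shell
:: Sketch: pointing two Windows cluster nodes at the same iSCSI volume.
:: Run on each node; the portal IP and target IQN are placeholders.

:: Register the storage array's iSCSI portal
iscsicli QAddTargetPortal 192.168.10.50

:: List the targets the portal advertises
iscsicli ListTargets

:: Log in to the shared target (both nodes see the same LUN;
:: the cluster service arbitrates which node has the disk online)
iscsicli QLoginTarget iqn.2001-05.com.example:cluster-fileshare
```

Once both nodes see the same LUN, the disk is added as a cluster resource so the file share fails over with it.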
 
Hire someone who knows what they're doing. Seriously, you do not want to be learning as you go with critical data.
 
Yes, you could use MSCS and set up a clustered Windows file server (assuming you need Windows/CIFS stuff). You would need some sort of shared storage (iSCSI would work, as would Fibre Channel) and a couple of servers that see the same volumes.
Alternatively, there are fancier solutions that do nothing but NAS, like NetApp filers, EMC Celerra, and the like. My personal experience is with Celerra, and the ones we've owned have been pretty solid. They do multiprotocol (CIFS/NFS/iSCSI) and have the ability to replicate and take snapshots of file systems, making them very flexible.

Actually, that just reminded me of a neat feature in Win2k3 that is often forgotten: DFS. Basically, you set up these special shares and they replicate across each other, so you could have a few servers that replicate files to each other in near real time. I think AD actually uses this to replicate across DCs... but I could be wrong; it might do its own thing.
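On newer Windows Server versions than the Win2k3 era discussed here, the same replicated-share idea is scripted through the DFS Replication PowerShell module. This is a rough sketch, not a full deployment guide; the group, folder, server, and path names are all placeholders.

```shell
# Sketch: two-server replicated share via DFS Replication (DFSR module,
# available on newer Windows Server). All names below are placeholders.

New-DfsReplicationGroup -GroupName "FileShareRG"
New-DfsReplicatedFolder -GroupName "FileShareRG" -FolderName "Share"
Add-DfsrMember -GroupName "FileShareRG" -ComputerName "FS01","FS02"

# Create the replication connection between the two members
Add-DfsrConnection -GroupName "FileShareRG" `
    -SourceComputerName "FS01" -DestinationComputerName "FS02"

# Point each member at its local copy of the data;
# FS01's copy wins the initial sync as the primary member
Set-DfsrMembership -GroupName "FileShareRG" -FolderName "Share" `
    -ComputerName "FS01" -ContentPath "D:\Share" -PrimaryMember $true
Set-DfsrMembership -GroupName "FileShareRG" -FolderName "Share" `
    -ComputerName "FS02" -ContentPath "D:\Share"
```

Keep in mind DFSR replicates closed files asynchronously, so it is redundancy for reads and disaster recovery, not a substitute for a cluster when applications hold files open.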
 
To the original post: I have only seen multiple drive failures in the event of a RAID controller freakout (IBM controllers were notorious for this). If the motherboard had an onboard RAID controller and that flipped out or failed, that might explain it. Are your servers white-box, or is some company responsible for their manufacture?
 
Hire someone who knows what they're doing. Seriously, you do not want to be learning as you go with critical data.

Right. I'm a developer/manager, not an IT person, so neither deciding on nor implementing the solution is my responsibility.

I mainly want to know what the options are (1) for my own curiosity, since I like to understand how all our systems work, and (2) so I can understand what the IT staff ends up doing and/or provide some input if they end up not doing anything at all.
 
To the original post: I have only seen multiple drive failures in the event of a RAID controller freakout (IBM controllers were notorious for this). If the motherboard had an onboard RAID controller and that flipped out or failed, that might explain it. Are your servers white-box, or is some company responsible for their manufacture?

Not white box; it was an HP system. I believe HP repaired the damaged system. But the point of the post is that I would like to see some type of failover system in place so we can cut over to standby hardware if this happens again.

We have clustering for our web servers (a farm of over a dozen servers, load balanced), our SQL servers (two SQL machines, somehow pointing to shared data storage, which may or may not be a SAN), and even our mail relay systems are clustered.

This network file storage system is the only major system I can think of that doesn't have redundancy like this, and it caused us a lot of downtime.
 
Oh, and since you mentioned something other than RAID to prevent other hardware failures, look at virtualization. Set up at least two ESX servers; that way, if one goes down, the other takes up the slack. I won't get into details here, but plenty of resources are available online.

Not sure I understand how this would work. Why virtual? On each ESX server would be one virtual machine running what, exactly? How would the two share the data, or be set up so that we could cut over to the second ESX server if the first one were to fail?
 
Not sure I understand how this would work. Why virtual? On each ESX server would be one virtual machine running what, exactly? How would the two share the data, or be set up so that we could cut over to the second ESX server if the first one were to fail?

Basically, you need a SAN, which is centralized redundant storage (usually consisting of RAID, redundant PSUs, and redundant data paths). Then each ESX server load balances and runs its own share of VMs. If one ESX server goes down, the VMs can then run on another server. I never witnessed it myself, but I'm pretty sure the failover is automatic too. It basically keeps track of the virtual RAM and "transfers" it over to another host while live. VMs can even be moved off a given host so that host can be brought down for maintenance.

Each VM would be a virtual server running an operating system. You can even set rules so two VMs are never on the same physical host; for example, if you have two domain controllers, they will never be on the same host. By default it just tries to balance out the resources. If one VM starts using too many resources, it is moved to a less busy server, or other VMs are moved off that server.

I'm not exactly an expert at this, so I may be wrong on how exactly it works, but that's the gist of it. It's not cheap, though... but once you have a decent setup, it's very easy to add new servers. Need a Linux box or a Windows box? No problem, just add it in.
 
Basically, you need a SAN, which is centralized redundant storage (usually consisting of RAID, redundant PSUs, and redundant data paths). Then each ESX server load balances and runs its own share of VMs. If one ESX server goes down, the VMs can then run on another server. I never witnessed it myself, but I'm pretty sure the failover is automatic too. It basically keeps track of the virtual RAM and "transfers" it over to another host while live. VMs can even be moved off a given host so that host can be brought down for maintenance.

Each VM would be a virtual server running an operating system. You can even set rules so two VMs are never on the same physical host; for example, if you have two domain controllers, they will never be on the same host. By default it just tries to balance out the resources. If one VM starts using too many resources, it is moved to a less busy server, or other VMs are moved off that server.

I'm not exactly an expert at this, so I may be wrong on how exactly it works, but that's the gist of it. It's not cheap, though... but once you have a decent setup, it's very easy to add new servers. Need a Linux box or a Windows box? No problem, just add it in.

You are talking about vMotion and DRS. They are for load balancing and for moving VMs between ESX servers seamlessly. If an ESX server goes kaboom, any VMs on that box will go down. If you configure the cluster (VMware HA) to automatically restart the VMs when a host goes down, those VMs will power up on another ESX server. So you do have downtime, and possible data loss, if an ESX server goes down.
 