High availability clustering

Mark R · Jul 11, 2014

What are people's experiences with this?

We've got some mission-critical apps at work, and they are set up on triple-redundant active-standby-standby Veritas clusters.

The problem is that they keep falling over, and not on a single occasion has the cluster ever failed over to one of the standby racks.

In one case, the issue was traced to a network cable fault, which left the active server unreachable but didn't trigger a failover. However, the system did fail over automatically when the cable was disconnected.

In another case, it was a TOR switch failure.

In 2 other cases, all 3 servers had locked up and needed to be rebooted.

In one case, it was due to a failure of the backbone network, which lost a fiber link - so it doesn't really count as a cluster failure.

Are we having a bad experience? Obviously, we paid $$$$$ for triplicated servers, Veritas licensing, etc., so we're a bit annoyed.

imagoon · Jul 11, 2014

Honestly, it really depends on the app. Without information about how you have these configured and what they do, it is hard to say whether it is bad luck or a configuration issue.

If, for example, you are trying to serve a webpage, you may do HA control via a front-end load balancer that is "taught" what to look for to verify operation. Databases use replication and virtual IPs, storage uses MPIO, etc.
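To make the "taught what to look for" idea concrete, here is a minimal Python sketch of an application-level health check. It is not how any particular load balancer or Veritas does it; the names `HealthMonitor` and `http_probe`, the expected `OK` body, and the threshold of 3 are all assumptions for illustration.

```python
import urllib.request
from typing import Callable


def http_probe(url: str, expected: bytes = b"OK", timeout: float = 2.0) -> Callable[[], bool]:
    """Build a probe that fetches `url` and checks the response body.

    The URL and the expected marker are hypothetical; a real check would
    look for whatever proves the app is actually doing its job.
    """
    def probe() -> bool:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and expected in resp.read()
    return probe


class HealthMonitor:
    """Counts consecutive probe failures and says when to fail over."""

    def __init__(self, probe: Callable[[], bool], threshold: int = 3):
        self.probe = probe
        self.threshold = threshold  # assumed value; tune per application
        self.failures = 0

    def check(self) -> bool:
        """Run one probe; return True if failover should be triggered."""
        try:
            ok = self.probe()
        except Exception:
            # Timeouts, refused connections, etc. all count as a failed probe.
            ok = False
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= self.threshold
```

The point of the threshold is to avoid failing over on a single dropped request, while still reacting when the service stays broken even though the box itself looks alive.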

Has this solution been vetted by the vendor?

--edit--

PS: I know that Veritas clusters are completely disparate stacks of equipment. It sounds to me like there is a config issue or a missing trigger for detecting that type of failure.
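As a rough illustration of what a "missing trigger" could look like, the Python sketch below separates link state from end-to-end reachability. A faulty cable can leave the link reported as up while no traffic gets through, so a monitor that only watches link state would not fire. This is not Veritas configuration; `tcp_reachable` and `needs_failover` are hypothetical helpers.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable: the service cannot be used.
        return False


def needs_failover(link_up: bool, service_reachable: bool) -> bool:
    """Fail over if either the link is down or the service cannot be reached.

    Checking only `link_up` would miss a cable that is connected but faulty.
    """
    return not link_up or not service_reachable
```

With this rule, `needs_failover(True, False)` is true: the link looks fine, but the service check still triggers a failover.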
