
High availability clustering

Mark R

What are people's experiences with this?

We've got some mission-critical apps at work, and they are set up on triple-redundant active-standby-standby Veritas clusters.

The problem is that they keep falling over, and not on a single occasion has the cluster ever failed over to one of the standby racks.

In one case, the issue was traced to a network cable fault, which left the active server unavailable but didn't trigger a failover. However, the system did fail over automatically when the cable was disconnected.

In another case, it was a TOR switch failure.

In 2 other cases, all 3 servers had locked up and needed to be rebooted.

In one case, it was due to a failure of the backbone network, which lost a fiber link - so it doesn't really count as a cluster failure.

Are we having a bad experience? Obviously, we paid $$$$$ for triplicated servers, Veritas licensing, etc., so we're a bit annoyed.
 
Honestly, it really depends on the app. Without information about how you have these configured and what they do, it is hard to say whether it is bad luck or a configuration issue.

If, for example, you are trying to serve a webpage, you may do HA control via a front-end load balancer that is "taught" what to look for to verify operation. Databases use replication and virtual IPs, storage uses MPIO, etc.
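To make the load balancer point concrete, here is a minimal sketch of an application-level health check, assuming the app returns HTTP 200 with some known marker text when it is really working. The names `is_healthy`, `pick_backend`, the `probe` callable and the `"OK"` marker are all made up for illustration; this is not how any particular load balancer or Veritas is configured.

```python
from typing import Callable, Optional, Sequence, Tuple

# A probe takes a backend name and returns (http_status, body).
# It is expected to raise OSError on network-level failures
# (refused connection, timeout, dead link).
Probe = Callable[[str], Tuple[int, str]]


def is_healthy(backend: str, probe: Probe, marker: str = "OK") -> bool:
    """True only if the app answers 200 AND the body contains the marker.

    Checking the body, not just the link or the port, is what catches a
    server that is reachable but not actually serving.
    """
    try:
        status, body = probe(backend)
    except OSError:
        return False
    return status == 200 and marker in body


def pick_backend(
    backends: Sequence[str], probe: Probe, marker: str = "OK"
) -> Optional[str]:
    """Return the first backend, in priority order, that passes the check."""
    for backend in backends:
        if is_healthy(backend, probe, marker):
            return backend
    return None  # nothing healthy: alert a human
```

In real use the `probe` would do an HTTP GET with a short timeout against a status page; here it is passed in so the decision logic can be tried out with a fake.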

Has this solution been vetted by the vendor?

--edit--

PS: I know that Veritas clusters are completely disparate stacks of equipment. It sounds to me like there is a config issue or a missing trigger to detect that type of failure.
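By "trigger" I mean something along these lines: a rough sketch, assuming the cluster can run a periodic end-to-end probe of the service, that fails over after a number of consecutive failed probes. `FailoverMonitor` and its threshold of 3 are hypothetical and only illustrate the idea; the real setting would live in the cluster software's own resource or agent configuration.

```python
class FailoverMonitor:
    """Decide when to fail over based on consecutive failed probes."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # failed probes in a row before acting
        self.failures = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True when failover should fire."""
        if probe_ok:
            # Any success resets the count, so one blip does not cause a flap.
            self.failures = 0
            return False
        self.failures += 1
        return self.failures >= self.threshold


# Example: a flaky cable that drops most, but not all, probes.
monitor = FailoverMonitor(threshold=3)
results = [True, False, False, True, False, False, False]
decisions = [monitor.record(ok) for ok in results]
```

With that sequence, only the last probe trips the failover, because the earlier run of failures was interrupted by a success. A half-working cable can keep resetting a counter like this, which is one way a degraded link might go undetected while a clean disconnect is caught.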
 