What are people's experiences with this?
We've got some mission-critical apps at work, and they are set up on triple-redundant active-standby-standby Veritas clusters.
The problem is that they keep falling over, and not once has a cluster actually failed over to one of the standby racks.
In one case, the issue was traced to a network cable fault, which left the active server unreachable but didn't trigger a failover. However, the system did fail over automatically once the cable was disconnected.
In another case, it was a TOR switch failure.
In two other cases, all three servers had locked up and needed to be rebooted.
In yet another case, it was due to a failure of the backbone network, which lost a fiber link, so it doesn't really count as a cluster failure.
Are we just having a bad experience? Obviously, we paid $$$$$ for triplicated servers, Veritas licensing, etc., so we're a bit annoyed.