Cliffs notes: an IP change caused support issues because client systems cached DNS entries and ignored the TTL. I'm looking for the best way to avoid the headaches.
Long story:
I work for a fairly large and growing financial website. We upgraded our internet circuits over the weekend and were forced to change our IP addresses. Despite a reasonably low TTL on our DNS records, our support queues were slammed this morning with connectivity issues caused by cached IPs at what seem to be the browser, operating system, and home router layers. It wasn't the best customer experience.
We upgraded by failing over to our second data center for the weekend while the upgrade was completed in our primary data center. Failover is accomplished by changing the DNS records to point at the IPs of our backup data center. Maintenance was completed and we failed back, but to a new set of IPs from our new ISP.
It seems to me that the best way to do this is to avoid a hard cutover: leave the old IP addresses active for an overlap period so the caches can time out. It's frustrating that the TTL isn't honored more widely on the client side. An active-active setup, with multiple IPs in the DNS record, would mitigate this too, I'd imagine. If we were forced to change, only some of the active IPs would change, and clients that haven't refreshed wouldn't be left out in the cold.
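To make the overlap idea concrete, here is a rough Python sketch of how the window might be estimated. Everything in it is an assumption for illustration: the function names, the `grace_factor` safety multiplier (a guess at how long TTL-ignoring caches hang on, not a measured value), the cutover time, and the documentation-range IPs.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def earliest_retirement(cutover: datetime, ttl_seconds: int,
                        grace_factor: float = 10.0) -> datetime:
    """Estimate the earliest time the old IPs could be switched off.

    grace_factor is a hypothetical multiplier on the TTL to allow for
    caches that hold entries longer than the record says they should.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    if grace_factor < 1:
        raise ValueError("grace_factor must be at least 1")
    return cutover + timedelta(seconds=ttl_seconds * grace_factor)


def stale_addresses(resolved: set[str], old_ips: set[str]) -> set[str]:
    """Return the resolved addresses that still belong to the old set."""
    return resolved & old_ips


# Example values only: a 300-second TTL padded by a factor of 12.
cutover = datetime(2024, 1, 6, 22, 0, tzinfo=timezone.utc)
retire_after = earliest_retirement(cutover, ttl_seconds=300, grace_factor=12)
print(retire_after.isoformat())

# Flag any answer a resolver is still handing out from the old range.
old_ips = {"192.0.2.10", "192.0.2.11"}
print(stale_addresses({"192.0.2.10", "198.51.100.7"}, old_ips))
```

In practice the multiplier would have to come from watching traffic on the old addresses tail off rather than from a formula, so this is only a starting point for planning the overlap.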
Is there a best practice here I don't know about? How is this accomplished elsewhere?
