
Weirdness on a 4500 with a bad default gateway

Brovane

Diamond Member
We had a really weird issue with a 4507R at a remote site yesterday. One of the LAN team members was in the switch making changes to a port when we got a syslog message from the supervisor engine saying it had lost handshaking with the standby sup engine. The redundant engine refused to come back up remotely. I talked to the boss and we decided to remove and re-seat the standby sup engine early the next morning during the SLA window.

I went home, and about 4 hours later my boss called me because the switch looked like it had gone down and the Junior Admin was freaking out. The Junior Admin was driving to the site with spare parts to get the switch back up. I looked at my BB and, judging by the alerts, I didn't think the switch was down. I went into our network monitoring gear: the switch had stopped responding to pings on its main IP and could not be reached through SSH, but about 80% of the equipment connected to the switch was up, working fine and responding to pings. What the hell.

So I dug a little deeper and found that I could ping one of the local networks set up on the switch, but I still could not ping the main IP on VLAN 1. I started looking over the config and saw a bad default gateway. The IP route for 0.0.0.0 0.0.0.0 was set correctly but the default gateway was not. However, this default gateway had been in the config for at least 3 months, so I am not even sure it caused the problem. When the Jr Admin arrived onsite, before he started ripping gear out, I had him remove both sup engines and then bring them up one at a time, and the switch came up fine.

I went in and corrected the default gateway. We opened a TAC case and Cisco cannot figure out what happened either. Has anybody else seen a bad default gateway work fine and then just stop working? Just kind of a weird issue.

 
I'm confused. What do you mean by "bad default gateway"? If the switch is routing you should not even have a default gateway in the config with the "ip default-gateway" command; just have the 0/0 default route and inject that into the routing protocol. The "ip default-gateway" command is used for management traffic, which is normally VLAN 1 unless you change it. Either way, don't use both (unless you have a really compelling reason why you should, for example a separate management network); they are not the same thing.

What's the routing protocol and the config of said routing protocol?

-edit-
If it's exactly 4 hours that sounds like the ARP timer. What redundancy mode are the sups running?
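If you want to check a saved config for the "both commands present" situation described above, here is a minimal Python sketch. It only looks for the two statements by text match; the function name and the sample addresses (taken from the documentation ranges) are made up for illustration and have nothing to do with the switch in this thread.

```python
import re

def find_default_settings(config_text):
    """Collect 'ip default-gateway' and static 0/0 next hops from a config dump.

    Assumption: config_text is plain 'show running-config' style text,
    one statement per line. This is a text match, not a real IOS parser.
    """
    gateways = []
    default_routes = []
    for raw in config_text.splitlines():
        line = raw.strip()
        m = re.match(r"ip default-gateway (\S+)", line)
        if m:
            gateways.append(m.group(1))
        m = re.match(r"ip route 0\.0\.0\.0 0\.0\.0\.0 (\S+)", line)
        if m:
            default_routes.append(m.group(1))
    return gateways, default_routes

# Hypothetical sample config, not taken from the thread.
sample = """
ip default-gateway 192.0.2.1
ip route 0.0.0.0 0.0.0.0 198.51.100.1
"""
gws, routes = find_default_settings(sample)
if gws and routes:
    print("both configured:", gws, routes)
```

If both lists come back non-empty, it is worth asking whether the two next hops are meant to differ.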
 
Originally posted by: spidey07
I'm confused. What do you mean by "bad default gateway"? If the switch is routing you should not even have a default gateway in the config with the "ip default-gateway" command; just have the 0/0 default route and inject that into the routing protocol. The "ip default-gateway" command is used for management traffic, which is normally VLAN 1 unless you change it. Either way, don't use both (unless you have a really compelling reason why you should, for example a separate management network); they are not the same thing.

What's the routing protocol and the config of said routing protocol?

-edit-
If it's exactly 4 hours that sounds like the ARP timer. What redundancy mode are the sups running?


The Sup engines only support static routing, so there is no routing protocol. The switch itself is trunked from the remote location over our MAN back to the core. However, we have one VLAN that is local to the site only and another VLAN that is extended over several remote locations, and this is all configured in the switch with static routing. The redundancy mode is SSO. I need to talk to the engineer who configured this switch next week and figure out why he has both the 0/0 default route and a default gateway configured for this equipment. It is weird that it was right around 4 hours from when the handshaking was lost to when we lost access to the switch.

The default gateway was on a completely different network from the one the switch's IP was on. About 18 months ago we trunked this switch back to the core and changed the IP of the switch to an IP on VLAN 1 with the rest of the network gear. The default gateway left in the config was the default gateway for the network the switch used to be on before we changed the IP. So it looks like when the work was done the old default gateway was left in the config, but the correct 0/0 default route for the new IP was put in.
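A stale gateway like this one, sitting on a different network from the management IP, is easy to test for. The sketch below uses Python's standard ipaddress module to check whether a gateway falls inside the subnet of the interface address. The helper name and the addresses are hypothetical examples, not the real ones from our switch.

```python
import ipaddress

def gateway_is_on_link(interface_cidr, gateway):
    """Return True if the gateway lies in the same subnet as the interface.

    interface_cidr: management address with prefix, e.g. "198.51.100.10/24"
    gateway: the configured default gateway address as a string
    """
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(gateway) in iface.network

# Hypothetical addresses: the management IP after a move, and an old gateway.
print(gateway_is_on_link("198.51.100.10/24", "198.51.100.1"))  # same subnet
print(gateway_is_on_link("198.51.100.10/24", "192.0.2.1"))     # leftover from old subnet
```

A gateway that fails this check cannot be reached by ARP on the local segment, which is the situation described above.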
 
It kinda sounds to me like you may have replaced the backup SUP once before and not rebooted the switch since.. So the sups had different configs loaded, or maybe in ROMMON.. I've run into something similar with 2 brand new SUPs to be installed. One came with Hybrid IOS, and the other with standard IOS.. Took me a while to get failover working there 😛..

Btw.. If you have "ip routing" enabled on the switch it disregards the "ip default-gateway" command, but instead uses the 0/0 route as its default gateway. Without "ip routing" enabled, it'll be the other way around..
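To spell out the rule I'm describing, here is a tiny Python model of it. This is only a simplified illustration of the behaviour as stated above, not how IOS is implemented; the function name and addresses are made up.

```python
def effective_default(ip_routing, default_gateway=None, static_default_route=None):
    """Pick which next hop acts as the default, per the rule described above.

    ip_routing: True if "ip routing" is enabled on the switch
    default_gateway: address from "ip default-gateway", or None
    static_default_route: next hop of the 0/0 static route, or None
    """
    if ip_routing:
        # With routing enabled, the 0/0 route is used and
        # "ip default-gateway" is disregarded.
        return static_default_route
    # Without routing, only "ip default-gateway" applies.
    return default_gateway

# Hypothetical addresses for illustration.
print(effective_default(True, "192.0.2.1", "198.51.100.1"))
print(effective_default(False, "192.0.2.1", "198.51.100.1"))
```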
 
Originally posted by: rathsach
It kinda sounds to me like you may have replaced the backup SUP once before and not rebooted the switch since.. So the sups had different configs loaded, or maybe in ROMMON.. I've run into something similar with 2 brand new SUPs to be installed. One came with Hybrid IOS, and the other with standard IOS.. Took me a while to get failover working there 😛..

Btw.. If you have "ip routing" enabled on the switch it disregards the "ip default-gateway" command, but instead uses the 0/0 route as its default gateway. Without "ip routing" enabled, it'll be the other way around..

Both Supervisor engines were on the same code, 12.2(25)EWA12. I had just upgraded the switch from EWA5 to EWA12 3 weeks earlier. So I am certain that they were both on the same version of code and both sups had been rebooted within the past 3 weeks. Also, in SSO mode you cannot have different versions of code.
 