Got this from my boss this morning and thought I'd pass it on. I'm copying his email straight from my inbox, so... don't bitch at me.
Taking down the switch for an OS upgrade seemed a bit odd to me, but I still thought it was interesting.
I thought this warranted a forward to the group!
Jarred and I spent the better part of a day dealing with what appeared to be DHCP server issues. The results were sporadic, but basically random PCs from a particular subnet/VLAN were getting IPs for a subnet they should not have been part of. We had machines on an untagged VLAN 413 port (10.40.13.x) that would occasionally receive IP addresses for VLAN 405 (10.40.5.x). The stranger part is that sometimes these machines would function fine on the wrong network and sometimes they would not. Also, occasionally, we couldn't get an IP address from the DHCP server at all. We rebooted the DHCP server several times, checked for any recently installed updates, issues with the NIC configuration (priority/VLAN tagging), etc.
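(My aside, not from the email: if you just want a quick way to flag leases that landed on the wrong subnet, a small Python sketch like this does it. The hostname is made up, and I'm assuming both VLANs are /24s, which the 10.40.x.x addressing suggests but the email doesn't actually say.)

    # quick lease sanity check: flag any machine whose address isn't in the
    # subnet its VLAN is supposed to hand out. /24 masks are an assumption.
    import ipaddress

    EXPECTED = {
        413: ipaddress.ip_network("10.40.13.0/24"),  # VLAN 413
        405: ipaddress.ip_network("10.40.5.0/24"),   # VLAN 405
    }

    def check_lease(hostname, ip, vlan):
        net = EXPECTED[vlan]
        if ipaddress.ip_address(ip) not in net:
            print(f"{hostname}: got {ip} but VLAN {vlan} should be handing out {net}")

    # hypothetical example of the symptom described above
    check_lease("sales-pc-07", "10.40.5.123", 413)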
We had a similar issue about a year ago and had attributed it to a bug in the core switch software. Because of that, the core was rebooted on Tuesday; the problem showed back up on Wednesday. We upgraded the core to the latest IOS (taking down the mill again), rebooted, and got the same problem.
When we started mirroring the DHCP server port, we discovered multiple DHCP requests with the same transaction ID, which seemed really odd.
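(Another aside from me: if anyone wants to watch for that on a mirror port themselves, here's a rough scapy sketch that flags repeated transaction IDs. The interface name is a placeholder, and keep in mind that normal client retransmissions reuse the xid too, so repeats are a pointer, not proof of a loop.)

    # rough sketch: sniff DHCP traffic on the mirrored port and report any
    # transaction ID (xid) seen more than once. Interface name is a placeholder.
    from collections import Counter
    from scapy.all import sniff, BOOTP

    xids = Counter()

    def check(pkt):
        if BOOTP in pkt and pkt[BOOTP].op == 1:  # 1 = BOOTREQUEST (client side)
            xid = pkt[BOOTP].xid
            xids[xid] += 1
            if xids[xid] > 1:
                print(f"xid 0x{xid:08x} seen {xids[xid]} times (src MAC {pkt.src})")

    sniff(iface="eth0", filter="udp and (port 67 or port 68)", prn=check, store=False)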
*** I'm not bothering to upload and insert the pic. ***
After a few hours of various troubleshooting with HP, they suggested that we could have a loop between VLANs. We confirmed this by putting a machine on 10.40.13.x and one on 10.40.5.x and pinging 10.40.13.255 from the 10.40.5.x machine; we could see the broadcast ping on both networks, which we shouldn't have been able to. So this put us on a goose chase to determine where a loop could possibly be.

Jarred discovered some unusual MAC addresses associated with one of the core ports on VLAN 405. I reviewed the port and had no documentation for it on our topology maps, so that put us on the trail. We finally narrowed the switch port connectivity down to the sales conference room in the Admin building. You'll notice from the attached picture that someone was nice enough to plug two wall ports into the same phone. It just so happens that they were on different VLANs :) thus creating a loop within the phone. We believe all of the suspect traffic had been forwarding through this phone, so the little Cisco phone was supporting about 50 users!

Once we disconnected the 2nd cable, all of the machines that were incorrectly assigned 10.40.5.x addresses lost network connectivity :) and had to be restarted (release/renew) to get their proper addressing (10.40.13.x). It was refreshing to finally figure this out, but it was quite frustrating for something so simple to wreak such havoc. Beware of the randomly connected Cisco phone!
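For what it's worth, the broadcast leak test they describe is easy to automate too. Something like this scapy sketch, run on a machine in the 10.40.13.x VLAN, would print any directed-broadcast ping that wanders over from 10.40.5.x (interface name is a placeholder again):

    # sketch of the leak check: on a 10.40.13.x machine, watch for echo requests
    # to 10.40.13.255 that were sourced from the 10.40.5.x network. Seeing any
    # means broadcasts are crossing VLANs, which is the loop symptom above.
    from scapy.all import sniff, IP, ICMP

    def leaked(pkt):
        if IP in pkt and ICMP in pkt and pkt[ICMP].type == 8:  # echo request
            if pkt[IP].dst == "10.40.13.255" and pkt[IP].src.startswith("10.40.5."):
                print(f"cross-VLAN broadcast ping from {pkt[IP].src}")

    sniff(iface="eth0", filter="icmp", prn=leaked, store=False)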