ARP issues

RW-Taylor

Junior Member
May 11, 2016
9
0
0
Looking for assistance with an odd network issue I'm seeing.
I have two servers, Server A and Server B.
Both are connected directly to my core switch.
Server A is connected to port 1/15
Server B is connected to port 2/4
Server A has a NIC with IP address 192.168.1.213
Server B has a NIC with IP address 192.168.1.218
Connectivity to Server B is spotty at best.
When I tracert to the FQDN of Server B, the first hop it tries to take is 192.168.1.213.
When I run a "show arp" on the core switch, I see both 192.168.1.213 and 192.168.1.218 with the same MAC address, both listed on the same port, 1/15.
I then set up a port mirror of 1/15, ran a packet capture, and cleared the ARP cache on the core. I can see the NIC of Server A replying to a "who has" for both 192.168.1.213 and 192.168.1.218.
I have verified that 192.168.1.218 is in no way associated with Server A.
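
The symptom described above (one MAC address showing up for more than one IP in the ARP table) can be checked mechanically. Below is a rough Python sketch of that check. The table text, the MAC addresses, and the "IP MAC PORT" column layout are all made up for illustration, since "show arp" output differs between switch vendors.

```python
from collections import defaultdict

# Hypothetical ARP table text; real "show arp" output varies by vendor.
# The MAC addresses and the third entry are invented for illustration.
SAMPLE_ARP_TABLE = """\
192.168.1.213  00:11:22:33:44:55  1/15
192.168.1.218  00:11:22:33:44:55  1/15
192.168.1.220  00:aa:bb:cc:dd:ee  2/7
"""


def find_shared_macs(table_text):
    """Return {mac: [ips]} for every MAC that is mapped to more than one IP."""
    ips_by_mac = defaultdict(list)
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        ip, mac = fields[0], fields[1].lower()
        ips_by_mac[mac].append(ip)
    return {mac: ips for mac, ips in ips_by_mac.items() if len(ips) > 1}


shared = find_shared_macs(SAMPLE_ARP_TABLE)
for mac, ips in shared.items():
    print(f"{mac} answers for {', '.join(ips)}")
```

A MAC listed here is not proof of a fault on its own (a host can legitimately hold several IPs), but it is a quick way to spot the pattern in a large table.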

Any thoughts?
 

Tweak155

Lifer
Sep 23, 2003
11,448
262
126
How could they have the same MAC? Was the NIC ever installed on server A that is now on B?
 

RW-Taylor

No, the NICs in Server A and Server B have always been installed in their respective servers.

They show as having the same MAC address because Server A is replying to the ARP "Who Has" for IP 192.168.1.213 AND 192.168.1.218, when it only really has 192.168.1.213. And this is what is causing all of my connectivity issues with Server B.
 

Tweak155

Ah ok, I misunderstood what you wrote.

If you take Server A off the network, does Server B work correctly?
 

RW-Taylor

Unfortunately that is not an option, as Server A is the iSeries DB server which houses ALL of our company's reporting information. It needs to be up 24x7.

An interesting note: every so often when I clear the ARP cache on the core switch, the correct MAC address and port information is shown for Server B.

Given that fact, I just ran another packet capture, but this time I cleared the arp cache on the core switch until I received the appropriate MAC address and port information for Server B.

Unfortunately, the packet capture didn't show anything different from the first time. It still shows Server A replying to the "Who Has" for both 192.168.1.213 AND 192.168.1.218. It does not show Server B replying at all, even though the ARP cache is correct.
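
For anyone who wants to pull the same information out of a capture programmatically, here is a small Python sketch of what an ARP reply contains and how to read the sender fields from it. It builds two replies by hand instead of reading a real capture file, and the MAC addresses and the switch IP are hypothetical. The field layout is the standard Ethernet/IPv4 ARP payload from RFC 826.

```python
import socket
import struct

# ARP payload for Ethernet/IPv4 (RFC 826): hardware type, protocol type,
# hardware length, protocol length, opcode, sender MAC, sender IP,
# target MAC, target IP. 28 bytes in total.
ARP_FORMAT = "!HHBBH6s4s6s4s"
ARP_REPLY = 2


def mac_to_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_reply(sender_mac, sender_ip, target_mac, target_ip):
    """Pack an ARP reply payload (without the Ethernet header)."""
    return struct.pack(
        ARP_FORMAT,
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6, 4,     # MAC and IPv4 address lengths
        ARP_REPLY,
        mac_to_bytes(sender_mac), socket.inet_aton(sender_ip),
        mac_to_bytes(target_mac), socket.inet_aton(target_ip),
    )


def parse_arp(payload):
    """Return (opcode, sender_mac, sender_ip) from a 28-byte ARP payload."""
    _, _, _, _, oper, sha, spa, _, _ = struct.unpack(ARP_FORMAT, payload[:28])
    mac = ":".join(f"{b:02x}" for b in sha)
    return oper, mac, socket.inet_ntoa(spa)


# Hypothetical values standing in for the real capture.
SERVER_A_MAC = "00:11:22:33:44:55"
SWITCH_MAC = "00:de:ad:be:ef:01"
SWITCH_IP = "192.168.1.1"

captured = [
    build_arp_reply(SERVER_A_MAC, ip, SWITCH_MAC, SWITCH_IP)
    for ip in ("192.168.1.213", "192.168.1.218")
]

# Group the IPs each sender MAC claims to own.
claims = {}
for payload in captured:
    oper, mac, ip = parse_arp(payload)
    if oper == ARP_REPLY:
        claims.setdefault(mac, []).append(ip)
print(claims)
```

The sender MAC and sender IP in each reply are what the switch uses to fill its ARP cache, so a single MAC appearing with two sender IPs matches what the capture showed.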
 

Tweak155

OK, then can you do the reverse? Take Server B off the network and see whether Server A still answers for the .218 address.
 

RW-Taylor

I just did the following:
took Server B offline (unplugged it from the core)
cleared the ARP cache on the core switch
tried to ping 192.168.1.218 from the core
ran a show arp on the core

Same results as before: the ARP cache on the core shows both 192.168.1.213 and 192.168.1.218 as having the same MAC address and being plugged into port 1/15.
 

Tweak155

I'm pretty sure this indicates a configuration issue on Server A. It's always best to find a test that confirms the theory, and I feel this one does.

I know you said you confirmed there are no .218 references on Server A, but you should confirm that it doesn't have multiple IP addresses assigned, for either IPv4 or IPv6.

It may be simplest just to change the static IP of Server B if you don't want to root cause the issue.
 

RW-Taylor

That is an interesting thought. Server B is just a management console, so I don't see any reason we can't change its IP... I'll give that a try!
 

Gryz

Golden Member
Aug 28, 2010
1,551
204
106
I'm a networking guy, and not a server guy. So excuse me if I ask dumb questions.

What kind of servers are we talking about here? What OS do they run? Windows Server something?

Do you run a VM environment? What kind of VM technology do you use? And more importantly, is there a virtual switch in the hypervisor? And if so, which one? Stuff like Hyper-V Extensible Switch and/or Hyper-V Network Virtualization?

If so, you must realize that the host OS on your servers acts like a layer-2 switch and maybe even as a router or a NAT box. In any case, it is not a dumb host any more. If you have a misconfigured VM on the box, your virtual switch might think it needs to proxy-ARP for it. In other words, if there is a VM on Server A that thinks it has IP address 192.168.1.218, the virtual switch might reply to ARP requests for it.

Any specific applications you run on those two servers? I vaguely remember that Microsoft had some application that tried to do redundancy by letting your backup server reply to ARPs for the IP address of the main server if it thought the main server was down. A bit like Microsoft trying to do HSRP/VRRP at the application level. I am sorry I don't remember the details, but it might be worth looking into such a possible application on your servers.
 

RW-Taylor

Server A is an IBM iSeries (their new name for the AS400)
Server B is an IBM HMC (the hardware management console for the AS400)

We do utilize VMware hosts, but they would not affect this current issue.
 

RW-Taylor

OK, I did the following:
changed the IP address of Server B to 192.168.1.214
cleared the ARP cache on the core switch
pinged 192.168.1.214 from the core switch
did a show arp

It's showing exactly the same thing as before. The ARP cache entries for Server A's and Server B's IP addresses both have the same MAC address and port number, 1/15.
 

Tweak155

That is bizarre. Unfortunately I don't know enough about server configurations to suggest another solution. The first thought that came to my mind was to get Server B off of 192.168.1 and move it to 192.168.2.x, but not sure if that would cause you network headaches.

It seems as though Server A is configured to respond to 192.168.1.x requests. My guess is Server B is your only other machine on 192.168.1.x?
 

RW-Taylor

Negative, 192.168.1.x is the network ALL my servers are on. I'm beginning to think that it is an IBM iSeries specific issue. I am reaching out to a midrange/iseries mailing list I just signed up for. If/When I find an answer I will definitely update this post.

Thanks for your assistance folks!
 

RW-Taylor

I have found the answer to this problem!

There is a virtual NIC on the P7 with an IP address of 192.168.1.215 with a subnet mask of 255.255.255.240.
Here is what I was told on the midrange-L email list I signed up for:

I am not sure to understand what you mean but I have got the same kind of
behaviour with a misconfigured network mask for a Virtual IP Address configuration
(VIPA) on an IBM i partition. The mask of the VIPA was *not* set to
255.255.255.255. The result was that the IBM i partition was responding to any arp
request about an IP in the mask of the VIPA "yes, this IP is mine and here my MAC
address".
So, if you have such a VIPA configuration, double check the masks.

The cause of our issues is that we don't have a 255.255.255.255 subnet mask on the VIPA.
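
To see why the mask matters, here is a quick check with Python's standard ipaddress module, using the addresses from this thread. It only does the subnet arithmetic; it does not model how the IBM i partition actually decides which ARP requests to answer.

```python
import ipaddress

# VIPA address and masks taken from the thread.
vipa_misconfigured = ipaddress.ip_interface("192.168.1.215/255.255.255.240")
vipa_host_only = ipaddress.ip_interface("192.168.1.215/255.255.255.255")

network = vipa_misconfigured.network
print(network)                  # the /28 block the VIPA mask covers
print(network[0], network[-1])  # first and last address in that block

# Server A's NIC, Server B's new address, and Server B's original address.
for ip in ("192.168.1.213", "192.168.1.214", "192.168.1.218"):
    addr = ipaddress.ip_address(ip)
    in_wide = addr in network
    in_host_only = addr in vipa_host_only.network
    print(f"{ip}: inside /28 mask = {in_wide}, inside /32 mask = {in_host_only}")
```

With the 255.255.255.240 mask the VIPA covers 192.168.1.208 through 192.168.1.223, which includes both .218 and .214. That would be consistent with the problem following Server B to its new address. With a 255.255.255.255 mask only .215 itself is covered.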
I believe the VIPA was initially set up as a sort of fault tolerance. I believe the 110 and 213 NICs should be plugged into vlan4, and everyone should be accessing the 215 address instead of the 213 address.
I would prefer to have fault tolerance set up via an LACP connection. I will implement that instead.

For now I have ended the 215 interface, which stopped the issue we were seeing with the HMC not responding.

Hope this might help someone in the future!
 

MtnMan

Diamond Member
Jul 27, 2004
9,294
8,604
136
Years ago, while upgrading a customer's network (converting from Token Ring to Ethernet), we bought a case of 100 3Com NICs. Two of the cards in that case had the same MAC. Took us days to find that.