
Two machines lose each other in a big network

Felecha

Golden Member
I work for a large software company, several thousand employees. There are a number of offices worldwide, so it's a big network. I am having a big PITA problem with two machines in my local office of about 20 people, so it's a smallish LAN in a really big network.

My own machine at my desk is a Dell Latitude D610 Windows XP Pro. The other problem machine is a Sun Blade 150 running Solaris 5.8. The Solaris box runs database server software that I test, running the test scripts off of my laptop. So they talk to each other all day.

Anywhere from several times a week to several times a day I find they have lost their connectivity. From the laptop I ping and get "Destination host unreachable." The only solution is to reboot BOTH of them. Cold boots, too.

The mystery piece is - when it happens, the laptop can ping any other machine in the network, Windows or Unix (we run lots of different Unix and Linux machines), and any other machine can ping the laptop. AND the Solaris box can ping any other machine and any other machine can ping the Solaris box.

But the two cannot ping each other.

The IT guy has been really good about trying everything in the world to solve it. He is diligent and smart and has a large IT department to work with. They have looked at all kinds of internal monitoring and logging, and he's just flat stumped.

Anyone have ANY idea?
 
For software issues, check the Gateway and DNS addresses as configured on the two clients. Are they correct, and do both use the same DNS server as all the other clients? Also, check the hosts file on each. Maybe there is a bad or malicious hosts entry that is causing the problem.

Are they in the same piece of hardware (switch)?
 
Alas, I'm not going to be able to answer many questions. I know enough to work with my Linksys router and my home network, but this is a large network totally invisible to me, and all under IT's control. We are in an office suite with, as I said, a small number of employees, so there must be some local hardware here that connects the two machines, which are physically 5 feet away from each other. But each has an ethernet cable that goes into a wall jack, and from there who knows what there is? I'm sorry, that leaves anyone with ideas sort of out in the cold without much way to help me, huh?

I can get the Default Gateway with ipconfig on the Dell, but I don't know the command in Unix. I tried ifconfig, which I have used in Linux, but ifconfig in Solaris just gives me back a usage statement.

With ping I can see that both are at 10.24.112.***, but that's sort of what you would expect, they are on a local subnet. At least the laptop is set for DHCP automatic. I would suspect they both use the same DNS.
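For reference, `netstat -rn` prints the routing table on both Solaris and Windows, and the default route line carries the gateway. On Solaris, `ifconfig -a` lists all interfaces, whereas plain `ifconfig` only prints usage. Below is a minimal sketch, assuming Python 3 is available on the machine, that pulls the default gateway out of `netstat -rn` text so the two boxes can be compared. The function names and the 10.24.112.1 gateway in the usage note are hypothetical, not taken from this network.

```python
import re
import subprocess

# Matches a dotted-quad IPv4 address (no range check; good enough for a sketch).
IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def default_gateways(netstat_output):
    """Return the gateway of every default route found in `netstat -rn` text."""
    gateways = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        if fields[0] == "default" and IPV4.match(fields[1]):
            # Unix style: default <gateway> <flags> ...
            gateways.append(fields[1])
        elif (len(fields) >= 3 and fields[:2] == ["0.0.0.0", "0.0.0.0"]
              and IPV4.match(fields[2])):
            # Windows style: 0.0.0.0 0.0.0.0 <gateway> <interface> <metric>
            gateways.append(fields[2])
    return gateways


def read_live_gateways():
    """Run `netstat -rn` on this machine and parse its output."""
    result = subprocess.run(["netstat", "-rn"], capture_output=True, text=True)
    return default_gateways(result.stdout)
```

Calling `read_live_gateways()` on each machine and comparing the results would show whether both point at the same gateway, for example something like 10.24.112.1 on this subnet.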
 
The laptop C:\WINDOWS\system32\drivers\etc\hosts is just
127.0.0.1 localhost

the solaris /etc/hosts is
127.0.0.1 localhost
10.24.112.36 <machinename> localhost loghost
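A hosts file can also be checked mechanically. The sketch below, in Python and with a hypothetical function name, flags any hostname that is mapped to more than one address. On the Solaris file above it reports `localhost`, which appears on both lines. Whether that duplicate has anything to do with the lost connectivity is not established here; it is only something for a human to look at.

```python
from collections import defaultdict


def names_with_multiple_addresses(hosts_text):
    """Map each hostname that appears under more than one address to those addresses."""
    seen = defaultdict(set)
    for raw in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address, the rest are names and aliases.
        address, *names = line.split()
        for name in names:
            seen[name.lower()].add(address)
    return {name: sorted(addrs) for name, addrs in seen.items() if len(addrs) > 1}


# The Solaris /etc/hosts content as posted (machine name left as the placeholder).
solaris_hosts = """127.0.0.1 localhost
10.24.112.36 <machinename> localhost loghost
"""

print(names_with_multiple_addresses(solaris_hosts))
```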
 
ifconfig * or all or something... I forget now, but read the usage on it, and it will tell you how to pull it from all of them.
 
Originally posted by: JFG
See if the TCP/IP NetBIOS helper service is crashing. Maybe the tcp/ip driver got corrupted
That would not be my choice. NetBIOS would not affect ping, especially on a UNIX box. Usually it is local misconfiguration, then bad DNS, then bad DHCP server setup, then bad switch ports, then bad routing tables, then bad cables, then bad router firmware (your mileage may vary).

 
Felecha, this could be a lot of things. I know you were hoping for a simple answer, but perhaps you and your IT guy will be relieved to hear that there isn't one (it's not that you just forgot to check one simple thing...).

Some places to look:

1. The switches' error stats. Any events on the ports? If you take the port down and bring it back up, does that help?
2. The switches' MAC address tables. What do they say about these systems when things work and don't? When you clear the MAC table, what happens?
3. Can you attach a dumb hub/switch to the end of the cable going into one of the systems having trouble, plug the problem device into it, hook a laptop up to that hub/switch as well, and show that the network works for the laptop but not the problem station?
4. The routers' ARP tables. What do they say about these systems when things work and don't? When you clear the ARP table, what happens?

Those are some starters, there's plenty more it could be.
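The "working vs. not working" comparison in items 2 and 4 can also be done from the two end hosts with `arp -a`, which exists on both Windows and Solaris. Here is a minimal sketch in Python, assuming you save the `arp -a` output once while things work and once while they don't. The function names and the MAC addresses in the usage note are made up for illustration.

```python
import re

IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")
# Accepts 00-03-ba-12-34-56 (Windows) and 0:3:ba:12:34:56 (Solaris drops leading zeros).
MAC_RE = re.compile(r"\b([0-9A-Fa-f]{1,2}(?:[:-][0-9A-Fa-f]{1,2}){5})\b")


def normalize_mac(mac):
    """Return the MAC as lowercase, zero-padded, colon-separated octets."""
    return ":".join(part.zfill(2).lower() for part in re.split(r"[:-]", mac))


def parse_arp(arp_output):
    """Build {ip: mac} from every line that contains both an IP and a MAC."""
    table = {}
    for line in arp_output.splitlines():
        ip = IP_RE.search(line)
        mac = MAC_RE.search(line)
        if ip and mac:
            table[ip.group(1)] = normalize_mac(mac.group(1))
    return table


def diff_arp(good, bad):
    """Return {ip: (mac_when_good, mac_when_bad)} for entries that differ."""
    changes = {}
    for ip in sorted(set(good) | set(bad)):
        before, after = good.get(ip), bad.get(ip)
        if before != after:
            changes[ip] = (before, after)
    return changes
```

If `diff_arp(parse_arp(good_text), parse_arp(bad_text))` shows the other machine's IP with a different MAC, or with `None` on the bad side, that points at ARP rather than cabling. An empty result means the ARP entries match and the problem is elsewhere.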
 
Yeah, HOPING but not expecting. My main HOPE was that the description I gave to start with might make someone perk up and say "Oh, I have heard of one just like that. Really rare, but here's the deal".

And as I said, I won't be much "help" to anyone who offers to help. I knew that when I started, and I tried to find a way to present it that would not lead to lots of people in effect "wasting their time" on me.

I should have added that the IT guy is not even in my local office. I work in Concord NH and he is in Concord MA, a much larger center for our company. The IT there is responsible for the NH branch office. He comes up here when he has something he needs to do for us, and I'm on nice first name terms with him. Really nice, really sharp guy. Anyway, I won't be able to answer too many of you folks' questions. I will pass on some of the suggestions here but it is not likely that I will be able to pursue it the way I would if it were a problem here at home. We have to submit service requests and those go into a queue and the IT guys have managers that track their activity etc.

Anyway, I can pass these things on and see what happens.

Thanks to all

F

 
Just an opinion as to your problem:

Most company LANs use Static IP ... not DHCP automatic
as the address will change every time the lease expires.

Also in the Hosts file: Below is what you said was in there
Just my input, I think your laptop needs that other entry

10.24.112.36 <machinename> localhost loghost (use the ip of the laptop)

The laptop C:\WINDOWS\system32\drivers\etc\hosts is just
127.0.0.1 localhost

the solaris /etc/hosts is
127.0.0.1 localhost
10.24.112.36 <machinename> localhost loghost

One of those might fix it .. also try new CAT 5 cables
to the wall plate .. may be a bad cable
 
Originally posted by: bruceb
Most company LANs use Static IP ... not DHCP automatic
as the address will change every time the lease expires.
The use of Static IP addresses for client PCs is seldom done, and only for a few specific reasons. The local DNS Server generally makes changing IP addresses a non-issue.
One of those might fix it .. also try new CAT 5 cables to the wall plate .. may be a bad cable
I've seen some very strange behavior caused by "home-made" network patch cables.

I'd try to rule out a switch or cabling problem. Get a $10 switch and hook up the two problem PCs using two commercially-made patch cables. Obviously, also connect the switch to the rest of your network using another patch cable.
 