Pinpointing source of network freezes (FIXED!)

suklee

Diamond Member
Oct 9, 1999
4,585
10
81
Occasionally our computers freeze up for a few seconds (Ctrl+Alt+Del doesn't respond), and immediately afterwards we get error messages such as
"This IMAP command could not be sent to the server before the connection was terminated" in Outlook Express, or
"Network path not found" in Explorer.

Almost immediately after that, the connection is restored.


Our network specs:

16-port 10/100 Switch
4-port Zyxel router connected to uplink port of switch
3 servers (P4 1.8 GHz machines for mail and PDC/file sharing; a slow K6-233 for print serving, which is about to be replaced by a P4 as well)
6 workstations, all running Win2k SP3 except mine (XP)

How can I accurately narrow the problem down? My guess would be the switch, as it is a generic/cheap model. Any ideas?

 

Garion

Platinum Member
Apr 23, 2001
2,327
4
81
Grab Ping Plotter from download.com and set up a constant ping. When you get a failure, see where it occurs. If it's in the switch, ALL the hops in the route will go away, including the first one. If it's somewhere else, this will help you identify its location.

- G
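Garion's rule of thumb can be written as a small decision function. This is a minimal sketch, not Ping Plotter's own logic: it assumes you have one sample's per-hop results in route order (True = reply, False = timeout), and the names first_failing_hop and diagnose are hypothetical.

```python
from typing import Optional, Sequence


def first_failing_hop(replies: Sequence[bool]) -> Optional[int]:
    """Return the 1-based number of the first hop that timed out, or None.

    replies[i] is True if hop i+1 answered in this sample, False on timeout.
    """
    for hop, answered in enumerate(replies, start=1):
        if not answered:
            return hop
    return None


def diagnose(replies: Sequence[bool]) -> str:
    """Turn one sample into a rough hint about where the loss starts."""
    hop = first_failing_hop(replies)
    if hop is None:
        return "all hops answered"
    if hop == 1:
        # Even the nearest hop is gone: suspect the local link (NIC, cable or switch).
        return "hop 1 lost: local problem (NIC, cable or switch)"
    return f"loss starts at hop {hop}: look between hop {hop - 1} and hop {hop}"
```

On a flat LAN like this one, the route to a server is a single hop, so any loss shows up at hop 1. On longer routes, a router may ignore pings addressed to itself while still forwarding traffic, so one failing hop with later hops still answering is not necessarily a fault.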
 

suklee

Thanks for that fast response :)
Exactly what I was looking for... I'll go and check out that util now.

:beer::D


Edit: OK, got it up and running with the default settings (15 seconds), pinging pingplotter.com. Should I just ping my mail server instead?
 

suklee

Garion,

I changed it to ping my mail server (192.168.1.102).

It's been running for close to an hour now and has shown some occasional red spikes lasting several seconds. But I just had a huge one, over a minute long. Here's a snippet of the log:

"28/01/2004 16:57:25",4
"28/01/2004 16:57:28",0
"28/01/2004 16:57:30",0
"28/01/2004 16:57:33",0
"28/01/2004 16:57:35",0
"28/01/2004 16:57:38",*
"28/01/2004 16:57:40",*
"28/01/2004 16:57:43",*
"28/01/2004 16:57:45",*
"28/01/2004 16:57:48",*
"28/01/2004 16:57:50",*
"28/01/2004 16:57:53",*
"28/01/2004 16:57:55",*
"28/01/2004 16:57:58",*
"28/01/2004 16:58:00",*
"28/01/2004 16:58:03",*
"28/01/2004 16:58:05",*
"28/01/2004 16:58:08",*
"28/01/2004 16:58:10",*
"28/01/2004 16:58:13",*
"28/01/2004 16:58:15",*
"28/01/2004 16:58:18",*
"28/01/2004 16:58:20",*
"28/01/2004 16:58:23",*
"28/01/2004 16:58:25",*
"28/01/2004 16:58:28",*
"28/01/2004 16:58:30",*
"28/01/2004 16:58:33",*
"28/01/2004 16:58:35",*
"28/01/2004 16:58:38",*
"28/01/2004 16:58:40",*
"28/01/2004 16:58:43",*
"28/01/2004 16:58:45",*
"28/01/2004 16:58:48",0
"28/01/2004 16:58:50",*
"28/01/2004 16:58:53",*
"28/01/2004 16:58:55",*

"28/01/2004 16:58:58",0
"28/01/2004 16:59:00",0
"28/01/2004 16:59:03",0
"28/01/2004 16:59:05",0
"28/01/2004 16:59:08",0
"28/01/2004 16:59:10",0
"28/01/2004 16:59:13",0
"28/01/2004 16:59:15",0
"28/01/2004 16:59:18",0
"28/01/2004 16:59:20",0
"28/01/2004 16:59:23",0
"28/01/2004 16:59:25",0
"28/01/2004 16:59:28",0
"28/01/2004 16:59:30",0
"28/01/2004 16:59:33",0
"28/01/2004 16:59:35",0

What do you make of that?
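To measure the gaps rather than eyeball them, the exported log can be scanned for runs of timeouts. A minimal sketch, assuming each line is a quoted DD/MM/YYYY HH:MM:SS timestamp followed by either a round-trip time in ms or * for a timeout, as in the snippet above; the function name outages is hypothetical.

```python
import csv
import io
from datetime import datetime


def outages(log_text):
    """Return (first_timeout, last_timeout, count) for each run of consecutive '*' samples.

    Assumed line format:  "28/01/2004 16:57:38",*   or   "28/01/2004 16:57:35",0
    """
    runs = []
    start = last = None
    count = 0
    for row in csv.reader(io.StringIO(log_text)):
        if len(row) != 2:
            continue  # blank or malformed line: skip it without ending the current run
        stamp = datetime.strptime(row[0].strip(), "%d/%m/%Y %H:%M:%S")
        if row[1].strip() == "*":
            if start is None:
                start, count = stamp, 0  # a new outage begins
            last = stamp
            count += 1
        elif start is not None:
            runs.append((start, last, count))  # a reply ends the outage
            start = None
    if start is not None:
        runs.append((start, last, count))  # the log ended mid-outage
    return runs
```

Each tuple holds the first and last timed-out samples and the number of consecutive timeouts; the real outage is roughly (last - first) plus one probe interval. Note that a single successful reply inside a long outage, like the 0 at 16:58:48, splits it into two runs.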
 

suklee

And another, not even 10 minutes after I restarted the pinging:


"28/01/2004 17:12:33",0
"28/01/2004 17:12:35",*
"28/01/2004 17:12:38",*
"28/01/2004 17:12:40",*
"28/01/2004 17:12:43",*
"28/01/2004 17:12:45",*
"28/01/2004 17:12:48",*
"28/01/2004 17:12:50",*
"28/01/2004 17:12:53",*
"28/01/2004 17:12:55",*
"28/01/2004 17:12:58",*
"28/01/2004 17:13:00",*
"28/01/2004 17:13:03",*
"28/01/2004 17:13:05",0
"28/01/2004 17:13:08",*
"28/01/2004 17:13:10",*
"28/01/2004 17:13:13",*
"28/01/2004 17:13:15",*
"28/01/2004 17:13:18",*
"28/01/2004 17:13:20",*
"28/01/2004 17:13:23",*
"28/01/2004 17:13:25",*
"28/01/2004 17:13:28",*
"28/01/2004 17:13:30",*
"28/01/2004 17:13:33",*
"28/01/2004 17:13:35",*
"28/01/2004 17:13:38",*
"28/01/2004 17:13:40",0

EDIT: Every so often I get these vertical black rectangles. What might those represent?
 

Garion

Is your mail server on your same network? If a ping to it is timing out, then you've got some kind of cable / switch / NIC problem. Try to swap out the switch, or use a crossover cable to the other device temporarily and see if that fixes the issue.

- G
 

suklee

Yes and yes.

All our computers are on the same network... everything is plugged into the "no-name" 16-port switch.

And these cables were homemade. I believe they were checked with a cable tester and passed, but maybe the cables' "quality" is not up to snuff?
 

suklee

Another possible source (?) -

Some cables run through the ceiling. I know this is a big no-no, but that's the way it was done. There could be electrical interference from the fluorescent lights, etc.

What are the chances of this causing problems, from your experience?
 

spidey07

No Lifer
Aug 4, 2000
65,469
5
76
I can almost guarantee you have a cable problem.

Do a search on the net for 568B and follow the wiring diagram.

w/o, o, w/g, b, w/b, g, w/br, br.

Or better yet, terminate the cables with keystone jacks instead of RJ45 plugs, then use a patch cable from the jack to the PC.

Or for giggles you could replace the switch and see if the problem remains.
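spidey07's color order is the T568B pin-out for pins 1 to 8. One point worth knowing: pins 1-2 and 3-6 must each sit on a single twisted pair. A cable wired with the same wrong order at both ends (a "split pair") still has continuity on all eight wires, so a basic continuity tester can pass it even though the pairs are no longer twisted together. The sketch below checks one plug's color order against T568B and flags split pairs; it uses the color abbreviations from the post, only checks T568B (not T568A), and the names check_termination and pair_of are hypothetical.

```python
# T568B color order for pins 1..8, using the abbreviations from the post.
T568B = ["w/o", "o", "w/g", "b", "w/b", "g", "w/br", "br"]

# Pins that must share one twisted pair in a TIA-568 termination.
PAIRED_PINS = [(1, 2), (3, 6), (4, 5), (7, 8)]


def pair_of(color):
    """Return the pair a conductor belongs to, e.g. 'w/o' and 'o' both map to 'o'."""
    return color.split("/")[-1]


def check_termination(colors):
    """List problems with one plug's color order (pins 1..8); empty list means OK."""
    if len(colors) != 8:
        return [f"expected 8 conductors, got {len(colors)}"]
    problems = []
    for pin, (got, want) in enumerate(zip(colors, T568B), start=1):
        if got != want:
            problems.append(f"pin {pin}: {got} (T568B expects {want})")
    for a, b in PAIRED_PINS:
        if pair_of(colors[a - 1]) != pair_of(colors[b - 1]):
            problems.append(f"split pair: pins {a} and {b} are on different pairs")
    return problems
```

A classic homemade mistake is keeping each pair side by side (w/o, o, w/g, g, w/b, b, w/br, br); this sketch reports that as split pairs on pins 3-6 and 4-5.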
 

suklee

I have ADSL, I guess that's close enough to a cable modem? ;)

The network cables were done according to some specification. I'm not sure if it is 568B, but they were all done the same way. I'll take a look at the layout when I get to work in a bit.

At this point, replacing the switch looks like the easier solution overall, requiring less effort than terminating the cables in keystone jacks. However, if the problem is still present after changing the switch, I'll post back here for help :confused:
 

suklee

spidey07 wrote:
"Do a search on the net for 568B and follow the wiring diagram.

w/o, o, w/g, b, w/b, g, w/br, br."
Just got into the office and I can confirm that the cables have been wired up this way.

Here's the deal: I left Ping Plotter running overnight, pinging our PDC at 15-second intervals. When I got into work this morning, there was RED EVERYWHERE! Here's a summary:

Target Name: N/A
IP: 192.168.1.101
Date/Time: 29/01/2004 19:26:48 to 30/01/2004 09:21:14

Hop  Sent   Err   PL%   Min  Max  Avg  Host Name / [IP]
1    20000  5392  27.0  0    1    0    [192.168.1.101]
(Min/Max/Avg are round-trip times in ms.)

So out of 20,000 packets sent, 5,392 (27%) were lost :Q:Q
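As a sanity check, the loss percentage and the effective probe interval can be recomputed from the summary. This is only arithmetic on the figures shown above: 20,000 probes over roughly 13.9 hours works out to about one probe every 2.5 seconds, which matches the spacing of the earlier log snippets.

```python
from datetime import datetime

FMT = "%d/%m/%Y %H:%M:%S"
start = datetime.strptime("29/01/2004 19:26:48", FMT)
end = datetime.strptime("30/01/2004 09:21:14", FMT)

sent, errors = 20000, 5392

loss_pct = 100 * errors / sent           # share of probes that got no reply
span_s = (end - start).total_seconds()   # length of the run in seconds
interval_s = span_s / (sent - 1)         # N probes are separated by N-1 gaps

print(f"packet loss: {loss_pct:.1f}%")   # packet loss: 27.0%
print(f"run length: {span_s / 3600:.1f} h, about {interval_s:.1f} s between probes")
# run length: 13.9 h, about 2.5 s between probes
```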

 

suklee

I've installed Ping Plotter on a couple of other machines, and they too show packet loss (PL) when pinging our PDC and mail server, at exactly the same times.

The switch has to go.
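The reasoning here (timeouts on several machines at the same moments point to something they share) can be checked mechanically. If the outages line up, the fault is in a shared component: the switch, or the target server and its own cable. A minimal sketch, assuming each machine's log has already been reduced to a list of timeout timestamps; timestamps are grouped into fixed windows because the machines' probes won't line up to the second, and the 5-second window and the function names are arbitrary, hypothetical choices.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)  # naive reference point; only differences between stamps matter


def window_of(ts, window_s):
    """Index of the fixed-length window that a timestamp falls into."""
    return int((ts - EPOCH).total_seconds()) // window_s


def shared_outage_ratio(timeouts, window_s=5):
    """Fraction of outage windows in which *every* machine logged a timeout.

    timeouts maps a machine name to the datetimes at which it logged a timeout.
    Near 1.0 suggests a shared component (switch, target server or its cable);
    near 0.0 suggests per-machine problems (own cable, NIC or switch port).
    """
    sets = [{window_of(t, window_s) for t in ts} for ts in timeouts.values()]
    if not sets:
        return 0.0
    union = set().union(*sets)
    if not union:
        return 0.0
    common = set.intersection(*sets)
    return len(common) / len(union)
```

Fixed windows are crude: two timeouts one second apart can straddle a window boundary, so treat a ratio somewhat below 1.0 as still pointing at a shared cause.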

Now, for a small office network (fewer than 16 computers), do I need VLAN, QoS, etc.? For example:

Level One FSW-1620TX (~US$100)
16 10/100 Mbps copper ports
Store-and-forward switching
Filtering/forwarding to eliminate bad packets
Back-pressure for half-duplex operation
IEEE 802.3x flow control for full-duplex operation
Auto-MDI/MDIX and auto-negotiation on every port
Front-panel LEDs for real-time switch status

Level One FSW-1640TX (~US$115)
Same as the 1620, but:
Supports port-based VLAN and trunking
Supports QoS based on IEEE 802.1p, with port-based and MAC-based priority

Level One FSW-1610TX (~US$108)
Up to 16KB MAC address table
1MB buffer memory

As you can see, these don't differ much in price, so do I just go for the most expensive of the three?

Even more confusing to me are switches in the US$60 price range, for example "Buffalo", "Micronet", or "Practical" 16-port switches. I could not even find information about these on the net. Should I avoid these like the plague and go for a slightly higher-quality switch?

Thanks in advance.

 

Garion

If it's your server, do you know that it's really the switch? Do you have speed/duplex problems causing packet loss? Do you have a bad NIC that's going down? A bad cable? Try a new cable on a new switch port and see if it helps. Also try to lock the server NIC to 100/full and see how that works.

- G
 

suklee

Sorry, please bear with me here... :eek:

Garion wrote: "Do you have speed/duplex problems causing packet loss?"
How would I go about determining that?

Garion wrote: "Do you have a bad NIC that's going down?"
Can one bad NIC in a network cause problems? I seem to recall reading about "broadcast storms" from a dying NIC. Might that be what is happening here?

Garion wrote: "A bad cable?"
Again, can one bad cable somewhere on the network cause the problems I described?

It is my server, and no, I am still not 100% sure it is the switch. So far, the only thing I can conclude is that PL appears to affect all 3 machines that have Ping Plotter running.

Perhaps after hours I can plug the two servers, my machine, and one other machine directly into the router (it only has 4 ports) and test with Ping Plotter? Even if that worked, though, it wouldn't rule out bad cables or a bad NIC somewhere else on the network...

Thanks again for your patience, Garion and Spidey.
 

suklee

Everyone's left work, so I shut down all workstations except the servers and my machine. I ran Ping Plotter again, pinging the mail server, and got a 30-second timeout and, just now, a 5-second one.

I'll leave it on overnight again and compare results to last night's run ...

This should eliminate "bad NICs" as a possible cause, since I'm sure the 3 servers and my machine have good NICs. It doesn't, however, eliminate bad cables (or electrical interference?) or a bad switch.
 

suklee

It was the switch after all! I went out and bought us a relatively cheap 16-port switching hub for ~US$73. According to an overnight run of Ping Plotter, all the problems are gone. ZERO PL! :D

Thanks very much, Spidey07 & Garion, for your help :beer::beer::beer: