Intermittent network software hangs

sieistganzfett

Senior member
Mar 2, 2005
What software can I use to monitor packet loss on a switch, a router, a server, etc., basically anything with an interface port on it? I need to see whether packet loss is why a network-intensive application locks up, and sometimes even crashes, intermittently. Regardless of the environment, I've run into this wall numerous times over my few years in the industry, and I still haven't found a perfect, easy-to-use, and ideally free program that will constantly run something like a pathping and log the time of each run to a .txt file. But the most important thing is that I need something that tells me how much packet loss occurs on an interface over time and at what times the packets were lost. I briefly thought MRTG would do something like this, but it didn't look right for what I need.
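A minimal sketch of the kind of logger described above, assuming Python 3.7+ and the stock `ping` command on Windows or Linux (iputils; other platforms use different flags). It is a plain ping loop rather than pathping, and it writes one line with a seconds-resolution timestamp for every lost ping, so the log contains only the failures. The function names, target address and log file name are hypothetical.

```python
import datetime
import platform
import subprocess
import time
from typing import Callable


def ping_once(host: str, size: int = 1400, timeout_s: int = 1) -> bool:
    """Send one echo request with the system ping; True if a real reply came back."""
    if platform.system() == "Windows":
        # Windows ping: -n count, -l payload bytes, -w timeout in milliseconds
        cmd = ["ping", "-n", "1", "-l", str(size), "-w", str(timeout_s * 1000), host]
    else:
        # Linux iputils ping: -c count, -s payload bytes, -W timeout in seconds
        cmd = ["ping", "-c", "1", "-s", str(size), "-W", str(timeout_s), host]
    result = subprocess.run(cmd, capture_output=True, text=True)
    # A genuine echo reply includes a TTL; timeouts and "Destination host
    # unreachable" replies do not, so check the text rather than the exit code.
    return "ttl=" in result.stdout.lower()


def monitor(host: str, count: int, interval_s: float, log_path: str,
            probe: Callable[[str], bool] = ping_once) -> int:
    """Probe `host` `count` times; append a timestamped line for each loss."""
    lost = 0
    with open(log_path, "a", encoding="utf-8") as log:
        for _ in range(count):
            stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            if not probe(host):
                lost += 1
                log.write(f"{stamp} LOST {host}\n")
                log.flush()  # keep the file current if the script is killed
            time.sleep(interval_s)
    return lost
```

Usage, with a hypothetical server address: `monitor("192.168.1.10", count=3600, interval_s=1.0, log_path="pingloss.txt")` sends 3600 pings about a second apart and returns the number lost. The `probe` parameter exists so the same loop can be reused with a different check, or tested without a network.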
 

sieistganzfett

Senior member
Mar 2, 2005
Does anybody have a recommendation? Right now I'm just using a batch file that does:
time /t >> c:\pinglog.txt
ping destination -l 1500 -n 10 >> c:\pinglog.txt

I just want to know at what times packets are lost on the interfaces.
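For the batch file above, a short script can pull out just the failures together with the `time /t` stamp that preceded them. A sketch, assuming the log was written by an English-language Windows `ping` (the failure messages are matched as literal text); the time resolution is only as fine as `time /t`, which prints hours and minutes. The function name is hypothetical.

```python
import re
from typing import List, Tuple

# `time /t` prints only the clock, e.g. "10:30 AM" or "14:05" depending on locale.
TIME_LINE = re.compile(r"^\d{1,2}:\d{2}(\s*[AP]M)?$", re.IGNORECASE)
# Windows ping messages that mean an echo request got no real reply.
FAILURE_MARKERS = ("request timed out", "unreachable", "general failure", "transmit failed")


def find_losses(log_text: str) -> List[Tuple[str, str]]:
    """Return (last timestamp seen, failure line) for every failed ping in the log."""
    current_time = "unknown"
    losses = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if TIME_LINE.match(line):
            current_time = line
        elif any(marker in line.lower() for marker in FAILURE_MARKERS):
            losses.append((current_time, line))
    return losses
```

Usage: `find_losses(open(r"c:\pinglog.txt").read())` returns a list of (time, message) pairs, which can then be lined up against the times users reported a hang.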
 

nweaver

Diamond Member
Jan 21, 2001
Your best bet is to monitor the stats on a managed switch. Mine shows good stats, such as:

GigabitEthernet8/1 is up, line protocol is up (connected)
Hardware is C6k 1000Mb 802.3, address is 0001.9723.2400 (bia 0001.9723.2400)
Description: E-Rack2
MTU 1500 bytes, BW 1000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Full-duplex, 1000Mb/s
input flow-control is off, output flow-control is on
Clock mode is auto
ARP type: ARPA, ARP Timeout 04:00:00
Last input never, output 00:00:47, output hang never
Last clearing of "show interface" counters 1w5d
Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 0
Queueing strategy: fifo
Output queue: 0/40 (size/max)
5 minute input rate 0 bits/sec, 0 packets/sec
5 minute output rate 0 bits/sec, 0 packets/sec
20138744 packets input, 5976851726 bytes, 0 no buffer
Received 1352132 broadcasts (1264159 multicast)
0 runts, 0 giants, 0 throttles
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
0 watchdog, 0 multicast, 0 pause input
0 input packets with dribble condition detected
28406141 packets output, 6537685293 bytes, 0 underruns
0 output errors, 0 collisions, 0 interface resets
0 babbles, 0 late collision, 0 deferred
0 lost carrier, 0 no carrier, 0 PAUSE output
0 output buffer failures, 0 output buffers swapped out
6509#

You can even graph those stats with MRTG. What you need to watch for are things like output errors, collisions, resets, etc.
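Checking those counters can be scripted when you can save the `show interface` output periodically. A sketch, assuming Cisco IOS-style wording like the 6509 output above (other vendors phrase these counters differently); the dictionary keys and function name are hypothetical labels. It reports only the watched counters that are non-zero, so for the output above it returns an empty dictionary.

```python
import re
from typing import Dict

# Counters in "show interface" text that usually point at a physical-layer
# or congestion problem when they are non-zero.
COUNTER_PATTERNS = {
    "input_errors": r"(\d+) input errors",
    "crc": r"(\d+) CRC",
    "output_errors": r"(\d+) output errors",
    "collisions": r"(\d+) collisions",
    "interface_resets": r"(\d+) interface resets",
    "late_collisions": r"(\d+) late collision",
    "output_drops": r"Total output drops: (\d+)",
    # "Input queue: size/max/drops/flushes" -> capture the third field
    "input_queue_drops": r"Input queue: \d+/\d+/(\d+)/\d+",
}


def problem_counters(show_interface_text: str) -> Dict[str, int]:
    """Return only the watched counters that are greater than zero."""
    found = {}
    for name, pattern in COUNTER_PATTERNS.items():
        match = re.search(pattern, show_interface_text)
        if match and int(match.group(1)) > 0:
            found[name] = int(match.group(1))
    return found
```

These counters are cumulative since the last "clear counters", so comparing two dumps taken at known times is what narrows down when the errors happened.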
 

sieistganzfett

Senior member
Mar 2, 2005
The switches are unmanaged. All I can do is connect to them with a null-modem cable to update the firmware, which has already been done.

My ping/time loop from 10:30 AM to 3 PM found a few times where the destination was unreachable, but the two PCs I was pinging in a loop did not have their timeouts at the same times. When the software hung, it recovered by itself in about 15 seconds, but according to the ping log the "destination unreachable" messages were not around the time the software running over the network hung.

MRTG/Wireshark will show me packets going around, but they don't tell me when packets are lost. Am I missing something here?
 

spidey07

No Lifer
Aug 4, 2000
MRTG will definitely show you dropped packets. But if you're dealing with unmanaged gear, you're kinda SOL unless you can set up two probes/sniffers, one near the source and one near the destination.
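Under the hood, MRTG-style monitoring polls interface counters over SNMP (for example `ifInDiscards`, `ifOutDiscards` and `ifInErrors` from the IF-MIB) and plots the difference between polls, so "when" is known only to within one polling interval. A sketch of that arithmetic, assuming you already have timestamped readings of one counter from a managed device; the function names are hypothetical. 32-bit SNMP counters wrap back to zero, which the modulo handles as long as the counter wraps at most once between polls.

```python
from typing import List, Tuple

COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps to 0 after 2**32 - 1


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Increase between two polls of a wrapping counter (at most one wrap assumed)."""
    return (current - previous) % modulus


def drop_events(samples: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """From (timestamp, counter) polls, list the polls at which the counter grew."""
    events = []
    for (_, prev_value), (stamp, value) in zip(samples, samples[1:]):
        delta = counter_delta(prev_value, value)
        if delta:
            events.append((stamp, delta))
    return events
```

Usage: `drop_events([("10:00", 100), ("10:05", 100), ("10:10", 103)])` returns `[("10:10", 3)]`, meaning three discards happened sometime between 10:05 and 10:10.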

Also, Wireshark has built-in tools to detect and show problems at the various layers. Things that indicate packet loss are a closing TCP window or retransmissions.
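Wireshark exposes those conditions as display-filter fields under `tcp.analysis`, and the command-line `tshark` can print only the flagged frames with their timestamps. A sketch, assuming a reasonably recent tshark on the PATH (one that supports the `-Y` display-filter option) and an existing capture file; the function names and file name are hypothetical. These flags apply only to TCP traffic, such as the application's own sessions, not to lost pings. The same filter string can be pasted into Wireshark's display filter bar.

```python
import shutil
import subprocess
from typing import List

# Wireshark TCP analysis flags that usually indicate loss or a stalled receiver.
LOSS_FILTER = " || ".join([
    "tcp.analysis.retransmission",
    "tcp.analysis.fast_retransmission",
    "tcp.analysis.lost_segment",
    "tcp.analysis.zero_window",
])


def tshark_loss_command(capture_file: str) -> List[str]:
    """Build a tshark command printing time and endpoints of flagged frames."""
    return [
        "tshark", "-r", capture_file,
        "-Y", LOSS_FILTER,   # display filter: keep only flagged packets
        "-T", "fields",      # print chosen fields instead of the summary line
        "-e", "frame.time", "-e", "ip.src", "-e", "ip.dst", "-e", "tcp.seq",
    ]


def run_loss_report(capture_file: str) -> str:
    """Run the command if tshark is installed and return its text output."""
    if shutil.which("tshark") is None:
        raise RuntimeError("tshark not found on PATH")
    result = subprocess.run(tshark_loss_command(capture_file),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Usage: `print(run_loss_report("server.pcapng"))` lists one line per retransmission or window problem, with the frame time first, which can be compared against the times users reported the hang.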

What is very bothersome is that you are losing pings on a LAN. That should never, ever happen unless the destination is just too busy to respond.
 

sieistganzfett

Senior member
Mar 2, 2005
I know it should never happen on a LAN; I'm losing 1500-byte pings randomly. I'm thinking the server could be too slow, but I think it's more likely the switch losing packets. The ping results show 1 lost out of 30 sent, but it happens only about once every 10 minutes, sometimes once every 30 minutes. I checked through the log: it has never lost 2 in a batch of 30, and never lost 2 within the same minute. It seems to lose one every 10 minutes or so.

I'm going to try a 4-port switch: connect the server and the one user who sees the problem to it, then connect that switch to the existing switch. The existing switch has a lifetime warranty, but I can't prove it's the culprit yet. That switch has a fiber link to another switch, where the other user has the problem as well.

How do I use MRTG or Wireshark to show just the lost packets and log only those? I was thinking of running Wireshark on the server, and maybe on the two computers as well, to capture packets if I need to. What configuration do I use with either one to find what I want easily?

Thanks for the help.
 

spidey07

No Lifer
Aug 4, 2000
In all honesty, it sounds like a cabling problem.

Network "weirdness" and mysterious errors are normally cabling-related.
 

nweaver

Diamond Member
Jan 21, 2001
You really need to replace the cables from the switch to the problem PCs with premade cables, possibly the cable from the server to the switch as well, and then all of your switch-to-switch cables.

This would be a bit easier with decent gear. A cheap Cisco Catalyst 2912XL can be had on eBay for under 40 bucks or so, and it would make something like this much simpler. Trying to do this in Wireshark is going to be painful.
 

sieistganzfett

Senior member
Mar 2, 2005
Since virtually everyone had problems with the software locking up today, it's probably the cable to the server, the switch, or something related to the server itself. I'm going to swap some cables and hardware around to figure it out. (This server has been running like crap lately, too: 110 processes, what looks like two SQL servers for some odd reason, the Exchange server, and a 1.3 GB page file. It's a 3.0 GHz Xeon with 1 GB of RAM and 15k RPM drives in RAID 5.) I can't remember the model of the two HP switches, but the cables, and even the runs (except the fiber one), mostly look like they were made by some dude and may have gone to crud over the years. Too bad there isn't a program that will easily report the packets lost on each interface. This is an issue I've seen randomly at client sites over the years, and it drives me nuts, since everything that gets replaced looks like it fixes the problem, but then the problem comes back weeks later.

Surprisingly, my 1500-byte ping script also drops 1 packet about every ten to twenty minutes on my LAN at work. As soon as I changed the size to 1400 on my LAN and in the two scripts I'm running on the server at the problem site, none of the three scripts showed any more dropped pings over a 6-hour period, except in one case: the one problem PC got 36 "destination unreachable" messages in a row, which would be too short a time for a PC restart, and I haven't looked into what happened there yet. (This PC is in the remote building, with a few other PCs connected to one HP switch, and the link between the switches is fiber. In the main building with the server, not a single ping failed to the one PC connected to the HP switch in that building.)
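One possible reason 1500-byte pings behaved differently from 1400-byte ones: Windows `ping -l` sets the ICMP data size, so `-l 1500` produces 1500 + 8 (ICMP header) + 20 (IPv4 header) = 1528 bytes, more than a standard 1500-byte Ethernet MTU, and each ping goes out as two IP fragments. If either fragment is lost, or a device handles fragments poorly, the whole ping fails; at 1400 the packet is 1428 bytes and is not fragmented. A sketch of that arithmetic, assuming IPv4 with no IP options and a 1500-byte MTU; it shows the packet sizes involved, not what the HP switches were actually doing. The function name is hypothetical.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes for an echo request


def ping_fragments(payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 fragments for an ICMP echo carrying `payload` data bytes."""
    ip_data = payload + ICMP_HEADER
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = ((mtu - IPV4_HEADER) // 8) * 8
    return math.ceil(ip_data / per_fragment)


for size in (1400, 1472, 1473, 1500):
    total = size + ICMP_HEADER + IPV4_HEADER
    print(f"ping -l {size}: {total}-byte packet, {ping_fragments(size)} fragment(s)")
```

This prints 1 fragment for 1400 and 1472 and 2 fragments for 1473 and 1500, so 1472 is the largest data size that fits in a single 1500-byte frame. Adding Windows ping's `-f` (Don't Fragment) flag with `-l 1472` tests full-size frames without fragmentation.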
 

sieistganzfett

Senior member
Mar 2, 2005
I figured out the problem: the server hangs for about a minute at random times throughout the day. Event ID 7011, related to NtFrs, is logged when the server hangs. I get calls from people the moment it hangs for them; I connected to the server and narrowed it down to that error. Two things I did: first, I updated McAfee VirusScan 8.0i Patch 14, which had been failing its autoupdate for a week, with a SuperDAT (the problem occurred again). Then I applied this:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\LanmanServer\Parameters: "I set my Autodisconnect value to 0xfffff. See M138365 for an article describing the Autodisconnect parameter." The value 0xfffff is the highest value allowed on a domain controller. In my case, I used 0xfff. The default value on my SBS 2003 server was 0xf.
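The registry values quoted above are easier to sanity-check once the hex is converted: Autodisconnect is counted in minutes of idle time, so the SBS default of 0xf is 15 minutes and the 0xfff used here is 4095 minutes (about 2.8 days). A read-only sketch, assuming Python 3 on the server and a REG_DWORD value named `autodisconnect` under the key shown above; the function names are hypothetical, and the sketch does not change the setting.

```python
import sys
from typing import Optional

KEY_PATH = r"System\CurrentControlSet\Services\LanmanServer\Parameters"
VALUE_NAME = "autodisconnect"


def describe_minutes(value: int) -> str:
    """Render an Autodisconnect DWORD (minutes) as hex, minutes and days."""
    return f"0x{value:x} = {value} minutes (~{value / 1440:.1f} days)"


def read_autodisconnect() -> Optional[int]:
    """Return the current value, or None if it is unset or this is not Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        return None
```

Usage: `describe_minutes(0xfff)` returns "0xfff = 4095 minutes (~2.8 days)"; on the server, `read_autodisconnect()` shows what is currently set.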

Then I disabled the ProtectionPilot and VirusScan services from running, since the 7011 error occurred once more. It hasn't happened since.