Winsock corruption on random network computers.

sourceninja

Diamond Member
Mar 8, 2005
8,805
65
91
Our network has a strange problem no one can seem to solve. Some computers lose the ability to talk to anything on the network. Reboots do nothing, no spyware or viruses are detected by any tool we have tried, and plugging other devices into the same port works fine. An affected computer will not work again until we run a winsock fix.

What is bothering me is we can't seem to find a cause. Our techs can't find any spyware or viruses, and it happens to computers at random. We have had brand new images get this problem within 30 seconds of boot up, while other computers have never had it. It never happens on our Windows 2003 and 2008 R2 servers, but it has happened on Windows XP and Windows 7 desktops.

We have suspected our antivirus was the cause (after noticing some strangeness on the update server) and put a test lab on campus that does not have antivirus on it. It is happening in that room as well (which only runs vmware view client as a restricted user).

I'm really at a loss as to what we need to look for. It happens at random (sometimes going months without an outbreak), and that makes it really hard to track down. Any suggestions? This is causing a big problem for our Windows 7 migration, because when this happens on Windows 7 we have yet to find a winsock-like fix that resolves the issue.
 

sourceninja

Yep, the winsock fix is the only thing that helps. Sometimes the problem is gone for good on that PC, sometimes it comes back in minutes.

Something has to be screwing up the winsock, but I can't find it.
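One way to catch whatever is changing the winsock would be to save the Winsock catalog on a freshly imaged machine and compare it against the catalog of a machine that has just broken. The Python below is only a rough sketch of that idea, not something anyone in this thread has run. It assumes `netsh winsock show catalog` is available on the affected Windows machines; the function names and the "known_good"/"current" labels are made up for illustration.

```python
import difflib
import subprocess


def snapshot_catalog():
    """Return the text output of `netsh winsock show catalog` (Windows only)."""
    result = subprocess.run(
        ["netsh", "winsock", "show", "catalog"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def diff_catalogs(before, after):
    """Return unified-diff lines describing what changed between two snapshots.

    An empty list means the two snapshots are identical.
    """
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="known_good",  # hypothetical label for the baseline
            tofile="current",       # hypothetical label for the broken machine
            lineterm="",
        )
    )
```

Saving the output of `snapshot_catalog()` right after imaging and feeding it to `diff_catalogs()` once a machine fails would at least show whether a catalog entry was added, removed, or reordered between the two points in time.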
 

ViviTheMage

Lifer
Dec 12, 2002
36,189
87
91
madgenius.com
We figured it was landesk because we could not connect remotely, or use most/any of the functions landesk had on an installed computer.

We also found some computers would remove themselves from our active directory domain (we run a 2008 DC environment), purely at random, causing similar issues.
 

sourceninja

ViviTheMage said:
We figured it was landesk because we could not connect remotely, or use most/any of the functions landesk had on an installed computer.

We also found some computers would remove themselves from our active directory domain (we run a 2008 DC environment), purely at random, causing similar issues.

Well, the bad news is that we are getting this problem in our test lab, which only has Windows XP, Office, and the VMware View client installed. So that rules out our remote access software.

One thing that makes me think it's spyware (although I can't find any spyware) is that it only hits some subnets and not others. For example, my virtual desktops have never experienced this issue, nor has our server VLAN. Our classroom and staff VLANs are the only networks experiencing this issue. It has also never happened on our IT network...
 

ViviTheMage

Well, let's get some details. All affected computers are on one VLAN, correct? Are they firewalled at all? Websensed?

Affected users, do they obtain an IP, DNS, etc?

Do they have any network traffic at all (local, HTTP, CIFS, port 80 to the web)?
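Those three questions (address, DNS, traffic) can be checked in order on an affected machine. Below is a small Python sketch of that triage, assuming Python is available on the machine; every function name here is made up for illustration, and the host and port to probe are left to the caller rather than guessed.

```python
import socket


def has_usable_ipv4(addresses):
    """True if any address is neither loopback (127.x) nor APIPA (169.254.x)."""
    return any(
        not a.startswith("127.") and not a.startswith("169.254.")
        for a in addresses
    )


def local_ipv4_addresses():
    """Best-effort list of this host's IPv4 addresses; empty list on failure."""
    try:
        return socket.gethostbyname_ex(socket.gethostname())[2]
    except OSError:
        return []


def can_resolve(name):
    """True if the name resolves to at least one address."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except OSError:
        return False


def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def triage(has_ip, dns_ok, tcp_ok):
    """Map the three checks to the first layer that looks broken."""
    if not has_ip:
        return "no usable IP address: look at DHCP or the link itself"
    if not dns_ok:
        return "has an IP but cannot resolve names: look at DNS"
    if not tcp_ok:
        return "resolves names but cannot open TCP connections: look at the firewall or the local stack"
    return "basic connectivity looks fine"
```

A 169.254.x.x address normally means the machine gave up waiting for DHCP and assigned itself one, which is why the sketch treats it as "no usable IP".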
 

sourceninja

The affected computers are on 2 of the 5 VLANs on campus. All run through the same Check Point firewall and have very similar rules (in fact, these VLANs are the most restricted). All VLANs get NTP, DNS and DHCP from the same servers.

In terms of network traffic, they are allowed out to the internet for HTTP/HTTPS and allowed locally to access their user-assigned and group-assigned network volumes (via AD).

All computers on campus are imaged with the same images (winxp or very recently win7) and most tend to use the same applications. All email is handled via google (we do not allow local mail clients). All systems run the latest version of symantec end point protection which is updated constantly. All systems get windows updates monthly via WSUS.

Things we have tried, or that have happened, during the time this problem has existed:

1) Building a brand new image - still affected.
2) Not using our antivirus (tried using Forefront and tried no antivirus in a test lab)
3) No applications on brand new image - still affected.
4) Migrating systems to Windows 7 - some Win7 machines have been affected again after migration.
5) Fresh image of Windows 7 on new machines - some have been affected.
6) Scanning affected machines with a plethora of spyware and malware detection tools - nothing found.
7) Running winsock fix - solves the problem, but it can return minutes to weeks later. In one case it returned before the tech had left the office.
8) Migrated file sharing, printing, DNS and DHCP - not done because of this issue, but because we migrated from Novell to Microsoft AD during the time we have had this issue.
9) Locked down firewall - again, not done because of this issue, but because we had an IT audit.
10) Removed Websense from our network - the college changed policy to not do content filtering on students.


Things of note:

1) My virtual desktops have never had this issue. They are on the same vlans as the physical machines with this issue.
2) None of the other vlans have had this issue.
3) As far as we can tell, when this happens on windows 7, there is no way to fix it short of a re-image.
 

sourceninja

We have not yet found a solution. We do not have IPv6 enabled, and the problem has not shown up in the last week or so. I'm waiting for it to pop up again so I can keep hunting for a solution.

It comes and goes: sometimes every day, sometimes a week or two with nothing.
 

reiver

Junior Member
Jan 12, 2012
2
0
0
I am having the same type of problem... I have done a lot of the same things as you, and I replaced my main switches. Have you found anything else, or did the problem go away? I have been wondering if it was hardware on the network causing it...

Clearing the winsock on your windows 7 computers does not fix it?
 

reiver

I think the issue was a broadcast storm, though I'm not sure yet, as we have not waited long enough to tell for certain. Basically we have one main switch and then a lot of 5-port switches (old building, owner won't re-wire), one in each office, as there is more than one computer in each office. So the design of the network made it a breeding ground for broadcasts. Then we discovered the true root of the problem was the DHCP server.

The DHCP server was set to a lease time of 5 minutes instead of the default 8 days! So, in this network with all the additional switches and all the devices constantly asking for an address, it was just creating a traffic jam. The DHCP server is now set to 8 days, so we shall see if the problem is resolved. I will post back if that was the solution. It may be a different root cause for others, but the problem of a network traffic jam may be the same.
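To put rough numbers on that, reading the 5-minute and 8-day settings as lease durations: by default a DHCP client tries to renew at half the lease time (T1 in RFC 2131). The Python sketch below is back-of-the-envelope arithmetic only. The client count of 100 is a made-up figure, and the result counts renewal attempts, not broadcasts, since renewals at T1 normally go unicast to the server.

```python
def renewals_per_day(lease_seconds, clients):
    """Approximate DHCP renewal attempts per day across all clients.

    Assumes every client renews at T1 = 50% of the lease (RFC 2131 default).
    """
    t1 = lease_seconds / 2
    return clients * 86400 / t1


FIVE_MINUTES = 5 * 60
EIGHT_DAYS = 8 * 24 * 60 * 60
CLIENTS = 100  # hypothetical client count, not from this thread

print(renewals_per_day(FIVE_MINUTES, CLIENTS))  # 57600.0
print(renewals_per_day(EIGHT_DAYS, CLIENTS))    # 25.0
```

Under those assumptions the short lease generates 2,304 times as much DHCP chatter as the 8-day default, so it is plausible that it added to congestion on a network already built from chained small switches.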

In my case, I could find no reason why some computers were affected and not others, so that made troubleshooting difficult.
 