IIS Server 2003 Problems after windows updates

gmc8757

Member
Feb 9, 2005
170
0
0
I have a Windows Server 2003 machine. It was really behind on updates, so I recently applied Service Pack 1 and all the updates released after it. It's now fully up to date. It's also running IIS 6.

The problem is that after a few days (about 3), the server stops responding to pings from computers on a different subnet, so all the websites are down for other subnets, including the "outside world". If I restart the server, everyone can get to the sites regardless of what subnet they're on. It sounds like a buffer is filling up and causing ICMP packets from other routers to be denied.
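Because the failure only appears after about three days and only from other subnets, it may help to log exactly when reachability changes, from a machine on a different subnet. The following is a minimal sketch, not a tested monitoring tool: the function names are hypothetical, it assumes the Windows `ping` syntax (`-n` for count, `-w` for timeout in milliseconds), and it treats "TTL=" in the output as the sign of a real echo reply.

```python
import subprocess


def is_reachable(host):
    """Send one echo request using Windows ping syntax.

    Returns True only if a reply line containing "TTL=" came back.
    This is a heuristic: ping's exit code alone can be misleading.
    """
    proc = subprocess.Popen(
        ["ping", "-n", "1", "-w", "2000", host],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    return "TTL=" in out


def transitions(samples):
    """Reduce (timestamp, reachable) pairs, given in time order, to the
    first sample plus every sample where reachability changed."""
    changes = []
    previous = None
    for stamp, ok in samples:
        if ok != previous:
            changes.append((stamp, ok))
            previous = ok
    return changes
```

Calling `is_reachable` once a minute from a scheduled task and feeding the results to `transitions` would give a timestamp for the moment the server went dark, which can then be compared against the server's event log.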

What I've done so far:
1: Updated NIC drivers to the latest
2: Updated Remote Access Controller driver to the latest
3: Followed Microsoft's KB 898060, "Windows Server 2003 Service Pack 1 may cause network connectivity between clients and servers to fail"
4: Followed Microsoft's KB 899148, "Some firewalls may reject network traffic that originates from Windows Server 2003 Service Pack 1-based computers"

Number 4 actually caused the server to blue screen. I chose the option "load last known good configuration" and it came right back. The reg entry was still there.

Anyone have any suggestions?
Thanks.
 

RebateMonger

Elite Member
Dec 24, 2005
11,586
0
0
Item 3) certainly LOOKS like your problem. As to why it wasn't fixed by following the MS KB article.....

If this problem was caused by Microsoft Updates, especially a Service Pack, MS will give you free support to fix the problem. I'd call them.
 

gmc8757

I'm working with Dell right now. We have Gold support, so hopefully they can help out. I agree, 3) looks like it exactly. We'll see what Dell says and then I'll call Microsoft.

Anything else?
 

RebateMonger

Originally posted by: gmc8757
Anything else?
Well, yeah....I've never seen Dell fix a complex "software problem". Dell's fine on hardware. Be sure you have a full system backup..... :(
 

gmc8757

Good thinking. I do get a full data backup every night. Really not too much on there, just about 10 websites.
 

Smilin

Diamond Member
Mar 4, 2002
7,357
0
0
Originally posted by: gmc8757
I'm working with Dell right now. We have Gold support, so hopefully they can help out. I agree, 3) looks like it exactly. We'll see what Dell says and then I'll call Microsoft.

Anything else?

Yes, call MS now. Problems from updates are covered free of charge. If you don't get good traction on this, PM me your case number and I'll have a peek.

898060 does not affect pings. It only causes a failure of PMTU discovery so large packets crossing an intermediate WAN link with a low MTU get dropped as if there is a black hole router. The problem was introduced in SP1 and 898060 fixes it. However, more recent (and automatic) security updates also include the updated tcpip.sys so if you are fully patched, 898060 is not an issue.
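To see the difference between a PMTU black hole and a plain ping failure, one option is to probe with the Don't Fragment bit set and find the largest payload that still gets a reply. The sketch below is an illustration only: the function names are hypothetical, it assumes the Windows `ping` flags `-f` (set Don't Fragment) and `-l` (payload size), and the 1472-byte upper bound assumes a 1500-byte Ethernet MTU minus 28 bytes of IPv4 and ICMP headers.

```python
import subprocess

IP_ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header


def ping_df(host, payload_size):
    """Send one echo request with Don't Fragment set (Windows ping syntax).

    Returns True only if a reply line containing "TTL=" came back.
    """
    proc = subprocess.Popen(
        ["ping", "-n", "1", "-f", "-l", str(payload_size), host],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    return "TTL=" in out


def largest_payload(probe, low=0, high=1472):
    """Binary-search the largest payload size for which probe(size) is True.

    Assumes probe is monotonic: once a size fails, every larger size fails.
    Returns None if even the smallest size fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

For example, `largest_payload(lambda size: ping_df("server.example", size))` (the host name is a placeholder) returning a value well below 1472 while small pings succeed would point toward an MTU problem. If `None` comes back, even minimal pings are failing, which is a different problem from the one 898060 addresses.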


Some things you can do on your own:
1. Check your logs. Be sure you aren't throwing 202x events indicating running out of resources.
2. Get a very good scope of what is working and what isn't: same subnet vs. different subnet, pings, file shares, nslookup, loopback address ping, etc.
3. During a failed state, run "netstat -ano", save the results. See if you are out of sockets or anything.
4. During a failed state, get a network trace at the server itself (netmon2 works fine for this, if it's not in your administrative tools, pop your 2003 CD and add/remove windows components). We need to know if traffic is failing to arrive, or arriving but being 'ignored'.
5. Set your computer up for a manually initiated crash dump. Depending on the scope of the problem, getting a kernel memory dump during a failed state may be required by the MS networking guys. See:
244139 Windows feature lets you generate a memory dump file by using the keyboard
http://support.microsoft.com/default.aspx?scid=kb;EN-US;244139
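Step 3 above can be partly automated. The sketch below saves the `netstat -ano` output to a file and tallies TCP connections by state, so a pile-up in one state is easy to spot. The function names and the output path are hypothetical, and the parser assumes the usual Windows column layout (Proto, Local Address, Foreign Address, State, PID); it is a rough aid, not a substitute for reading the raw output.

```python
import subprocess
from collections import Counter


def summarize_netstat(text):
    """Count TCP connections by state in `netstat -ano` output.

    TCP rows have five columns; UDP rows have no State column and
    are skipped, as are the header lines.
    """
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 5 and parts[0] == "TCP":
            counts[parts[3]] += 1
    return counts


def capture_netstat(path):
    """Run `netstat -ano`, save the raw output to `path`, return the tally."""
    proc = subprocess.Popen(
        ["netstat", "-ano"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    with open(path, "w") as handle:
        handle.write(out)
    return summarize_netstat(out)
```

Running `capture_netstat` once while the server is healthy and once during a failed state gives two tallies to compare alongside the saved raw files.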
 

gmc8757

That's great, I really appreciate your response Smilin. I will do the things on your list and go from there.

Dell had me remove one of the hotfixes; I can't remember which one. When I speak to him again, I'll ask and let you guys know.

I think I'm going to wait until another failed state to call Microsoft. I will definitely contact them though.
 

Smilin

Removing a hotfix is cool but removing a security update is NOT a solution, just a very temporary workaround. If Dell had you do this, then bad Dell! :p

If removing a hotfix did the trick, cool. If removing an update did the trick, reapply it, and if the problem reappears, call MS directly. Don't be scared :p Even if reapplying knocks you unbootable, the MS guys can get you going again.
 

gmc8757

Sounds good. I think I'm just going to wait until I see whether this fixed it. Dell is calling me back next week, and the server would go down before then if it didn't fix it. But you're absolutely right, I don't want to fix it by removing an update.
 

gmc8757

Ok, the server is now in a down state again.

I did as you said, Smilin:
3. During a failed state, run "netstat -ano", save the results. See if you are out of sockets or anything.
4. During a failed state, get a network trace at the server itself (netmon2 works fine for this, if it's not in your administrative tools, pop your 2003 CD and add/remove windows components). We need to know if traffic is failing to arrive, or arriving but being 'ignored'.

I'm going to call Microsoft now.


 

gmc8757

Ok, so I just got off the phone with Microsoft. They had me start the Routing and Remote Access service (RRAS) and configure it for basic LAN routing.

I hope it works. My only concern is that this was never configured before the updates, and everything worked fine then. Also, the server runs fine for 3-4 days before it stops responding to requests. If the routing service not being started were the problem, why would it still work for 3-4 days?

I guess we'll see in the next week or so. They left the case open until Friday, then they'll call back to see if it is still working.
 

gmc8757

Actually, I was thinking about it, so I disabled what the MS tech had me set up to get it running (enabling Routing and Remote Access, RRAS, for basic LAN routing) to see whether I could still get to the server from another subnet. After disabling RRAS, I was still able to get to the server, so I don't think that was the problem. I think enabling RRAS might have just cleared a buffer and got it working again. I enabled it again, and I guess we'll see in a few days.

Any thoughts on a possible buffer overflow?
 

Smilin

Please PM me your case number (SRX... or SRZ... ). I would like to have a look at it.


Thanks



edit:
And FYI ... the case getting left open until Friday isn't exactly accurate. It gets left open until you say it's fixed :p