windows 2003 issues

pcthuglife

Member
May 3, 2005
173
0
0
OK, so a few months back my office was forced into purchasing a Windows 2003 server. We were forced because the accounting and payroll software that we were upgrading to was only available for Windows servers. So we got the server and got everything set up, which was pretty easy. Later we made the server our network's PDC running Active Directory.

Everything worked great for about 2 months, then all of a sudden the machine stopped working. The only way the machine will boot is in Active Directory restore mode or safe mode. Now here's the kicker: if we restore the system files via the Windows 2003 installation CD, everything works fine for about 24 - 48 hours.

We disabled Terminal Services to rule out the possibility of someone logging in and screwing with the system. (The log files on our proxy server didn't even show any suspicious connections over port 3389, but we disabled RDP just to be safe.)

The event viewer gives some warnings and says the source is aarich. Some errors indicate a bad SCSI controller, but we're using SATA drives. Other than the aarich errors, we get a lot of errors about network services failing to start. My question is, what would cause these services to fail after 48 hours?

The system will work perfectly fine after the system files are restored, then for no reason the system will freeze, and when we reboot the server will not boot into windows under normal mode. This has been so freakin frustrating and it just pisses me off that we were forced to pay money to use this crappy OS, and the darn thing won't even work. Any thoughts before I call MS and pay $375 for tech support?
 

stash

Diamond Member
Jun 22, 2000
5,468
0
0
Is this machine connected directly to the Internet?

What network services are failing to start? Exact errors instead of vague descriptions would be much more helpful here.

It sounds like either hardware or malware at this point. Server 2003 does not crash for no reason, sorry to burst that bubble.
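One hedged way to get from "a lot of errors" to exact errors is to export the System log from Event Viewer and tally the entries by source and event ID. The Python sketch below assumes the entries have already been parsed into (type, source, event ID) tuples. That tuple layout, the function name `tally_events`, and the sample rows are all hypothetical and are not taken from this server.

```python
from collections import Counter

def tally_events(entries, types=("Error", "Warning")):
    """Count log entries per (source, event_id), keeping only the given types."""
    counts = Counter()
    for entry_type, source, event_id in entries:
        if entry_type in types:
            counts[(source, event_id)] += 1
    # Most frequent first; ties are ordered by source name, then event ID
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample rows, for illustration only
sample = [
    ("Warning", "aarich", 9),
    ("Error", "Service Control Manager", 7001),
    ("Information", "EventLog", 6005),
    ("Warning", "aarich", 9),
    ("Error", "Service Control Manager", 7001),
    ("Error", "Service Control Manager", 7000),
    ("Warning", "aarich", 9),
]

for (source, event_id), n in tally_events(sample):
    print(f"{n:3d}  {source}  {event_id}")
```

A short table like this is usually easier to post than a description, and it shows at a glance whether one source dominates the log.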
 

pcthuglife

Well, I'm not in the office right now so I can't give you the list of services. The server is connected to the internet through a hardware firewall and proxy server. The proxy server logs do not show any unusual connections via RDP. The Windows server's antivirus program has not picked up any viruses. All of the computers on our LAN have auto-protect virus and adware protection. Also, the proxy server prohibits LAN users from downloading executable files or compressed zip files.

The machine works perfectly fine when it boots into Active Directory Restore mode. There are no warnings in the device manager, and the network services that run while in the restore mode work fine.

Without seeing the event viewer, does anyone have any ideas as to why the server would just stop working while booted up normally? Like I said, all of the network services work just fine for about 48 hours after we run the system restore from the installation CD. Has anyone ever heard of Active Directory just taking a dump for no obvious reason?
 

stash

Originally posted by: pcthuglife
The machine works perfectly fine when it boots into Active Directory Restore mode. There are no warnings in the device manager, and the network services that run while in the restore mode work fine.

Do you remember if there were any errors/warnings in the directory services or FRS logs?

Originally posted by: pcthuglife
Has anyone ever heard of Active Directory just taking a dump for no obvious reason?

Depends on your level of AD knowledge. What may be obvious to me may not be obvious to you.
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
Sounds like a rival gang. Maybe the Macrypts. I hear they're pretty hard core and stuff.
 

pcthuglife

Right now my only guess is a bad power supply. If a power supply is faulty and sends intermittent power to the system would that be enough to corrupt an OS? I know I've corrupted my XP installations a number of times by overclocking my hardware and adjusting my power settings.

We have a Windows server admin at the office who has been working on this thing for about a week. I generally handle the web/e-mail/proxy server administration so I have very little experience with Windows AD. The Windows admin has been examining the logs and he keeps telling me that the logs are pointing to a definitive problem. Basically one service doesn't start because another service failed to start before it. Every time we run the restore, different network services fail at different times.

I'd love to find out that this is a hardware failure causing our problems, but so far there haven't really been any error messages that would lead us to that conclusion.

I should also note that we ran the hardware diagnostic cd that came with our server and everything tested Ok.
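The "one service doesn't start because another service failed before it" pattern mentioned above can be narrowed down by separating root failures from knock-on failures. The following Python is only a sketch under assumptions: the function name `root_failures` is made up, and the dependency map is illustrative rather than read from this server (the real dependencies are listed on the Dependencies tab of each service's properties).

```python
def root_failures(depends_on, failed):
    """Return the failed services for which none of their dependencies also failed."""
    roots = []
    for service in sorted(failed):
        deps = depends_on.get(service, [])
        if not any(dep in failed for dep in deps):
            roots.append(service)
    return roots

# Illustrative dependency map, not taken from the server in question
depends_on = {
    "Netlogon": ["Workstation", "Server"],
    "Workstation": [],
    "Server": [],
    "ServiceX": ["Netlogon"],  # hypothetical service name
}

# Hypothetical set of services that failed on one boot
failed = {"Netlogon", "Workstation", "ServiceX"}

print(root_failures(depends_on, failed))
```

If the root failures change from one crash to the next, as described, that would point away from a single misconfigured service and toward something underneath all of them.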
 

stash

No offense to your AD guy, but if you have been troubleshooting this issue for a week, and it is affecting your business, you really should open a case with Microsoft PSS. $245 will get your issue resolved quickly.
 

pcthuglife

Well, that's what I plan on doing tomorrow morning. It's been a time-consuming process because every time we restore the system files from the installation CD he keeps a close eye on the logs and changes the Active Directory configuration to see if he can resolve the problem. We have no way of knowing whether or not the changes fixed the issue until the server crashes again. At one point it went 4 days without screwing up.

What do you think of the bad power supply theory?
 

stash

Possibly. I really can't say either way with the current info. What changes is he making, and why? Stuff like that.
 

pcthuglife

Originally the server also acted as our office file server. The first thing we did was move that task to another machine to make sure that it wasn't one of our scheduled tasks or backup scripts that was causing the server to crash.

Then he hopped on MS's knowledge base to view the list of services that should be running. I don't know the exact services that he enabled or disabled, but basically he found an article on MS's web site that displays a list of services that needed to be running. He went into the services console and noticed that some of those services were set to Disabled. He changed them to Automatic, rebooted the server and everything worked fine. A day later, the server froze and we couldn't boot back into Windows.

That's the other part that puzzles me: in the time that it's working we can reboot the machine all we want, we can run Windows updates, install hotfixes, and everything works fine. 1 - 3 days later the system freezes, and we can no longer boot into normal mode until we restore the system files.
 

stash

Originally posted by: pcthuglife
I don't know the exact services that he enabled or disabled, but basically he found an article on MS's web site that displays a list of services that needed to be running.

This will be very useful information if you can get it when you get back to the office. Domain controllers have very specific service requirements, and if they are not all running, they could definitely make the machine hang on boot, or prevent you from logging in. Also, since logging into DSRM seems to work, that also seems to indicate an issue with AD.

So the question is, what services are being disabled, and how?
 

imported_BikeDude

Senior member
May 12, 2004
357
1
0
Originally posted by: pcthuglife
Everything worked great for about 2 months then all of a sudden the machine stops working.

"stops working" as in "freeze"? I.e. no bluescreen? (or reboot if it is set to automatically reboot)

Our server freezes too, about two-three times per day -- if our SATA RAID 1 array is running. Unplug one of the drives and it will be stable again (but lots of aarich events every time it checks if the second drive has been put back -- one event every minute).

The freeze is soft, that is I can continue using the command prompt, until e.g. a DIR tries to list content not in the cache. The instant it touches disk the command prompt will stall and I'll have to switch to the next prompt (and so on). I.e. no clues.

Funny thing was, it had been up and running continuously from January (when it was installed) to about May when I rebooted it to install the hotfixes that had accumulated. I guess it could be one of those, but... (Oh, and we still use Win2000 Server for some reason)
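For anyone wanting to confirm a "one event every minute" pattern like the aarich entries mentioned above, the gap between consecutive events from a single source can be measured. This is a minimal Python sketch; the function name `median_interval_seconds` and the sample timestamps are made up for illustration and do not come from either server's log.

```python
from datetime import datetime
from statistics import median

def median_interval_seconds(timestamps):
    """Median gap in seconds between consecutive event times."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return median(gaps)

# Hypothetical event times: minutes 0, 1, 2, 3 and 5 past the hour
times = [datetime(2005, 6, 1, 9, m, 0) for m in (0, 1, 2, 3, 5)]

print(median_interval_seconds(times))
```

The median is used instead of the mean so that one long gap, such as a reboot, does not hide an otherwise steady polling interval.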
 

pcthuglife

Hey BikeDude you're my hero! We disabled RAID mode and tried booting up one hard drive at a time and everything seems to work fine. So far we've only booted up with Drive 1 so there's a good chance that Drive 0 may have some bad sectors. Either that or our RAID controller is a piece of crap. I appreciate all the help.
 

imported_BikeDude

Originally posted by: pcthuglife
We disabled RAID mode and tried booting up one hard drive at a time and everything seems to work fine.

Well, you won't really know until a day or so has passed?

Would this happen to be a Supermicro server? (I kinda hope it turns out that way, because then you and I get to kick some serious butt) Or at the very least an Adaptec SATA RAID controller?

Oh, btw, we got a completely new server from Supermicro and I merely moved the CPU, memory and drives over. So any hardware related problems specific to us must be located amongst those components. On my first attempt at removing one drive I did experience one final freeze but the next reboot a week ago has proven to be stable. (albeit according to Murphy's law I'm about to experience a drive failure)
 

imported_BikeDude

Well, you mentioned the aarich event, so Adaptec was kinda obvious.

My SuperMicro server 6013A-T labels the driver thusly: "Adaptec ICH5-R / 6300ESB (RAID) 2.01.021". I doubt "6300ESB" identifies the chip -- I'll have to take a peek tomorrow. I doubt Adaptec has all that many controller chips, so it is probably an identical device driver.

How... annoying.

I've tried some combinations to rule out that one disk is bad. I bought a new drive, and either both the old drives are bad or none. Even if there was something amiss with the hard drives themselves, I would've expected something about this in one of the logs. (nothing in event viewer nor Adaptec's management console)

Any possibility that a hotfix might've triggered something? I think I'll attempt to uninstall a few of them this weekend.
 

pcthuglife

Yeah, dual 2.8 Xeon. We're going to order a PCI SATA RAID controller and hope that works. Having the drives mirrored is a must for us.
 

imported_BikeDude

FWIW: SuperMicro sent me build 41 of the Adaptec RAID driver and this seems to have fixed the freezing issue we experienced. (but my event log is fast filling up with error entries generated by said driver -- one every minute)

So, if you haven't changed the RAID controller, maybe you could try the driver too?