
Delayed Write Errors

I've been working through these problems at work for a long time. I've tried every idea I could come up with and have officially run out of ideas.

The clients are a mix of Windows XP, Windows Vista, and Windows 7 x64. The servers all run 32-bit (x86) Windows Server 2008. (Why x64 wasn't installed, I'll never ever ever know...)

On the domain at work we have a File Server, the clients, and the rest. The File Server is connected to a big 4U RAID-5 array through a RAID controller card, and that large block of space is divided among a few Windows Server shares.

Some of the clients run a program locally on their computer, but dump their output and logs to a General Share [X:] on the File Server.

We don't know exactly when it started, but those programs are not completing their writes. Typically the output file finishes, whereas the logs sometimes just stop being written partway through. There have also been instances where neither one finishes its writes.

A 'Delayed Write Error' pops up on the client computer when a write fails. Additionally, the client machine logs Event 139 or Event 50 with a source of 'Mup' whenever this happens. The server logs do not appear to contain anything relevant, though I'll need to triple check that.

The problems occur completely sporadically.
-----------------

To attempt to mitigate the problem, I have:
1. Moved a few select clients to a switch that is used among the servers.
2. Increased the SessTimeout from 45 seconds to 300 seconds (the value is in seconds, not milliseconds)
3. Disabled backups
4. Removed all Group Policy restrictions from select clients
5. Updated Firmware on the Servers
6. Restarted the entire Domain Server cluster (File Server, Domain Controller, etc...) as well as select clients.

None of the above appears to have worked. However:

7. Logging in locally and storing data locally
8. Logging in normally (Domain) and storing data locally

do not seem to exhibit the problem, which led me to think that the RAID controller might fail under load. To test that, I took 231GB of files on the File Server's main drive and added them to a .ZIP file on the RAID array. I saw no problems during this process...

If anyone has any ideas (No matter how silly) I'm more than willing to try them.

Thanks,
-Kevin (Gamingphreek)
 
How are the clients connecting to the share? Using the UNC path? Via a mapped drive? Are the shares part of a Distributed File System?
 
I believe the program writes the files using a UNC path, and the shares are also mapped to drive letters on each client. That said, I don't know whether the program uses an absolute or a relative path.

I don't know whether the shares are part of a DFS. Multiple users can access the data simultaneously from multiple computers, but I'm not sure what the underlying architecture is.

-Kevin (Gamingphreek)
 
I don't have a fix for your problem, but I will mention a somewhat similar issue I was having.

We use a cheap POP3 server for our email, which means clients download their email into a local client like Outlook. We then store the Outlook.pst file in the 'Application Data' folder, which means it's copied to the server each time the user logs off (lazy backup; better suggestions welcome). What was happening to us is that these larger files (0.5 to 1.5 GB) would time out (with a Delayed Write Error) halfway through being copied to the server during logoff. The EXTRA frustrating part was that this would end up corrupting the files. The Outlook.pst on the server would be the exact size and look normal, and when the client logged in the next day, it would download the corrupted file from the server. Great!
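The detail that the corrupted Outlook.pst was "the exact size" matters: a size check alone cannot catch this kind of corruption. A minimal sketch, in Python with hypothetical function names, of verifying a copy by content instead: compare sizes first as a cheap filter, then compare SHA-256 digests of both files.

```python
import hashlib
import os


def file_digest(path: str, chunk_size: int = 1024 * 1024) -> str:
    """SHA-256 of a file, read in chunks so large .pst files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def copy_is_intact(src: str, dst: str) -> bool:
    # A size mismatch is an immediate failure...
    if os.path.getsize(src) != os.path.getsize(dst):
        return False
    # ...but a size match proves little, so compare the contents too.
    return file_digest(src) == file_digest(dst)
```

Hashing reads both files end to end, so on a 1.5 GB file this costs a full read of each copy; that is the price of catching same-size corruption.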

This particular server was running as a guest in MS Virtual Server 2005. Back in February we migrated the Guest OS over to VMWare ESXi, and the whole "Delayed Write Error" and corrupt email files on log off went away. There must have been something wonky with the Virtual Server driver.

With a setup as big as yours, it could take some work to find the problem. It could be anything from corrupt sectors on a drive in the array (SpinRite), to the RAID card or the RAID cabling, to the network drivers, or the VM drivers if you're running a VM.

Good luck, interested to hear how this plays out.
 
Well, I do have a suggestion for you, dawks. You could use Folder Redirection for a specific folder that you store the .pst file in. That way it always lives on the server and is backed up.

From what I understand about the writes, they aren't opening sockets or anything like that. They simply use fstream to write to a file. I'm not sure what transport this kind of write uses, but I would imagine it isn't reliable, since the program would simply get an I/O exception on failure.
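Whatever the program uses (C++ fstream or otherwise), a write call returning success only means the data reached a buffer, not the server. A minimal Python sketch of the idea, with a hypothetical function name: flush and `os.fsync` after each log line so that any error the OS does report surfaces at the line that caused it instead of at close. This assumes the logging code can be changed or wrapped. With write-behind caching on a network share, a Delayed Write failure may still happen after the program's own write calls have succeeded, so treat this as a way to catch more failures, not all of them.

```python
import os


def append_log_line(path: str, line: str) -> bool:
    """Append one line and force it out of the buffers; return False on failure."""
    try:
        with open(path, "a", encoding="utf-8") as f:
            f.write(line.rstrip("\n") + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push its cache toward the device/server
        return True
    except OSError as exc:
        # Report locally so the failure is recorded even if the share is broken.
        print(f"log write failed for {path!r}: {exc}")
        return False
```

Reopening the file for every line is slow; for a high-volume log, keeping the file open and syncing every N lines is a reasonable middle ground.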

Would I gain anything by monitoring the traffic in Wireshark and trying to see what happens when one of the write errors occurs? In my mind that wouldn't narrow it down between File Server hardware and network hardware any more than I already have.

-Kevin

Edit: As if this problem wasn't enough, my router at home has decided to die slowly and painfully over the past few minutes. It went from ping responses of 1000+ms to being so slow that it can't even hand out a DHCP lease, wired or wireless.... 🙁
 
I've considered that, but given the way Outlook handles PSTs and their sheer size, it's just not possible. Plus there's a KB article that specifically says not to open PSTs over a network share.

Have you tried DiskMon? It shows all disk I/O seen by the kernel. I'm not sure whether it shows network writes, but running it on the server should show what the server is doing. http://technet.microsoft.com/en-us/sysinternals/bb896646

A Wireshark capture would be interesting. There would be a lot of info to scan, but it might be fun to check out.
 