Weird Error on MS SQL Server ("Warning: Fatal error 823 occurred")

cyberia

Platinum Member
Oct 22, 1999
2,535
0
0
I have run into an intermittent problem in a web application the last two mornings. The application has been in production for a while, with no recent changes. The error ocurrs when a certain SQL query is executed.

The problem occured at around 8 am yesterday and went away at around 11 am. It reoccured this morning (and has not gone away yet).

The error is:
Server: Msg 21, Level 18, State 1, Line 1
Warning: Fatal error 823 occurred at May 28 2008 10:25AM

I quick search on Google and some testing suggest that it is server load/network related. I would appreciate any pointers on how to resolve it.
 

cyberia

Platinum Member
Oct 22, 1999
2,535
0
0
I didn't want to crowd my origianl post, so here is some additional info:

The SQL query in question is *really* complex. It combines data from three databases on two SQL servers (via OPENROWSET). Let's say, in simple terms, it looks like this:

select * from table1
union
select * from OPENROWSET('SQLOLEDB', '....server2....', 'select * from table2')

The OPENROWSET call returns 27,200+ records. If I limit the results of that call to 27000 rows ('select top 27000...'), everything works. If I try to do 27,001 records, it becomes intermittent, and at 27,200 records, it fails every time. That is until the problem goes away completely, like it did yesterday.

Weird?
 

cyberia

Platinum Member
Oct 22, 1999
2,535
0
0
Originally posted by: KLin
Something to try

Looks like the guy did an index rebuild to fix the problem he was having.

Well, just like yesterday: The application is back up after 11 am. I can't really try reindexing because I won't know if it fixes anything. I have to wait for it to go down again to see if a particular fix does anything.

But thank you for the suggestion anyway. Hopefully, it won't go down again. We are doing a reboot tonight to see if that helps anything. Reindexing may be worth doing anyway as well.
 

imported_Dhaval00

Senior member
Jul 23, 2004
573
0
0
We have had issues with the tempdb going corrupt on us in the past on huge tables (500-600 million rows). Yours is minuscule in comparison. Nonetheless, it wouldn't hurt to recreate tempdb (Google it). But try KLin's suggestion first... it should take a few seconds to rebuild the indexes after all.

What kind of data are you dealing with? Are you sure the data is not corrupt? Does your production system have enough memory? What about the network itself? If it's a network put in place by your IT guys, ask them to diagnose it... maybe packets are being dropped on the OPENROWSET query if its a remote DB (I won't be surprised if this is the cause). Also, just to try and reproduce this issue, I would suggest run the query, and unplug the network cable (you can do this on your development machine) while the OPENROWSET call is being made - see if you get the same error.
 

cyberia

Platinum Member
Oct 22, 1999
2,535
0
0
The data is of simple alphanumeric types. There is no image fields or any other binary kind of stuff (except perhaps for timestamps).

Our network guys are swearing it can't be a network issue. They monitored the server when I tried to run the query, and there was no spike in CPU or memory usage.

In fact, the error is returned right away, so it does not seem like it even tries to go to the remote server to get data. It "feels" that the query optimizer just gives up. But then how would it know that it shouldn't give up on a remote query that would return 27000 records as opposed to 27001 records without actually attempting the OPENROWSET call?

Oh well, I can only guess now because we are back up.
 

imported_Dhaval00

Senior member
Jul 23, 2004
573
0
0
Did you try the "unplug network" cable idea? Most IT guys swear it's not an issue, when it very well could be. Are you able to reproduce it every time with >27000 rows? Even then, it could be that the network is reaching a threshold (our network guys mess around with Cisco routers all the time and their built in TCP algorithms) at 27000 rows, and packets start dropping. Just a theory.
 

imported_Dhaval00

Senior member
Jul 23, 2004
573
0
0
I just spoke to my DBA and he mentioned that network is probably not the issue. The error is a low level error and in *most* cases can be attributed to a bad HD or bad RAM. He suggested hold on to your backups and check for any consistency errors now using CHECKDB and its various switches. Hope this helps.
 

cyberia

Platinum Member
Oct 22, 1999
2,535
0
0
Originally posted by: Dhaval00
I just spoke to my DBA and he mentioned that network is probably not the issue. The error is a low level error and in *most* cases can be attributed to a bad HD or bad RAM. He suggested hold on to your backups and check for any consistency errors now using CHECKDB and its various switches. Hope this helps.

This definitely helps. I just learned that the remote DB server (to which OPENROWSET calls are made) had a hardware problem and a hard drive was just replaced. The server is complaining of a corrupt .exe file, so it may very well have other corrupted files including databases.
 

cyberia

Platinum Member
Oct 22, 1999
2,535
0
0
Yep, we have a corrupted database. Luckily, we have backups, but it ain't gonna be pretty. :(

Thank you everyone for the help and the extremely valuable suggestions.