Why is this crashing?

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
I am troubleshooting a simple polling application crash that seems to be random, but there's a few select areas that it crashes. One of them is this function here:


Code:
void ShardListEntry::Debug(string str)
{
	Logger::Log("{" + m_sname + "} : " + str);     //line 62 of ShardListEntry.cpp
}

Logger::Log is a static function, as follows:

Code:
void Logger::Log(string entry)
{
	loggermutex.Lock();
	
	rslib::DateTime tmpdate = rslib::DateTime::Now();	
	
	string filename="log/poller" + logfilenum + "-" + tmpdate.GetDate("l") + ".log";
	
	string logentry = "[" + tmpdate.GetDate() + "] " + entry + "\n";
	
	
	fstream fout(filename.c_str(),ios::app | ios::out);	
	fout<<logentry<<"\n";	
	fout.close();                               //line 23 of logger.cpp
	
	if(consoleoutput)cout<<logentry<<endl;
	
	loggermutex.UnLock();
}

The Mutex is static.


This is a multithreaded application. Is there any issues with the above function being called multiple times at once? I can't see why, but maybe I'm missing something? Note that the Logger::Log() function also gets called directly. It always crashes at the first function though. The crash is "Broken pipe".

Code:
Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 0x4f112950 (LWP 10582)]
0x0000003a7580db0b in write () from /lib64/libpthread.so.0
Current language:  auto; currently asm
Missing separate debuginfos, use: debuginfo-install keyutils.x86_64
(gdb) backtrace
#0  0x0000003a7580db0b in write () from /lib64/libpthread.so.0
#1  0x0000003c1d6c1ca6 in xwrite (__fd=<value optimized out>, __s=<value optimized out>, __n=<value optimized out>) at basic_file.cc:126
#2  0x0000003c1d66d3ac in std::basic_filebuf<char, std::char_traits<char> >::_M_convert_to_external (this=<value optimized out>, 
    __ibuf=<value optimized out>, __ilen=<value optimized out>)
    at /usr/src/debug/gcc-4.3.0-20080428/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/fstream.tcc:504
#3  0x0000003c1d66d4f6 in std::basic_filebuf<char, std::char_traits<char> >::overflow (this=<value optimized out>, __c=<value optimized out>)
    at /usr/src/debug/gcc-4.3.0-20080428/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/fstream.tcc:431
#4  0x0000003c1d66de96 in std::basic_filebuf<char, std::char_traits<char> >::_M_terminate_output (this=<value optimized out>)
    at /usr/src/debug/gcc-4.3.0-20080428/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/fstream.tcc:786
#5  0x0000003c1d66e1c8 in std::basic_filebuf<char, std::char_traits<char> >::close (this=<value optimized out>)
    at /usr/src/debug/gcc-4.3.0-20080428/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/fstream.tcc:159
#6  0x0000003c1d66f69d in std::basic_fstream<char, std::char_traits<char> >::close (this=<value optimized out>)
    at /usr/src/debug/gcc-4.3.0-20080428/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/fstream:775
#7  0x0000000000421a6b in Logger::Log (entry=
        {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x4f111b10 "\ufffd\ufffd\a\ufffd\ufffd\177"}}) at includes/logger.cpp:23
#8  0x00000000004221c3 in ShardListEntry::Debug (this=0x7fcab80240c0, str=
        {static npos = 18446744073709551615, _M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, _M_p = 0x4f111ed0 "X\ufffd\a\ufffd\ufffd\177"}}) at includes/ShardListEntry.cpp:62
#9  0x000000000042d743 in ShardListEntry::Poll (this=0x7fcab80240c0) at includes/ShardListEntry.cpp:165
#10 0x000000000042e345 in PollThread (data=0x0) at includes/PollThread.cpp:20
#11 0x0000003a7580729a in start_thread (arg=<value optimized out>) at pthread_create.c:297
#12 0x0000003a74ce439d in clone () from /lib64/libc.so.6
(gdb)

I can post the entire code if needed, but as a start, just wondering if there's anything I'm doing wrong in this particular function.
 
Last edited:

Crusty

Lifer
Sep 30, 2001
12,684
2
81
What kind of Mutex are you using and how many threads could be calling Log(..) at the same time?
 

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
The mutex is a pthread_mutex_t. The class I use is just a wrapper, takes care of the init and all that. This is the full class:

Code:
MUTEX::MUTEX()
	{
		#ifdef WIN32
			mutex=true;
		#elif linux
			pthread_mutex_init (&mutex, NULL);
		#endif
	}
	
	MUTEX::~MUTEX()
	{
		#ifdef linux
			pthread_mutex_destroy(&mutex);
		#endif	
	}
	
	void MUTEX::Lock(int delaybetweenlocks)
	{
		#ifdef WIN32
			while(!mutex)Wait(delaybetweenlocks);
			mutex=false;
		#elif linux
			while(pthread_mutex_lock( &mutex )!=0)Wait(delaybetweenlocks);
		#endif	
	}
	
	
	bool MUTEX::TryLock()
	{
		#ifdef WIN32
			if(mutex)
			{
				mutex=false;
				return true;
			}
			return false;
		#elif linux
			return(pthread_mutex_trylock( &mutex )==0);
		#endif	
		return false;
	}
	
	void MUTEX::UnLock()
	{
		#ifdef WIN32
			mutex=true;
		#elif linux
			pthread_mutex_unlock( &mutex );
		#endif	
	}


Oh, and I'm running in Linux, ignore the win32 code, I never got around to finding a working win32 implementation so I made a crude one that probably would not work. Does not apply here as it's running in Linux.
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
Well from a quick glance I'm not sure what you're trying to accomplish in your lock method.

pthread_mutex_lock is going to block until the calling thread can acquire the lock. If there's a non-zero return from pthread_mutex_lock you're usually looking at a logic error like one of the following
  • Unlocking a non-locked mutex
  • Unlocking a mutex locked by another thread
  • Using non-recursive locks and attempting to lock the same mutex twice in one thread

Once again I would highly recommend using Boost instead of pthreads directly. The API is far more usable IMO.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
SIGPIPE = you're writing to a file descriptor that has been closed.

This probably happens if you haven't opened the FD yet (initialization not complete by the time some thread tries to log) or you've shut down the FD (some thread trying to log while another is closing the log).
 

Crusty

Lifer
Sep 30, 2001
12,684
2
81
SIGPIPE = you're writing to a file descriptor that has been closed.

This probably happens if you haven't opened the FD yet (initialization not complete by the time some thread tries to log) or you've shut down the FD (some thread trying to log while another is closing the log).

The file is opened and closed all within the scope of Log(...) though....
 

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
It only does it randomly though. This function gets called probably 100 times per second by my program, it sometimes goes for days without failing, othertimes, it fails instantly, it's completely random. I also tried to catch it with a try/catch but it still crashes. I was going to just make it try again or something. Could it be the file system did not have a chance to fully open the file, should I add a delay between opening it and writing to it? Never had to do that before though...

Code:
	fstream fout(filename.c_str(),ios::app | ios::out);	  //open file
	fout<<logentry<<"\n";	  //write to file
	fout.close();                               //close file  (this is where it crashes)
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Sorry RS; I need to scan the right code before I comment.
Try guarding your close() call with fout.is_open(), and see if it ever fails.
Also, try checking if any of fout's flags are set by your write operation (generally a good thing to check your "return" values -- even those that aren't return values).
 

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
Not sure what you mean, should I be checking to ensure the file is open before closing it?

For the flags, I tried and the output was the same even when it crashed, so I guess it rules that out. (I forget the value now... 2048 I think it was...)

I decided I will try a different approach. Rather than opening/closing a file all the time for every single log entry, I will buffer them into a vector, then in the main program loop I will just flush the entries to file. This way the file i/o stays in a single thread. I've added a mutex to ensure the vector is protected. I'll see what happens.
 

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
Hmm so I think I see what you're saying, so basically even though I'm calling this:

Code:
fstream fout(filename.c_str(),ios::app | ios::out);

There's always the chance that for whatever reason, such as a file system issue, that it failed and the file was not actually opened?

So basically I should do this correct? :
Code:
fstream fout(filename.c_str(),ios::app | ios::out);		
		
		if(!fout.is_open())
		{
			cout<<"Logger::Flush() : Error opening log file for writing";
		}
		else
		{
			string logentry;
			for(int i=0;m_logbuffer.Get(logentry,i);i++)
			{		
				fout<<logentry<<"\n";	
			}
			
			m_logbuffer.Clear();
			
			fout.close();
		}
 
Last edited:

iCyborg

Golden Member
Aug 8, 2008
1,388
94
91
A quick search on SIGPIPE gets some sites recommending catching and ignoring SIGPIPE signals in multithreaded programming. Although it's more related to socket programming and remote end closing without notifying, it should be ok for local files.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
What if you never opened the file in the first place? You're never checking fout for any errors so it's entirely possible that the file was never opened in the first place.

http://www.cplusplus.com/reference/iostream/fstream/fstream/

This was the sort of thing I had in mind; somewhere between open and close you might be taking an error that's closing the file. Not sure if that's possible with C++ I/O; I generally avoid it because it's a pain to go scouting for I/O errors.
 

Markbnj

Elite Member <br>Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
This probably isn't relevant, but I'm having a hard time reconciling what I see in the stack trace.

On entry to ShardListEntry::Debug the parameter str points to the character string "X\ufffd\a\ufffd\ufffd\177".

You then execute: Logger::Log("{" + m_sname + "} : " + str);

Assuming that m_sname points to "foobar" what I would expect to see on entry to Logger::Log is that the parameter entry points to the character string "{foobar} : X\ufffd\a\ufffd\ufffd\177"

Instead it points to the character string "\ufffd\ufffd\a\ufffd\ufffd\177"

How'd that happen?
 

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
Yeah that definitely is odd. m_sname would be a few normal words, nothing special so not sure where that string comes from.

The app has been running since my last post, after I made those changes. It's probably also a more efficient way of doing it anyway. Basically all the threads just write the entries to a buffer, and that buffer is flushed to file in the main thread loop. That way not all threads are competing for the same file. Perhaps the file system was overwhelmed with all the open/close requests or something, and at times it failed. Just speculating though.

Interesting to know that sigpipe can be ignored though. That would have been my next thing to attempt. I tried to catch it with a try/catch block but it was still crashing, but maybe I was doing something wrong.
 

iCyborg

Golden Member
Aug 8, 2008
1,388
94
91
\ufffd is some sort of special replacement character. Are you sure you're setting that string to some normal words?
 

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
There's a possibility of maybe quotes and some special chars, but definitely the majority of those are normal words, or should be. I managed to fix most of the crash issues though.

There is one issue with libpng but I have a feeling it has to do with the wrapper library (pngwriter which has not been updated in ages) so I will probably try to find another or write my own based on libpng.
 

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
ok so my fix did not work, it was just a coincidence that it ran for 2 days without crashing.

It's still doing it. It just does not like writing to file for some reason. How do I go about ignoring that sigpipe crash? The verdict does indeed seem to indicate I should just ignore that in multithreaded apps. I tried a try/catch but it still crashes.
 

Markbnj

Elite Member <br>Moderator Emeritus
Moderator
Sep 16, 2005
15,682
14
81
www.markbetz.net
If you're eating the exception and it's still crashing, then something else is going on. The exception you're catching could be a side effect.
 

Red Squirrel

No Lifer
May 24, 2003
71,310
14,083
126
www.anyf.ca
Am I doing something wrong? This is what I tried:

Code:
void Logger::Flush()
{	
	loggermutex.Lock();
	
	
	if(m_logbuffer.GetSize()>0)
	{	
		rslib::DateTime tmpdate = rslib::DateTime::Now();	
	
		string filename="log/poller" + logfilenum + "-" + tmpdate.GetDate("l") + ".log";
				
		
		fstream fout(filename.c_str(),ios::app | ios::out);		
		
		if(!fout.is_open())
		{
			cout<<"Logger::Flush() : Error opening log file for writing";
		}
		else
		{
			try
			{
				string logentry;
				for(int i=0;m_logbuffer.Get(logentry,i);i++)
				{		
					fout<<logentry<<"\n";	//sometimes crashes here
				}
				
				m_logbuffer.Clear();
				
				fout.close();  //crashes here most of the time
			}
			catch(Exception e)
			{
			}
		}
	}
	
	
	loggermutex.UnLock();
}
 

iCyborg

Golden Member
Aug 8, 2008
1,388
94
91
That 0xUFFFD is a replacement character that manipulates incoming characters and the stream from within the string, I think you should find out how and why is it there.

You could also try flush()-ing the stream and then clear() (fout.clear(), don't know what your m_logbuffer.Clear() does), so that close() won't need to do any writing. And/or inspect the fail bits after each operation.