Threading question

Armitage

Banned
Feb 23, 2001
8,086
0
0
Ok, so Valgrind/Helgrind is having issues with my program. Says I have an unreasonable number data race conditions.

My code is something like this.

vector<CatObj> catobj;

Each CatObj contains

vector<CatObj*> catobj_ptrvec[3];

So every object in catobj contains a native static array of 3 vectors of pointers to CatObj. These catobj_ptrvec will be populated with pointers to objects in the catobj vector.

Head hurt yet?

So here's where the threading bit & the claimed data-race comes in. Each of the three catobj_ptrvec is populated by the same algorithm - so I kick this algorithm off in three separate threads, each thread takes a different index (0, 1, 2) and works on a different element of catobj_ptrvec

So you could have this function, in separate threads, simultaneously doing this:

Thread 1 - catobj[100].catobj_ptrvec[0].push_back(&(catobj[200]));
Thread 2 - catobj[100].catobj_ptrvec[1].push_back(&(catobj[200]));
Thread 3 - catobj[100].catobj_ptrvec[2].push_back(&(catobj[200]));

As far as I can tell, this should be OK. While all three threads are writing to the same catobj_ptrvec, they are all writing to different elements of this native vector, so it should be kosher. Right? Apparently Valgrind doesn't think so.

If anybody has followed this, any insight is appreciated?!
 

Barnaby W. Füi

Elite Member
Aug 14, 2001
12,343
0
0
I don't see the problem. The push_backs are happening on different vectors so the threads can't step on each other that way, and the only "common" thing they all do is use vector's operator[]. Maybe it can't tell that you are only using the result of the common operator[] call for reading? It seems likely that an automated tool could get easily confused by threading issues -- lots of assumptions and whatnot are implicit things that can be obvious to a programmer but may be hard for a machine to figure out.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: BingBongWongFooey
I don't see the problem. The push_backs are happening on different vectors so the threads can't step on each other that way, and the only "common" thing they all do is use vector's operator[]. Maybe it can't tell that you are only using the result of the common operator[] call for reading? It seems likely that an automated tool could get easily confused by threading issues -- lots of assumptions and whatnot are implicit things that can be obvious to a programmer but may be hard for a machine to figure out.

But the tool (valgrind) is actually looking for memory addresses that are accessed by different threads and not wrapped in a mutex. So apparently there is a chunk of memory here that multiple threads are touching. I just can't figure out what it is. Add to that, I do occasionally get segfaults that trace back to that location. I've torn this thing inside out, and I'm not seeing it. I think I'm going to wrap a mutex around it anyway and see what happens.

That's an interesting point though - about the operator[] Or it could be push_back() Is the GNU implementation of the STL guarenteed to be threadsafe?? If it's not I'm fvcked.
 

oog

Golden Member
Feb 14, 2002
1,721
0
0
i don't see a threading issue in your code. the catobj_ptrvec[0], [1], and [2] accesses should each be to different blocks of memory and should have nothing to do with each other. put it this way. if you extracted each pointer into a separate variable and passed each variable as the input to the three threads you start, you would end up with pretty much the same thing. the only difference is the [] access on the vector. no idea if that is threadsafe or not, but i would be surprised if it isn't.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Ok, here's what I found on thread safety & the GNU STL (first posted on Ars, which does have a dedicated programming forum, and some smart folks).

libstdc++ FAQ

libstdc++-v3 strives to be thread-safe when all of the following conditions are met:

* The system's libc is itself thread-safe,
* gcc -v reports a thread model other than 'single',
* [pre-3.3 only] a non-generic implementation of atomicity.h exists for the architecture in question.

The user-code must guard against concurrent method calls which may access any particular library object's state. ...

A bit further down... (emphasis is mine)

All library objects are safe to use in a multithreaded program as long as each thread carefully locks out access by any other thread while it uses any object visible to another thread, i.e., treat library objects like any other shared resource. In general, this requirement includes both read and write access to objects; unless otherwise documented as safe, do not assume that two threads may access a shared standard library object at the same time.

The bit about read access is a bit disturbing. I had been assuming that you would only need to lock access if some sort of write access was in the mix. ie., I expected that I could have two threads simultaneously reading an object without locks.

More reading...
libstdc++ Howto
This references the SGI definition of thread safety for the STL. IIRC, The GNU STL is based heavily on the SGI implementation.

The SGI implementation of STL is thread-safe only in the sense that simultaneous accesses to distinct containers are safe, and simultaneous read accesses to to shared containers are safe. If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.

This is what I expected, but it seems that it may conflict with the statement in the FAQ about read access. So I guess I'm still a bit up in the air.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Arghh
Wrapped a mutex around that push_back and valgrind is still bitching about it and the code is still crashing :(:|
I'm gonna get a pencil and paper and just work it out by hand shortly - see you in a few decades.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: oog
how about wrapping the mutex around the [] on the vector?

I'm not following you? I put the mutex around the line that looks like this:

catobj[100].catobj_ptrvec[0].push_back(&(catobj[200]));

On another note, my faith in valgrind is slipping - it's also flagging as possible write races comparisons between two iterators.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Well, after a few weeks of pulling my hair out, I may have finally gotten past this bug - another day or so of testing should tell for sure. That's the good news. The bad news is that I had to scrap my use of the STL for this particular chunk of code and write my own container. As far as I can tell there is a thread-safety bug in the STL, down deep in the memory management stuff. Kind of puts a notch in my confidence in the STL.

I doubt it's worth filing a bug, as I can't provide a reasonably concise test case to demonstrate it.

On the plus side, I had a nice review of the STL, pthread programming & debugging distributed code. And nearly every line of my code has been re-scrutinized - I found a few other minor issues in the process.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
In other news, replaing that chunk of STL code with my own container contributed to about a 20% improvement in runtime (~15-16 minutes -> ~12 minutes!) once I did some work on it with the profiler! Maybe I should gut out some more STL :p