Non-threadsafe code in threaded programs?

Armitage

Banned
Feb 23, 2001
8,086
0
0
Currently I have a chunk of code in one of my apps that isn't thread-safe, so I have mutex locks around all the calls to it. I suspect that this is a significant bottle neck, but before I try to rewrite or replace it (it's not my code) I'd like to make sure.

Is there any reason I can't make several copies of this function, each with a different name (ie. func() -> func_1(), func_2(), func_3()...). I could then pass the threads a paramter when they are initialized that tells them which version to use (and make sure each version is only used by a single thread).

It'd bloat the code a bit, but it should give me a feel for how much that lock contention is costing me, right?
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: znaps
Sounds like a bad way of doing it IMO. Which language are you using?

Why? Is there another way to measure the impact of the lock contention? I know it's kinda klugey, but it's simple and should give me some indication if this is really a problem.
Language is C++
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Duplicating the code wouldn't fix anything. Making sure they don't modify the same data would.

Edit: or so I think. I'm not a threading expert by any means.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: kamper
Duplicating the code wouldn't fix anything. Making sure they don't modify the same data would.

Duplicating the code would make sure they aren't modifying the same data. The issue is that this is actually a collection of functions, and all the data is passed between them via globals. It's a port of some crufty old fortran replete with common blocks :disgust: So I figure if I put each copy in its own file, the globals will just have file scope, so each copy should be independent of the others.

Rewriting & validating the code to be threadsafe is a fairly substantial undertaking ... I'd like to get a feel for whether it's worth it first.

Edit: or so I think. I'm not a threading expert by any means.

 

znaps

Senior member
Jan 15, 2004
414
0
0
Ok, as a way of finding out if it's a bottleneck, it might be OK if you're only testing with 5 or 10 threads. I'm more of a Java guy so I can't help with specifics on other ways to do it, but I'm sure there are ways where you could test with 100's of threads and not have to write new code to deal with it.

As a solution to the issue, it's not OK, but I think you know that anyway. Sorry I can't be of more help.
 

kamper

Diamond Member
Mar 18, 2003
5,513
0
0
Ok, then I agree, that's a nasty fix but it should tell you what you want to know.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: znaps
Ok, as a way of finding out if it's a bottleneck, it might be OK if you're only testing with 5 or 10 threads. I'm more of a Java guy so I can't help with specifics on other ways to do it, but I'm sure there are ways where you could test with 100's of threads and not have to write new code to deal with it.

As a solution to the issue, it's not OK, but I think you know that anyway. Sorry I can't be of more help.

This doesn't have to scale to hundreds of threads ... its very computationally intensive, so scaling much beyond the number of CPUs isn't going to get you anywhere. My bottom line issue is that I can't seem to get the system load above about 70% regardless of how many threads I start ... it levels off around 6 threads for a dual xeon with HT on, and the threads do no significant IO.

So my suspicion is lock contention. There are some locks on some message buffers also, but my prime suspect is this chunk of code.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
I'm just curious - why can't one thread take complete advantage of the CPU, if it's very computationally-expensive? Do you spend a lot of time waiting on IO?
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: kamper
Originally posted by: CTho9305
I'm just curious - why can't one thread take complete advantage of the CPU, if it's very computationally-expensive? Do you spend a lot of time waiting on IO?
dual xeon with HT on

Yep, exactly ... on a single CPU box there would be little reason to thread it ... a single thread would peg it. Although I would likely still thread the IO because it dumps the output data to a MySQL database (or back to the master node via PVM for the clustered version).

Actually, on the clustered version, I considered going single-threaded and starting multiple processes per node. But I already had the multithreaded version running, and want to maintain it, as it may get on opportunity to run on a bigger SGI as well. Helps on the initial large IO to only send it to one slave process/node instead of several.

I'm almost to the point where the flip of a macro in the malkefile is the difference between the single system version and the distributed version :p
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.
 

tart666

Golden Member
May 18, 2002
1,289
0
0
Originally posted by: Armitage
Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

Ref

Some other references I've seen say critsec takes about 20 cycles to switch, while mutex takes about 600. Of course this would make little difference if your method takes 10k+
 

EagleKeeper

Discussion Club Moderator<br>Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Determine all the data that the function uses.

Redesign the function to take a structure/class as a parameter.
Have all the data to be operated on contained within that structure/class.
Have the caller of the function that is thread-safe control the structure/class.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: tart666
Originally posted by: Armitage
Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

Ref

Some other references I've seen say critsec takes about 20 cycles to switch, while mutex takes about 600. Of course this would make little difference if your method takes 10k+

From the link:
Critical SectionThe critical section method is similar to the mutex method, differing in its use of the critical section Windows APIs.

Is this a windows only thing? I did a bit more googling and it seems like it's part of the Windows API.
 

EagleKeeper

Discussion Club Moderator<br>Elite Member
Staff member
Oct 30, 2000
42,589
5
0
Originally posted by: Armitage
Originally posted by: tart666
Originally posted by: Armitage
Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

Ref

Some other references I've seen say critsec takes about 20 cycles to switch, while mutex takes about 600. Of course this would make little difference if your method takes 10k+

From the link:
Critical SectionThe critical section method is similar to the mutex method, differing in its use of the critical section Windows APIs.

Is this a windows only thing? I did a bit more googling and it seems like it's part of the Windows API.

Correct - other OS's may have similar protection schemes.
My posted solution is non OS specific
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: EagleKeeper
Originally posted by: Armitage
Originally posted by: tart666
Originally posted by: Armitage
Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

Ref

Some other references I've seen say critsec takes about 20 cycles to switch, while mutex takes about 600. Of course this would make little difference if your method takes 10k+

From the link:
Critical SectionThe critical section method is similar to the mutex method, differing in its use of the critical section Windows APIs.

Is this a windows only thing? I did a bit more googling and it seems like it's part of the Windows API.

Correct - other OS's may have similar protection schemes.
My posted solution is non OS specific

It seems like this is just a lighterweight mutex, so I don't think it's really going to help me.

I'm not quite sure I understand what you posted previously, it's basically saying to make the code in question threadsafe?

The problem is that the externally facing function is backed by 10 other internal functions and - I kid you not - no less then 375 global variables shared between them :disgust: So I'm not eager to rewrite this code. That and I may not be able to get the source code for future versions of this library - but hopefully if I can show that its a problem I can get them to make some effort to clean this mess up.

 

EagleKeeper

Discussion Club Moderator<br>Elite Member
Staff member
Oct 30, 2000
42,589
5
0
I may have mis-understood you; I was thinking that you had safe code calling unsafe code.
If you have a threaded program, each thread has its own stack space but can share data within a process.
Each process will have it own seperate area completely unavailable (unless explicitly designed) from every one else.

You seem to be stating that an unsafe thread function must be using global variables.
If it is only one or two functions, then they can be corrected by using the method of passing a data structure for them to operate on. that data structure is controlled by a safe area.

If the problem is embedded throughout the design, then you will have to tear apart everything (not a wise thing to do) to rebuild it completely thread-safe or turn each thread into a seperate process.

Where the processes have to share information with each other, you will have to develop a sharing method.

This second method is a little more safe, either way will be a big pain for a large application
 

oog

Golden Member
Feb 14, 2002
1,721
0
0
improving the performance of getting and releasing a lock isn't going to help. if there is any significant issue here, it's probably that the lock is being held for a long time because the resources are needed for that period. duplicating the resources or providing a pool of them somehow is the right approach to see if you'll get improvement. further, when you look at making the code threadsafe, that doesn't just mean low-level locking -- it means providing a number of alternatives for the common contentious resource.
 

tinyabs

Member
Mar 8, 2003
158
0
0
>> ...375 global variables shared between them

This is a bad design. Does the processing using these 375 variables takes a long time? My suggestion is do whatever is necessay with these variables and release them.

[lock] -> [preprocess] ->[unlock] ->[postprocess]

You might investigate thread pool and connection pool if applicable.
 

Barnaby W. Füi

Elite Member
Aug 14, 2001
12,343
0
0
Can you check how many other threads are waiting for a mutex? Couldn't you print that out right before you unlock it?

Or, assuming you can't check a mutex to see how many threads are waiting on it -- and this might be stupid since I don't work with threads much -- but couldn't you, instead of:

acquire(foo);
... do stuff ...
release(foo);

do:

static volatile int waiting = 0;
++waiting;
acquire(foo);
--waiting;
... do stuff ...
cerr << "waiting: " << waiting << '\n';
release(foo);
 

tinyabs

Member
Mar 8, 2003
158
0
0
Actually, you can create a gate that allows 4 requests to be process at a time. During these 4 requests, they are processed sequentially and only locks between these 4 requests. I believe this will increase the throughput.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: tinyabs
>> ...375 global variables shared between them

This is a really bad design.

Edited for accuracy :) If I counted up the number of goto's it'd really make you cringe.

The code is a disaster, but until I get access to a better library, I'm stuck with it. Actually, I took a closer look at it just before I left for christmas. It seems that all of the variables in this library are global :disgust:, but only maybe 40-60 are actually used to communicate between functions in the library. When I get back I'm going to have a whack at cleaning it up. Validating the result will be a biatch though.

Does the processing using these 375 variables takes a long time? My suggestion is do whatever is necessay with these variables and release them.

Yes, it takes a significant amount of time ... when I profiled the code calls to this library were near the top of the list. I'm sorting out the call tree to see if I can move the locks further downstream as well as examining the whole global variable issue.

[lock] -> [preprocess] ->[unlock] ->[postprocess]

You might investigate thread pool and connection pool if applicable.

Already using a thread pool, and most of the IO is handled by seperate threads as well.
 

Armitage

Banned
Feb 23, 2001
8,086
0
0
Originally posted by: BingBongWongFooey
Can you check how many other threads are waiting for a mutex? Couldn't you print that out right before you unlock it?

Or, assuming you can't check a mutex to see how many threads are waiting on it -- and this might be stupid since I don't work with threads much -- but couldn't you, instead of:

acquire(foo);
... do stuff ...
release(foo);

do:

static volatile int waiting = 0;
++waiting;
acquire(foo);
--waiting;
... do stuff ...
cerr << "waiting: " << waiting << '\n';
release(foo);

Interesting idea ... I'll have a look at it when I get back. I was looking for some pthread library function to tell me that, but it might be as easy as what you show here.