Non-threadsafe code in threaded programs?

Armitage · Dec 21, 2004

Currently I have a chunk of code in one of my apps that isn't thread-safe, so I have mutex locks around all the calls to it. I suspect that this is a significant bottle neck, but before I try to rewrite or replace it (it's not my code) I'd like to make sure.

Is there any reason I can't make several copies of this function, each with a different name (ie. func() -> func_1(), func_2(), func_3()...). I could then pass the threads a paramter when they are initialized that tells them which version to use (and make sure each version is only used by a single thread).

It'd bloat the code a bit, but it should give me a feel for how much that lock contention is costing me, right?

znaps · Dec 21, 2004

Sounds like a bad way of doing it IMO. Which language are you using?

Armitage · Dec 21, 2004

Originally posted by: znaps
Sounds like a bad way of doing it IMO. Which language are you using?

Why? Is there another way to measure the impact of the lock contention? I know it's kinda klugey, but it's simple and should give me some indication if this is really a problem.
Language is C++

kamper · Dec 21, 2004

Duplicating the code wouldn't fix anything. Making sure they don't modify the same data would.

Edit: or so I think. I'm not a threading expert by any means.

Armitage · Dec 21, 2004

Originally posted by: kamper
Duplicating the code wouldn't fix anything. Making sure they don't modify the same data would.

Duplicating the code would make sure they aren't modifying the same data. The issue is that this is actually a collection of functions, and all the data is passed between them via globals. It's a port of some crufty old fortran replete with common blocks :disgust: So I figure if I put each copy in its own file, the globals will just have file scope, so each copy should be independent of the others.

Rewriting & validating the code to be threadsafe is a fairly substantial undertaking ... I'd like to get a feel for whether it's worth it first.

Edit: or so I think. I'm not a threading expert by any means.

znaps · Dec 21, 2004

Ok, as a way of finding out if it's a bottleneck, it might be OK if you're only testing with 5 or 10 threads. I'm more of a Java guy so I can't help with specifics on other ways to do it, but I'm sure there are ways where you could test with 100's of threads and not have to write new code to deal with it.

As a solution to the issue, it's not OK, but I think you know that anyway. Sorry I can't be of more help.

kamper · Dec 21, 2004

Ok, then I agree, that's a nasty fix but it should tell you what you want to know.

Armitage · Dec 21, 2004

Originally posted by: znaps
Ok, as a way of finding out if it's a bottleneck, it might be OK if you're only testing with 5 or 10 threads. I'm more of a Java guy so I can't help with specifics on other ways to do it, but I'm sure there are ways where you could test with 100's of threads and not have to write new code to deal with it.

As a solution to the issue, it's not OK, but I think you know that anyway. Sorry I can't be of more help.

This doesn't have to scale to hundreds of threads ... its very computationally intensive, so scaling much beyond the number of CPUs isn't going to get you anywhere. My bottom line issue is that I can't seem to get the system load above about 70% regardless of how many threads I start ... it levels off around 6 threads for a dual xeon with HT on, and the threads do no significant IO.

So my suspicion is lock contention. There are some locks on some message buffers also, but my prime suspect is this chunk of code.

tart666 · Dec 21, 2004

have you tried a lower-overhead sync method, like critsec?

CTho9305 · Dec 21, 2004

I'm just curious - why can't one thread take complete advantage of the CPU, if it's very computationally-expensive? Do you spend a lot of time waiting on IO?

kamper · Dec 21, 2004

Originally posted by: CTho9305
I'm just curious - why can't one thread take complete advantage of the CPU, if it's very computationally-expensive? Do you spend a lot of time waiting on IO?

dual xeon with HT on

Armitage · Dec 21, 2004

Originally posted by: kamper

Originally posted by: CTho9305
I'm just curious - why can't one thread take complete advantage of the CPU, if it's very computationally-expensive? Do you spend a lot of time waiting on IO?

Click to expand...

dual xeon with HT on

Click to expand...

Yep, exactly ... on a single CPU box there would be little reason to thread it ... a single thread would peg it. Although I would likely still thread the IO because it dumps the output data to a MySQL database (or back to the master node via PVM for the clustered version).

Actually, on the clustered version, I considered going single-threaded and starting multiple processes per node. But I already had the multithreaded version running, and want to maintain it, as it may get on opportunity to run on a bigger SGI as well. Helps on the initial large IO to only send it to one slave process/node instead of several.

I'm almost to the point where the flip of a macro in the malkefile is the difference between the single system version and the distributed version

Armitage · Dec 21, 2004

Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

tart666 · Dec 22, 2004

Originally posted by: Armitage

Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Click to expand...

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

Ref

Some other references I've seen say critsec takes about 20 cycles to switch, while mutex takes about 600. Of course this would make little difference if your method takes 10k+

EagleKeeper · Dec 22, 2004

Determine all the data that the function uses.

Redesign the function to take a structure/class as a parameter.
Have all the data to be operated on contained within that structure/class.
Have the caller of the function that is thread-safe control the structure/class.

Armitage · Dec 22, 2004

Originally posted by: tart666

Originally posted by: Armitage

Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Click to expand...

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

Click to expand...

Ref

Some other references I've seen say critsec takes about 20 cycles to switch, while mutex takes about 600. Of course this would make little difference if your method takes 10k+

From the link:

Critical SectionThe critical section method is similar to the mutex method, differing in its use of the critical section Windows APIs.

Is this a windows only thing? I did a bit more googling and it seems like it's part of the Windows API.

EagleKeeper · Dec 22, 2004

Originally posted by: Armitage

Originally posted by: tart666

Originally posted by: Armitage

Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Click to expand...

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

Click to expand...

Ref

Some other references I've seen say critsec takes about 20 cycles to switch, while mutex takes about 600. Of course this would make little difference if your method takes 10k+

Click to expand...

From the link:

Critical SectionThe critical section method is similar to the mutex method, differing in its use of the critical section Windows APIs.

Click to expand...

Is this a windows only thing? I did a bit more googling and it seems like it's part of the Windows API.

Correct - other OS's may have similar protection schemes.
My posted solution is non OS specific

Armitage · Dec 22, 2004

Originally posted by: EagleKeeper

Originally posted by: Armitage

Originally posted by: tart666

Originally posted by: Armitage

Originally posted by: tart666
have you tried a lower-overhead sync method, like critsec?

Click to expand...

Not familiar with that? Reference?
Although, I doubt that it's the the locking mechanism ... the method that the lock gets thrown around takes a bit of time.

Click to expand...

Ref

Some other references I've seen say critsec takes about 20 cycles to switch, while mutex takes about 600. Of course this would make little difference if your method takes 10k+

Click to expand...

From the link:

Critical SectionThe critical section method is similar to the mutex method, differing in its use of the critical section Windows APIs.

Click to expand...

Is this a windows only thing? I did a bit more googling and it seems like it's part of the Windows API.

Click to expand...

Correct - other OS's may have similar protection schemes.
My posted solution is non OS specific

It seems like this is just a lighterweight mutex, so I don't think it's really going to help me.

I'm not quite sure I understand what you posted previously, it's basically saying to make the code in question threadsafe?

The problem is that the externally facing function is backed by 10 other internal functions and - I kid you not - no less then 375 global variables shared between them :disgust: So I'm not eager to rewrite this code. That and I may not be able to get the source code for future versions of this library - but hopefully if I can show that its a problem I can get them to make some effort to clean this mess up.

EagleKeeper · Dec 22, 2004

I may have mis-understood you; I was thinking that you had safe code calling unsafe code.
If you have a threaded program, each thread has its own stack space but can share data within a process.
Each process will have it own seperate area completely unavailable (unless explicitly designed) from every one else.

You seem to be stating that an unsafe thread function must be using global variables.
If it is only one or two functions, then they can be corrected by using the method of passing a data structure for them to operate on. that data structure is controlled by a safe area.

If the problem is embedded throughout the design, then you will have to tear apart everything (not a wise thing to do) to rebuild it completely thread-safe or turn each thread into a seperate process.

Where the processes have to share information with each other, you will have to develop a sharing method.

This second method is a little more safe, either way will be a big pain for a large application

oog · Dec 22, 2004

improving the performance of getting and releasing a lock isn't going to help. if there is any significant issue here, it's probably that the lock is being held for a long time because the resources are needed for that period. duplicating the resources or providing a pool of them somehow is the right approach to see if you'll get improvement. further, when you look at making the code threadsafe, that doesn't just mean low-level locking -- it means providing a number of alternatives for the common contentious resource.

tinyabs · Dec 24, 2004

>> ...375 global variables shared between them

This is a bad design. Does the processing using these 375 variables takes a long time? My suggestion is do whatever is necessay with these variables and release them.

[lock] -> [preprocess] ->[unlock] ->[postprocess]

You might investigate thread pool and connection pool if applicable.

Barnaby W. Füi · Dec 24, 2004

Can you check how many other threads are waiting for a mutex? Couldn't you print that out right before you unlock it?

Or, assuming you can't check a mutex to see how many threads are waiting on it -- and this might be stupid since I don't work with threads much -- but couldn't you, instead of:

acquire(foo);
... do stuff ...
release(foo);

do:

static volatile int waiting = 0;
++waiting;
acquire(foo);
--waiting;
... do stuff ...
cerr << "waiting: " << waiting << '\n';
release(foo);

tinyabs · Dec 25, 2004

Actually, you can create a gate that allows 4 requests to be process at a time. During these 4 requests, they are processed sequentially and only locks between these 4 requests. I believe this will increase the throughput.

Armitage · Dec 28, 2004

Originally posted by: tinyabs
>> ...375 global variables shared between them

This is a really bad design.

Edited for accuracy

If I counted up the number of goto's it'd really make you cringe.

The code is a disaster, but until I get access to a better library, I'm stuck with it. Actually, I took a closer look at it just before I left for christmas. It seems that all of the variables in this library are global :disgust:, but only maybe 40-60 are actually used to communicate between functions in the library. When I get back I'm going to have a whack at cleaning it up. Validating the result will be a biatch though.

Does the processing using these 375 variables takes a long time? My suggestion is do whatever is necessay with these variables and release them.

Yes, it takes a significant amount of time ... when I profiled the code calls to this library were near the top of the list. I'm sorting out the call tree to see if I can move the locks further downstream as well as examining the whole global variable issue.

[lock] -> [preprocess] ->[unlock] ->[postprocess]

You might investigate thread pool and connection pool if applicable.

Already using a thread pool, and most of the IO is handled by seperate threads as well.

Armitage · Dec 28, 2004

Originally posted by: BingBongWongFooey
Can you check how many other threads are waiting for a mutex? Couldn't you print that out right before you unlock it?

Or, assuming you can't check a mutex to see how many threads are waiting on it -- and this might be stupid since I don't work with threads much -- but couldn't you, instead of:

acquire(foo);
... do stuff ...
release(foo);

do:

static volatile int waiting = 0;
++waiting;
acquire(foo);
--waiting;
... do stuff ...
cerr << "waiting: " << waiting << '\n';
release(foo);

Interesting idea ... I'll have a look at it when I get back. I was looking for some pthread library function to tell me that, but it might be as easy as what you show here.

Non-threadsafe code in threaded programs?

Banned

Senior member

Banned

Diamond Member

Banned

Senior member

Diamond Member

Banned

Golden Member

Elite Member

Diamond Member

Banned

Banned

Golden Member

Discussion Club Moderator<br>Elite Member

Banned

Discussion Club Moderator<br>Elite Member

Banned

Discussion Club Moderator<br>Elite Member

Golden Member

Member

Elite Member

Member

Banned

Banned