Almost there (I hope)

RedArmy

Platinum Member
Mar 1, 2005
2,648
0
0
If you look here: http://pastebin.com/mfbb867f I took snippets of my code out that would be relevant to my question. The vector multans is in shared memory, I have all of that defined elsewhere and I didn't want to take up more space than necessary.

I commented where I have questions and since this is the very first time I've EVER used the fork function I have no idea if the way I'm doing it is acceptable.

Basically I have two matrices holding values and I need to multiply both the matrices together to get a bunch of numbers. Then, I sum those numbers together (in the right order) to get the resulting output matrix.

In this program 'tsets' is the amount of multiplies I need to do depending on the sizes of the matrices (it's defined elsewhere and changes automatically depending on matrix size).

The multiplier and multiplicand variables are meant to be set (somehow, this is one of my questions) before the fork so that each process gets it's own set of numbers to multiply together.

The index variable is supposed to be used to keep track and store the resulting multiplied answers back in the multans vector.

If something else isn't clear let me know. It's tough for me to talk about this stuff since I'm not too comfortable with it yet. Thanks for all the help :cookie:
 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
You're not doing what you think you are doing...

for (i=0; i<(tsets-1);i++)
index++;

pid_t i=fork();

you are not forking forking tsets -1 times, instead you are incrementing index (tsets-1) times and THEN forking. You need a curly bracket.

Also, since this is shared memory, you need some sort of mutex to lock the memory before a read/write. If not, you will see some pretty bizzare corruption.

I recommend you abandon fork and go with pthreads. They use less memory, run faster, and provide mutexs with the library. There really isn't much reason to use fork over a separate thread.
 

Kirby

Lifer
Apr 10, 2006
12,028
2
0
I agree with Cogman. The only time I've ever used fork was in conjunction with exec. Threads share resources, whereas a new process would need to allocate more, so threads can save a lot of space. Mutexes and locks can be a pain in the ass though, because if you're not careful you can get deadlocks, data races and such.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Adding to Cogman's and nkgreen's comments:
fork() by itself creates a new process. The parent process and the new child process do not share memory (though that can be set up with other system calls).

And now for some concrete help: Look at pthread_create() (e.g., man pthread_create). Also, for mutexes, pthread_mutex_lock(), pthread_mutex_unlock(), pthread_mutex_init(). For thread destruction, pthread_join().
 

RedArmy

Platinum Member
Mar 1, 2005
2,648
0
0
Unfortunately, I can't use pthreads yet, I have to stick with the fork() method, threads comes later on for us. We also aren't supposed to be using semaphores or mutex yet. Basically my only question is more of a math/logic one I suppose.

I have variables that tell me column length and row height for both matrices (Column1, Row1, Column2, Row2). I have a vector in shared memory where I'm storing the answers to my multiplications I get from the individual processes.

Therefore, I'm wondering the best way to traverses through my two matrices (which could be of any size) and be able to send a single multiply to each process until I multiplied every number between the two matrices and stored the results in that shared memory vector area. It was suggested that we use 3 for loops (see example) but I'm not sure if there's an easier/ more intuitive way to do it.

for(b=0; b<Column2;b++){
What goes here to set my multiplicand and multiplier?
-----------for(ca=0; ca<Column1;ca++){
Same here
--------------------- for(ra=0;ra<Row1;ra++){
and here
--------------}
-------}
}
I didn't post this but this is where I set up the shared memory:

void sharing(char input2, int answers){
key_t key;
key = ftok(input2, 'A');
memid = shmget(key,1024,IPC_CREAT|IPC_PRIVATE|SHM_PERM); //handle
base = (myshared *)shmat(memid,0,0);

I took out the error checking here to make it easier to read. 'input2' is just the name of the file I'm reading the matrices from and 'answers' is just (Row1*Column2)


Edit: Also, I didn't add in the brackets since I'm not sure if I should be enclosing my whole area up until I print the answers or if I just do it up until the fork() area. The example I have doesn't have any brackets so I wasn't 100% sure.

I'm also not 100% sure if the contents of my if (i==0){....} function is right for storing multiply answers in the shared memory vector is right.

If this seems convoluted as hell, it's because it is. Apparently when we switch to threads it gets easier.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: RedArmy
The vector multans is in shared memory, I have all of that defined elsewhere and I didn't want to take up more space than necessary.

Sorry -- I missed this part before.

I'm going to make some responses to your code now. I assume you implemented Cogman's suggestion about curly braces.

Apologies in advance if my presentation of the material is below your level. There's something to be said for generality in forums.

Also, I feel for you with the System V shared memory stuff ... there be dragons -- I know them well.

Here's a sketch of a mulit-process shared memory matrix multiply.

- Fork N processes.
- Each process does 1/Nth of the multiply
- waitpid() N processes

Now, the only thing that differentiates one process from another is the region of the output matrix that the 'current' process happens to be working on. Notice that this implementation uses N+1 total processes, including the master process.

Each worker process should have a workID, from 0 to N-1. Here's how to set that up:

int myWorkID = -1;
for(int i=0; i<N; ++i) {
..int pid = fork();
..if(pid == 0) {
..../* This is a child process */
....myWorkID = i;
....break;
..} else {
..../* This is the parent process */
....continue;
..}
}

Now, divide the output matrix into N slices (I prefer to divide the number of rows by N -- there are arguments to use columns instead). Each child process will be responsible for output rows on the inclusive range bounded by myWorkID*Nrows/N and (myWorkID+1)*Nrows/N. I have assumed Nrows divides evenly by N.

In other words:
if(myWorkID != -1) {
..int lower_bound = myWorkID*Nrows/N;
..int upper_bound = (myWorkID+1)*Nrows/N;
..for( int row = lower_bound; row < upper_bound; ++row ) {
..../* Iterate over all possible output columns */
..../* at each row/column pair, find the matrix product for that element */
..}
} else {
../* Parent process doesn't do anything... except waitpid() */
..for(int i=0;i<N;++i) {
..../* Use your favorite flavor of waitpid() here */
....waitpid(-1, NULL, 0);
..}
}

A word on using vector<int>:
You might have trouble getting a vector<int> to properly use a system V shared segment of memory -- just because the struct holding the vector resides in the segment does not mean the vector's data is in the shared segment. I recommend you get it working first with plain old volatile int** first.
 

RedArmy

Platinum Member
Mar 1, 2005
2,648
0
0
Thanks for the thorough reply degibson. I completely follow everything you said except for the if(myWorkID != -1) { stuff } part. Basically, my value for 'tsets' is every single multiply between each element in the two matrices. For example, if I had two 3x3 matrices, I'd have 27 (this value would be my tsets value) total multiplies. Each one of those multiplies would be passed off to a different child and then the product of those two numbers would be computed and then thrown into the multans vector in shared memory (Stored sequentially so I can compute the final output matrix later on by summing the results of those products).

That's why I'm wondering if I need to throw an index variable in there that updates after each child process and advances to the next location in the vector (at the end of my child process in the code I wrote I said multans[index] = output where output is the result of the multiply and index was the current position in the multans vector (index gets updated at the beginning before each fork().

I'm not sure if you're code is just a different way of doing the same thing, but I guess I'm just having a little bit of trouble interpreting it since you say we'd be "dividing the number of rows by N", however in this case 'N' would be my tsets (maybe?) and I'd always end up with fractions. Also, each time I reach the end of a child process, do I need to use _exit? We were told that we might have to but I'm not sure in this case if I have to.
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: RedArmy
...Each one of those multiplies would be passed off to a different child and then the product of those two numbers would be computed and then thrown into the multans vector in shared memory

I'm having a hard time understanding what it is you want -- that is, I cannot imagine why you would want to organize a computation in this way. Assigning multiply operations individually to subprocesses will have enormous overhead. Is this a stated requirement?

I completely follow everything you said except for the if(myWorkID != -1) { stuff } part.

The idea there is to start a small number of concurrent subprocesses (usually one process per CPU), and farm out the computation evenly among all the subprocesses.

Stored sequentially so I can compute the final output matrix later on by summing the results of those products.

Again, I'm not sure why this is necessary (another requirement?). Summing is commutative -- you can do it in any order and not change the result, so there is no need to restrict storing of the output.
 

dinkumthinkum

Senior member
Jul 3, 2008
203
0
0
It sounds like you want to do fine-grained parallelism with fork(). That is a really bad idea, because creating a new process is a fairly heavy-weight operation and you will be creating O(n^3) processes. I would not do this with pthreads either. Neither are suitable for lightweight, fine-grained parallelism. The number of cycles to create a thread or process is going to far out-weigh a multiplication. Now, if you were using a data-parallel language then this would not be a problem. But since you are not, you want to make sure that you are doing enough work in each thread to justify its creation.


 

Cogman

Lifer
Sep 19, 2000
10,286
145
106
Originally posted by: dinkumthinkum
It sounds like you want to do fine-grained parallelism with fork(). That is a really bad idea, because creating a new process is a fairly heavy-weight operation and you will be creating O(n^3) processes. I would not do this with pthreads either. Neither are suitable for lightweight, fine-grained parallelism. The number of cycles to create a thread or process is going to far out-weigh a multiplication. Now, if you were using a data-parallel language then this would not be a problem. But since you are not, you want to make sure that you are doing enough work in each thread to justify its creation.

He is doing this for an assignment. I agree with your general assessment though, its just that this is for some arbitrary assignment. (fork should never be used, ever. I don't know any functionality that it gives that pthreads doesn't with a much lighter overhead)
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Originally posted by: dinkumthinkum
It sounds like you want to do fine-grained parallelism with fork(). ...

Thats the thing that confuses me -- every which way I try to picture the description, there ends up being no parallelism at all. Here's the part that confuses me:

Originally posted by: RedArmy
(Stored sequentially so I can compute the final output matrix later on by summing the results of those products).

Anyway, back to the topic.

RedArmy: If you want to do every multiplication as a subprocess, thats possible. Instead of dividing matrix rows, just make one process for each output element (i.e., for each row/col pair).

Something like this:
for(int row=0; row<Nrows; ++row) {
..for(int col=0; col<Ncols; ++col ) {
....int pid = fork();
....if( pid == 0 ) {
......output[row][col] = doMuliplication();
......exit(0);
....} else {
......waitpid(pid,NULL,0);
..}
}

After computing, you should force each child process to exit... with _exit if you want -- exit(0) is better*.

* exit(0) will have more overhead, but we're talking about processes, and its going to be horribly slow regardless.

Originally posted by: Cogman
.... (fork should never be used, ever. I don't know any functionality that it gives that pthreads doesn't with a much lighter overhead)

fork() should almost never be used. Probably never for data parallelism.

fork() works fairly well for latency-tolerant I/O -- basically a poll()/fork() loop. There are also handy failure isolation reasons to use fork(). For instance, I have a piece of legacy code, pre-built, uses an archaic ABI, that occasionally trapes all over its memory space. Thankfully, I've set it up as a child process, controllable via a system V shared segment. So, occasionally the segment is hosed, but not the entire computation (safe in the parent process).
 
Sep 29, 2004
18,656
68
91
I know of PThreads but never used them. If anyone has some good online sources of info, please PM them to me.
 

adamant_387

Junior Member
Mar 17, 2013
1
0
0
RedArmy ...I am trying to make similiar program . Could you please help me with this ?
 
Last edited: