Problems with MPI (C++)
Hi folks
I have a rather puzzling problem with an MPI program that I'm developing. Essentially, the story is as follows:
I can send and receive messages successfully, but sometimes the execution comes to a halt. I do the following:
1) Process 0 reads some parameters from a file and sends them to the other processes.
I simply use MPI::COMM_WORLD.Send/Recv, and it works fine: all processes print that they received the parameters. As far as I understand, these calls should be blocking, i.e. every process should wait indefinitely for its data to arrive, right?
2) Process 0 reads a data set of 100x100x100x8 values and hands them out in chunks of 8. So process 0 reads entry [i,j,k] from the file and does one send operation for it. Each receiving process is set up to receive the number of chunks that corresponds to its part of the data, which is 50*50*50.
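To make the bookkeeping in step 2 concrete: splitting 100x100x100 points into 50x50x50 pieces gives 8 blocks, which suggests (the post does not say so explicitly) a 2x2x2 decomposition with one block per receiving process. The sketch below assumes that layout. `owner_block` and `chunks_per_block` are hypothetical helper names, and how block numbers map to MPI ranks (for example block b to rank b, or to rank b + 1 if process 0 only reads) is left open.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper: index (0..7) of the block that owns grid point [i,j,k]
// in an N x N x N grid cut in half along each axis.
// ASSUMPTION: 2x2x2 block decomposition; the post does not specify the split.
inline int owner_block(int i, int j, int k, int N = 100) {
    assert(N % 2 == 0);
    assert(0 <= i && i < N && 0 <= j && j < N && 0 <= k && k < N);
    const int h = N / 2;                            // 50 for N = 100
    const int bi = i / h, bj = j / h, bk = k / h;   // each is 0 or 1
    return (bi * 2 + bj) * 2 + bk;                  // block index 0..7
}

// Count how many 8-value chunks each block receives when process 0 walks
// over every [i,j,k]; this is the loop count each receiver must use.
inline std::array<long, 8> chunks_per_block(int N = 100) {
    std::array<long, 8> counts{};                   // zero-initialised
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                ++counts[owner_block(i, j, k, N)];
    return counts;
}
```

Each entry of `chunks_per_block()` comes out as 50*50*50 = 125000, matching the receive count in step 2; over all eight blocks that is 1000000 separate 8-value sends issued by process 0.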
But this is not working properly. By printing [i,j,k] for every chunk of data, I observed the following:
It all starts fine, with data being sent to the processes. Then it becomes radically slower and often comes to a halt at some point. It never halts at the same point, and once it actually completed after stalling for some time. That one time, all the processes printed that they had received all their data and the test run exited normally. I don't understand this at all. Does anyone have an idea?
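One thing that stands out in step 2 is the message count: one blocking send per [i,j,k] means a million small messages from a single process. A frequently used alternative, offered here as an untested suggestion rather than a diagnosis of the stall, is to group the values by destination first and then transfer each process's share in one message (one MPI_Send per process, or a single MPI_Scatterv). The sketch below does only the packing step in plain C++, so it compiles without an MPI installation. It assumes the same hypothetical 2x2x2 decomposition, values stored as doubles, and an assumed layout `data[((i*N + j)*N + k)*8 + c]`; `block_of`, `Packed` and `pack_by_block` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kValuesPerPoint = 8;  // 8 values per [i,j,k], as in the post
constexpr int kBlocks = 8;          // ASSUMPTION: 2x2x2 decomposition

// Block (0..7) owning point [i,j,k]; assumes N is even.
inline int block_of(int i, int j, int k, int N) {
    const int h = N / 2;
    return ((i / h) * 2 + (j / h)) * 2 + (k / h);
}

struct Packed {
    std::vector<double> buf;  // all values, grouped by destination block
    std::vector<int> counts;  // number of doubles destined for each block
    std::vector<int> displs;  // offset of each block's first double in buf
};

// Reorder data so each block's values are contiguous, ready for one send
// per block or one MPI_Scatterv(buf, counts, displs, MPI_DOUBLE, ...).
inline Packed pack_by_block(const std::vector<double>& data, int N) {
    assert(N % 2 == 0);
    assert(data.size() ==
           static_cast<std::size_t>(N) * N * N * kValuesPerPoint);
    Packed p;
    // First pass: count doubles per block.
    p.counts.assign(kBlocks, 0);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k)
                p.counts[block_of(i, j, k, N)] += kValuesPerPoint;
    // Prefix sum of counts gives each block's starting offset.
    p.displs.assign(kBlocks, 0);
    for (int b = 1; b < kBlocks; ++b)
        p.displs[b] = p.displs[b - 1] + p.counts[b - 1];
    // Second pass: copy each point's 8 values to its block's next free slot.
    p.buf.resize(data.size());
    std::vector<int> next = p.displs;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < N; ++k) {
                const std::size_t src =
                    ((static_cast<std::size_t>(i) * N + j) * N + k) *
                    kValuesPerPoint;
                int& dst = next[block_of(i, j, k, N)];
                for (int c = 0; c < kValuesPerPoint; ++c)
                    p.buf[dst + c] = data[src + c];
                dst += kValuesPerPoint;
            }
    return p;
}
```

With `Packed p = pack_by_block(data, 100);`, block b's 125000 x 8 doubles occupy `p.buf[p.displs[b]]` through `p.buf[p.displs[b] + p.counts[b] - 1]`. `counts` and `displs` are int arrays in units of elements, which is the form MPI_Scatterv takes for its sendcounts and displs arguments; note that MPI_Scatterv also delivers a share to the root itself, so the counts would need adjusting if process 0 does not hold a block. The C API names are used because the C++ bindings such as MPI::COMM_WORLD were deprecated in MPI-2.2 and removed in MPI-3.0.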
Best
Carlis