Suspending ParaStation MPI tasks

Parallel tasks started by ParaStation MPI can be suspended by sending the signal SIGTSTP to the ParaStation MPI Logger process. The signal is forwarded to all processes of the parallel task and, by default, stops them. To continue the task, send the signal SIGCONT to the ParaStation MPI Logger process. This signal is also forwarded to all processes of the task.
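The suspend and resume steps described above can be sketched as follows. This is a minimal illustration, not part of ParaStation MPI: the function names and the `send` parameter are hypothetical, and the PID of the Logger process is assumed to be known already (for example, from the process list).

```python
import os
import signal


def suspend_task(logger_pid, send=os.kill):
    """Send SIGTSTP to the Logger process (hypothetical helper).

    The Logger is expected to forward the signal to all processes
    of the parallel task, as described in the text above.
    """
    send(logger_pid, signal.SIGTSTP)


def resume_task(logger_pid, send=os.kill):
    """Send SIGCONT to the Logger process to continue the task."""
    send(logger_pid, signal.SIGCONT)
```

The `send` parameter defaults to `os.kill` and exists only so that the helpers can be exercised without signalling a real process.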

Note

The application has to be prepared to handle interrupted system calls properly.
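One common way to handle interrupted system calls is to retry the call when it fails with EINTR. The sketch below shows that pattern under stated assumptions: `retry_on_interrupt` and `max_retries` are hypothetical names, not part of ParaStation MPI. Python's standard library has retried most interrupted calls automatically since version 3.5, so the wrapper mainly illustrates the loop that an application written in C would place around calls such as read(2) or write(2).

```python
import errno


def retry_on_interrupt(call, *args, max_retries=10, **kwargs):
    """Call `call` again whenever it is interrupted by a signal (EINTR).

    Hypothetical helper: gives up after `max_retries` attempts so that
    a persistently failing call cannot loop forever.
    """
    for _ in range(max_retries):
        try:
            return call(*args, **kwargs)
        except InterruptedError:
            # The call was interrupted before it completed; try again.
            continue
    raise InterruptedError(errno.EINTR, "call interrupted too many times")
```

Any callable that may raise `InterruptedError` can be wrapped this way, for example `retry_on_interrupt(sock.recv, 4096)` for a socket object `sock`.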

Depending on the transport protocol in use, tasks can be suspended only for a limited period of time. If TCP is used (HwType ethernet), connections may time out, and after the SIGCONT signal is sent, the processes will receive I/O errors on these sockets. Using the ParaStation MPI protocol p4sock avoids this problem, as this protocol does not use any timeouts.

Suspending a task using the signal SIGTSTP will also trigger the ParaStation MPI queuing facility (see the section called “Using the ParaStation MPI queuing facility”). Depending on the global setting of freeOnSuspend, CPUs will be reused for newly spawned processes. Refer to parastation.conf(5).