mpirun_chgm — run an MPIch/GM MPI program on a ParaStation MPI cluster
mpirun_chgm [-?Vv] -np nodes [[-nodes nodelist] | [-hosts hostlist] | [-hostfile hostfile]] [-sort {proc | load | proc+load | none}] [-all-local] [-inputdest dest] [-sourceprintf] [-rusage] [-exports envlist] [[--gm-no-shmem] | [--gm-numa-shmem]] [--gm-wait=sec] [--gm-kill=sec] [--gm-eager=size] [--gm-recv=mode] [--usage] program [arg]...
mpirun_chgm is a tool that enables MPIch/GM programs to run on a ParaStation MPI cluster under the control of the ParaStation MPI management facility. Within ParaStation MPI the startup of parallel jobs is handled as described in the process_placement(7) manual page. The spawning mechanism is steered either by environment variables, which are described in detail in ps_environment(7), or by options to the mpirun_chgm command. In fact, these options do nothing but set the corresponding environment variables.
mpirun_chgm typically works like this:
mpirun_chgm -np num prog [args]
This starts the parallel MPIch/GM program prog on num nodes of the cluster. args are optional arguments which will be passed to each instance of prog.
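Since the options merely set the corresponding environment variables, the same job can also be prepared by exporting the variables beforehand. A sketch of this equivalent form, assuming PSI_NODES accepts the same comma separated list of ParaStation MPI IDs as the -nodes option and using placeholder node IDs (see ps_environment(7) for the authoritative format):
export PSI_NODES="0,1,2,3"
mpirun_chgm -np 4 prog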
-np nodes
Specify the number of processors to run on.
-nodes nodelist
Define the nodes forming the partition of the ParaStation MPI cluster used to spawn new processes. nodelist is a single character string containing a comma-separated list of ParaStation MPI IDs. Depending on the existence of the environment variable PSI_NODES_SORT and the presence of the -sort option, the order of the nodes within nodelist might be relevant.
If the number of spawned processes exceeds the number of nodes within the partition, some nodes may get more than one process.
If any of the environment variables PSI_NODES, PSI_HOSTS or PSI_HOSTFILE is set, this option must not be given.
-hosts hostlist
Define the nodes forming the partition of the ParaStation MPI cluster used to spawn new processes. hostlist is a single character string containing a space-separated list of hostnames. These have to be resolvable in order to determine the corresponding ParaStation MPI IDs. Depending on the existence of the environment variable PSI_NODES_SORT and the presence of the -sort option, the order of the nodes within hostlist might be relevant.
If the number of spawned processes exceeds the number of nodes within the partition, some nodes may get more than one process.
If any of the environment variables PSI_NODES, PSI_HOSTS or PSI_HOSTFILE is set, this option must not be given.
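For example, to build the partition from four hosts (the hostnames node01 through node04 are placeholders), one might call:
mpirun_chgm -np 4 -hosts "node01 node02 node03 node04" prog1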
-hostfile hostfile
Define the nodes forming the partition of the ParaStation MPI cluster used to spawn new processes. hostfile is the name of a file containing a list of hostnames, one hostname per line. These have to be resolvable in order to determine the corresponding ParaStation MPI IDs. Depending on the existence of the environment variable PSI_NODES_SORT and the presence of the -sort option, the order of the nodes within the hostfile might be relevant.
If the number of spawned processes exceeds the number of nodes within the partition, some nodes may get more than one process.
If any of the environment variables PSI_NODES, PSI_HOSTS or PSI_HOSTFILE is set, this option must not be given.
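For example, with a hostfile (here called machines, with placeholder hostnames) containing
node01
node02
node03
node04
the parallel task could be started as
mpirun_chgm -np 4 -hostfile machines prog1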
-sort mode
Select the sorting criterion used to bring the nodes within a partition into an appropriate order. This order will be used to spawn remote processes. The following values of mode are recognized:
proc
The nodes are sorted by the number of running ParaStation MPI processes before new processes are spawned. This is the default behavior.
load
The nodes are sorted by load before new processes are spawned. Therefore nodes with the least load are used first. To be more specific, the load average over the last minute is used as the sorting criterion.
proc+load
The nodes are sorted corresponding to the sum of the 1-minute load average and the number of running ParaStation MPI processes. This will lead to fair load-balancing even if processes are started without notification to the ParaStation MPI management facility.
none
No sorting of nodes before new processes are spawned. The nodes are used in a round-robin fashion as they are set in the PSI_NODES, PSI_HOSTS or PSI_HOSTFILE environment variables or via the corresponding -nodes, -hosts or -hostfile options.
If the environment variable PSI_NODES_SORT is set, this option must not be given.
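For example, to sort the nodes by the sum of load and process count before spawning:
mpirun_chgm -np 5 -sort proc+load prog1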
-all-local
Run all processes on the master node.
Keep in mind that the master node is not necessarily the local machine but, depending on the ParaStation MPI configuration and the options and environment variables given, may be any machine within the ParaStation MPI cluster. Nevertheless, all processes building the parallel MPIch/GM task will run on the same node of the ParaStation MPI cluster.
-inputdest dest
Define the process which receives any input to the parallel task. dest is an integer number in the range from 0 to nodes-1, where nodes is set by the -np option.
The default is to send the input to the process with rank 0 within the parallel task.
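For example, to forward any input of the parallel task to the process with rank 2 instead of rank 0:
mpirun_chgm -np 4 -inputdest 2 prog1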
-sourceprintf
If this option is enabled, the logger will give information about the source of the output produced, i.e. “[id]:” will be prepended to any line of output, where id is the rank of the printing process within the parallel task. Usually this id coincides with the MPI rank.
-rusage
When this option is given, the logger will print a notice about the user and system time consumed by each process within the parallel task upon exit of this process.
-exports envlist
Register a list of environment variables which should be exported to remote processes during spawning. Some environment variables (HOME, USER, SHELL and TERM) are exported by default. Furthermore, PWD is set correctly for remote processes.
envlist is a single character string containing a comma-separated list of environment variables. Only the name of each environment variable has to be given.
If the environment variable PSI_EXPORTS is set, envlist will be appended to this variable.
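For example, to additionally export LD_LIBRARY_PATH and a hypothetical application-specific variable MY_CONFIG to all spawned processes:
mpirun_chgm -np 4 -exports "LD_LIBRARY_PATH,MY_CONFIG" prog1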
--gm-no-shmem
Disable the shared memory support (enabled by default).
--gm-numa-shmem
Enable shared memory only for processes sharing the same Myrinet interface.
--gm-wait=sec
Wait sec seconds between each spawning step.
Usually the spawning of the client processes is done as fast as possible. Using this option, a delay of sec seconds is introduced between consecutive spawning steps.
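For example, to introduce a delay of 2 seconds between consecutive spawning steps of a larger job (the value is only an illustration):
mpirun_chgm -np 32 --gm-wait=2 prog1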
--gm-kill=sec
Kill all processes sec seconds after the first one exits.
Usually the termination of a job is handled as follows: If a client process terminates normally, i.e. no signal was delivered to the process and the return value is 0, all other processes of the parallel task are not affected in any way. In any other case, i.e. if a client process exits with a return value different from 0 or as the effect of a signal delivered to the process, all other processes will be killed immediately.
Using this option changes the behavior of parallel tasks with processes terminating normally. Now all other processes will be killed sec seconds after the first process has terminated.
Keep in mind that the implementation of MPI_Finalize() within MPIch/GM blocks. This means no process of the parallel task will return from MPI_Finalize() before all processes have called this function. Thus this option only makes sense if a process is able to exit without calling MPI_Finalize(). According to the MPI standard it is not recommended to do so.
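For example, to kill all remaining processes 30 seconds after the first process has terminated normally (the timeout is only an illustration):
mpirun_chgm -np 5 --gm-kill=30 prog1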
--gm-eager=size
Specifies the eager/rendezvous protocol threshold size.
Like any state-of-the-art communication library, MPIch/GM implements two different protocols, which are used depending on the size of the message being sent. This option makes it possible to modify the threshold size that determines which protocol to use. size is given in bytes.
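For example, to set the protocol threshold to 16384 bytes (the value is only an illustration; suitable values depend on the application and the GM version):
mpirun_chgm -np 5 --gm-eager=16384 prog1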
--gm-recv=mode
Specifies the receive mode of the GM communication library.
Possible values for mode are polling, blocking or hybrid. The default is polling.
For a detailed description of the different receive modes please refer to the GM documentation.
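For example, to let the GM library block instead of poll while waiting for incoming messages:
mpirun_chgm -np 5 --gm-recv=blocking prog1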
-V, --version
Output version information and exit.
-v, --verbose
Verbose execution with many messages during startup of the parallel task.
-?, --help
Show a help message.
--usage
Display a brief usage message.
In order to start the parallel MPIch/GM program prog1 on any 5 nodes within the ParaStation MPI cluster, execute:
mpirun_chgm -np 5 prog1 -v
The option -v will be passed to every spawned instance of prog1.
If the parallel task should run on nodes 5 to 9 of the cluster,
mpirun_chgm -np 5 -nodes "5,6,7,8,9" prog1
has to be executed.
If the nodes should be sorted by load, use:
mpirun_chgm -np 5 -nodes "5,6,7,8,9" -sort load prog1
In order to acquire information about the user and system time used by the spawned processes on the different nodes run:
mpirun_chgm -np 5 -rusage prog1