Name

mpirun_chgm — run an MPIch/GM MPI program on a ParaStation MPI cluster

Synopsis

mpirun_chgm [-?Vv] -np nodes [[-nodes nodelist] | [-hosts hostlist] | [-hostfile hostfile]] [-sort {[proc] | [load] | [proc+load] | [none]} ] [-all-local] [-inputdest dest] [-sourceprintf] [-rusage] [-exports envlist] [[--gm-no-shmem] | [--gm-numa-shmem]] [--gm-wait=sec] [--gm-kill=sec] [--gm-eager=size] [--gm-recv=mode] [--usage] program [arg]...

Description

mpirun_chgm is a tool that enables MPIch/GM programs to run on a ParaStation MPI cluster under control of the ParaStation MPI management facility. Within ParaStation MPI the startup of parallel jobs is handled as described in the process_placement(7) manual page. The spawning mechanism is either steered by environment variables, which are described in detail in ps_environment(7), or via options to the mpirun_chgm command. In fact, these options do nothing but set the corresponding environment variables.

mpirun_chgm typically works like this:

  mpirun_chgm -np num prog [args]

This will start up the parallel MPIch/GM program prog on num nodes of the cluster. args are optional arguments which will be passed to each instance of prog.

Options

-np nodes

Specify the number of processors to run on.

-nodes nodelist

Define the nodes which will form the partition of the ParaStation MPI cluster used to spawn new processes.

nodelist is a single character string containing a comma separated list of ParaStation MPI IDs. Depending on the existence of the environment variable PSI_NODES_SORT and the presence of the -sort option, the order of the nodes within nodelist might be relevant.

If the number of spawned processes exceeds the number of nodes within the partition, some nodes may get more than one process.

If any of the environment variables PSI_NODES, PSI_HOSTS or PSI_HOSTFILE is set, this option must not be given.
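
For example, to restrict the partition to the nodes with the ParaStation MPI IDs 0, 1 and 2 (IDs chosen purely for illustration):

  mpirun_chgm -np 6 -nodes "0,1,2" prog

Assuming PSI_NODES accepts the same comma separated format (see ps_environment(7)), the equivalent form using the environment variable would be:

  PSI_NODES="0,1,2" mpirun_chgm -np 6 prog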

-hosts hostlist

Define the nodes which will form the partition of the ParaStation MPI cluster used to spawn new processes.

hostlist is a single character string containing a space separated list of hostnames. These have to be resolvable in order to get the corresponding ParaStation MPI IDs. Depending on the existence of the environment variable PSI_NODES_SORT and the presence of the -sort option, the order of the nodes within hostlist might be relevant.

If the number of spawned processes exceeds the number of nodes within the partition, some nodes may get more than one process.

If any of the environment variables PSI_NODES, PSI_HOSTS or PSI_HOSTFILE is set, this option must not be given.
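
For example, using the hypothetical hostnames node01, node02 and node03:

  mpirun_chgm -np 6 -hosts "node01 node02 node03" prog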

-hostfile hostfile

Define the nodes which will form the partition of the ParaStation MPI cluster used to spawn new processes.

hostfile is the name of a file containing a list of hostnames. These have to be resolvable in order to get the corresponding ParaStation MPI IDs. The format of the file is one hostname per line. Depending on the existence of the environment variable PSI_NODES_SORT and the presence of the -sort option, the ordering of the nodes within the hostfile might be relevant.

If the number of spawned processes exceeds the number of nodes within the partition, some nodes may get more than one process.

If any of the environment variables PSI_NODES, PSI_HOSTS or PSI_HOSTFILE is set, this option must not be given.
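
For example, a hostfile named myhosts (the file name and the hostnames are purely illustrative) would simply read:

  node01
  node02
  node03

and could be used as:

  mpirun_chgm -np 6 -hostfile myhosts prog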

-sort mode

Select the sorting criterion used to bring the nodes within a partition into an appropriate order. This order will be used to spawn remote processes. The following values of mode are recognized:

proc

The nodes are sorted by the number of running ParaStation MPI processes before new processes are spawned. This is the default behavior.

load

The nodes are sorted by load before new processes are spawned. Therefore nodes with the least load are used first.

To be more specific, the load average over the last minute is used as the sorting criterion.

proc+load

The nodes are sorted according to the sum of the 1 minute load and the number of running ParaStation MPI processes. This will lead to fair load-balancing even if processes are started without notifying the ParaStation MPI management facility.

none

No sorting of nodes before new processes are spawned. The nodes are used in a round robin fashion as they are set in the PSI_NODES, PSI_HOSTS or PSI_HOSTFILE environment variables or via the corresponding -nodes, -hosts or -hostfile options.

If the environment variable PSI_NODES_SORT is set, this option must not be given.
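
For example, to sort the nodes 5-9 by the sum of the number of running processes and the 1 minute load before spawning:

  mpirun_chgm -np 5 -nodes "5,6,7,8,9" -sort proc+load prog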

-all-local

Run all processes on the master node.

Keep in mind that the master node is not necessarily the local machine but, depending on the ParaStation MPI configuration and the options and environment variables given, may be any machine within the ParaStation MPI cluster. Nevertheless, all processes building the parallel MPIch/GM task will run on the same node of the ParaStation MPI cluster.

-inputdest dest

Define the process which receives any input to the parallel task. dest is an integer number in the range from 0 to nodes-1, where nodes is set by the -np option.

The default is to send the input to the process with rank 0 within the parallel task.
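
For example, to forward any input to the process with rank 3 instead of rank 0:

  mpirun_chgm -np 5 -inputdest 3 prog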

-sourceprintf

If this option is enabled, the logger will give information about the source of the output produced, i.e. “[id]:” will be prepended to any line of output, where id is the rank of the printing process within the parallel task.

Usually the id coincides with the MPI rank.

-rusage

When this option is given, the logger will print a notice about the user and system time consumed by each process within the parallel task upon exit of this process.

-exports envlist

Register a list of environment variables which should be exported to remote processes during spawning. Some environment variables (HOME, USER, SHELL and TERM) are exported by default.

Furthermore PWD is set correctly for remote processes.

envlist is a single character string containing a comma separated list of environment variables. Only the name of the environment variable has to be given.

If the environment variable PSI_EXPORTS is set, envlist will be appended to this variable.
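
For example, to export LD_LIBRARY_PATH and the hypothetical application variable MYAPP_CONFIG in addition to the default set:

  mpirun_chgm -np 5 -exports "LD_LIBRARY_PATH,MYAPP_CONFIG" prog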

--gm-no-shmem

Disable the shared memory support (enabled by default).

--gm-numa-shmem

Enable shared memory only for processes sharing the same Myrinet interface.

--gm-wait=sec

Wait sec seconds between each spawning step.

Usually the spawning of the client processes is done as fast as possible. Using this option, a delay of sec seconds is introduced between each spawning step.

--gm-kill=sec

Kill all processes sec seconds after the first one exits.

Usually the termination of a job is handled as follows: As long as a client process terminates normally, i.e. no signal was delivered to the process and the return value is 0, all other processes of the parallel task are not affected in any way. In any other case, i.e. if a client process exits with a return value different from 0 or as the effect of a signal delivered to the process, all other processes will be killed immediately.

Using this option changes the behavior of parallel tasks with processes terminating normally. Now all other processes will be killed sec seconds after the first process has terminated.

Keep in mind that the implementation of MPI_Finalize() within MPIch/GM blocks. This means no process of the parallel task will return from MPI_Finalize() before all processes have called this function. Thus this option only makes sense if a process is able to exit without calling MPI_Finalize(). According to the MPI standard it is not recommended to do so.
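
For example, to kill all remaining processes 10 seconds after the first process has exited:

  mpirun_chgm -np 5 --gm-kill=10 prog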

--gm-eager=size

Specifies the eager/rendezvous protocol threshold size.

As in any state-of-the-art communication library, MPIch/GM implements two different protocols which are used depending on the size of the message being sent. This option makes it possible to modify the threshold size that determines which protocol is used.

size is given in bytes.
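
For example, to use the rendezvous protocol only for messages larger than 16 kB (the value is purely illustrative, not a recommendation):

  mpirun_chgm -np 5 --gm-eager=16384 prog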

--gm-recv=mode

Specifies the receive mode of the GM communication library. Possible values for mode are polling, blocking or hybrid. The default is polling.

For a detailed description of the different receive modes please refer to the GM documentation.
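
For example, to select the blocking receive mode instead of the default polling mode:

  mpirun_chgm -np 5 --gm-recv=blocking prog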

-V , --version

Output version information and exit.

-v , --verbose

Verbose execution with many messages during startup of the parallel task.

-? , --help

Show a help message.

--usage

Display a brief usage message.

Examples

In order to start the parallel MPIch/GM program prog1 on any 5 nodes within the ParaStation MPI cluster, execute:

  mpirun_chgm -np 5 prog1 -v

The option -v will be passed to any instance of prog1 spawned.

To run the parallel task on the nodes 5-9 of the cluster, execute:

  mpirun_chgm -np 5 -nodes "5,6,7,8,9" prog1

If the nodes should be sorted by load, use:

  mpirun_chgm -np 5 -nodes "5,6,7,8,9" -sort load prog1

In order to acquire information about the user and system time used by the spawned processes on the different nodes, run:

  mpirun_chgm -np 5 -rusage prog1
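
The options may be combined. The following call, given purely as an illustration, runs prog1 on the nodes 5-9 sorted by load, sends any input to the process with rank 2 and prepends the rank of the printing process to each line of output:

  mpirun_chgm -np 5 -nodes "5,6,7,8,9" -sort load -inputdest 2 -sourceprintf prog1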

Errors

No known errors.

See also

psmstart(1), ps_environment(7), process_placement(7)