Name

mpiexec — run a ParaStation MPI program

Synopsis

mpiexec { --np num | -np num | -n num } [ -e | --exports=envlist ] [ -x | --envall ] [ -E | --env=name value ] [ -b | --bnr ] [ -a | --jobalias=alias ] [ options ] program [arg]...

mpiexec { -A | --admin } [ -L | --login=name ] {[ -N | --nodes=nodelist ] | [ -H | --hosts=hostlist ] | [ -f | --hostfile=file ] | [ --machinefile=file ]} program [arg]...

mpiexec [ -V | --version ] [ -? | --help ] [ --usage ] [ --extendedhelp ] [ --extendedusage ] [ --debughelp ] [ --debugusage ] [ --commhelp ] [ --commusage ]

Description

The mpiexec command is the typical way to start parallel or serial jobs. It hides from the user the differences between the job startup mechanisms of various implementations of the Message Passing Interface, version 2. Within the ParaStation MPI implementation of this command, the startup of parallel jobs is handled as described in the process_placement(7) manual page. Process spawning may also be steered by environment variables, which are described in detail in ps_environment(7).

This version of mpiexec supports the Process Manager Interface (PMI) protocol. It can therefore also start applications built with many other implementations of MPI2, such as MPICH2, MVAPICH2 or Intel MPI.

The command mpiexec is typically used like

  mpiexec -np num prog [args]

This will start the program prog num times in parallel, forming a parallel job. args are optional arguments which will be passed to each task. prog is not required to use MPI calls to transfer data.

To run a serial job, i.e. a job consisting of only a single task, use a task count of 1, e.g.

  mpiexec -np 1 prog [args]

Options

General options

-n num , -np num , --np num , --np=num

Specify the number of processes to start.

-e , --exports=envlist

Name or comma-separated list of environment variable(s) exported to all processes.

-x , --envall

Export all environment variables to all processes.

-E , --env name value

Export the variable name with the content value.

-b , --bnr

Enable ParaStation4 compatibility mode.

-a , --jobalias=name

Assign an alias to the job. This name is currently only used for accounting purposes.

Options controlling process placement

-N , --nodes=list

Comma- or space-separated list of node IDs to use, e.g. 3-5,7,11-17.

-H , --hosts=list

Comma- or space-separated list of hosts to use, e.g. host1,host2.

-f , --hostfile=filename

Hostfile to use.

--machinefile=filename

Machinefile to use. Same as --hostfile.

-o , --overbook

Allow overbooking.

-f , --loopnodesfirst

Place consecutive processes on different nodes, if possible. If not set, as many consecutive processes as possible will be placed within an SMP node to allow local communication using shared memory.

-E , --exclusive

Do not allow any other processes on used nodes.

-s , --sort=criteria

Sorting criteria to be used when selecting nodes: proc, load, proc+load or none.

-d , --wdir=dir

Working directory in which to start the processes.

-P , --path=pathlist

Set the PATH list.

Communication options

-c , --discom=connection

Disable a particular communication architecture: SHM, TCP, P4S, GM, MVAPI, OPENIB or ELAN. Connection may also be a comma- or space-separated list to disable multiple architectures at once. This option is typically used for testing and debugging purposes only.

-t , --network=string

Space-separated list of networks used for MPI communication.

-y , --schedyield

Use the sched_yield system call.

-r , --retry=num

Number of connection retries.

I/O forwarding options

-s , --inputdest=ranklist

Define the rank to which standard input is forwarded. ranklist may be a single rank, a comma-separated list of ranks and rank ranges, or all to send the input to all processes. For example, in a job with 16 processes the option --inputdest=0-3,5,8-15 will forward standard input to all but ranks 4, 6 and 7.

-l , --sourceprintf

Print output-source info.

-T , --timestamp

Print time stamps.

-R , --rusage

Print elapsed sys/user time.

-m , --merge

Merge identical output lines from all processes.

Privileged options

-A , --admin

Start processes as administrative tasks. These tasks are not counted within the ParaStation resource management and may be run in parallel to compute tasks. Only privileged users are allowed to run administrative tasks.

-L , --login=name

Remote user used to execute command. For administrative tasks only.

Other options

--gdb

Run processes under control of gdb.

-u , --usize=size

Set the universe size.

--extendedhelp

Print extended help.

--debughelp

Print debug help.

--commhelp

Print communication debug help.

Extended options

--plugindir=dir

Directory to search for plugins.

--sndbuf=size

Define the TCP send buffer size.

--rcvbuf=size

Define the TCP receive buffer size.

--delay

Don't use the NODELAY option for TCP sockets.

--mergedepth=num

Number of lines to search for identical output. Defaults to 300.

--mergetimeout=secs

Timeout in seconds to wait for identical output. Defaults to 2 seconds.

--pmitimeout=secs

Timeout in seconds for all clients to join the first PMI barrier. Use -1 to disable. Defaults to 60 sec + np * 0.5 µsec.

--pmiovertcp

Connect to the PMI client using a TCP/IP socket.

--pmioverunix

Connect to the PMI client using a UNIX domain socket (default).

--pmidisable

Disable PMI interface.

Debugging options

--pscomdb

Enable libpscom debugging.

--psidb

Enable psid debugging.

--pmidb

Enable PMI debugging.

--pmidbclient

Enable PMI client debugging.

--pmidbkvs

Enable PMI key-value-space debugging.

--show

Show command for remote execution but don't run it.

--loggerrawmode

Set raw mode of the logger.

--sigquit

Output debug information on signal SIGQUIT.

Compatibility options

-bnr

Enable ParaStation4 compatibility mode. Same as --bnr.

-machinefile=filename

Machinefile to use. Same as --hostfile.

-1

Override default of trying first (ignored).

-ifhn=string

Space-separated list of networks enabled.

-file=string

File with additional information (ignored).

-tv=string

Run procs under totalview (ignored).

-tvsu=string

Totalview startup only (ignored).

-gdb

Run processes under control of gdb. Same as --gdb.

-gdba

Attach to debug processes with gdb (ignored).

-ecfn

Output xml exit codes filename (ignored).

-dir=dir , -wdir=dir

Working directory in which to start the processes. Same as --wdir.

-umask=mask

Umask for remote process (ignored).

-path=pathlist

Set the PATH list.

-host=host

Host to start on (ignored).

-soft=num

Give a hint instead of a precise number of processes (ignored).

-arch=arch

Arch type to start on (ignored).

-envall

Export all environment variables to all processes. Same as --envall.

-envnone

Do not export any environment variables to the processes.

-envlist=envlist

Name or comma-separated list of environment variable(s) exported to all processes. Same as --exports.

-env=name value

Export the variable name with the content value. Same as --env.

-usize=size

Set the universe size. Same as --usize.

Global compatibility options

-gnp num , -gn num

Specify the number of processes to start. Same as -np or -n.

-gdir=dir , -gwdir=dir

Working directory in which to start the processes. Same as --wdir.

-gumask=mask

Umask for remote process (ignored).

-gpath=pathlist

Set the PATH list. Same as -path.

-ghost=host

Host to start on (ignored).

-gsoft=num

Give a hint instead of a precise number of processes (ignored).

-garch=arch

Arch type to start on (ignored).

-genvall

Export all environment variables to all processes. Same as --envall.

-genvnone

Do not export any environment variables to the processes. Same as -envnone.

-genvlist=envlist

Name or comma-separated list of environment variable(s) exported to all processes. Same as --exports.

-genv=name value

Export the variable name with the content value. Same as --env.

Miscellaneous options

-V , --version

Show version and exit.

-? , --help

Show help message and exit.

--usage

Show brief usage message and exit.

--extendedusage

Show brief extended usage message and exit.

--debugusage

Show brief debug usage message and exit.

Extended description

The mpiexec command may be controlled in many ways. To do so, a number of specialized options are implemented. This section describes some of them and the underlying concepts.

Option compatibility

The ParaStation MPI mpiexec command supports many options also found in other implementations, especially the MPICH2 version, to ensure compatibility on a command line level. For many of the long options, indicated by two dashes (--), versions with only one dash are implemented, e.g. -envall is equivalent to --envall.

The colon syntax (:) for defining local options, used for MPI_COMM_SPAWN_MULTIPLE calls, is currently not supported. In addition, configuration files are not supported (option -configfile).

Signal handling

The mpiexec command will forward all possible signals to all tasks of the current job. In particular, the signal SIGTSTP will be sent to all controlled processes and will therefore suspend the entire job immediately. See the ParaStation MPI User's Guide for details about suspending jobs.

Process Manager Interface (PMI)

This version of mpiexec supports the simple Process Manager Interface, version 1.1. Thus, it is able to start up and control any PMI-compatible MPI application. The PMI protocol is the default interface of an mpd daemon. By supporting this protocol, the ParaStation MPI daemon psid(8) is a complete replacement for mpd.

The PMI implementation is completely transparent to the user. ParaStation MPI will use this by default.

If at least one task uses the PMI protocol, the proper startup of all tasks is ensured by a global PMI barrier. This barrier is monitored by a timeout (see option --pmitimeout). If at least one task of the application fails to connect to the PMI within this timeout, the entire application will be terminated.
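As a sketch of how the timeout might be adjusted, the following lines raise it for an application whose tasks need a long time before they connect to the PMI. The program name ./slow_start_app, the task count and the timeout value are assumptions chosen for illustration only. The command is printed in any case, but executed only if both mpiexec and the program are actually available.

```shell
#!/bin/sh
# Hypothetical job size and timeout; adapt both to the real application.
np=256
pmi_timeout=600    # seconds allowed for the first PMI barrier

cmd="mpiexec -np $np --pmitimeout=$pmi_timeout ./slow_start_app"
echo "$cmd"

# Only run the job where mpiexec and the (hypothetical) program exist.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./slow_start_app ]; then
    $cmd
fi
```

Using --pmitimeout=-1 instead would disable the timeout, so a task that never connects would no longer cause the job to be terminated by this mechanism.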

Machine file format

Using the option --hostfile or --machinefile, the mpiexec command will read the list of nodes to be used from the specified file. One node name may be listed per line. Separated by whitespace, the option ifhn=subnet may be added to a node name, defining the desired subnet for the MPI traffic of this particular node. subnet may be a subnet or the IP address of one of the node's interfaces.

Note

If this subnet is not available or not suitable, the communication subsystem may use a different subnet or interface for MPI communication.
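A minimal sketch of such a file and its use is shown below. The node names node01 to node03 and the interface addresses are made up for illustration. The third entry carries no ifhn option, so no subnet is requested for that node. The job itself is started only if mpiexec is installed.

```shell
#!/bin/sh
# Write a hypothetical machine file: one node name per line, optionally
# followed by whitespace and ifhn=subnet (here: an interface address).
machinefile=$(mktemp)
cat > "$machinefile" <<'EOF'
node01 ifhn=192.168.10.1
node02 ifhn=192.168.10.2
node03
EOF

# Count the listed nodes (non-empty lines).
nodes=$(grep -c . "$machinefile")
echo "machine file lists $nodes nodes"

# Start as many processes as nodes are listed, if mpiexec is available.
if command -v mpiexec >/dev/null 2>&1; then
    mpiexec -np "$nodes" --machinefile="$machinefile" hostname
fi

rm -f "$machinefile"
```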

Merging output

To improve the readability of the output of parallel applications, identical output lines may be merged and printed only once using the --merge option. Output lines have to be terminated by a newline character (\n).

If the line merging option is enabled, the logger process buffers all lines written to stdout and stderr by each task and compares them against each other to identify identical lines. If a line is found to be identical for all tasks, it is written to stdout or stderr, respectively. The search depth may be set using the --mergedepth option.

Each output line is internally marked with a timestamp and monitored by a timeout. When this timeout expires, all identical lines read so far are combined and written to stdout or stderr. The line order is preserved. The timeout improves the feedback to the user even if one or more tasks delay their output for a long period of time. It may be set using the option --mergetimeout.

When merging output, each line written to stdout or stderr is prefixed with a list of ranks enclosed in brackets, like [0-13,15]. In this example, an identical line was read from all ranks but rank 14, assuming a job running on 16 CPUs. The output of rank 14 may be listed after a certain period of time, prefixed with [14], or may be missing completely; this is up to the application.
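When post-processing merged output, it may be handy to find out which ranks are absent from such a prefix. The following sketch does this with two helper functions, expand_ranks and missing_ranks. Both are hypothetical names introduced here and are not part of ParaStation MPI. The job size has to be supplied by the caller.

```shell
#!/bin/sh
# expand_ranks: expand a rank list such as "0-13,15" into single ranks.
expand_ranks() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        # A plain rank has no upper bound; treat it as a one-element range.
        seq "$first" "${last:-$first}"
    done | tr '\n' ' ' | sed 's/ $//'
}

# missing_ranks: print the ranks of a job of the given size that do not
# appear in a merged-output prefix like "[0-13,15]".
missing_ranks() {
    prefix=$(echo "$1" | tr -d '[]')
    size=$2
    present=" $(expand_ranks "$prefix") "
    rank=0
    while [ "$rank" -lt "$size" ]; do
        case "$present" in
            *" $rank "*) ;;                  # rank is part of the prefix
            *) printf '%s\n' "$rank" ;;      # rank is missing
        esac
        rank=$((rank + 1))
    done
}

missing_ranks "[0-13,15]" 16    # prints 14
```

Since the helper only looks at the list syntax, it can also be applied to a rank list as given to --inputdest, e.g. missing_ranks "0-3,5,8-15" 16.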

Note

While merging output lines, no output is ever suppressed! All lines will be output.

Important

In general, output merging should not be used when saving binary output data. In particular, the characters \n and \0 are swallowed.

Spawning administrative tasks

In addition to parallel or serial compute jobs, the mpiexec command may also be used to spawn administrative tasks. Tasks of this kind are not counted within the ParaStation MPI resource management and therefore may be run on nodes already in use.

To run an administrative task, use the option --admin. This will also enable the output merging capabilities, see option --merge. For example, the command

  mpiexec --admin --hosts=node1,node2,node3,node4 date

will run the date command on the nodes node1 to node4. You have to supply a list of nodes using the --hosts, --nodes or --hostfile option. The number of processes will be automatically determined by the number of nodes.
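The same kind of task may be started using ParaStation node IDs instead of host names. The following sketch assumes that nodes with the IDs 0 to 3 exist and that the invoking user is allowed to run administrative tasks. The uptime command merely serves as an example. No process count is given, since it is derived from the node list.

```shell
#!/bin/sh
# Node IDs 0-3 are an assumption; adapt the list to the local cluster.
nodes="0-3"

cmd="mpiexec --admin --nodes=$nodes uptime"
echo "$cmd"

# Run the command only where mpiexec is actually installed.
if command -v mpiexec >/dev/null 2>&1; then
    $cmd
fi
```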

Note

Only members of the adminuser or admingroup list are allowed to run administrative tasks. Root is typically a member of the adminuser list.

Debugging parallel applications

Parallel and serial jobs may be run under the control of gdb using the command line argument --gdb. By default, the standard input is then forwarded to all processes (see option --inputdest=all) and the output of all processes is merged (see option --merge) to improve readability. The initial PMI timeout is also disabled (see option --pmitimeout). Each task is controlled by its own gdb instance. All gdb instances will be controlled in parallel.

As with gdb itself, the run command must be issued to actually start the application. All optional arguments from the mpiexec command line will be used by gdb. To supply arguments to the parallel tasks, use the gdb command run args or the gdb option -args.

To re-route the standard input to a particular task or a set of tasks, use the pseudo command [rank], where rank is a single rank or a comma-separated list of ranks. Again, to redirect the input to all tasks, use the command [all].

The gdb prompt will be automatically set to "(gdb)\n". The newline is required by the merging option. Therefore, the input will be read from the beginning of a new line.

For security reasons, terminated processes may not be restarted. Restart the entire job instead.

Signals are handled as expected: all catchable signals are forwarded by the logger task to all foreground processes controlled by the local forwarder(s). Within gdb, the resulting action of those forwarded signals depends on the actual signal handling within gdb.

Running ParaStation4 applications

The mpiexec command may also be used to run applications linked with the former version of ParaStation MPI. To do so, use the --bnr option. This will enable the startup mechanism used by ParaStation4 and earlier versions.

Errors

No known errors.

See also

psmstart(1), ps_environment(7), process_placement(7), psid(8) and the ParaStation MPI User's Guide.