mpiexec — run a ParaStation MPI program
mpiexec { --np num | -np num | -n num } [ -e | --exports=envlist ] [ -x | --envall ] [ -E | --env=name value ] [ -b | --bnr ] [ -a | --jobalias=alias ] [ options ] program [arg]...
mpiexec { -A | --admin } [ -L | --login=name ] { [ -N | --nodes=nodelist ] | [ -H | --hosts=hostlist ] | [ -f | --hostfile=file ] | [ --machinefile=file ] } program [arg]...
mpiexec [ -V | --version ] [ -? | --help ] [ --usage ] [ --extendedhelp ] [ --extendedusage ] [ --debughelp ] [ --debugusage ] [ --commhelp ] [ --commusage ]
The mpiexec command is the typical way to start parallel or serial jobs. It hides from the user the differences in how the various implementations of the Message Passing Interface, version 2, start jobs. Within the ParaStation MPI implementation of this command, the startup of parallel jobs is handled as described in the process_placement(7) manual page. Process spawning may also be controlled by environment variables, which are described in detail in ps_environment(7).
This version of mpiexec supports the Process Manager Interface (PMI) protocol. Therefore, it can also start applications built with many other MPI-2 implementations, such as MPICH2, MVAPICH2 or Intel MPI.
The command mpiexec is typically used like
mpiexec -np num prog [args]
This will start the program prog num times in parallel, forming a parallel job. The optional arguments args are passed to each task. prog is not required to use MPI calls to transfer data.
To run a serial job, i.e. a job consisting of only a single task, use a task count of 1, e.g.
mpiexec -np 1 prog [args]
-n num, -np num, --np num, --np=num
Specify the number of processes to start.
-e, --exports=envlist
Name or comma-separated list of environment variable(s) exported to all processes.
-x, --envall
Export all environment variables to all processes.
-E, --env name value
Export the variable name with the content value.
-b, --bnr
Enable ParaStation4 compatibility mode.
-a, --jobalias=name
Assign an alias to the job. This name is currently only used for accounting purposes.
-N, --nodes=list
Comma- or space-separated list of node IDs to use, e.g. 3-5,7,11-17.
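The node list syntax can be illustrated with a short Python sketch that expands a list such as 3-5,7,11-17 into individual node IDs. It assumes that ranges are inclusive and that items may be separated by commas and/or spaces. The function name expand_nodelist is hypothetical and not part of ParaStation MPI.

```python
import re


def expand_nodelist(spec):
    """Expand a node list like "3-5,7,11-17" into individual node IDs.

    Hypothetical helper: assumes inclusive ranges and items separated
    by commas and/or whitespace, as described for --nodes.
    """
    ids = []
    for item in re.split(r"[,\s]+", spec.strip()):
        if not item:
            continue  # tolerate empty fields, e.g. from a trailing comma
        if "-" in item:
            first, last = (int(x) for x in item.split("-", 1))
            ids.extend(range(first, last + 1))  # inclusive range
        else:
            ids.append(int(item))
    return ids


print(expand_nodelist("3-5,7,11-17"))  # [3, 4, 5, 7, 11, 12, 13, 14, 15, 16, 17]
```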
-H, --hosts=list
Comma- or space-separated list of hosts to use, e.g. host1,host2.
-f, --hostfile=filename
Hostfile to use.
--machinefile=filename
Machinefile to use. Equivalent to --hostfile.
-o, --overbook
Allow overbooking.
-f, --loopnodesfirst
Place consecutive processes on different nodes, if possible. If not set, as many consecutive processes as possible will be placed within an SMP node to allow local communication using shared memory.
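The difference between the two placement strategies can be sketched in Python. This is a simplified model, not ParaStation's actual placement code. It assumes every node offers the same number of CPUs and ignores overbooking, sorting criteria and other jobs. The name place is hypothetical.

```python
def place(num_procs, nodes, cpus_per_node, loop_nodes_first=False):
    """Return the node chosen for each rank (simplified model)."""
    if num_procs > len(nodes) * cpus_per_node:
        raise ValueError("not enough CPUs; overbooking is not modelled")
    if loop_nodes_first:
        # --loopnodesfirst: consecutive ranks go to different nodes (round-robin)
        return [nodes[rank % len(nodes)] for rank in range(num_procs)]
    # default: fill each SMP node before moving on, so neighboring ranks
    # can communicate via shared memory
    return [nodes[rank // cpus_per_node] for rank in range(num_procs)]


print(place(4, ["node1", "node2"], 2))                         # ['node1', 'node1', 'node2', 'node2']
print(place(4, ["node1", "node2"], 2, loop_nodes_first=True))  # ['node1', 'node2', 'node1', 'node2']
```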
-E, --exclusive
Do not allow any other processes on the nodes used by this job.
-s, --sort=criteria
Sorting criteria to be used when selecting nodes: proc, load, proc+load or none.
-d, --wdir=dir
Working directory in which to start the processes.
-P, --path=pathlist
Set the PATH list.
-c, --discom=connection
Disable a particular communication architecture: SHM, TCP, P4S, GM, MVAPI, OPENIB or ELAN. connection may also be a comma- or space-separated list to disable multiple architectures at once. This option is typically used for testing and debugging purposes only.
-t, --network=string
Space-separated list of networks used for MPI communication.
-y, --schedyield
Use the sched_yield() system call.
-r, --retry=num
Number of connection retries.
-s, --inputdest=ranklist
Define the rank(s) the standard input is forwarded to. ranklist may be a single rank, a comma-separated list of ranks or rank ranges, or all to send the input to all processes. For example, in a job with 16 processes the option --inputdest=0-3,5,8-15 will forward standard input to all but ranks 4, 6 and 7.
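To illustrate how a rank list maps to ranks, the following Python sketch computes which ranks receive standard input for a given ranklist and job size. It assumes inclusive ranges as in the example above and is not taken from the ParaStation sources. The name input_targets is hypothetical.

```python
def input_targets(ranklist, num_ranks):
    """Return the set of ranks that receive standard input (sketch)."""
    if ranklist.strip() == "all":
        return set(range(num_ranks))
    targets = set()
    for item in ranklist.split(","):
        item = item.strip()
        if "-" in item:
            first, last = (int(x) for x in item.split("-", 1))
            targets.update(range(first, last + 1))  # inclusive range
        else:
            targets.add(int(item))
    if any(r < 0 or r >= num_ranks for r in targets):
        raise ValueError("rank out of range for this job")
    return targets


excluded = sorted(set(range(16)) - input_targets("0-3,5,8-15", 16))
print(excluded)  # [4, 6, 7]
```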
-l, --sourceprintf
Print information about the source of each output line.
-T, --timestamp
Print time stamps.
-R, --rusage
Print the elapsed system and user time.
-m, --merge
Merge identical output lines from all processes.
-A, --admin
Start processes as administrative tasks. These tasks are not counted within the ParaStation resource management and may be run in parallel to compute tasks. Only privileged users are allowed to run administrative tasks.
-L, --login=name
Remote user under which the command is executed. For administrative tasks only.
--gdb
Run processes under the control of gdb.
-u, --usize=size
Set the universe size.
--extendedhelp
Print extended help.
--debughelp
Print debug help.
--comhelp
Print communication debug help.
--plugindir=dir
Directory in which to search for plugins.
--sndbuf=size
Define the TCP send buffer size.
--rcvbuf=size
Define the TCP receive buffer size.
--delay
Do not set the TCP_NODELAY option on TCP sockets.
--mergedepth=num
Number of lines to search for identical output. Defaults to 300.
--mergetimeout=secs
Timeout in seconds to wait for identical output. Defaults to 2 seconds.
--pmitimeout=secs
Timeout in seconds for all clients to join the first PMI barrier. Use -1 to disable. Defaults to 60 sec + np * 0.5 µsec.
--pmiovertcp
Connect to the PMI client using a TCP/IP socket.
--pmioverunix
Connect to the PMI client using a UNIX domain socket (default).
--pmidisable
Disable PMI interface.
--pscomdb
Enable libpscom debugging.
--psidb
Enable psid debugging.
--pmidb
Enable PMI debugging.
--pmidbclient
Enable PMI client debugging.
--pmidbkvs
Enable PMI key-value-space debugging.
--show
Show command for remote execution but don't run it.
--loggerrawmode
Set raw mode of the logger.
--sigquit
Output debug information on signal SIGQUIT.
-bnr
Enable ParaStation4 compatibility mode. Same as --bnr.
-machinefile=filename
Machinefile to use. Same as --hostfile.
-1
Override default of trying first (ignored).
-ifhn=string
Space-separated list of networks enabled.
-file=string
File with additional information (ignored).
-tv=string
Run processes under TotalView (ignored).
-tvsu=string
TotalView startup only (ignored).
-gdb
Run processes under the control of gdb. Same as --gdb.
-gdba
Attach to debug processes with gdb (ignored).
-ecfn
Output XML exit codes file name (ignored).
-dir=dir, -wdir=dir
Working directory in which to start the processes. Same as --wdir.
-umask=mask
Umask for remote process (ignored).
-path=pathlist
Set the PATH list.
-host=host
Host to start on (ignored).
-soft=num
Give hints instead of a precise number of processes (ignored).
-arch=arch
Arch type to start on (ignored).
-envall
Export all environment variables to all processes. Same as --envall.
-envnone
Do not export any environment variables to the processes.
-envlist=envlist
Name or comma-separated list of environment variable(s) exported to all processes. Same as --exports.
-env=name value
Export the variable name with the content value. Same as --env.
-usize=size
Set the universe size. Same as --usize.
-gnp num, -gn num
Specify the number of processes to start. Same as -np or -n.
-gdir=dir, -gwdir=dir
Working directory in which to start the processes. Same as --wdir.
-gumask=mask
Umask for remote process (ignored).
-gpath=pathlist
Set the PATH list. Same as -path.
-ghost=host
Host to start on (ignored).
-gsoft=num
Give hints instead of a precise number of processes (ignored).
-garch=arch
Arch type to start on (ignored).
-genvall
Export all environment variables to all processes. Same as --envall.
-genvnone
Do not export any environment variables to the processes. Same as -envnone.
-genvlist=envlist
Name or comma-separated list of environment variable(s) exported to all processes. Same as --exports.
-genv=name value
Export the variable name with the content value. Same as --env.
The mpiexec command may be controlled in many ways. For this purpose, a large number of specialized options is implemented. This section describes some of them and the underlying concepts.
The ParaStation MPI mpiexec command supports many options also found in other implementations, especially the MPICH2 version, to ensure compatibility at the command-line level.
For many of the long options, indicated by two dashes (--), versions with only one dash are implemented, e.g. -envall is equivalent to --envall.
The colon syntax (:) for defining local options, used for MPI_COMM_SPAWN_MULTIPLE calls, is currently not supported. In addition, configuration files (option -configfile) are not supported.
The mpiexec command forwards all signals it is able to catch to all tasks of the current job. In particular, the signal SIGTSTP is sent to all controlled processes and will therefore suspend the entire job immediately. See the ParaStation MPI User's Guide for details about suspending jobs.
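Signal forwarding of this kind can be sketched in Python with the standard signal and subprocess modules. The parent installs handlers for catchable signals and re-sends each one to every child that is still running. This is an illustration only (POSIX systems). The function names and the chosen signal list are assumptions, and the real forwarding is done by the ParaStation logger and forwarder processes.

```python
import os
import signal
import subprocess
import sys


def forward_signal(children, signum):
    """Re-send signum to every child process that has not exited yet."""
    for child in children:
        if child.poll() is None:
            os.kill(child.pid, signum)


def install_forwarding(children, signums=(signal.SIGINT, signal.SIGTERM, signal.SIGTSTP)):
    """Catch signums in this process and forward them instead of acting on them.

    Must be called from the main thread. Because SIGTSTP is caught here,
    only the children are suspended, not this (parent) process.
    """
    for signum in signums:
        signal.signal(signum, lambda sig, _frame: forward_signal(children, sig))


if __name__ == "__main__":
    # Two dummy "tasks" that would otherwise sleep for 30 seconds.
    tasks = [subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
             for _ in range(2)]
    install_forwarding(tasks)
    forward_signal(tasks, signal.SIGTERM)  # as if mpiexec had received SIGTERM
    print([t.wait() for t in tasks])  # [-15, -15] on Linux
```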
This version of mpiexec supports the simple Process Manager Interface, version 1.1. Thus, it is able to start up and control any PMI-compatible MPI application. The PMI protocol is the default interface of the mpd daemon. By supporting this protocol, the ParaStation MPI daemon psid(8) is a complete replacement for mpd.
The PMI implementation is completely transparent to the user. ParaStation MPI uses it by default.
If at least one task uses the PMI protocol, the proper startup of all tasks is ensured by a global PMI barrier. This barrier is monitored by a timeout (see option --pmitimeout). If at least one task of the application fails to connect to the PMI within this timeout, the entire application will be terminated.
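The startup check can be modeled with Python's threading.Barrier, whose timeout puts the barrier into a broken state for every waiting party. This is only an analogy. Threads stand in for tasks, startup_succeeds is a hypothetical name, and a value of -1 is mapped to "no timeout" as with --pmitimeout.

```python
import threading


def startup_succeeds(num_tasks, joining, pmitimeout):
    """Model the initial PMI barrier: True if all num_tasks tasks join in time.

    joining is the number of tasks that actually reach the barrier.
    pmitimeout == -1 disables the timeout; in that case missing tasks
    make this function block forever, like a barrier without a timeout.
    """
    timeout = None if pmitimeout == -1 else pmitimeout
    barrier = threading.Barrier(num_tasks, timeout=timeout)
    failed = []

    def task():
        try:
            barrier.wait()
        except threading.BrokenBarrierError:
            failed.append(True)  # barrier broken: the job would be terminated

    threads = [threading.Thread(target=task) for _ in range(joining)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return joining == num_tasks and not failed


print(startup_succeeds(4, 4, 5.0))  # True
print(startup_succeeds(4, 3, 0.2))  # False, after about 0.2 seconds
```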
Using the option --hostfile or --machinefile, the mpiexec command will read the list of nodes to be used from the specified file. The file lists one node name per line. Separated by white space, the option ifhn=subnet may be added to each node, defining the desired subnet for the MPI traffic of this particular node. subnet may be a subnet or the IP address of a node's interface. If this subnet is not available or suitable, the communication subsystem may use a different subnet or interface for MPI communication.
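The host file format described above can be read with a few lines of Python. This sketch assumes whitespace-separated fields and ignores anything other than an ifhn= option. The name parse_hostfile and the sample host names and addresses are hypothetical.

```python
def parse_hostfile(text):
    """Return a list of (host, subnet) pairs; subnet is None without ifhn=."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        host, subnet = fields[0], None
        for option in fields[1:]:
            if option.startswith("ifhn="):
                subnet = option[len("ifhn="):]
        entries.append((host, subnet))
    return entries


sample = "node01 ifhn=10.0.1.5\nnode02\n"
print(parse_hostfile(sample))  # [('node01', '10.0.1.5'), ('node02', None)]
```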
To improve the readability of the output of parallel applications, identical output lines may be merged and printed only once using the --merge option. Output lines have to be delimited by a \n. If the line merging option is enabled, the logger process buffers all lines written to stdout and stderr by each task in order to compare them against each other and thereby identify identical lines. If a line is found to be identical for all tasks, it is written to stdout or stderr, respectively. The search depth may be set using the --mergedepth option.
Each output line is internally marked with a timestamp and monitored by a timeout. After this timeout, all identical lines read up to now are combined and written to stdout or stderr. The line order will be preserved. The timeout improves the feedback to the user, even if one or more tasks delay their output for a long period of time. The timeout may be set using the option --mergetimeout.
When merging output, each line written to stdout or stderr is prefixed with a list of ranks enclosed in brackets, like [0-13,15]. In this example, assuming a job running on 16 CPUs, an identical line was read from all ranks except rank 14. The output of rank 14 may be listed after a certain period of time, prefixed with [14], or may be missing completely; this depends on the application. While merging output lines, no output is ever suppressed: all lines will be output.
In general, output merging should not be used when saving binary output data. In particular, the characters \n and \0 are swallowed.
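The effect of --merge in a simple case can be sketched in Python. Lines at the same position in each rank's output are grouped, and each distinct line is printed once behind the compressed list of ranks that produced it. This is a deliberate simplification, not the logger's algorithm. It ignores --mergedepth, --mergetimeout and timestamps, and the helper names and the single space after the bracket are assumptions.

```python
def format_ranks(ranks):
    """Compress sorted ranks, e.g. [0, 1, 2, 5] -> "0-2,5"."""
    parts = []
    start = prev = ranks[0]
    for rank in ranks[1:] + [None]:  # None flushes the last run
        if rank is not None and rank == prev + 1:
            prev = rank
            continue
        parts.append(f"{start}-{prev}" if start != prev else str(start))
        if rank is not None:
            start = prev = rank
    return ",".join(parts)


def merge_output(outputs):
    """outputs[r] is the list of lines written by rank r."""
    merged = []
    longest = max((len(lines) for lines in outputs), default=0)
    for i in range(longest):
        groups = {}  # line text -> ranks that wrote it at position i
        for rank, lines in enumerate(outputs):
            if i < len(lines):
                groups.setdefault(lines[i], []).append(rank)
        for text, ranks in groups.items():
            merged.append(f"[{format_ranks(ranks)}] {text}")  # nothing is dropped
    return merged


outputs = [["hello"] for _ in range(16)]
outputs[14] = ["late"]
for line in merge_output(outputs):
    print(line)
# [0-13,15] hello
# [14] late
```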
In addition to parallel or serial compute jobs, the mpiexec command may also be used to spawn administrative tasks. These tasks are not counted within the ParaStation MPI resource management and may therefore be run on nodes already in use.
To run an administrative task, use the option --admin. This will also enable the output merging capabilities; see option --merge. For example, the command
mpiexec --admin --hosts=node1,node2,node3,node4 date
will run the date command on the nodes node1 to node4. You have to supply a list of nodes using the --hosts, --nodes or --hostfile option. The number of processes will be determined automatically from the number of nodes. Only members of the adminuser or admingroup list are allowed to run administrative tasks. Root is typically a member of the adminuser list.
Parallel and serial jobs may be run under the control of gdb using the command line argument --gdb. By default, the standard input is redirected to each process (see option --inputdest=all) and the output of all processes is merged (--merge) to improve readability. The initial PMI timeout is also disabled (--pmitimeout).
Each task is controlled by its own gdb instance. All gdb instances will be controlled in parallel. As with gdb, the run command must be issued to actually run the application. All optional arguments from the mpiexec command line will be used by gdb. To supply arguments to the parallel tasks, use the gdb command run args or the gdb option -args.
To re-route the standard input to a particular task or a set of tasks, use the pseudo command [rank], where rank is a single rank or a comma-separated list of ranks. Again, to redirect the input to all tasks, use the command [all].
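The routing of standard input via the [rank] and [all] pseudo commands can be sketched as follows. The sketch assumes that a pseudo command occupies a whole input line and accepts only single ranks separated by commas, as described above. The name route_input is hypothetical and not the logger's real interface.

```python
import re


def route_input(line, current, num_ranks):
    """Return (targets, payload) for one line of standard input.

    A pseudo command such as "[1,3]" or "[all]" changes the set of target
    ranks and is not forwarded (payload is None); any other line is
    forwarded unchanged to the current targets.
    """
    match = re.fullmatch(r"\[(all|[0-9, ]+)\]\s*", line)
    if match is None:
        return current, line
    spec = match.group(1)
    if spec == "all":
        return set(range(num_ranks)), None
    return {int(r) for r in spec.split(",") if r.strip()}, None


targets = set(range(4))
targets, payload = route_input("[1,3]\n", targets, 4)
print(sorted(targets), payload)  # [1, 3] None
targets, payload = route_input("run\n", targets, 4)
print(sorted(targets), repr(payload))  # [1, 3] 'run\n'
```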
The gdb prompt will automatically be set to "(gdb)\n". The newline is required by the merging option. Therefore, the input will be read from the beginning of a new line.
For security reasons, terminated processes may not be restarted. Restart the entire job instead.
Signals are handled as expected: all catchable signals are forwarded by the logger task to all foreground processes controlled by the local forwarder(s). Within gdb, the resulting action of those forwarded signals depends on the actual signal handling within gdb.
psmstart(1), ps_environment(7), process_placement(7), psid(8) and ParaStation MPI User's Guide.