ps_environment — ParaStation environment variables
Further variables modify the behavior of the logging facilities, which implement reliable forwarding of input and output.
The last section describes some less frequently used environment variables which affect the behavior of the MPIch system implementing the MPI interface on top of ParaStation.
The following environment variables are used during startup of parallel tasks or while distributing serial jobs throughout a cluster. Depending on their values, the cluster is split into virtual partitions and the load-balancing strategy is selected.
Defines the number of cores allocated for each process.
May be overwritten by
Only unused nodes are considered for spawning new processes. In addition, the nodes chosen for the current job will be locked for further jobs; consequently, no additional processes will be started on these nodes until the current job terminates.
This variable does not define how many processes of a job
will be placed per node. See also
set maxproc of
A list of environment variables which should be exported to
remote processes during spawning. Some environment variables
are exported by default:
In addition, all variables named
OMP_* are exported.
Therefore, the variable
PWD is set correctly for remote
processes. In addition, the environment used for partitioning the
propagated to remote processes.
Defines the nodes that form the partition on which new
processes are spawned. Depending on the variable
PSI_NODES_SORT the ordering may be relevant. If the
number of processes to spawn exceeds the number of nodes in the
partition, some nodes may get more than one process.
Space-separated list of hostnames on which new processes should
be spawned. Similar to
PSI_NODES, but with
hostnames instead of logical ParaStation node numbers. If
PSI_NODES is set too, it takes precedence over
The name of a file containing a list of nodes' hostnames which
should be used for spawning. Similar to
but the actual information is within the file instead of the
environment variable. If
PSI_HOSTS are set too, they take precedence over
PSI_HOSTFILE are evaluated in the given order. If
more than one of the discussed variables is set, only the first
one is used to create the partition. The remaining
ones are silently ignored.
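The precedence rule described above can be sketched as follows. This is only an illustration, not ParaStation code: the function name is hypothetical, and the evaluation order PSI_NODES, PSI_HOSTS, PSI_HOSTFILE is an assumption inferred from the precedence statements in the surrounding entries.

```python
# Assumed evaluation order, inferred from the text; not taken from ParaStation sources.
PARTITION_VARIABLES = ("PSI_NODES", "PSI_HOSTS", "PSI_HOSTFILE")


def select_partition_source(env):
    """Return (name, value) of the first partition variable that is set.

    `env` is any mapping of environment variable names to values,
    for example os.environ. Later variables are ignored silently.
    Returns None if none of the variables is set.
    """
    for name in PARTITION_VARIABLES:
        if name in env:
            return name, env[name]
    return None
```

For example, with both PSI_HOSTS and PSI_HOSTFILE present in `env`, the sketch returns the PSI_HOSTS entry and never looks at the host file.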
This variable controls the behavior of ParaStation when placing
processes on nodes.
If PSI_LOOP_NODES_FIRST is not defined,
ParaStation first tries to use all available CPUs on a
node for the current job.
If necessary, more processes will be placed on the next node.
If PSI_LOOP_NODES_FIRST is defined, ParaStation
places one process per node and, if more processes than
available nodes are requested, puts an
additional process on each node in turn until all processes
are placed or the placement cannot be fulfilled, e.g.
because not enough CPUs are available.
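The two placement strategies can be illustrated with a small simulation. It is a sketch under simplifying assumptions, not the ParaStation scheduler: the function name and the representation of a partition as a list of free CPU counts are hypothetical, and overbooking is not modeled.

```python
def place_processes(nprocs, free_cpus, loop_nodes_first):
    """Simulate process placement on a partition.

    free_cpus: free CPU count per node, in partition order.
    Returns a list with one node index per process, or None if the
    request cannot be fulfilled with the available CPUs.
    """
    remaining = list(free_cpus)
    placement = []

    if not loop_nodes_first:
        # Fill each node completely before moving on to the next one.
        for node, free in enumerate(remaining):
            take = min(free, nprocs - len(placement))
            placement.extend([node] * take)
        return placement if len(placement) == nprocs else None

    # Round robin: one process per node, then start over.
    while len(placement) < nprocs:
        progressed = False
        for node in range(len(remaining)):
            if len(placement) == nprocs:
                break
            if remaining[node] > 0:
                remaining[node] -= 1
                placement.append(node)
                progressed = True
        if not progressed:
            return None  # no free CPU left on any node
    return placement
```

With three nodes offering two free CPUs each, four processes end up on nodes 0, 0, 1, 1 in the first mode and on nodes 0, 1, 2, 0 in the second.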
This variable defines the sorting criterion used to
reorder the nodes forming a virtual partition. This
order is used to spawn remote processes. The following
values are recognized:
No sorting of nodes before a spawn request. The nodes are
used in round robin fashion as they are set in
The nodes are sorted by load before new processes are spawned. Therefore nodes with the least load are used first.
To be more specific, the load average over the last minute is
used as the sorting criterion, i.e. this option is equivalent
The nodes are sorted by the 1-minute load average.
This option is equivalent to
The nodes are sorted by the 5-minute load average.
The nodes are sorted by the 15-minute load average.
The nodes are sorted by the sum of the 1-minute load average and the number of running ParaStation processes. This leads to fair load balancing even if processes are started without notifying the ParaStation management facility.
The nodes are sorted by the number of running ParaStation processes before new processes are spawned. This is the default behavior.
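The sorting criteria listed above can be summarized in a short sketch. The mode labels used as dictionary keys, the node dictionaries, and the function name are all hypothetical and chosen only for illustration; they are not the values accepted by PSI_NODES_SORT.

```python
# Hypothetical mode labels mapped to sort keys; None means "keep the given order".
SORT_KEYS = {
    "unsorted": None,
    "load_1min": lambda n: n["load1"],
    "load_5min": lambda n: n["load5"],
    "load_15min": lambda n: n["load15"],
    "procs_plus_load": lambda n: n["procs"] + n["load1"],
    "procs": lambda n: n["procs"],  # described as the default behavior
}


def order_nodes(nodes, mode="procs"):
    """Return the nodes in the order in which they would be used for spawning."""
    key = SORT_KEYS[mode]
    if key is None:
        return list(nodes)
    # sorted() is stable, so nodes with equal keys keep their partition order.
    return sorted(nodes, key=key)


nodes = [
    {"name": "node0", "load1": 2.0, "load5": 1.0, "load15": 0.5, "procs": 0},
    {"name": "node1", "load1": 0.5, "load5": 1.5, "load15": 0.7, "procs": 3},
]
print([n["name"] for n in order_nodes(nodes, "load_1min")])
```

The combined criterion illustrates why it is called fair: node0 carries load that ParaStation did not start, and the sum still accounts for it.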
If defined, more processes per node will be placed than CPUs available, if necessary. If undefined, only as many processes will be placed on a node as unused CPUs (= number(CPU) - number(currently running processes)) are available.
set maxproc of
psiadmin(8), which takes precedence over
Defines the number of cores allocated per process. If undefined, defaults to 1.
Preceding arguments for remote processes. For example: use
to execute the process chain
<yourApplication> <yourArgs> on the
This parameter defines after how many
PMI_BARRIER_TMOUT cycles a job will be
terminated, if not all processes have joined the PMI
Defaults to 1.
The parameter should remain at the default value in production environments. Its primary use is diagnostic: it allows the user to observe slower clients joining a PMI barrier over multiple timeout periods. As such, the parameter helps administrators identify possible filesystem or network issues that occur on specific client nodes.
PMI barriers are totally unrelated to MPI barriers!
Barriers of this type are typically called during
The PMI_BARRIER_TMOUT variable defines the
delay (in seconds) allowed for each process to
successfully join a PMI barrier.
If not all processes joined, a corresponding warning is
printed to stdout.
If PMI_BARRIER_TMOUT is not set, the
timeout will be 60 sec + (number of processes * 0.5 µsec).
If it is set to -1, no barrier timeout is used and
the job will not terminate because any one process fails to join the
barrier.
If PMI_BARRIER_TMOUT is set to
num, then the timeout is set to
See also ParaStation MPI Administrator's Guide.
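The default timeout rule stated above is simple enough to check with a few lines of arithmetic. The sketch below covers only the two cases this text spells out completely (variable unset, and the value -1); the function names are hypothetical, and explicit positive values are deliberately left out.

```python
def default_barrier_timeout(nprocs):
    """Default timeout in seconds: 60 s plus 0.5 microseconds per process."""
    return 60.0 + nprocs * 0.5e-6


def effective_barrier_timeout(nprocs, setting=None):
    """Return the timeout in seconds, or None if no timeout applies.

    setting: value of PMI_BARRIER_TMOUT as a string, or None if unset.
    """
    if setting is None:
        return default_barrier_timeout(nprocs)
    if int(setting) == -1:
        return None  # barrier timeout disabled
    raise NotImplementedError("explicit timeout values are not covered by this sketch")
```

For a job with one million processes the default works out to 60.5 seconds, so the per-process term only matters for very large jobs.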
If set, suppress pinning of processes, even if enabled globally (value irrelevant).
If set, suppress binding to memory-node, even if enabled globally (value irrelevant).
These variables control the individual communication paths used
Communication paths may be different interconnects and / or
In addition, tuning variables for the particular communication
paths are listed.
The following table lists all currently available communication
paths in descending order.
Using these variables, transports may be prioritized or completely
Assigning a value of
0 to a variable
completely disables this communication path.
Assigning a value of
2 or more prioritizes
the path over all others.
Table 3. Variables controlling the pscom communication paths
|Variable name||Communication path||Description|
Used only for communication within a node.
Identical to the deprecated variable
|InfiniBand (libopenib)||Using UD|
|QsNet||Disabled by default.|
|ParaStation p4sock protocol||
Identical to the deprecated variable
Not all transports may be available at run time due to missing hardware or low level libraries. Furthermore, not all transports are enabled within the precompiled packages.
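The effect of the priority values described before the table (0 disables a path, 2 or more moves it ahead of the others) can be sketched as below. The path names, the function name, and the assumption that an unset variable behaves like the value 1 are illustrative only and are not taken from the pscom library.

```python
def order_paths(default_order, env):
    """Return the enabled communication paths, highest priority first.

    default_order: variable names in the default (descending) order.
    env: mapping of variable names to their string values.
    """
    # Assumption: an unset variable is treated as priority 1.
    priority = {name: int(env.get(name, "1")) for name in default_order}
    enabled = [name for name in default_order if priority[name] > 0]
    # Stable sort: paths with equal priority keep their default order.
    return sorted(enabled, key=lambda name: -priority[name])


# Hypothetical variable names, used only for this example.
paths = ["PATH_A", "PATH_B", "PATH_C"]
print(order_paths(paths, {"PATH_B": "0", "PATH_C": "2"}))
```

In this example PATH_B is disabled entirely and PATH_C is tried before PATH_A.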
Using this environment variable, it is possible to define the communication library to use, independent of the variables mentioned above. This library must match the currently available interconnect and protocol, otherwise an error will occur.
The library name must be specified using the full path and
A comma- or space-separated list of networks enabled to do
optimized ParaStation communication using the p4sock protocol or
network is a resolvable
hostname in the chosen network, the IP address of a host
in this network or the IP address of this network.
The corresponding network has to be bound to a NIC of the current node.
If PSP_NETWORK is set, each
network should be bound to a distinct
NIC. This card is then used for
communication operations. If more than one
network is given, the first one found to
be bound to a local NIC is used.
If PSP_NETWORK is not set, ParaStation uses the NIC bound to the IP address, the local hostname
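The selection rule for PSP_NETWORK (first listed network that is bound to a local NIC) can be sketched with Python's standard `ipaddress` module. This is an illustration only: the function name is hypothetical, the local interfaces are passed in by the caller instead of being queried from the system, and hostname entries are not resolved.

```python
import ipaddress
import re


def pick_interface(psp_network, local_interfaces):
    """Return the local interface matching the first usable list entry.

    psp_network: comma- or space-separated list of IP addresses, each
        either a host address in the network or the network address itself.
    local_interfaces: iterable of ipaddress interface objects
        (address plus prefix) describing the NICs of the current node.
    """
    entries = [e for e in re.split(r"[,\s]+", psp_network) if e]
    for entry in entries:
        addr = ipaddress.ip_address(entry)
        for iface in local_interfaces:
            # A host address and the network address both lie inside the network.
            if addr in iface.network:
                return iface
    return None


local = [
    ipaddress.ip_interface("10.0.0.5/24"),
    ipaddress.ip_interface("192.168.1.7/24"),
]
print(pick_interface("172.16.0.0, 192.168.1.0 10.0.0.1", local))
```

The first entry matches no local NIC, so the second entry decides, even though the third entry would also match a NIC.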
Retry counter for all
calls within the
listen() backlog length.
Only required for
version below version 5.0.34.
The actual backlog is the minimum of
net.core.somaxconn, defined by the
If set to 1, use "on demand" connections with
PSP_OPENIB. This means,
establish connections between ranks and allocate their
associated communication buffers when the first byte is
sent. This could cause application aborts at any time if the
application runs out of resources (e.g. a final all-to-one
communication pattern could fail)!
The default is to establish all connections at startup time
(inside MPI_Init()), which ensures that there are enough
resources available for all connections. If not,
MPI_Init() will fail.
These variables define the TCP buffer size used for TCP sockets. Defaults to 32k.
If set to 1 (default), the socket option
NODELAY will be used for TCP sockets.
control the size of the TCP backlog when listening for new connections.
If set to 1, call sched_yield() in polling loops instead of busy polling. This can improve shared memory performance considerably when more than one process runs per CPU core, but slows down communication in the common case of one process per core (see also overbooking).
Controls the path MTU of InfiniBand connections. Default is 3, which corresponds to 1024 bytes. (1 = 256 bytes, 2 = 512 bytes, 3 = 1024 bytes)
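The three documented path MTU codes follow a power-of-two pattern, which the following sketch makes explicit. The function name is hypothetical, and only the codes listed above are accepted; whether other codes exist is not stated here.

```python
# Codes and sizes exactly as listed in the text above.
DOCUMENTED_MTU_BYTES = {1: 256, 2: 512, 3: 1024}


def path_mtu_bytes(code=3):
    """Translate a documented path MTU code into a size in bytes."""
    if code not in DOCUMENTED_MTU_BYTES:
        raise ValueError(f"undocumented path MTU code: {code}")
    return DOCUMENTED_MTU_BYTES[code]
```

Each documented code doubles the size of the previous one, i.e. the size equals 128 shifted left by the code.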
These variables define the InfiniBand buffer counts used for InfiniBand connections. (Default = 16)
To modify the behavior of the logger and the forwarders controlling the remotely spawned processes, the following environment variables can be used:
If set, psilogger will forward all input to the process with the corresponding rank within the process group. The default is to give all available input to process 0.
If set, psilogger will print a message about the user and system time consumed by each process of the parallel task upon exit of this process.
If set, psilogger gives information about the source of the
received output, i.e. it will prefix every output with
“[id]:”, where id is the rank of the printing process
within the process group. Usually the id coincides with the
PSI_LOGGERDEBUG is also set, every
output is prefixed with “[id, len]”, where id is the
rank again and len is the length of the printed message in bytes.
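The two prefix formats can be mimicked as follows. This is a sketch of the formatting described above, not psilogger's implementation: the function name is hypothetical, and the single space between prefix and text as well as the use of UTF-8 for the byte count are assumptions.

```python
def tag_output(rank, text, with_length=False):
    """Prefix one piece of output with its source rank.

    with_length=False gives the "[id]:" form, with_length=True the
    "[id, len]" form, where len is the message length in bytes.
    """
    if with_length:
        nbytes = len(text.encode("utf-8"))  # assumption: UTF-8 encoded output
        return f"[{rank}, {nbytes}] {text}"
    return f"[{rank}]: {text}"


print(tag_output(3, "hello"))
print(tag_output(3, "hello", with_length=True))
```

Tagging of this kind makes interleaved output from many ranks attributable, at the cost of changing every output line.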
If set, psilogger will not print out the message “PSIlogger: done” at the end of a parallel run.
If set, psilogger gives debug output about connecting and detaching clients as well as received output from the clients.
If set, debug output of the psiforwarder about connected programs, received input and received output is printed.
The environment variables within this section might be used less frequently. They are mainly listed within this document for completeness.
Length (in bytes) of the largest message sent without rendezvous.
Define the method used in order to spawn remote processes. The possible values are:
Start remote processes with the ParaStation start mechanism.
This is the default. If
not set at all, ParaStation is used in order to spawn remote
Start remote processes with ssh(1).
MPID_PSP_HOSTS must be
Do not start any remote process. The remote processes must be started manually. A command-line template is printed to stdout.
This start mode is for debugging purposes only and should not be used by the end-user.
Comma separated list of hostnames. Used for
The environment variables within this section control the TCP bypass.
defines (among others) the path to the required
preload library to enable the TCP bypass. It must be set to
The environment variables within this section control the debug information output by ParaStation.
defines the debug mask controlling the process management information. The following bits are defined:
Table 4. PSI_DEBUGMASK flags
|PSC_LOG_PART||partitioning functions (i.e. PSpart_())|
|PSC_LOG_TASK||task structure handling (i.e. PStask_())|
|PSC_LOG_VERB||Various, less interesting messages|
|PSI_LOG_VERB||more verbose stuff, e.g. function calls|
These debug flags may be set as hex numbers, e.g.
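Combining several flags into one hexadecimal mask works as in the sketch below. The flag names are those of Table 4, but the bit values are placeholders invented for this example and are NOT the real ParaStation values; the function name is hypothetical as well. Consult the ParaStation headers or documentation for the actual bit assignments.

```python
# Placeholder bit values for illustration only; not the real ParaStation values.
HYPOTHETICAL_BITS = {
    "PSC_LOG_PART": 0x0001,
    "PSC_LOG_TASK": 0x0002,
    "PSC_LOG_VERB": 0x0004,
    "PSI_LOG_VERB": 0x0008,
}


def debug_mask(*flags):
    """OR the given flags together and return the mask as a hex string."""
    mask = 0
    for flag in flags:
        mask |= HYPOTHETICAL_BITS[flag]
    return hex(mask)


print(debug_mask("PSC_LOG_PART", "PSC_LOG_VERB"))
```

The resulting string is the kind of hex number that would be assigned to the debug mask variable.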
defines the debugging level for the ParaStation
psport4 library. Higher values
generally give more output.