After installing and configuring ParaStation MPI on each node of the cluster, the ParaStation MPI daemons can be started. These daemons set up all necessary communication relations and thus form the virtual cluster consisting of the available nodes.
The ParaStation MPI daemons are started using the psiadmin command. This command establishes a connection to the local psid. If this daemon is not already running, inetd will start it automatically.
If the daemon is not configured to be started automatically by xinetd, it must be started using
After connecting to the local psid daemon, this command will issue a prompt
To start up the ParaStation MPI daemons on all other nodes, use the add command:
The following status inquiry command
should list all nodes as "up". To verify that all nodes have installed the proper kernel modules, type
The command should report for all nodes all hardware types
Alternatively, it is possible to use the single command form of the psiadmin command:
# /opt/parastation/bin/psiadmin -s -c "list"
The command should be repeated until all nodes are up. The ParaStation MPI administration tool is described in detail in the corresponding manual page psiadmin(1).
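The repeated status query can be scripted. The following is a minimal sketch, not part of ParaStation MPI: the function names all_nodes_up and wait_for_nodes are hypothetical, and the check assumes that a node which is not yet available appears with the literal word "down" in the output of the list command. The exact output format should be checked against psiadmin(1) before relying on it.

```shell
#!/bin/sh
# Path as used in the text above; can be overridden through the environment.
PSIADMIN="${PSIADMIN:-/opt/parastation/bin/psiadmin}"

# Succeeds if the status text on stdin does not mention any node as "down".
# ASSUMPTION: down nodes are marked with the literal word "down".
all_nodes_up() {
    ! grep -qw "down"
}

# Repeat the status query until all nodes are up.
# $1: maximum number of attempts (default 30)
# $2: pause between attempts in seconds (default 2)
# Returns 0 once no node is reported as down, 1 if the attempts run out.
wait_for_nodes() {
    attempts="${1:-30}"
    delay="${2:-2}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # An empty or failed query is treated as "not ready yet".
        if output=$("$PSIADMIN" -s -c "list" 2>/dev/null) \
            && [ -n "$output" ] \
            && printf '%s\n' "$output" | all_nodes_up; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

After sourcing the file, a call such as wait_for_nodes 60 5 would poll for up to five minutes and return a non-zero status if some node is still reported as down.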
If some nodes are still marked as "down", the logfile /var/log/messages on the affected node should be inspected. Entries like "psid: ...." at the end of the file may report problems or errors.
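As a small illustration of this check, the sketch below prints only the most recent log entries that mention psid. The function name recent_psid_entries is hypothetical, and the pattern "psid:" is taken from the example entry quoted above; actual log lines may differ depending on the syslog configuration.

```shell
#!/bin/sh
# Print the last entries of a log file that mention "psid:".
# $1: log file to scan (default: /var/log/messages, as named in the text)
# $2: number of matching entries to show (default: 20)
recent_psid_entries() {
    logfile="${1:-/var/log/messages}"
    count="${2:-20}"
    grep 'psid:' "$logfile" | tail -n "$count"
}
```

Run on the node that is marked as "down", recent_psid_entries /var/log/messages 10 would show the ten latest psid messages. Reading /var/log/messages usually requires root privileges.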
After bringing up all nodes, the communication can be tested using
# /opt/parastation/bin/test_nodes -np nodes
Here, nodes has to be replaced by the actual number of nodes within the cluster. After a while, a result like
---------------------------------------
Master node 0
Process 0-31 to 0-31 ( node 0-31 to 0-31 ) OK
All connections ok
PSIlogger: done
should be reported. Of course, the number '31' will be replaced by the number of nodes given on the command line minus one, since numbering starts at 0.
In case of failure, test_nodes may continuously report results like
---------------------------------------
Master node 0
Process 0-2,4-6 to 0-7 ( node 0-2,4-6 to 0-7 ) OK
Process 3 to 0-6 ( node 3 to 0-6 ) OK
Process 7 to 0-2,4-7 ( node 7 to 0-2,4-7 ) OK
A detailed description of test_nodes can be found within the corresponding manual page test_nodes(1).
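The difference between the two outputs shown above can be checked automatically. The sketch below is only an illustration under the assumption that a successful run always prints the line "All connections ok", as in the first example; the function name connections_ok is hypothetical and not part of ParaStation MPI.

```shell
#!/bin/sh
# Classify test_nodes output read from stdin.
# Returns 0 if the line "All connections ok" was seen (success case above),
# and 1 otherwise, e.g. if only incomplete process ranges were reported.
connections_ok() {
    grep -q "All connections ok"
}

# Possible use (not executed here). Since test_nodes may keep reporting
# results in case of failure, the run is bounded with timeout(1), which is
# assumed to be available on the system:
#   timeout 60 /opt/parastation/bin/test_nodes -np 8 | connections_ok
```

A non-zero return value only indicates that the success line was not seen within the given time; the output itself still has to be inspected to find the affected nodes.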