Testing the installation

After installing and configuring ParaStation MPI on each node of the cluster, the ParaStation MPI daemons can be started. These daemons set up all necessary communication relations and thus form the virtual cluster consisting of the available nodes.

The ParaStation MPI daemons are started using the psiadmin command. This command establishes a connection to the local psid. If this daemon is not already up and running, the inetd (or xinetd) service will start it automatically.

Note

If the daemon is not configured to be automatically started by xinetd, it must be started using /etc/init.d/parastation start.

  # /opt/parastation/bin/psiadmin 

After connecting to the local psid daemon, this command will issue a prompt

  psiadmin>

To start up the ParaStation MPI daemons on all other nodes, use the add command:

  psiadmin> add

The following status inquiry command

  psiadmin> list

should list all nodes as "up". To verify that all nodes have installed the proper kernel modules, type

  psiadmin> list hw

For every node, the command should report all configured hardware types, e.g. p4sock, ethernet.

Alternatively, it is possible to use the single command form of the psiadmin command:

  # /opt/parastation/bin/psiadmin -s -c "list"

The command should be repeated until all nodes are up. The ParaStation MPI administration tool is described in detail in the corresponding manual page psiadmin(1).
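The repetition can be scripted. The following is a minimal sketch, not part of ParaStation MPI: the helper names all_nodes_up and wait_for_nodes are hypothetical, and the check assumes that the output of "list" contains the words "up" and "down" as separate words, as the text above suggests. The exact layout of that output is not shown in this section, so the check may need adjusting.

```shell
# Path taken from this section; set PSIADMIN beforehand to override it.
PSIADMIN="${PSIADMIN:-/opt/parastation/bin/psiadmin}"

# Succeeds if the status text on stdin mentions "up" at least once
# and never mentions "down" (assumed wording of the node list).
all_nodes_up() {
    status="$(cat)"
    printf '%s\n' "$status" | grep -qw "up" || return 1
    if printf '%s\n' "$status" | grep -qw "down"; then
        return 1
    fi
    return 0
}

# wait_for_nodes [max_tries] [delay_seconds]
# Repeats the single command form until all nodes are up or the
# number of tries is exhausted.
wait_for_nodes() {
    max_tries="${1:-30}"
    delay="${2:-2}"
    try=1
    while [ "$try" -le "$max_tries" ]; do
        if "$PSIADMIN" -s -c "list" 2>&1 | all_nodes_up; then
            return 0
        fi
        try=$((try + 1))
        sleep "$delay"
    done
    return 1
}
```

Calling wait_for_nodes 10 5 would poll up to ten times with a five second pause and return a non-zero status if some node is still not up afterwards.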

If some nodes are still marked as "down", the log file /var/log/messages on each of these nodes should be inspected. Entries like “psid: ....” at the end of the file may report problems or errors.
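To look only at the most recent psid entries, a small helper along the following lines may be useful. It is a sketch: the function name psid_log_tail is hypothetical, and it simply matches the string "psid" anywhere in a line, which may also pick up unrelated entries.

```shell
# psid_log_tail [logfile] [count]
# Prints the last <count> lines of the log file that mention psid.
# Defaults: /var/log/messages and 20 lines.
psid_log_tail() {
    logfile="${1:-/var/log/messages}"
    count="${2:-20}"
    grep "psid" "$logfile" | tail -n "$count"
}
```

Run on the affected node, psid_log_tail without arguments shows up to 20 matching lines from /var/log/messages. Reading that file typically requires root privileges.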

After bringing up all nodes, the communication can be tested using

  # /opt/parastation/bin/test_nodes -np nodes

where nodes has to be replaced by the actual number of nodes within the cluster. After a while, a result like

  ---------------------------------------
  Master node 0
  Process 0-31 to 0-31 ( node 0-31 to 0-31 ) OK
  All connections ok
  
  PSIlogger: done 

should be reported. Of course, the number '31' will be replaced by nodes-1, where nodes is the number of nodes given on the command line.
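In a script, the successful case can be recognized by the line "All connections ok" shown in the sample output. The sketch below assumes that this line appears verbatim; the helper name connections_ok is hypothetical.

```shell
# Reads test_nodes output on stdin and succeeds as soon as the
# line "All connections ok" is seen.
connections_ok() {
    grep -q "All connections ok"
}
```

A possible use is /opt/parastation/bin/test_nodes -np 32 | connections_ok. Since test_nodes may keep printing reports when connections are missing, limiting the run time, for example with the coreutils timeout command, is one way to keep such a script from blocking.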

In case of failure, test_nodes may continuously report results like

  ---------------------------------------
  Master node 0
  Process 0-2,4-6 to 0-7 ( node 0-2,4-6 to 0-7 ) OK
  Process 3 to 0-6 ( node 3 to 0-6 ) OK
  Process 7 to 0-2,4-7 ( node 7 to 0-2,4-7 ) OK 

In this example, the connections between nodes 3 and 7 are not reported as OK in either direction. A detailed description of test_nodes can be found within the corresponding manual page test_nodes(1).
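For larger clusters, the missing connections can be derived mechanically from such a report. The following sketch is not part of ParaStation MPI: the function name missing_connections is hypothetical, and it assumes that every report line has the form "Process <list> to <list> ... OK" as in the sample above, with lists made of comma-separated numbers and ranges.

```shell
# missing_connections <nodes>
# Reads a test_nodes report on stdin and prints every pair
# "source -> destination" that no line reports as OK.
# <nodes> is the number given to test_nodes via -np.
missing_connections() {
    awk -v n="$1" '
    # Expand a list such as "0-2,4-6" into the keys of array out.
    function expand(list, out,    parts, bounds, cnt, k, j) {
        cnt = split(list, parts, ",")
        for (k = 1; k <= cnt; k++) {
            if (split(parts[k], bounds, "-") == 2) {
                for (j = bounds[1] + 0; j <= bounds[2] + 0; j++)
                    out[j] = 1
            } else {
                out[parts[k] + 0] = 1
            }
        }
    }
    $1 == "Process" && $3 == "to" {
        split("", src); split("", dst)      # clear both arrays
        expand($2, src); expand($4, dst)
        for (s in src)
            for (d in dst)
                ok[s "," d] = 1
    }
    END {
        for (s = 0; s < n; s++)
            for (d = 0; d < n; d++)
                if (!((s "," d) in ok))
                    print s " -> " d
    }'
}
```

Fed with the failure report above and a node count of 8, this prints the two pairs 3 -> 7 and 7 -> 3. If several consecutive reports are piped in, a connection counts as present once any of them lists it as OK.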