Chapter 6. Troubleshooting

Table of Contents

Problem: psiadmin returns error
Problem: node shown as "down"
Problem: cannot start parallel task
Problem: bad performance
Problem: different groups of nodes are seen as up or down
Problem: cannot start process on front end
Warning issued on task startup
Problem: pssh fails
Problem: psid does not startup, reports port in use
Problem: processes cannot access files on remote nodes
Warning: PSI barrier timeout

This chapter provides hints for common problems seen while installing or using ParaStation MPI. Of course, more help will be provided by .

Problem: psiadmin returns error

When starting up the ParaStation MPI admin command psiadmin, an error is reported:

  # psiadmin
  PSC: PSC_startDaemon: connect() fails: Connection refused

Reason: the local ParaStation MPI daemon could not be contacted. Verify that the psid(8) daemon is up and running. Check whether the daemon is known to xinetd, i.e. whether a socket is listening on port 888:

  # netstat -ant | grep 888
  tcp        0      0 *:888                  *:*      LISTEN
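Note that grep 888 also matches other ports containing these digits, e.g. 8888. A stricter check might look like the following sketch. The helper name psid_port_listening is hypothetical and not part of ParaStation MPI, and the column layout is assumed to be that of Linux netstat -ant output (local address in the fourth column, state in the last one).

```shell
# Hypothetical helper, not shipped with ParaStation MPI.
# Reads `netstat -ant` output on stdin and succeeds only if a TCP socket
# is in state LISTEN on local port 888 exactly (not 8888, 1888, ...).
psid_port_listening() {
    awk '$1 ~ /^tcp/ && $4 ~ /[:.]888$/ && $NF == "LISTEN" { found = 1 }
         END { exit !found }'
}
```

It could be used as:

```shell
netstat -ant | psid_port_listening && echo "port 888 is listening"
```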

If no "listening" socket is reported, check that the ParaStation MPI daemon is configured within the xinetd(8) configuration. Check the file /etc/xinetd.d/psidstarter.

If this is ok, reload xinetd:

  # kill -HUP <pid of xinetd>
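If xinetd writes a pid file, the pid can be read from there. The following is only a sketch: the helper name hup_from_pidfile is hypothetical, and the pid file location used in the usage line (/var/run/xinetd.pid) is an assumption that depends on the distribution and on how xinetd was started.

```shell
# Hypothetical helper: send SIGHUP to the process whose pid is stored
# in the given pid file. Fails if the file cannot be read.
hup_from_pidfile() {
    pidfile=$1
    if [ ! -r "$pidfile" ]; then
        echo "cannot read pid file: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    kill -HUP "$pid"
}
```

Assuming the pid file path mentioned above, it would be called as root like this:

```shell
hup_from_pidfile /var/run/xinetd.pid
```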

If everything seems to be ok up to now, check for recent entries within the log file /var/log/messages. Be aware that the log facility can be changed using the LogDestination parameter within the config file parastation.conf, in which case the messages may end up in a different file. Look for lines like

  Mar 24 17:19:12 pan psid[7361]: Starting ParaStation DAEMON
  Mar 24 17:19:12 pan psid[7361]: Protocol Version 329
  Mar 24 17:19:12 pan psid[7361]:  (c) Cluster Competence \
    Center GmbH

These lines indicate a normal startup of the psid. Other messages may indicate problems found by the psid, e.g. errors within the configuration file.
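To pick the psid entries out of a busy log, a small filter may help. This is a sketch under assumptions: the helper name psid_startup_seen is hypothetical, and the log is assumed to use the syslog line format shown above ("host psid[pid]: message").

```shell
# Hypothetical helper: reads a syslog-style log on stdin, prints every
# line written by psid, and succeeds only if at least one of them is the
# "Starting ParaStation DAEMON" startup message.
psid_startup_seen() {
    awk '/ psid\[[0-9]+\]: / {
             print
             if (index($0, "Starting ParaStation DAEMON")) seen = 1
         }
         END { exit !seen }'
}
```

For example, assuming the default log file:

```shell
psid_startup_seen < /var/log/messages || echo "no psid startup message found"
```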