Appendix D. Sample action script

Figure D.3 shows a sample action script that sets a node offline in a resource management system, here Torque, if an error occurred.

The script called pbs_set_offline.sh should be copied to the directory /etc/parastation/healthcheck/testsets/all/actions. A symlink in the directory /etc/parastation/healthcheck/testsets/testset/actions should point to this script.
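
The installation step described above can be sketched as a small shell function. This is only a minimal sketch and not part of the ParaStation distribution: the function name install_action and its arguments are hypothetical, the base directory defaults to the path given above, and the assumption that the script must be executable is ours.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with ParaStation): copy an action script
# into the shared "all" test set and link it into one specific test set.
# Usage: install_action SCRIPT TESTSET [BASEDIR]
install_action() {
    local script="$1" testset="$2"
    local base="${3:-/etc/parastation/healthcheck/testsets}"
    local name
    name=$(basename "$script")

    # Create both actions directories if they do not exist yet
    mkdir -p "$base/all/actions" "$base/$testset/actions" || return 1
    cp "$script" "$base/all/actions/$name" || return 1
    # Assumption: the script has to be executable to be run as an action
    chmod +x "$base/all/actions/$name" || return 1
    # Relative link, so it stays valid if the tree is moved as a whole
    ln -sfn "../../all/actions/$name" "$base/$testset/actions/$name"
}

# Example (as root): install_action ./pbs_set_offline.sh testset
```

The optional third argument makes it possible to try the function in a scratch directory before touching /etc.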

After a failed healthcheck run with test set testset, the local node is set offline with an appropriate comment, such as:

  pshealthcheck - 2010-09-01 01:10:23 - 1/31 - ethernet_eth0 - prologue
    

Figure D.1. Example action script output


Use the command pbsnodes -ln to check for automatically off-lined nodes:

  # pbsnodes -ln
  jf49c02              offline                   pshealthcheck - 2010-09-01 \
    01:10:23 - 1/31 - ethernet_eth0 - prologue
  ...
    

Figure D.2. Sample pbsnodes output
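
To list only the nodes that were set offline by the health checker, the note can be filtered for the pshealthcheck prefix written by the action script. The following is a minimal sketch: the function name healthcheck_offlined is hypothetical, and it assumes the column layout of the sample output above (node name, state, note), with each node on a single line.

```shell
#!/bin/bash
# Hypothetical filter: reads `pbsnodes -ln` output on stdin and prints the
# names of offline nodes whose note starts with "pshealthcheck".
# Assumes the columns are: node name, state, note.
healthcheck_offlined() {
    awk '$2 ~ /offline/ && $3 == "pshealthcheck" { print $1 }'
}

# Example: pbsnodes -ln | healthcheck_offlined
```

Nodes that an administrator set offline manually with a different note are skipped by this filter.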


  #!/bin/bash
  #
  #               ParaStation Healthchecker
  #
  # Copyright (C) 1999-2004 ParTec AG, Karlsruhe
  # Copyright (C) 2005-2010 ParTec Cluster Competence Center GmbH,
  # Munich
  #
  # Variables exported by calling pshealthcheck:
  #   VERBOSE, LOGGING, TESTSET,
  #   TS_COUNT_OK, TS_COUNT_WARN, TS_COUNT_ERR,
  #   TS_LIST_OK, TS_LIST_WARN, TS_LIST_ERR


  MAX_PBS_NOTE="55"
  HOSTNAME=`hostname`
  FAILED_SCRIPTS="${TS_LIST_ERR//, /,}"

  # calculate statistics
  ((TS_COUNT_TOTAL=TS_COUNT_OK+TS_COUNT_WARN+TS_COUNT_ERR))

  # fetch the first word of any existing note; empty if the node has none
  msg=$(pbsnodes -ln "$HOSTNAME" 2>/dev/null | awk '{print $3}')

  # set the node offline
  [ "${VERBOSE:-0}" -ge 2 ] && echo "Setting node '$HOSTNAME' in pbs offline ..."

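  # write a new note only if the node has none, so an earlier note is kept
  if [ -z "$msg" ]; then
          # shorten the list of failed tests to fit MAX_PBS_NOTE characters
          if [ "${#FAILED_SCRIPTS}" -gt "$MAX_PBS_NOTE" ]; then
              FAILED_SCRIPTS="${FAILED_SCRIPTS:0:$MAX_PBS_NOTE-3}"
              FAILED_SCRIPTS="$FAILED_SCRIPTS..."
          fi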

          new_msg="pshealthcheck - `date +%Y-%m-%d\ %H:%M:%S`"
          new_msg="${new_msg} - (${TS_COUNT_ERR}/${TS_COUNT_TOTAL})"
          new_msg="${new_msg} - $FAILED_SCRIPTS - ts $TESTSET"

          pbsnodes -o -N "${new_msg}" "$HOSTNAME" || {
                  echo "ERROR: setting node offline failed!";
                  exit 2;
          }
  else
          pbsnodes -o "$HOSTNAME" || {
                  echo "ERROR: setting node offline failed!";
                  exit 2;
          }
  fi
    

Figure D.3. Example action script
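
The shortening of the failed-test list in the script above can be tried in isolation, without a Torque installation. The following sketch repeats the idea as a standalone function; shorten_note is a hypothetical name, and the default limit of 55 characters is taken from MAX_PBS_NOTE in the script.

```shell
#!/bin/bash
# Hypothetical standalone version of the note-shortening step.
# Usage: shorten_note LIST [MAXLEN]
shorten_note() {
    local list="${1//, /,}"        # drop the blank after each comma
    local max="${2:-55}"           # default mirrors MAX_PBS_NOTE
    if [ "${#list}" -gt "$max" ]; then
        list="${list:0:max-3}..."  # leave room for the trailing dots
    fi
    printf '%s\n' "$list"
}

shorten_note "ethernet_eth0, memory"               # fits, only compacted
shorten_note "ethernet_eth0, memory, disk_root" 20 # cut to 20 characters
```

The first call prints ethernet_eth0,memory; the second prints ethernet_eth0,mem..., which is exactly 20 characters long.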