Just as with a prologue script, an epilogue script can be used to run the
Healthchecker after a job terminates.
To do so, copy the following script to the file
/var/spool/torque/mom_priv/epilogue:
#!/bin/bash
# ParaStation Healthcheck
#
# Copyright (C) 2009 ParTec Cluster Competence Center GmbH, Munich
#
# Epilogue script arguments:
export PBS_JOBID=$1
export PBS_USER=$2
export PBS_GROUP=$3
export PBS_JOBNAME=$4
export PBS_SESSION_ID=$5
export PBS_LIMITS=$6
export PBS_RESSOURCES=$7
export PBS_QUEUE=$8
export PBS_ACCOUNT=$9
# start the ParaStation Healthchecker
#
# set timeout: [-t 240]
# log to syslog: [-l]
OLD_PATH=$PATH
export PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/parastation/bin
/opt/parastation/bin/pshealthcheck.ng -t 240 -l epilogue \
&> /tmp/pshealthcheck_epilogue.out
PATH=$OLD_PATH
# always exit with 0 to prevent Moab from setting the node down
exit 0
Figure E.2. Sample epilogue file
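The sample epilogue in Figure E.2 runs the Healthchecker with the TEST SET
epilogue.
It also overrides the TEST SET's timeout with 240 seconds (-t 240); as with
the prologue, such a long timeout is suitable for very large systems only.
Because of the -l option, each run is again recorded in the system log
(syslog), while the Healthchecker's output is written to
/tmp/pshealthcheck_epilogue.out.
Since the script always exits with status 0, a failing check does not cause
Moab to set the node down.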
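Installing the sample script means placing it at
/var/spool/torque/mom_priv/epilogue with permissions that pbs_mom accepts.
The following is a minimal sketch, not part of the ParaStation distribution:
the function name install_epilogue and the local file ./epilogue are
hypothetical, and the mode 0500 with root ownership reflects our
understanding that Torque only runs epilogue scripts that are owned by root,
executable, and not writable by group or others. Please check the Torque
documentation for your version.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with ParaStation): install an epilogue
# script with the permissions Torque is assumed to require.
#
# Usage: install_epilogue SRC [DEST]
# DEST defaults to the path used in this section.
install_epilogue() {
    local src=$1
    local dest=${2:-/var/spool/torque/mom_priv/epilogue}

    # Refuse to continue if the source script does not exist.
    if [ ! -f "$src" ]; then
        echo "install_epilogue: $src not found" >&2
        return 1
    fi

    # Copy the script, then make it readable and executable by its
    # owner only (assumption: group/other-writable epilogues are ignored).
    cp "$src" "$dest" || return 1
    chmod 0500 "$dest" || return 1

    # Root ownership is assumed to be required; changing the owner
    # is only possible when running as root.
    if [ "$(id -u)" -eq 0 ]; then
        chown root "$dest" || return 1
    fi
}
```

Run as root, install_epilogue ./epilogue would copy the saved script to the
default location; passing a second argument installs it elsewhere, for
example into a scratch directory for a dry run.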