Just as a prologue script can run the Healthchecker before a job starts,
an epilogue script can run it after a job terminates.
To do so, copy the following epilogue script to the file
/var/spool/torque/mom_priv/epilogue:
#!/bin/bash
# ParaStation Healthcheck
#
# Copyright (C) 2009 ParTec Cluster Competence Center GmbH, Munich
#
# Epilogue script arguments:
export PBS_JOBID=$1
export PBS_USER=$2
export PBS_GROUP=$3
export PBS_JOBNAME=$4
export PBS_SESSION_ID=$5
export PBS_LIMITS=$6
export PBS_RESSOURCES=$7
export PBS_QUEUE=$8
export PBS_ACCOUNT=$9

# start the ParaStation Healthchecker
#
# set timeout: [-t 240]
# log to syslog: [-l]
OLD_PATH=$PATH
export PATH=/sbin:/bin:/usr/sbin:/usr/bin:/opt/parastation/bin
/opt/parastation/bin/pshealthcheck.ng -t 240 -l epilogue \
    &> /tmp/pshealthcheck_epilogue.out
PSHC_PID="$!"
PATH=$OLD_PATH

# always exit with 0 to prevent setting the node down in moab
exit 0
Figure E.2. Sample epilogue file
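TORQUE generally only executes prologue and epilogue scripts that are owned by root, executable by root, and not writable by other users. The following is a minimal sketch of an installation step under that assumption. The function name install_epilogue and its optional second argument are hypothetical and not part of ParaStation or TORQUE; only the default target path is taken from the text above.

```shell
#!/bin/bash
# Hypothetical helper (not part of ParaStation or TORQUE):
# copy an epilogue script into place with restrictive permissions.
#
# Usage: install_epilogue <source-script> [target-file]
install_epilogue() {
    local src="$1"
    # Default target is the path named in this section.
    local dest="${2:-/var/spool/torque/mom_priv/epilogue}"

    # Mode 0500: readable and executable by the owner, no access for others.
    # When run as root, the installed file is owned by root.
    install -m 0500 "$src" "$dest"
}
```

Run as root, a call such as `install_epilogue ./epilogue` would place the script at the default location. The second argument exists only so the step can be tried against a scratch directory first.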
The above example runs the Healthchecker with the TEST SET epilogue.
It also redefines the TEST SET's timeout to 240 seconds, a value that
is again suitable only for very large systems.
As before, each run is recorded in the system log (option -l).
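Before relying on the epilogue in production, it may help to invoke it by hand and confirm that it returns 0. The sketch below passes nine placeholder values in the order the sample script reads them ($1 to $9). The function name run_epilogue_dry and all argument values are invented for illustration; only the default script path comes from this section.

```shell
#!/bin/bash
# Hypothetical dry run (illustration only): call an epilogue script by hand
# with nine placeholder arguments and report its exit status.
#
# Usage: run_epilogue_dry [epilogue-script]
run_epilogue_dry() {
    local script="${1:-/var/spool/torque/mom_priv/epilogue}"

    # Argument order as read by the sample script:
    # job id, user, group, job name, session id,
    # limits, resources, queue, account (all values are made up)
    "$script" 12345.server testuser testgroup testjob 4242 \
        "walltime=00:01:00" "cput=00:00:01" batch testaccount
    local rc=$?

    echo "epilogue exit status: $rc"
    return "$rc"
}
```

With the sample script installed, the Healthchecker output of such a run should end up in /tmp/pshealthcheck_epilogue.out, the file the script redirects to.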