ParaStation Healthchecker Administrator's Guide

Release 1.0.1-2

September 2010

Reproduction in any manner whatsoever without the written permission of ParTec Cluster Competence Center GmbH is strictly forbidden.

All rights reserved. ParTec and ParaStation are registered trademarks of ParTec Cluster Competence Center GmbH. The ParTec logo, the ParaStation logo and the ParaStation Healthchecker logo are trademarks of ParTec Cluster Competence Center GmbH. Linux is a registered trademark of Linus Torvalds. All other marks and names mentioned herein may be trademarks or registered trademarks of their respective owners.

This document provides detailed information about the ParaStation Healthchecker. It explains in depth how to install and configure the Healthchecker and how to use its commands.

Though it may seem hard to believe, this manual might contain errors. We welcome any reports on errors or problems that are found. We also would appreciate suggestions on improving this book. Please direct all comments and problems to .

The most up-to-date version of this document is available at http://docs.par-tec.com.

 

Share your knowledge with others. It's a way to achieve immortality.

 -- Dalai Lama


Table of Contents

1. Preface
    About this book
    This book's audience
    Healthchecker overview
2. Healthchecker description
    Introduction
    Terms
    Framework
3. Installing the Healthchecker
    Prerequisites
    Installing the software package
4. Configuring the Healthchecker
    General configuration
    Configuring the test
    Test set configuration
    Alternate test configuration
    Configuring actions
5. Running the Healthchecker
I. Reference Pages
    healthcheck.conf - ParaStation Healthchecker: global configuration file
    pshealthcheck - ParaStation Healthchecker
    pshcgetconf - ParaStation Healthchecker Configuration Reader
A. List of implemented checks
B. Extending the Healthchecker
    Adding new tests
    Adding new actions
    Adding new test sets
C. How to determine a node's class
D. Sample action script
E. Including the Healthchecker in a resource management system
    Using the Healthchecker within a job's prologue
    Using the Healthchecker within a job's epilogue
Glossary

List of Figures

3.1. Installing package
4.1. Example healthcheck.conf file
4.2. Example tests.conf file
4.3. Example testset.conf file
4.4. Example test script
5.1. Example pshealthcheck output
5.2. Example verbose pshealthcheck output
5.3. Example pshcgetconf output
5.4. Example pshcgetconf -l output
C.1. psconfig calling example
D.1. Example action script output
D.2. Sample pbsnodes output
D.3. Example test script
E.1. Sample prologue file
E.2. Sample epilogue file