The Healthchecker framework consists of a set of configuration files that define the individual tests. The command pshcgetconf analyzes this configuration together with the node's role and returns all information that applies to the local node. This command is called implicitly by pshealthcheck, which performs the actual checks.
New tests may be added by extending an existing configuration file or by adding a new configuration file. These tests may use existing checks with adapted parameters or make use of new checks. Each check within a test may be monitored using a timeout parameter, so the check itself does not have to deal with hanging commands. If a command exceeds its timeout, it is killed by the framework itself.
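The timeout handling described above can be illustrated with a minimal Python sketch. It is not the Healthchecker implementation; the function name run_check and its return convention are assumptions made for this illustration only.

```python
import subprocess
import sys


def run_check(argv, timeout_s):
    """Run one check command and stop it if it exceeds timeout_s seconds.

    Returns (exit_code, timed_out); exit_code is None after a timeout.
    This helper is hypothetical and only mimics the behavior described above.
    """
    try:
        proc = subprocess.run(argv, timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point.
        return None, True


# Demo: a child that would sleep for 30 s is stopped after half a second.
hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_check(hanging, timeout_s=0.5))  # prints (None, True)
```

Because the caller enforces the limit, the check command needs no timeout logic of its own.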
New checks may be added by creating additional commands within the checks directory (see the section called “Configuring the test” for details). Every check has to return 0, 1 or 2, depending on the result.
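As an illustration of a check that reports its result through one of the three return values, consider the following Python sketch. The meaning assigned to each value here (0 = OK, 1 = warning, 2 = error) is an assumption, as are the thresholds and the choice of free disk space as the property being checked; consult the referenced section for the actual conventions.

```python
import shutil

# Assumed meaning of the three return values; the text above only lists the values.
OK, WARNING, ERROR = 0, 1, 2


def classify(free_fraction, warn_below=0.20, error_below=0.05):
    """Map the free fraction of a file system to one of the three return values."""
    if free_fraction < error_below:
        return ERROR
    if free_fraction < warn_below:
        return WARNING
    return OK


def main(path="/"):
    """Check the file system holding `path` (hypothetical example check)."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return ERROR  # nothing sensible to report
    return classify(usage.free / usage.total)
```

An executable check built on this sketch would pass the value of main() to sys.exit() so that it becomes the exit status of the command.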
Every test may be associated with a list of nodes or node classes. Refer to Appendix C on how to determine a node's class. This makes it possible to use an identical configuration on all nodes, because the tests actually run on a node are selected by the node's name or class. A consistent configuration across all nodes simplifies configuration management considerably.
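The selection by node name or class can be pictured with the following Python sketch. The data layout (a mapping from test name to a list of node names and classes), the function name select_tests and all test, node and class names are hypothetical; the sketch also assumes that every test carries an explicit list and does not cover tests without one.

```python
def select_tests(test_targets, node_name, node_classes):
    """Return the tests whose target list names this node or one of its classes."""
    classes = set(node_classes)
    selected = []
    for test_name, targets in test_targets.items():
        targets = set(targets)
        if node_name in targets or classes & targets:
            selected.append(test_name)
    return selected


# One configuration, shared by all nodes (all names are made up).
config = {
    "memory": ["compute", "login"],
    "infiniband": ["compute"],
    "scratch_fs": ["node017"],
}

print(select_tests(config, "node017", ["compute"]))  # ['memory', 'infiniband', 'scratch_fs']
print(select_tests(config, "login01", ["login"]))    # ['memory']
```

The same configuration yields a different test list on each node, which is why it can be distributed unchanged.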
Each test has to be a member of at least one test set. For each test set, action scripts may be defined within the test set's action subdirectory. All action scripts are called after running all tests for a test set, regardless of the result.
The actual result is provided within the variable
Refer to Appendix B for details on how to add a new test.
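The calling order of tests and action scripts within a test set can be sketched in Python as follows. This is a simplified model, not the framework's code: tests and actions are plain callables, the overall result is assumed to be the highest value returned by any test, and the result is handed to each action as a function argument instead of through the variable used by the framework.

```python
def run_test_set(tests, actions):
    """Run every test of a test set, then call every action with the result.

    tests:   mapping of test name to a callable returning 0, 1 or 2
    actions: list of callables taking (overall_result, per_test_results)
    """
    results = {name: run() for name, run in tests.items()}
    # Assumption: the worst (highest) value determines the overall result.
    overall = max(results.values(), default=0)
    for action in actions:
        # Actions run unconditionally, whether the tests passed or failed.
        action(overall, results)
    return overall
```

An action that should react only to failures therefore has to inspect the result itself.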