Healthchecker overview

The ParaStation Healthchecker offers tools and methods to easily verify a running cluster node against a pre-defined configuration. To do so, it runs a set of tests on the node, collects node-specific values, and compares them against a set of pre-defined parameters.
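
The test-and-compare idea can be illustrated with a minimal Python sketch. It is not the Healthchecker's actual implementation or configuration format: the Check class, the run_checks function, and the check names and thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Check:
    """One hypothetical test: a measured value and its pre-defined minimum."""
    name: str
    measure: Callable[[], float]  # returns the node-specific value
    minimum: float                # pre-defined parameter to compare against


def run_checks(checks: List[Check]) -> List[str]:
    """Run every check and return the names of those that failed."""
    failed = []
    for check in checks:
        value = check.measure()
        if value < check.minimum:
            failed.append(check.name)
    return failed


# Illustrative values only; a real check would query the node itself.
example_checks = [
    Check("memory_gib", lambda: 64.0, minimum=64.0),
    Check("free_scratch_gib", lambda: 3.5, minimum=10.0),
]
failed_checks = run_checks(example_checks)
print(failed_checks)
```

A node would be considered healthy in this sketch only if the returned list is empty.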

If a node does not match this configuration, configurable actions may be performed to minimize the impact of the failed node on the entire cluster system. These actions may be as simple as sending an email to the administrator or shutting down the node, or as complex as setting the node offline within the resource management system, opening a new trouble ticket, and forwarding this ticket to an external help desk. As a consequence, the node may be logically and even physically removed from the cluster without any intervention from the administrator. This enables unattended operation of the cluster system.
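
As a rough illustration of configurable actions, the following Python sketch maps action names to handlers and runs the configured ones for a failed node. Everything in it is an assumption made for the example: the action names, the ACTIONS table, and handle_failure are hypothetical, and the handlers only record what they would do instead of sending mail or contacting a resource manager.

```python
from typing import Callable, Dict, List, Optional


# Hypothetical handlers: each one only records the step it would perform.
def notify_admin(node: str, log: List[str]) -> None:
    log.append(f"mail administrator about {node}")


def set_offline(node: str, log: List[str]) -> None:
    log.append(f"set {node} offline in resource manager")


def open_ticket(node: str, log: List[str]) -> None:
    log.append(f"open trouble ticket for {node}")


# Assumed mapping from configured action names to handlers.
ACTIONS: Dict[str, Callable[[str, List[str]], None]] = {
    "notify_admin": notify_admin,
    "set_offline": set_offline,
    "open_ticket": open_ticket,
}


def handle_failure(node: str, action_names: List[str],
                   log: Optional[List[str]] = None) -> List[str]:
    """Run the configured actions for a failed node, in order."""
    if log is None:
        log = []
    for name in action_names:
        if name not in ACTIONS:
            raise ValueError(f"unknown action: {name}")
        ACTIONS[name](node, log)
    return log


steps = handle_failure("node01", ["set_offline", "open_ticket"])
print(steps)
```

Keeping the actions in a lookup table means the response to a failure can be changed by editing the configured list of names, without touching the checks themselves.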

The ParaStation Healthchecker supports all major Linux distributions and system configurations. It is designed to be easily adjustable to different needs: new checks may be added, and existing checks may be omitted.