The Healthchecker is run with the command pshealthcheck. At least the name of a test set must be provided as an argument. For the options of pshealthcheck, refer to pshealthcheck(1), Synopsis.
# pshealthcheck manual
[++] Testset "manual": Total 37, Ok 37, Warn 0, Err 0 (Timeout 0)
Figure 5.1. Example pshealthcheck output
As the example shows, the Healthchecker prints a summary for the test set:
the total number of tests and the number of tests that succeeded, raised a
warning, failed, or timed out.
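When the Healthchecker is called from a script, the counters in the summary line may need to be extracted. The following is a minimal sketch, assuming the summary line has exactly the format shown in Figure 5.1. The function name hc_summary_counts is hypothetical and not part of the Healthchecker.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Healthchecker.
# Reads Healthchecker output on stdin and prints the counters of the
# summary line as "total ok warn err timeout".
# Assumption: the summary line looks exactly like the one in Figure 5.1.
hc_summary_counts() {
    sed -n 's/.*Testset "[^"]*": Total \([0-9]*\), Ok \([0-9]*\), Warn \([0-9]*\), Err \([0-9]*\) (Timeout \([0-9]*\)).*/\1 \2 \3 \4 \5/p'
}

# Example using the line from Figure 5.1. On a node with the Healthchecker
# installed, the output of "pshealthcheck manual" could be piped in instead.
line='[++] Testset "manual": Total 37, Ok 37, Warn 0, Err 0 (Timeout 0)'
printf '%s\n' "$line" | hc_summary_counts
# prints: 37 37 0 0 0
```

Lines that do not match the assumed summary format are silently ignored, so the function can be fed verbose output as well.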
By default, the Healthchecker will run all tests and actions defined for
the given test set.
To run only the tests, without running any actions, use the option --dry-run.
To list all tests actually performed, provide the option -v.
# pshealthcheck -v manual
[++] ["bios_date compute"]
[++] Checking for bios date...
[++] ["bios_version compute"]
[++] Checking for bios version...
[++] ["cpu_count 8"]
[++] Checking for number of cpus...
[++] ["cpu_type"]
[++] Checking for cpu types...
[++] ["daemons common"]
[++] Checking for daemons...
[++] ["daemons parastation"]
[++] Checking for daemons...
[++] ["daemons pbs_mom"]
[++] Checking for daemons...
[++] ["disc_free"]
[++] Checking for available disc space...
[++] ["disc_smart"]
[++] Checking for smart...
[++] ["infiniband_counters compute"]
[++] Checking for infiniband error counters...
[++] ["infiniband_phy_state"]
[++] Checking for physical infiniband state...
[++] ["infiniband_speed"]
[++] Checking for infiniband speed...
[++] ["infiniband_state ACTIVE"]
[++] Checking for infiniband state...
[++] ["ipmi"]
[++] Checking for ipmi...
...
[++] Testset "manual": Total 37, Ok 37, Warn 0, Err 0 (Timeout 0)
Figure 5.2. Example verbose pshealthcheck output
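The verbose output can be reduced to a plain list of the tests that were performed. The sketch below assumes that each test is announced as a quoted name in square brackets, as in Figure 5.2. The function name hc_test_names is hypothetical, and grep -o is a widely available extension rather than a POSIX option.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Healthchecker.
# Reads verbose Healthchecker output on stdin and prints one test name
# per line.
# Assumption: tests are announced as ["<name>"], as shown in Figure 5.2.
hc_test_names() {
    grep -o '\["[^"]*"]' | sed 's/^\["//; s/"]$//'
}

# Example using a few lines from Figure 5.2. On a node with the
# Healthchecker installed, "pshealthcheck -v manual" could be piped in.
printf '%s\n' \
    '[++] ["bios_date compute"]' \
    '[++] Checking for bios date...' \
    '[++] ["cpu_count 8"]' \
    '[++] Checking for number of cpus...' | hc_test_names
```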
To show all configured tests for a test set, run the command pshcgetconf.
# pshcgetconf manual
[<Test Name>] '<Test Command>' '<Test Timeout>' '<Kill Wait Time>'
==================================================================
[bios_date admin:gpfs:login] \
    '/opt/parastation/lib/checks/bios_date.sh "05/04/2009"' '0' '1'
[bios_version admin:gpfs:login] \
    '/opt/parastation/lib/checks/bios_version.sh "R4232X20"' '0' '1'
[daemons common] '/opt/parastation/lib/checks/daemons.sh \
    "syslog-ng" "ntpd" "sshd"' '0' '1'
[daemons admin] '/opt/parastation/lib/checks/daemons.sh "pscollect" \
    "dhcpd" "named"' '0' '1'
[daemons parastation] '/opt/parastation/lib/checks/daemons.sh \
    "xinetd" "psid"' '0' '1'
[kernel admin:sm] '/opt/parastation/lib/checks/kernel.sh \
    "2.6.27.19-5-default"' '0' '1'
[md5_sum admin] '/opt/parastation/lib/checks/md5_sum.sh \
    ${HC_CONF_DIR}/data/admin.md5sum' '0' '1'
[memory_size] '/opt/parastation/lib/checks/memory_size.sh "24"' \
    '0' '1'
[mounts common] '/opt/parastation/lib/checks/mounts.sh "boot" "tmp" \
    "var"' '0' '1'
[nameserver] '/opt/parastation/lib/checks/nameserver.sh' '0' '1'
[net_counters] '/opt/parastation/lib/checks/net_counters.sh' '0' '1'
[net_ports_tcp common] '/opt/parastation/lib/checks/net_ports_tcp.sh \
    "sshd=22"' '0' '1'
[net_ports_tcp admin] '/opt/parastation/lib/checks/net_ports_tcp.sh \
    "pscollect=4000" "named=53"' '0' '1'
...
Figure 5.3. Example pshcgetconf output
To print all configured test sets, use the command pshcgetconf -l:
# pshcgetconf -l
epilogue ib-test manual mce-stats mem-test prologue reboot stress-test
Figure 5.4. Example pshcgetconf -l output
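The two forms of pshcgetconf can be combined to print the configured tests of every test set in one pass. This is a minimal sketch, assuming that pshcgetconf -l prints the test set names separated by whitespace and that the names contain no whitespace themselves. The function name hc_dump_all is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the Healthchecker.
# Prints the configuration of every configured test set, each preceded
# by a header line naming the test set.
# Assumption: "pshcgetconf -l" prints whitespace-separated test set names.
hc_dump_all() {
    for testset in $(pshcgetconf -l); do
        printf '=== %s ===\n' "$testset"
        pshcgetconf "$testset"
    done
}

# Usage on a node with the Healthchecker installed:
#   hc_dump_all
```

The command substitution is left unquoted on purpose so that the shell splits the list into individual test set names.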