Chapter 5. Running the Healthchecker

The Healthchecker is run with the command pshealthcheck. At least the name of a test set must be given as an argument. For the available options, refer to the Synopsis section of pshealthcheck(1).

  # pshealthcheck manual                                  
  [++] Testset "manual": Total 37, Ok 37, Warn 0, Err 0 (Timeout 0) 
    

Figure 5.1. Example pshealthcheck output


As the example shows, the Healthchecker prints a summary line with the total number of tests and how many of them succeeded (Ok), raised a warning (Warn), or failed (Err); the number of timeouts is given in parentheses. By default, the Healthchecker runs all tests and actions defined for the given test set. To run only the tests without any actions, use the option --dry-run. To list all tests actually performed, use the option -v.

  # pshealthcheck -v manual                               
  [++] ["bios_date compute"]                                        
  [++] Checking for bios date...                                    
  [++] ["bios_version compute"]                                     
  [++] Checking for bios version...                                 
  [++] ["cpu_count 8"]                                              
  [++] Checking for number of cpus...                               
  [++] ["cpu_type"]                                                 
  [++] Checking for cpu types...                                    
  [++] ["daemons common"]                                           
  [++] Checking for daemons...                                      
  [++] ["daemons parastation"]                                      
  [++] Checking for daemons...                                      
  [++] ["daemons pbs_mom"]                                          
  [++] Checking for daemons...                                      
  [++] ["disc_free"]                                                
  [++] Checking for available disc space...                         
  [++] ["disc_smart"]                                               
  [++] Checking for smart...
  [++] ["infiniband_counters compute"]
  [++] Checking for infiniband error counters...
  [++] ["infiniband_phy_state"]
  [++] Checking for physical infiniband state...
  [++] ["infiniband_speed"]
  [++] Checking for infiniband speed...
  [++] ["infiniband_state ACTIVE"]
  [++] Checking for infiniband state...
  [++] ["ipmi"]
  [++] Checking for ipmi...
  ...
  [++] Testset "manual": Total 37, Ok 37, Warn 0, Err 0 (Timeout 0)
    

Figure 5.2. Example verbose pshealthcheck output
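
The summary line lends itself to scripted evaluation. The following sketch is not part of the Healthchecker; it extracts the counters from the summary line and reports whether Warn, Err and Timeout are all zero. The function names hc_summary_counts and hc_is_clean are hypothetical, and the sketch assumes that the summary line is written to standard output in exactly the format shown in Figure 5.1.

```shell
# Hypothetical helpers, not part of the Healthchecker distribution.
# Assumption: the summary line has exactly the format shown in Figure 5.1.

# Read pshealthcheck output on stdin and print the counters of the
# summary line as "total ok warn err timeout".
hc_summary_counts() {
    sed -n 's/.*Testset "[^"]*": Total \([0-9][0-9]*\), Ok \([0-9][0-9]*\), Warn \([0-9][0-9]*\), Err \([0-9][0-9]*\) (Timeout \([0-9][0-9]*\)).*/\1 \2 \3 \4 \5/p'
}

# Return 0 if Warn, Err and Timeout are all zero, 1 if any of them is
# non-zero, and 2 if no summary line was found on stdin.
hc_is_clean() {
    counts=$(hc_summary_counts | tail -n 1)
    [ -n "$counts" ] || return 2
    # Split the five counters into the positional parameters.
    set -- $counts
    [ "$3" -eq 0 ] && [ "$4" -eq 0 ] && [ "$5" -eq 0 ]
}
```

With these definitions loaded, a pipeline such as pshealthcheck manual | hc_is_clean could serve as the condition of an if statement in a wrapper script.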


To show all configured tests for a test set, run the command pshcgetconf.

  # pshcgetconf manual                                                             
  [<Test Name>] '<Test Command>' '<Test Timeout>' '<Kill Wait Time>'                         
  ==================================================================                         
  [bios_date admin:gpfs:login] \
  '/opt/parastation/lib/checks/bios_date.sh "05/04/2009"' '0' '1'
  [bios_version admin:gpfs:login] \
  '/opt/parastation/lib/checks/bios_version.sh "R4232X20"' '0' '1'
  [daemons common]        '/opt/parastation/lib/checks/daemons.sh \
  "syslog-ng" "ntpd" "sshd"' '0' '1'
  [daemons admin] '/opt/parastation/lib/checks/daemons.sh "pscollect" \
  "dhcpd" "named"' '0' '1'      
  [daemons parastation]   '/opt/parastation/lib/checks/daemons.sh \
  "xinetd" "psid"' '0' '1'          
  [kernel admin:sm]       '/opt/parastation/lib/checks/kernel.sh \
  "2.6.27.19-5-default"' '0' '1'     
  [md5_sum admin] '/opt/parastation/lib/checks/md5_sum.sh \
  ${HC_CONF_DIR}/data/admin.md5sum' '0' '1' 
  [memory_size]   '/opt/parastation/lib/checks/memory_size.sh "24"' \
  '0' '1'                         
  [mounts common] '/opt/parastation/lib/checks/mounts.sh "boot" "tmp" \
  "var"' '0' '1'                
  [nameserver]    '/opt/parastation/lib/checks/nameserver.sh' '0' '1'                               
  [net_counters]  '/opt/parastation/lib/checks/net_counters.sh' '0' '1'                             
  [net_ports_tcp common] '/opt/parastation/lib/checks/net_ports_tcp.sh \
  "sshd=22"' '0' '1'          
  [net_ports_tcp admin] '/opt/parastation/lib/checks/net_ports_tcp.sh \
  "pscollect=4000" "named=53"' '0' '1'
  ...
    

Figure 5.3. Example pshcgetconf output
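
The test names can be extracted from this listing for use in scripts. The following sketch assumes the output format shown in Figure 5.3, where each test entry starts with its name in square brackets. The function name hc_test_names is hypothetical and not part of the Healthchecker.

```shell
# Hypothetical helper, not part of the Healthchecker distribution.
# Read pshcgetconf output on stdin and print one test name per line.
# The header line "[<Test Name>] ..." is skipped because its bracketed
# text starts with "<".
hc_test_names() {
    sed -n 's/^ *\[\([^]<][^]]*\)].*/\1/p'
}
```

A possible invocation is pshcgetconf manual | hc_test_names, which prints only the bracketed names and drops the commands and timeout values.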


To print all configured test sets, use the command pshcgetconf -l:

  # pshcgetconf -l
  epilogue
  ib-test
  manual
  mce-stats
  mem-test
  prologue
  reboot
  stress-test
    

Figure 5.4. Example pshcgetconf -l output
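
The two forms of pshcgetconf can be combined to get an overview of the whole configuration without running any test. The following sketch prints the number of configured tests for each test set. It assumes the output formats shown in Figure 5.3 and Figure 5.4; the function names hc_count_tests and hc_tests_per_set are hypothetical and not part of the Healthchecker.

```shell
# Hypothetical helpers, not part of the Healthchecker distribution.

# Count the test entries in pshcgetconf output read from stdin: lines
# that start with "[", except the header line "[<Test Name>] ...".
hc_count_tests() {
    awk '/^ *\[[^<]/ { n++ } END { print n + 0 }'
}

# Print "<test set> <number of tests>" for every configured test set.
hc_tests_per_set() {
    pshcgetconf -l | while read -r set_name; do
        # Skip blank lines in the listing.
        [ -n "$set_name" ] || continue
        # </dev/null keeps the inner command from consuming the list
        # of test set names that the loop reads from.
        count=$(pshcgetconf "$set_name" </dev/null | hc_count_tests)
        printf '%s %s\n' "$set_name" "$count"
    done
}
```

Calling hc_tests_per_set prints one line per test set. Only pshcgetconf is invoked, so no tests or actions are executed.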