ID #1072

Why is my failover taking a long time?

There are many factors that can influence failover time. If a failover is taking longer than expected, the rsfmon logs should be checked to find which part(s) of the failover is taking the time.

The main RSF-1 log file is /opt/HAC/RSF-1/log/rsfmon.log. Each line in the log is timestamped, so it is possible to work out how much time each stage is taking. The log file will also contain a line telling you how long a service start or stop took in total:

[23440 Jan 31 17:18:50] [pool4 rsfexec] Total run time for service start: 9 seconds

Log file parser

Log files can be complex to read - especially when multiple services are starting or stopping simultaneously. To help with diagnosis, a tool has been developed to parse the RSF-1 log files (rsfmon.log through to rsfmon.log.9). The tool is attached to this FAQ page, and will also be distributed with all versions of RSF-1 starting from 3.8.13.

A sample of the output is given below:

root@node1:~# rsflog-summary.sh -h

Usage: rsflog-summary.sh [-l|--log <logfile|logdir>] [service]...

Options:
        -l --log <log>: Specify a log file or log directory to search.
                        If a file is given, only that file will be searched.
                        If a directory is given, that directory will be searched
                        for rsfmon.log and rsfmon.log.[0-9].
                        The default is to use the standard RSF-1 log directory
                        /opt/HAC/RSF-1/log

        -h --help:      Show this help text and exit

The config file /opt/HAC/RSF-1/etc/config will be scanned for services to search for
unless a list has been provided on the command line.

root@node1:~#
root@node1:~#
root@node1:~#
root@node1:~# rsflog-summary.sh
Services to search for:  cpw_pool pool4  
Searching log files in /opt/HAC/RSF-1/log:
rsfmon.log.9 rsfmon.log.8 rsfmon.log.7 rsfmon.log.6 rsfmon.log.5 rsfmon.log.4 rsfmon.log.3 rsfmon.log.2 rsfmon.log.1 rsfmon.log.0 rsfmon.log
###############################################
Service startups and shutdowns for cpw_pool:
###############################################
Reading rsfmon.log.9...
Reading rsfmon.log.8...
Reading rsfmon.log.7...
Reading rsfmon.log.6...
Reading rsfmon.log.5...
Reading rsfmon.log.4...
Reading rsfmon.log.3...
Reading rsfmon.log.2...
Reading rsfmon.log.1...
Reading rsfmon.log.0...
Reading rsfmon.log...
Delay before service start - 6 seconds
Service startup #1: Feb 04 10:59:25 (rsfmon.log)
Fping test complete for testvip: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives        run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 5 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 3 seconds (import pool + LUs)
        Zpool import completed status 0, in 3 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 1 seconds (event notify)
Script S99announce          run time: 1 seconds (write to log file)
Service start scripts took 17 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 17 seconds

Delay before service start - 6 seconds
Service startup #2: Feb 04 11:06:26 (rsfmon.log)
Fping test complete for testvip: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
        Reservations released in 0 seconds.
Script S02ApplianceStarting run time: 1 seconds (event notify)
Script S14res_drives        run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 4 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 4 seconds (import pool + LUs)
        Zpool import completed status 0, in 3 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 0 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 1 seconds (event notify)
Script S99announce          run time: 0 seconds (write to log file)
Service start scripts took 16 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 16 seconds

###############################################
Service startups and shutdowns for pool4:
###############################################
Reading rsfmon.log.9...
Reading rsfmon.log.8...
Reading rsfmon.log.7...
Reading rsfmon.log.6...
Reading rsfmon.log.5...
Reading rsfmon.log.4...
Reading rsfmon.log.3...
Delay before service start - 18 seconds
Service startup #1: Jan 29 15:29:37 (rsfmon.log.3)
Fping test complete for vip4: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives        run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 5 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 3 seconds (import pool + LUs)
        Zpool import completed status 0, in 2 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 0 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 1 seconds (event notify)
Script S99announce          run time: 0 seconds (write to log file)
Service start scripts took 15 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 15 seconds

Service shutdown #1: Jan 29 15:39:24 (rsfmon.log.3)
Script K01announce          run time: 0 seconds (write to log file)
Script K02ApplianceStarted  run time: 0 seconds
Script K02ApplianceStopping run time: 1 seconds (event notify)
Script K79res_drives        run time: 0 seconds (nothing)
Script K80zfs               run time: 1 seconds (export pool + LUs)
        Comstar mapping removed in 0 seconds
        Zpool export completed in 1 seconds
Script K85zfs_mhdc          run time: 1 seconds (release reservations)
        Reservations released in 0 seconds.
Script K86res_drives        run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 0 seconds
Script K98ApplianceStopped  run time: 1 seconds (event notify)
Script K99announce          run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds

Reading rsfmon.log.2...
Reading rsfmon.log.1...
Reading rsfmon.log.0...
Reading rsfmon.log...
Delay before service start - 6 seconds
Service startup #2: Jan 31 17:18:39 (rsfmon.log)
Fping test complete for vip4: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 1 seconds (event notify)
Script S14res_drives        run time: 0 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 4 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 3 seconds (import pool + LUs)
        Zpool import completed status 0, in 2 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 0 seconds (event notify)
Script S99announce          run time: 0 seconds (write to log file)
Service start scripts took 9 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 9 seconds

Service shutdown #2: Jan 31 17:18:56 (rsfmon.log)
Script K01announce          run time: 1 seconds (write to log file)
Script K02ApplianceStarted  run time: 0 seconds
Script K02ApplianceStopping run time: 0 seconds (event notify)
Script K79res_drives        run time: 0 seconds (nothing)
Script K80zfs               run time: 1 seconds (export pool + LUs)
        Comstar mapping removed in 0 seconds
        Zpool export completed in 0 seconds
Script K85zfs_mhdc          run time: 1 seconds (release reservations)
        Reservations released in 0 seconds.
Script K86res_drives        run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 1 seconds
Script K98ApplianceStopped  run time: 0 seconds (event notify)
Script K99announce          run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds

Service startup #3: Jan 31 17:19:01 (rsfmon.log)
Fping test complete for vip4: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives        run time: 1 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 4 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 2 seconds (import pool + LUs)
        Zpool import completed status 0, in 2 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 1 seconds (event notify)
Script S99announce          run time: 0 seconds (write to log file)
Service start scripts took 9 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 9 seconds

Service shutdown #3: Jan 31 17:27:32 (rsfmon.log)
Script K01announce          run time: 1 seconds (write to log file)
Script K02ApplianceStarted  run time: 0 seconds
Script K02ApplianceStopping run time: 0 seconds (event notify)
Script K79res_drives        run time: 0 seconds (nothing)
Script K80zfs               run time: 1 seconds (export pool + LUs)
        Comstar mapping removed in 0 seconds
        Zpool export completed in 0 seconds
Script K85zfs_mhdc          run time: 1 seconds (release reservations)
        Reservations released in 0 seconds.
Script K86res_drives        run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 1 seconds
Script K98ApplianceStopped  run time: 0 seconds (event notify)
Script K99announce          run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds

root@node1:~#

attached files: rsflog-summary.sh

Tags: -

Related entries:

Last update: 2014-02-22 10:03
Author: Matt
Revision: 1.1

{writeDiggMsgTag} {writeFacebookMsgTag} {writePrintMsgTag} {writeSend2FriendMsgTag} {writePDFTag}
{translationForm}
Please rate this FAQ:

Average rating: 0 (0 Votes)

completely useless 1 2 3 4 5 most valuable

You cannot comment on this entry