
General FAQs - Linux

User Defined Startup/Shutdown Scripts

During service start and stop, RSF-1 will run the scripts located in /opt/HAC/RSF-1/etc/rc.appliance.c/.

Here is an example directory listing (note that some scripts may be missing depending on the OS):

root@node-a:/opt/HAC/RSF-1/etc/rc.appliance.c # ls -l
total 130
-rwxr-xr-x  1 root  wheel  10623 Aug 21 09:58 C14res_drives
-rwxr-xr-x  1 root  wheel    846 Aug 21 09:58 K01announce.pyc
-rwxr-xr-x  1 root  wheel   1033 Aug 21 09:58 K02ApplianceStopping
-r-x------  1 root  wheel   3856 Aug 21 09:58 K03snap.pyc
-rwxr-xr-x  1 root  wheel   3040 Aug 21 09:58 K32tn.pyc
-rwxr-xr-x  1 root  wheel    706 Aug 21 09:58 K70samba
-rwxr-xr-x  1 root  wheel  52436 Aug 21 09:58 K80zfs
-rwxr-xr-x  1 root  wheel   6069 Aug 21 09:58 K85zfs_mhdc
-rwxr-xr-x  1 root  wheel    417 Aug 21 09:58 K98ApplianceStopped
-rwxr-xr-x  1 root  wheel    846 Aug 21 09:58 K99announce.pyc
-rwxr-xr-x  1 root  wheel    846 Aug 21 09:58 S01announce.pyc
-rwxr-xr-x  1 root  wheel   1033 Aug 21 09:58 S02ApplianceStarting
-rwxr-xr-x  1 root  wheel  10623 Aug 21 09:58 S14res_drives
-rwxr-xr-x  1 root  wheel   6069 Aug 21 09:58 S15zfs_mhdc
-rwxr-xr-x  1 root  wheel  52436 Aug 21 09:58 S20zfs
-rwxr-xr-x  1 root  wheel  10623 Aug 21 09:58 S21res_drives
-rwxr-xr-x  1 root  wheel   3040 Aug 21 09:58 S68tn.pyc
-rwxr-xr-x  1 root  wheel    417 Aug 21 09:58 S98ApplianceStarted
-rwxr-xr-x  1 root  wheel    846 Aug 21 09:58 S99announce.pyc

The scripts are run in numerical order: Sxx scripts during service start, Kxx scripts during service stop.

It is recommended that user start scripts run after the RSF-1 scripts have been run, and user stop scripts run before them. To achieve this, start scripts should be numbered S69-S97 and stop scripts K04-K31.

Custom scripts should be created using the following template:

#!/bin/sh
#
. /opt/HAC/bin/rsf.sh

service=${RSF_SERVICE:-"service_name"}
script="`basename $0`"

##########################################################
# For service specific scripts, un-comment the following #
# test and replace "my-service" with the service name.   #
# This will exit the script immediately when the service #
# name does not match.                                   #
##########################################################
#
#if [ "${service}" != "my-service" ] ; then
#    rc_exit ${service} ${RSF_OK}
#fi

case "${1}" in

'start')

    #######################################
    # commands to be run on service start #
    # placed in this section              #
    #######################################

    rc_exit ${service} ${RSF_OK}
    ;;

'stop')

    #######################################
    # commands to be run on service stop  #
    # placed in this section              #
    #######################################

    rc_exit ${service} ${RSF_OK}
    ;;

'check')
    exit ${RSF_CHECK_NORESRC}
    ;;

*)
    rc_exit ${service} ${RSF_WARN} "usage: $0 <start|stop|check>"
    ;;

esac

Using this format means that the script can contain both start and stop commands. Furthermore, the script can be symbolically linked so that the Sxx and Kxx scripts refer to the same file.

For example:

lrwxr-xr-x  1 root  wheel      9 Nov 14 16:46 K10custom -> S70custom
-rwxr-xr-x  1 root  wheel      0 Nov 14 16:46 S70custom
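
The listing above could be produced with commands along the following lines (the script name S70custom is purely illustrative):

# cd /opt/HAC/RSF-1/etc/rc.appliance.c
# chmod 755 S70custom
# ln -s S70custom K10custom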


Multiple kernel partition messages appearing in syslog from the Udev sub-system

By default, the Udev daemon (systemd-udevd) communicates with the kernel and receives device uevents directly from it each time a device is removed or added, or a device changes its state.

Because of the way RSF-1 writes its heartbeats using the ZFS label, the udev sub-system sees this as a state change and erroneously updates syslog each time a heartbeat is transmitted. This can result in multiple messages appearing in syslog of the form:

Aug 10 17:22:24 nodea kernel: [2422456.906302]  sdf: sdf1 sdf9
Aug 10 17:22:24 nodea kernel: [2422456.013538]  sdg: sdg1 sdg9
Aug 10 17:22:25 nodea kernel: [2422458.418906]  sdf: sdf1 sdf9
Aug 10 17:22:25 nodea kernel: [2422458.473936]  sdg: sdg1 sdg9
Aug 10 17:22:25 nodea kernel: [2422459.427251]  sdf: sdf1 sdf9
Aug 10 17:22:25 nodea kernel: [2422459.487747]  sdg: sdg1 sdg9

The underlying reason is that Udev watches block devices by binding to the IN_CLOSE_WRITE inotify event; each time it receives this event, a rescan of the device is triggered.

Furthermore, newer versions of the ZFS Event Daemon listen to udev events (to manage disk insertion/removal and so on). They catch the udev events generated by the disk heartbeats and then attempt to determine which pool (if any) the disk belongs to, resulting in unnecessary I/O.

The solution to this is to add a udev rule that overrides this default behaviour and disables monitoring of the sd* block devices. Add the following to the udev rules file /etc/udev/rules.d/50-rsf.rules (see note 1 below):

ACTION!="remove", KERNEL=="sd*", OPTIONS:="nowatch"

Finally, reload the udev rules to activate the fix.
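
On most Linux distributions this can be done with udevadm, for example:

# udevadm control --reload-rules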

Thanks to Hervé BRY of Geneanet for this submission.


REST service fails to start due to port conflict

The RSF-1 REST service (rsf-rest) uses port 4330 by default. If this port is in use by another service (for example pmlogger sometimes attempts to bind to port 4330) then the RSF-1 REST service will fail to start.

To check the service status, run systemctl status rsf-rest.service and inspect the resulting output for any errors; here is an example where port 4330 is already in use:

# systemctl status rsf-rest.service

● rsf-rest.service - RSF-1 REST API Service
   Loaded: loaded (/usr/lib/systemd/system/rsf-rest.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2022-07-14 08:23:00 EDT; 5s ago
  Process: 4271 ExecStart=/opt/HAC/RSF-1/bin/python /opt/HAC/RSF-1/lib/python/rest_api_app.pyc >/dev/null (code=exited, status=1/FAILURE)
 Main PID: 4271 (code=exited, status=1/FAILURE)

Jul 14 08:23:00 mgc81 python[4271]:     return future.result()
Jul 14 08:23:00 mgc81 python[4271]:   File "/opt/HAC/Python/lib/python3.9/site-packages/aiohttp/web.py", line 413, in _run_app
Jul 14 08:23:00 mgc81 python[4271]:     await site.start()
Jul 14 08:23:00 mgc81 python[4271]:   File "/opt/HAC/Python/lib/python3.9/site-packages/aiohttp/web_runner.py", line 121, in start
Jul 14 08:23:00 mgc81 python[4271]:     self._server = await loop.create_server(
Jul 14 08:23:00 mgc81 python[4271]:   File "/opt/HAC/Python/lib/python3.9/asyncio/base_events.py", line 1506, in create_server
Jul 14 08:23:00 mgc81 python[4271]:     raise OSError(err.errno, 'error while attempting '
Jul 14 08:23:00 mgc81 python[4271]: OSError: [Errno 98] error while attempting to bind on address ('0.0.0.0', 4330): address already in use
Jul 14 08:23:00 mgc81 systemd[1]: rsf-rest.service: Main process exited, code=exited, status=1/FAILURE
Jul 14 08:23:00 mgc81 systemd[1]: rsf-rest.service: Failed with result 'exit-code'.
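
To identify which process currently holds the port, a socket listing can be used, for example with ss from iproute2 (the exact filter shown is illustrative):

# ss -ltnp | grep 4330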

The simplest way to resolve this is to change the port the RSF-1 REST service listens on. To do this, run the following commands on each node in the cluster (in this example the port is changed to 4335):

# /opt/HAC/RSF-1/bin/rsfcdb update privPort 4335
# systemctl restart rsf-rest.service

The RSF-1 REST service will now restart and listen on the new port. A status check should now show the service as active and running:

# systemctl status rsf-rest.service

● rsf-rest.service - RSF-1 REST API Service
   Loaded: loaded (/usr/lib/systemd/system/rsf-rest.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2022-07-14 09:22:57 EDT; 2s ago
 Main PID: 52579 (python)
    Tasks: 1 (limit: 49446)
   Memory: 31.8M
   CGroup: /system.slice/rsf-rest.service
           └─52579 /opt/HAC/RSF-1/bin/python /opt/HAC/RSF-1/lib/python/rest_api_app.pyc >/dev/null

Jul 14 09:22:57 mgc81 systemd[1]: Started RSF-1 REST API Service.

This can be confirmed by navigating to the Webapp via the new port: https://<ip of node>:4335.
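
Alternatively, the new port can be checked from the command line, for example with curl (-k skips certificate verification, which may be needed if the node uses a self-signed certificate):

# curl -k https://<ip of node>:4335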

Mounting ZVOLs with filesystems

RSF-1 can be configured to mount and unmount ZVOLs containing a filesystem on service startup/shutdown.

To enable this feature, ZVOLs are declared in the file /opt/HAC/RSF-1/etc/mounts/<pool name>.<filesystem type>, which should be present on each node in the cluster. The format of this file is:

<zvol-path>:<mount point>:<mount options>
<zvol-path>:<mount point>:<mount options>

Note

The <mount options> field specifies options to be passed to the mount command via the -o parameter. The field itself is optional and is ignored if not present.

For example, a pool named pool1 has two ZVOLs:

NAME           USED  AVAIL     REFER  MOUNTPOINT
pool1        2.16G   115M      307K  /pool1
pool1/zvol1  1.21G  1.25G     77.6M  -
pool1/zvol2   968M  1006M     77.6M  -
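
An xfs filesystem would typically have been created inside each of these volumes beforehand, for example (a sketch only; the exact mkfs invocation will depend on requirements):

# mkfs.xfs /dev/zvol/pool1/zvol1
# mkfs.xfs /dev/zvol/pool1/zvol2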

Each of these volumes has an xfs filesystem created within it. To mount these on service startup, the file pool1.xfs has been created in the /opt/HAC/RSF-1/etc/mounts/ directory, containing two entries:

/dev/zvol/pool1/zvol1:/zvol1
/dev/zvol/pool1/zvol2:/zvol2:defaults,_netdev,relatime,nosuid,uquota

Note

It is important to use the ZVOL path rather than the device it points to, as device numbering can change on reboot, whereas the ZVOL path remains static.

The suffix xfs in the filename tells RSF-1 to pass the filesystem type xfs to the mount command. RSF-1 will now mount these filesystems on service startup and unmount them on service shutdown. The mount operation takes place before any VIPs are plumbed in, and the unmount operation is performed after the service VIPs are unplumbed; it is therefore safe to share these filesystems out (NFS/SMB etc.) using the service VIP.

No options will be passed to the mount of /zvol1, whereas the mount of /zvol2 will have options defaults,_netdev,relatime,nosuid,uquota passed using the -o parameter.
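
In other words, the mounts performed by RSF-1 are roughly equivalent to running something like the following by hand (shown purely for illustration, assuming the mount points exist):

# mount -t xfs /dev/zvol/pool1/zvol1 /zvol1
# mount -t xfs -o defaults,_netdev,relatime,nosuid,uquota /dev/zvol/pool1/zvol2 /zvol2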

There is no limit on the number of filesystems to mount, or their type (xfs, ext4, vfat etc.). In the above example, if two additional ext4 filesystems are created as zvol3 and zvol4, then the file pool1.ext4 would be added to the mounts directory with the contents:

/dev/zvol/pool1/zvol3:/zvol3
/dev/zvol/pool1/zvol4:/zvol4
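
As with the xfs example, these ext4 filesystems would have been created within the ZVOLs beforehand, for example:

# mkfs.ext4 /dev/zvol/pool1/zvol3
# mkfs.ext4 /dev/zvol/pool1/zvol4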

The mounts directory will now contain the files pool1.xfs and pool1.ext4, each of which is processed on service start/stop. Further pools are added to the configuration by creating additional configuration files in the mounts directory:

total 12
-rw-r--r-- 1 root root 60 Mar 26 15:54 pool1.ext4
-rw-r--r-- 1 root root 60 Mar 25 17:55 pool1.xfs
-rw-r--r-- 1 root root 60 Mar 25 19:02 pool2.vfat

Cluster wide configuration

The filesystem configuration files must be created manually or copied over to each node in the cluster. This is done to allow granular control over which filesystems are mounted on which node during failover.

For example, one node in the cluster may mount zvol1 and zvol2 on service startup, while the other node may mount only zvol1 should a failover occur.
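
To set this up, the configuration file could be copied to the second node and then edited there to remove the zvol2 entry (the hostname node-b is illustrative):

# scp /opt/HAC/RSF-1/etc/mounts/pool1.xfs node-b:/opt/HAC/RSF-1/etc/mounts/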


  1. Udev rules are defined in files with the .rules extension. There are two main locations in which these files can be placed: /usr/lib/udev/rules.d is used for system-installed rules, whereas /etc/udev/rules.d/ is reserved for custom rules. In this example we've used the name 50-rsf.rules, but any suitable file name can be used.