FAQ Overview

General

Do all client systems that access the cluster have to be in the same network?

Clients access services in the cluster using the floating IP address for each service. This IP address is a normal, routable IP address and acts like any other such address. If the service is accessible when run as a simple service without RSF-1, then it will be accessible in exactly the same way when it is running in an RSF-1 cluster.

Author: HAC
Last update: 2020-09-16 10:41


I have several applications on a server, each listening on different IP ports. How many agents do we need to monitor these applications?

You only need one agent per RSF-1 service; the agent is threaded, so a single agent can be configured to monitor multiple applications, each from its own thread. The types of monitoring performed by a single agent can be freely intermixed - for example, a single agent could have a thread monitoring a database over an SQL connection as well as multiple threads monitoring IP ports.

Author: HAC
Last update: 2020-09-16 10:57


Why is my failover taking a long time?

There are many factors that can influence failover time. If a failover is taking longer than expected, the rsfmon logs should be checked to find which part(s) of the failover are taking the time.

The main RSF-1 log file is /opt/HAC/RSF-1/log/rsfmon.log. Each line in the log is timestamped, so it is possible to work out how much time each stage is taking. The log file will also contain a line telling you how long a service start or stop took in total:

[23440 Jan 31 17:18:50] [pool4 rsfexec] Total run time for service start: 9 seconds
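
For a quick first pass, before using the log parser described below, a simple grep across the current and rotated logs will list every start and stop total. This assumes the standard log location; adjust the path if your installation differs:

grep "Total run time" /opt/HAC/RSF-1/log/rsfmon.log*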

Log file parser

Log files can be complex to read - especially when multiple services are starting or stopping simultaneously. To help with diagnosis, a tool has been developed to parse the RSF-1 log files (rsfmon.log through to rsfmon.log.9). The tool is attached to this FAQ page, and will also be distributed with all versions of RSF-1 starting from 3.8.13.

A sample of the output is given below:

root@node1:~# rsflog-summary.sh -h

Usage: rsflog-summary.sh [-l|--log <logfile|logdir>] [service]...

Options:
        -l --log <log>: Specify a log file or log directory to search.
                        If a file is given, only that file will be searched.
                        If a directory is given, that directory will be searched
                        for rsfmon.log and rsfmon.log.[0-9].
                        The default is to use the standard RSF-1 log directory
                        /opt/HAC/RSF-1/log

        -h --help:      Show this help text and exit

The config file /opt/HAC/RSF-1/etc/config will be scanned for services to search for
unless a list has been provided on the command line.

root@node1:~#
root@node1:~#
root@node1:~#
root@node1:~# rsflog-summary.sh
Services to search for:  cpw_pool pool4  
Searching log files in /opt/HAC/RSF-1/log:
rsfmon.log.9 rsfmon.log.8 rsfmon.log.7 rsfmon.log.6 rsfmon.log.5 rsfmon.log.4 rsfmon.log.3 rsfmon.log.2 rsfmon.log.1 rsfmon.log.0 rsfmon.log
###############################################
Service startups and shutdowns for cpw_pool:
###############################################
Reading rsfmon.log.9...
Reading rsfmon.log.8...
Reading rsfmon.log.7...
Reading rsfmon.log.6...
Reading rsfmon.log.5...
Reading rsfmon.log.4...
Reading rsfmon.log.3...
Reading rsfmon.log.2...
Reading rsfmon.log.1...
Reading rsfmon.log.0...
Reading rsfmon.log...
Delay before service start - 6 seconds
Service startup #1: Feb 04 10:59:25 (rsfmon.log)
Fping test complete for testvip: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives        run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 5 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 3 seconds (import pool + LUs)
        Zpool import completed status 0, in 3 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 1 seconds (event notify)
Script S99announce          run time: 1 seconds (write to log file)
Service start scripts took 17 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 17 seconds

Delay before service start - 6 seconds
Service startup #2: Feb 04 11:06:26 (rsfmon.log)
Fping test complete for testvip: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
        Reservations released in 0 seconds.
Script S02ApplianceStarting run time: 1 seconds (event notify)
Script S14res_drives        run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 4 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 4 seconds (import pool + LUs)
        Zpool import completed status 0, in 3 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 0 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 1 seconds (event notify)
Script S99announce          run time: 0 seconds (write to log file)
Service start scripts took 16 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 16 seconds

###############################################
Service startups and shutdowns for pool4:
###############################################
Reading rsfmon.log.9...
Reading rsfmon.log.8...
Reading rsfmon.log.7...
Reading rsfmon.log.6...
Reading rsfmon.log.5...
Reading rsfmon.log.4...
Reading rsfmon.log.3...
Delay before service start - 18 seconds
Service startup #1: Jan 29 15:29:37 (rsfmon.log.3)
Fping test complete for vip4: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives        run time: 6 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 5 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 3 seconds (import pool + LUs)
        Zpool import completed status 0, in 2 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 0 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 1 seconds (event notify)
Script S99announce          run time: 0 seconds (write to log file)
Service start scripts took 15 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 15 seconds

Service shutdown #1: Jan 29 15:39:24 (rsfmon.log.3)
Script K01announce          run time: 0 seconds (write to log file)
Script K02ApplianceStarted  run time: 0 seconds
Script K02ApplianceStopping run time: 1 seconds (event notify)
Script K79res_drives        run time: 0 seconds (nothing)
Script K80zfs               run time: 1 seconds (export pool + LUs)
        Comstar mapping removed in 0 seconds
        Zpool export completed in 1 seconds
Script K85zfs_mhdc          run time: 1 seconds (release reservations)
        Reservations released in 0 seconds.
Script K86res_drives        run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 0 seconds
Script K98ApplianceStopped  run time: 1 seconds (event notify)
Script K99announce          run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds

Reading rsfmon.log.2...
Reading rsfmon.log.1...
Reading rsfmon.log.0...
Reading rsfmon.log...
Delay before service start - 6 seconds
Service startup #2: Jan 31 17:18:39 (rsfmon.log)
Fping test complete for vip4: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 1 seconds (event notify)
Script S14res_drives        run time: 0 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 4 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 3 seconds (import pool + LUs)
        Zpool import completed status 0, in 2 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 0 seconds (event notify)
Script S99announce          run time: 0 seconds (write to log file)
Service start scripts took 9 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 9 seconds

Service shutdown #2: Jan 31 17:18:56 (rsfmon.log)
Script K01announce          run time: 1 seconds (write to log file)
Script K02ApplianceStarted  run time: 0 seconds
Script K02ApplianceStopping run time: 0 seconds (event notify)
Script K79res_drives        run time: 0 seconds (nothing)
Script K80zfs               run time: 1 seconds (export pool + LUs)
        Comstar mapping removed in 0 seconds
        Zpool export completed in 0 seconds
Script K85zfs_mhdc          run time: 1 seconds (release reservations)
        Reservations released in 0 seconds.
Script K86res_drives        run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 1 seconds
Script K98ApplianceStopped  run time: 0 seconds (event notify)
Script K99announce          run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds

Service startup #3: Jan 31 17:19:01 (rsfmon.log)
Fping test complete for vip4: 2 seconds
Script S01announce          run time: 0 seconds (write to log file)
Script S02ApplianceStarting run time: 0 seconds (event notify)
Script S14res_drives        run time: 1 seconds (create initial res_drives)
Script S15zfs_mhdc          run time: 4 seconds (place reservations)
        Reservations taken in 4 seconds.
Script S20zfs               run time: 2 seconds (import pool + LUs)
        Zpool import completed status 0, in 2 seconds
        Comstar mapping restored in 0 seconds
Script S21res_drives        run time: 1 seconds (refresh res_drives)
Script S98ApplianceStarted  run time: 1 seconds (event notify)
Script S99announce          run time: 0 seconds (write to log file)
Service start scripts took 9 seconds
Plumb in VIP interface: 0 seconds       
Total run time for service start: 9 seconds

Service shutdown #3: Jan 31 17:27:32 (rsfmon.log)
Script K01announce          run time: 1 seconds (write to log file)
Script K02ApplianceStarted  run time: 0 seconds
Script K02ApplianceStopping run time: 0 seconds (event notify)
Script K79res_drives        run time: 0 seconds (nothing)
Script K80zfs               run time: 1 seconds (export pool + LUs)
        Comstar mapping removed in 0 seconds
        Zpool export completed in 0 seconds
Script K85zfs_mhdc          run time: 1 seconds (release reservations)
        Reservations released in 0 seconds.
Script K86res_drives        run time: 0 seconds (nothing)
Script K98ApplianceStarting run time: 1 seconds
Script K98ApplianceStopped  run time: 0 seconds (event notify)
Script K99announce          run time: 0 seconds (write to log file)
Service stop scripts took 4 seconds
Total run time for service stop: 4 seconds

root@node1:~#

Author: HAC
Last update: 2020-09-18 15:30


Do all client systems that access the cluster have to be in the same network?

Clients access services in the cluster using a virtual IP (VIP) address for each service. The VIP address is a normal, routable IP address and acts like any other such address. If the service is accessible when run as a simple service without RSF-1, then it will be accessible in exactly the same way when it is running in an RSF-1 cluster.

Author:
Last update: 2020-10-05 17:05


What is the difference between a machine name and a host name?

Every machine in an RSF cluster needs a unique and unchanging machine name for RSF-1 to associate with it. This is normally the same as the host name, but must be different if the host name is to be changed as part of a service failover (or if the host name doesn't resolve to a valid IP address).

The machine names used by RSF-1 are the names which appear in MACHINE lines in the config file. These names are normally associated with real machines by checking that they match the host name of that machine. However, if a host ID appears on a MACHINE line, then the host name check is not done, and the association is made by checking the host ID (as returned by hac_hostid) instead. Note that in this case the IP address of the machine MUST be specified on the end of the line, as it is assumed that the machine name is not the same as the host name, and thus can't be used to look up the IP address of the host.

This flexible naming scheme also allows multiple hosts in a cluster to have the same host name (a requirement seen in a few applications).

To specify the optional host ID of the machine on the MACHINE line, precede it with "0x" (to indicate it is a hexadecimal value). RSF-1 will then identify that machine by the name on the MACHINE line, even if it does not match the real host name. Here is an example MACHINE entry with corresponding host ID:

MACHINE slug 0x2ae5747

RSF also sets the environment variable RSF_MACHINE_NAME in any service startup/shutdown scripts to the machine name in use. This allows scripts to create log messages using the same machine name as rsfmon itself.
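
For example, a minimal sketch of a start script using this variable might look like the following (the script name, application, and log destination are purely illustrative):

#!/bin/sh
# Hypothetical S50announce-style script: record which machine is running the service.
echo "`date`: service starting on machine ${RSF_MACHINE_NAME}" >> /opt/HAC/RSF-1/log/example_service.log
exit 0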

Machine names are also used on heartbeat (NETDISK and SERIAL) lines to indicate which machine the heartbeat is being sent to, and on SERVER lines to indicate which machines may act as a server for a service.

Author:
Last update: 2020-09-22 13:36


RSF-1 Quickstart Guide

Please make sure that any firewalls on your system have the following ports open before attempting configuration:

  • 1195 (TCP & UDP)
  • 4330 (TCP)
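
As an example only, on a Linux node running firewalld these ports could be opened as follows (other platforms and firewall tools will differ):

firewall-cmd --permanent --add-port=1195/tcp --add-port=1195/udp --add-port=4330/tcp
firewall-cmd --reload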

Configuring with the GUI

To connect to the RSF-1 GUI, direct your web browser to:

https://<hostname>:4330

First, create the admin user account for the GUI. Enter a non-empty password in the fields provided and click the Submit button when ready.

 

Once you click the Submit button, the admin user account will be created and you will be taken to the login screen. Log in with the username admin and the password you have just created.

 

The first page you see after login is the dashboard page. It will look like this after a fresh install:

 

RSF-1 Configuration & Licensing through the GUI

Note: Before performing the following steps, make sure your /etc/hosts file is configured correctly on both nodes. The hostname must not resolve to 127.0.0.1, and both nodes should be resolvable from each other.

Below is a correctly configured example:

127.0.0.1 localhost localhost.localdomain 
::1 localhost localhost.localdomain
10.10.6.2 nodo1
10.10.6.3 nodo2
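
A quick way to confirm that both node names resolve as expected (run this on each node; getent is available on most Linux and illumos systems) is:

getent hosts nodo1 nodo2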

To begin the cluster creation process, click on the Create/Destroy option either in the side-menu or in the panel on the Dashboard page.

Upon visiting the Cluster Create page, a scan will be performed to locate any nodes that are ready for clustering.

 

When the scan is complete, a list of nodes that can be clustered will be displayed:

 

Select the nodes you want to cluster by clicking the Add to Cluster toggle.

If any of the selected nodes are unlicensed for use with RSF-1, a licensing panel will be shown:

 

If you want to automatically obtain a temporary evaluation license, enter a valid email address to receive the temporary licenses and click the Licence button.

At this point, the RSF-1 End User License Agreement (EULA) will be displayed. Click accept to proceed.

 

Once the license keys have been successfully installed, click the Create Cluster button to initialize the cluster. When the cluster has been created, you will be able to choose between being taken to the dashboard or to start adding services to the cluster.

 

Managing Pools

Before being able to create a service for your pool, you will need to have your pools imported on one of your nodes. Click the Pools option on the side menu to check the status:

 

To import the pools to create your service with, find your pool by filtering by Cluster State, and/or searching by the pool GUID using the search box.

Once located, you can view more information on the pool by clicking the details button:

 

Click the Import button to import the pool on to the node that you are currently logged into. The Cluster Status of the pool should now change to CLUSTERABLE. If there are any problems with the pool, for example it can't be imported/exported on both nodes, the status will show UNCLUSTERABLE.

 

You should now be able to create a service with your imported pools.

 

Creating Services and Adding Volumes

 

Before proceeding, make sure you are able to export and import your pools on both nodes using the "zpool import/export" command.
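
For example, for a pool named pool1 (substitute your own pool name), the check on each node would look something like:

zpool export pool1
zpool import pool1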

 

Click the Services option on the side-menu to go to the services page. You will be presented with a panel like the one shown below:

 

Click on the Add Services button to search for pools that are suitable for clustering.

In this example, four ZFS pools - pool1, pool2, pool3 and pool4 - have already been created and imported using the zpool create command. To begin configuration, select a pool from the list and click the Create button.

 

In this example, we are going to configure a service for pool1 on nodes mgc71 and mgc72. You are now shown the configuration options for the service:

 

Add a Virtual hostname to the service by clicking on the Add button in the Virtual Hostname panel.

 

Enter the Virtual Hostname you want to use into the field, along with the IP address this hostname will resolve to and the associated netmask.

If the nodes have multiple network interfaces, you can select the interface you want to use for cluster traffic on each node by selecting it from the corresponding drop-down list.

You can confirm that the Virtual Hostname configuration has been added by observing its entry in the Virtual Hostname panel:

 

You can click the Modify button to modify the configuration if you need to or click the Remove button to drop the hostname from the service config.

Click the Create button at the bottom of the page to continue. You are now presented with a configuration summary for the service you are about to add to the cluster.

 

Confirm the details are correct and click Confirm to add the service to the cluster.

 

RSF-1 Status with the GUI

To view the current cluster status, click on the Cluster Control option in the side-menu to access the Cluster Control page.

 

This screen shows the current location of each service and the respective Pool States and Failover Modes and allows the operator to stop, start and move services throughout the cluster. Click on the Actions button of the pool you want to operate on to see what actions are possible. The following screenshot shows the operations available on mgc71 for pool1.

 

Moving Services between Cluster Nodes

In this example, we want to move the service pool1 from mgc71 to mgc72.

 

During this process, pool1 will be cleanly stopped on mgc71 and then restarted on mgc72.

 

Once this move has completed, mgc72 is now running pool1.

 

Viewing Cluster Heartbeats with the GUI

In addition to the cluster control page, you can also view the cluster heartbeats page by clicking the Heartbeats option on the left side-menu. The status of the current heartbeats in the cluster is displayed.

 

Adding Additional Network Heartbeats with the GUI

We can also add additional network heartbeats on this page. In the worked example, we also have a private network connection between the two servers, named mgc71-priv and mgc72-priv respectively. To add network heartbeats using this private address, click on the Add Network Heartbeat button.

In this example, we are going to add a heartbeat connection between mgc71 and mgc72-priv and between mgc72 and mgc71-priv :

 

Click Submit to submit the new heartbeat link addition. Review that the actions about to be taken are correct. Click Confirm to confirm the addition.

The new heartbeat is now displayed on the Heartbeats status page.

 

The basic cluster is now configured.

 

Author:
Last update: 2021-03-12 16:25


What do broken_safe and broken_unsafe mean and how do I fix them?

broken_safe and broken_unsafe refer to the state of an RSF-1 service that has failed to either start up or shut down correctly.

As a service is started or stopped, RSF-1 executes the scripts in the directory /opt/HAC/RSF-1/etc/rc.<service>.d/, where <service> is the service name; for example, a service named web would have the service directory /opt/HAC/RSF-1/etc/rc.web.d/. The service directory contains three types of scripts:

  • start - prefixed by an S<num>
  • stop - prefixed by a K<num>
  • panic - prefixed by a P<num>

The order in which the scripts are run is dictated by the <num> portion of the prefix, going from low to high. The scripts perform actions to either start or stop a service. Each script should run successfully and complete with a 0 exit code. However, if something goes wrong while one of these scripts is running, the script will exit with a non-zero exit code (exit code definitions can be found in /opt/HAC/bin/rsf.sh).
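
As a minimal sketch, a start script following this convention might look like the example below. The script name and application command are hypothetical, and a real script would normally source /opt/HAC/bin/rsf.sh to pick up the exit code definitions mentioned above:

#!/bin/sh
# Hypothetical S50myapp start script (illustrative only).
. /opt/HAC/bin/rsf.sh
if /usr/local/bin/start-myapp
then
    exit 0      # success - the remaining start scripts are run
else
    exit 1      # non-zero - rsfmon treats the service start as failed
fi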

If an error occurs when running the start or stop scripts, a script can indicate this in its exit code. If the failure occurred when starting a service, then the shutdown scripts are run to release any shared resources that the failed startup attempt may have reserved or started.

If the start scripts failed and the subsequent stop scripts succeeded, the service is marked broken_safe. Broken indicates that something is wrong - the service could not be started, and this should be investigated and remedied before trying to start the service on this server again. The safe part indicates that the stop scripts completed successfully, so no shared resources are allocated and it is safe to try to start the service on a different server.

However, if an error occurs when running the stop scripts, (e.g. failure to unmount a shared file system, even with a forcible unmount), then the service is marked broken_unsafe. As before broken indicates that some investigation is required, but this time the unsafe suffix means that shared resources may still be allocated, and therefore it is NOT safe to try to start the service elsewhere in the cluster (for example, if you were to try to mount and use the file system on another host data corruption could occur).

It is also possible for the startup scripts to indicate that the service should be marked broken_unsafe immediately, without running the stop scripts. This is to allow for situations in which a severe error has been detected by the start scripts, and running the stop scripts, or allowing another server to try to start the service, may further exacerbate the situation.

In either case, the underlying issue causing the broken state needs to be resolved. Check the log file /opt/HAC/RSF-1/log/rsfmon.log to discover where the error occurred and what needs to be done. Once the problem has been fixed, RSF-1 needs to be told that the issue is resolved; do this by first issuing the command (as root):

# /opt/HAC/RSF-1/bin/rsfcli -i=0 repaired <servicename>

This will mark the service as having been repaired and place it in manual mode; if any other nodes in the cluster are in automatic mode for the service in question, they will now attempt to start it (while a service is in a broken state on any cluster node, no other node will attempt to start it). To switch the service back into automatic mode on the node where it went into a broken state, issue the command:

# /opt/HAC/RSF-1/bin/rsfcli -i=0 auto <servicename>

Author: Paul Griffiths-Todd
Last update: 2021-02-05 17:35


What is the difference between a machine name and a host name?

Every machine in an RSF cluster needs a unique and unchanging machine name for RSF-1 to associate with it. This is normally the same as the host name, but must be different if the host name is to be changed as part of a service failover (or if the host name doesn't resolve to a valid IP address).

The machine names used by RSF-1 are the names which appear in MACHINE lines in the config file. These names are normally associated with real machines by checking that they match the host name of that machine. However, if a host ID appears on a MACHINE line, then the host name check is not done, and the association is made by checking the host ID (as returned by hac_hostid) instead. Note that in this case the IP address of the machine MUST be specified on the end of the line, as it is assumed that the machine name is not the same as the host name, and thus can't be used to look up the IP address of the host.

This flexible naming scheme also allows multiple hosts in a cluster to have the same host name (a requirement seen in a few applications).

To specify the optional host ID of the machine on the MACHINE line, precede it with "0x" (to indicate it is a hexadecimal value). RSF-1 will then identify that machine by the name on the MACHINE line, even if it does not match the real host name. Here is an example MACHINE entry with corresponding host ID:

MACHINE slug 0x2ae5747

RSF also sets the environment variable RSF_MACHINE_NAME in any service startup/shutdown scripts to the machine name in use. This allows scripts to create log messages using the same machine name as rsfmon itself.

Machine names are also used on heartbeat (NETDISK and SERIAL) lines to indicate which machine the heartbeat is being sent to, and on SERVER lines to indicate which machines may act as a server for a service.

Author: Paul Griffiths-Todd
Last update: 2021-02-04 16:50


REST API documentation and interaction

The REST API documentation is built into the REST server and can be viewed by directing a web browser to any cluster node using the following address (where <hostname> is one of the cluster nodes):

https://<hostname>:4330/docs

The resultant page will provide a top level hierarchy of the available REST API calls, grouped by functionality. You can also use this page to interact with your cluster. Once initially connected, you will be presented with a page similar to the following:

Before any operations can be performed, it is necessary to log in to the REST API. Expand the Authentication group, select the login operation and fill in the username and password (note: clicking on the Example Value box will populate the Parameters box with the JSON template):

The username and password values are the same as used when logging into the main RSF-1 GUI. On successful login the server will send back a confirmation page:

The browser is now authenticated with the REST API and is free to interact with the cluster using the available REST operations as presented in the main page. For example, running the cluster get operation [using the Try it out! button] results in output similar to:

Author:
Last update: 2021-03-12 15:16


Block & File

NFSv4 Failover

NFS version 4 has removed support for UDP as an underlying transport protocol (as opposed to v3, which supported both UDP and TCP); therefore all NFSv4 connections are TCP based. This exclusive use of TCP in NFSv4 has implications for failover recovery time in certain scenarios, due to TCP's TIME_WAIT (sometimes referred to as 2MSL) state, which can be entered during multiple failover operations.

The reason for a TCP socket to enter a TIME_WAIT state is to prevent delayed data packets from one connection being misinterpreted as part of a subsequent newly established connection on the same machine (applying stale data from a previous connection to the current one could have potentially disastrous effects on data integrity).

The implication of TIME_WAIT for failover is observed when HA services are moved from one node to another and then back again in a short period of time. Once the initial move is complete, the originating server enters the TIME_WAIT state as part of the normal TCP protocol. If, during this wait period, services are moved back to the originating server, clients will be unable to reconnect until the TIME_WAIT period has expired (and in some cases the client connections will time out themselves); therefore manual moves back and forth in quick succession (circa 2-4 minutes) between machines that provide data over NFSv4 should be avoided. This type of quick failover/failback scenario is normally only seen during system testing exercises and is not representative of production environments.
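
If this situation is suspected, the lingering sockets can be observed on the node the service has just left with something like the following (netstat output formats vary between operating systems):

netstat -an | grep TIME_WAIT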

For machines where the failover was instigated as a result of a system crash, TIME_WAIT is irrelevant, as the TCP connections will have no knowledge of the previous connection.

Author: HAC
Last update: 2020-09-14 11:35


Configuration

I have several applications on a server, each listening on different IP ports. How many agents do we need to monitor these applications?

You only need one agent per RSF-1 service; the agent is threaded, so a single agent can be configured to monitor multiple applications, each from its own thread. The types of monitoring performed by a single agent can be freely intermixed - for example, a single agent could have a thread monitoring a database over an SQL connection as well as multiple threads monitoring IP ports.

Author:
Last update: 2020-09-22 13:44


Property descriptions for the RSFCDB database

This is the list of supported properties in the cluster configuration database.
It can be displayed running the following command: "rsfcdb -I list_props"
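
Individual properties are changed with the rsfcdb update sub-command, as in the following example (prop_event_notify is shown purely as an example; substitute the property and value you require):

rsfcdb update prop_event_notify true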

prop_rsf_zfs_event_logging:
  [boolean] When set to true, ZFS sysevents are logged to the RSF-1 log file
  (current events are cache file changes - use this option to ensure cache file
  synchronisation is occurring).
  [Default setting for NexentaStor: false.]


prop_zpool_import_option_m:
  [boolean] When set to true, zpool import supports the -m (import missing) option.
  [Default setting for NexentaStor: true.]


prop_abort_unfenced_import:
  [boolean] Prevents an unfenced pool being imported. When set to true, a pool in
  which no disk is persistently reserved will not be imported and the service set
  to broken_safe should an import be attempted.
  [Default setting for NexentaStor: true.]


prop_zpool_threads_option_t:
  [boolean] When set to true, enables parallel import/export pool mounts using threads.
  [Default setting for NexentaStor: true.]


prop_comstar_support:
  [boolean] Enables failover support for COMSTAR targets configured on clustered volumes.
  [Default setting for NexentaStor: true.]


prop_zpool_sync_cache:
  [boolean] Synchronize ZFS cache files between HA nodes whenever a ZPOOL cache is
  updated. The copy is instigated by a sysevent listener.
  [Default setting for NexentaStor: true.]


prop_zpool_run_dtrace_on_import:
  [boolean] Run dtrace on zpool import. This property is set by the zpool import
  process itself in response to an import operation taking longer than the
  property prop_zpool_dtrace_needed_timeout_seconds. Once the import time takes
  less than prop_zpool_dtrace_needed_timeout_seconds then this property is set
  back to false automatically.
  [Default setting for NexentaStor: false.]


prop_gui_res_drive_support:
  [boolean] When set to true enables reservation drive selection in the GUI rather
  than automatic generation. This ability in the GUI is being deprecated in
  favour of automatic drive selection.
  [Default setting for NexentaStor: false.]


prop_mhdc_use_pgr3:
  [boolean] When set to true, disk reservations will use PGR3. When set to false
  SCSI-2 reservations will be used.
  [Default setting for NexentaStor: false.]


prop_gui_cluster_initialized:
  [boolean] Set to true once the cluster has been initialized. This is a GUI
  initialization speedup that signifies a number of (one off) GUI checks have
  completed successfully and therefore do not need to be run again.
  [Default setting for NexentaStor: true.]


prop_skip_pgr3_log_and_cache:
  [boolean] When set to true, do not attempt to reserve the log and cache devices
  when using PGR3 reservations.
  [Default setting for NexentaStor: true.]


prop_gui_enable_fastfail:
  [boolean] When set to true enable the fast failover option in the GUI - reserved
  for future use.
  [Default setting for NexentaStor: false.]


prop_sync_rsfcdb:
  [boolean] When set to true the RSF-1 configuration database will be synchronised
  between all cluster nodes on any change.
  [Default setting for NexentaStor: true.]


prop_mhdc_use_failfast:
  [boolean] When set to true SCSI-2/PGR3 disk reservations have fail fast
  protection applied. Losing a reservation results in a system panic (and thus
  cross-mounts are avoided).
  [Default setting for NexentaStor: true.]


prop_event_notify:
  [boolean] When set to true send cluster events (such as service start/stop)
  through the event notifier interface.
  [Default setting for NexentaStor: true.]


prop_skip_vip_check:
  [boolean] When set to true, the initial VIP check done on service start is
  bypassed. Under normal circumstances this check should always be left on (i.e.
  the setting should be false).
  [Default setting for NexentaStor: false.]


prop_comstar_standby_luns_before_failover:
  [boolean] When set to true, COMSTAR LUNs will be placed into standby before failover/deletion.
  [Default setting for NexentaStor: false.]


prop_zpool_threads_number:
  [numeric] Number of threads to run when option prop_zpool_threads_option_t is
  set to true.
  [Default setting for NexentaStor: 100.]


prop_zpool_dtrace_needed_timeout_seconds:
  [numeric] Threshold (in seconds) after which point the dtrace import/export
  scripts should be enabled for the next operation.
  [Default setting for NexentaStor: 180.]


prop_scsi2_drive_count:
  [numeric] Determines the number of disks in the pool on which SCSI-2
  reservations are applied.
  [Default setting for NexentaStor: 2.]


prop_scsi2_threaded_reserve:
  [boolean] When set to true, SCSI-2 reservations are issued in parallel and as
  such the total reservation/release time during pool import/export is reduced.
  Normally a SCSI-2 reserve takes on average 2 seconds, so reserving for example
  6 drives could add up to 12 seconds to the overall failover time -
  switching to parallel reservations means all 6 drives can be reserved in under
  two seconds. Due to a known issue with SAS disks, this option is disabled by
  default.
  [Default setting for NexentaStor: false.]


prop_zpool_dtrace_import_export:
  [boolean] Enable the running of custom dtrace scripts to trace ZPOOL
  import/export. This option is used in conjunction with the properties
  prop_zpool_run_dtrace_on_import and prop_zpool_run_dtrace_on_export - Please
  also reference their description to fully understand what these properties affect.
  [Default setting for NexentaStor: true.]


prop_use_zfs_cache:
  [boolean] When set to true, uses the ZFS cache file as part of the import to
  reduce the overall import time.
  [Default setting for NexentaStor: true.]


prop_gui_javascript_debug:
  [boolean] When set to true debugging messages are written to the browsers
  JavaScript console (when active). This option is not supported on IE browsers
  due to a bug in their JavaScript implementation.
  [Default setting for NexentaStor: false.]


prop_zpool_run_dtrace_on_export:
  [boolean] Run dtrace on zpool export. This property is set by the zpool export
  process itself in response to an export operation taking longer than the
  property prop_zpool_dtrace_needed_timeout_seconds. Once the export time takes
  less than prop_zpool_dtrace_needed_timeout_seconds then this property is set
  back to false automatically.
  [Default setting for NexentaStor: false.]


prop_gui_serial_hb_support:
  [boolean] When set to true allows the user to configure serial heartbeats in the
  cluster from the GUI.
  [Default setting for NexentaStor: false.]


prop_zpool_fail_mode:
  [string] Determines the behaviour of a catastrophic pool failure due to a loss of
  device connectivity or the failure of all devices in the pool. Options are:
  panic, continue, wait - see ZFS fail mode property for more details.
  [Default setting for NexentaStor: panic.]


prop_gui_enable_ipv6:
  [boolean] When set to true enables IPV6 support for Cluster VIPs along with IPV4
  in the Cluster GUI.
  [Default setting for NexentaStor: false.]


prop_zpool_export_option_c:
  [boolean] When set to true, zpool export supports the -c (flush cache) option.
  [Default setting for NexentaStor 3.x: false.]
  [Default setting for NexentaStor 4.x: true.]


prop_pgr3_drive_count:
  [numeric] Determines the number of disks in the pool on which PGR3 reservations
  are applied.
  [Default setting for NexentaStor: 0.]


prop_additional_common_directories:
  [string] Specifies a colon separated list of extra directories to search when a
  common (shared) startup directory is specified for a service in the service
  configuration. These directories will be searched to build up a list of
  scripts/programmes to execute when starting/stopping a service. Duplicate
  elements (i.e. those with the same file name found in more than one common
  directory) will be selected and executed from the first instance found by
  prioritising the property directory list over the service configured directory.
  For example with both /usr/rc.appliance.c/S01announce and
  /opt/HAC/RSF-1/etc/rc.appliance.c/S01announce the S01announce script from
  /usr/rc.appliance.c will be the only one run at service start time.

prop_cluster_hook:
  [string] A property that specifies an external command to be run during service
  start and stop. By default this property is empty and therefore no action is taken.
  The property's use is intended to allow system integrators to build in their own
  start/stop hooks as required.

prop_ssh_bound:
  [boolean] Set to true when cluster nodes are bound together, otherwise false and
  no scp or ssh attempted.
  [Default setting for NexentaStor: true.]

prop_gui_disable_online_help:
  [boolean] Enables/Disables the showing of the About, FAQ menu items and manual
  page links in the title bar of each cluster panel.
  [Default setting for NexentaStor: false.]

prop_pool_hook:
  [string] A property specifying an external command to be run during pool import by
  S20zfs (and subsequent export by K80zfs). By default this property is empty and
  therefore no action is taken. The property's use is intended to allow system integrators
  to build in their own import/export hooks as required.

prop_gui_enable_comstar_browser:
  [boolean] When set to true reveals the COMSTAR browser GUI component.
  [Default setting for NexentaStor: false.]

prop_fping_retries:
  [numeric] Number of fping retries when performing VIP check prior to service startup.
  A value of 0 or -1 means just the one check; any other numeric value means check
  that number of times, if and only if the VIP is pingable on the first check. The
  retries are ignored if the VIP is not detected before service startup.
  [Default setting for NexentaStor: 0.]

prop_gui_default_cluster_name:
  [string] The default name for a new cluster.
  [Default setting for NexentaStor: HA-Cluster.]

prop_scsi2_retry_count:
  [numeric] Sets the number of SCSI-2 TKOWN retry attempts when an I/O error is
  encountered back from the disk being operated on - set to 0 for no retries.
  [Default setting for NexentaStor: 0.]

prop_gui_default_runtimeout:
  [numeric] The default run timeout value for the cluster.
  [Default setting for NexentaStor: 8.]

prop_gui_default_inittimeout:
  [numeric] The default initial timeout value for the cluster.
  [Default setting for NexentaStor: 20.]

prop_gui_failover_state_sticky:
  [boolean] When moving a service, carry the service state (automatic/manual) over with it.
  [Default setting for NexentaStor: false.]

prop_gui_plugin_version:
  [string] Identifies the plugin version.

Author:
Last update: 2020-09-22 14:01


Can we use more than one network heartbeat?

RSF-1 does not place any restrictions on the number of heartbeats it supports via network, disk or serial. For network heartbeats, a typical configuration is to have one private heartbeat over an Ethernet crossover cable, and as many public heartbeats as there are interfaces.
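
As a sketch only (the host names are hypothetical, and the MACHINE/NET syntax follows the configuration file excerpts shown elsewhere in this FAQ), a two-node cluster with one public and one private network heartbeat per machine might contain:

MACHINE nodeA
NET nodeB
NET nodeB-priv
MACHINE nodeB
NET nodeA
NET nodeA-priv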

Author:
Last update: 2020-09-22 14:48


The port numbers for rsfnet and rsfreq are both 1195, is this a typo or are rsfnet and rsfreq really on the same port?

RSF-1 uses port 1195 for both TCP and UDP requests. The UDP port is used for heartbeat and cluster discovery packets, with the TCP port being used for the GUI and command line interface.

Port 1195 is officially assigned for RSF-1 use by the Internet Assigned Numbers Authority (see http://www.iana.org/assignments/port-numbers for details).

Author:
Last update: 2020-09-22 14:49


Do I need to use dedicated disks for heartbeats or reservations?

The RSF-1 cluster uses shared disks for both heartbeats and disk fencing.

Disk Heartbeats

When a disk is used for heartbeats, RSF-1 on each cluster node will use a small portion of the disk to regularly write information about the state of the cluster according to that node. That information will then be read by the remote node and used to build up a picture of the whole cluster.

For a ZFS cluster, disk heartbeats do not require dedicated disks. RSF-1 understands the layout of ZFS pool drives and is able to place heartbeat information on any pool disk without disrupting either user data or ZFS metadata.

Disk fencing

When a disk is used by RSF-1 for the purposes of fencing, a SCSI reservation is placed on that disk during the startup sequence of an HA service. The reservations are placed before the ZFS pool is imported, in order to prevent other cluster nodes from writing to the pool after it is locally imported. This is important for any situation where a cluster node appears to go offline - prompting a service failover to the remaining node - but in reality that node is still running and accessing the pool. In that case, the SCSI reservations will block access to the pool from the failing node, allowing the remaining node to safely take over.

Reservation drive selection is handled automatically by recent versions of the RSF-1 cluster, but for older, manually configured versions, or for any situation where the cluster configuration must be changed from the default settings, there are three important requirements for the selection of reservation drives:

  1. Because the reservations are used to fence the ZFS pool's disks, it must be the pool's disks that are reserved - so dedicated disks should not be used for reservations. Additionally, it should be regular data disks that are reserved. SCSI reservations on disks marked as "cache" (L2ARC) or "spare" will not have any effect on the ability of another node to access the pool and will therefore not contribute towards adequate data protection.
    By default, the cluster will also avoid using disks marked as "log" (SLOG). This is less important from a data protection perspective but it has been found that since the purpose of a separate log device is to provide a performance improvement, the type of disk devices used for log tend to be more "cutting edge" than regular data disks and are more likely to exhibit unexpected behaviour in response to SCSI reservations.

  2. Reservation disks should be selected in such a way that the reservations would prevent pool access from the remote node. For example, if a pool is made up of several 4-way mirrors, then as a minimum, reservations should be placed on all 4 devices in any one mirror vdev. This would mean the entire vdev will be inaccessible to the remote node and therefore, the whole pool will be inaccessible.

  3. Reservations cannot be placed on the same disks as heartbeats. Depending on the type of reservations used by the cluster the reservation will block either reads and writes, or just writes, from one of the cluster nodes. Because each disk heartbeat requires both nodes to be able to read and write to the disk, reservations and disk heartbeats will conflict and result in the disk heartbeat being marked down while the service is running on either node.

Author: Matt Youds
Last update: 2021-02-04 11:24


How do I secure the web GUI ?

By default, the web GUI binds to all IP addresses on a node, i.e. '0.0.0.0'. If this is undesired for security reasons, the web GUI can be secured by binding the HTTP server to localhost only, i.e. preventing remote browser access. The server.socket_host line in the global section of the /opt/HAC/RSF-1/SVC/rsf_standalone.conf file just needs to be updated as follows:

server.socket_host: '127.0.0.1'

Once the file has been saved, the web GUI will automatically restart and be bound to localhost only.
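
One way to confirm the new binding is to check which address port 4330 is listening on (output format varies by platform):

netstat -an | grep 4330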

N.B. This is the only supported method of securing the web GUI that will allow the new rsfadm command (which relies on the Python HTTP API) to continue functioning normally.

Author: Paul Griffiths-Todd
Last update: 2021-02-04 16:45


Special syntax when the machine name / vip starts with a number

In the special case where the machine name starts with a number or contains special characters, a special syntax is required when using the rsfcdb command or writing the config file.
If 0-node1 and 0-node2 are our machine names, when using rsfcdb to build the database, the machine name should be preceded by the percent symbol, i.e. %0-node1

Example code: To establish a network heartbeat between the two nodes we should use:

rsfcdb ha_net %0-node1#0-node2 0-node2#0-node1

On the other hand, when the vip starts with a number or contains special characters, we should use double quotes. For example, if the vip name we would like to use for the pool zpool1 is 0-testvip1, we should use double quotes each time we use the vip name.

Example code: Add a description for the service zpool1 which uses 0-testvip1 as vip name:

rsfcdb sa_desc zpool1#"0-testvip1" "RSF-1 zpool1 ZFS service"

Sample snippet of a functional RSF-1 configuration file:

# Machines section
MACHINE %0-node1
NET %0-node2
DISC %0-node2 /dev/rdsk/c15t2d0s0:512:518
SERVICE tank1 "0-viptest1" "RSF-1 tank1 ZFS service"

Author: Paul Griffiths-Todd
Last update: 2021-02-04 16:49


Does RSF-1 support one node cluster configurations?

Yes it does.

Here is an example of the RSF-1 configuration file for a one node cluster, where the pool name is tank1, the VIP floating address is 10.0.2.201, and the network interface is e1000g0:

#
# Created Wed 2014-10-08 15:31:48 BST by RSF-1 config_db
# This file is automatically generated by rsfcdb - DO NOT EDIT BY HAND
#
# [SINGLE NODE CONFIGURATION] - some properties will be disabled.
#
# [Global Section]
#  Global settings affecting the whole cluster.
#
CLUSTER_NAME My-Single_Cluster
#DISC_HB_BACKOFF 20,3,3600
#POLL_TIME 1
REALTIME 1
EVENT_NOTIFY "/opt/HAC/RSF-1/bin/event_notifier"
#IPDEVICE_MONITOR 3,2

#
# [Machine Section]
#  Specifies the machines in the cluster and the heartbeats between them.
#  Note! Heartbeats are disabled in single-node cluster mode.
#
MACHINE node-a

#
# [Services Section]
#  Specifies the services in the cluster, the nodes they can
#  run on and the IP addresses used to access those services.
#
SERVICE service1 10.0.2.201 "Single Node Service"
 OPTION "sdir=appliance"
 INITIMEOUT 20
 RUNTIMEOUT 8
 MOUNT_POINT "/volumes/tank1"
 SERVER node-a
  IPDEVICE "e1000g0"

Using rsfcdb to generate a single node configuration

In order to generate a single node cluster configuration file using rsfcdb, set the single node property prop_single_node_cluster to true using the following command:

rsfcdb update prop_single_node_cluster true

Next view the configuration file to show the resulting single node cluster configuration using the command:

rsfcdb config_preview

Please note the following points when generating a single node cluster using rsfcdb:

  • Any global section values that are not required for a single node cluster are commented out of the resulting configuration.
    • Only CLUSTER_NAME, EVENT_NOTIFY and REALTIME are valid for a single node cluster.
  • The heartbeat section is replaced by the machine name added via the ga_name sub-command of rsfcdb.
  • Each service has also been limited to displaying one SERVER section where the SERVER parameter matches the machine name.

Author: Paul Griffiths-Todd
Last update: 2021-02-04 17:28


Property descriptions for the RSFCDB database

This is the list of supported properties in the cluster configuration database.
It can be displayed running the following command: "rsfcdb -I list_props"

prop_rsf_zfs_event_logging:
  [boolean] When set to true, ZFS sysevents are logged to the RSF-1 log file
  (current events are cache file changes - use this option to ensure cache file
  synchronisation is occurring).
  [Default setting for NexentaStor: false.]


prop_zpool_import_option_m:
  [boolean] When set to true, zpool import supports the -m (import missing) option.
  [Default setting for NexentaStor: true.]


prop_abort_unfenced_import:
  [boolean] Prevents an unfenced pool being imported. When set to true, a pool in
  which no disk is persistently reserved will not be imported and the service set
  to broken_safe should an import be attempted.
  [Default setting for NexentaStor: true.]


prop_zpool_threads_option_t:
  [boolean] When set to true, enables parallel import/export pool mounts using threads.
  [Default setting for NexentaStor: true.]


prop_comstar_support:
  [boolean] Enables failover support for COMSTAR targets configured on clustered volumes.
  [Default setting for NexentaStor: true.]


prop_zpool_sync_cache:
  [boolean] Synchronize ZFS cache files between HA nodes whenever a ZPOOL cache is
  updated. The copy is instigated by a sysevent listener.
  [Default setting for NexentaStor: true.]


prop_zpool_run_dtrace_on_import:
  [boolean] Run dtrace on zpool import. This property is set by the zpool import
  process itself in response to an import operation taking longer than the
  property prop_zpool_dtrace_needed_timeout_seconds. Once the import time takes
  less than prop_zpool_dtrace_needed_timeout_seconds then this property is set
  back to false automatically.
  [Default setting for NexentaStor: false.]


prop_gui_res_drive_support:
  [boolean] When set to true enables reservation drive selection in the GUI rather
  than automatic generation. This ability in the GUI is being deprecated in
  favour of automatic drive selection.
  [Default setting for NexentaStor: false.]


prop_mhdc_use_pgr3:
  [boolean] When set to true, disk reservations will use PGR3. When set to false
  SCSI-2 reservations will be used.
  [Default setting for NexentaStor: false.]


prop_gui_cluster_initialized:
  [boolean] Set to true once the cluster has been initialized. This is a GUI
  initialization speedup that signifies a number of (one off) GUI checks have
  completed successfully and therefore do not need to be run again.
  [Default setting for NexentaStor: true.]


prop_skip_pgr3_log_and_cache:
  [boolean] When set to true, do not attempt to reserve the log and cache devices
  when using PGR3 reservations.
  [Default setting for NexentaStor: true.]


prop_gui_enable_fastfail:
  [boolean] When set to true, enables the fast failover option in the GUI - reserved
  for future use.
  [Default setting for NexentaStor: false.]


prop_sync_rsfcdb:
  [boolean] When set to true the RSF-1 configuration database will be synchronised
  between all cluster nodes on any change.
  [Default setting for NexentaStor: true.]


prop_mhdc_use_failfast:
  [boolean] When set to true SCSI-2/PGR3 disk reservations have fail fast
  protection applied. Losing a reservation results in a system panic (and thus
  cross-mounts are avoided).
  [Default setting for NexentaStor: true.]


prop_event_notify:
  [boolean] When set to true send cluster events (such as service start/stop)
  through the event notifier interface.
  [Default setting for NexentaStor: true.]


prop_skip_vip_check:
  [boolean] When set to true, the initial VIP check done on service start is
  bypassed. Under normal circumstances this check should always be left on (i.e.
  the setting should be false).
  [Default setting for NexentaStor: false.]


prop_comstar_standby_luns_before_failover:
  [boolean] When set to true, COMSTAR LUNs will be placed into standby before failover/deletion.
  [Default setting for NexentaStor: false.]


prop_zpool_threads_number:
  [numeric] Number of threads to run when option prop_zpool_threads_option_t is
  set to true.
  [Default setting for NexentaStor: 100.]


prop_zpool_dtrace_needed_timeout_seconds:
  [numeric] Threshold (in seconds) after which point the dtrace import/export
  scripts should be enabled for the next operation.
  [Default setting for NexentaStor: 180.]


prop_scsi2_drive_count:
  [numeric] Determines the number of disks in the pool on which SCSI-2
  reservations are applied.
  [Default setting for NexentaStor: 2.]


prop_scsi2_threaded_reserve:
  [boolean] When set to true, SCSI-2 reservations are issued in parallel, reducing
  the total reservation/release time during pool import/export. A SCSI-2 reserve
  normally takes around 2 seconds, so reserving, for example, 6 drives could add
  up to 12 seconds to the overall failover time - switching to parallel
  reservations means all 6 drives can be reserved in under two seconds. Due to a
  known issue with SAS disks, this option is disabled by default.
  [Default setting for NexentaStor: false.]


prop_zpool_dtrace_import_export:
  [boolean] Enable the running of custom dtrace scripts to trace ZPOOL
  import/export. This option is used in conjunction with the properties
  prop_zpool_run_dtrace_on_import and prop_zpool_run_dtrace_on_export - Please
  also reference their description to fully understand what these properties affect.
  [Default setting for NexentaStor: true.]


prop_use_zfs_cache:
  [boolean] When set to true, uses the ZFS cache file as part of the import to
  reduce the overall import time.
  [Default setting for NexentaStor: true.]


prop_gui_javascript_debug:
  [boolean] When set to true, debugging messages are written to the browser's
  JavaScript console (when active). This option is not supported on IE browsers
  due to a bug in their JavaScript implementation.
  [Default setting for NexentaStor: false.]


prop_zpool_run_dtrace_on_export:
  [boolean] Run dtrace on zpool export. This property is set by the zpool export
  process itself in response to an export operation taking longer than the
  property prop_zpool_dtrace_needed_timeout_seconds. Once the export time takes
  less than prop_zpool_dtrace_needed_timeout_seconds then this property is set
  back to false automatically.
  [Default setting for NexentaStor: false.]


prop_gui_serial_hb_support:
  [boolean] When set to true allows the user to configure serial heartbeats in the
  cluster from the GUI.
  [Default setting for NexentaStor: false.]


prop_zpool_fail_mode:
  [string] Determines the pool's behaviour on a catastrophic failure caused by a
  loss of device connectivity or the failure of all devices in the pool. Options
  are: panic, continue, wait - see the ZFS failmode property for more details.
  [Default setting for NexentaStor: panic.]


prop_gui_enable_ipv6:
  [boolean] When set to true enables IPV6 support for Cluster VIPs along with IPV4
  in the Cluster GUI.
  [Default setting for NexentaStor: false.]


prop_zpool_export_option_c:
  [boolean] When set to true, zpool export supports the -c (flush cache) option.
  [Default setting for NexentaStor 3.x: false.]
  [Default setting for NexentaStor 4.x: true.]


prop_pgr3_drive_count:
  [numeric] Determines the number of disks in the pool on which PGR3 reservations
  are applied.
  [Default setting for NexentaStor: 0.]


prop_additional_common_directories:
  [string] Specifies a colon separated list of extra directories to search when a
  common (shared) startup directory is specified for a service in the service
  configuration. These directories will be searched to build up a list of
  scripts/programs to execute when starting/stopping a service. Where duplicate
  elements exist (i.e. the same file name is found in more than one common
  directory), only the first instance found will be executed, with the directories
  in this property taking priority over the service's configured directory.
  For example, with both /usr/rc.appliance.c/S01announce and
  /opt/HAC/RSF-1/etc/rc.appliance.c/S01announce present, the S01announce script
  from /usr/rc.appliance.c will be the only one run at service start time.

prop_cluster_hook:
  [string] A property that specifies an external command to be run during service
  start and stop. By default this property is empty and therefore no action is
  taken. The property's use is intended to allow system integrators to build in
  their own start/stop hooks as required.

prop_ssh_bound:
  [boolean] Set to true when cluster nodes are bound together, otherwise false and
  no scp or ssh attempted.
  [Default setting for NexentaStor: true.]

prop_gui_disable_online_help:
  [boolean] Enables/Disables the showing of the About, FAQ menu items and manual
  page links in the title bar of each cluster panel.
  [Default setting for NexentaStor: false.]

prop_pool_hook:
  [string] A property specifying an external command to be run during pool import
  by S20zfs (and subsequent export by K80zfs). By default this property is empty
  and therefore no action is taken. The property's use is intended to allow system
  integrators to build in their own import/export hooks as required.

prop_gui_enable_comstar_browser:
  [boolean] When set to true reveals the COMSTAR browser GUI component.
  [Default setting for NexentaStor: false.]

prop_fping_retries:
  [numeric] Number of fping retries when performing the VIP check prior to service
  startup. A value of 0 or -1 means a single check is performed; a positive value
  means the check is repeated that number of times, but only if the VIP responds
  to the first check. The retries are ignored if the VIP is not detected before
  service startup.
  [Default setting for NexentaStor: 0.]

prop_gui_default_cluster_name:
  [string] The default name for a new cluster.
  [Default setting for NexentaStor: HA-Cluster.]

prop_scsi2_retry_count:
  [numeric] Sets the number of SCSI-2 TKOWN retry attempts when an I/O error is
  encountered back from the disk being operated on - set to 0 for no retries.
  [Default setting for NexentaStor: 0.]

prop_gui_default_runtimeout:
  [numeric] The default run timeout value for the cluster.
  [Default setting for NexentaStor: 8.]

prop_gui_default_inittimeout:
  [numeric] The default initial timeout value for the cluster.
  [Default setting for NexentaStor: 20.]

prop_gui_failover_state_sticky:
  [boolean] When moving a service, move the service state (automatic/manual) over with it.
  [Default setting for NexentaStor: false.]

prop_gui_plugin_version:
  [string] Identifies the plugin version.

Author: Paul Griffiths-Todd
Last update: 2021-02-04 17:29


What are the prop_zpool_fail_mode properties for?

The failmode property of a zpool controls how the pool handles I/O after it has gone into a 'faulted' state. There are 3 options:

  1. wait - all I/O from clients will hang
  2. continue - clients will get I/O errors for all I/O operations to the pool
  3. panic - as soon as the pool goes faulted, ZFS triggers a kernel panic

For RSF clusters, panic should be used (which is not the default failmode property on newly created pools). The mode setting of panic means if a pool goes faulted due to a faulty controller card, broken fibre cable, etc. the active node will panic, and the service can automatically fail over to the other node.

RSF-1 is configured by default to change the failmode of all zpools to panic each time a service starts. If this behaviour is not wanted for any reason, it can be changed by altering an RSF database property.

The property prop_zpool_fail_mode controls the failmode on a cluster-wide basis. If it is necessary to have a pool use a different failmode, then a new property can be created with the format prop_zpool_fail_mode_<pool> (note that this is a pool name, not a service name; if a service contains more than one pool, then a separate property can be declared for each pool).

To modify the cluster-wide failmode property to wait, run:

# rsfcdb update prop_zpool_fail_mode wait

To add a new property (in this case continue) specifically for the pool tank, run:

# rsfcdb create prop_zpool_fail_mode_tank continue

Possible values for the global and individual pool setting are wait, continue, panic and none. A value of none means RSF will not set the failmode of pools on import, so they will retain the failmode setting they already had.

Possible values for the pool specific settings are wait, continue, panic, none and default. A value of default effectively disables the setting and causes that pool to use the global value prop_zpool_fail_mode. A value of none causes RSF not to set the failmode of this pool at all.

For example, if there are 5 pools in a cluster, pool1, pool2, pool3, pool4 and pool5, the properties:

 prop_zpool_fail_mode       : panic
 prop_zpool_fail_mode_pool1 : wait
 prop_zpool_fail_mode_pool2 : default
 prop_zpool_fail_mode_pool3 : none
 prop_zpool_fail_mode_pool4 : continue

mean that the following failmodes are applied:

 pool1 - wait
 pool2 - panic
 pool3 - no failmode setting used (keeps its original setting)
 pool4 - continue
 pool5 - panic (as there is no specific declaration for pool5, the default is used)
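
To confirm the failmode actually applied to an imported pool, the standard ZFS property can be queried on the node running the service (a quick check; the pool name tank is purely illustrative):

# zpool get failmode tank
NAME  PROPERTY  VALUE  SOURCE
tank  failmode  panic  local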

Author: Paul Griffiths-Todd
Last update: 2021-02-05 17:18


REST API documentation and interaction

The REST API documentation is built into the REST server and can be viewed by directing a web browser to any cluster node using the following address (where <hostname> is one of the cluster nodes):

https://<hostname>:4330/docs
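
If preferred, basic reachability of the REST server can first be checked from the command line (node-a is an illustrative hostname; the -k option is only needed if the node is using a self-signed certificate):

# curl -k https://node-a:4330/docs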

The resultant page will provide a top-level hierarchy of the available REST API calls, grouped by functionality. You can also use this page to interact with your cluster. Once initially connected you will be presented with a page similar to the following:

Before any operations can be performed it is necessary to log in to the REST API. Expand the Authentication group, select the login operation and fill in the username and password (note, clicking on the Example Value box will populate the Parameters box with the JSON template):

The username and password values are the same as used when logging into the main RSF-1 GUI. On successful login the server will send back a confirmation page:

The browser is now authenticated with the REST API and is free to interact with the cluster using the available REST operations as presented in the main page. For example, running the cluster get operation [using the Try it out! button] results in output similar to:

Author:
Last update: 2021-03-12 15:16


Reservation drives are getting 'Failed to power up' errors

When a ZFS service is running on a node in the cluster, that node will hold SCSI reservations on some of the zpool disks to prevent the other node from being able to access those disks. With some disk models, when the passive node reboots, it will no longer be able to access those reservation disks and will get the message:

Device <path-to-device> failed to power up

Because of the failure to power up, that node will then always encounter I/O errors from those disks.

To resolve this issue, add an entry to /kernel/drv/sd.conf to disable the bootup power check for a specific disk model. The entry should be similar to:

sd-config-list= "SEAGATE ST2000NM0001","power-condition:false";

or if there are multiple disk models showing this behaviour:

sd-config-list= "SEAGATE ST2000NM0001","power-condition:false",
                "SEAGATE ST32000644NS","power-condition:false";

After sd.conf has been modified on both nodes, there should be no 'failed to power up' error on the next bootup and the passive node should be able to access the disks as expected (although it will still get 'reservation conflict' because the disks are still reserved).
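
If the exact vendor and product strings for a disk model are not known, on Solaris-based systems they can usually be read from the inquiry data reported by iostat, which prints a Vendor/Product line for each device (a brief sketch; the device name, revision and serial number shown are illustrative):

# iostat -En
c0t5000C500237B2E53d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE  Product: ST2000NM0001  Revision: 0002 Serial No: XXXXXXXX
...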

Author: Paul Griffiths-Todd
Last update: 2021-03-23 15:07


VHCI: devices not recognised as multi-path candidates for Solaris/OmniOS and derivatives

With the Solaris family of operating systems, the virtual host controller interconnect (vHCI) driver enables a device with multiple paths to be represented as a single device instance rather than as an instance per physical path. Devices under VHCI control appear in format listings with the device path starting /scsi_vhci, as in the following example:

# format
Searching for drives...done

AVAILABLE DISK SELECTIONS:
       0. c0t5000C500237B2E53d0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c500237b2e53
       1. c0t5000C5002385CE4Fd0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c5002385ce4f
       2. c0t5000C500238478ABd0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c500238478ab
       3. c1t5000C50013047C55d0 <HP-DG0300BALVP-HPD3-279.40GB>
          /pci@71,0/pci8086,2f04@2/pci1028,1f4f@0/iport@1/disk@w5000c50013047c55,0
       4. c2t5000C5000F81EDB1d0 <HP-DG0300BALVP-HPD4-279.40GB>
          /pci@71,0/pci8086,2f08@3/pci1028,1f4f@0/iport@20/disk@w5000c5000f81edb1,0

However, in the above example two devices are not under the control of the VHCI driver, as can be seen from their /pci device paths rather than /scsi_vhci ones. In order to resolve this the VHCI driver needs to be made aware that these drives can be multipathed. This is accomplished by adding specific entries to the VHCI configuration file /kernel/drv/scsi_vhci.conf; in essence, for each differing (vendor/model combination) candidate SCSI target device, a failover module to support the device must be identified by adding an entry to the property 'scsi-vhci-failover-override' in that file.

Using the format command we can identify the device vendor/model from the resulting output. Taking the entry <HP-DG0300BALVP-HPD4-279.40GB> from the above example, the first two characters identify the manufacturer, HP, with the next block identifying the model number, DG0300BALVP. These identifiers can then be added to the VHCI configuration file /kernel/drv/scsi_vhci.conf thus (syntax for more than one entry shown here for reference):

scsi-vhci-failover-override =
    "HP      DG0300BALVP", "f_sym",
    "HP      DG0300FARVV", "f_sym";
#END: FAILOVER_MODULE_BLOCK (DO NOT MOVE OR DELETE)

Please note that the spacing is important in the vendor declaration - it must be padded out to eight characters, immediately followed by the model number (which does not require any padding). Once the entries have been added the host machine must be rebooted in order for them to take effect. In the example above, once the configuration has been updated and the host rebooted, the output of format now returns:

AVAILABLE DISK SELECTIONS:
       0. c0t5000C500237B2E53d0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c500237b2e53
       1. c0t5000C5002385CE4Fd0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c5002385ce4f
       2. c0t5000C500238478ABd0 <SEAGATE-ST3300657SS-ES62-279.40GB>
          /scsi_vhci/disk@g5000c500238478ab
       3. c1t5000C50013047C55d0 <HP-DG0300BALVP-HPD3-279.40GB>
          /scsi_vhci/disk@g5000c50013047c55
       4. c2t5000C5000F81EDB1d0 <HP-DG0300BALVP-HPD4-279.40GB>
          /scsi_vhci/disk@g5000c5000f81edb1

The drives have now been successfully configured for multi-pathing via the VHCI driver.
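
Where the mpathadm utility is available, the multipath status of the newly claimed devices can also be checked directly (a minimal sketch; the path counts will depend on the physical topology):

# mpathadm list lu
        /dev/rdsk/c0t5000C50013047C55d0s2
                Total Path Count: 2
                Operational Path Count: 2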

Author: Paul Griffiths-Todd
Last update: 2021-04-01 15:58


What disk layout should I use for my ZFS HA pool and how does this impact reservations and heartbeat drives?

For ZFS file systems there are essentially two main approaches in use, RAID Z2 and a mirrored stripe. To give a brief overview of these two schemes, let's see how they lay out with six drives of 1TB each (note that within any pool, any drives used for reservations or to heartbeat through are still usable for data, i.e. NO dedicated drives are required; the cluster software happily co-exists with ZFS pools).

RAID Z2

RAID Z2 uses two parity drives and at least two data drives, so the minimum number of drives is four. With six drives this equates to the following layout, with roughly 4TB of usable space:


 P1 | P2 | D1 | D2 | D3 | D4

With this configuration up to two drives (parity or data) can be lost and pool integrity still maintained; however, any more drive losses will result in the pool becoming faulted (essentially unreadable/unimportable).

In order to place reservations on this drive layout it is necessary to reserve three drives (say P1, P2, D1) - in this way no other node will be able to successfully import the pool as there are not enough unreserved drives to read valid data from.

With reservations in place on drives P1, P2 and D1, this leaves drives D2, D3 and D4 free to use for disk heartbeats. The RSF-1 cluster software is aware of the on-disk ZFS structure and is able to heartbeat through the drives without affecting pool integrity.
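
For reference, a six-drive RAID Z2 pool of this shape would be created with a zpool command along the following lines (the pool and device names are purely illustrative):

# zpool create tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0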

RAID 10

RAID 10 is a combination of mirroring and striping; firstly mirrored vdevs are created (RAID 1) and then striped together (RAID 0). With six drives we have a choice of mirror layout depending on the amount of redundancy desired. These two schemes can be visualised as follows, firstly as two three-way mirrors striped together:


 vdev1 (three-way mirror): D0 | D1 | D2
 vdev2 (three-way mirror): D3 | D4 | D5

In this example two mirrors have been created (D0/D1/D2 and D3/D4/D5) giving a total capacity of 2TB. This layout allows a maximum of two drives to fail in any single vdev (for example D0 and D2 in vdev1, D0 and D3 in vdev 1 and 2, etc.); the pool could survive four drive failures as long as a single drive is left in vdev1 and vdev2, but if all fail in one side of the stripe (for example D3, D4 and D5) then the pool would fault.

The reservations for this layout would be placed on all drives in either vdev1 or vdev2, leaving three drives free for heartbeats.
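
A pool built as two three-way mirrors striped together would be created along these lines (again, pool and device names are illustrative):

# zpool create tank mirror c0t0d0 c0t1d0 c0t2d0 mirror c0t3d0 c0t4d0 c0t5d0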

Alternatively, the drives could be laid out as three two-way mirrors striped together:


 vdev1 (two-way mirror): D0 | D1
 vdev2 (two-way mirror): D2 | D3
 vdev3 (two-way mirror): D4 | D5

In this example three mirrors have been created (D0/D1, D2/D3 and D4/D5) giving a total capacity of 3TB, with a maximum of one drive failure in any single vdev. Reservations will be placed on either vdev1, vdev2 or vdev3 leaving four drives available for heartbeating.
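
The equivalent creation command for three two-way mirrors striped together would look something like the following (names illustrative):

# zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0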

In all of the above scenarios it is NOT necessary to configure reservations or heartbeats manually; when a pool is added to a cluster, the cluster software will interrogate the pool structure and automatically work out the number of drives it needs to reserve, with any remaining drives utilised for heartbeats. Note that for each clustered pool a maximum of two heartbeat drives are configured (any more is overkill).

 

Author: Paul Griffiths-Todd
Last update: 2021-04-07 19:43




HA Monitor

HA Monitor Guide 1.0.7

1. Overview

This document describes the RSF-1 external resource availability monitor (referred to in this document as “the monitor”), a software extension for RSF-1 clusters that monitors the end-point availability of clustered HA resources (NFS, SMB, etc) from the perspective of a consumer of those services on the network.

The monitor runs in a docker container on a host machine located anywhere on the network where monitoring the availability of cluster services is desired. Note that this host must have network access to the cluster resources being monitored as the docker container internally mounts any NFS or SMB shares to be monitored through the network stack of the docker host.

Once the desired resources have been configured they are continually monitored for availability. Should availability be lost (or regained after having been lost), the monitor is capable of sending a number of different types of alert depending upon what has been configured (alert types include SNMP, Email, Slack, Teams, etc).

Pictorially this can be viewed as:


2. Installation and Upgrade

2.1 Requirements

The resource availability monitor is delivered as a self-contained docker image using the Docker Content Trust (DCT) system to ensure the integrity and publisher of all the data downloaded. Installation of the image can be undertaken on any host running docker (note that at present docker only supports IPv6 on hosts running Linux; therefore if IPv6 monitoring is required, i.e. monitoring of shared resources over IPv6, then a Linux derivative should be chosen as the docker host).

In this document the host actually running the monitor container is referred to as the docker host. This host must have network access to any cluster resources to be monitored so it in turn can make those resources available to the container running the monitor itself.

For example, to monitor a clustered NFS share the docker host must be able to mount and access that share, as the docker container will require access to successfully monitor that resource's availability. Also note, the process running the monitor docker image must belong to the docker group on the host OS (this is a requirement of docker).

Running under docker, the monitor tool requires a persistent location to store its data, logs and ancillary files. This location is provided by the host OS and mapped to the directory /tmp/hamonitor when the container is started (see section 3 Starting the Monitor). This way existing configurations, logs and other components are preserved during upgrades, migrations etc.

2.2 Installation

The monitor is distributed from a docker content trust server and is installed as follows:

  1. Download and install the docker application package from www.docker.com onto the docker host machine. Start the docker daemon on the host machine.
  2. Enable docker trust; docker uses environment variables to modify its behaviour, therefore to enable trust set the following:
    Unix command shell:
    # export DOCKER_CONTENT_TRUST=1

    Windows power shell:
    $Env:DOCKER_CONTENT_TRUST=1
  3. Set the high-availability notary server for the trust system, again using an environment variable:
    Unix command shell:
    # export DOCKER_CONTENT_TRUST_SERVER=https://notary-server.high-availability.com:4443

    Windows power shell:
    $Env:DOCKER_CONTENT_TRUST_SERVER="https://notary-server.high-availability.com:4443"
  4. In order to use the docker registry for trusted downloads it is necessary to have a user name/password combination - this should be requested using the email address docker-trust@high-availability.com.
  5. Using the user/password combination retrieved in the previous step, login to the docker framework:
    # docker login dkr.high-availability.com
    The docker login subcommand will prompt for a username and password. Note that once you have successfully logged into the server, docker saves a login token locally in the user's home directory in the file .docker/config.json, thereby avoiding the need for this user to log in again. The token can be cleared using the docker logout subcommand.
  6. Inspect the list of signed monitor images available from the registry:
    # docker trust inspect --pretty dkr.high-availability.com/hamonitor

    here is some example output showing two signed versions of the monitor:
    Signatures for dkr.high-availability.com/hamonitor

    SIGNED_TAG DIGEST                                                        SIGNERS
    v1.0       46d706ebead9e7746b3c1ffcbc2247562d035a5eed85410dc54eebe5c1aed hacsigner
    v1.1       eaea423478652348753463487563487563324856473285763434876538475 hacsigner

    List of signers and their keys for dkr.high-availability.com/hamonitor

    SIGNER    KEYS
    hacsigner 346c3155f96c

    Administrative keys for dkr.high-availability.com/hamonitor

    Repository Key: 2058e5cfcc725b7b00607c60941e6105d5709ab20d56278ac3a4e1dd386c0

    Root Key:       a6ae8b3cc1ac73aa34d4237d6c0562fb5d69c5eaa3ab9a9e78b2ac3ecce93c39
  7. Download the desired version:
    # docker pull dkr.high-availability.com/hamonitor:v1.0

    Note that with the trust framework enabled, the docker command line tool takes care of validating the digest of the images downloaded and checks against the official signatures held by the notary server. Any image that has not been signed and verified will be blocked from download.

    The output from the docker pull should look similar to the following:
    Pull (1 of 1): dkr.high-availability.com/hamonitor:v1.0@sha256:46d706ebead9e7746b3c1ffcbc2247562d038865a5eed85410dc54eebe5c1aed

    sha256:46d706ebead9e7746b3c1ffcbc2247562d038865a5eed85410dc54eebe5c1aed:

    Pulling from hamonitor

    ab3acf868d91: Pull complete
    bf8f3d9e8100: Pull complete
    4cf71b2b4422: Pull complete
    668c80dc67a6: Pull complete
    1b527012fdfd: Pull complete
    ade8b6ab4354: Pull complete
    4849fab77f68: Pull complete
    ccedab781a09: Pull complete

    Digest: sha256:46d706ebead9e7746b3c1ffcbc2247562d038865a5eed85410dc54eebe5c1aed

    Status: Downloaded newer image for dkr.high-availability.com/hamonitor@sha256:46d706ebead9e7746b3c1ffcbc2247562d038865a5eed85410dc54eebe5c1aed

    Tagging dkr.high-availability.com/hamonitor@sha256:46d706ebead9e7746b3c1ffcbc2247562d038865a5eed85410dc54eebe5c1aed as dkr.high-availability.com/hamonitor:v1.0

    dkr.high-availability.com/hamonitor:v1.0
  8. Confirmation of the downloaded image can be performed by comparing the digest of the local images with those held by the remote trust server (use the trust inspect docker subcommand as detailed earlier to retrieve the remote image digests). To list the digest of locally installed images use the command:
    # docker images --digests

    The output will look similar to the following - the DIGEST field should correspond to the digest listed by the trust server:
    REPOSITORY                           TAG   DIGEST                                                                   IMAGE ID      CREATED      SIZE
    dkr.high-availability.com/hamonitor  v1.0  sha256:46d706ebead9e7746b3c1ffcbc2247562d038865a5eed85410dc54eebe5c1aed  fc254100c107  4 weeks ago  768MB

2.2.1 Offline (dark site) installation

To install the monitor on hosts that have no external internet connection (and thus cannot make a connection to the docker trust server) necessitates a two-step approach, the result of which is an offline image that can then be used to install the monitor on any host, regardless of whether or not it has external connectivity.

The first step in creating the image is to designate a download host (with internet connectivity) that can be used to download the monitor as described in the previous section. Once that is accomplished the next step is to create an image that can be shipped to the non-connected hosts and installed locally. Creating an image is accomplished as follows:

  1. On the host where the monitor has been downloaded create an image of the monitor:
    # docker image save -o hamonitor_v1.0.tar dkr.high-availability.com/hamonitor:v1.0
  2. The newly created tar file (in this example hamonitor_v1.0.tar) can then be copied to any host and installed using the following command:
    # docker load -i hamonitor_v1.0.tar
  3. Finally, on the host where the image was loaded, check the image ID is the same, output will be similar to the following:
    # docker images --digests

    REPOSITORY                          TAG  DIGEST IMAGE ID     CREATED  SIZE                           
    dkr.high-availability.com/hamonitor v1.0 <none> fc254100c107 3 months 768MB

The monitor is now installed and ready to run.

2.2.2 Understanding the <none> digest column for loaded images

When an image is loaded from an image file (created using the docker image save command), the digest field will always be shown as <none>. To understand why, it first helps to know where the digest field originates.

When an image is pushed to a docker registry, the layers that go to make up that image are transferred over in an uncompressed format. When docker saves those layers, it saves them in a compressed format. Once all the layers that make up an image have been received and stored in compressed format, docker then creates an image manifest listing all the layers and a SHA256 checksum for each compressed layer. Once the manifest has been created a digest is calculated and the image tag is signed. The signatures and digest can be seen by inspecting the trust information for the image using the command:

# docker trust inspect --pretty dkr.high-availability.com/hamonitor:v1.0

The resulting output will look similar to the following:

Signatures for dkr.high-availability.com/hamonitor:v1.0

SIGNED TAG   DIGEST                                                             SIGNERS
v1.0         46d706ebead9e7746b3c1ffcbc2247562d038865a5eed85410dc54eebe5c1aed   hacsigner

List of signers and their keys for dkr.high-availability.com/hamonitor:v1.0

SIGNER      KEYS
hacsigner   96d8fb669c3e

Administrative keys for dkr.high-availability.com/hamonitor:v1.0

  Repository Key: 619faa1dc970583f7b366fe68ecfa48b5e6cd5b07ccea2647d5c0ab7bb50191e
  Root Key: 4dd62ed17d61204196017fff4c1e9f2ded9508d9bfbdf5321a10f401b555a414


The digest listed for the signed tag v1.0 is the same as the one shown when requesting digests for locally installed images using docker images --digests.

The process described so far details how an image is uploaded to and verified against a docker registry using the trust framework. When that image is downloaded using docker pull, the trust chain is maintained because a signed digest for the manifest is available, meaning the manifest can be trusted (after login/key exchange etc.), and therefore its contents, and therefore the checksums for the compressed layers, and so on.

However, when an image file is created from a locally installed image, the original manifest cannot be used because, firstly, it contains checksums for the compressed layers whereas locally generated image files contain uncompressed layers, but more importantly, any manifest shipped with an image has no way to establish trust, as it is not derived from the original docker registry and thus there is no way to verify its contents (most importantly the layer checksums).

Because of this, verification of an image file is done using the image ID field once the image has been installed. The image ID should correspond to the original image ID from the host where the image file was created; the image ID is calculated by applying the SHA256 algorithm to a layer's content, so as long as that content has not changed from the originating host, the IDs will match and the image can be trusted.
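
As an illustration of that check, the image ID can be listed on both the originating host and the dark site host and compared by eye (a minimal sketch using standard docker formatting options):

# docker images --format "{{.Repository}}:{{.Tag}} {{.ID}}" dkr.high-availability.com/hamonitor
dkr.high-availability.com/hamonitor:v1.0 fc254100c107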


3. Starting the monitor

3.1 Starting from the command line

Once the monitor image has been installed, use the docker run command to start the monitor's container (and thus the monitor itself). Note, the user starting the container must belong to the docker group on the host OS.

When starting the container there are a number of required arguments:

# docker run \
    --detach \
    --name hamonitor \
    --net host \
    --privileged \
    --restart unless-stopped \
    --volume <host directory>:/tmp/hamonitor \
    dkr.high-availability.com/hamonitor:v1.0 \
    --publish 13514


Once the container has been started it will run unattended; if the docker host is restarted then the container will restart automatically.

Note that the --publish argument should always be last on the command line and that a suitable value for the <host directory> parameter should be provided (see the description below for more details). These arguments have the following effect:

--detach
Runs the monitor in the background.

--name hamonitor
Assigns a friendly name to the container that can then be used as a more memorable argument to other docker commands (such as docker start and stop) as opposed to the less memorable image ID that docker uses (i.e. fa37f3788bb3).

Furthermore, note that although an ID is unique to an installed image, a new image ID is generated on every upgrade, meaning any process that refers to a specific image ID will have to be modified should the image ID change (shell scripts for example). Using a friendly name avoids this problem.

--net host
Use the host’s network stack for the container.

--privileged
Gives all capabilities to the container and also access to the host’s devices (those that reside under /dev).

--restart unless-stopped
Specifies the restart policy for the container, in this case always restart the container if it stops, unless it is manually stopped in which case it will not be restarted even if the docker daemon itself restarts (alternatively, if always is specified instead of unless-stopped then a stopped container will be restarted if the Docker daemon restarts). Also note that should the monitor process terminate for any reason the container itself will exit and docker will restart another instance of the container.

--volume <host directory>:/tmp/hamonitor
Maps the directory <host directory> on the docker host to the directory /tmp/hamonitor in the container.

The container directory portion of this mapping (/tmp/hamonitor) is used by the monitor to store all its permanent data (encrypted database, logs etc) and cannot be changed. The <host directory> portion should be any suitable local filesystem directory on the docker host; it is recommended this is a local filesystem as opposed to a remotely mounted one (SMB/NFS etc.) to avoid network outages adversely affecting the running monitor.

Using a mapping from the docker host to the container, rather than a local filesystem within the container, gives a number of advantages:

  • Upgrades can be performed without having to first backup data in the container (and then re-import afterwards).
  • The monitor data can be backed up without the need for the container to be running.
  • It simplifies migration of the container to another host.

dkr.high-availability.com/hamonitor:v1.0
The name of the docker image to run.
Note the version number (v1.0) should correspond to the version downloaded.

--publish <port-no>
The TCP port that the monitor's REST API listens on for incoming requests; port 13514 is used by default if one is not provided on the command line.

3.2 Create a start script

To simplify starting the monitor, the command line can be saved as a shell script and run as a single command. For example, save the following to a file:

#!/bin/sh
docker run \
    --detach \
    --name hamonitor \
    --net host \
    --privileged \
    --restart unless-stopped \
    --volume <host directory>:/tmp/hamonitor \
    dkr.high-availability.com/hamonitor:v1.0 \
    --publish 13514

Then set execute permission on the file and start the monitor by running the newly created script (if the file saved to is start-monitor.sh then):

# chmod +x start-monitor.sh
# ./start-monitor.sh
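
Once started, the container can be checked from the docker host with standard docker commands (hamonitor being the friendly name given with --name above):

# docker ps --filter name=hamonitor
# docker logs -f hamonitor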

4. Initial configuration

Once the monitor is installed and running, the next step is configuration. The first task is to create an administrator. This must be done before any other operations are performed as adding monitored resources, creating alerts, user management etc., can only be performed by a user with the administrator privilege. Configuration is done via the hamonitor command line utility, located in the docker image as /usr/bin/hamonitor.

4.1 Check progress by watching the log file

The log file records all actions taken by the monitor when configuration is undertaken and during the monitoring process itself (recording lost and regained connections, alerts sent, etc.). It is therefore useful to monitor the contents of the log file during any configuration to assist in debugging.

The log file is held in the shared volume specified with the --volume argument when the monitor is started. The log file name is hamonitor.log, therefore in the docker container the full path is /tmp/hamonitor/hamonitor.log.

The log file can also be accessed from the docker host using the host directory path supplied to the volume argument, by appending hamonitor.log to the end of the path.
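
For example, if the host directory given to --volume was /srv/hamonitor (an illustrative path), the log can be followed from the docker host with:

# tail -f /srv/hamonitor/hamonitor.log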

4.2 Create an administrator

  1. Start a shell connected to the docker container:
    # docker exec -it hamonitor bash
  2. Issue the user create command:
    # hamonitor user create
  3. When prompted fill in the details for the administrator:

    Enter username: admin
    Enter password: [hidden]
    Verify password: [hidden]
    Enter real name [None]: admin
    Enter email address [None]: someone@some.domain.com
    Available roles: 0 (view-only), 1 (operator), 2 (admin)
    Enter role [0]: 2

    Initial admin user successfully created.  
    The API will restart now to enforce security.  
    User creation is now limited to admin roles.  
    Please log in as the new user

  4. From now on you will need to login to the monitor in order to perform any operations:
    # hamonitor login
    Enter URL [https://localhost:13514 if empty]:
    Enter Username: admin
    Enter Password: ******
    Welcome admin

Once the administrator is created resources to be monitored can be added.

4.3 Add a monitored resource

  1. From a shell connected to the docker container issue the resource add command:
    # hamonitor resource add
  2. Fill in the resource details:
    Enter protocol [NFS, SMB]: NFS
    IP address of NFS server: 2001:efca:56e4::70e7:fd8:999
    Path of the NFS share: /tank
    Mount options [return for None]:

    User name [return for None]:
    Password [return for None]:

    Resource created, ID is 1

This creates a monitored NFS resource and assigns it the unique ID 1 within the monitor framework. The ID is then used by hamonitor to refer to this resource for all operations (such as adding or removing an alert).

To view configured resources use the resource list command:

# hamonitor resource list
[{

    "path": “/tank”,
    "ip": "[2001:efca:56e4::70e7:fd8:999]",
    "protocol": “nfs",
    "enabled": true,
    "creationDate": "2020-08-26T09:48:08+00:00",
    "notifications": {
      "slack": false,
      "teams": false,
      "email": false,
      "snmp": false
    },
    “resourceid”: 1
}]

In the above listing all alert notifications are set false. The next step is to configure and enable an alert for this resource.

4.4 Create an alert

Once a resource has been configured, alerts can then be associated with it. In this step an email alert is added and associated with the resource created previously. Note that the resource ID (in this case 1) is used to link the alert to the resource being monitored.

  1. From a shell connected to the docker container issue the following command:
    # hamonitor alert email create -id 1
  2. Fill in the alert details from the prompts:
    This alert method only supports authenticated email delivery over TLS.

    Enter SMTP server address (MX:PORT): mx1.yourcompany.com:587

From now on any changes in the availability of this resource will generate an email alert. Other alerts can be added as required.


5. Users and roles

5.1 User authentication

Before performing any operations on the monitor using the CLI or the REST API, it is necessary to authenticate as a user of the system. The monitor uses a role-based access control approach, with the administrator role providing the most access (when the monitor is first installed and configured an administrator user is created).

5.2 Available roles

There are three roles that can be assigned to users:

Role           ID   Description
View only      0    Basic access only. Check status of resources and alerts only.
Operator       1    Same access as view only but also the ability to enable/disable alerts.
Administrator  2    No restrictions.


5.3 Logging into the monitor

To authenticate to the monitor use the following command:

# hamonitor login

You will be prompted to enter a valid URL to connect to (defaulting to localhost if run inside the docker image or on the docker host), followed by user name and password. Upon successful login, the monitor issues the following response:

# hamonitor login
Enter URL [https://localhost:13514 if empty]:
Enter Username: admin
Enter Password:
Welcome admin


5.4 Creating new users

Only users with the administrator role are able to create new users (who can in turn be assigned the administrator role). The monitor will enforce at least one user having administrator role and will prevent any attempt to delete an administrative user if there are no other users with that role.

To create a new user enter the following command:

# hamonitor user create

Here is an example of the creation of a user with the operator role:

Enter username: oper
Enter password: [hidden]

Verify password: [hidden]
Enter real name [None]: Operator
Enter email address [None]: operations@some.domain.com
Available roles: 0 (view only), 1 (operator), 2 (admin)
Enter role [0]: 1
User oper successfully created

6. CLI reference guide

The utility hamonitor is used to perform all monitor actions. It is supplied as part of the docker image. To run the utility first gain shell access to the running docker instance:

# docker exec -it hamonitor bash

The hamonitor utility is self-documented; typing any command or subcommand with no arguments produces a help summary:

# hamonitor
NAME:
   hamonitor - RSF-1 shared resources monitor

USAGE:
   hamonitor [global options] command [command options] [argument...]

VERSION:
   1.6.15

COMMANDS:
   alert       Alert management
   login       Login to RSF-1 shared resource monitor
   logout      Sign out of RSF-1 shared resource monitor
   resource    Resource management
   user        User management
   help, h     Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help

 


6.1 User management

To administer users the user subcommand is used. It allows for user creation, deletion and listing existing users (modification is not supported in this release).

6.1.1 User addition

To create a new user, enter the subcommand:

# hamonitor user create
Enter username: oper
Enter password: *********

Verify password: *********
Enter real name [None]: Operator
Enter email address [None]: ops@example.com
Available roles: 0 (view only), 1 (operator), 2 (admin)
Enter role [0]: 1
User oper successfully created

6.1.2 Listing users

To show a list of configured users enter the subcommand:

# hamonitor user list

Note that the resulting list is provided in JSON format; to make it more readable, pipe the output through jq (a utility to print JSON into a more human readable format, shipped with the docker image). Running the above command through the jq utility results in:

# hamonitor user list | jq
[

  {
    "userid": 1,
    "creation_time": "1600692753.915523",

    "username": "admin",
    "realname": "",
    "password": "<*** HIDDEN ***>",
    "role": 2,
    "enabled": "True"
  },
  {
    "userid": 2,
    "creation_time": "1600701250.404133",
    "username": "user",
    "realname": "",
    "password": "<*** HIDDEN ***>",
    "role": 1,
    "enabled": "True"
  }
]


6.1.3 Deleting a user

To delete a user enter the subcommand:

# hamonitor user delete
Enter userid: 2
Do you really want to remove user 2? [y/n]: y
User successfully deleted


Note that users are referenced by their userid (shown by the user list subcommand).


6.2 Logging in and out

Before any changes are made to the monitor, it is necessary to authenticate using the login command. The connection URL uses localhost and port 13514 by default - an alternative can be entered when logging in:

# hamonitor login
Enter URL [https://localhost:13514 if empty]:
Enter Username: admin
Enter Password:

Welcome admin

To logout issue the logout subcommand:

# hamonitor logout

6.3 Resource management

Resources are managed using the hamonitor resource command.

6.3.1 Adding a monitored resource

To configure a monitored resource, use the resource add subcommand:

# hamonitor resource add
Enter protocol [NFS, SMB]: SMB
IP address of the SMB server: 192.168.4.1
SMB share name: Scratch
Mount options [return for None]:

User name [return for None]: system
Password [return for None]:

Do you want to test the connection now? [y/n]y
Test successful.

201: Created

6.3.2 Removing a monitored resource

To remove a resource from the monitor use the resource remove subcommand:

# hamonitor resource remove --id 1
200: OK

Any alerts associated with the resource are also removed.

6.3.3 Enabling and Disabling a monitored resource

The active monitoring state of a resource can be toggled between enabled or disabled. When a resource is first added its monitor state is enabled. To suspend monitoring without removing the resource entirely (say for example the resource is going offline for maintenance) its state can be set to disabled in the monitor. Reinstate monitoring for a resource by enabling it.

To disable a resource use the resource disable subcommand:

# hamonitor resource disable --id 1
200: OK

To enable a resource use the resource enable subcommand:

# hamonitor resource enable --id 1
200: OK

6.3.4 Listing resources being monitored

To list all resources configured use the resource list subcommand:

# hamonitor resource list
...

The list is reported back as a JSON object; pipe it through jq for a more human readable form.
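
For example, to extract just the resource IDs from the listing, a small jq filter over the resourceid field shown in the earlier listing could be used (a sketch only):

# hamonitor resource list | jq '.[].resourceid'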

6.3.5 Status of an individual resource

To check the status of individual resources use the resource status subcommand:

# hamonitor resource status --id 1 | jq
{
  "path": "Scratch",
  "ip": “192.168.4.1”,
  "protocol": "SMB",
  "enabled": true,
  "creationDate": "2020-09-22T12:16:54+00:00",
  "notifications": {
    "slack": false,
    "teams": false,
    "email": false,
    "snmp": false
  }
}

6.4 Alert management

The monitor supports several types of alerts. Alerts are configured on a per-resource basis, so each resource has its own alert schema. Alerts are bound to resources using the mandatory --id argument when adding an alert.

6.4.1 Adding an email alert

To configure an email alert, a valid SMTP server is required along with a user name and password. The monitor only supports email delivery over an encrypted TLS connection. The (optional) TLS port on the email server is specified when the server address is entered:

# hamonitor alert email create --id 5
This alert method only supports authenticated email delivery over TLS.

Enter SMTP server address (MX:[PORT]): mx1.yourcompany.com:587
Enter SMTP server user name: realuseraccount@yourcompany.com
Enter SMTP server user password:
Verify password:
Enter email FROM address: alerts_sender_alias@yourcompany.com
Enter email TO address: alerts_manager_alias@yourcompany.com

201: Created

6.4.2 Adding a Slack alert

A Slack alert will update a Slack channel with events for which the resource has been configured. Before creating an alert a Slack webhook is required. Webhooks are created using the Slack application itself; please see the Slack documentation for how to create a suitable webhook.

Once a webhook link has been generated it is then used as part of the URL when creating the alert:

# hamonitor alert slack create --id 5
Enter Slack hook URL: https://hooks.slack.com/services/<link>
201: Created

An alert published to slack will be similar to:

Resource OFFLINE: nfs 192.168.22.6 /pool/nfs-share


6.4.3 Adding a Microsoft Teams alert

A Teams alert will update a Teams channel with events for which the resource has been configured. Before creating an alert a Teams webhook is required. Webhooks are created from the Teams application itself; please see the Teams documentation for how to create a suitable webhook.

Once a webhook link has been generated it is then used as part of the URL when creating the alert:

# hamonitor alert teams create --id 5
Enter Teams hook URL: https://outlook.office.com/webhook/<link>
201: Created

An alert published to teams will look similar to this example:

Resource OFFLINE: smb 10.6.11.12 /pool/smb-share

6.4.4 Adding an SNMP alert

To add an SNMP alert the IP address of an SNMP manager is added to the resource being monitored:

# hamonitor alert snmp create --id 5
Enter SNMP manager address: 10.5.14.22
201: Created

An SNMP MIB file for the monitor is shipped with the docker image in /root/RSF-MIB.txt.


6.5 Changing the HTTPS authentication certificate

The CLI communicates with the monitor using its REST API over HTTPS (TLS version 1.3). A self signed certificate is shipped in the docker image as /root/cert.pem.

A site-specific certificate can be used instead by installing it in the shared host directory as cert.pem. It is necessary to restart the docker container in order for it to pick up the new certificate; on the docker host run:

# docker restart hamonitor
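
For example, assuming the host directory supplied to --volume was /srv/hamonitor (an illustrative path), the site certificate would be copied into place before performing the restart shown above:

# cp site-cert.pem /srv/hamonitor/cert.pem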

7. Troubleshooting

7.1 Configuring resources or alerts results in a “403: Forbidden” response

When creating or modifying any form of resource or alert, if the response returned from the monitor is "403: Forbidden", this indicates that the CLI is not authenticated to the monitor (or a previously authenticated connection has timed out). To resolve this, simply log in to the monitor, for example:

# hamonitor resource add
Enter protocol [NFS, SMB]: SMB

IP address of the SMB server: 10.6.4.68
SMB share name: Acc_mnt
Mount options [return for None]:
Username [return for None]: admin
Password [return for None]:
Do you want to test the connection now? [y/n]y
403: Forbidden

# hamonitor login
Enter URL [https://localhost:13514 if empty]:

Enter Username:
Enter Password:
Welcome admin

# hamonitor resource add
Enter protocol [NFS, SMB]: SMB

IP address of the SMB server: 10.6.4.68
SMB share name: Acc_mnt
Mount options [return for None]:
Username [return for None]: admin
Password [return for None]:
Do you want to test the connection now? [y/n]y
Test successful
201: Created

7.2 Adding an SMB resource results in an unsuccessful test

If, at the final stage of adding an SMB resource to be monitored, the connection test fails, the underlying cause is likely to be recorded on the SMB server itself in system-specific message files.

For example, in the case where the share name is incorrect you would see a log entry similar in format to:

Dec 9 12:14:35 NOTICE: smbd[MG\guest]: smb share not found

Or for an authentication issue:

Dec 9 12:19:22 NOTICE: smbd[MG\guest]: pool1_smb access denied: guest disabled.

 

Author: Paul Griffiths-Todd
Last update: 2021-04-30 17:09


Documentation

Licensing RSF-1

Licensing in RSF-1 has been made very simple and can be handled almost entirely by the browser-based GUI. The following sections describe the main licensing tasks that can be performed through the GUI:

Initial Evaluation Licences

When RSF-1 is first installed on a pair of servers, it will be unlicensed. This can be seen clearly during cluster creation, where any unlicensed node is marked as such:

If any unlicensed nodes are selected for cluster creation, a further "Licensing" dialogue will be shown. This will prompt for a valid email address; using the "Licence" button, that email address will then be used to request a set of RSF-1 licences:

When the licensing has been completed, the dialogue will change to show that and the option to create a cluster will be enabled:

Once the cluster is created, the same GUI can be used to check the licence status. The "Licensing" tab displays the licence status of each node in the cluster, where the status includes the licence type (Temporary or Permanent) and the expiry time/date:

Extension of Evaluation Licences

Initial evaluation licences are valid for 45 days and are intended to facilitate the testing and evaluation of RSF-1. After the initial evaluation period, the licences will be shown as "Expired", which will be shown in the "Licensing" tab:

If after this time, more testing time is needed, new evaluation licences can be arranged by contacting High Availability at support@high-availability.com. When the licence extension is approved, new licences can be installed using the "Re-Issue" button on the "Licensing" tab of the GUI. In this case, a new dialogue will be shown, prompting for a valid email address:

After the re-licensing is completed, the GUI will initially still show the expired licence status. RSF-1 re-reads its licence every hour on the hour, so unless the rsf-1 service is restarted on each node, the licence status will be updated within one hour:

Upgrading to Permanent Licences

When sufficient testing and evaluation has been completed, permanent cluster licences can be purchased by contacting sales@high-availability.com. When the purchase is completed, new permanent licences will be issued by High Availability, and these licences can be installed in the same way as replacement evaluation licences. From the "Licensing" tab in the GUI, it can be seen that the current licences are temporary:

Clicking either of the "Re-Issue" buttons will result in a new dialogue, prompting for a valid email address:

After the re-licensing is completed, the GUI will initially still show the temporary licence status. RSF-1 re-reads its licence every hour on the hour, so unless the rsf-1 service is restarted on each node, the licence status will be updated within one hour. When it is updated, it will show permanent licences for both nodes:

Licence Installation for Dark Sites

In the previous sections it was shown that licence requests always require a valid email address. In addition to the licences automatically being installed on cluster nodes, the same licence files are sent to the provided email address. Where the network topology or firewall does not allow access to the High Availability licensing server from cluster nodes, it is possible to install licences manually using the files attached to the licence email.

To request licences for a cluster, contact support@high-availability.com and include the following details:

  • Platform (OmniOS, Ubuntu, etc.)
  • hac_hostid from all cluster nodes
    • hac_hostid can be found by running /opt/HAC/bin/hac_hostid and will usually be the same as the system UUID

When the licences have been generated they will be returned in an email. Attached to the email will be:

  • licences.txt
    • Contains licence strings for all cluster nodes
    • Needs to be installed to /opt/HAC/RSF-1/etc/licences.txt
  • passwd
    • Contains login information for default RSF-1 users _rsfadmin and _rsfconfig
    • Needs to be installed to /opt/HAC/RSF-1/etc/passwd
  • passwd.config
    • Contains authentication information needed for distributing cluster configuration changes
    • Needs to be installed to /opt/HAC/RSF-1/etc/passwd.config
  • install_lic.sh
    • Small script that contains all data from the other files
    • This can be used as a simple way to install the licences and passwd/passwd.config files - just copy install_lic.sh to both nodes and run it to install the files (see the example below)
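
For example, the script could be distributed and run from an administration host along these lines (node-a and node-b are illustrative hostnames):

# scp install_lic.sh root@node-a:/tmp
# ssh root@node-a sh /tmp/install_lic.sh
# scp install_lic.sh root@node-b:/tmp
# ssh root@node-b sh /tmp/install_lic.sh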

Author: Matt Youds
Last update: 2021-03-24 11:55


Creating a Cluster

Requirements for Node Discovery

Cluster creation will always involve the local node and one or more remote nodes. To simplify the process and to ensure the selected remote nodes are indeed unclustered RSF-1 nodes, the first step in cluster creation is the discovery of nodes.

Firewall Rules

Node discovery is achieved using a network UDP broadcast on RSF-1's registered port. Any machines that respond to the broadcast in the expected manner are RSF-1 nodes.

Because of this, it is important that any firewalls allow broadcast packets and allow communications to and from port 1195. Once the cluster is created, RSF-1 communicates over port 1195 using both TCP and UDP, so it is important that firewall rules allow both protocols.
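
As an illustration only, on the CentOS nodes used in the examples later in this guide, the port could be opened in firewalld as follows (the use of firewalld and the default zone are assumptions - adapt the commands to the firewall actually in use):

# Allow RSF-1 cluster traffic on port 1195 for both protocols
firewall-cmd --permanent --add-port=1195/tcp
firewall-cmd --permanent --add-port=1195/udp
firewall-cmd --reload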

Hostname Resolution

When nodes are discovered, they report their "management" IP address, which will then be used for cluster creation. The IP address reported is the one associated with the machine's hostname, so it is important to set that to an appropriate value in /etc/hosts. A common issue with node discovery is a node's hostname being associated with the loopback address 127.0.0.1, in which case each node reports its own address as 127.0.0.1.

Before attempting to discover nodes, ensure /etc/hosts on each node contains entries for all nodes that are to be clustered, and that all of those entries are for external IP addresses, not loopback.
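
For the two-node examples used later in this guide, a suitable /etc/hosts would contain entries such as the following (the addresses are taken from the discovery output shown below and are illustrative only):

192.168.100.132    CentOS8-1
192.168.100.131    CentOS8-2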

If any changes are made to the hosts file, restart the main "rsf" service. rsfmon resolves its own hostname when it starts and will only re-resolve when it is restarted.

Note for NappIt clusters:

The installation of NappIt will add hosts file entries for the local hostname and set them to 127.0.0.1. This is done to optimise communications by NappIt but interferes with the functioning of the cluster. If NappIt is installed after RSF-1, ensure that those additional entries are removed from /etc/hosts.

Cluster Creation from the Browser

After installing the RSF-1 package on all nodes, the cluster can be created from a web browser. The RSF-1 webapp is available on all nodes with RSF-1 installed and can be accessed using HTTPS port 4330. If the webapp cannot be accessed on port 4330, ensure any firewall allows access to that port and also make sure the "rsf-rest" (or "rsfrestapi" for FreeBSD clusters) service is running.
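
As a quick check, assuming a systemd-based Linux node (the unit name "rsf-rest" is taken from the service name above and may differ on other platforms):

# Confirm the REST service is running and that the webapp answers on port 4330
systemctl status rsf-rest
curl -k https://localhost:4330/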

When accessing the webapp for the first time, a dialogue will be shown asking for a password for the "admin" user:

After a password has been set, it must be used to log in:

Once logged in, the only available option is to proceed to cluster creation:

The cluster creation page will show all available RSF-1 nodes that can be used to create the cluster. The local node (the node providing the webapp that is currently in use) is always selected by default and cannot be deselected. All other nodes can be selected for addition to the cluster.

If any of the selected nodes are unlicensed, a new dialogue will be shown and will provide a guide for requesting licences. For more information about licensing, see the chapter Licensing RSF-1.

When all nodes are successfully licensed, it will be possible to click the "Create Cluster" button. The following dialogues will be shown:

When the cluster has been created, its status will be shown in the cluster Dashboard:

Cluster Creation using the Command Line

All RSF-1 command-line tools are located in /opt/HAC/RSF-1/bin and /opt/HAC/bin, so it is helpful to add those directories to the PATH environment variable:

[root@CentOS8-1 ~]# grep RSF-1 .bash_profile 
PATH=$PATH:$HOME/bin:/opt/HAC/bin:/opt/HAC/RSF-1/bin
[root@CentOS8-1 ~]#
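
To make the commands available in the current shell without logging in again, the same directories can be exported directly:

export PATH=$PATH:/opt/HAC/bin:/opt/HAC/RSF-1/bin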

Cluster creation can then be completed using the hacli command.

Before the cluster can be created, both nodes must have valid licences installed. For more information about installing licences, see the chapter Licensing RSF-1.

Admin User Creation

If this is the first time the cluster has been created, it will be necessary to first create an admin user:

[root@CentOS8-1 ~]# hacli user create
Enter username: admin
Enter password: 
Verify password: 
Enter real name [None]: 
Enter email address [None]: 
Available roles: 0 (view-only), 1 (operator), 2 (admin)
Enter role [0]: 2
{
  "timeout": 4,
  "errorMsg": "",
  "execTime": 0.024,
  "error": false,
  "output": "Initial user admin successfully created"
}
2021/03/24 11:20:26 HAC REST API has been restarted and cluster security has been enabled.
[root@CentOS8-1 ~]#

The admin user has now been created. To proceed, login using that admin user:

[root@CentOS8-1 ~]# hacli login
Enter URL [https://localhost:4330 if empty]: 
Enter Username: 
Enter Password: 
[root@CentOS8-1 ~]#

Node Discovery

Before creating the cluster, it is possible to "discover" nodes on the network that are available for clustering. This step is not required for cluster creation but it can be useful as a way of gathering information and anticipating any problems with cluster creation:

[root@CentOS8-1 ~]# hacli node discover
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 2.019,
  "error": false,
  "output": [
    {
      "Node": "CentOS8-1",
      "Release": "4.18.0-193.28.1.el8_2.x86_64",
      "System": "CentOS Linux release 8.2.2004 (Core)",
      "expireTime": "1970-01-01T00:00:00Z",
      "guid": "17eb0a69-85e0-be49-94c4-e037652c084d",
      "ipAddress": "192.168.100.132",
      "isLicenced": true,
      "islocal": true,
      "licenseOS": "Linux",
      "licenseStatus": "This copy of RSF-1 is licensed for automatic service startup"
    }
  ]
}
[root@CentOS8-1 ~]#

In this example only the local node is discovered. This indicates one of the following:

  • The remote node is not running RSF-1
  • There is a problem with the /etc/hosts file - this is usually caused by the local node being listed as 127.0.0.1
  • There is a network/firewall problem preventing communications between the two nodes via port 1195

For more details about networking requirements, see the above section Requirements for Node Discovery.

When any network issues are resolved, the node discovery shows:

[root@CentOS8-1 ~]# hacli node discover
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 2.034,
  "error": false,
  "output": [
    {
      "Node": "CentOS8-2",
      "Release": "4.18.0-193.28.1.el8_2.x86_64",
      "System": "CentOS Linux release 8.2.2004 (Core)",
      "expireTime": "1970-01-01T00:00:00Z",
      "guid": "bc0641e1-e463-a245-adb3-d493c36c2dfc",
      "ipAddress": "192.168.100.131",
      "isLicenced": false,
      "islocal": false,
      "licenseOS": "Linux",
      "licenseStatus": "CentOS8-2 does not have a valid licence"
    },
    {
      "Node": "CentOS8-1",
      "Release": "4.18.0-193.28.1.el8_2.x86_64",
      "System": "CentOS Linux release 8.2.2004 (Core)",
      "expireTime": "1970-01-01T00:00:00Z",
      "guid": "17eb0a69-85e0-be49-94c4-e037652c084d",
      "ipAddress": "192.168.100.132",
      "isLicenced": true,
      "islocal": true,
      "licenseOS": "Linux",
      "licenseStatus": "This copy of RSF-1 is licensed for automatic service startup"
    }
  ]
}
[root@CentOS8-1 ~]#

This shows that while the remote node can now be seen on the network, it has not yet been licensed.

Once any node discovery issues have been resolved, the cluster can be created.

Cluster Creation

Cluster creation is performed with the following hacli subcommand:

NAME:
   hacli cluster create - Create a new RSF-1 HA cluster

USAGE:
   hacli cluster create [command options] [arguments...]

OPTIONS:
   --name value, -n value   Name of the cluster. Required argument.
   --nodes value, -s value  Comma-separated list of nodes. Required argument.
   --desc value, -d value   Description of the cluster. Optional argument.
   --fcmon                  Enables fibre channel monitoring. Disabled by default.
   --nethbs value           Manual network heartbeats. Optional argument.
                            I.e: A:B (2 nodes) or A:B,B:C,A:C (3 nodes)
   --nonetmon               Disables network interfaces monitoring. Enabled by default.
   --serial                 Enables serial heartbeats. Disabled by default.
   --force                  Forced node entries. This option should not be used unless suggested by HAC support. Disabled by default.
   
2021/03/24 11:50:19 name and nodes are both required.

To create a cluster equivalent to the cluster created in the graphical webapp, only the cluster name and a list of nodes need to be entered, with an optional cluster description:

[root@CentOS8-1 ~]# hacli cluster create --name "HA-cluster" --desc "ZFS cluster" --nodes "CentOS8-1,CentOS8-2"
{
  "timeout": 120,
  "errorMsg": "",
  "execTime": 8.36,
  "error": false,
  "output": "Cluster successfully created"
}
[root@CentOS8-1 ~]#

Additional Cluster Creation Arguments

There are, however, several options available from the command line that are not available from the browser. These are:

  • --fcmon
    • This enables the monitoring of FiberChannel target mode ports. This monitoring can block a service startup and potentially trigger a controlled failover if RSF-1 detects all target mode FiberChannel ports are offline.
    • Note that FiberChannel ports in initiator mode are not monitored.
  • --nonetmon
    • This disables the monitoring of network interfaces associated with VIPs. By default, RSF-1 monitors the network interfaces used to plumb in VIPs and if those interfaces become unavailable, HA services are marked as blocked and will potentially failover to another cluster node. This option will disable that feature.
  • --nethbs
    • By default, when a cluster is created it will be configured with one network heartbeat between the management IP addresses of each pair of nodes. If there are more than 2 nodes in the cluster and there is no need to configure heartbeats between every pair of nodes, then the --nethbs argument can be used to specify a different network heartbeat configuration.
  • --serial
    • Serial heartbeats are disabled by default. This option can be used to enable a serial heartbeat between cluster nodes.
    • The serial heartbeat will be in addition to the mandatory network heartbeat.
  • --force
    • This option is only recommended for use in special conditions. It disables the "node validation" step taken during cluster creation. By default when a cluster create command is issued, RSF-1 will attempt a node discovery and compare the result with the list of nodes passed to the cluster create command. This is an important step because it ensures that any given node names are valid RSF-1 nodes.
      The node discovery works by issuing a UDP broadcast packet on port 1195. If the network configuration is such that the broadcast is not allowed, then the node discovery will return no nodes and the cluster creation will fail.
      The --force option disables the validation step and allows the cluster to be created without performing a node discovery.

Non-Standard Cluster Create Examples
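
The following sketches are illustrative only: the node names, heartbeat layout and use of a serial link are assumptions, and the option syntax follows the help text shown above.

# Three-node cluster with an explicit network heartbeat layout
# (heartbeats between nodeA/nodeB and nodeB/nodeC only)
hacli cluster create --name "HA-cluster" --nodes "nodeA,nodeB,nodeC" --nethbs "nodeA:nodeB,nodeB:nodeC"

# Two-node cluster with FiberChannel target monitoring and a serial heartbeat enabled
hacli cluster create --name "HA-cluster" --nodes "nodeA,nodeB" --fcmon --serial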

Author: Matt Youds
Last update: 2021-03-24 17:45


Heartbeats

The functioning of an RSF-1 cluster relies on each node gathering detailed information about the state of every other node in the cluster. Each node sends status information to other nodes at regular intervals. This stream of data from one node to another is called a heartbeat.

Because heartbeats are unidirectional, they are always created in pairs so that any heartbeat medium is used by two heartbeats (node 1 --> node 2 and node 2 --> node 1).

When a cluster is initially created, a pair of heartbeats is automatically added between each pair of nodes. This initial set of heartbeats is sufficient to allow the cluster to perform all of its functionality, but to make the cluster as robust as possible, it is always recommended to add additional heartbeats. Additional heartbeats will allow the cluster to differentiate between a node going down and the failure of a heartbeat due to a network issue.

For an automatic failover of a service to be triggered, all heartbeats from the active node must go down.

Heartbeat Types

There are three types of heartbeats that can be configured in a cluster. Each heartbeat type delivers exactly the same status information from one node to another so the choice of heartbeat type for a particular cluster depends entirely on the hardware and communication channels available.

It is recommended to always configure at least two different types of heartbeat in a cluster. The use of multiple heartbeat types protects against unnecessary failovers caused by failure of a particular heartbeat type. E.g. problems in the network stack could potentially affect all network heartbeats but would not affect disk heartbeats.

Network Heartbeats

Network heartbeats are the most common type used. As long as the network configuration and any firewall allows for UDP communication between cluster nodes, it will be possible to configure network heartbeats. A network heartbeat pair is also configured by default during cluster creation.

Any number of network heartbeat pairs can be configured in a cluster and the heartbeat itself is a very small amount of data transferred every second, so it is generally recommended to add a network heartbeat pair for every network interface on the cluster nodes. This means a problem with any single network interface or switch will not cause a failover to be triggered unnecessarily.

Disk Heartbeats

Disk heartbeats use fixed block offsets on shared disk devices to transfer status information from one node to another. For a disk heartbeat pair, two block offsets will be configured for use - one block for each heartbeat direction.

In the case of ZFS clusters, a number of disk heartbeats are added whenever a ZFS pool is added to the cluster. Pool disks are used for heartbeats so it is not necessary to maintain dedicated heartbeat disks.

Serial Heartbeats

Serial heartbeats utilise RS232 serial devices to transfer cluster status information from one node to another via a serial crossover cable.

Configuring Network Heartbeats from the Browser

This section assumes there is a cluster already created on a set of nodes. For more information about setting up a cluster, see Cluster Creation.

To add additional network heartbeats to an RSF-1 cluster, navigate to the webapp's "Heartbeats" tab. The heartbeats display should show all heartbeats that are already configured in the cluster. For a new 2-node cluster, that will be a single network heartbeat pair between the management addresses of the nodes.

This shows two network heartbeats:

  1. CentOS8-2 --> CentOS8-1 using the address obtained by resolving the hostname CentOS8-1
  2. CentOS8-1 --> CentOS8-2 using the address obtained by resolving the hostname CentOS8-2

To add a new set of heartbeats, click "Add Network Heartbeat Pair" and the following dialogue will be shown:

The drop-down menus in the first heartbeat line allow selection of the nodes that are to be involved in this heartbeat pair. For a 2-node cluster these will already be populated. If there are more than 2 nodes, the appropriate pair of nodes should be selected.

The last column allows entry of a hostname or IP address. This is the address that should be used for sending the heartbeat packets and should already be plumbed in and available on the target node (selected in the "To" column).

Note that if a hostname is to be used rather than an IP address, the hostname should be defined in /etc/hosts on both nodes. This is to avoid problems with failover due to connectivity issues with DNS servers.

Before the heartbeat is created, a confirmation will be shown:

When the heartbeat pair have been created successfully, they will be shown in the heartbeats display:

Configuring Network Heartbeats using the Command Line

This section assumes there is a cluster already created on a set of nodes. For more information about setting up a cluster, see Cluster Creation.

The existing heartbeat status can be obtained using "rsfcli status". In the case of a new 2-node cluster with no services, there will be a single network heartbeat pair between the management addresses of the nodes:

[root@CentOS8-1 ~]# rsfcli stat
Contacted 127.0.0.1
 Found cluster "HA-Cluster", CRC = 0xcc86, ID = <none>
 - - - - - - - - - - - - - - - - - - - -
Hosts:

 CentOS8-1 (192.168.100.132) UP, service startups enabled
 RSF-1 release 1.4.1 (built on 23-Feb-2021-16:44)

 CentOS8-2 (192.168.100.131) UP, service startups enabled
 RSF-1 release 1.4.1 (built on 23-Feb-2021-16:44)

 2 nodes configured, 2 online.
 - - - - - - - - - - - - - - - - - - - -
Services:
 0 services configured
   0 service instances stopped
   0 service instances running
 - - - - - - - - - - - - - - - - - - - -
Heartbeats:

 00 net CentOS8-1 -> CentOS8-2 [192.168.100.131]: Up, last heartbeat #82 Mon 2021-03-29 10:47:30 BST
 01 net CentOS8-2 -> CentOS8-1 [192.168.100.132]: Up, last heartbeat #81 Mon 2021-03-29 10:47:32 BST
2 cluster heartbeats configured, 2 up, 0 down
 - - - - - - - - - - - - - - - - - - - -

Errors:
No errors detected

[root@CentOS8-1 ~]#

This shows two network heartbeats:

  1. CentOS8-1 --> CentOS8-2 using the address obtained by resolving the hostname CentOS8-2
  2. CentOS8-2 --> CentOS8-1 using the address obtained by resolving the hostname CentOS8-1

To add a new set of heartbeats, use the "hacli heartbeat create" subcommand:

[root@CentOS8-1 ~]# hacli heartbeat create --help
NAME:
   hacli heartbeat create - Create a new heartbeat between two cluster nodes.

USAGE:
   hacli heartbeat create [command options] [arguments...]

OPTIONS:
   --type value   Hearbeat type: network or disc. Required argument.
   --devid value  Device/disc ID. Disc heartbeats only.
   --nodes value  Pair of nodes used for the network heartbeat.
                  Required argument for 3+ node clusters
                  or private network heartbeats.
                  Usage:
                  Network heartneat using public IPs:
                    <nodeA>,<nodeB>
                  Network heartbeat using private IPs:
                    <nodeA>#<privateAddrA>,<nodeb>#<privateAddrB>
   
[root@CentOS8-1 ~]#

First, log in to RSF-1:

[root@CentOS8-1 ~]# hacli login
Enter URL [https://localhost:4330 if empty]: 
Enter Username: 
Enter Password: 
[root@CentOS8-1 ~]#

Ensure any hostnames that are to be used for heartbeats are defined in /etc/hosts:

[root@CentOS8-1 ~]# grep priv /etc/hosts
10.50.50.31	CentOS8-1-priv
10.50.50.32	CentOS8-2-priv
[root@CentOS8-1 ~]#

Now use hacli to create a network heartbeat pair. The heartbeats will be:

  1. CentOS8-1 --> CentOS8-2 @ CentOS8-2-priv (10.50.50.32)
  2. CentOS8-2 --> CentOS8-1 @ CentOS8-1-priv (10.50.50.31)

[root@CentOS8-1 ~]# hacli heartbeat create \
> --type network \
> --nodes CentOS8-1#CentOS8-1-priv,CentOS8-2#CentOS8-2-priv
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 4.084,
  "error": false,
  "output": "Heartbeat successfully created"
}
[root@CentOS8-1 ~]#

The new heartbeat status can then be shown using "rsfcli status" as before:

[root@CentOS8-1 ~]# rsfcli status
Contacted 127.0.0.1
 Found cluster "HA-Cluster", CRC = 0x2f81, ID = <none>
 - - - - - - - - - - - - - - - - - - - -
Hosts:

 CentOS8-1 (192.168.100.132) UP, service startups enabled
 RSF-1 release 1.4.1 (built on 23-Feb-2021-16:44)

 CentOS8-2 (192.168.100.131) UP, service startups enabled
 RSF-1 release 1.4.1 (built on 23-Feb-2021-16:44)

 2 nodes configured, 2 online.
 - - - - - - - - - - - - - - - - - - - -
Services:
 0 services configured
   0 service instances stopped
   0 service instances running
 - - - - - - - - - - - - - - - - - - - -
Heartbeats:

 00 net CentOS8-1 -> CentOS8-2 [192.168.100.131]: Up, last heartbeat #55 Mon 2021-03-29 11:05:01 BST
 01 net CentOS8-1 -> CentOS8-2 (via CentOS8-2-priv) [192.168.100.131]: Up, last heartbeat #55 Mon 2021-03-29 11:05:02 BST
 02 net CentOS8-2 -> CentOS8-1 [192.168.100.132]: Up, last heartbeat #55 Mon 2021-03-29 11:05:04 BST
 03 net CentOS8-2 -> CentOS8-1 (via CentOS8-1-priv) [192.168.100.132]: Up, last heartbeat #55 Mon 2021-03-29 11:05:04 BST
4 cluster heartbeats configured, 4 up, 0 down
 - - - - - - - - - - - - - - - - - - - -

Errors:
No errors detected

[root@CentOS8-1 ~]#

The above output now shows 4 heartbeats; the original heartbeat pair using management addresses, and the new heartbeat pair.

Configuring Disk Heartbeats from the Browser

The steps that follow make the following assumptions:

  • A cluster has already been created on a set of nodes. For more information about creating a cluster using the command line, see Cluster Creation
  • An HA service has already been created in the cluster. For more information about creating a ZFS HA service using the command line, see Service Creation
    An HA service must be created before adding disk heartbeats because the heartbeat disks will be components of the service storage. In the case of ZFS clusters, the heartbeat disks will always be pool disks, so there must already be a pool in the cluster.

Note that some disks need to be used for SCSI reservations (disk fencing) so it is important not to configure heartbeats on all disks in a pool.

To add additional disk heartbeats from the webapp, navigate to the "Heartbeats" tab. The status of any existing heartbeats will be shown. There will generally be 2 disk heartbeat pairs created for each pool that is added to the cluster:

In this example, there are 4 disk heartbeats shown - one disk heartbeat pair for each of two disks.

To configure an additional pair of disk heartbeats, click "Add Disk Heartbeat Pair" and the following dialogue will be shown:

Select one of the available disks for use as a disk heartbeat and click Submit. A confirmation window will be shown with further details about the new disk heartbeat device:

After confirming the details about the new disk heartbeat, it will be added to the cluster and the new heartbeat status will be shown.

Configuring Disk Heartbeats using the Command Line

The steps that follow make the following assumptions:

  • A cluster has already been created on a set of nodes. For more information about creating a cluster using the command line, see Cluster Creation
  • An HA service has already been created in the cluster. For more information about creating a ZFS HA service using the command line, see Service Creation
    An HA service must be created before adding disk heartbeats because the heartbeat disks will be components of the service storage. In the case of ZFS clusters, the heartbeat disks will always be pool disks, so there must already be a pool in the cluster.

Note that some disks need to be used for SCSI reservations (disk fencing) so it is important not to configure heartbeats on all disks in a pool.

The existing heartbeat configuration can be viewed with rsfcli:

[root@CentOS8-1 ~]# rsfcli -v heartbeats
CentOS8-2 : net=2 disc=1 serial=0
	CentOS8-1	net	CentOS8-2
	CentOS8-1	net	CentOS8-2-priv
	CentOS8-1	disc	/dev/disk/by-id/wwn-0x60014059223f3d03c1a4388ae7e73444:512,/dev/disk/by-id/wwn-0x60014059223f3d03c1a4388ae7e73444:520
CentOS8-1 : net=2 disc=1 serial=0
	CentOS8-2	net	CentOS8-1
	CentOS8-2	net	CentOS8-1-priv
	CentOS8-2	disc	/dev/disk/by-id/wwn-0x60014059223f3d03c1a4388ae7e73444:520,/dev/disk/by-id/wwn-0x60014059223f3d03c1a4388ae7e73444:512
[root@CentOS8-1 ~]#

In this example there is a single disk heartbeat pair using the disk with ID wwn-0x60014059223f3d03c1a4388ae7e73444.

To add a further disk heartbeat pair, select an appropriate device (in the case of ZFS clusters this will be a device from one of the cluster controlled pools) and use the "hacli heartbeat create" subcommand:

[root@CentOS8-1 ~]# hacli heartbeat create --help
NAME:
   hacli heartbeat create - Create a new heartbeat between two cluster nodes.

USAGE:
   hacli heartbeat create [command options] [arguments...]

OPTIONS:
   --type value   Hearbeat type: network or disc. Required argument.
   --devid value  Device/disc ID. Disc heartbeats only.
   --nodes value  Pair of nodes used for the network heartbeat.
                  Required argument for 3+ node clusters
                  or private network heartbeats.
                  Usage:
                  Network heartneat using public IPs:
                    <nodeA>,<nodeB>
                  Network heartbeat using private IPs:
                    <nodeA>#<privateAddrA>,<nodeb>#<privateAddrB>
   
[root@CentOS8-1 ~]#

First, log in to RSF-1:

[root@CentOS8-1 ~]# hacli login
Enter URL [https://localhost:4330 if empty]: 
Enter Username: 
Enter Password: 
[root@CentOS8-1 ~]#

In the following example, a new heartbeat pair will be created using the disk with ID wwn-0x6001405df531f843bd347c78a5af78e0:

[root@CentOS8-1 ~]# hacli heartbeat create --type disk --devid /dev/disk/by-id/wwn-0x6001405df531f843bd347c78a5af78e0
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 4.11,
  "error": false,
  "output": "Heartbeat successfully created"
}
[root@CentOS8-1 ~]#

The new heartbeat status can now be seen from rsfcli:

[root@CentOS8-1 ~]# rsfcli -v heartbeats
CentOS8-2 : net=2 disc=2 serial=0
	CentOS8-1	net	CentOS8-2
	CentOS8-1	net	CentOS8-2-priv
	CentOS8-1	disc	/dev/disk/by-id/wwn-0x60014059223f3d03c1a4388ae7e73444:512,/dev/disk/by-id/wwn-0x60014059223f3d03c1a4388ae7e73444:520
	CentOS8-1	disc	/dev/disk/by-id/wwn-0x6001405df531f843bd347c78a5af78e0:512,/dev/disk/by-id/wwn-0x6001405df531f843bd347c78a5af78e0:520
CentOS8-1 : net=2 disc=2 serial=0
	CentOS8-2	net	CentOS8-1
	CentOS8-2	net	CentOS8-1-priv
	CentOS8-2	disc	/dev/disk/by-id/wwn-0x60014059223f3d03c1a4388ae7e73444:520,/dev/disk/by-id/wwn-0x60014059223f3d03c1a4388ae7e73444:512
	CentOS8-2	disc	/dev/disk/by-id/wwn-0x6001405df531f843bd347c78a5af78e0:520,/dev/disk/by-id/wwn-0x6001405df531f843bd347c78a5af78e0:512
[root@CentOS8-1 ~]#

Author: Matt Youds
Last update: 2021-03-29 14:39


Understanding Service Status

A "service instance" is the combination of HA service and cluster node. For example in a 2-node cluster, each HA service will have two instances - one for each node in the cluster.

Each service instance has its own status, which consists of three items:

Mode

Each service instance has a "mode" setting which can be set to either:

  • Automatic
    Automatic mode means the service instance will be automatically started when all of the following requirements are satisfied:
    • The service instance is in the stopped state
    • The service instance is not blocked
    • No other instance of this service is in an active state
  • Manual
    Manual mode means the service instance will never be automatically started

Blocked

The "blocked" state is similar to the service "mode" except that instead of being set by the user, it is controlled automatically by the cluster's monitoring features. A service instance can be either:

  • Blocked
    • The cluster's monitoring has detected a problem that affects this service instance.
    • This service instance will not start until the problem is resolved, even if the service is in automatic mode.
    • If a service instance becomes blocked when it is already running, the cluster may decide to stop that instance to allow it to be started on another node. This will only happen if there is another service instance in the cluster that is:
      • Unblocked
      • In automatic mode, and
      • Stopped
  • Unblocked
    • The service instance is free to start as long as it is in automatic mode.

State

Each service instance has a state. The service instance states that can be seen in a ZFS cluster fall into the following categories:

Active States

When the service instance is in an "active" state, it is likely to be holding some service resources (e.g. a ZFS pool is imported or a VIP is plumbed in). Because of this, it is not safe for any other instance of the same service to be started.

For example, if a service is "stopping" on node 1, it cannot yet be started on node 2.

Active states are:

  • running
    • The service is running on this node and only this node. All service resources have been brought online. For ZFS clusters this means the main ZFS pool and any additional pools have been imported, any VIPs have been plumbed in and any configured logical units have been brought online.
  • starting
    • The service is in the process of starting on this node. Service start scripts are currently running - when they complete successfully the service instance will transition to "running".
  • stopping
    • The service is in the process of stopping on this node. Service stop scripts are currently running - when they complete successfully the service instance will transition to "stopped".
  • broken_unsafe
    • The service is in a broken state because service stop or abort scripts failed to run successfully. Some or all service resources are likely to be online so it is not safe for the cluster to start another instance of this service on another node.
    • This can be caused by one of two circumstances:
      • The service failed to stop
      • The service failed to start and began running abort scripts. The abort scripts also failed
  • panicked
    • While the service was in an active state on this node, it was seen in an active state on another node. Panic scripts have been run.
  • panicking
    • While the service was in an active state on this node, it was seen in an active state on another node. Panic scripts are running and when they are finished, the service instance will transition to "panicked".
  • aborting
    • Service start scripts failed to complete successfully. Abort scripts are running (these are the same as service stop scripts). When abort scripts complete successfully the service instance will transition to the "broken_safe" state.

Inactive States

When a service instance is in an "inactive" state, no service resources are online. That means it is safe for another instance of the service to be started elsewhere in the cluster.

Inactive states are:

  • stopped
    • The service is stopped on this node. No service resources are online.
  • broken_safe
    • This state can be the result of either of the following circumstances:
      • The service failed to start on this node but had not yet brought any service resources online. It transitioned directly to broken_safe when it failed.
      • The service failed to start after having brought some resources online. Abort scripts were run to take the resources back offline and those abort scripts finished successfully.

Author: Matt Youds
Last update: 2021-03-30 13:04


Controlling an HA Service

When a new HA service is added to a cluster, its state will initially be set to "stopped" and its mode to "automatic". The cluster will then start the service on its preferred node.

After the service is started for the first time, its state can be controlled simply using the supplied webapp or command line utilities.

The steps that follow make the following assumptions:

  • A cluster has already been created on a set of nodes. For more information about creating a cluster, see Cluster Creation
  • An HA service has already been created in the cluster. For more information about creating a ZFS HA service, see Service Creation

For more information about HA service status, see Understanding Service Status.

Controlling a Service from the Browser

All configured HA services in the cluster can be controlled from the "Cluster Control" tab.

Upon navigating to "Cluster Control" a summary of each service's state will be shown for all cluster nodes:

In this example, it can be seen that there is a single service "lio-pool", which is running on the node CentOS8-1, in automatic mode on both nodes, and unblocked on both nodes.

Setting a Service Mode

A service's mode can be set to either "automatic" or "manual" from this tab. Click the "Actions" button for the service instance that is to be controlled. The display will change to show available actions:

One of the options shown is to set the service mode. If the service is in automatic mode as in the example, then the option will be to set the service to manual. If it is already manual, then there will be the option to set it to automatic mode.

Select the option to change the service's mode, confirm the action and the service status will change:

Moving a Service

A running service can be moved easily from the Cluster Control tab. It is not necessary to change the service mode (automatic/manual) on either the source or destination node - any changes to these settings are handled internally by RSF-1.

The service can only move if it is running on the source node and stopped on the destination. Any other states will block the move operation.

Select the action to move the service from the running node to a remote node:

A confirmation window will be shown with an option to retain failover modes after the service move. This option controls whether the service mode settings move with the service.

When the service move is confirmed, the state changes will be displayed in real-time in the Cluster Control view:

Starting a Service

It is only possible to start a service when it is stopped on all nodes in the cluster. This will be shown in the list of actions:

Select the option to start the service on one of the cluster nodes and the service state changes will be shown in the Cluster Control view:

Stopping a Service

To stop a service, select the appropriate option from the list of actions for the node that is currently running the service.

When a service is stopped on a node, the service mode is set to manual on that node to avoid the service starting there again automatically.

Note that if it is not desired for the service to start on another node after the service stop completes, it is important to set the service to manual mode on all nodes before performing the stop action.

When the stop completes, the service will be shown as manual and stopped on all nodes:

Controlling a Service using the Command Line

Setting a Service Mode

 

Moving a Service

 

Starting a Service

 

Stopping a Service

 

Repairing a Broken Service

 

Author: Matt Youds
Last update: 2021-03-30 13:05


RSF-1 for ZFS - Administrator's Guide

RSF-1 provides Enterprise-Grade High Availability for ZFS storage servers and includes simple configuration and administration interfaces. The following chapters provide comprehensive details about the installation and use of RSF-1.

Installation

Quick Start Guide

The following chapters provide in-depth information about the configuration and use of an RSF-1 cluster. For a brief overview of using RSF-1 to provide High Availability for a ZFS pool, see the QuickStart Guide.

Cluster Creation

Services

A service is an entity that the RSF-1 cluster can move between nodes. Although any individual service can run on only 1 cluster node at a time, if there are multiple services configured they can start, stop and move independently, allowing for an Active/Active cluster configuration.

In the case of a ZFS cluster, each service adopts the name of the pool that it controls.

Service Creation

The following sections describe the process of creating a new service to put a ZFS pool under cluster control.

Additional ZFS Pools

By default, each service will adopt the name of a ZFS pool and will take control of that pool. However it is possible to add additional pools to a service. If there are multiple pools controlled by one service, those pools will failover together and always be imported on the same node as each other. The following sections give more details about additional pools in a ZFS service:

Heartbeats

Heartbeats are a fundamental part of any RSF-1 cluster. They provide the communication channel that allows each cluster node to monitor the status of each other node in the cluster. The following sections describe the different types of heartbeat available and how to configure the heartbeats in a cluster.

Controlling a Service

Each service in the cluster has its own state and can be controlled independently. See the following sections for more information on controlling a service:

Failover

When RSF-1 detects the failure of a service, it will attempt to failover that service to another node in the cluster. The following sections provide details about service failovers:

  • Moving a service manually
  • What causes a service to failover?
  • What happens when a service fails over?
  • How to temporarily prevent a failover

Shares

The following sections provide information about creating shares from a clustered ZFS pool:

  • Creating file shares
  • Creating block shares (Logical Units)

Cluster Properties

There are a number of properties that are available for tuning an RSF-1 cluster. During the initial package installation, these properties are set to sensible default settings and the recommendation is always to avoid changing away from those defaults. However, if necessary, those settings can be changed easily. See the page below for details about the available cluster properties:

  • Cluster properties

Author: Matt Youds
Last update: 2021-03-30 11:15


RSF-1 Requirements for ZFS clusters

Author: Matt Youds
Last update: 2021-03-18 18:16


Creating a Highly Available ZFS Service

A service is an entity that the RSF-1 cluster can move between nodes. A service can only ever be active on one node at a time, so when it is moved by the cluster, it has to be shut down on one node and started up on another.

The purpose of a service is to bring a set of resources under cluster control and make those resources highly available. In the case of a ZFS cluster, each service will take control of at least one ZFS pool and typically one or more IP addresses (VIPs).

The service will adopt the name of the ZFS pool that it controls.

Prerequisites for creating a ZFS service

Before a ZFS HA service can be created, the following requirements must all be satisfied:

  • The RSF-1 software is installed on all nodes - see the Installation section for more details
  • A cluster has been created - see Cluster Creation for more details
  • All cluster nodes are running
  • There is a "clusterable" ZFS pool imported on one of the cluster nodes (Important: the pool should never be imported on more than one node at the same time). For more information about clusterable pools, see Clusterable Pools
  • Any VIP that will be required by the service should not be present on the network.

Clusterable Pools

A ZFS pool can only be added to a cluster if it is considered "clusterable". To be clusterable the pool must be both:

  • Imported on one of the cluster nodes (only one node)
  • Available to all cluster nodes
    • This means the pool must be made up entirely from shared disks that all cluster nodes can access. This includes cache and log devices as well as regular data devices.

To find out if a pool is considered to be "clusterable", it is possible to use either the browser based webapp or the hacli command:

Discovering Clusterable Pools Using the Browser Interface

Using the cluster's webapp, it is possible to see whether there are any clusterable pools simply by visiting the "Services" tab. Upon navigating to "Services", if there are no configured services, the following message will be displayed:

Next, click "Add Service". The following dialogue will be shown:

If there are any clusterable pools, they will be available for selection in the dropdown box:

Discovering Clusterable Pools Using the Command Line

The hacli command can be used to show all clusterable pools. hacli can be run from either node and will gather information from both nodes so it is not important to run the command from the node that currently has the pool imported.

First, log in using hacli:

[root@CentOS8-1 ~]# hacli login
Enter URL [https://localhost:4330 if empty]: 
Enter Username: 
Enter Password:
[root@CentOS8-1 ~]#

Next, use the "hacli zpool clusterable" subcommand to display any pools that can be added to the cluster:

[root@CentOS8-1 ~]# hacli zpool clusterable
{
  "timeout": 120,
  "errorMsg": "",
  "execTime": 0.165,
  "error": false,
  "output": [
    {
      "Pool": "lio-pool",
      "guid": "7823116208146933468",
      "importedNode": "CentOS8-1"
    }
  ]
}
[root@CentOS8-1 ~]#

In this example there is a single pool - lio-pool - that can be added to the cluster.

What is a VIP and does every service need one?

A VIP (Virtual IP) is simply an IP address that is associated with an HA service under RSF-1, rather than being associated with any particular node. Because the VIP is associated with a service, it will move between cluster nodes when that service moves. In the case of ZFS clusters, that means the service VIP is always available on the same cluster node as the ZFS pool.

If the ZFS pool is used to provide network shares to clients, then those clients should connect to the shares using the VIP, rather than any other IP address on the active cluster node. When the service fails over (either automatically because of a node failure or manually by a managed service move), the clients will pause I/O when the VIP is unplumbed or when the active node goes down. I/O will then resume when the VIP is plumbed in on another cluster node.

Not every service will require a VIP. A VIP is essential for services providing network shares (NFS, SMB, iSCSI block storage) but if a service is providing block storage over FiberChannel, then no VIP is required because clients will connect using FiberChannel.

Creating a ZFS service from the browser

To create a service using the RSF-1 webapp, navigate to the "Services" tab. If there are already services configured, their status will be displayed. Otherwise there will be a simple message and an option to add a new service to the cluster.

Click the "Add Service" button and a new dialogue will be shown with a dropdown menu. The menu will list any ZFS pools that are considered to be "clusterable".

Upon selecting a clusterable pool, the "Add Service" window will expand to give more options:

From the "Add Service" window, it is possible to simply create the service, but there are further details that can be changed first.

Firstly, the "Preferred Node" can be changed. The preferred node setting causes the cluster to use a shorter timeout on one particular node before starting the service upon booting. The result is that if all nodes are booted at the same time, the service will naturally start on its preferred node.

It is also possible to specify a "Virtual Hostname" for the service. This is a VIP that will be created when the service is added to the cluster. By default, no VIP will be added.

Note that it is possible to add VIPs to a service after it has been created, so adding the VIP at this point is not essential.

To add a VIP, specify a hostname, IP address and subnet mask in CIDR form. Also adjust the "Node Interfaces" setting, which selects the network interface on each node that will be used to plumb in the VIP. The hostname will be added to each node's /etc/hosts when the service is added to the cluster.
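
For example, using the hypothetical VIP from the command-line examples later in this chapter, the entry added to /etc/hosts on each node would look similar to:

10.50.50.101    vip1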

Finally, it is possible to change the heartbeat disk selection from the default. The default selection is calculated based on the pool structure and is designed to provide sufficient heartbeat redundancy while also leaving enough disks for sufficient disk fencing. However it may be desirable to make a change from the default selection, for example to ensure there are heartbeats using each disk array/shelf.

Before creating the service, a summary of all settings will be displayed to allow for any mistakes or omissions to be fixed:

After confirming the settings are correct, the service will be added to the cluster:

When the service has been added successfully, the webapp will redirect back to the Services tab, which will now show the status of the new service:

Creating a ZFS service using the command line

Services can be easily created using the hacli command.

The steps that follow make the following assumptions:

  • A cluster has already been created on a set of nodes. For more information about creating a cluster using the command line, see Cluster Creation
  • There is a suitable clusterable pool available and imported on one of the cluster nodes. For more information about clusterable pools, see Clusterable Pools

Before attempting to create a new service, log into the administration API using hacli:

[root@CentOS8-1 ~]# hacli login
Enter URL [https://localhost:4330 if empty]: 
Enter Username: 
Enter Password:
[root@CentOS8-1 ~]#

Service creation will require at least the clusterable pool's name and GUID. This information can be gathered using the following hacli command:

[root@CentOS8-1 ~]# hacli zpool clusterable
{
  "timeout": 120,
  "errorMsg": "",
  "execTime": 0.168,
  "error": false,
  "output": [
    {
      "Pool": "lio-pool",
      "guid": "7823116208146933468",
      "importedNode": "CentOS8-1"
    }
  ]
}
[root@CentOS8-1 ~]#

From the above output, it can be seen that there is a clusterable pool named "lio-pool" with GUID "7823116208146933468".

The service can now be created using the hacli service create subcommand:

[root@CentOS8-1 ~]# hacli service create
NAME:
   hacli service create - Create a new cluster service

USAGE:
   hacli service create [command options] [arguments...]

OPTIONS:
   --name value, -n value        Name of the pool. Required argument.
   --guid value, -g value        GUID of the pool. Required argument.
   --nodes value                 Comma-separated list of nodes.
                                 Optional argument.
   --desc value                  Description of the service.
                                 Optional argument.
   --panic                       Allows to panic the active node if
                                 it cannot export the pool. Disabled by default.
   --dhbs value                  Manual disk heartbeats. Optional argument.
                                 I.e: devid1:node1:node2+devid2:node2:node3
   --init value                  Initial timeout for the service (in seconds).
                                 Optional argument.
   --main value                  Name of the node with higher priority to
                                 start the service by default. Optional argument.
   --nomhdc                      Disable the MHDC security mechanism.
                                 Enabled by default.
   --runtime value               Default timeout for the service (in seconds).
                                 Optional argument.
   --vipname value, -v value     VIP name. Optional argument. I.e. vip01
   --vipip value, -i value       VIP IP address. Optional argument. I.e. 10.10.2.74
   --vipnetmask value, -m value  VIP netmask. Optional argument. I.e. 255.255.0.0 or 16
   --vipnics value, -c value     '+'-separated list of node:nic. Optional argument.
                                 I.e. node1:e1000g0+node2:vmxnet3s0
   --vipv6, -6                   Needed if VIP is IPv6. Disabled by default.
   --live                        Leave the pool imported and the VIP plumbed.
                                 EXPERIMENTAL. Disabled by default.

2021/03/25 17:05:15 name and guid are both required.
[root@CentOS8-1 ~]#

The following example shows the simplest possible service create. The result of this will be a service with no VIP and default disk heartbeats:

[root@CentOS8-1 ~]# hacli service create --name lio-pool --guid 7823116208146933468
{
  "timeout": 120,
  "errorMsg": "",
  "execTime": 4.535,
  "error": false,
  "output": "Service added"
}
[root@CentOS8-1 ~]#

Command line Service Creation Examples

The following hacli example adds a service to the cluster with a VIP:

[root@CentOS8-1 ~]# hacli service create --name lio-pool --guid 7823116208146933468 \
> --vipname vip1 \
> --vipip 10.50.50.101 \
> --vipnetmask 24 \
> --vipnics CentOS8-1:enp0s8+CentOS8-2:enp0s8
{
  "timeout": 120,
  "errorMsg": "",
  "execTime": 5.432,
  "error": false,
  "output": "Service added"
}
[root@CentOS8-1 ~]#

For more information about VIPs, see the above section: What is a VIP

The following hacli example adds a service to the cluster with a non-default list of disk heartbeats. This is achieved using the --dhbs argument with the argument format:

<disk 1>:<node 1>:<node 2>[+<disk 2>:<node 1>:<node 2>...]

[root@CentOS8-1 ~]# hacli service create --name lio-pool --guid 7823116208146933468 \
> --dhbs /dev/disk/by-id/wwn-0x6001405b852627d81be4bc0a9c8156f2:CentOS8-1:CentOS8-2+/dev/disk/by-id/wwn-0x6001405d6d727e0d49d4c838eccae6c4:CentOS8-1:CentOS8-2
{
  "timeout": 120,
  "errorMsg": "",
  "execTime": 4.498,
  "error": false,
  "output": "Service added"
}
[root@CentOS8-1 ~]#

Note:

When providing a list of disk heartbeats please ensure there are sufficient non-heartbeat disks left to fence the pool.

E.g. For a pool made up of mirrors, at least one full mirror vdev must be left without heartbeats. This will allow the cluster to place SCSI reservations on those drives to fence the pool.

What happens when a service is added to the cluster?

Before a ZFS service is created, any VIP IP address is checked to make sure it doesn't already exist on the network (including locally), and the associated ZFS pool is exported from the node on which it is currently imported.

After the service is created, it will be initially placed in automatic mode on both nodes with the service state set to "stopped". The cluster will then be free to start the service on its preferred node.

Author: Matt Youds
Last update: 2021-03-26 18:01


Additional Pools

In a ZFS cluster, each HA service is created to take control of a ZFS pool. The service adopts the name of the pool and when it moves between cluster nodes, it takes the pool and any associated VIPs with it.

If multiple ZFS pools exist on the system and there is a need to cluster them, then multiple services can be created. Using multiple services allows the pools to move between cluster nodes independently of each other. However, it is also possible to add additional pools to an existing HA service.

Why Add Additional Pools?

In general a service will control a single pool and when there is a need for a second pool, a new service will be created. This allows the pools to failover independently, but where a VIP is required to allow clients access to the storage, each service will require at least one VIP.

If there is a need to access storage from more than one pool using a single VIP, then those pools should all be added to the same HA service.

Adding Additional Pools from the Browser

 

Adding Additional Pools using the Command Line

ZFS pools can be added to an existing HA service using hacli.

The steps that follow make the following assumptions:

  • A cluster has already been created on a set of nodes. For more information about creating a cluster using the command line, see Cluster Creation
  • An HA service has already been created in the cluster. For more information about creating a ZFS HA service using the command line, see Service Creation
  • There is a suitable clusterable pool available and imported on one of the cluster nodes. For more information about clusterable pools, see Clusterable Pools

Before attempting to add the pool to a service, log into the administration API using hacli:

[root@CentOS8-1 ~]# hacli login
Enter URL [https://localhost:4330 if empty]: 
Enter Username: 
Enter Password:
[root@CentOS8-1 ~]#

Addition of a pool to a service will require at least the clusterable pool's name and GUID. This information can be gathered using the following hacli command:

[root@CentOS8-1 ~]# hacli zpool clusterable
{
  "timeout": 120,
  "errorMsg": "",
  "execTime": 0.153,
  "error": false,
  "output": [
    {
      "Pool": "tank",
      "guid": "16605435915946911042",
      "importedNode": "CentOS8-1"
    }
  ]
}
[root@CentOS8-1 ~]#

From the above output, it can be seen that there is a clusterable pool named "tank" with GUID "16605435915946911042".

The pool can then be added to an existing service using the "hacli service pools add" subcommand:

[root@CentOS8-1 ~]# hacli service pools add
NAME:
   hacli service pools add - Add a new ZFS pool to this service. This new pool will failover with the service.

USAGE:
   hacli service pools add [command options] [arguments...]

OPTIONS:
   --name value  Name of the service. Required argument.
   --pool value  Name of the ZFS pool to be added. Required argument.
   --guid value  GUID of the ZFS pool to be added. Required argument.
   
2021/03/27 19:03:50 name, pool and guid are required.
[root@CentOS8-1 ~]#

The following example adds the additional pool "tank" to the existing HA service lio-pool:

[root@CentOS8-1 ~]# hacli service pools add --name lio-pool --pool tank --guid 16605435915946911042
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 4.897,
  "error": false,
  "output": "Additional pool tank successfully added to lio-pool"
}
[root@CentOS8-1 ~]#

The list of additional pools that currently exist in a cluster can be obtained with the "hacli service pools info" subcommand:

[root@CentOS8-1 ~]# hacli service pools info --name lio-pool
{
  "timeout": 4,
  "errorMsg": "",
  "execTime": 0.001,
  "error": false,
  "output": [
    {
      "poolGUID": "16605435915946911042",
      "poolName": "tank"
    }
  ]
}
[root@CentOS8-1 ~]#

What Happens when an Additional Pool is Added?

When a ZFS pool is added to an existing HA service as an additional pool, the following steps are taken by the cluster:

  1. The pool is evaluated to ensure it is clusterable
    For more information about clusterable pools, see Clusterable Pools
  2. If necessary, the pool is moved to the node that is currently running the HA service. If the HA service is currently stopped on all nodes, the pool is exported.
  3. The additional pool is added to the cluster configuration

The following example demonstrates adding an additional pool to a service which is running on a different node to the pool.

First, the service is running on CentOS8-1 and the pool "tank" is imported on CentOS8-2:

[root@CentOS8-1 ~]# rsfcli -v list
CentOS8-1:
 lio-pool    running          automatic         unblocked       NONE    NONE    20  8
CentOS8-2:
 lio-pool    stopped          automatic         unblocked       NONE    NONE    20  8
[root@CentOS8-1 ~]# ssh CentOS8-2 zpool list tank
root@centos8-2's password: 
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank   352M   110K   352M        -         -     0%     0%  1.00x    ONLINE  -
[root@CentOS8-1 ~]# 

Next, the pool is added to the lio-pool service:

[root@CentOS8-1 ~]# hacli service pools add --name lio-pool --pool tank --guid 16605435915946911042
{
  "timeout": 60,
  "errorMsg": "",
  "execTime": 5.668,
  "error": false,
  "output": "Additional pool tank successfully added to lio-pool"
}
[root@CentOS8-1 ~]#

Once the pool is added, the pool "tank" is moved to CentOS8-1 so that it is imported on the same node that is running the lio-pool service:

[root@CentOS8-1 ~]# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
lio-pool   352M   255K   352M        -         -     4%     0%  1.00x    ONLINE  -
tank       352M   132K   352M        -         -     1%     0%  1.00x    ONLINE  -
[root@CentOS8-1 ~]# ssh CentOS8-2 zpool list tank
root@centos8-2's password: 
cannot open 'tank': no such pool
[root@CentOS8-1 ~]#

Author: Matt Youds
Last update: 2021-03-27 21:52