
TrueNAS cluster with RSF-1

Introduction

This document describes how to create a highly available clustered TrueNAS system using RSF-1 software. The base system should consist of two nodes running the latest release of TrueNAS, with external storage connected to both nodes concurrently (commonly referred to as shared storage).

Features

  • ZFS pools created on the shared storage can be failed over between cluster nodes - these are referred to as shared pools.
  • RSF-1 is an Active-Active cluster. This means a pool can be active on, and fail over to, any node in the cluster.
  • Multiple pools can be clustered with no interdependencies, meaning you could have two pools on one node and three on another and then fail over all pools from the first node to the second, or just one pool from the second to the first, and so on.
  • A shared pool can only be imported on any one cluster node at a time. RSF-1 uses disk reservations to enforce this rule to protect data.
  • Any services configured to use a shared pool (such as NFS/SMB) are accessible on the node the pool is imported on.
  • Multiple heartbeats over network and disk (no dedicated heartbeat drive is required - RSF-1 integrates with the existing ZFS drives with no reconfiguration needed).

The TrueNAS System Dataset Pool

TrueNAS saves system configuration information in the System Dataset Pool - usually the first ZFS pool created on the system. This means the pool containing the system dataset is not eligible for clustering, as that pool cannot be exported (attempts to do so fail with an 'unmount failed' message). The solution is to move the system dataset to the boot pool (or to a pool not being considered for clustering). This is done in the GUI by navigating to System==>System Dataset, selecting the boot pool from the drop-down list of pools and finally saving the change. For a highly-available system we recommend each cluster node has a dedicated boot drive, mirrored if possible1.


Note

When the boot pool is the only imported pool, TrueNAS will always show it as the location of the system dataset. This configuration, however, is not permanent until it is actually saved. Failing to save it leaves TrueNAS free to relocate the system dataset later, which can cause issues in the cluster (as outlined above). The rule here is: even if TrueNAS reports the system dataset as residing on the boot pool, make sure that setting is saved, thereby making it permanent (this need only be done once on each cluster node).

QS image 2
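
To confirm which pool currently holds the system dataset, the .system dataset can be listed from the >_ Shell. This is a minimal sketch; the boot pool name boot-pool is an assumption and may differ on your system (for example freenas-boot on older releases):

    # zfs list -o name | grep '\.system$'

After the change has been saved, the dataset listed should be on the boot pool (for example boot-pool/.system) rather than on any pool you intend to cluster.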


Accessing cluster services over the network

With a non-clustered storage appliance, services such as NFS and SMB are accessed using the IP address of the storage appliance itself. For clustered systems this causes an issue: when the pool, and by implication any services reliant on that pool, is migrated to another node, those services become inaccessible via the original appliance's IP address (as it no longer hosts them).

RSF-1 solves this problem by associating a Virtual IP address (VIP) with a pool, and by implication with any services using that pool. The VIP is migrated with the pool should a fail over occur. Clients then access storage services using the configured VIPs rather than the static IP address of the node itself. This means clients need not know where a highly available service is running, nor be reconfigured when a failover occurs; the VIP always points to the current location of the services in the cluster.

When configuring a VIP in the cluster, either an IP address or a hostname can be used. When using a hostname the cluster needs to resolve this to an IP address. To ensure that this resolution is not dependent on external naming services (such as DNS2) we strongly recommend adding the VIP name and address to the hosts file on each node in the cluster. Do this from the GUI by navigating to Network==>Global Configuration and adding the VIP entries.
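
For illustration, a VIP entry in the host name database might look like the following (the address and names here are hypothetical examples, not values taken from this guide):

    192.168.10.50 nas-vip.example.com nas-vip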

Installation and configuration

Perform steps 1-5 on both nodes in the cluster:

  1. Allow the RSF-1 package to be installed by enabling the FreeBSD repository and disabling the local one. This is only a temporary change; the default settings are restored the next time the system is rebooted (which happens as part of this installation).

    Start a command shell using the >_ Shell menu item in the TrueNAS GUI, then edit the file /usr/local/etc/pkg/repos/FreeBSD.conf and set the FreeBSD enabled value to yes:
    FreeBSD: {
        enabled: yes
    }
    
    Next edit the file /usr/local/etc/pkg/repos/local.conf and set the enabled value to no:
    local: {
        url: "file:///usr/ports/packages",
        enabled: no
    }
    
  2. From the command shell download the RSF-1 package and signature files:
    # wget https://packages2.high-availability.com/offline-packages/beta/rsf-1-1.5.0-TNCore.pkg
    # wget https://packages2.high-availability.com/offline-packages/beta/rsf-1-1.5.0-TNCore.pkg.sha512
    
    Calculate the checksum of the downloaded RSF-1 package and ensure it matches the checksum held in the rsf-1-1.5.0-TNCore.pkg.sha512 file (the output of the sha512 and cat commands should be the same; a scripted comparison is sketched after these installation steps):
    # sha512 rsf-1-1.5.0-TNCore.pkg
    ...output...
    # cat rsf-1-1.5.0-TNCore.pkg.sha512
    ...output...
    
    Once the checksum has been verified, install the RSF-1 package:
    # pkg install ./rsf-1-1.5.0-TNCore.pkg
    
  3. Enable automatic start of the RSF-1 system. This is done from the TrueNAS GUI by navigating to Tasks==>Init/Shutdown Scripts and adding three new tasks via the ADD button (one at a time):

    /opt/HAC/RSF-1/init.d/rc-rsf onestart
    /opt/HAC/RSF-1/init.d/rc-restapi onestart
    /opt/HAC/RSF-1/init.d/rc-rpcstmfha onestart
    
    For example, the setting for the first task should be:

    Field        Setting
    Description  rc-rsf onestart
    Type         Command
    Command      /opt/HAC/RSF-1/init.d/rc-rsf onestart
    When         Post init
    Enabled      ✅
    Timeout      60

    QS image 1

  4. In the GUI navigate to Network==>Global Configuration and update the TrueNAS host name database with static entries for the cluster nodes. This step is essential so host name lookup is not reliant on any external services that could potentially fail. Each node should have entries for all cluster nodes in the host name database using the format:

    IPaddress FQDN hostname
    
    Here is an example configuration with two static entries in the hosts file:

    QS image 2

  5. Finally, reboot the node.
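
As referenced in step 2, the checksum comparison can also be scripted rather than compared by eye. This is a minimal sketch that simply relies on the sha512 and cat outputs being identical, as described above:

    # [ "$(sha512 rsf-1-1.5.0-TNCore.pkg)" = "$(cat rsf-1-1.5.0-TNCore.pkg.sha512)" ] && echo "checksum OK" || echo "checksum MISMATCH"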

Configure pools and create cluster

  1. If you haven't already done so, create your cluster storage pool(s) on one of the cluster nodes. This must be done using only drives from the shared storage3.

  2. Once you have a pool eligible for clustering it is necessary to make other cluster nodes "aware" of that pool for failover. This is accomplished in two steps:

    1. Export the pool via the cli4 using the command zpool export <poolname>. The pool should then show as offline in the GUI (in this example pool1 has been exported; a worked example is sketched after this list): QS image 3
    2. Import the pool on the other node using the GUI. This ensures TrueNAS is aware of the pool on that cluster node. QS image 4 This step need only be performed once for each pool being clustered.
  3. Add any VIPs you intend to use in the cluster to the host name database on each node in the cluster:

    QS image 3

  4. Finally, navigate to the RSF-1 secure web interface running on port 4330 on the node where the shared pool is imported5 to complete the cluster configuration. Documentation for this step can be found in the RSF-1 quickstart guide.
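
As referenced in step 2 above, handing a pool over for its first import on the other node can be done as follows. This is a minimal sketch using the example pool name pool1:

    # zpool export pool1
    # zpool list

After the export, pool1 should no longer appear in the zpool list output on this node; the pool is then imported on the other node from its GUI as described in step 2.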

Setting up shares on clustered pools

TrueNAS uses a local configuration to save details of shares created for a pool (NFS, SMB etc.). When a pool in a cluster fails over from one node to another, that share information is not migrated with the pool. For this reason, when setting up a new share on a clustered pool, it is necessary to duplicate the share configuration on each node in the cluster.

For example, in a cluster with two nodes, Node-A and Node-B, with clustered pool nas-shares, to share /mnt/nas-shares/user-data via NFS the following steps are required:

  1. Start the service configured with the nas-shares pool on Node-A.
  2. Add the NFS share:

    QS image 3
  3. Fail over the service to Node-B.
  4. Add the NFS share again, using the same parameters as were used on Node-A.

Note - this configuration step need only be done once on the cluster for each share (but will need to be repeated for each additional share).
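
Once the share has been configured on both nodes, a quick way to confirm it is being served from whichever node currently holds the pool is to query the NFS exports from a client, using the VIP rather than a node address. This is a minimal sketch assuming the hypothetical VIP name nas-vip from the earlier hosts-file example; showmount must be available on the client:

    # showmount -e nas-vip

The exports list should include /mnt/nas-shares/user-data regardless of which node the pool is currently imported on.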


  1. Startup/running performance can also be improved by using SSD or NVMe disks as the boot drive(s). 

  2. If this service is unavailable when the cluster tries to resolve the hostname, service startup cannot continue in a normal fashion.

  3. If any drive in a clustered pool is local to a node, i.e. does not reside in the shared storage, the pool will fail to import on any other cluster node because that local drive will be inaccessible there. For this reason it is mandatory that all clustered drives reside in shared storage.

  4. Use the GUI >_ Shell menu item to access the cli. 

  5. If the pool is imported on, say, truenas-node2, then the URL is https://truenas-node2:4330