Creating a ZFS HA Cluster on Linux using shared or shared-nothing storage
This guide goes through a basic setup of a RSF-1 ZFS HA cluster on Linux. Upon completion the following will be configured:
- A working Active-Active Linux cluster with either shared or shared-nothing storage
- A clustered service sharing a ZFS pool (further services can be added as required)
- A virtual hostname by which clients are able to access the service
RSF-1 supports both shared and shared-nothing storage clusters.
A shared storage cluster utilises a common set of storage devices accessible to both nodes in the cluster (housed in a shared JBOD, for example). A ZFS pool is created using these devices and access to that pool is controlled by RSF-1.
Pool integrity is maintained by the cluster software using a combination of redundant heartbeating and PGR3 disk reservations to ensure any pool in a shared storage cluster can only be accessed by a single node at any one time.
```mermaid
flowchart TD
    SSa("Node A") & SSb("Node B") <-- SAS/FC etc. --> SSS[("Storage")]
```
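On a live shared-storage cluster, the reservations themselves can be inspected with the `sg_persist` utility from the sg3_utils package. This is an illustrative sketch only; the device path is an assumption, so substitute a member disk of your shared pool:

```shell
# List the reservation keys registered on a shared disk
# (device path is illustrative -- use a disk from the shared pool).
sg_persist --in --read-keys /dev/sda

# Show the active reservation holder and reservation type
sg_persist --in --read-reservation /dev/sda
```

If the cluster is healthy, only one node should hold the reservation on any given pool device.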
A shared-nothing cluster consists of two nodes, each with their own locally accessible ZFS storage pool residing on non-shared storage:
```mermaid
flowchart TD
    SNa("Node A") <-->|SAS/FC etc.| SNSa
    SNb("Node B") <-->|SAS/FC etc.| SNSb
    SNSa[("Storage")]
    SNSb[("Storage")]
```
Data is replicated between nodes by an HA synchronisation process. Replication is always done from the active to the passive node, where the active node is the one serving out the pool to clients:
```mermaid
flowchart LR
    SNa("Node A (active)<br />Pool-A") -->|HA Synchronisation| SNb
    SNb("Node B (passive)<br />Pool-A")
```
Should a failover occur, the direction of synchronisation is effectively reversed:
```mermaid
flowchart RL
    SNa("Node B (active)<br />Pool-A") -->|HA Synchronisation| SNb
    SNb("Node A (passive)<br />Pool-A")
```
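Conceptually, each synchronisation step resembles an incremental ZFS send/receive from the active node to the passive node. RSF-1 automates this internally and its actual mechanism may differ; the sketch below is purely illustrative, and the pool, snapshot and host names are assumptions:

```shell
# On the active node: snapshot the pool, then send the increment
# since the previous sync point to the passive node over SSH.
zfs snapshot tank@sync-1
zfs send -i tank@sync-0 tank@sync-1 | ssh node-b zfs recv -F tank
```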
Before creating pools for shared-nothing clusters
- To be eligible for clustering, the storage pools must have the same name on each node in the cluster.
- It is strongly recommended that the pools are of equal size; otherwise the smaller of the two runs the risk of depleting all available space during synchronisation.
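For example, a shared-nothing pair might be prepared by running the same command on each node so that the pool names match. The pool and device names below are illustrative:

```shell
# Run on node-a AND on node-b: the pool name must be identical on
# both nodes, and the pools should be of (roughly) equal size.
zpool create pool1 mirror /dev/sdb /dev/sdc

# Compare the reported SIZE across the two nodes afterwards:
zpool list pool1
```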
Download cluster software
If you have not already done so, download and install the RSF-1 cluster software onto each cluster node. More information can be found here.
Initial connection and user creation
Please make sure that any firewalls in the cluster environment have the following ports open before attempting configuration:
- 1195 (TCP & UDP)
- 4330 (TCP)
- 4331 (TCP)
- 8330 (TCP)
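On distributions using firewalld, for instance, the ports can be opened as follows (use the equivalent ufw or iptables rules elsewhere; run as root on each node):

```shell
# Open the RSF-1 ports permanently, then reload the firewall
firewall-cmd --permanent --add-port=1195/tcp
firewall-cmd --permanent --add-port=1195/udp
firewall-cmd --permanent --add-port=4330/tcp
firewall-cmd --permanent --add-port=4331/tcp
firewall-cmd --permanent --add-port=8330/tcp
firewall-cmd --reload
```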
For a shared-nothing cluster, both nodes require SSH access to each other without a password. This is needed for replication of the ZFS pool.
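One way to set this up is with ssh-keygen and ssh-copy-id. The user and hostnames here are assumptions based on the example node names used in this guide; adjust for your environment:

```shell
# On node-a (repeat in the opposite direction on node-b):
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519   # key with no passphrase
ssh-copy-id root@node-b                            # install the public key
ssh root@node-b true                               # verify: no password prompt
```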
To connect to the RSF-1 GUI, direct your web browser to:
Next, create an admin user account for the GUI.
Enter the information in the provided fields and click the Submit button when ready:
Once you click the Submit button, the admin user account will be created and you will be redirected to the login screen. Login with the username and password just created:
Once logged in, the main dashboard page is displayed:
Configuration and Licensing
Before continuing, ensure the hosts file is configured correctly on both nodes. Hostnames cannot be directed to 127.0.0.1, and both nodes should be resolvable. Here is a correctly configured hosts file for two nodes:
```
127.0.0.1   localhost
10.6.18.1   node-a
10.6.18.2   node-b

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
```
To begin configuration, click on the Create/Destroy option on the side-menu (or the shortcut on the panel shown when first logging in).
The Cluster Create page scans for clusterable nodes (those running RSF-1 that are not yet part of a cluster) and presents them for selection:
Now enter the cluster name and description, and then select the type of cluster being created (either shared or shared-nothing):
If setting up a shared-nothing cluster, an additional option to add a node manually is shown at the bottom of the page. This is because RSF-1 will detect nodes on the local network, but for shared-nothing clusters the partner node could be on a separate network/location, and therefore may not automatically be detected.¹
If any of the selected nodes have not been licensed, a panel is shown to obtain 45-day trial licenses:
Next, the RSF-1 End User License Agreement (EULA) will be displayed. Click accept to proceed:
Once the license keys have been successfully installed, click the Create Cluster button to initialize the cluster:
When the cluster has been created, you can enable support for disk multipathing in RSF-1 (if the disks are already configured for it) and/or netplan (Ubuntu) if required:
These settings can be modified after cluster set-up if needed; they can be found in Settings -> Linux.
Enabling Multipath Support
If the disks have been configured to use multipathing, you must enable multipath support; otherwise disk reservations will not function correctly. Do not enable this if disks are configured for single-path only.
The next step is to add pools to the cluster.
Creating a Pool in the WebApp
If a zpool isn't already created, this can be done via the Pools -> Volumes option on the side menu.
Enter the desired Pool Name and select a Pool Mode (jbod, raidz2 or mirror). Add your drives to the pool by selecting them in the list and choosing their role using the buttons at the bottom.
To configure multiple mirrors in a pool, select the first set of drives from the list and add them as data disks. Next select your next set of drives and add them in the same way.
Once configured, click Submit and your pool is created and ready to be clustered:
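The same layout can also be created from the command line if preferred; RSF-1 will cluster a pool regardless of how it was created. The pool and device names below are illustrative:

```shell
# A pool made of two mirrored pairs (equivalent to adding two
# sets of data disks in the WebApp).
zpool create pool1 \
    mirror /dev/disk/by-id/scsi-DISK1 /dev/disk/by-id/scsi-DISK2 \
    mirror /dev/disk/by-id/scsi-DISK3 /dev/disk/by-id/scsi-DISK4

zpool status pool1   # verify the vdev layout
```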
Preparing Pools to Cluster
Pools must be imported on one of the nodes before they can be clustered. Check their status by selecting the Pools option on the side-menu. For a shared-nothing cluster, the pools will need to have the same name and be individually imported on each node.
In the above example pool1 and pool2 are exported. To import pool1, first select it:
Then select Actions, followed by the import option. The status of the pool should now show as imported:
Should any issues be encountered when importing the pool, it will be marked as UNCLUSTERABLE. Check the RestAPI log (/opt/HAC/RSF-1/log/rest-operations.log) for details. For a shared-nothing cluster, this may happen if the pools aren't imported on both nodes.
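The import state and the RestAPI log can also be checked from a shell on the node concerned. The pool name here is illustrative; the log path is the one given above:

```shell
zpool import          # list pools that are exported but importable
zpool import pool1    # import the pool by name
zpool list pool1      # confirm it is now online

# Inspect recent RestAPI log entries for import errors
tail -n 50 /opt/HAC/RSF-1/log/rest-operations.log
```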
The pool is now ready for clustering.
Clustering a Pool
Highlight the desired pool to be clustered (choose only pools marked CLUSTERABLE), then select Cluster this pool:
Fill out the description and select the preferred node for the service:
What is a preferred node?
When a service is started, RSF-1 will initially attempt to run it on its preferred node. Should that node be unavailable (node down, service in manual mode, etc.) then the service will be started on the next available node.
When clustering a shared-nothing pool, the GUIDs for each pool will be shown:
To add a virtual hostname to the service click Add in the Virtual Hostname panel. Enter the IP address, and optionally a hostname, in the popup. For nodes with multiple network interfaces, use the drop-down lists to select which interface the virtual hostname should be assigned to. Click the Next button to continue:
Finally, click the button to finish clustering the pool. The pool will now show as clustered:
View Cluster Status
To view the cluster status, click on the Dashboard option on the side-menu:
The dashboard shows the location of each service and the respective pool
states and failover modes (manual or automatic). The dashboard also allows
the operator to stop, start and move services in the cluster.
Select a pool, then click the ⋮ button on the right-hand side to see the available options:
To view cluster heartbeat information, select the Heartbeats option on the left side-menu:
To add an additional network heartbeat to the cluster, select Network Heartbeat Pair.
In this example an additional connection exists between the two nodes, with the hostnames mgub01-priv and mgub02-priv respectively. These hostnames are then used when configuring the additional heartbeat:
Click Submit to add the heartbeat.
The new heartbeat will now be displayed on the Heartbeats status page:
This completes basic cluster configuration.
¹ RSF-1 uses broadcast packets to detect cluster nodes on the local network. Broadcast packets are usually blocked from traversing other networks, and therefore cluster node discovery is usually limited to the local network only.