Shared Nothing Storage Cluster

Shared Nothing Storage refers to an architecture in which Storage Cluster nodes do not share a physical storage subsystem but instead each have their own exclusive storage, which is used to provide critical storage services. The Shared Nothing concept means that data replication and synchronisation are not handled directly by the underlying storage subsystem or volume management layer, but independently between Storage Cluster Server nodes over a network, using synchronous replication.

Synchronous Replication

Synchronous Replication is a timed process that runs at regular intervals to copy data from each active Storage Cluster Server node to the passive nodes, ensuring the data is synchronised across the cluster. This is a complex process to achieve because data additions and changes are constant, and regardless of the speed and latency of the replication services there will always be a data lag between the active and passive nodes. The frequency of replication, and the reliability with which each replication step completes, are therefore critical to keeping data up to date between the Storage Cluster Server nodes.

In addition, the failure of an active Storage Cluster Server node during the execution of a replication process means that the passive node does not hold the latest version of the data locally, thus presenting some data loss to users. The use of the word “some” here is deliberate, as the level of data loss may be indeterminate unless data replication can be performed atomically. In other words, in the event of failure it is vital to know exactly at what point in time the data on the new active node is integral, reliable and predictable. A partially completed replication to a data volume, for example, may prove catastrophic.

Replication should therefore be performed atomically, ensuring that either all of the replicated changes are committed or none are. In other words, the entire replication step is the atomic action. This is best achieved with storage volume snapshot technology.
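
As a rough illustration of the “all or nothing” principle, the following Python sketch stages incoming replication data and only promotes it with an atomic rename once the transfer is complete and verified; the function name, file layout and checksum scheme are illustrative assumptions rather than the mechanism of any particular storage product. Snapshot technology, described next, provides the same guarantee at the volume level.

    import hashlib
    import os
    import tempfile

    def apply_replication_step(volume_path, incoming_data, expected_sha256):
        """Apply a replicated data image atomically: commit all of it or none of it."""
        staging_dir = os.path.dirname(volume_path) or "."
        fd, staging_path = tempfile.mkstemp(dir=staging_dir, prefix=".replica-staging-")
        try:
            with os.fdopen(fd, "wb") as staging:
                staging.write(incoming_data)       # transfer into a staging file first
                staging.flush()
                os.fsync(staging.fileno())
            with open(staging_path, "rb") as staged:
                if hashlib.sha256(staged.read()).hexdigest() != expected_sha256:
                    raise ValueError("replicated data failed verification")
            os.replace(staging_path, volume_path)  # atomic commit: old data replaced in one step
            return True
        except Exception:
            if os.path.exists(staging_path):
                os.remove(staging_path)            # discard the partial copy; existing data untouched
            return False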

Replication Snapshots

A snapshot is a record of a storage volume captured and frozen at a specific point in time. Viewing a snapshot later shows the true and exact state of the volume and its data contents at that point in time, regardless of what changes have occurred since. Once a snapshot has been taken, it can be used as a base point for data replication.

Once the replication process has completed, the replication target node holds an exact copy of the storage volume data as it was when the snapshot was taken on the replication source node. Subsequent snapshots and replications keep the target node synchronised in atomic steps.

If a replication process fails to complete, that partial replication step can be disregarded, leaving the target replication node atomically in step with the source node, albeit one step behind. Using storage snapshot technology therefore eliminates the risk of catastrophic partial replication updates.
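
To make the step semantics concrete, here is a minimal, self-contained Python sketch using toy in-memory nodes; the StorageNode class and the simulated transfer failure are illustrative assumptions standing in for whatever snapshot and send/receive facilities the underlying volume manager actually provides.

    class StorageNode:
        """Toy stand-in for a Storage Cluster Server node's snapshot history."""
        def __init__(self, name):
            self.name = name
            self.snapshots = []                      # completed snapshots, oldest first

        def latest(self):
            return self.snapshots[-1] if self.snapshots else None

    def replicate_snapshot(source, target, snapshot_id, transfer_completed=True):
        """Replicate one snapshot step atomically: the target either records the
        whole snapshot or is left exactly as it was, one step behind the source."""
        if not transfer_completed:                   # e.g. node crash or link failure mid-transfer
            return target.latest()                   # the partial step is discarded, not recorded
        target.snapshots.append(snapshot_id)         # commit the completed step
        return target.latest()

    active, passive = StorageNode("A"), StorageNode("B")
    active.snapshots = ["snapshot-1", "snapshot-2"]
    replicate_snapshot(active, passive, "snapshot-1")
    replicate_snapshot(active, passive, "snapshot-2", transfer_completed=False)
    print(passive.latest())                          # snapshot-1: consistent, but one step behind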

Replication Limitations

In an ideal world, snapshots would be replicated every minute between the active and passive Storage Cluster Server nodes, meaning that in the event of failure the maximum data loss after a storage service failover would be less than one minute's worth of changes.

However, this may not always be achievable as there are unknown and unpredictable parameters to be considered:

  • How much data loss can be afforded?
  • How much data is being created or changed?
  • What network bandwidth is available for replication?
  • What are the storage specifications and performance of the Storage Cluster Server nodes?
[Illustration: data changes versus replication frequency]

Replication Data Loss Tolerance

The amount of data loss, and therefore the replication frequency, that can be tolerated can be determined by analysing how much data changes between replication steps and how much replication network bandwidth is available.

For example, consider a Storage Cluster with active and passive Server nodes connected by a 2Mbps WAN connection, capable of transmitting roughly 900MB per hour, hosting a single active storage volume with 20MB of data changing every minute. For this example, the optimum replication frequency would be around 7 minutes.

Of course, the rate of data change is unlikely to be constant or predictable, so replication frequency calculations should allow for reasonable fluctuations rather than relying on an average view.

In the example above, if synchronous data replication were configured to run every 7 minutes, what would happen if a sudden burst of data changes or a drop in network bandwidth meant the replication step could not complete within its 7-minute window? The next replication step would start before the previous one had completed, the steps would fall out of sync, and the cluster would enter a state of replication synchronisation overrun.
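
As a rough sketch of how such a burst might be spotted ahead of time, the fragment below checks each replication window of a hypothetical change profile against the available bandwidth; the figures are illustrative placeholders, not measurements from the example above.

    def overrun_windows(change_mb_per_min, interval_min, bandwidth_mb_per_min):
        """Yield each replication window whose accumulated changes would take
        longer to transfer than the window itself (an overrun)."""
        for start in range(0, len(change_mb_per_min), interval_min):
            window = change_mb_per_min[start:start + interval_min]
            transfer_min = sum(window) / bandwidth_mb_per_min
            if transfer_min > interval_min:
                yield start, transfer_min

    # A mostly quiet change profile with a burst between minutes 14 and 20,
    # replicated every 7 minutes over a link sustaining 15MB per minute.
    changes = [5] * 14 + [60] * 7 + [5] * 7
    for start, needed in overrun_windows(changes, 7, 15):
        print(f"window starting at minute {start} needs {needed:.1f} minutes to replicate")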

Adaptive Replication

To avoid replication synchronisation overrun, an adaptive replication strategy can be implemented to ensure that a new replication step is not started until the previous step has successfully completed. This may result in some replication steps exceeding the desired replication interval, so it will be necessary to monitor the actual intervals achieved and adjust the replication frequency accordingly over time.
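
A minimal sketch of such an adaptive loop is shown below, assuming a run_step callable that performs one complete snapshot and replication; the scheduling policy shown (skip the wait when a step overruns its slot) is one possible approach, not a prescribed one.

    import time

    def adaptive_replication_loop(run_step, desired_interval_s, observe=print):
        """Run replication steps no more often than desired_interval_s, but never
        start a new step until the previous one has finished."""
        while True:
            started = time.monotonic()
            run_step()                                # blocking: one full snapshot and transfer
            elapsed = time.monotonic() - started
            if elapsed >= desired_interval_s:
                observe(f"step took {elapsed:.0f}s, exceeding the {desired_interval_s}s target")
                continue                              # already late: begin the next step immediately
            time.sleep(desired_interval_s - elapsed)  # otherwise wait out the remainder of the slot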

Push or Pull Replication

The final consideration for a replication strategy is whether to implement a push or a pull model. This refers to the origin of the replication request: is data pushed from the active node, or pulled by the passive node? For Storage Clusters with multiple passive nodes there is the additional consideration of which node to pull from, as a passive node can pull either from the active node or from another passive node that has already received the latest snapshot update.
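
In a pull model, the choice of source can be expressed very simply; the sketch below (reusing the toy StorageNode objects from the earlier example) prefers a passive peer that already holds the wanted snapshot and falls back to the active node, which is one reasonable policy among several.

    def choose_pull_source(active, passive_peers, wanted_snapshot):
        """Pick a node to pull wanted_snapshot from, offloading the active node
        where a passive peer already holds the snapshot."""
        for peer in passive_peers:
            if wanted_snapshot in peer.snapshots:
                return peer                           # pull from a peer that is already up to date
        return active                                 # otherwise pull directly from the active node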

Divergent History

Another problematic scenario that must be resolved for Shared Nothing Storage Clusters is divergent history after failure and failover. This refers to a situation where storage replication is out of sync and a failover event occurs, leaving Storage Cluster Server nodes with different versions of the replicated data.

[Illustration: high availability divergent history]

For example, in the case above, active Node A crashes before snapshot-4 has finished replicating to passive Node B. The local storage on Node B is therefore one step behind, having only completed replication up to snapshot-3, yet Node B now becomes the active Storage Cluster Server node. There has been a data loss of one step.

Active Node B is now serving new data and change requests. However, when Node A recovers and becomes the passive node for the storage service, subsequent replicated snapshots from active Node B will be incompatible with Node A's own data, which still includes the snapshot-4 changes that were never replicated to Node B.

To circumvent this situation, it is critical that the recovered passive node reverts its data to the latest completed replication step (in this case snapshot-3) before accepting new replication steps from the current active node.
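
The recovery rule can be sketched as follows, again using the toy StorageNode objects from earlier; rolling back to the last snapshot common to both nodes, and requiring a full resynchronisation when no common snapshot exists, are assumptions about how a real implementation might behave.

    def rollback_to_common_snapshot(recovered, new_active):
        """Before the recovered node rejoins as a passive, discard any snapshots
        (and the data written after them) that the new active node never received,
        so replication can resume from a shared, consistent base."""
        common = [s for s in recovered.snapshots if s in new_active.snapshots]
        if not common:
            raise RuntimeError("no common snapshot: a full resynchronisation is required")
        last_common = common[-1]                      # e.g. snapshot-3 in the example above
        keep = recovered.snapshots.index(last_common) + 1
        discarded = recovered.snapshots[keep:]        # e.g. the unreplicated snapshot-4
        recovered.snapshots = recovered.snapshots[:keep]
        return last_common, discarded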

Conclusions

A Shared Nothing Storage Cluster architecture presents several challenges compared with Shared Storage Cluster configurations; this article has highlighted the main obstacles to consider, together with approaches for resolving them.

The most fundamental issue to take into account when deciding which approach to deploy, however, is the balance of cost and separation against how much data loss can be tolerated.