QUADStor clusters offer the advantages of distributing load across multiple nodes and of high availability

A QUADStor Cluster consists of the following nodes

1. Controller Node
2. Client Node

For a controller node the following packages are installed
quadstor-core
quadstor-itf

For a client node the following packages are installed
quadstor-client
quadstor-itf

The procedure for installing the packages is described at http://www.quadstor.com/support/123-installation-on-rhel-centos-sles-debian.html and http://www.quadstor.com/support/61-installation-on-freebsd-8-2.html

However, on the client node install the quadstor-client package instead of the quadstor-core package

On the controller node ensure that the following ports are allowed for TCP traffic in your firewall configuration

9950
9951
9952
9954
9956
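
As a sketch only, assuming the iptables based firewall of RHEL/CentOS 6.x (adjust the commands for the firewall tool used on your distribution), the ports could be opened with

for port in 9950 9951 9952 9954 9956; do
    iptables -A INPUT -p tcp --dport $port -j ACCEPT
done
service iptables save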


On the controller node create a file /quadstor/etc/ndcontroller.conf and add the following lines
Controller=<Controller IP Address>
Node=<Node IP Address>

In the above, the Controller IP Address is the IP address on which the controller binds, and the Node IP Address is a client's IP address. For each client in the cluster there will be one Node=<Node IP Address> line. For example

Controller=10.0.13.4
Node=10.0.13.5
Node=10.0.13.6

In the above example, the controller will bind to 10.0.13.4 for cluster traffic. 10.0.13.5 and 10.0.13.6 are the clients which are allowed in the cluster. It is a good idea to build the cluster network as a private network so that the cluster traffic does not interfere with the data path of the VDisk clients
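
For instance, the file for the above example could be created on the controller node with a simple shell redirect (any text editor works just as well)

cat > /quadstor/etc/ndcontroller.conf << EOF
Controller=10.0.13.4
Node=10.0.13.5
Node=10.0.13.6
EOF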

On the client node create a file /quadstor/etc/ndclient.conf and add the following lines
Controller=<Controller IP Address>
Node=<Node IP Address>

For example, for node 10.0.13.5 the ndclient.conf contents would be

Controller=10.0.13.4
Node=10.0.13.5

Similarly for node 10.0.13.6 the ndclient.conf contents would be

Controller=10.0.13.4
Node=10.0.13.6

In the above example the nodes 10.0.13.4, 10.0.13.5 and 10.0.13.6 form a cluster. Configuration tasks such as adding physical storage, adding VDisks etc. can only be performed on the controller node. Configuration changes are automatically propagated to the client nodes. VDisks are accessible from any of the controller or client nodes

Any changes to ndclient.conf or ndcontroller.conf require a restart of the quadstor service on that node. The only exception to this is the addition/deletion of "Node=..." lines in ndcontroller.conf
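
Assuming the quadstor init script installed by the packages (the exact invocation may differ between distributions and init systems), a restart would look like

/etc/rc.d/init.d/quadstor restart

or equivalently

service quadstor restart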

Shared storage access

In order to read/write to a VDisk, a client node needs to have access to the physical disks configured on the controller node. Physical disks are identified by their serial number and/or SCSI device identifiers. However, it is not mandatory for a client node to have access to all (or any) of the configured physical disks. If a disk to be read from or written to is not accessible by the client node, the data is read/written through the controller node. This however leads to a drop in IO performance.
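
As a quick sanity check, and assuming the sg3_utils package is installed (this is not a QUADStor tool), the serial number and SCSI device identifiers reported for a disk can be compared between the controller node and a client node

sg_inq /dev/sdX
sg_vpd --page=di /dev/sdX

Here /dev/sdX is the device name of the shared disk on that node; the two nodes see the same physical disk only if the reported identifiers match.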

Disk partitions cannot be configured as physical storage even if accessible by the client nodes

High Availability

In a cluster configuration, VDisks are available as long as the controller node is active. If the controller node is inaccessible to any of the client nodes, the entire cluster is unavailable.

High availability is achieved by configuring any one of the client nodes as a master node. To configure a client node as a master node add the following line to /quadstor/etc/ndclient.conf
Type=Master

For example, to make 10.0.13.5 a master node in our example the contents of ndclient.conf would be

Controller=10.0.13.4
Node=10.0.13.5
Type=Master
Fence=/usr/sbin/fence_apc --ssh -l ...

Master Node Requirements

  • For a client node to perform as a master the node must have access to all the physical disks configured (or will be configured) on the controller node.
  • The available memory (RAM) of the master node must be equal to or greater than the available memory of the controller node (a quick way to compare the two is shown below)
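
For instance, the memory available on the two nodes can be compared with standard operating system tools (nothing QUADStor specific is assumed here)

free -m
grep MemTotal /proc/meminfo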

Once a master node is configured, metadata state is synced between the controller node and the master node. The metadata traffic can be limited to a private network between the controller node and the master node. This is achieved by adding the following lines to ndclient.conf and ndcontroller.conf

HABind=<Bind IP Address>
HAPeer=<Peer IP Address>

For example the contents of /quadstor/etc/ndcontroller.conf could be

Controller=10.0.13.4
Node=10.0.13.5
Node=10.0.13.6
HABind=192.168.1.2
HAPeer=192.168.1.3
Fence=/usr/sbin/fence_apc --ssh -l ...

And the contents of /quadstor/etc/ndclient.conf could be

Controller=10.0.13.4
Node=10.0.13.5
Type=Master
HABind=192.168.1.3
HAPeer=192.168.1.2
Fence=/usr/sbin/fence_apc --ssh -l ...

In the above example the controller would bind to 192.168.1.2 and sync metadata state to and from 192.168.1.3. Similarly, the master would bind to 192.168.1.3 and sync metadata state to and from 192.168.1.2

NOTE: HABind and HAPeer are optional. If they are missing, the Controller and Node values are used instead

Once a master has been set up and the metadata state has been synced (the initial metadata state sync takes between 1 and 5 minutes), the other client nodes can continue to read/write to VDisks even if the controller goes down, as long as the master node is up.

Once the controller node is back online, it will sync metadata state back from the master node and take over as the cluster owner

Node Fencing

The QUADStor daemon on a client node with type 'Master' will not start if a fence command is not specified

Installing Fence Agents

On RHEL/CentOS 6.x
yum install fence-agents
On Debian 7.x
apt-get install fence-agents
For a list of possible fence agents for your hardware please refer to https://access.redhat.com/site/articles/28603

In order to fence the controller during a takeover add the following to ndclient.conf

Fence=<fence cmd>

For example

Fence=/usr/sbin/fence_apc --ssh -l userid -p password --plug=1 ...

Note that everything after Fence= is considered the fence command to execute. It is a good idea to first test the fence command manually on the command line
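
A sketch of such a manual test, reusing the APC agent from the example above (the agent name, userid, password, PDU address and plug number are placeholders; substitute the values for your own hardware)

/usr/sbin/fence_apc --ssh -l userid -p password --ip=<pdu address> --plug=1 --action=status

If the command reports the power status of the controller's plug, the agent options and credentials are correct.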

Similarly, add a Fence= line to ndcontroller.conf to fence the client on controller startup. If the client is not reachable on controller startup, it needs to be fenced before the controller can resume.

With fencing configured, ownership decisions are easier and take less time. After adding the fence command, ensure that the quadstor services are restarted.

Clustering status of Controller, Master and Client Nodes

A new utility 'ndconfig' is available to query the current status of a cluster node. To use the tool, run the following command as root

/quadstor/bin/ndconfig

The following is an example output when ndconfig is run on a controller node

[root@quadstor]# /quadstor/bin/ndconfig 
Node Type: Controller
Controller: 10.0.13.7
HA Peer: 10.0.13.6
HA Bind: 10.0.13.7
Node Status: Controller Inited
Node Role: Master
Sync Status: Sync Done
Nodes: 10.0.13.6 10.0.13.5 10.0.13.4
Node Type: Mirror Recv
Recv Address: 10.0.13.7
Node Status: Recv Inited

The following is an example output when ndconfig is run on a client node

[root@quadstor]# /quadstor/bin/ndconfig 
Node Type: Client
Controller: 10.0.13.4
Node: 10.0.13.7
Node Status: Client Inited

In the above output "Node Role" indicates the current role of the node. Node Role can be Master, Standby or Unknown. If the role is Standby, the node will take over as Master when the peer node is down

Sync Status is important for correct HA operation. If Sync Status is 'Sync Done', high availability of the cluster is possible. Other statuses are 'Sync Error' and 'Sync InProgress'.

HA Limitations

In order to effectively perform a switchover from a Master node to a Standby node, both nodes should have a Sync Status of 'Sync Done'. There are certain conditions which can prevent a node from attaining this status.

1. The status is still 'Sync InProgress' when the master node crashes or is restarted.

The solution to this is to first start/restart the quadstor service on the controller node and then start/restart the quadstor service on the client node

2. The status is 'Sync Error' and the master node is the client node and not the controller.

The solution to this is to start/restart the quadstor service on the controller node. If the problem still persists, stop the quadstor service on the client node, start/restart the service on the controller node and then start the service on the client node.

3. The status is 'Sync Error' and the master node is the controller node.

Usually the client node will try to restart the sync process. However, if the 'Sync Error' state persists, restarting the quadstor service will fix the problem