With QUADStor synchronous mirroring, the following High Availability (HA) scenarios are possible:

  • Active-Active (Dual Primary)
  • Active-Passive (Single Primary)
  • Active-Passive with Manual Failover

In this tutorial we describe a step-by-step procedure for achieving HA with QUADStor VDisks.

What is Synchronous Mirroring?

Synchronous mirroring is block-level replication in real time. Data written to a block on a VDisk is mirrored/replicated to a peer VDisk on another node at the same time. QUADStor handles conditions such as a failure to write to the peer node, failure of the peer node itself and so on.

How is HA achieved?

Since the VDisks on the two nodes contain the same data, even if one of the nodes fails, the other node can still serve reads and writes to its VDisk.

HA from a client's perspective

To a client, both VDisks present the same vendor identification, serial number and so on. The client sees both VDisks as paths to the same disk and is unaware that the VDisks reside on two different systems. When the client is unable to read/write on one path, it continues on the other path.

The client can then access the paths in a failover (active-passive) configuration, a round-robin (active-active) configuration and so on.
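As a quick sanity check, the SCSI identifier reported on each path can be compared on the client. The device names below are illustrative (the actual names only appear once the client configuration later in this tutorial is in place), but both commands should print the same identifier:

/lib/udev/scsi_id --whitelisted --device=/dev/sdg
/lib/udev/scsi_id --whitelisted --device=/dev/sdh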

In a cluster configuration, more than one client can access the VDisk simultaneously. Such a configuration usually requires cluster-aware applications and file systems. To aid such applications, serialized access to the VDisks is maintained through support for SCSI Persistent Reservations (GFS2, OCFS2 etc.) and the SCSI Compare and Write command (ESXi).
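For reference, the persistent reservations held on a VDisk can be inspected from a client using sg_persist from the sg3_utils package. A minimal sketch, assuming the VDisk appears as /dev/sdg on the client:

sg_persist --in --read-keys /dev/sdg
sg_persist --in --read-reservation /dev/sdg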

Deployment Scenario

In order to describe the HA capabilities of QUADStor we have installed three virtual machines vm10, vm11 and vm12 with CentOS 6.2 as described below

vm10 and vm11 are installed with QUADStor Version 3.0.29 software

vm12 acts as the initiator

The following interfaces are configured:

On vm10:

/sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:23:CE:2C
          inet addr:10.0.13.100  Bcast:10.0.13.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 52:54:00:03:BC:A8
          inet addr:10.0.13.150  Bcast:10.0.13.255  Mask:255.255.255.0

On vm11:

/sbin/ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:61:5C:1B
          inet addr:10.0.13.101  Bcast:10.0.13.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 52:54:00:56:E2:65
          inet addr:10.0.13.151  Bcast:10.0.13.255  Mask:255.255.255.0

For our setup we will use 10.0.13.100 and 10.0.13.101 as the interfaces accessed by the client vm12. 10.0.13.150 and 10.0.13.151 will be used as a private network between vm10 and vm11 for mirroring data.

On vm10:

/sbin/route add 10.0.13.151 gw 10.0.13.150 dev eth1
/sbin/route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.13.151 10.0.13.150 255.255.255.255 UGH 0 0 0 eth1

On vm11:

/sbin/route add 10.0.13.150 gw 10.0.13.151 dev eth1
/sbin/route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.0.13.150 10.0.13.151 255.255.255.255 UGH 0 0 0 eth1

We now configure QUADStor to use 10.0.13.150 and 10.0.13.151 as the mirroring addresses

On vm10 the following is added to /quadstor/etc/ndrecv.conf

RecvAddr=10.0.13.150

On vm11 the following is added to /quadstor/etc/ndrecv.conf

RecvAddr=10.0.13.151

With the above steps we have configured vm10 and vm11 to use the 10.0.13.150 <-> 10.0.13.151 network for VDisk mirroring traffic.
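The setting can be appended from the shell, for example on vm10 (the same applies on vm11 with its own address):

echo "RecvAddr=10.0.13.150" >> /quadstor/etc/ndrecv.conf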

Configuring Fencing

Since vm10 and vm11 are two virtual machines running under QEMU/KVM virtualization, we configure fence_xvm. Useful links for configuring fence_xvm are http://clusterlabs.org/wiki/Guest_Fencing and http://www.daemonzone.net/e/3/

We skip any clustering-related configuration. As long as fence_xvm can fence the peer node, that is sufficient for us.
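As a rough outline of what the linked guides cover (the key location, key size and service names below are assumptions for a CentOS 6 KVM host running fence_virtd):

# On the KVM host: create a shared key, then configure and start fence_virtd
mkdir -p /etc/cluster
dd if=/dev/urandom of=/etc/cluster/fence_xvm.key bs=512 count=1
fence_virtd -c
service fence_virtd start
# Copy /etc/cluster/fence_xvm.key to the same path on vm10 and vm11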

Testing fencing

On vm10 to fence vm11 we run the command

/usr/sbin/fence_xvm -H vm11

On vm11 to fence vm10 we run the command

/usr/sbin/fence_xvm -H vm10
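Connectivity to the host's fence_virtd can also be verified from either guest by listing the domains it knows about:

/usr/sbin/fence_xvm -o list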

The QUADStor service is now started on vm10 and vm11.
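(On each node the service can be started with the init script installed by the QUADStor packages; the script name below is an assumption based on a standard RHEL/CentOS installation.)

/etc/rc.d/init.d/quadstor start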

To verify that RecvAddr is correctly configured we use the ndconfig command

On vm10:

/quadstor/bin/ndconfig
Node Type: Mirror Recv
Recv Address: 10.0.13.150
Node Status: Recv Inited

On vm11:

/quadstor/bin/ndconfig 
Node Type: Mirror Recv
Recv Address: 10.0.13.151
Node Status: Recv Inited

We now configure QUADStor on vm10 and vm11 for fencing

On vm10:

/quadstor/bin/qmirrorcheck -a -t fence -r 10.0.13.151 -v '/usr/sbin/fence_xvm -H vm11'

The above command specifies that in order to fence vm11 we need to run the following command

/usr/sbin/fence_xvm -H vm11 

Note that we have specified 10.0.13.151, which is the address vm11 uses for mirroring traffic.

/quadstor/bin/qmirrorcheck -l
Mirror Ipaddr Type Value
10.0.13.151 fence /usr/sbin/fence_xvm -H vm11

On vm11:

/quadstor/bin/qmirrorcheck -a -t fence -r 10.0.13.150 -v '/usr/sbin/fence_xvm -H vm10'
/quadstor/bin/qmirrorcheck -l
Mirror Ipaddr Type Value
10.0.13.150 fence /usr/sbin/fence_xvm -H vm10

Configuring Storage

1. We configure Storage Pools as described in http://www.quadstor.com/storage-pools.html

For this example we are using a pool named 'HAPOOL'. This pool is created on both vm10 and vm11

2. On vm10 and vm11 we now configure physical storage as described in http://www.quadstor.com/support/64-configuring-disk-storage.html

3. We then create a new VDisk M1 on vm10 as shown in the figure below

VDisk M1

4. We then click on "Modify" and then under the mirroring configuration specify the address of vm11 as shown in the figure below

Configuring Mirroring

5. We then click "Submit" and the configuration is complete, as shown in the figure below

After Configuring Mirroring

To check the current role of a VDisk we run qsync

On vm10:

/quadstor/bin/qsync -l
Local Remote Pool Dest Addr Role Status
M1 M1 HAPOOL 10.0.13.151 Master Enabled

On vm11:

/quadstor/bin/qsync -l
Local Remote Pool Dest Addr Role Status
M1 M1 HAPOOL 10.0.13.150 Slave Enabled

At this point M1 on vm10 is configured to mirror to M1 on vm11. M1 on vm10 acts as the master while M1 on vm11 acts as a slave.

From a performance or user's point of view there is no difference between a VDisk in the master role and a VDisk in the slave role. There is, however, one significant difference: in the absence of its peer, a VDisk can accept writes only in the master role. This ensures data integrity when a peer is lost. When a peer failure is suspected, the online node first verifies that the peer is really offline by fencing it. On a successful fence, a VDisk in the slave role takes over the master role; on a fence failure, a master VDisk assumes the slave role. Once the lost peer is back online, new writes made to the master VDisk are synced back to the slave VDisk.
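A simple way to keep an eye on role changes from either node is to poll qsync periodically. Below is a minimal sketch that relies only on the qsync output shown above; the polling interval and log file are arbitrary choices and the script is not part of QUADStor:

#!/bin/sh
# Log the qsync status whenever it changes
prev=""
while true; do
    cur=`/quadstor/bin/qsync -l`
    if [ "$cur" != "$prev" ]; then
        echo "`date`: VDisk mirroring status changed" >> /var/log/qsync-watch.log
        echo "$cur" >> /var/log/qsync-watch.log
        prev="$cur"
    fi
    sleep 30
done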

SNMP Trap Notification

Before we get to the client configuration, we also configure vm10 and vm11 to send SNMP traps to an snmptrapd instance running on 10.0.13.33.

On vm10:

We add the following line to /quadstor/etc/quadstor.conf

TrapAddr=10.0.13.33

On vm11:

We add the following line to /quadstor/etc/quadstor.conf

TrapAddr=10.0.13.33

On 10.0.13.33, where we have snmptrapd running, we copied QUADSTOR-REG.mib and QUADSTOR.mib (available from downloads) to /home/quadstor/mibs

sh# export MIBS="ALL"
sh# export MIBDIRS=/home/quadstor/mibs
sh# snmptrapd -Lf /tmp/test.log
sh# tail -f /tmp/test.log
NET-SNMP version 5.7.2
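Note that recent net-snmp versions drop traps from unauthorized senders by default. If no traps show up in the log, an access-control line such as the following in /etc/snmp/snmptrapd.conf may be needed (the community string 'public' is an assumption and must match what the sender uses):

authCommunity log,execute,net public
# or, for lab setups only, accept traps regardless of community:
# disableAuthorization yes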

Client configuration

We now configure a Linux client, vm12, to access M1 on vm10 and vm11 simultaneously.

Below is our /etc/multipath.conf configuration for QUADStor VDisks

devices {
    device {
        vendor "QUADSTOR"
        product "VDISK"
        getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
        rr_min_io 100
        rr_weight priorities
        path_grouping_policy multibus
    }
}
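After editing /etc/multipath.conf, multipathd needs to pick up the change. On a CentOS 6 client this can be done by restarting the service and reloading the maps (a sketch; the service name is as packaged by device-mapper-multipath):

sh# service multipathd restart
sh# multipath -r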

vm12 accesses M1 on vm10 and vm11 using the iSCSI protocol

We use the script below to log in

# Discover the targets on each portal and log in to each of them
list=`/sbin/iscsiadm -m discovery -p 10.0.13.100 -t st | grep "10.0.13.100" | cut -f 2 -d' '`
for i in $list; do
    /sbin/iscsiadm -m node -T $i -p 10.0.13.100 -l
done
list=`/sbin/iscsiadm -m discovery -p 10.0.13.101 -t st | grep "10.0.13.101" | cut -f 2 -d' '`
for i in $list; do
    /sbin/iscsiadm -m node -T $i -p 10.0.13.101 -l
done

And we run the script below when we want to log out

# Discover the targets on each portal, log out and delete the node records
list=`/sbin/iscsiadm -m discovery -p 10.0.13.100 -t st | cut -f 2 -d' '`
for i in $list; do
    /sbin/iscsiadm -m node -T $i -u
    /sbin/iscsiadm -m node -T $i -o delete
done
list=`/sbin/iscsiadm -m discovery -p 10.0.13.101 -t st | cut -f 2 -d' '`
for i in $list; do
    /sbin/iscsiadm -m node -T $i -u
    /sbin/iscsiadm -m node -T $i -o delete
done

Running the login script we get

Logging in to [iface: default, target: iqn.2006-06.com.quadstor.vdisk.M1, portal: 10.0.13.100,3260] (multiple)
Login to [iface: default, target: iqn.2006-06.com.quadstor.vdisk.M1, portal: 10.0.13.100,3260] successful.
Logging in to [iface: default, target: iqn.2006-06.com.quadstor.vdisk.M1, portal: 10.0.13.101,3260] (multiple)
Login to [iface: default, target: iqn.2006-06.com.quadstor.vdisk.M1, portal: 10.0.13.101,3260] successful.

After login we check the multipath configuration

On vm12:

multipath -ll

mpathba (36ef899ba04190ade706b96ec16c2ed0b) dm-0 QUADSTOR,VDISK
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 13:0:0:0 sdg 8:96 active ready running
`- 14:0:0:0 sdh 8:112 active ready running

Now we configure LVM using the following series of commands, although we could use /dev/mapper/mpathba directly without LVM.

sh# pvcreate /dev/mapper/mpathba
Writing physical volume data to disk "/dev/mapper/mpathba"
Physical volume "/dev/mapper/mpathba" successfully created

sh# vgcreate vgtest /dev/mapper/mpathba
Volume group "vgtest" successfully created

sh# lvcreate -L99G -n testvol vgtest
Logical volume "testvol" created

sh# /sbin/mkfs.ext4 /dev/mapper/vgtest-testvol
mke2fs 1.41.12 (17-May-2010)
Discarding device blocks: 2621440/25952256
...

...
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 36 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
sh# mount /dev/mapper/vgtest-testvol /mnt/

We now use dd to write data to /mnt/.

sh# dd if=/dev/zero of=/mnt/1 bs=1M count=10000

During the write we force a panic on vm10

sh# echo c > /proc/sysrq-trigger

We now check for failure messages in /var/log/messages on vm11

Mar 17 23:32:54 vm11 kernel: WARN: tdisk_mirror_write_setup:1554 peer takeover, continuing with write command
Mar 17 23:32:54 vm11 mdaemon: WARN: node_usr_notify:556 VDisk M1 switching over to master role

At this point M1 on vm11 takes over as master

On vm11:

sh# /quadstor/bin/qsync -l
Local Remote Pool Dest Addr Role Status
M1 M1 HAPOOL 10.0.13.150 Master Disabled

On 10.0.13.33 we received the following trap

2013-03-17 23:37:12 vm11 [UDP: [10.0.13.101]:38262->[10.0.13.33]:162]:
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (63838) 0:10:38.38
SNMPv2-MIB::snmpTrapOID.0 = OID: QUADSTOR-TRAPS-MIB::quadstorNotification
QUADSTOR-TRAPS-MIB::notifyType = INTEGER: warning(2)
QUADSTOR-TRAPS-MIB::notifyMessage = STRING: VDisk M1 switching over to master role

dd on vm12 completes successfully

sh# dd if=/dev/zero of=/mnt/1 bs=1M count=10000
10000+0 records in
10000+0 records out
10485760000 bytes (10 GB) copied, 174.169 s, 60.2 MB/s

multipath shows one of the paths as faulty

sh# multipath -ll
mpathba (36ef899ba04190ade706b96ec16c2ed0b) dm-0 QUADSTOR,VDISK
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 13:0:0:0 sdg 8:96 active faulty running
`- 14:0:0:0 sdh 8:112 active ready running

Now when vm10 is back online we check qsync status

On vm10:

sh# /quadstor/bin/qsync -l
Local Remote Pool Dest Addr Role Status
M1 M1 HAPOOL 10.0.13.151 Slave Resyncing

The 'Resyncing' status indicates that the data written to VDisk M1 on vm11 while vm10 was offline is being synced back. After the resync has completed

On vm10:

sh# /quadstor/bin/qsync -l
Local Remote Pool Dest Addr Role Status
M1 M1 HAPOOL 10.0.13.151 Slave Enabled

On vm11:

sh# /quadstor/bin/qsync -l
Local Remote Pool Dest Addr Role Status
M1 M1 HAPOOL 10.0.13.150 Master Enabled

multipath on vm12 now shows both paths as active,ready

sh# multipath -ll
mpathba (36ef899ba04190ade706b96ec16c2ed0b) dm-0 QUADSTOR,VDISK
size=100G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 13:0:0:0 sdg 8:96 active ready running
`- 14:0:0:0 sdh 8:112 active ready running