What is DRBD?
DRBD stands for Distributed Replicated Block Device. It is a block device designed for building high-availability clusters: an entire block device is mirrored over a (dedicated) network. You can think of it as a network RAID 1.
Pre-Configuration Requirements:
I used two nodes with the following system settings:
- cnode1.rnd (hostname) with the IP address 172.16.4.80. Operating system CentOS 4.5, two SCSI hard drives of 18 GB. I used the following partition scheme on cnode1.rnd:

/dev/sda1 / 13257MB ext3 primary
/dev/sda2 4095MB swap primary
/dev/sdb1 unmounted 128MB ext3 primary (for DRBD's meta data)
/dev/sdb2 unmounted 4306MB ext3 primary (used as DRBD's disk)

- cnode2.rnd (hostname) with the IP address 172.16.4.81. Operating system CentOS 4.5, one IDE drive of 10 GB. The following partition scheme was used on cnode2.rnd:

/dev/hda1 / 4503MB ext3 primary
/dev/hda2 769MB swap primary
/dev/hda3 unmounted 128MB ext3 primary (for DRBD's meta data)
/dev/hda4 unmounted 4306MB ext3 primary (used as DRBD's disk)
The sizes and names of the partitions may vary according to disk capacity and the hard disks used; you can define a partition scheme according to your own requirements.
My /etc/hosts file on both nodes (cnode1.rnd & cnode2.rnd) looks like this:
127.0.0.1 localhost.localdomain localhost
172.16.4.80 cnode1.rnd cnode1
172.16.4.81 cnode2.rnd cnode2
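Before configuring DRBD, you may want to verify that both names resolve from each node:

ping -c 1 cnode1.rnd
ping -c 1 cnode2.rnd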
DRBD Installation:
Install DRBD software and DRBD’s kernel module on cnode1.rnd and cnode2.rnd:
yum install -y kmod-drbd drbd
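To confirm which DRBD version the packages provide, you can query the RPM database:

rpm -q drbd kmod-drbd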
Load DRBD’s kernel module with insmod:
insmod /lib/modules/2.6.9-55.0.9.EL/extra/drbd.ko
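The module path above depends on the running kernel; if your kernel version differs, substitute the output of uname -r, or simply let modprobe locate the module for you:

modprobe drbd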
Verify with lsmod that the module is loaded. If you see drbd in the output, move on to the configuration section.
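A quick way to check:

lsmod | grep drbd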
Configuring DRBD:
The configuration file of DRBD is /etc/drbd.conf. We will edit this file on one node and then copy it to the other node; /etc/drbd.conf must be identical on both nodes.
On cnode1.rnd edit this file with any editor. I am using vi:
vi /etc/drbd.conf
resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error detach;
  }
  net {
  }
  syncer {
    rate 3M;
    group 1;
    al-extents 257;
  }
  on cnode1.rnd {
    device    /dev/drbd0;
    disk      /dev/sdb2;
    address   172.16.4.80:7789;
    meta-disk /dev/sdb1[0];
  }
  on cnode2.rnd {
    device    /dev/drbd0;
    disk      /dev/hda4;
    address   172.16.4.81:7789;
    meta-disk /dev/hda3[0];
  }
}
Save your changes and copy it to the other node (cnode2.rnd):
scp /etc/drbd.conf root@172.16.4.81:/etc/
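Since /etc/drbd.conf must be identical on both nodes, you can optionally compare checksums on both machines after copying:

md5sum /etc/drbd.conf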
Now let's start the DRBD daemon on both nodes. First, though, we want DRBD to come up again on the next reboot, so enable it with chkconfig on both nodes:
chkconfig --level 235 drbd on
/etc/init.d/drbd start
You will see that DRBD has started the synchronization process. You can watch it running with:
/etc/init.d/drbd status
or
cat /proc/drbd
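To follow the synchronization progress continuously, you can run (assuming the watch utility is installed):

watch -n1 cat /proc/drbd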
drbd.conf Configuration Technical Details:
Protocols:
- A: a write operation is complete as soon as the data is written to the local disk and sent to the network.
- B: a write operation is complete as soon as a reception acknowledgment arrives from the peer node.
- C: a write operation is complete as soon as a write acknowledgment arrives, i.e. the peer node has written the data to its disk.
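Protocol C, used in the configuration above, is the fully synchronous option and the safest choice for failover. The protocol is set per resource in /etc/drbd.conf; a sketch of the alternatives, with the trade-offs as comments:

protocol C;   # fully synchronous: a write completes only after the peer has written the data
# protocol B; # semi-synchronous: completes once the peer has received (not yet written) the data
# protocol A; # asynchronous: lowest latency, but writes still in flight are lost if the primary dies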
Hostname:
Should match exactly the output of
uname -n
Device:
The device node to use; /dev/drbd0 is the DRBD block device.
Address, Port:
The inet address and port to bind to locally, or to connect to the partner node.
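The two nodes must be able to reach each other on this port. If a host firewall is running, open the DRBD port for the peer; a sketch assuming iptables, run on cnode1.rnd (adjust the source address and port for cnode2.rnd):

iptables -A INPUT -p tcp -s 172.16.4.81 --dport 7789 -j ACCEPT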
Meta-disk:
The disk used to store meta data. DRBD allows you to either place its meta data on the same backing device where it puts the actual usable production data (internal meta data), or on a separate block device (external meta data). I have allocated 128 MB for meta data on an external block device. You can consult the table below to estimate meta data sizes:
Block device size | DRBD meta data
1 GB              | 2 MB
100 GB            | 5 MB
1 TB              | 33 MB
4 TB              | 128 MB
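Extrapolating from the table, a rule of thumb is roughly 32 MB of meta data per TiB of backing storage plus a small fixed overhead; the 128 MB partitions used here are therefore more than enough for the ~4 GB DRBD disks in this setup.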
Incon-degr-cmd:
Specifies what should be done if the cluster starts up in degraded mode (only one node) and that node knows its data is inconsistent. With "halt -f", as configured above, the node halts immediately instead of serving possibly stale data.
on-io-error detach:
If the lower level device reports an I/O error, the node drops its backing storage device and continues in diskless mode.
degr-wfc-timeout:
Wait-for-connection timeout used when this node was part of a degraded cluster. In that case it is used instead of wfc-timeout (the normal wait-for-connection timeout).
Syncer:
The rate parameter limits the bandwidth used by the resynchronization process; 3M above means 3 MB/s.
group:
All devices in one group are resynchronized in parallel. Resynchronization of different groups is serialized in ascending order.
Al-extents:
DRBD automatically performs hot area detection; with this parameter you control how big the hot area (the active set) can get. Each extent marks 4 MB of the backing storage (the low-level device). If a primary node leaves the cluster unexpectedly, the areas covered by the active set must be resynced when the failed node rejoins. The active set is stored in the meta data area, so each change of the active set is a write operation to the meta data device. With al-extents 257 as configured above, the active set covers at most 257 × 4 MB ≈ 1 GB, which bounds the amount of data to resync after a crash. A higher number of extents gives longer resync times but fewer updates to the meta data.
The Dos And Don'ts:
Do not attempt to mount a DRBD device while it is in secondary state. Once you have set up DRBD, never bypass it or access the underlying device directly!
Test DRBD:
Make cnode1.rnd primary and mount the block device of the primary node (cnode1.rnd) on /mnt/disk:
drbdsetup /dev/drbd0 primary --do-what-I-say
mount /dev/drbd0 /mnt/disk
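The device should now behave like any other mounted file system; you can confirm with:

df -h /mnt/disk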
Copy some files and folders to it:
cp -r /root/documents /mnt/disk
Now unmount DRBD's block device and make the primary node "secondary":
umount /mnt/disk/
drbdadm secondary all
Make cnode2.rnd primary and mount the block device on /mnt/disk:
drbdsetup /dev/drbd0 primary --do-what-I-say
mount /dev/drbd0 /mnt/disk
You will find that the documents folder exists on cnode2.rnd.
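As a final check, list the replicated directory (the path copied earlier):

ls -l /mnt/disk/documents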