What is DRBD?

DRBD stands for Distributed Replicated Block Device. It is a block device designed for building high-availability clusters: a whole block device is mirrored over a (preferably dedicated) network. You can think of it as a network RAID 1.

Pre-Configuration Requirements:

I used two nodes with the following system settings:

  1. cnode1.rnd (hostname) with the IP address 172.16.4.80, running CentOS 4.5 with two SCSI hard drives of 18 GB. I used the following partition scheme on cnode1.rnd:

     /dev/sda1   /           13257MB   ext3   primary
     /dev/sda2               4095MB    swap   primary
     /dev/sdb1   unmounted   128MB     ext3   primary (for DRBD’s meta data)
     /dev/sdb2   unmounted   4306MB    ext3   primary (used as DRBD’s disk)

  2. cnode2.rnd (hostname) with the IP address 172.16.4.81, running CentOS 4.5 with one IDE drive of 10 GB. The following partition scheme was used on cnode2.rnd:

     /dev/hda1   /           4503MB    ext3   primary
     /dev/hda2               769MB     swap   primary
     /dev/hda3   unmounted   128MB     ext3   primary (for DRBD’s meta data)
     /dev/hda4   unmounted   4306MB    ext3   primary (used as DRBD’s disk)

The sizes and names of the partitions may vary depending on disk capacity and the hard disks used; you can define a partition scheme according to your own requirements.
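To double-check the layout, you can list the partition table of the disk that DRBD will use. The device name below is from this example setup on cnode1.rnd; adjust it to your own disks:

fdisk -l /dev/sdb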

My /etc/hosts file on both nodes (cnode1.rnd & cnode2.rnd) looks like this:

127.0.0.1		localhost.localdomain		localhost
172.16.4.80		cnode1.rnd			cnode1
172.16.4.81		cnode2.rnd			cnode2
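Make sure name resolution works between the nodes; for example, from cnode1.rnd you can check that the peer responds under its hostname:

ping -c 3 cnode2.rnd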

DRBD Installation:

Install DRBD software and DRBD’s kernel module on cnode1.rnd and cnode2.rnd:

yum install -y kmod-drbd drbd
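You can verify that both packages were installed, for example with rpm (package names as used in the yum command above):

rpm -q drbd kmod-drbd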

Load DRBD’s kernel module with insmod:

insmod /lib/modules/2.6.9-55.0.9.EL/extra/drbd.ko

Verify with lsmod that the module is loaded. If you see drbd in the output, move on to the configuration section.
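For example, you can filter the module list directly:

lsmod | grep drbd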

Configuring DRBD:

The configuration file of DRBD is /etc/drbd.conf. We will edit this file, make the following changes, and then copy it to the other node (/etc/drbd.conf must be identical on both nodes).

On cnode1.rnd edit this file with any editor. I am using vi:

vi /etc/drbd.conf

resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";
  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }
  disk {
    on-io-error   detach;
  }
  net {
  }
  syncer {
    rate 3M;
    group 1;
    al-extents 257;
  }
  on cnode1.rnd {
    device    /dev/drbd0;
    disk      /dev/sdb2;
    address   172.16.4.80:7789;
    meta-disk /dev/sdb1[0];
  }
  on cnode2.rnd {
    device    /dev/drbd0;
    disk      /dev/hda4;
    address   172.16.4.81:7789;
    meta-disk /dev/hda3[0];
  }
}
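Before going further it can be useful to let drbdadm parse the file. The following should only read /etc/drbd.conf and print the resulting configuration; it does not touch the devices:

drbdadm dump all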

Save your changes and copy the file to the other node (cnode2.rnd):

scp /etc/drbd.conf root@172.16.4.81:/etc/

Now let’s start the DRBD daemon on both nodes. But first we make sure that DRBD is also started automatically on the next reboot; for this we use the chkconfig command on both nodes.

chkconfig --level 235 drbd on

/etc/init.d/drbd start

You will see that DRBD has started the synchronization process. You can watch it running with:

/etc/init.d/drbd status

or

cat /proc/drbd
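If you want to follow the initial synchronization continuously, you can, for example, refresh /proc/drbd every second:

watch -n1 cat /proc/drbd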

drbd.conf Configuration Technical Details:

Protocols:

  1. Protocol A: a write operation is complete as soon as the data has been written to the local disk and sent over the network.
  2. Protocol B: a write operation is complete as soon as a reception acknowledgement arrives from the peer.
  3. Protocol C: a write operation is complete as soon as a write acknowledgement arrives, i.e. the data has also been written to the peer’s disk.

Hostname:

The hostname used in the on sections of drbd.conf must exactly match the output of

uname -n

Device:

The device node to use: /dev/drbd0 – DRBD block device.

Address, Port:

The inet address and port to bind to locally, or to connect to the partner node.
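Once DRBD is running, you can check for a socket on the configured port; for example, with the port from this setup you should see either a listening socket or an established connection to the peer:

netstat -tan | grep 7789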

Meta-disk:

The disk used to store DRBD’s meta data. DRBD allows you to either place its meta data on the same backing device that holds the actual production data (internal meta data), or on a separate block device (external meta data). I have allocated 128 MB for meta data on an external meta data block device. However, you may consult the table below to estimate meta data sizes:

Block device size    DRBD meta data
1 GB                 2 MB
100 GB               5 MB
1 TB                 33 MB
4 TB                 128 MB
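As a rough rule of thumb consistent with this table (my own approximation, not an official DRBD formula), the meta data grows by about 32 kB per gigabyte of backing storage plus roughly 1 MB of fixed overhead. For a 1 TB device that gives 1024 GB x 32 kB ≈ 32 MB, plus about 1 MB of overhead ≈ 33 MB, which matches the table above.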

Incon-degr-cmd:

Specifies what should be done in case the node starts up as a degraded cluster (only one node) and knows that its data is inconsistent. With "halt -f", as in the configuration above, such a node halts immediately instead of serving inconsistent data.

on-io-error detach:

If the lower level device reports io-error, the node drops its backing storage device and continues in diskless mode.

degr-wfc-timeout:

Wait-for-connection timeout that is applied if this node went down as part of a degraded cluster. In that case it is used instead of wfc-timeout (the normal wait-for-connection timeout).

Syncer:

Limits the bandwidth that may be used by the background resynchronization process; the rate of 3M in the configuration above caps it at roughly 3 MB/s.
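If you later need a different resync rate without editing the configuration file, drbdsetup can change it at runtime; for example (the rate value here is only an illustration, and the exact syntax may differ between DRBD versions):

drbdsetup /dev/drbd0 syncer -r 10M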

group:

All devices in one group are resynchronized in parallel; resynchronization of different groups is serialized in ascending order.

Al-extents:

DRBD automatically performs hot area detection. With this parameter you control how big the hot area (the active set) can get. Each extent marks 4 MB of the backing storage (the low-level device). In case a primary node leaves the cluster unexpectedly, the areas covered by the active set must be resynchronized when the failed node rejoins. The data structure is stored in the meta data area, therefore each change of the active set is a write operation to the meta data device. A higher number of extents gives longer resync times but fewer updates to the meta data.
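To put the value from the example configuration into perspective: with al-extents 257 the active set covers roughly 257 x 4 MB ≈ 1 GB, so after an unexpected primary crash at most about 1 GB has to be resynchronized when the node rejoins.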

The Dos And Don’ts:

Do not attempt to mount a DRBD device while it is in the secondary state. Once you have set up DRBD, never bypass it or access the underlying device directly!

Test DRBD:

Make cnode1.rnd primary and mount the block device of the primary node (cnode1.rnd) on /mnt/disk:

drbdsetup /dev/drbd0 primary --do-what-I-say

mount /dev/drbd0 /mnt/disk

Copy some files and folders to it:

cp -r /root/documents /mnt/disk

Now unmount DRBD’s block device and make the primary node (cnode1.rnd) secondary:

umount /mnt/disk/

drbdadm secondary all

Make the cnode2.rnd primary and mount block device on /mnt/disk:

drbdsetup /dev/drbd0 primary --do-what-I-say

mount /dev/drbd0 /mnt/disk

You will find that the documents directory now exists on cnode2.rnd as well.
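For example, assuming the same mount point as above:

ls -l /mnt/disk/documents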
