Monday 27 October 2014

Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 6

Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 1 - Introduction and lab description
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 2 - Deploy and configure the PKI infrastructure
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 3 - Configure and test the Exchange 2013 Client Access role
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 4 - Install CentOS 7
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 5 - Install and configure HAProxy
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 6 - Make HAProxy highly available (this page)
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 7 - Demo



In part 5 we installed and fully configured HAProxy. Technically we would be good to go, but we take it one step further: we want our HAProxy servers to be highly available.

In this part we will install and configure keepalived and make HAProxy highly available. Part 6 is organised into the following sections:

  • Install and configure keepalived.
  • Testing keepalived.

Install and Configure keepalived

We log on to lab-hap01 via PuTTY. By default we'll download the source tarball to root's home directory:

cd ~
wget http://www.keepalived.org/software/keepalived-1.2.13.tar.gz

We then uncompress the tarball, change to the uncompressed directory, configure the installation, compile the program, and install it. Lots of fast scrolling output, so no screenshots. Here are the commands:

tar -zxvf keepalived-1.2.13.tar.gz
cd keepalived-1.2.13
./configure
make
make install
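
Assuming the build completes cleanly, the binary lands in /usr/local/sbin by default; a quick way to confirm the installation is to ask it for its version:

/usr/local/sbin/keepalived -v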

We need to tell the kernel to allow binding to non-local addresses, so we open the /etc/sysctl.conf file and add the following line:

net.ipv4.ip_nonlocal_bind=1
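
The setting in /etc/sysctl.conf is only read at boot, so to apply it to the running kernel straight away we can reload the file and verify the value:

sysctl -p
sysctl net.ipv4.ip_nonlocal_bind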

We create the /etc/keepalived/keepalived.conf file (note that the file will be created and written to by vi when we save it)...

mkdir /etc/keepalived
vi /etc/keepalived/keepalived.conf

...and add the following content:

global_defs {
  notification_email {
    administrator@digitalbrain.com.au
  }
  notification_email_from lab-hap01@digitalbrain.com.au
  smtp_server 10.30.1.11
  smtp_connect_timeout 30
}

vrrp_script check_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

vrrp_instance VI_1 {
  interface ens160
  state MASTER
  virtual_router_id 10
  priority 101
  virtual_ipaddress {
    10.30.1.15
  }
  track_script {
    check_haproxy
  }
  smtp_alert
}
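
For clarity, here is how the failover arithmetic works with these settings: while the check_haproxy script succeeds, its weight of 2 is added to the node's priority, so a healthy master advertises 101 + 2 = 103 and a healthy backup (configured later with priority 100) advertises 102. If haproxy dies on the master, the script fails there, the master falls back to its base priority of 101, which is now lower than the backup's 102, and the backup wins the next election and takes over the VIP.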


In your lab, update the interface name in the interface ens160 line to match your server's interface, for example interface eth0. If you're not sure what your interface name is, run ifconfig on your server.
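
If ifconfig isn't available (on a CentOS 7 minimal install it ships in the net-tools package, which may not be present), the iproute2 tools in the base system do the same job:

ip addr show

The interface name, ens160 in this lab, appears at the start of each numbered entry.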

Also, if you still remember from Part 1, the HAProxy virtual IP in my lab is 10.30.1.15. In yours, replace the virtual_ipaddress value with one that’s valid in your environment.

Our keepalived solution also supports SMTP (email) notifications in case something happens. In your implementation, change the recipient in the notification_email directive, and change the sender address on the notification_email_from line to a hostname@yourdomain value that's valid for your environment. The hostname is the host part of the computer's FQDN; technically the sender can be anything you like, but it makes sense to set it this way.

Due to a coding issue in keepalived which returns a blank host name under certain conditions, we need to add the following line to the /etc/hosts file, otherwise email notifications will fail:

10.30.1.13   lab-hap01.localdomain



It is important that we add the FQDN of the server, and not just the hostname.
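
A quick way to confirm that the entry resolves the way keepalived will see it is getent, which uses the same NSS lookup path as gethostbyname (this assumes your nodename is set to the FQDN, as it is in this lab):

getent hosts $(uname -n)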

For those interested, I found in my lab that the gethostbyname(name.nodename) call in /root/keepalived-1.2.13/lib/utils.c (remember that we extracted the sources to /root/keepalived-1.2.13) returns NULL, and keepalived then greets Exchange with HELO (null). Exchange doesn't know who (null) is, so it drops the connection, causing SMTP notifications to fail.

I also want to make the point that in my lab the SMTP server is a single point of failure: email notifications go to the IP address of a single server as opposed to a clustered/HA SMTP agent. In real life I would send notifications to a system that is always up and not affected by the failure of a single mail server.

For additional safety in terms of monitoring, SNMP support can also be built into keepalived and integrated into your enterprise monitoring system of choice. Not in this lab.

We now make the keepalived daemon start automatically:

cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/
chmod +x /etc/init.d/keepalived
chkconfig keepalived on
cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/

What these commands do:
  • cp - copies the keepalived init script from the default installation location to /etc/init.d.
  • chmod - makes the script executable.
  • chkconfig - enables the keepalived service to run at startup.
  • cp - copies the default keepalived configuration file to /etc/sysconfig/, the directory that holds system configuration files, including ours.
The default daemon line in /etc/init.d/keepalived looks like this:

daemon keepalived ${KEEPALIVED_OPTIONS}

Open /etc/init.d/keepalived in your favorite text editor and change the daemon line as follows so that keepalived can actually start:

daemon /usr/local/sbin/keepalived ${KEEPALIVED_OPTIONS}
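
If you prefer to script the change, a sed one-liner along these lines does the same edit (it keeps a backup of the original as keepalived.orig):

sed -i.orig 's|daemon keepalived|daemon /usr/local/sbin/keepalived|' /etc/init.d/keepalived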

Our CentOS 7 minimal install doesn't include killall. We need it in our keepalived configuration (the check_haproxy script) to test whether the haproxy service is running. We install it as part of the psmisc package:

yum install psmisc -y
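
The -0 in the check_haproxy script is what makes it safe to run every two seconds: signal 0 is never actually delivered, it only tests whether a process with that name exists, and the exit code carries the answer. You can try it yourself:

killall -0 haproxy; echo $?

0 means haproxy is running; anything else means it isn't, which is what makes keepalived drop the script's weight from the node's priority.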

Also, by default, the CentOS 7 firewall blocks VRRP traffic, and VRRP is essential for keepalived to function. We allow VRRP traffic with the following command:

firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
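
Note that --permanent only writes the rule to the on-disk configuration and doesn't touch the running firewall. The reboot below takes care of that, but firewall-cmd can also apply it immediately:

firewall-cmd --reload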

Now we restart our server:

shutdown -r now

Log back on as root and run these commands to do a basic health check:
  • service keepalived status: the service is running.
  • cat /var/log/messages | grep VRRP_Instance: keepalived started in MASTER mode.
  • ip a | grep "inet 10": the virtual IP is bound to our ens160 interface on lab-hap01...
  • ping 10.30.1.15 (run it on another machine, e.g. LAB-WS01): ...and the VIP is responding.
  • firewall-cmd --list-rich-rules: our firewall rule survived the restart.

Awesome, we are looking good!

Now we repeat these steps on lab-hap02, with a few important differences (summarised in the sketch after this list):
  1. In the /etc/keepalived/keepalived.conf file we change the priority to a lower value than the master's, for instance to 100.
  2. While still in /etc/keepalived/keepalived.conf, we also change the notification_email_from line to lab-hap02@digitalbrain.com.au.
  3. This is an obvious one, but we need to make sure it doesn't slip through the cracks: in the /etc/hosts file we enter the correct hostname for lab-hap02.
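Put together, the lines that differ on lab-hap02 look like this. In /etc/keepalived/keepalived.conf (everything else stays exactly as on lab-hap01):

  notification_email_from lab-hap02@digitalbrain.com.au
  priority 100

And in /etc/hosts, where 10.30.1.14 is an assumed address for illustration only (use lab-hap02's real IP):

10.30.1.14   lab-hap02.localdomain
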
When it’s all done and lab-hap02 has been rebooted, we repeat the same tests:

  • service keepalived status: the service is running.
  • cat /var/log/messages | grep VRRP_Instance: keepalived started in BACKUP mode. The server entered the BACKUP state because it received a higher-priority advert and removed the VIP from its network card, as the VIP is supposed to live on the MASTER.
  • ip a | grep "inet 10": the virtual IP is NOT bound to the ens160 interface on lab-hap02, because lab-hap02 is the BACKUP node.
  • firewall-cmd --list-rich-rules: our firewall rule survived the restart.

We skipped the ping test because the VIP is bound to lab-hap01, so it isn't relevant to testing lab-hap02.

Testing keepalived

Time for some HA testing. To recap:
  • haproxy is running on both lab-hap01 and lab-hap02.
  • keepalived is running on both lab-hap01 and lab-hap02.
  • lab-hap01 is the MASTER and lab-hap02 is the BACKUP.
  • lab-hap01 holds the VIP.

Let’s confirm. On lab-hap01:

ps -A | grep haproxy
ps -A | grep keepalived
ip a | grep "inet 10"

We run the same checks on lab-hap02.

On lab-hap01 we stop haproxy and we check its IP addresses:

systemctl stop haproxy.service
ip a | grep "inet 10"

Then we confirm that lab-hap01 is no longer the MASTER (expected, since the VIP is no longer bound to its network card):

cat /var/log/messages | grep VRRP_Instance

On lab-hap02 we confirm that the VIP has been bound to the NIC:

ip a | grep "inet 10"

Then we confirm that lab-hap02 is now the new MASTER:

cat /var/log/messages | grep VRRP_Instance

Up to this point we have confirmed that stopping haproxy on lab-hap01 was correctly detected and that the VIP was transferred to lab-hap02. Therefore, if we point our Exchange DNS records to the VIP, continued service is assured.

Now we start the haproxy service on lab-hap01 and check the IP address:

systemctl start haproxy.service
ip a | grep "inet 10"

Checking the IP address on lab-hap02 shows that the VIP has been removed from it.

Lastly, we want to know how long it takes for the VIP to fail over once a service failure is detected. For this we kick off a continuous ping to the VIP from LAB-WS02:

ping 10.30.1.15 -t
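
The -t switch is Windows syntax for a continuous ping, which is why we run this from the workstation; on a Linux box, a plain ping runs continuously by default.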

Then we stop the haproxy service on lab-hap01 and watch how many pings are lost while service failure is detected and the VIP is moved to lab-hap02:

systemctl stop haproxy.service

Finally we start the haproxy service on lab-hap01 and, again, we watch the pings:

systemctl start haproxy.service

In my lab, failover was virtually instantaneous: only one ping was lost while the service failed over.

Impressive!

In this part we installed, configured and tested keepalived, the bit which makes HAProxy highly available, on both HAProxy servers. Technically we've almost reached the end of our journey, with only one last step left: confirming that client access actually works, that traffic is load balanced, and that service-level failures are correctly detected and handled.

In part 7, our last part, we will test various client access methods and we'll confirm that load balancing, error detection and high availability actually work from a client's perspective too.



Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 1 - Introduction and lab description
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 2 - Deploy and configure the PKI infrastructure
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 3 - Configure and test the Exchange 2013 Client Access role
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 4 - Install CentOS 7
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 5 - Install and configure HAProxy
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 6 - Make HAProxy highly available (this page)
Highly Available L7 Load Balancing for Exchange 2013 with HAProxy – Part 7 - Demo

5 comments:

  1. Hola
    I really like your write-up very much, very appreciated. I think one thing was missing and that is to add the line net.ipv4.ip_nonlocal_bind=1 to /etc/sysctl.conf. This allows HAProxy to bind to a non-existing IP, so when the VIP is on the other node, HAProxy can still startup. Otherwise it fails.
    Further, really great article! Many thanks!

    ReplyDelete
  2. Ronald, this is already addressed. Please see the section "Install and Configure keepalived" at the top of this article.

    ReplyDelete
  3. Hi Zoltan.

I am stuck on the configuration of the backup node. During testing, when I turn off the haproxy service on the master node, the virtual IP does not move to the backup node. Please also show the configuration for the backup node in this discussion so I can do the testing again.

    Many Thx.

    ReplyDelete
    Replies
    1. Billy, if you followed the instructions closely then it will work. Please go through the configuration again. There is one important thing to note: you may have to alter the procedure to start the services automatically. Things have changed in the way service autostart is done since I published the articles. Not sure about keepalived, but I am fairly sure that HAProxy changed it as of v1.8.
Apart from that, everything else is the same, so if you followed the instructions then it will work. Look out for typos in the config files and missed items.

      Delete