How to Set Up Nginx High Availability Cluster using Pacemaker on CentOS 7


High availability is an important topic nowadays because service outages can be very costly. It's prudent to take measures that will keep your website or web application running in case of an outage. With the Pacemaker stack, you can configure a high availability cluster.

Pacemaker is a cluster resource manager. It manages all cluster services (resources) and uses the messaging and membership capabilities of the underlying cluster engine. We will use Corosync as our cluster engine. Each resource has a resource agent, which is an external program that abstracts the service.

In an active-passive cluster, all services run on a primary system. If the primary system goes down, all services are moved to the next available server in the cluster. An active-passive cluster also makes it possible to do maintenance work on one node without interrupting the service.

In this tutorial, we will show you how to set up a high availability Nginx active-passive cluster. The web cluster is addressed by its virtual IP address and will automatically fail over if a node fails.

Users will access the web application through the virtual IP address, which is managed by Pacemaker. The Nginx service and the virtual IP are always located on the same host. When this host fails, both are migrated to the second host, so users should see little or no interruption.





Prerequisites

Before you begin with this tutorial, you will need the following:

Two CentOS 7 servers, which will be the cluster nodes.

Throughout this guide, we'll refer to these as webserver1 (IP address: 172.22.10.1) and webserver2 (IP address: 172.22.10.2).


Configuring Name Resolution

First, we need to make sure that both servers can resolve the hostnames of both cluster nodes. To accomplish that, we'll add entries to the /etc/hosts file. Follow this step on both webserver1 and webserver2.

Open /etc/hosts with nano or your favorite text editor.

nano /etc/hosts

Add the following entries to the end of the file.

172.22.10.1 webserver1.techsupportpk.com webserver1
172.22.10.2 webserver2.techsupportpk.com webserver2

Save and close the file.
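If you prefer to script this step, the following is a minimal sketch rather than part of the original procedure. The helper name add_host_entry is hypothetical; it appends an entry only when no line for that IP address exists yet, so it is safe to run more than once.

```shell
#!/bin/sh
# add_host_entry FILE IP FQDN SHORTNAME
# Hypothetical helper: appends "IP FQDN SHORTNAME" to FILE unless a line
# whose first field is exactly IP already exists.
add_host_entry() {
    file=$1; ip=$2; fqdn=$3; short=$4
    if [ -f "$file" ] &&
       awk -v ip="$ip" '$1 == ip { found = 1 } END { exit !found }' "$file"; then
        return 0    # entry already present, nothing to do
    fi
    printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$file"
}

# Usage (as root, on both nodes):
#   add_host_entry /etc/hosts 172.22.10.1 webserver1.techsupportpk.com webserver1
#   add_host_entry /etc/hosts 172.22.10.2 webserver2.techsupportpk.com webserver2
```

Afterwards, running ping -c 1 webserver2 on webserver1 (and the reverse) is a quick way to confirm that the names resolve.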


Installing the EPEL Repository

The EPEL (Extra Packages for Enterprise Linux) repository is needed to install the Nginx packages. In this section, we will install the EPEL repository.

Install the EPEL repository using the following command on both servers:

yum -y install epel-release


Installing Nginx

We will install the Nginx web server from the EPEL repository on both servers.

yum -y install nginx

After the installation is complete, replace the default index.html page on each server with a new page that identifies the node.

On webserver1:

echo '<h1>webserver1 - webserver1.techsupportpk.com</h1>' > /usr/share/nginx/html/index.html

On webserver2:

echo '<h1>webserver2 - webserver2.techsupportpk.com</h1>' > /usr/share/nginx/html/index.html
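As an alternative, the test page can be generated from each node's own hostname, so that the same command works on both servers. This is only a sketch: index_html is a hypothetical helper, and it assumes that hostname -s and hostname -f return the short and fully qualified names configured in /etc/hosts.

```shell
#!/bin/sh
# index_html SHORT FQDN
# Hypothetical helper: prints a one-line test page that names the node.
index_html() {
    printf '<h1>%s - %s</h1>\n' "$1" "$2"
}

# Usage (as root, on each node):
#   index_html "$(hostname -s)" "$(hostname -f)" > /usr/share/nginx/html/index.html
```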


Installing Pacemaker

In this section, we will install the Pacemaker stack. You have to complete this step on both servers.

Install the Pacemaker stack and the pcs cluster shell.

yum install pacemaker corosync pcs -y

Now we have to start the pcs daemon, which is used for synchronizing the Corosync configuration across the nodes.

systemctl start pcsd.service

To make sure the daemons start after every reboot, we will also enable the pcsd, Corosync, and Pacemaker services.

systemctl enable pcsd.service
systemctl enable corosync.service
systemctl enable pacemaker.service

Installing these packages creates a new user on your system called hacluster. Remote login is disabled for this user by default. For tasks like synchronizing the configuration or starting services on other nodes, we have to set the same password for this user on both nodes.

passwd hacluster


Configuring Pacemaker

Next, we'll open FirewallD for cluster traffic so that our nodes can communicate.

First, check if FirewallD is running.

firewall-cmd --state

If it's not running, start it.

systemctl start firewalld.service

You'll need to do this on both nodes. Once it's running, add the high-availability service to FirewallD, along with the http and https services so that the web server is reachable.

firewall-cmd --permanent --add-service=high-availability
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https

After this change, you need to reload FirewallD.

firewall-cmd --reload
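To confirm that the rules are active after the reload, you can compare the output of firewall-cmd --list-services with the services added above. The sketch below is one way to do that; missing_services is a hypothetical helper, not part of FirewallD, and it assumes the service list is a single space-separated line.

```shell
#!/bin/sh
# missing_services "ENABLED LIST" WANTED...
# Hypothetical helper: prints every WANTED service that does not appear in
# the space-separated ENABLED LIST. Prints nothing when all are present.
missing_services() {
    enabled=" $1 "; shift
    for svc in "$@"; do
        case "$enabled" in
            *" $svc "*) ;;          # service is enabled
            *) echo "$svc" ;;       # service is missing
        esac
    done
}

# Usage (on each node):
#   missing_services "$(firewall-cmd --list-services)" high-availability http https
```

An empty result means all three services are enabled in the default zone.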

Now that both nodes can communicate with each other, we can set up the authentication between them by running this command on one host (in our case, webserver1).

pcs cluster auth webserver1 webserver2
Username: hacluster
Password: ******

You should see the following output:

Output
webserver1: Authorized
webserver2: Authorized

Next, we'll generate and synchronize the Corosync configuration on the same node. Here, we'll name the cluster webcluster, but you can call it whatever you like.

pcs cluster setup --name webcluster webserver1 webserver2

You'll see the following output:

Output
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop  pacemaker.service
Redirecting to /bin/systemctl stop  corosync.service
Killing any remaining services...
Removing all cluster configuration files...
webserver1: Succeeded
webserver2: Succeeded

The Corosync configuration is now created and distributed across all nodes. The configuration is stored in the file /etc/corosync/corosync.conf.


Starting the Cluster

Start the cluster on all nodes, and enable it to start at boot, by running the following commands on webserver1.

pcs cluster start --all
pcs cluster enable --all

We can now check the status of the cluster by running the following command on either node.

pcs status

Check that both hosts are marked as online in the output.

Output
Online: [ webserver1 webserver2 ]

Full list of resources:

PCSD Status:
  webserver1: Online
  webserver2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Note: After the first setup, it can take some time before the nodes are marked as online.
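If you want to check the node list from a script instead of reading the output by eye, the following sketch extracts the names from the "Online:" line. The helper name online_nodes is hypothetical, and the sketch assumes the pcs status output format shown above.

```shell
#!/bin/sh
# online_nodes
# Hypothetical helper: reads `pcs status` output on stdin and prints the
# node names from the "Online: [ ... ]" line, one per line.
online_nodes() {
    awk '$1 == "Online:" {
        for (i = 2; i <= NF; i++)
            if ($i != "[" && $i != "]") print $i
    }'
}

# Usage (on either node):
#   pcs status | online_nodes
```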


Disabling STONITH and Ignoring Quorum

You will see a warning in the output of pcs status that no STONITH devices are configured and STONITH is not disabled:

WARNING: no stonith devices and stonith-enabled is not false

What does this mean and why should you care?

When the cluster resource manager cannot determine the state of a node or of a resource on a node, fencing is used to bring the cluster to a known state again.

Resource level fencing mainly ensures that there is no data corruption in case of an outage. It is set up by configuring the resource itself. You can use resource level fencing, for instance, with DRBD (Distributed Replicated Block Device) to mark the disk on a node as outdated when the communication link goes down.

Node level fencing ensures that a node does not run any resources. This is done by resetting the node. The Pacemaker implementation of it is called STONITH (which stands for "shoot the other node in the head"). Pacemaker supports a great variety of fencing devices, e.g. an uninterruptible power supply or management interface cards for servers.

Because the node level fencing configuration depends heavily on your environment, we will disable it for this tutorial.

pcs property set stonith-enabled=false

Note: If you plan to use Pacemaker in a production environment, you should plan a STONITH implementation depending on your environment and keep it enabled.

A cluster has quorum when more than half of the nodes are online. Pacemaker's default behavior is to stop all resources if the cluster does not have quorum. However, this does not make sense in a two-node cluster, because the cluster loses quorum as soon as one node fails.
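The "more than half" rule can be illustrated with a little arithmetic. The sketch below only reproduces the rule as stated in the paragraph above; has_quorum is a hypothetical helper and does not reflect how Corosync computes votes internally.

```shell
#!/bin/sh
# has_quorum TOTAL ONLINE
# Hypothetical helper: succeeds (exit status 0) when strictly more than
# half of TOTAL nodes are online.
has_quorum() {
    total=$1; online=$2
    [ $((online * 2)) -gt "$total" ]
}

# Compare a healthy two-node cluster, a two-node cluster with one failed
# node, and a three-node cluster with one failed node.
for pair in "2 2" "2 1" "3 2"; do
    set -- $pair
    if has_quorum "$1" "$2"; then
        echo "$2 of $1 nodes online: quorum"
    else
        echo "$2 of $1 nodes online: no quorum"
    fi
done
```

With one of two nodes online, exactly half of the cluster remains, which is not more than half, so quorum is lost. A three-node cluster can lose one node and keep quorum.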

For this tutorial, we will tell Pacemaker to ignore quorum by setting the no-quorum-policy:

pcs property set no-quorum-policy=ignore

Check the property list and make sure that stonith-enabled is set to false and no-quorum-policy is set to ignore.

pcs property list
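To read a single value out of that list in a script, something like the following sketch can help. property_value is a hypothetical helper, and it assumes that pcs property list prints one "name: value" pair per line.

```shell
#!/bin/sh
# property_value NAME
# Hypothetical helper: reads `pcs property list` output on stdin and prints
# the value of property NAME. Prints nothing if NAME is not listed.
property_value() {
    awk -v name="$1:" '$1 == name { print $2 }'
}

# Usage (on either node):
#   pcs property list | property_value stonith-enabled     # expect: false
#   pcs property list | property_value no-quorum-policy    # expect: ignore
```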


Configuring the Virtual IP address

From now on, we will interact with the cluster via the pcs shell, so all commands need only be executed on one node; it doesn't matter which one.

The Pacemaker cluster is now up and running and we can add the first resource to it, which is the virtual IP address. To do this, we will configure the ocf:heartbeat:IPaddr2 resource agent.

First, we will create the virtual IP address resource. Here, we'll use 172.22.10.100 as our virtual IP and Cluster_VIP for the name of the resource.

pcs resource create Cluster_VIP ocf:heartbeat:IPaddr2 ip=172.22.10.100 cidr_netmask=24 op monitor interval=20s

Next, check the status of the resource.

pcs status

Look for the following line in the output:

Output
Full list of resources:
Cluster_VIP    (ocf::heartbeat:IPaddr2):   Started webserver1

The virtual IP address is active on the node webserver1.


Adding the Nginx Resource

Now we can add the second resource to the cluster, which will be the Nginx service. The resource agent of the service is ocf:heartbeat:nginx.

We will name the resource WebServer and set the instance attribute configfile (the location of the Nginx configuration file). We will choose a monitor interval of 20 seconds again.

pcs resource create WebServer ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf op monitor interval=20s

We can query the status of the resource like before.

pcs status

You should see WebServer in the output running on webserver2.

Output
Full list of resources:

 Cluster_VIP    (ocf::heartbeat:IPaddr2):   Started webserver1
 WebServer  (ocf::heartbeat:nginx):    Started webserver2

As you can see, the resources run on different nodes. We did not yet tell Pacemaker that these resources must run on the same node, so they are evenly distributed across the nodes.

Note: You can restart the Nginx resource by running pcs resource restart WebServer (e.g. if you change the Nginx configuration). Make sure not to use systemctl to manage the Nginx service.


Configuring Colocation Constraints

Almost every decision in a Pacemaker cluster, like choosing where a resource should run, is done by comparing scores. Scores are calculated per resource, and the cluster resource manager chooses the node with the highest score for a particular resource. (If a node has a negative score for a resource, the resource cannot run on that node.)

We can manipulate the decisions of the cluster with constraints. Constraints have a score. If a constraint has a score lower than INFINITY, it is only a recommendation. A score of INFINITY means it is a must.

We want to ensure that both resources are run on the same host, so we will define a colocation constraint with a score of INFINITY.

pcs constraint colocation add WebServer Cluster_VIP INFINITY

The order of the resources in the constraint definition is important. Here, we specify that the Nginx resource (WebServer) must run on the same host the virtual IP (Cluster_VIP) is active on. This also means that WebServer is not permitted to run anywhere if Cluster_VIP is not active.

It is also possible to define in which order the resources should run by creating ordering constraints or to prefer certain hosts for some resources by creating location constraints.

Verify that both resources run on the same host.

pcs status

Output
Full list of resources:

 Cluster_VIP    (ocf::heartbeat:IPaddr2):   Started webserver1
 WebServer  (ocf::heartbeat:nginx):    Started webserver1

Both resources are now on webserver1.
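The same check can be scripted. The sketch below succeeds only when every started resource in the pcs status output runs on one node; colocated is a hypothetical helper, and it assumes the resource line format shown above, where each line ends in "Started" followed by the node name.

```shell
#!/bin/sh
# colocated
# Hypothetical helper: reads `pcs status` output on stdin and succeeds
# (exit status 0) when all started resources run on exactly one node.
colocated() {
    awk 'NF >= 2 && $(NF - 1) == "Started" { nodes[$NF] = 1 }
         END {
             n = 0
             for (k in nodes) n++
             exit (n == 1 ? 0 : 1)
         }'
}

# Usage (on either node):
#   pcs status | colocated && echo "resources share one node"
```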


Testing the Cluster

In this section, we will test the high availability of the Nginx cluster by accessing the virtual IP address in a web browser.

Open your web browser and go to the virtual IP address: http://172.22.10.100

You will see the web page from webserver1.

Next, stop the cluster on webserver1 with the following command:

pcs cluster stop webserver1

Now refresh the page, and you will get the page from webserver2.
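To watch the failover from a terminal instead of a browser, you can poll the virtual IP while stopping the node. The following is a rough sketch under the assumption that curl is installed on the machine you test from; poll_page is a hypothetical helper that runs any fetch command repeatedly and prints a timestamp with each result.

```shell
#!/bin/sh
# poll_page COUNT DELAY CMD [ARGS...]
# Hypothetical helper: runs CMD COUNT times, DELAY seconds apart, and prints
# a timestamp followed by its output, or "(no response)" if CMD fails.
poll_page() {
    count=$1; delay=$2; shift 2
    i=0
    while [ "$i" -lt "$count" ]; do
        if body=$("$@" 2>/dev/null); then
            printf '%s %s\n' "$(date +%T)" "$body"
        else
            printf '%s (no response)\n' "$(date +%T)"
        fi
        i=$((i + 1))
        sleep "$delay"
    done
}

# Usage (from a client machine, while running pcs cluster stop webserver1):
#   poll_page 30 1 curl -s --max-time 2 http://172.22.10.100/
```

The output should switch from the webserver1 page to the webserver2 page once the resources have moved. To bring the stopped node back into the cluster afterwards, run pcs cluster start webserver1.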




Conclusion

You have set up a two-node Nginx active-passive cluster that is accessible through the virtual IP address. You can now configure Nginx further, but make sure to keep the configuration synchronized across both hosts.