For those who seeks help in different areas of Software and Hardware platform

How To Set Up a Highly Available PostgreSQL Cluster on Ubuntu 19/20

This guide will walk you through the steps to set up a highly available PostgreSQL cluster using Patroni and HAProxy on Ubuntu 19/20.04. If you wish to skip step by step guide, the below video tutorial will be sufficient if you are good enough to pick things up quickly.






Note that, this guide is written for Ubuntu 19.04, 19.10, 20.04 and Debian 9, 10.

Prerequisites

To follow the steps covered in this tutorial, you will need five (physical or virtual) machines installed with Ubuntu or Debian having sudo non-root user privileges.

These are the machines we will use in this guide for our cluster setup. However, if you wish you can add up more or go with less, its completely up to your requirement.

HOSTNAME IP ADDRESS PURPOSE
NODE1 192.168.10.1 Postgresql, Patroni
NODE2 192.168.10.2 Postgresql, Patroni
NODE3 192.168.10.3 Postgresql, Patroni
NODE4 192.168.10.4 etcd
NODE5 192.168.10.5 HAProxy


When you have prerequisites in place, please proceed to the below steps:

Installing PostreSQL

In this step, we will install postgres on three of the nodes (node1, node2, node3) one by one using the below command:
sudo apt update
sudo apt -y install postgresql postgresql-server-dev-all
At the time of this tutorial, the postgresql version 11 was the default release in Ubuntu packages repository. If you wish you can install postgresql version 12 like below:
sudo apt -y install postgresql-12 postgresql-server-dev-12
Upon installation, Postgres automatically runs as a service. We need to stop the Postgres service at this point with below command:
sudo systemctl stop postgresql

Installing Patroni

Patroni is an open-source python package that manages postgres configuration. It can be configured to handle tasks like replication, backups and restorations. 

Patroni uses utilities that come installed with postgres, located in the /usr/lib/postgresql/11/bin directory by default on Ubuntu 19. You will need to create symbolic links in the PATH to ensure that Patroni can find the utilities. 

Type below command to create symbolic link and make sure you replace postgresql version if you are running an earlier or later release.
sudo ln -s /usr/lib/postgresql/11/bin/* /usr/sbin/
Type the below command to install python and python-pip packages:
sudo apt -y install python python-pip
Ensure that you have latest version of the setuptools of python package with below command:
sudo pip install --upgrade setuptools
Now you can type below commands to install Patroni:
sudo pip install psycopg2
sudo pip install patroni
sudo pip install python-etcd
Repeat these steps on remaining nodes (node2, node3 in our setup) as well. When you are finished with the above on all three nodes, you can move to next step.

Configuring Patroni

Patroni can be configured using a YAML file which can be placed anywhere. For sake of this guide, we will place this file under /etc/patroni.yml. 

Create a patroni.yml file on all three nodes that have postgres and Patroni installed (node1, node2, node3 in our case). Change name to something unique, and change listen and connect_address (under postgresql and restapi) to the appropriate values on each node. 

Create /etc/patroni.yml file on your first node (node1 in our case) like below:
sudo nano /etc/patroni.yml
add, update below configuration parameters to reflect yours:
scope: postgres
namespace: /db/
name: node1

restapi:
    listen: 192.168.10.1:8008
    connect_address: 192.168.10.1:8008

etcd:
    host: 192.168.10.4:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.10.1/0 md5
    - host replication replicator 192.168.10.2/0 md5
    - host replication replicator 192.168.10.3/0 md5
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 192.168.10.1:5432
    connect_address: 192.168.10.1:5432
    data_dir: /data/patroni
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: reppassword
        superuser:
            username: postgres
            password: secretpassword
    parameters:
        unix_socket_directories: '.'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
Save and close file. 

You can see /etc/patroni.yml file on (node1 in our case) in below screenshot:


The below screenshot is /etc/patroni.yml on (node2 in our case)


and below is the /etc/patroni.yml screenshot of (node3 in our case):


Make note of the data_dir value in the above file. The postgres user needs the ability to write to this directory. If this directory doesn’t exist, create it with below command on all three nodes (node1, node2, node3 in our case):
sudo mkdir -p /data/patroni
Make postgres the owner of /data/patroni:
sudo chown postgres:postgres /data/patroni
Change the permissions on this directory to make it accessible only to the postgres user:
sudo chmod 700 /data/patroni
Next, we will create a systemd script that will allow us to start, stop and monitor Patroni.

You need to create a file /etc/systemd/system/patroni.service on all three nodes (node1, node2, node3 in our case) with below command:
sudo nano /etc/systemd/system/patroni.service
add below configuration parameters in it:
[Unit]
Description=High availability PostgreSQL Cluster
After=syslog.target network.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/local/bin/patroni /etc/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.targ
Save and close file.

If Patroni is installed in a location other than /usr/local/bin/patroni on your machine, update the appropriate path in above file accordingly. 

Installing etcd

Etcd is a fault-tolerant, distributed key-value store that is used to store the state of the postgres cluster. Using Patroni, all of the postgres nodes make use of etcd to keep the postgres cluster up and running. 

For sake of this guide we will use a single-server etcd cluster. However, in production, it may be best to use a larger etcd cluster so that one etcd node fails, it doesn’t affect your postgres servers. 

Type below command to install etcd on (node4 in our case):
sudo apt -y install etcd

Configuring etcd

At this point, you need to edit the /etc/default/etcd file on (node4 in our case) like below:
sudo nano /etc/default/etcd
Look for the following parameters, uncomment by removing and update these parameters to reflect yours:
ETCD_LISTEN_PEER_URLS="http://192.168.10.4:2380"
ETCD_LISTEN_CLIENT_URLS="http://localhost:2379,http://192.168.10.4:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.10.4:2380"
ETCD_INITIAL_CLUSTER="default=http://192.168.10.4:2380,"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.10.4:2379"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
Save and close the file.

You can see /etc/default/etcd screenshot on node4 in our case:


Restart the etcd service to take changes into effect:
sudo systemctl restart etcd
If etcd service failed to start, reboot your machine.

At this point, start Patroni and Postgres service on your first node (node1 in our case) with below command:
sudo systemctl start patroni
Verify the status of Patroni service status with below command:
sudo systemctl status patroni
If everything is set up correctly, the output from the node1 (master) will look similar to like below: 


When starting patroni on subsequent nodes, (node2, node3 in our case) the output will look similar to like below:


Make sure you have performed these steps on each of the three nodes with Postgres installed to create a highly available Postgres cluster with one master and two slaves.

Installing HAProxy

When developing an application that uses a database, it can be cumbersome to keep track of the database endpoints if they keep changing. Using HAProxy simplifies this by giving a single endpoint to which you can connect the application.

HAProxy forwards the connection to whichever node is currently the master. It does this using a REST endpoint that Patroni provides. Patroni ensures that, at any given time, only the master postgres node will appear as online, forcing HAProxy to connect to the correct node.

Type the below command to install HAProxy on (node5 in our case):
sudo apt -y install haproxy

Configuring HAProxy

With the Postgres cluster set up, you need a method to connect to the master regardless of which of the servers in the cluster is the master. This is where HAProxy steps in. All Postgres clients (your applications, psql, etc.) will connect to HAProxy which will make sure you connect to the master in the cluster.

You need to edit the configuration file /etc/haproxy/haproxy.cfg on the HAProxy node (in our case node5) that has HAProxy installed:
sudo nano /etc/haproxy/haproxy.cfg
add, update following configuration parameters:
global
    maxconn 100

defaults
    log global
    mode tcp
    retries 2
    timeout client 30m
    timeout connect 4s
    timeout server 30m
    timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server postgresql_192.168.10.1_5432 192.168.10.1:5432 maxconn 100 check port 8008
    server postgresql_192.168.10.2_5432 192.168.10.2:5432 maxconn 100 check port 8008
    server postgresql_192.168.10.3_5432 192.168.10.3:5432 maxconn 100 check port 8008
Save and close file.

You can see below screenshot of our /etc/haproxy/haproxy.cfg file on node5:


Restart HAProxy to take the changes into effect and use the new settings:
sudo systemctl restart haproxy
If HAProxy fails to start, check for syntax errors:
/usr/sbin/haproxy -c -V -f /etc/haproxy/haproxy.cfg

Testing Postgres Cluster Setup

Connect Postgres clients to the HAProxy IP address of the node on which you installed HAProxy (in this guide, 192.168.10.5) on port 5000.

You can also access HAProxy node on port 7000 using any of your preferred web browser to see the HAProxy dashboard like below: 


As you can see, the postgresql_192.168.10.1_5432 row is highlighted in green. This indicates that 192.168.10.1 is currently acting as the master.

If you kill the primary node (using sudo systemctl stop patroni or by completely shutting down the server), the dashboard will look similar to like below: 



In the postgres section, the postgresql_192.168.10.1_5432 row is now red and the postgresql_192.168.10.3_5432 row is highlighted in green. This indicates that 192.168.10.3 is currently acting as the master. 

Note: In this case, it just so happens that the third Postgres server is promoted to master. This might not always be the case. It is equally likely that the second server may be promoted to master. 

When you now bring up the first server, it will rejoin the cluster as a slave and will sync up with the master. 

Wrapping up 

You now have a robust, highly available Postgres cluster ready for use. While the setup in this tutorial should go far in making your Postgres deployment highly available, here are few more steps you can take to improve it further: 


  • Use a larger etcd cluster to improve availability.
  • Use PgBouncer to pool connections. 
  • Add another HAProxy server and configure IP failover to create a highly available HAProxy cluster.
Share:

4 comments:

  1. could you please try it on ubuntu 18.04? its now working, you can not install sudo pip install psycopg2 and sudo pip install patroni.

    ReplyDelete
    Replies
    1. Hi Enes Lanpir,

      I didn't find issue while installing psycogp2 on Ubuntu 18.04 and I have documented the steps for Ubuntu 18.04 on your request which you can follow here: https://www.layloyar.com/2020/02/how-to-set-up-postgresql-high-availability-cluster-using-patroni-haproxy-ubuntu-18.html.

      Good luck!

      Delete
    2. Thanks for your quick feedback. As I understand I can install psycogp2 with postgresql 11 or old version but with postgresql 12 not possible to install.

      Here is the output
      x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -Wdate-time -D_FORTIFY_SOURCE=2 -g -fdebug-prefix-map=/build/python2.7-5Z483E/python2.7-2.7.17=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC -DPSYCOPG_VERSION=2.8.4 (dt dec pq3 ext lo64) -DPG_VERSION_NUM=120002 -DHAVE_LO64=1 -I/usr/include/python2.7 -I. -I/usr/include/postgresql -I/usr/include/postgresql/12/server -c psycopg/psycopgmodule.c -o build/temp.linux-x86_64-2.7/psycopg/psycopgmodule.o -Wdeclaration-after-statement
      In file included from psycopg/psycopgmodule.c:27:0:
      ./psycopg/psycopg.h:35:10: fatal error: libpq-fe.h: No such file or directory
      #include
      ^~~~~~~~~~~~
      compilation terminated.

      It appears you are missing some prerequisite to build the package from source.

      Delete
    3. Hello Enes Lanpir,

      I have Ubuntu 18.04.4 server. I did install postgresql version 12 and find no issue at all and the installation goes smooth. Here are the steps for postgresql 12 if you are missing any of them:

      ##
      sudo apt update
      sudo apt -y install postgresql-12 postgresql-server-dev-12

      sudo systemctl stop postgresql

      sudo ln -s /usr/lib/postgresql/12/bin/* /usr/sbin/

      sudo apt -y install python python-pip

      sudo -H pip install --upgrade setuptools
      sudo -H pip install psycopg2
      sudo -H pip install patroni
      sudo -H pip install python-etcd
      ##

      Good luck!

      Delete

Comments with links will not be published.