Set Up a Highly Available PostgreSQL 12 Cluster on CentOS/RHEL 7

This guide will walk you through the steps to set up a highly available PostgreSQL cluster using Patroni and HAProxy on CentOS 7. With minor changes, these steps also apply to earlier releases of CentOS or RHEL.

Prerequisites

To follow along with this tutorial, you will need at least four (physical or virtual) machines with CentOS 7 installed. Throughout this guide we will use the following four virtual machines, prepared with CentOS Linux release 7.9.2009, for the Postgres cluster setup.

HOSTNAME        IP ADDRESS      PURPOSE
pg_node1        192.168.10.1    PostgreSQL + Patroni
pg_node2        192.168.10.2    PostgreSQL + Patroni
etcd_node1      192.168.10.3    etcd
haproxy_node1   192.168.10.4    HAProxy
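Optionally, so that the nodes can reach each other by name, you may add entries like the following to /etc/hosts on every machine (addresses mirror the table above; adjust them to your environment):

```
192.168.10.1  pg_node1
192.168.10.2  pg_node2
192.168.10.3  etcd_node1
192.168.10.4  haproxy_node1
```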
Note: For a production environment, it is recommended to run at least two postgres+patroni nodes, three etcd nodes (etcd requires a quorum, so an odd number is advisable), and two HAProxy nodes, in order to achieve high availability and avoid a single point of failure.
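etcd availability is governed by quorum: a cluster of n members stays available as long as a majority of members is up, so it tolerates floor((n-1)/2) failures. This is why odd cluster sizes (3, 5) are recommended. A quick illustration:

```shell
# etcd quorum math: n members tolerate floor((n-1)/2) failures,
# so an even-sized cluster adds no fault tolerance over the next odd size down
for n in 1 2 3 5; do
  echo "$n members -> tolerates $(( (n - 1) / 2 )) failure(s)"
done
```

Note that a 2-member etcd cluster tolerates zero failures, the same as a single member.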
 
When all prerequisites are in place, you may proceed with the following steps to set up your PostgreSQL HA cluster.
 

Install Epel Repo

We will install the EPEL repository on all of the nodes (pg_node1, pg_node2, etcd_node1, and haproxy_node1 in our case) with the commands below:
 
sudo yum -y install epel-release
sudo yum -y install centos-release-scl-rh
sudo yum -y update
Reboot your machine when you are done with the above:
 
sudo shutdown -r now
Make sure you repeat the same steps on each node before proceeding to the next step.

Install PostgreSQL

For this guide, we will install PostgreSQL version 12 on two of the nodes (pg_node1 and pg_node2 in our case) like below:
 
sudo yum -y install llvm-toolset-7-clang nano
sudo yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
sudo yum -y install postgresql12-server postgresql12 postgresql12-devel

Install Patroni

Patroni is a cluster manager used to customize and automate the deployment and maintenance of PostgreSQL high-availability (HA) clusters. You should check for the latest available release on the Patroni packaging GitHub releases page.

For this guide, the currently available release is 1.6.5-1, and we will install it on pg_node1 and pg_node2 like below:
 
sudo yum -y install https://github.com/cybertec-postgresql/patroni-packaging/releases/download/1.6.5-1/patroni-1.6.5-1.rhel7.x86_64.rpm

Configure Patroni

Patroni stores its configuration in a YAML file. Start from the sample configuration shipped with the package on both pg_node1 and pg_node2, like below:
 
sudo cp -p /opt/app/patroni/etc/postgresql.yml.sample /opt/app/patroni/etc/postgresql.yml
Next, edit the postgresql.yml file with your favorite text editor:
 
sudo nano /opt/app/patroni/etc/postgresql.yml
Remove everything from this file and add the following configuration parameters. Make sure you change namespace, listen, and connect_address to reflect your environment.
 
scope: postgres
namespace: /pg_cluster/
name: pg_node1

restapi:
  listen: 192.168.10.1:8008
  connect_address: 192.168.10.1:8008

etcd:
  host: 192.168.10.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.10.1/32 md5
    - host replication replicator 192.168.10.2/32 md5
    - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.10.1:5432
  connect_address: 192.168.10.1:5432
  data_dir: /var/lib/pgsql/12/data
  bin_dir: /usr/pgsql-12/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
Save and close the editor when you are finished. 
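One parameter worth understanding is maximum_lag_on_failover: it is expressed in bytes, and 1048576 is exactly 1 MiB, meaning a replica lagging more than 1 MiB behind the leader will not be promoted during failover. A quick sanity check of the value:

```shell
# maximum_lag_on_failover is in bytes; 1048576 bytes = 1 MiB
lag=$((1024 * 1024))
echo "$lag"   # prints 1048576
```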
 
Next, edit the postgresql.yml file on pg_node2 and add the following configuration parameters. Again, make sure you change namespace, listen, and connect_address to reflect your environment:
 
scope: postgres
namespace: /pg_cluster/
name: pg_node2

restapi:
  listen: 192.168.10.2:8008
  connect_address: 192.168.10.2:8008

etcd:
  host: 192.168.10.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:

  initdb:
    - encoding: UTF8
    - data-checksums

  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.10.1/32 md5
    - host replication replicator 192.168.10.2/32 md5
    - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.10.2:5432
  connect_address: 192.168.10.2:5432
  data_dir: /var/lib/pgsql/12/data
  bin_dir: /usr/pgsql-12/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replicator
    superuser:
      username: postgres
      password: postgres

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
Save and close the editor when you are finished.
 

Install etcd

etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. We will use etcd to store the state of the Postgres cluster in order to keep the Postgres cluster up and running.
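To illustrate what that state looks like (the key names below are an assumption based on the namespace and scope configured in this guide, not output from a live cluster), Patroni keeps the cluster state under keys derived from the configured namespace (/pg_cluster) and scope (postgres):

```shell
# Illustrative only: the kind of keys Patroni maintains in the DCS for this
# guide's namespace (/pg_cluster) and scope (postgres)
namespace=/pg_cluster
scope=postgres
for key in initialize config leader members/pg_node1 members/pg_node2; do
  echo "${namespace}/${scope}/${key}"
done
```

Once the cluster is running, you can inspect these keys from the etcd node, e.g. with the v2 etcdctl shipped on CentOS 7: etcdctl ls /pg_cluster/postgres.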
 
Log in to etcd_node1 (in our case) and type the command below to install etcd:

sudo yum -y install etcd
 

Configure etcd

We will edit the default etcd configuration file to make a few changes.
 
sudo nano /etc/etcd/etcd.conf
Locate and uncomment the following parameters, and make sure you update the IP addresses to reflect your environment:
 
[Member]
ETCD_LISTEN_PEER_URLS="http://192.168.10.3:2380,http://localhost:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.10.3:2379,http://localhost:2379"

[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.10.3:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.10.3:2379"
ETCD_INITIAL_CLUSTER="default=http://192.168.10.3:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
Save and close the editor when you are finished. 
 
Next, start etcd for the changes to take effect:
sudo systemctl enable etcd
sudo systemctl start etcd
sudo systemctl status etcd
etcd is now active and running. If etcd fails to start, reboot etcd_node1 and try again.
 
Once etcd is up and running, go back to pg_node1 and pg_node2 (in our case) and start Patroni like below:
 
sudo systemctl enable patroni
sudo systemctl start patroni
sudo systemctl status patroni
If you look carefully at the bottom of the status output, you can see that pg_node1 is acting as the leader (master) node in the cluster.
 

When you start Patroni on subsequent nodes (pg_node2, for example), the status output will show that node acting as a secondary.

 

Install HAProxy

HAProxy is free, open-source software that provides a high-availability load balancer and proxy server for TCP- and HTTP-based applications, spreading requests across multiple servers. HAProxy forwards connections to whichever node is currently the master. It does this using a REST endpoint that Patroni provides: Patroni ensures that, at any given time, only the master node reports itself as online, forcing HAProxy to connect to the correct node.
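The mechanism can be sketched as follows (a simplified stand-in, not Patroni or HAProxy code; pick_leader and the node list are hypothetical): Patroni's REST API answers HTTP 200 on the master and an error status such as 503 on replicas, so only the master passes HAProxy's health check.

```shell
# Simplified sketch of HAProxy's routing decision: only nodes whose Patroni
# REST endpoint returns HTTP 200 (the master) are considered available
pick_leader() {
  while read -r node code; do
    if [ "$code" -eq 200 ]; then
      echo "$node"
    fi
  done
}

leader=$(pick_leader <<'EOF'
pg_node1 200
pg_node2 503
EOF
)
echo "$leader"   # prints pg_node1
```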
 
Log in to haproxy_node1 (in our case) and install HAProxy with the command below:
 
sudo yum -y install haproxy
 

Configure HAProxy

With a Postgres cluster, you need a way to connect to the master regardless of which node currently holds that role. This is where HAProxy steps in: database clients or applications (psql, for example) connect to HAProxy, and HAProxy routes the connection to the master node in the cluster.
 
We will edit and make a few changes to the default haproxy.cfg file:
 
sudo cp -p /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bkp
sudo nano /etc/haproxy/haproxy.cfg

Remove everything from this file, and add the following configuration parameters. 

global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode tcp
    log global
    retries 3
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout check 10s
    maxconn 3000

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg_node1 192.168.10.1:5432 maxconn 1000 check port 8008
    server pg_node2 192.168.10.2:5432 maxconn 1000 check port 8008
Make sure you replace the IP addresses and server names with yours.
 
Save and close the editor when you are finished. 
 
Next, start HAProxy for the changes to take effect:
 
sudo systemctl start haproxy
sudo systemctl enable haproxy
sudo systemctl status haproxy
HAProxy is now active and running.
 


If HAProxy fails to start, check the configuration file for syntax errors with the command below:
 
sudo haproxy -c -V -f /etc/haproxy/haproxy.cfg
 

Test Postgres Cluster

You can test and verify your Postgres cluster by initiating a connection request to haproxy_node1 (in our case) from any of your applications (psql, for example) and checking that it successfully establishes a connection to the database master node in the cluster.
 
psql -h 192.168.10.4 -p 5000 -U postgres
psql successfully connects to a database node in the cluster via HAProxy.
 

Next, open a web browser and go to http://192.168.10.4:7000 to see the HAProxy dashboard.
 

On the dashboard, the pg_node1 row is highlighted in green. This indicates that 192.168.10.1 is currently acting as the master. If you kill the primary node pg_node1 (with sudo systemctl stop patroni, or by shutting the server down completely), the dashboard will update to show the failover.
 

In the postgres section, the pg_node1 row turns red and the pg_node2 row is highlighted in green. This indicates that 192.168.10.2 is now acting as the master.
 
Please note that in this particular scenario it just so happens that the second Postgres server is promoted to master. This might not always be the case if you have more than two nodes in the cluster; it is equally likely that the third, fourth, or fifth node is promoted.
 

Test Replication

We will create a test database to see if it is replicated to the other nodes in the cluster. For this guide, we will use psql to connect to the database via HAProxy like below:

psql -h 192.168.10.4 -p 5000 -U postgres
From the Postgres prompt, create a test database like below:
 
create database testdb;
create user testuser with encrypted password 'testpass';
grant all privileges on database testdb to testuser;

\q
Next, stop Patroni on the master node (pg_node1 in our case) with the command below:
 
sudo systemctl stop patroni
Next, connect to the database using psql again. This time HAProxy will route the connection to pg_node2, now acting as the master node, since we have already stopped Patroni on pg_node1:

psql -h 192.168.10.4 -p 5000 -U testuser -d testdb
The connection to testdb via HAProxy succeeds, served by pg_node2.
 
 
Now bring the first node back up with (sudo systemctl start patroni); it will rejoin the cluster as a replica and automatically sync up with the master.
 

Patroni Operations – switchover and failover

With patronictl, you can administer, manage, and troubleshoot your Postgres cluster. Type the command below to list the options and commands you can use with patronictl:

sudo patronictl --help
This will show you the options and commands you can use with patronictl.

Options:
-c, --config-file TEXT Configuration file
-d, --dcs TEXT Use this DCS
-k, --insecure Allow connections to SSL sites without certs
--help Show this message and exit.

Commands:
configure Create configuration file
dsn Generate a dsn for the provided member, defaults to a dsn of...
edit-config Edit cluster configuration
failover Failover to a replica
flush Discard scheduled events (restarts only currently)
history Show the history of failovers/switchovers
list List the Patroni members for a given Patroni
pause Disable auto failover
query Query a Patroni PostgreSQL member
reinit Reinitialize cluster member
reload Reload cluster member configuration
remove Remove cluster from DCS
restart Restart cluster member
resume Resume auto failover
scaffold Create a structure for the cluster in DCS
show-config Show cluster configuration
switchover Switchover to a replica
version Output version of patronictl command or a running Patroni
For example, you can check the failover history across nodes in the cluster with the command below:

sudo patronictl -c /opt/app/patroni/etc/postgresql.yml history
 
A failover is executed automatically when the leader node becomes unavailable for an unplanned reason. If you wish to test failover across the nodes in the cluster, you can manually initiate a failover to a replica node with the command below:
 
sudo patronictl -c /opt/app/patroni/etc/postgresql.yml failover

You can also check the cluster state, roles, and members with the command below:

sudo patronictl -c /opt/app/patroni/etc/postgresql.yml list
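The list command prints a table of members, hosts, roles, and states. In scripts you can extract the current leader from that table; the sample below is illustrative output (hypothetical values) parsed with awk:

```shell
# Extract the leader from an illustrative `patronictl list` table
# (the sample rows are hypothetical, modeled on Patroni 1.6.x output)
sample='| pg_node1 | 192.168.10.1 | Leader  | running | 1 |   |
| pg_node2 | 192.168.10.2 | Replica | running | 1 | 0 |'
leader=$(echo "$sample" | awk -F'|' '/Leader/ {gsub(/ /, "", $2); print $2}')
echo "$leader"   # prints pg_node1
```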

In some cases it is necessary to perform maintenance tasks on a single node, for example applying patches or release updates. When you manually disable auto failover, Patroni won't change the state of PostgreSQL; for example, it will not try to start the cluster when it is stopped.

You can disable auto failover with below command:

sudo patronictl -c /opt/app/patroni/etc/postgresql.yml pause
There are two ways to run a switchover: in scheduled mode or immediately. At the given time the switchover will take place, and you will see an entry for the switchover activity in the logfile.
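For example (node names from this guide; flags as in patronictl 1.6.x, which you can verify with patronictl switchover --help), an immediate switchover versus a scheduled one would look like the commands below. Since they require a live cluster, we only echo them here:

```shell
# Immediate vs. scheduled switchover commands (echoed for illustration;
# run them without `echo` on a live cluster)
cfg=/opt/app/patroni/etc/postgresql.yml
echo "patronictl -c $cfg switchover --master pg_node1 --candidate pg_node2 --scheduled now"
echo "patronictl -c $cfg switchover --master pg_node1 --candidate pg_node2 --scheduled '2021-02-01T10:00'"
```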
 

If you go with the [now] option, the switchover takes place immediately.

Conclusion

Your Postgres cluster is now ready to serve your purpose. However, if you would like to make it more robust and highly available, here are a few more steps you can take to improve it further:
 
- Use a larger etcd cluster to improve availability.
- Use PgBouncer to pool connections.
- Add a second HAProxy node and configure IP failover to avoid a single point of failure.
