
How To Create a Highly Available PostgreSQL Cluster on CentOS/RHEL 7

This step-by-step guide will show you how to set up a highly available PostgreSQL cluster using Patroni and HAProxy on CentOS/RHEL 7. The same steps can also be applied, with slight changes, if you are running an earlier CentOS release or RHEL 6.





Prerequisites

To follow this tutorial, you will need four (physical or virtual) machines running a minimal installation of CentOS Linux release 7.7.1908, each with a non-root user that has sudo privileges. We have prepared the following four machines for this guide; however, you can add more machines to your cluster environment if you wish.


HOSTNAME    IP ADDRESS      PURPOSE
node1       192.168.10.1    PostgreSQL, Patroni
node2       192.168.10.2    PostgreSQL, Patroni
node3       192.168.10.3    etcd
node4       192.168.10.4    HAProxy

When you are ready with the above prerequisites, you can move on to the steps below.
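If firewalld is running on your machines (it may not be enabled on a minimal install), make sure the ports used in this guide are open between the nodes. The exact commands depend on each node's role; the following is only a sketch, assuming the default firewalld zone:
# on node1 and node2: PostgreSQL (5432) and the Patroni REST API (8008)
firewall-cmd --permanent --add-port=5432/tcp --add-port=8008/tcp
# on node3: etcd client (2379) and peer (2380) ports
firewall-cmd --permanent --add-port=2379/tcp --add-port=2380/tcp
# on node4: the HAProxy Postgres listener (5000) and stats dashboard (7000)
firewall-cmd --permanent --add-port=5000/tcp --add-port=7000/tcp
firewall-cmd --reload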

    

Installing PostgreSQL

In this step, we will install PostgreSQL on two of the nodes (node1 and node2), one by one, using the commands below:
yum -y install epel-release
yum -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
We are installing PostgreSQL version 12; if you wish, you can install any other PostgreSQL version of your choice:
yum -y install postgresql12-server postgresql12 postgresql12-devel
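The PostgreSQL 12 binaries now live under /usr/pgsql-12/bin (the bin_dir we will point Patroni at later), which you can verify with the command below. Do not run initdb or enable the postgresql-12 service yourself; Patroni will initialize the data directory and manage the server in a later step.
/usr/pgsql-12/bin/postgres --version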

Installing Patroni

Patroni is an open-source Python package that manages PostgreSQL configuration. It can be configured to handle tasks like replication, backups, and restores.

We will install Patroni on node1 and node2 with the command below:
yum -y install https://github.com/cybertec-postgresql/patroni-packaging/releases/download/1.6.0-1/patroni-1.6.0-1.rhel7.x86_64.rpm
Repeat this step on the remaining nodes (node2 in our case) as well. When you have finished the above on each node designated for PostgreSQL and Patroni, you can move to the next step.
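You can quickly confirm that the package is installed; the RPM used above places Patroni under /opt/app/patroni, which is where the configuration samples referenced in the next step live:
rpm -q patroni
ls /opt/app/patroni/etc/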


Configuring Patroni

Patroni can be configured using a YAML file which is by default located under /opt/app/patroni/etc/.

On each PostgreSQL node (node1 and node2 in our case), copy the /opt/app/patroni/etc/postgresql.yml.sample file with the command below:
cp -p /opt/app/patroni/etc/postgresql.yml.sample /opt/app/patroni/etc/postgresql.yml
Now edit the /opt/app/patroni/etc/postgresql.yml file with your preferred text editor, e.g. vi or nano:
vi /opt/app/patroni/etc/postgresql.yml
Remove everything from this file and add the configuration parameters below. Change name to something unique, and change listen and connect_address (under postgresql and restapi) to the appropriate values on each node.
scope: postgres
namespace: /pg_cluster/
name: node1

restapi:
    listen: 192.168.10.1:8008
    connect_address: 192.168.10.1:8008

etcd:
    host: 192.168.10.3:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:

  initdb:
  - encoding: UTF8
  - data-checksums

  pg_hba:
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator 192.168.10.1/0 md5
  - host replication replicator 192.168.10.2/0 md5
  - host all all 0.0.0.0/0 md5

  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 192.168.10.1:5432
  connect_address: 192.168.10.1:5432
  data_dir: /var/lib/pgsql/12/data
  bin_dir: /usr/pgsql-12/bin
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: reppassword
    superuser:
      username: postgres
      password: postgrespassword

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
Save and close the file when you are finished.

The configuration above is what we used on node1 in our setup. The file on node2 is identical except for the node-specific values.
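For reference, these are the lines that differ on node2, using the name and addresses from the table in the prerequisites; everything else stays the same:
name: node2

restapi:
    listen: 192.168.10.2:8008
    connect_address: 192.168.10.2:8008

postgresql:
  listen: 192.168.10.2:5432
  connect_address: 192.168.10.2:5432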



Installing etcd

Etcd is a fault-tolerant, distributed key-value store that is used to store the state of the Postgres cluster. Using Patroni, all of the Postgres nodes make use of etcd to keep the Postgres cluster up and running. 

For the sake of this guide, we will use a single-node etcd cluster. However, in production it is best to use a larger etcd cluster so that if one etcd node fails, it doesn't affect your Postgres servers.
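As a sketch of what that would look like, assuming three etcd nodes at 192.168.10.3 plus two hypothetical additional machines at 192.168.10.5 and 192.168.10.6, each node's /etc/etcd/etcd.conf would carry a unique ETCD_NAME and its own listen/advertise URLs, while the clustering section lists all members, for example:
ETCD_NAME="etcd1"
ETCD_INITIAL_CLUSTER="etcd1=http://192.168.10.3:2380,etcd2=http://192.168.10.5:2380,etcd3=http://192.168.10.6:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
Patroni can then be pointed at any (or all) of the etcd endpoints.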

Type the command below to install etcd on the node designated for etcd (node3 in our case):
yum -y install etcd

Configuring etcd

At this point, you need to edit the /etc/etcd/etcd.conf file:
vi /etc/etcd/etcd.conf
Update or uncomment the following parameters to reflect your environment (the values shown are for node3, 192.168.10.3, in our setup):
#[Member]
#ETCD_CORS=""
ETCD_DATA_DIR="/var/lib/etcd/default.etcd"
#ETCD_WAL_DIR=""
ETCD_LISTEN_PEER_URLS="http://192.168.10.3:2380"
ETCD_LISTEN_CLIENT_URLS="http://192.168.10.3:2379"
#ETCD_MAX_SNAPSHOTS="5"
#ETCD_MAX_WALS="5"
ETCD_NAME="default"
#ETCD_SNAPSHOT_COUNT="100000"
#ETCD_HEARTBEAT_INTERVAL="100"
#ETCD_ELECTION_TIMEOUT="1000"
#ETCD_QUOTA_BACKEND_BYTES="0"
#ETCD_MAX_REQUEST_BYTES="1572864"
#ETCD_GRPC_KEEPALIVE_MIN_TIME="5s"
#ETCD_GRPC_KEEPALIVE_INTERVAL="2h0m0s"
#ETCD_GRPC_KEEPALIVE_TIMEOUT="20s"
#
#[Clustering]
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.10.3:2380"
ETCD_ADVERTISE_CLIENT_URLS="http://192.168.10.3:2379"
#ETCD_DISCOVERY=""
#ETCD_DISCOVERY_FALLBACK="proxy"
#ETCD_DISCOVERY_PROXY=""
#ETCD_DISCOVERY_SRV=""
ETCD_INITIAL_CLUSTER="default=http://192.168.10.3:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
#ETCD_STRICT_RECONFIG_CHECK="true"
#ETCD_ENABLE_V2="true"
#
#[Proxy]
#ETCD_PROXY="off"
#ETCD_PROXY_FAILURE_WAIT="5000"
#ETCD_PROXY_REFRESH_INTERVAL="30000"
#ETCD_PROXY_DIAL_TIMEOUT="1000"
#ETCD_PROXY_WRITE_TIMEOUT="5000"
#ETCD_PROXY_READ_TIMEOUT="0"
#
#[Security]
#ETCD_CERT_FILE=""
#ETCD_KEY_FILE=""
#ETCD_CLIENT_CERT_AUTH="false"
#ETCD_TRUSTED_CA_FILE=""
#ETCD_AUTO_TLS="false"
#ETCD_PEER_CERT_FILE=""
#ETCD_PEER_KEY_FILE=""
#ETCD_PEER_CLIENT_CERT_AUTH="false"
#ETCD_PEER_TRUSTED_CA_FILE=""
#ETCD_PEER_AUTO_TLS="false"
#
#[Logging]
#ETCD_DEBUG="false"
#ETCD_LOG_PACKAGE_LEVELS=""
#ETCD_LOG_OUTPUT="default"
#
#[Unsafe]
#ETCD_FORCE_NEW_CLUSTER="false"
#
#[Version]
#ETCD_VERSION="false"
#ETCD_AUTO_COMPACTION_RETENTION="0"
#
#[Profiling]
#ETCD_ENABLE_PPROF="false"
#ETCD_METRICS="basic"
#
#[Auth]
#ETCD_AUTH_TOKEN="simple"
Save and close the file when you are finished.

At this point, enable and start the etcd service with the commands below:
systemctl enable etcd
systemctl start etcd
Verify the etcd status with the command below:
systemctl status etcd

If the etcd service fails to start, reboot the machine and try again.
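Optionally, you can also confirm that etcd is answering on its client URL; its /health endpoint should report the store as healthy:
curl http://192.168.10.3:2379/health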

Next, start the Patroni service on your first node (node1 in our case) with the commands below:
systemctl enable patroni
systemctl start patroni
Verify the Patroni status with the command below:
systemctl status patroni
If everything is set up correctly, the patroni service on node1 will be reported as active (running), and Patroni will bootstrap a new cluster with node1 acting as the leader (master). When you start Patroni on the subsequent nodes (node2 in our case), they will join the cluster as replicas and start streaming from the leader.

Make sure you have performed these steps on each of the nodes with PostgreSQL and Patroni installed (node1 and node2 in our case).
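You can check the state of the cluster from any Patroni node with patronictl, pointing it at the same configuration file; the output lists each member together with its role (Leader or Replica) and state:
patronictl -c /opt/app/patroni/etc/postgresql.yml list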


Installing HAProxy

When developing an application that uses a database, it can be cumbersome to keep track of the database endpoints if they keep changing. Using HAProxy simplifies this by giving a single endpoint to which you can connect the application.

HAProxy forwards the connection to whichever node is currently the master. It does this using a REST endpoint that Patroni provides. Patroni ensures that, at any given time, only the master Postgres node will appear as online, forcing HAProxy to connect to the correct node.
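You can observe this health check yourself: Patroni's REST API (port 8008 in our configuration) should return HTTP 200 only on the node that currently holds the master role and an error code (typically 503) on the replicas, which is exactly what the HAProxy check configured below relies on:
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.10.1:8008
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.10.2:8008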

Type the command below to install HAProxy on the HAProxy node (node4 in our case):
yum -y install haproxy

Configuring HAProxy

With the Postgres cluster set up, you need a method to connect to the master regardless of which of the servers in the cluster is the master. This is where HAProxy steps in. All Postgres clients (your applications, psql, etc.) will connect to HAProxy which will make sure you connect to the master in the cluster.

You need to edit the /etc/haproxy/haproxy.cfg file:
vi /etc/haproxy/haproxy.cfg
Remove everything from this file, then add the following configuration, updating the server lines to reflect your node names and addresses:
global
        maxconn 100
        log     127.0.0.1 local2

defaults
        log global
        mode tcp
        retries 2
        timeout client 30m
        timeout connect 4s
        timeout server 30m
        timeout check 5s

listen stats
    mode http
    bind *:7000
    stats enable
    stats uri /

listen postgres
    bind *:5000
    option httpchk
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 192.168.10.1:5432 maxconn 100 check port 8008
    server node2 192.168.10.2:5432 maxconn 100 check port 8008
Save and close the file when you are finished.

If the HAProxy service is already running, you can simply restart it with the systemctl restart haproxy command; otherwise, start and enable the HAProxy service with the commands below:
systemctl start haproxy
systemctl enable haproxy
systemctl status haproxy
If everything goes well, systemctl status will report the haproxy service as active (running).
If HAProxy fails to start, check the configuration file for syntax errors with the command below:
haproxy -c -V -f /etc/haproxy/haproxy.cfg

Testing Postgres HA Cluster Setup

Connect your Postgres clients to the IP address of the node on which you installed HAProxy (192.168.10.4 in this guide) on port 5000.
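For example, you can test the connection from any machine that can reach the HAProxy node with psql, using the postgres superuser password defined in the Patroni configuration:
psql -h 192.168.10.4 -p 5000 -U postgres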

You can also access the HAProxy node on port 7000 (http://192.168.10.4:7000 in this guide) with your preferred web browser to see the HAProxy stats dashboard.

In the postgres section of the dashboard, the node1 (192.168.10.1:5432) row will be highlighted in green. This indicates that 192.168.10.1 is currently acting as the master.

If you kill the primary node (using sudo systemctl stop patroni or by completely shutting down the server), the dashboard will change: the node1 row turns red and the node2 (192.168.10.2:5432) row is highlighted in green. This indicates that 192.168.10.2 is now acting as the master.

Note: In this case, it just so happens that the second Postgres server is promoted to master. This might not always be the case if you have more than two nodes in the cluster. It is equally likely that the third, fourth or fifth server may be promoted to master. 

When you bring the first server back up, it will rejoin the cluster as a slave and sync up with the master.
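A minimal way to exercise this failover, assuming node1 currently holds the master role, is to run the commands below on node1 and watch the HAProxy dashboard (or patronictl list) in between:
systemctl stop patroni     # node2 should be promoted to master
systemctl start patroni    # node1 rejoins the cluster as a replica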


Wrapping up 

You now have a robust, highly available Postgres cluster ready for use. While the setup in this tutorial should go far in making your Postgres deployment highly available, here are a few more steps you can take to improve it further: 

  • Use a larger etcd cluster to improve availability.
  • Use PgBouncer to pool connections. 
  • Add another HAProxy server and configure IP failover to create a highly available HAProxy cluster.

Comments

  1. In your configuration, patronictl -c /opt/app/etc/postgres.yml show-config does not work:

    Error: 'Can not find suitable configuration of distributed configuration store\nAvailable implementations: exhibitor, zookeeper, etcd, kubernetes, consul'

    Reply: Your path to the configuration file is not correct. The correct command to show the configuration is:

      patronictl -c /opt/app/patroni/etc/postgresql.yml show-config
  2. Hi, on which node should I create the database, and which IP should I use in my application as the DB IP?

    Reply: In this guide node1 and node2 run PostgreSQL, so create the database on whichever node currently holds the master role and it will automatically be replicated to the other node. At the client end, you configure the HAProxy IP, and HAProxy will route the connection to whichever database node is currently the master.

    Reply: Thanks for your quick response. I am also a little confused about the step that copies /opt/app/patroni/etc/postgresql.yml.sample on node1. Do I need to do the same on node2?

    Reply: Yes, you need to do the same on node2. If you have more than two nodes in your setup, you need to do it on all of them.

    Reply: It wasn't clearly mentioned in the document, which is why I was confused. BTW, thanks for the nice tutorial.
  3. Hi, I am facing some issues. http://158.101.228.3:7000/ shows the servers as down in HAProxy. Also, the postgresql service is not starting on dbnode1 and dbnode2.

  4. HAProxy shows the server as down. Any guess what might cause this?

  5. psql: error: could not connect to server: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
  6. Hi, I have this problem and I have no idea what is wrong:

    [root@patroni-1 ~]# systemctl status patroni -l
    ● patroni.service - PostgreSQL high-availability manager
    Loaded: loaded (/usr/lib/systemd/system/patroni.service; disabled; vendor preset: disabled)
    Active: failed (Result: exit-code) since mié 2020-09-23 08:29:51 -05; 21min ago
    Process: 1258 ExecStart=/opt/app/patroni/bin/patroni ${PATRONI_CONFIG_LOCATION} (code=exited, status=1/FAILURE)
    Main PID: 1258 (code=exited, status=1/FAILURE)

    sep 23 08:29:51 patroni-1 patroni[1258]: File "/opt/app/patroni/lib/python3.6/site-packages/patroni/ha.py", line 1260, in _run_cycle
    sep 23 08:29:51 patroni-1 patroni[1258]: return self.post_bootstrap()
    sep 23 08:29:51 patroni-1 patroni[1258]: File "/opt/app/patroni/lib/python3.6/site-packages/patroni/ha.py", line 1156, in post_bootstrap
    sep 23 08:29:51 patroni-1 patroni[1258]: self.cancel_initialization()
    sep 23 08:29:51 patroni-1 patroni[1258]: File "/opt/app/patroni/lib/python3.6/site-packages/patroni/ha.py", line 1151, in cancel_initialization
    sep 23 08:29:51 patroni-1 patroni[1258]: raise PatroniException('Failed to bootstrap cluster')
    sep 23 08:29:51 patroni-1 patroni[1258]: patroni.exceptions.PatroniException: 'Failed to bootstrap cluster'
    sep 23 08:29:51 patroni-1 systemd[1]: patroni.service: main process exited, code=exited, status=1/FAILURE
    sep 23 08:29:51 patroni-1 systemd[1]: Unit patroni.service entered failed state.
    sep 23 08:29:51 patroni-1 systemd[1]: patroni.service failed.

    Reply: Please post the content of your postgresql.yml file here; there must be something you have misconfigured.
  7. Hi, I have solved that problem and the cluster is running now, but I am facing another issue: a developer on a remote machine is unable to connect to the database through HAProxy. I have opened port 5432 for his IP in the firewall. What else can I do to fix it?

    Reply: From a remote machine you have to connect to the database via the HAProxy IP and the listen port you defined in the haproxy.cfg file. For example, in this guide the listen port is 5000, so connecting to the database via haproxy_ip:5000 should work.

    Reply: You can test your connection to the database from the HAProxy machine like below:

      psql -h localhost -U postgres -p 5000

      OR

      psql -h your_haproxy_ip -U postgres -p 5000

      and HAProxy will automatically make the connection to the backend database server.

    Reply: Thanks, it works. But I am facing a little problem: it disconnects automatically. Did you face that?

    Reply: A little more detail, please. Does it disconnect idle sessions automatically, or something else? Note the time between making the connection and the automatic disconnection and post it here.

    Reply: I used haproxyip:5000 in my Java app as the DB connection, and after starting the Tomcat service I get this message: HTTP Status 404 – Not Found "The origin server did not find a current representation for the target resource or is not willing to disclose that one exists"

      From catalina.out I am getting: 24-Sep-2020 13:07:07.777 SEVERE [main] org.apache.catalina.core.StandardServer.await Failed to create server shutdown socket on address [localhost] and port [8005] (base port [8005] and offset [0])
      java.net.BindException: Address already in use (Bind failed)

    Reply: For heaven's sake, do a little research on Google on your own and pay attention to the error message:

      Address already in use (Bind failed)
