Deploying an EDB Postgres Distributed example cluster on Linux hosts v5
Introducing TPA and PGD
We created Trusted Postgres Architect (TPA) to make installing and managing various Postgres configurations easily repeatable. TPA orchestrates creating and deploying Postgres clusters in a range of configurations. In this quick start, you install TPA first. If you already have TPA installed, you can skip those steps.
EDB Postgres Distributed (PGD) is a multi-master replicating implementation of Postgres designed for high performance and availability. TPA orchestrates the installation of PGD. You use TPA to generate a configuration file for a PGD demonstration cluster.
The TPA Linux host option allows users of any cloud or VM platform to use TPA to configure EDB Postgres Distributed. All you need from TPA is for the target system to be configured with a Linux operating system and accessible using SSH. Unlike the other TPA platforms (Docker and AWS), the Linux host configuration doesn't provision the target machines. You need to provision them wherever you decide to deploy.
This cluster uses Linux server instances to host the cluster's nodes. The nodes include three replicating database nodes, three cohosted connection proxies, and one backup node. TPA can then provision, prepare, and deploy the required EDB Postgres Distributed software and configuration to each node.
On host compatibility
This set of steps is specifically for users running TPA on Ubuntu 22.04 LTS on Intel/AMD processors. The target hosts in this example run Rocky Linux.
Prerequisites
Configure your Linux hosts
You need to provision four hosts for this quick start. Each host must have a supported Linux operating system installed. To avoid password prompts, each host must also be accessible over SSH using key pairs.
On machine provisioning
Azure users can follow a Microsoft guide on how to provision Azure VMs loaded with Linux. Google Cloud Platform users can follow a Google guide on how to provision GCP VMs with Linux loaded. You can use any virtual machine technology to host a Linux instance, too. Refer to your virtualization platform's documentation for instructions on how to create instances with Linux loaded on them.
Whichever cloud or VM platform you use, you need to make sure that each instance is accessible by SSH and that each instance can connect to the other instances. They can connect through either the public network or over a VPC for the cloud platforms. You can connect through your local network for on-premises VMs.
If you can't do this, you might want to consider the Docker or AWS quick start. These configurations are easier to set up and quicker to tear down. The AWS quick start, for example, automatically provisions compute instances and creates a VPC for those instances.
In this quick start, you install PGD nodes onto four hosts configured in the cloud. Each host in this example runs Rocky Linux and has a public IP address to go with its private IP address.
Host name | Public IP | Private IP |
---|---|---|
linuxhost-1 | 172.19.16.27 | 192.168.2.247 |
linuxhost-2 | 172.19.16.26 | 192.168.2.41 |
linuxhost-3 | 172.19.16.25 | 192.168.2.254 |
linuxhost-4 | 172.19.16.15 | 192.168.2.30 |
These are example IP addresses. Substitute them with your own public and private IP addresses as you progress through the quick start.
Set up a host admin user
Each machine requires a user account to use for installation. For simplicity, use a user with the same name on all the hosts. On each host, also configure the user so that you can SSH into the host without being prompted for a password. Be sure to give that user sudo privileges on the host. On the four hosts, the user rocky is already configured with sudo privileges.
Preparation
EDB account
You need an EDB account to install both TPA and PGD.
Sign up for a free EDB account if you don't already have one. Signing up gives you a trial subscription to EDB's software repositories.
After you're registered, go to the EDB Repos 2.0 page, where you can obtain your repo token.
On your first visit to this page, select Request Access to generate your repo token. Copy the token using the Copy Token icon, and store it safely.
Setting environment variables
First, set the EDB_SUBSCRIPTION_TOKEN environment variable to the value of your EDB repo token, obtained in the EDB account step.
You can add this to your .bashrc script or similar shell profile to ensure it's always set.
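As a minimal sketch of this step, the variable can be exported in the current shell. The value shown is a placeholder, not a real token:

```shell
# Placeholder value: replace <your-token> with the repo token you copied
# from the EDB Repos 2.0 page.
export EDB_SUBSCRIPTION_TOKEN="<your-token>"
```

Appending the same line to your .bashrc keeps the variable set in new shells.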
Configure the repository
All the software needed for this example is available from the EDB Postgres Distributed package repository. Download and run a script to configure the EDB Postgres Distributed repository. This repository also contains the TPA packages.
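The following sketch shows one way this step might look on a Debian or Ubuntu machine. The URL pattern and the setup.deb.sh script name are assumptions based on how EDB's repositories are usually configured, so check them against your EDB Repos page. The block only builds and prints the URL, and the download command itself is left as a comment for you to run after reviewing it:

```shell
# Fall back to a placeholder so the sketch runs without a real token.
TOKEN="${EDB_SUBSCRIPTION_TOKEN:-<your-token>}"

# Assumed URL pattern for the PGD repository setup script (Debian/Ubuntu).
SETUP_URL="https://downloads.enterprisedb.com/${TOKEN}/postgres_distributed/setup.deb.sh"
echo "Repository setup script: ${SETUP_URL}"

# To configure the repository, download the script and run it as root:
#   curl -1sLf "$SETUP_URL" | sudo -E bash
```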
Installing Trusted Postgres Architect (TPA)
You'll use TPA to provision and deploy PGD. If you previously installed TPA, you can move on to the next step. You'll find full instructions for installing TPA in the Trusted Postgres Architect documentation, which we've also included here.
Linux environment
TPA supports several distributions of Linux as a host platform. These examples are written for Ubuntu 22.04, but steps are similar for other supported platforms.
Install the TPA package
Configuring TPA
You now need to set up TPA's Python environment. Call tpaexec with the setup command.
You can then add the export command that puts tpaexec on your PATH to your shell's profile.
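A minimal sketch of both steps, assuming the TPA package installed tpaexec under /opt/EDB/TPA/bin (treat the path as an assumption and adjust it if your installation differs). The setup call is shown as a comment because it needs sudo and an installed TPA package:

```shell
# Assumed install location of the TPA package.
TPA_BIN=/opt/EDB/TPA/bin

# One-time setup of TPA's Python environment (run manually; needs sudo):
#   sudo "$TPA_BIN/tpaexec" setup

# Make tpaexec available in the current shell.
export PATH="$PATH:$TPA_BIN"
```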
Testing the TPA installation
You can verify TPA is correctly installed by running tpaexec selftest.
TPA is now installed.
Installing PGD using TPA
Generating a configuration file
Run the tpaexec configure command to generate a configuration folder:
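The invocation can be sketched as follows, assembled from the flags explained in the paragraphs that follow. The wrapper function configure_democluster is a hypothetical name used here so that nothing runs until you call it:

```shell
# Hypothetical wrapper: defines the command without running it.
# Call configure_democluster from the directory where you want the
# democluster folder to be created.
configure_democluster() {
  tpaexec configure democluster \
    --architecture PGD-Always-ON \
    --platform bare \
    --edb-postgres-advanced 16 \
    --redwood \
    --location-names dc1 \
    --pgd-proxy-routing local \
    --no-git \
    --hostnames-unsorted
}
```

You can also copy the tpaexec command out of the function body and run it directly.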
You specify the PGD-Always-ON architecture (--architecture PGD-Always-ON), which sets up the configuration for PGD's Always-on architectures. As part of the default architecture, it configures your cluster with three data nodes, cohosting three PGD Proxy servers and a Barman node for backup.
For Linux hosts, specify that you're targeting a "bare" platform (--platform bare). TPA determines the Linux version running on each host during deployment. See the EDB Postgres Distributed compatibility table for details about the supported operating systems.
Specify that the data nodes will be running EDB Postgres Advanced Server v16 (--edb-postgres-advanced 16) with Oracle compatibility (--redwood).
You set the notional location of the nodes to dc1 using --location-names. You then set --pgd-proxy-routing to local so that proxy routing can route traffic to all nodes in each location.
By default, TPA commits configuration changes to a Git repository. For this example, you don't need to do that, so pass the --no-git flag.
Finally, you ask TPA to generate repeatable hostnames for the nodes by passing --hostnames-unsorted. Otherwise, it selects hostnames at random from a predefined list of suitable words.
This command creates a subdirectory in the current working directory called democluster. It contains the config.yml configuration file TPA uses to create the cluster. You can view it with a pager, for example by running less democluster/config.yml.
You now need to edit the configuration file to add details related to your Linux hosts, such as admin user names and public and private IP addresses.
Editing your configuration
Using your preferred editor, open democluster/config.yml.
Search for the line containing ansible_user: root. Change root to the name of the user you configured with SSH access and sudo privileges. Follow that with this line:
Your instance_defaults section now looks like this:
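As a sketch, assuming the added line is TPA's manage_ssh_hostkeys setting (an assumption here; confirm it against the TPA documentation for the bare platform) and that your admin user is rocky, the edited section would look something like this:

```yaml
instance_defaults:
  platform: bare
  vars:
    ansible_user: rocky          # was: root
    manage_ssh_hostkeys: yes     # assumed setting; confirm against the TPA docs
```

Any other keys that TPA generated in this section stay as they are.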
Next, search for node: 1, which marks the configuration settings of the first node, kaboom.
After the node: 1 line, add the public and private IP addresses of your node. Use linuxhost-1 as the host for this node. Add the following to the file, substituting your IP addresses. Align the start of each line with the start of the node: line.
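As a sketch using the example addresses for linuxhost-1, and assuming the instance entry is indented by two spaces as in a typical TPA-generated config.yml, the added lines sit directly under node: 1:

```yaml
  node: 1
  public_ip: 172.19.16.27     # example public address of linuxhost-1
  private_ip: 192.168.2.247   # example private address of linuxhost-1
```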
The whole entry for kaboom looks like this but with your own IP addresses:
Repeat this process for the three other nodes.
Search for node: 2, which marks the configuration settings for the node kaftan. Use linuxhost-2 for this node. Substituting your IP addresses, add:
Search for node: 3, which marks the configuration settings for the node kaolin. Use linuxhost-3 for this node. Substituting your IP addresses, add:
Finally, search for node: 4, which marks the configuration settings for the node kapok. Use linuxhost-4 for this node. Substituting your IP addresses, add:
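The three additions can be sketched as follows, using the example addresses from the host table and assuming the same two-space indentation as the first node. These are three separate fragments, one per instance entry, not a single block to paste:

```yaml
  # kaftan (linuxhost-2): add under "node: 2"
  node: 2
  public_ip: 172.19.16.26
  private_ip: 192.168.2.41

  # kaolin (linuxhost-3): add under "node: 3"
  node: 3
  public_ip: 172.19.16.25
  private_ip: 192.168.2.254

  # kapok (linuxhost-4): add under "node: 4"
  node: 4
  public_ip: 172.19.16.15
  private_ip: 192.168.2.30
```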
Provisioning the cluster
You can now run tpaexec provision democluster.
This command prepares for deploying the cluster. (On other platforms, such as Docker and AWS, this command also creates the required hosts. When using Linux hosts, your hosts must already be configured.)
Further reading
- tpaexec provision in the Trusted Postgres Architect documentation
One part of this process for Linux hosts is creating key pairs for the hosts for SSH operations later. With those key pairs created, you need to copy the public part of the key pair to the hosts. You can do this with ssh-copy-id, giving the democluster identity (-i) and the login to each host. For this example, these are the commands:
You can now create the tpa_known_hosts file, which allows the hosts to be verified. Run ssh-keyscan against each host, using -H to hash the host names in the output, and append its output to tpa_known_hosts:
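A sketch of this step with the example public IP addresses, assuming tpa_known_hosts lives inside the democluster directory. As before, the hypothetical helper keyscan_commands only prints the commands:

```shell
PUBLIC_IPS="172.19.16.27 172.19.16.26 172.19.16.25 172.19.16.15"   # example addresses

# Print one ssh-keyscan command per host (nothing is executed).
# -H hashes the host names in the scanned output.
keyscan_commands() {
  for ip in $PUBLIC_IPS; do
    echo "ssh-keyscan -H ${ip} >> democluster/tpa_known_hosts"
  done
}

keyscan_commands
```

After checking the output, run each printed line from the directory that contains democluster.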
Deploy your cluster
You now have everything ready to deploy your cluster. To deploy, run tpaexec deploy democluster.
TPA applies the configuration, installing the needed packages and setting up the actual EDB Postgres Distributed cluster.
Further reading
- tpaexec deploy in the Trusted Postgres Architect documentation
Connecting to the cluster
You're now ready to log in to one of the nodes of the cluster with SSH and then connect to the database. Part of the configuration process set up SSH logins for all the nodes, complete with keys. To use the SSH configuration, you need to be in the democluster directory created by the tpaexec configure command earlier, so change into it with cd democluster.
From there, you can run ssh -F ssh_config <hostname> to establish an SSH connection. To connect to kaboom, the first database node in the cluster, run ssh -F ssh_config kaboom.
Notice that you're logged in as rocky, the admin user and ansible user you configured earlier, on kaboom.
You now need to adopt the identity of the enterprisedb user, for example by running sudo -iu enterprisedb. This user is preconfigured and authorized to connect to the cluster's nodes.
You can now run the psql command to access the bdrdb database, for example psql bdrdb.
You're directly connected to the Postgres database running on the kaboom node and can start issuing SQL commands.
To leave the SQL client, enter exit.
Using PGD CLI
The pgd utility, also known as the PGD CLI, lets you control and manage your EDB Postgres Distributed cluster. It's already installed on the node.
You can use it to check the cluster's health by running pgd check-health.
Or, you can use pgd show-nodes to ask PGD to show you the data-bearing nodes in the cluster.
Similarly, use pgd show-proxies to display the proxy connection nodes.
The proxies provide high-availability connections to the cluster of data nodes for applications. You can connect to the proxies and, in turn, to the database with the command psql -h kaboom,kaftan,kaolin -p 6432 bdrdb.
Explore your cluster
- Connect your database to applications.
- Explore replication with hands-on exercises.
- Explore failover with hands-on exercises.
- Understand conflicts by creating and monitoring them.
- Take the next steps for working with your cluster.