Nodes with differences
Replicating between nodes with differences
By default, DDL is sent to all nodes. You can control this behavior, as described in DDL replication, and you can use it to create differences between database schemas across nodes. PGD is designed to allow replication to continue even with minor differences between nodes. These features allow application schema migration without downtime and support logical standby nodes for reporting or testing.
Currently, replication requires the same table name on all nodes. A future feature might allow a mapping between different table names.
It's possible to replicate between tables with dissimilar partitioning definitions, such as a source that's a normal table replicating to a partitioned table, including support for updates that change partitions on the target. Replication can be faster if the partitioning definition is the same on the source and target, since dynamic partition routing doesn't need to execute at apply time. For details, see Replication sets.
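For example, the following sketch shows the same table name on two nodes, plain on one and range-partitioned on the other. The table, columns, and partition bounds are hypothetical, and the sketch assumes DDL replication is controlled per node as described in DDL replication:

```sql
-- On the source node: a plain table (hypothetical example).
CREATE TABLE my_app.events (
    id         bigint,
    created_at date,
    payload    text
);

-- On the target node: the same table name, range-partitioned.
-- PGD routes each replicated row into the matching partition at apply time.
CREATE TABLE my_app.events (
    id         bigint,
    created_at date,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE my_app.events_2025 PARTITION OF my_app.events
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
```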
By default, all columns are replicated.
PGD replicates data columns based on the column name. If a column has the same name but a different data type, PGD attempts to cast from the source type to the target type, provided casts that allow the conversion are defined.
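For example, if the source defines a column as `integer` while this node defines the same column as `text`, an assignment-level cast lets the incoming values convert on apply. A minimal sketch using PostgreSQL's I/O conversion casts; the column types are illustrative assumptions:

```sql
-- Allow integer values from the source to be assigned to a text column
-- on this node by converting through the types' text I/O functions.
CREATE CAST (integer AS text) WITH INOUT AS ASSIGNMENT;
```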
PGD supports replicating between tables that have a different number of columns.
If the target is missing columns that are present on the source, then PGD raises a `target_column_missing` conflict, for which the default conflict resolver is `ignore_if_null`. This resolver throws an error if a non-NULL value arrives. Alternatively, you can configure a node with a conflict resolver of `ignore`. This setting doesn't throw an error but silently ignores any additional columns.
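A minimal sketch of switching the resolver, assuming a node named `node_a` and that your PGD version provides the `bdr.alter_node_set_conflict_resolver` function:

```sql
-- Ignore incoming values for columns this node doesn't have, instead of
-- erroring on non-NULL values as the ignore_if_null default does.
SELECT bdr.alter_node_set_conflict_resolver('node_a',
    'target_column_missing', 'ignore');
```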
If the target has additional columns not seen in the source record, then PGD raises a `source_column_missing` conflict, for which the default conflict resolver is `use_default_value`. Replication proceeds if the additional columns have a default, either NULL (if nullable) or a default expression. If not, it throws an error and halts replication.
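For example, giving the extra column a default on the node that has it lets `use_default_value` proceed. The table and column here are hypothetical:

```sql
-- region exists only on this node; the default lets rows replicated from
-- nodes that don't send the column apply under use_default_value.
ALTER TABLE my_app.orders
    ADD COLUMN region text DEFAULT 'unknown';
```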
Transform triggers can also be used on tables to provide default values or alter the incoming data in various ways before apply.
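A minimal sketch of a transform trigger that fills in a node-local column, assuming the hypothetical `my_app.orders` table and `region` column above and PGD's `bdr.create_transform_trigger` function:

```sql
-- The trigger function runs on the apply worker before the incoming
-- row is written.
CREATE FUNCTION my_app.fill_region()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    -- Supply a node-local default when the incoming row carries none.
    IF NEW.region IS NULL THEN
        NEW.region := 'unknown';
    END IF;
    RETURN NEW;
END;
$$;

SELECT bdr.create_transform_trigger('orders_fill_region',
    ARRAY['INSERT'], 'my_app.orders', 'my_app.fill_region()');
```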
If the source and the target have different constraints, then replication is attempted, but it might fail if rows from the source can't be applied to the target. Row filters can help here.
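For example, a row filter on the replication set can keep rows that would violate a stricter target from leaving the source. A sketch, assuming a replication set named `stricter_nodes`, the hypothetical table above, and the replication set functions described in Replication sets:

```sql
-- Only rows matching the filter replicate to members of this set, so
-- rows that would fail the stricter target schema are never sent.
SELECT bdr.replication_set_add_table(
    relation   := 'my_app.orders',
    set_name   := 'stricter_nodes',
    row_filter := 'region IS NOT NULL'
);
```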
Replicating data from one schema to a more relaxed schema doesn't cause failures. Replicating data from a schema to a more restrictive schema can be a source of potential failures. The right way to solve this is to place a constraint on the more relaxed side, so bad data can't be entered. That way, no bad data ever arrives by replication, so it never fails the transform into the more restrictive schema. For example, if one schema has a column of type TEXT and another schema defines the same column as XML, add a CHECK constraint onto the TEXT column to enforce that the text is XML.
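A sketch of that constraint, with a hypothetical table and column:

```sql
-- On the node where the column is TEXT, reject anything that isn't
-- well-formed XML, so apply on the XML-typed node can't fail.
ALTER TABLE my_app.docs
    ADD CONSTRAINT docs_body_is_xml
    CHECK (xml_is_well_formed(body));
```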
You can define a table with different indexes on each node. By default, index definitions are replicated. To create an index on only a subset of nodes, or only locally, see DDL replication.
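One way to do that, sketched here with a hypothetical index, is to turn off DDL replication for the session so the index is created on this node only; see DDL replication for the supported options:

```sql
SET bdr.ddl_replication = off;
CREATE INDEX orders_created_at_idx ON my_app.orders (created_at);
RESET bdr.ddl_replication;
```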
Storage parameters, such as `fillfactor` and `toast_tuple_target`, can differ between nodes for a table without problems. An exception to that behavior is that the value of a table's `user_catalog_table` storage parameter must be identical on all nodes.
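For example, an update-heavy node might use a lower fillfactor. A sketch using the hypothetical table above, again run with DDL replication off so the change stays node-local:

```sql
-- Both of these parameters can safely differ between nodes.
ALTER TABLE my_app.orders
    SET (fillfactor = 70, toast_tuple_target = 4096);
```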
A table being replicated must be owned by the same user/role on each node. See Security and roles for details.
Roles can have different passwords for connection on each node, although by default changes to roles are replicated to each node. See DDL replication to learn how to alter a role password on only a subset of nodes or locally.
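A sketch of a node-local password change, assuming a hypothetical `app_user` role and the session-level approach shown earlier; see DDL replication for the supported options:

```sql
SET bdr.ddl_replication = off;
ALTER ROLE app_user WITH PASSWORD 'node-local-secret';
RESET bdr.ddl_replication;
```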
Comparison between nodes with differences
LiveCompare is a tool for comparing data in a database across PGD and non-PGD nodes. It needs a minimum of two connections to compare and reach a final result.
Starting with LiveCompare 1.3, you can configure it with `all_bdr_nodes` set. This setting saves you from specifying the DSNs for each separate node in the cluster. An EDB Postgres Distributed cluster has N nodes, each with its own connection information, but LiveCompare 1.3 and later needs only the initial and output connections to complete its job. The `logical_replication_mode` setting states how the nodes communicate.
All the configuration is done in a `.ini` file, named `bdrLC.ini`, for example. You can find templates for this configuration file in `/etc/2ndq-livecompare/`.
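A minimal sketch of such a file. Apart from `all_bdr_nodes` and `logical_replication_mode`, named above, the section names, keys, and DSNs here are illustrative assumptions; check the shipped templates for the exact layout:

```ini
[General Settings]
logical_replication_mode = bdr
all_bdr_nodes = on
max_parallel_workers = 4

[Initial Connection]
dsn = host=node1 port=5432 dbname=bdrdb user=livecompare

[Output Connection]
dsn = host=node1 port=5432 dbname=liveoutput user=livecompare
```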
While LiveCompare executes, you see N+1 progress bars, N being the number of processes. Once all the tables are sourced, a timer displays along with the measured transactions per second (tps). The timer keeps counting, giving you an estimate and then a total execution time at the end.
This tool offers extensive customization and filters, such as tables, schemas, and replication_sets. LiveCompare can stop and restart without losing context information, so you can run it at convenient times. After the comparison, it generates a summary and a DML script for you to review. Apply the DML to fix any differences found.
Replicating between different release levels
Another difference between nodes that you might encounter is different major versions of PostgreSQL across nodes. PGD is designed to replicate between different major release versions. This feature allows major version upgrades without downtime.

PGD is also designed to replicate between nodes that have different versions of the PGD software. This feature allows version upgrades and maintenance without downtime.
However, while you can join a node running a different major version to a cluster, you can't add a node running an older minor version if the cluster already uses a newer protocol version. Doing so returns an error.
Both of these features might be affected by specific restrictions. See Release notes for any known incompatibilities.