Application behavior v5
Much of PGD's replication behavior is transparent to applications. Understanding how it achieves that and the elements that aren't transparent is important to successfully developing an application that works well with PGD.
Replication behavior
PGD supports replicating changes made on one node to other nodes.
PGD, by default, replicates all changes from INSERT, UPDATE, DELETE, and TRUNCATE
operations from the source node to other nodes. Only the final changes are sent,
after all triggers and rules are processed. For example, INSERT ... ON CONFLICT
UPDATE
sends either an insert or an update, depending on what occurred on the
origin. If an update or delete affects zero rows, then no changes are sent.
You can replicate INSERT without any preconditions.
For updates and deletes to replicate on other nodes, PGD must be able to identify the unique rows affected. PGD requires that a table have either a PRIMARY KEY defined, a UNIQUE constraint, or an explicit REPLICA IDENTITY defined on specific columns. If one of those isn't defined, a warning is generated, and later updates or deletes are explicitly blocked. If REPLICA IDENTITY FULL is defined for a table, then a unique index isn't required. In that case, updates and deletes are allowed and use the first non-unique index that's live, valid, not deferred, and doesn't have expressions or WHERE clauses. Otherwise, a sequential scan is used.
Truncate
You can use TRUNCATE even without a defined replication identity. Replication of TRUNCATE commands is supported, but take care when truncating groups of tables connected by foreign keys. When replicating a truncate action, the subscriber truncates the same group of tables that was truncated on the origin, either explicitly specified or implicitly collected by CASCADE, except in cases where replication sets are defined. See Replication sets for details and examples. This works correctly if all affected tables are part of the same subscription. But if some tables to truncate on the subscriber have foreign-key links to tables that aren't part of the same (or any) replication set, then applying the truncate action on the subscriber fails.
Row-level locks
Row-level locks taken implicitly by INSERT, UPDATE, and DELETE commands are
replicated as the changes are made. Table-level locks taken implicitly by
INSERT, UPDATE, DELETE, and TRUNCATE commands are also replicated. Explicit
row-level locking (SELECT ... FOR UPDATE/FOR SHARE
) by user sessions isn't
replicated, nor are advisory locks. Information stored by transactions running
in SERIALIZABLE mode isn't replicated to other nodes. The transaction isolation
level of SERIALIAZABLE is supported, but transactions aren't serialized across
nodes in the presence of concurrent transactions on multiple nodes.
If DML is executed on multiple nodes concurrently, then potential conflicts might occur if executing with asynchronous replication. You must either handle these or avoid them. Various avoidance mechanisms are possible, discussed in Conflicts.
Sequences
Sequences need special handling, described in Sequences. This is because in a cluster, sequences must be global to avoid nodes creating conflicting values. Global sequences are available with global locking to ensure integrity.
Binary objects
Binary data in BYTEA columns is replicated normally, allowing "blobs" of data up to 1 GB. Use of the PostgreSQL "large object" facility isn't supported in PGD.
Rules
Rules execute only on the origin node so aren't executed during apply, even if they're enabled for replicas.
Base tables only
Replication is possible only from base tables to base tables. That is, the tables on the source and target on the subscription side must be tables, not views, materialized views, or foreign tables. Attempts to replicate tables other than base tables result in an error. DML changes that are made through updatable views are resolved to base tables on the origin and then applied to the same base table name on the target.
Partitioned tables
PGD supports partitioned tables transparently, meaning that you can add a partitioned table to a replication set and changes that involve any of the partitions are replicated downstream.
Triggers
By default, triggers execute only on the origin node. For example, an INSERT
trigger executes on the origin node and is ignored when you apply the change on
the target node. You can specify for triggers to execute on both the origin node
at execution time and on the target when it's replicated (apply time) by using
ALTER TABLE ... ENABLE ALWAYS TRIGGER
. Or, use the REPLICA
option to execute
only at apply time: ALTER TABLE ... ENABLE REPLICA TRIGGER
.
Some types of trigger aren't executed on apply, even if they exist on a table and are currently enabled. Trigger types not executed are:
- Statement-level triggers (
FOR EACH STATEMENT
) - Per-column UPDATE triggers (
UPDATE OF column_name [, ...]
)
PGD replication apply uses the system-level default search_path. Replica triggers, stream triggers, and index expression functions can assume other search_path settings that then fail when they execute on apply. To prevent this from occurring, use any of these techniques:
- Resolve object references clearly using only the default search_path.
- Always use fully qualified references to objects, for example,
schema.objectname
. - Set the search path for a function using
ALTER FUNCTION ... SET search_path = ...
for the functions affected.
PGD assumes that there are no issues related to text or other collatable datatypes, that is, all collations in use are available on all nodes, and the default collation is the same on all nodes. Replicating changes uses equality searches to locate Replica Identity values, so this does't have any effect except where unique indexes are explicitly defined with nonmatching collation qualifiers. Row filters might be affected by differences in collations if collatable expressions were used.
Toast
PGD handling of very long "toasted" data in PostgreSQL is transparent to the user. The TOAST "chunkid" values likely differ between the same row on different nodes, but that doesn't cause any problems.
Other restrictions
PGD can't work correctly if Replica Identity columns are marked as external.
PostgreSQL allows CHECK() constraints that contain volatile functions. Since PGD reexecutes CHECK() constraints on apply, any subsequent reexecution that doesn't return the same result as before causes data divergence.
PGD doesn't restrict the use of foreign keys. Cascading FKs are allowed.