DDL locking details v5
Two kinds of locks enforce correctness of replicated DDL with PGD: the global DDL lock and the global DML lock.
The global DDL lock
A global DDL lock is used only when ddl_locking = 'all'
.
This kind of lock prevents any other DDL from executing on the cluster while
each DDL statement runs. This behavior ensures full correctness in the general case but
is too strict for many simple cases. PGD acquires a global lock on
DDL operations the first time in a transaction where schema changes are made.
This effectively serializes the DDL-executing transactions in the cluster. In
other words, while DDL is running, no other connection on any node can run
another DDL command, even if it affects different tables.
To acquire a lock on DDL operations, the PGD node executing DDL contacts the other nodes in a PGD group and asks them to grant it the exclusive right to execute DDL.
The lock request is sent by the regular replication stream, and the nodes respond by the replication stream as well. So it's important that nodes (or at least a majority of the nodes) run without much replication delay. Otherwise it might take a long time for the node to acquire the DDL lock. Once the majority of nodes agree, the DDL execution is carried out.
The ordering of DDL locking is decided using the Raft protocol. DDL statements executed on one node are executed in the same sequence on all other nodes.
To ensure that the node running a DDL has seen effects of all prior DDLs run in the cluster, it waits until it catches up with the node that ran the previous DDL. If the node running the current DDL is lagging behind in replication with respect to the node that ran the previous DDL, then it might take a long time to acquire the lock. Hence it's preferable to run DDLs from a single node or the nodes that have nearly caught up with replication changes originating at other nodes.
A global DDL lock must be granted by a majority of data and witness nodes, where a majority is N/2+1 of the eligible nodes. Subscriber-only nodes aren't eligible to participate.
The global DML lock
Known as a global DML lock or relation DML lock, this kind of lock is used when
either ddl_locking = all
or ddl_locking = dml
, and the DDL statement might cause
in-flight DML statements to fail. These failures can occur when you add or modify a constraint,
such as a unique constraint, check constraint, or NOT NULL constraint.
Relation DML locks affect only one relation at a time. These
locks ensure that no DDL executes while changes are in the queue that
might cause replication to halt with an error.
To acquire the global DML lock on a table, the PGD node executing the DDL contacts all other nodes in a PGD group, asking them to lock the table against writes and waiting while all pending changes to that table are drained. Once all nodes are fully caught up, the originator of the DML lock is free to perform schema changes to the table and replicate them to the other nodes.
The global DML lock holds an EXCLUSIVE LOCK on the table on each node, so it blocks DML, other DDL, VACUUM, and index commands against that table while it runs. This is true even if the global DML lock is held for a command that normally doesn't take an EXCLUSIVE LOCK or higher.
Waiting for pending DML operations to drain can take a long time and even longer if replication is currently lagging. This means that, unlike with data changes, schema changes affecting row representation and constraints can be performed only while all configured nodes can be reached and are keeping up reasonably well with the current write rate. If such DDL commands must be performed while a node is down, first remove the down node from the configuration.
All eligible data nodes must agree to grant a global DML lock before the lock is granted. Witness and subscriber-only nodes aren't eligible to participate.
If a DDL statement isn't replicated, no global locks are acquired.
Specify locking behavior with the bdr.ddl_locking
parameter, as
explained in Executing DDL on PGD systems:
ddl_locking = all
takes global DDL lock and, if needed, takes relation DML lock.ddl_locking = dml
skips global DDL lock and, if needed, takes relation DML lock.ddl_locking = off
skips both global DDL lock and relation DML lock.
Some PGD functions make DDL changes. For those functions, DDL locking behavior applies, which is noted in the documentation for each function.
Thus, ddl_locking = dml
is safe only when you can guarantee that
no conflicting DDL is executed from other nodes. With this setting,
the statements that require only the global DDL lock don't use the global
locking at all.
ddl_locking = off
is safe only when you can guarantee that there are no
conflicting DDL and no conflicting DML operations on the database objects
DDL executes on. If you turn locking off and then experience difficulties,
you might lose in-flight changes to data. The user application team needs to resolve any issues caused.
In some cases, concurrently executing DDL can properly be serialized. If these serialization failures occur, the DDL might reexecute.
DDL replication isn't active on logical standby nodes until they're promoted.
Some PGD management functions act like DDL, meaning that they attempt to take global locks, and their actions are replicated if DDL replication is active. The full list of replicated functions is listed in PGD functions that behave like DDL.
DDL executed on temporary tables never need global locks.
ALTER or DROP of an object created in the current transaction doesn't required global DML lock.
Monitoring of global DDL locks and global DML locks is shown in Monitoring.
- On this page
- The global DDL lock
- The global DML lock