CRDT Disk-space requirements v5
An important consideration is the overhead associated with CRDT types, particularly the on-disk size.
Operation-based CRDT disk-space reqs
For operation-based types, the issue of disk-space is trivial because the types are merely domains on top of other types. They have the same disk space requirements no matter how many nodes are there:
crdt_delta_counter
— Same asbigint
(8 bytes).crdt_delta_sum
— Same asnumeric
(variable, depending on precision and scale).
There's no dependency on the number of nodes because operation-based CRDT types don't store any per-node information.
State-based CRDT disk-space reqs
For state-based types, the situation is more complicated. All the types are variable length (stored essentially as a bytea
column) and consist of a header and a certain amount of per-node information for each node that modified the value.
For the bigint
variants, formulas computing approximate size are:
crdt_gcounter
—32B (header) + N * 12B (per-node)
.crdt_pncounter
—48B (header) + N * 20B (per-node)
.
N
denotes the number of nodes that modified this value.
For the numeric
variants, there's no exact formula because both the header and per-node parts include numeric
variable-length values. To give you an idea of how many such values you need to keep:
crdt_gsum
- fixed:
20B (header) + N * 4B (per-node)
- variable:
(2 + N)
numeric
values
- fixed:
crdt_pnsum
- fixed:
20B (header) + N * 4B (per-node)
- variable:
(4 + 2 * N)
numeric
values
- fixed:
Note
It doesn't matter how many nodes are in the cluster if the values are never updated on multiple nodes. It also doesn't matter whether the updates were concurrent (causing a conflict).
In addition, it doesn't matter how many of those nodes were already removed from the cluster. There's no way to compact the state yet.