Conflict-Free Replicated Data Types
Table of Contents
1. Overview
A Conflict-Free Replicated Data Type (CRDT) is a data structure that can be replicated across multiple nodes, updated independently and concurrently without coordination, and merged deterministically into a consistent state. The merge function is mathematically guaranteed to converge — no conflict resolution logic, no last-write-wins heuristics, no central authority.
CRDTs are the foundation of local-first software: applications that work offline, sync peer-to-peer, and treat the server as just another replica.
2. Algebraic Foundations
CRDTs exploit two algebraic structures to guarantee convergence.
2.1. State-based CRDTs (CvRDTs)
The state is a join-semilattice: a partially ordered set with a least upper bound (LUB) operation for every pair of elements. Merge is the LUB.
Semilattice requirements: - Commutative: a ⊔ b = b ⊔ a - Associative: (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c) - Idempotent: a ⊔ a = a
Because merge is idempotent and commutative, duplicate or reordered messages are harmless. The state monotonically increases through the lattice.
2.2. Operation-based CRDTs (CmRDTs)
Instead of shipping full state, nodes exchange operations. Operations must be commutative (but need not be idempotent — the transport layer must guarantee exactly-once delivery).
The tradeoff: CmRDTs are more bandwidth-efficient but require a reliable causal broadcast layer. CvRDTs are simpler to implement over unreliable channels.
2.3. Delta-state CRDTs
A hybrid: nodes exchange only the delta (the difference since the last sync), not the full state. The delta is itself a join-semilattice element, so it merges the same way. This combines the bandwidth efficiency of CmRDTs with the robustness of CvRDTs.
3. CRDT Families
3.1. Counters
| Type | Description |
|---|---|
| G-Counter | Grow-only counter. Each node maintains its own count. |
| PN-Counter | Increment and decrement. Two G-Counters (positive, negative). |
A G-Counter is a map from node ID to count. Merge takes the max per node. The value is the sum of all entries.
;; G-Counter merge (defn merge-gcounter [a b] (merge-with max a b)) ;; Value (defn gcounter-value [c] (reduce + (vals c)))
3.2. Registers
| Type | Description |
|---|---|
| LWW-Register | Last-Writer-Wins by timestamp. Not strictly a CRDT. |
| MV-Register | Multi-Value register. Preserves all concurrent writes. |
LWW-Register is the most common in practice (used by DynamoDB, Cassandra) but sacrifices the "conflict-free" guarantee — concurrent writes are silently dropped based on wall-clock ordering.
3.3. Sets
| Type | Description |
|---|---|
| G-Set | Grow-only set. Elements can be added, never removed. |
| 2P-Set | Two-Phase set. Separate add-set and remove-set. |
| OR-Set | Observed-Remove set. Each add is tagged; removes target tags. |
| AWSet | Add-Wins set. Concurrent add + remove → element present. |
The OR-Set (Observed-Remove Set) is the most practical general-purpose CRDT
set. Each add(e) generates a unique tag. A remove(e) removes all
currently observed tags for e. A concurrent add on another node generates
a fresh tag that survives the remove.
3.4. Sequences (for collaborative text)
| Type | Description |
|---|---|
| RGA | Replicated Growable Array. Linked list with unique IDs. |
| LSEQ | Log-structured sequence. Fractional indexing. |
| Fugue | Maximally non-interleaving sequence CRDT (2023). |
Sequence CRDTs are the backbone of collaborative text editors. The key challenge: interleaving. When two users type at the same position concurrently, their characters must not interleave arbitrarily. Fugue (Weidner et al., 2023) is the first sequence CRDT proven to be maximally non-interleaving.
4. Implementations
4.1. Automerge
Automerge is a JSON-like CRDT library. Documents are nested maps, lists, and text. The backend uses a compressed columnar format for efficient storage and sync.
Key properties:
- Full document history (immutable log of changes)
- Binary sync protocol (compact over the wire)
- Rust core with JS/WASM, Python, Swift, Go bindings
- OR-Set semantics for concurrent map key updates
- RGA-based sequence for text and lists
4.2. Yjs
Yjs is an operation-based CRDT optimized for collaborative text editing.
Key properties:
- Sub-document granularity (sync parts of a document independently)
- YATA algorithm for sequences (Yet Another Transformation Approach)
- Awareness protocol (cursor positions, selections, presence)
- Provider architecture (WebSocket, WebRTC, IndexedDB, LevelDB)
- Smaller memory footprint than Automerge for text-heavy documents
4.3. Loro
Loro (2024) is a newer CRDT framework combining ideas from both:
Key properties:
- Movable tree CRDT (file system operations: move, rename)
- Peritext-inspired rich text CRDT
- Columnar encoding for history
- Rust core with WASM bindings
5. Practical Considerations
5.1. When to use CRDTs
CRDTs are the right choice when:
- Users must work offline and sync later
- Latency to a central server is too high for interactive use
- The application is peer-to-peer with no single authority
- Conflict resolution rules are well-defined (counters, sets, text)
5.2. When not to use CRDTs
CRDTs add complexity that is unnecessary when:
- A single database is the source of truth and always reachable
- Conflicts require human judgment (merging spreadsheet formulas)
- The data model requires global invariants (account balance >= 0)
- Garbage collection of tombstones is impractical
5.3. The tombstone problem
Deleted elements in many CRDTs (2P-Set, OR-Set) leave metadata behind. This metadata grows without bound unless garbage collected. GC requires coordination — knowing that all replicas have observed the deletion. This is the fundamental tension: CRDTs avoid coordination for writes but often need it for cleanup.
6. References
- Shapiro et al. (2011). "A comprehensive study of Convergent and Commutative Replicated Data Types." INRIA Technical Report 7506.
- Kleppmann & Beresford (2017). "A Conflict-Free Replicated JSON Datatype." IEEE Transactions on Parallel and Distributed Systems.
- Weidner et al. (2023). "The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing." arXiv:2305.00583.
- Gentle (2024). "Loro: Reimagining state management with CRDTs."
- See also: Local-First Software for the broader application architecture context.