Conflict-Free Replicated Data Types

Table of Contents

1. Overview

A Conflict-Free Replicated Data Type (CRDT) is a data structure that can be replicated across multiple nodes, updated independently and concurrently without coordination, and merged deterministically into a consistent state. The merge function is mathematically guaranteed to converge — no conflict resolution logic, no last-write-wins heuristics, no central authority.

CRDTs are the foundation of local-first software: applications that work offline, sync peer-to-peer, and treat the server as just another replica.

2. Algebraic Foundations

CRDTs exploit two algebraic structures to guarantee convergence.

2.1. State-based CRDTs (CvRDTs)

The state is a join-semilattice: a partially ordered set with a least upper bound (LUB) operation for every pair of elements. Merge is the LUB.

Semilattice requirements:
  - Commutative:  a ⊔ b = b ⊔ a
  - Associative:  (a ⊔ b) ⊔ c = a ⊔ (b ⊔ c)
  - Idempotent:   a ⊔ a = a

Because merge is idempotent and commutative, duplicate or reordered messages are harmless. The state monotonically increases through the lattice.

2.2. Operation-based CRDTs (CmRDTs)

Instead of shipping full state, nodes exchange operations. Operations must be commutative (but need not be idempotent — the transport layer must guarantee exactly-once delivery).

The tradeoff: CmRDTs are more bandwidth-efficient but require a reliable causal broadcast layer. CvRDTs are simpler to implement over unreliable channels.

2.3. Delta-state CRDTs

A hybrid: nodes exchange only the delta (the difference since the last sync), not the full state. The delta is itself a join-semilattice element, so it merges the same way. This combines the bandwidth efficiency of CmRDTs with the robustness of CvRDTs.

3. CRDT Families

3.1. Counters

Type Description
G-Counter Grow-only counter. Each node maintains its own count.
PN-Counter Increment and decrement. Two G-Counters (positive, negative).

A G-Counter is a map from node ID to count. Merge takes the max per node. The value is the sum of all entries.

;; G-Counter merge
(defn merge-gcounter [a b]
  (merge-with max a b))

;; Value
(defn gcounter-value [c]
  (reduce + (vals c)))

3.2. Registers

Type Description
LWW-Register Last-Writer-Wins by timestamp. Not strictly a CRDT.
MV-Register Multi-Value register. Preserves all concurrent writes.

LWW-Register is the most common in practice (used by DynamoDB, Cassandra) but sacrifices the "conflict-free" guarantee — concurrent writes are silently dropped based on wall-clock ordering.

3.3. Sets

Type Description
G-Set Grow-only set. Elements can be added, never removed.
2P-Set Two-Phase set. Separate add-set and remove-set.
OR-Set Observed-Remove set. Each add is tagged; removes target tags.
AWSet Add-Wins set. Concurrent add + remove → element present.

The OR-Set (Observed-Remove Set) is the most practical general-purpose CRDT set. Each add(e) generates a unique tag. A remove(e) removes all currently observed tags for e. A concurrent add on another node generates a fresh tag that survives the remove.

3.4. Sequences (for collaborative text)

Type Description
RGA Replicated Growable Array. Linked list with unique IDs.
LSEQ Log-structured sequence. Fractional indexing.
Fugue Maximally non-interleaving sequence CRDT (2023).

Sequence CRDTs are the backbone of collaborative text editors. The key challenge: interleaving. When two users type at the same position concurrently, their characters must not interleave arbitrarily. Fugue (Weidner et al., 2023) is the first sequence CRDT proven to be maximally non-interleaving.

4. Implementations

4.1. Automerge

Automerge is a JSON-like CRDT library. Documents are nested maps, lists, and text. The backend uses a compressed columnar format for efficient storage and sync.

Key properties:

  • Full document history (immutable log of changes)
  • Binary sync protocol (compact over the wire)
  • Rust core with JS/WASM, Python, Swift, Go bindings
  • OR-Set semantics for concurrent map key updates
  • RGA-based sequence for text and lists

4.2. Yjs

Yjs is an operation-based CRDT optimized for collaborative text editing.

Key properties:

  • Sub-document granularity (sync parts of a document independently)
  • YATA algorithm for sequences (Yet Another Transformation Approach)
  • Awareness protocol (cursor positions, selections, presence)
  • Provider architecture (WebSocket, WebRTC, IndexedDB, LevelDB)
  • Smaller memory footprint than Automerge for text-heavy documents

4.3. Loro

Loro (2024) is a newer CRDT framework combining ideas from both:

Key properties:

  • Movable tree CRDT (file system operations: move, rename)
  • Peritext-inspired rich text CRDT
  • Columnar encoding for history
  • Rust core with WASM bindings

5. Practical Considerations

5.1. When to use CRDTs

CRDTs are the right choice when:

  • Users must work offline and sync later
  • Latency to a central server is too high for interactive use
  • The application is peer-to-peer with no single authority
  • Conflict resolution rules are well-defined (counters, sets, text)

5.2. When not to use CRDTs

CRDTs add complexity that is unnecessary when:

  • A single database is the source of truth and always reachable
  • Conflicts require human judgment (merging spreadsheet formulas)
  • The data model requires global invariants (account balance >= 0)
  • Garbage collection of tombstones is impractical

5.3. The tombstone problem

Deleted elements in many CRDTs (2P-Set, OR-Set) leave metadata behind. This metadata grows without bound unless garbage collected. GC requires coordination — knowing that all replicas have observed the deletion. This is the fundamental tension: CRDTs avoid coordination for writes but often need it for cleanup.

6. References

  • Shapiro et al. (2011). "A comprehensive study of Convergent and Commutative Replicated Data Types." INRIA Technical Report 7506.
  • Kleppmann & Beresford (2017). "A Conflict-Free Replicated JSON Datatype." IEEE Transactions on Parallel and Distributed Systems.
  • Weidner et al. (2023). "The Art of the Fugue: Minimizing Interleaving in Collaborative Text Editing." arXiv:2305.00583.
  • Gentle (2024). "Loro: Reimagining state management with CRDTs."
  • See also: Local-First Software for the broader application architecture context.