Annotation Systems for Evolving Documents

Org-mode research documents reorganize: headings rename, sections split, files restructure. Review notes, citation flags, and verification metadata attached to those headings must survive each reorganization; otherwise they are silently lost.

This note evaluates anchoring strategies for that problem.

1. The Anchor Problem

Five anchor families, ordered from most to least brittle.

1.1. Byte offsets

The simplest anchor: character position 4,273. Dies on the first edit upstream of the annotation. Not used in practice except in binary formats.

1.2. Structural selectors

XPath (//section[3]/p[1]), CSS selectors, heading paths (Introduction > Background > Prior Work). Survive whitespace changes. Break on any structural reorganization. The W3C Web Annotation Data Model (Sanderson, Ciccarese, and Young 2017) hedges against this by letting one target carry several selectors for the same segment (for example an XPathSelector, a TextPositionSelector, and a TextQuoteSelector) and by letting one selector narrow another through refinedBy, so at least one selector may still resolve when others fail.

1.3. Text quote selectors

Store the exact text (exact) plus a prefix and suffix for context. On re-anchoring, search for the exact string; if missing, fuzzy-match against candidates weighted by context overlap. The Hypothesis annotation client uses this approach. It survives minor rewrites. It does not survive paragraph deletion or heavy paraphrase.
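A minimal sketch of that re-anchoring step, using only Python's standard difflib. It is not the Hypothesis client's algorithm; the function name reanchor, the sliding-window search, and the 0.7 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def reanchor(text, exact, prefix="", suffix="", threshold=0.7):
    """Return (start, end) of the best match for `exact` in `text`, or None."""
    # Fast path: collect every verbatim occurrence of the quoted text.
    hits = []
    pos = text.find(exact)
    while pos != -1:
        hits.append(pos)
        pos = text.find(exact, pos + 1)

    if hits:
        # Several verbatim hits: prefer the one whose surroundings
        # look most like the stored prefix and suffix.
        def context_score(start):
            end = start + len(exact)
            before = text[max(0, start - len(prefix)):start]
            after = text[end:end + len(suffix)]
            return (SequenceMatcher(None, before, prefix).ratio()
                    + SequenceMatcher(None, after, suffix).ratio())

        best = max(hits, key=context_score)
        return best, best + len(exact)

    # Slow path: slide a window of the same width over the text and keep
    # the most similar candidate, provided it clears the threshold.
    width = len(exact)
    best_score, best_start = 0.0, None
    for start in range(max(1, len(text) - width + 1)):
        score = SequenceMatcher(None, text[start:start + width], exact).ratio()
        if score > best_score:
            best_score, best_start = score, start
    if best_start is None or best_score < threshold:
        return None
    return best_start, best_start + width
```

The slow path is quadratic and only suitable for section-sized text. Returning None rather than a weak match is deliberate: a deleted or heavily paraphrased passage should surface as an orphan, not attach to the wrong paragraph.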

1.4. Content-addressed anchors

Hash the paragraph's canonical text (strip trailing whitespace, normalize Unicode). Store sha256:a3f7.... An annotation indexed this way survives file moves and heading renames, as long as the paragraph text itself is stable. Dies on any edit to the target paragraph. Useful for locking a "needs citation" flag to a specific claim.
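A sketch of the hashing step, assuming NFC as the Unicode normalization form and per-line trailing-whitespace stripping as the canonicalization. The text above does not fix either choice, and paragraph_anchor is a hypothetical name.

```python
import hashlib
import unicodedata


def paragraph_anchor(paragraph: str) -> str:
    """Return a content-addressed anchor such as 'sha256:<64 hex digits>'."""
    # Canonicalize so that invisible differences do not change the hash:
    # composed vs. decomposed Unicode, trailing spaces, surrounding blank lines.
    normalized = unicodedata.normalize("NFC", paragraph)
    lines = [line.rstrip() for line in normalized.splitlines()]
    canonical = "\n".join(lines).strip("\n")
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"sha256:{digest}"
```

Any visible edit, even a corrected typo, produces a different anchor. That is the intended behavior for a flag locked to one specific wording of a claim.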

1.5. Semantic anchors

Reference a stable heading ID (:CUSTOM_ID: background-prior-work) rather than the heading text. In org-mode, the :CUSTOM_ID: property is set by the author and survives heading renames. It does not survive heading deletion, but deletion is detectable: a script can scan for orphaned annotation references.

The trade-off table:

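| Anchor type          | Survives rename | Survives reorder | Survives text edit | Survives deletion |
|----------------------+-----------------+------------------+--------------------+-------------------|
| Byte offset          | No              | No               | No                 | No                |
| XPath / heading path | No              | No               | Yes                | No                |
| Text quote + context | Yes             | Yes              | Partial            | No                |
| Content hash         | Yes             | Yes              | No                 | No                |
| :CUSTOM_ID:          | Yes             | Yes              | Yes                | No (detectable)   |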

No anchor survives deletion. The best you can do is detect it.

2. Org-Mode Native Approaches

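This section covers five annotation mechanisms: three that ship with Org (comment blocks, property drawers, inline footnotes) and two add-on packages (org-annotate-file, org-noter). Each makes a different trade-off between visibility, structure, and survival rate.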

2.1. #+begin_comment blocks

#+begin_comment
REVIEW 2026-06-07 jwalsh: Claims about XPath brittleness need a citation.
Hypothesis client source code would work. Or the W3C spec itself.
#+end_comment

Invisible on HTML/PDF export. Inline with the text they annotate. Version controlled. The weakness: they live inside the section they comment on, so heading renames do not break them, but section deletion takes them with it. Good for author-to-author notes that belong to a specific passage.

2.2. Property drawers

* Background
:PROPERTIES:
:CUSTOM_ID: background
:REVIEW:   2026-06-07 jwalsh: Needs citation for Kalir & Garcia claim
:STATUS:   needs-citation
:END:

Every heading can carry arbitrary key-value metadata. :REVIEW: and :STATUS: are author-defined. These survive heading renames — they sit below the heading text. They survive file reorganization. They do not survive heading deletion.

A useful pattern: combine :CUSTOM_ID: with :STATUS: needs-citation so that a search across the repo can find all outstanding annotation tasks:

grep -rn ':STATUS:.*needs-citation' site/research/
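The grep above finds matching lines but not the heading each flag belongs to. A small parser can recover that pairing. This is a sketch under simplifying assumptions: it handles only the drawer directly under a heading and ignores inherited properties; parse_headings and headings_with_status are hypothetical names.

```python
import re

HEADING = re.compile(r"^\*+\s+(.*)$")
PROPERTY = re.compile(r"^\s*:([A-Za-z0-9_+-]+):\s*(.*)$")


def parse_headings(org_text):
    """Return a list of {'title': ..., 'props': {...}} dicts, one per heading."""
    headings = []
    in_drawer = False
    for line in org_text.splitlines():
        match = HEADING.match(line)
        if match:
            headings.append({"title": match.group(1).strip(), "props": {}})
            in_drawer = False
            continue
        stripped = line.strip()
        if stripped == ":PROPERTIES:" and headings:
            in_drawer = True
        elif stripped == ":END:":
            in_drawer = False
        elif in_drawer:
            prop = PROPERTY.match(line)
            if prop:
                headings[-1]["props"][prop.group(1).upper()] = prop.group(2).strip()
    return headings


def headings_with_status(org_text, status="needs-citation"):
    """Return (custom_id, title) pairs for headings carrying the given :STATUS:."""
    return [(h["props"].get("CUSTOM_ID"), h["title"])
            for h in parse_headings(org_text)
            if h["props"].get("STATUS") == status]
```

Reporting the :CUSTOM_ID: alongside the title keeps the task list stable across heading renames.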

2.3. org-annotate-file

VERIFIED_AT: 2026-06-07T22:50Z
VERDICT: corrected
FINDING: no package called org-annotate exists on MELPA; correct name is org-annotate-file (EmacsWiki) or annotate.el (bastibe/annotate.el, MELPA)

org-annotate-file (EmacsWiki, not MELPA) maintains a separate annotations file. Each entry records the file path and heading text of its target. annotate.el (bastibe/annotate.el, MELPA) is a more actively maintained alternative that stores annotations keyed to file and line number.

Decoupled: the annotation file lives in a private overlay repo, separate from the document source. But both key on heading text or line position, not :CUSTOM_ID:. A heading rename breaks the link silently.

2.4. org-noter

VERIFIED_AT: 2026-06-07T22:50Z
VERDICT: correct
FINDING: exists on MELPA (org-noter/org-noter fork), description accurate

org-noter is designed for annotating PDFs and EPUBs via a parallel note buffer synchronized by position. It works for org-mode documents too, with the document in one window and the notes in another. Position sync is by heading for org files.

Works for reading sessions: open document, open notes buffer, write as you read. Does not persist review flags across editing sessions.

2.5. Inline footnotes as marginal commentary

The anchor problem has three sub-problems[fn:: Position, survival, and
discoverability. See also the W3C selector chain design.]: position encoding,
survival under edits, and discoverability.

Inline footnotes ([fn:: ...]) are rendered as footnotes on export and are searchable in the source. They sit immediately adjacent to the text they annotate. Not suitable for structured metadata, but useful for brief clarifications that belong in the published document.

3. Git Notes

Git notes (git notes add -m "...") attach metadata to commits without amending them. Notes live in a separate ref (refs/notes/commits) and appear in plain git log output; with --oneline or a custom format, pass --notes (the older --show-notes spelling is deprecated).

# Annotate a commit after the fact
git notes add -m "verified by skeptic-agent: 4 corrections" abc1234

# View notes inline with log
git log --notes --oneline

# Push notes to remote (not default!)
git push origin 'refs/notes/*'

The appeal: annotations are decoupled from the commit message, addable retroactively, and don't rewrite history. An agent could verify a commit's claims days later and attach findings without amending.

The problems are severe for this use case:

  • Temporal, not spatial. Notes key to commits, not to content. A note on commit abc1234 says nothing about heading jwt-hs256-claim — it says something about a point in time. If the heading moved across three commits, you need git blame to trace backward through each, checking for notes at every step. Property drawers move with the heading; git notes do not.
  • Invisible by default. git push does not push notes. git fetch does not fetch them. You must explicitly configure refs/notes/* in both directions. Most teams never do.
  • Lost on rebase. Notes reference commit SHAs. A rebase rewrites SHAs. Unless notes.rewriteRef is configured to copy notes onto the rewritten commits, notes on the old commits become orphans pointing at objects that no longer exist in the branch history.
  • No structure. A note is a free-text blob. No schema, no queryable fields, no lifecycle. You cannot grep for "all headings with verdict 'corrected'" across notes the way you can across property drawers.

Git notes solve a real problem (retroactive annotation of history) but fail the spatial anchoring requirement. For a living document workflow where headings move, split, and rename, property drawers and beads are stronger primitives. Git notes are best for annotating releases or deployments — temporal events that don't move.

4. Web Annotation (W3C Standard)

VERIFIED_AT: 2026-06-07T22:50Z
VERDICT: correct
FINDING: hypothesis/h confirmed (Python/Pyramid, BSD-2, 3.2k stars, active). API description accurate.

The W3C Web Annotation Data Model (Sanderson, Ciccarese, and Young 2017) defines a JSON-LD format for attaching annotations to any web resource. The core structure:

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "type": "Annotation",
  "target": {
    "source": "https://wal.sh/research/annotation-systems/",
    "selector": {
      "type": "TextQuoteSelector",
      "exact": "No anchor survives deletion.",
      "prefix": "table:\n\n",
      "suffix": " The best you can do"
    }
  },
  "body": {
    "type": "TextualBody",
    "value": "This claim should be qualified — content-addressed anchors detect deletion indirectly via missing hash."
  }
}

Hypothesis (hypothes.is) implements this standard with a browser extension and public/private annotation groups. It stores annotations server-side, decoupled from the document. The text quote selector adds prefix and suffix so that re-anchoring can fuzzy-match when exact text is unavailable.
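A sketch of how such a selector could be derived from a character span in the published text. The function names and the 32-character default context window are assumptions; the data model itself does not prescribe a context length.

```python
def text_quote_selector(text, start, end, context=32):
    """Build a TextQuoteSelector dict for text[start:end]."""
    if not 0 <= start < end <= len(text):
        raise ValueError("start/end must delimit a non-empty span inside text")
    return {
        "type": "TextQuoteSelector",
        "exact": text[start:end],
        # Context is clipped at the document boundaries.
        "prefix": text[max(0, start - context):start],
        "suffix": text[end:end + context],
    }


def annotation(source, selector, comment):
    """Wrap a selector and a comment in the structure shown above."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "target": {"source": source, "selector": selector},
        "body": {"type": "TextualBody", "value": comment},
    }
```

Longer context makes duplicate phrases easier to disambiguate but gives edits more surface to break; the right length depends on how repetitive the document is.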

Strengths of the W3C approach:

  • Decoupled from the document source. Annotations survive file moves if the canonical URL is stable.
  • Shareable. Multiple readers can annotate the same published URL.
  • Standard format. Tooling can consume annotations.json files.

Weaknesses:

  • Brittle to content rewrites. The TextQuoteSelector fails once the exact string is edited beyond what fuzzy matching on prefix and suffix can recover.
  • Requires a stable published URL. Draft org-mode docs have no such URL.
  • No integration with git history. You cannot ask "what did this annotation refer to at commit d1bb4e1?"

5. Scholarly Annotation Tradition

Kalir and Garcia (2021) map annotation's intellectual history across marginalia, classroom practice, and digital reading environments. They identify five annotation functions:

| Function       | Description                                | Research workflow equivalent      |
|----------------+--------------------------------------------+-----------------------------------|
| Enabling       | Comprehension aids, definitions, glosses   | Inline footnotes, #+begin_comment |
| Summarizing    | Condensing key points                      | Property :SUMMARY: on sections    |
| Assessing      | Quality judgments, flagging weak arguments | :STATUS: needs-citation           |
| Bridging       | Connecting to external sources             | [cite:@key] cross-references      |
| Problematizing | Raising questions, flagging contradictions | :REVIEW: drawers, comment blocks  |

The mapping is not perfect. Scholarly marginalia are often ephemeral; code review annotations are often task-tracking artifacts that want a lifecycle (open, addressed, closed). The bridging function maps cleanly onto citations. The problematizing function maps onto the "needs citation" and "check this claim" flags that appear throughout long-form research notes.

An annotation system for research documents needs two lifecycles. Enabling and summarizing annotations are editorial; they stay permanently. Assessing and problematizing annotations are transient task flags: they should be closeable without deleting the surrounding prose.

6. Design for Resilience

Four design choices determine how well an annotation system survives document evolution.

6.1. Stable identifiers for anchors

Assign :CUSTOM_ID: properties to every heading at creation time. Use kebab-case slugs derived from the heading text, but do not update them when the heading text changes. The ID is the stable handle; the heading text is the display label.

* The Anchor Problem
:PROPERTIES:
:CUSTOM_ID: anchor-problem
:END:

An annotation referencing anchor-problem survives renaming "The Anchor Problem" to "Anchoring Strategies" because the :CUSTOM_ID: does not change.
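A sketch of slug generation at heading-creation time. The function name custom_id_slug and the numeric suffix for collisions are assumptions. The IDs in this note are hand-chosen (anchor-problem drops the leading article), which this sketch does not attempt to reproduce.

```python
import re
import unicodedata


def custom_id_slug(heading, existing=()):
    """Derive a kebab-case :CUSTOM_ID: from heading text, avoiding collisions."""
    # Fold accented characters to ASCII, then collapse everything that is
    # not a lowercase letter or digit into single hyphens.
    ascii_text = (unicodedata.normalize("NFKD", heading)
                  .encode("ascii", "ignore")
                  .decode("ascii"))
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-") or "section"
    candidate, n = slug, 2
    while candidate in existing:
        candidate = f"{slug}-{n}"
        n += 1
    return candidate
```

The function should run once, when the heading is created. Re-running it after a rename would defeat the purpose of a stable handle.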

6.2. Separate annotations file

Keep annotations out of the document source when they are transient. A companion annotations.org file in the same directory can hold review notes keyed by :CUSTOM_ID::

* Annotation Index
** anchor-problem
:PROPERTIES:
:TARGET_ID: anchor-problem
:DATE:      2026-06-07
:AUTHOR:    jwalsh
:STATUS:    open
:END:

The table on anchor survival rates should note that fuzzy matching degrades
gracefully rather than failing hard. Add a note about Levenshtein distance
thresholds in the Hypothesis implementation.

** scholarly-annotation-tradition
:PROPERTIES:
:TARGET_ID: scholarly-annotation-tradition
:DATE:      2026-06-07
:AUTHOR:    jwalsh
:STATUS:    closed
:CLOSED:    2026-06-08
:END:

Cite Kalir & Garcia 2021. Done.
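Closing an entry in this layout is a drawer rewrite that leaves the note body untouched. A sketch, assuming the property names used above (:TARGET_ID:, :STATUS:, :CLOSED:); close_annotation is a hypothetical helper, not part of any Org package.

```python
import re

PROP = re.compile(r"^(\s*):([A-Z_]+):(\s*)(.*)$")


def close_annotation(org_text, target_id, closed_on):
    """Return org_text with the entry for target_id marked closed."""
    out, drawer = [], None
    for line in org_text.splitlines():
        stripped = line.strip()
        if stripped == ":PROPERTIES:":
            drawer = [line]                      # start buffering a drawer
        elif drawer is not None:
            drawer.append(line)
            if stripped == ":END:":
                out.extend(_close_if_target(drawer, target_id, closed_on))
                drawer = None
        else:
            out.append(line)
    if drawer is not None:                       # unterminated drawer: pass through
        out.extend(drawer)
    return "\n".join(out) + "\n"


def _close_if_target(drawer, target_id, closed_on):
    """Rewrite one buffered drawer if it belongs to target_id and is not closed."""
    props = {}
    for line in drawer:
        match = PROP.match(line)
        if match:
            props[match.group(2)] = match.group(4).strip()
    if props.get("TARGET_ID") != target_id or props.get("STATUS") == "closed":
        return drawer
    result = []
    for line in drawer:
        match = PROP.match(line)
        if match and match.group(2) == "STATUS":
            line = f"{match.group(1)}:STATUS:{match.group(3)}closed"
        if line.strip() == ":END:":
            result.append(f":CLOSED:    {closed_on}")
        result.append(line)
    return result
```

Because only the drawer changes, the resulting git diff shows exactly two lines per closed annotation: the status change and the added :CLOSED: date.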

6.3. Orphan detection

When a heading is deleted, annotations referencing its :CUSTOM_ID: become orphans. A short script detects them:

# Extract all CUSTOM_IDs from property lines in the document
# (anchored so prose that merely mentions :CUSTOM_ID: is ignored)
IDS=$(grep -E '^[[:space:]]*:CUSTOM_ID:' site/research/annotation-systems/index.org \
      | awk '{print $2}')

# Extract all TARGET_IDs from the annotations file
TARGETS=$(grep -E '^[[:space:]]*:TARGET_ID:' site/research/annotation-systems/annotations.org \
          | awk '{print $2}')

# Report targets with no matching ID (-F: literal match, -x: whole line)
for t in $TARGETS; do
  if ! printf '%s\n' "$IDS" | grep -Fqx -- "$t"; then
    echo "ORPHAN: $t"
  fi
done

Run this as a pre-publish hook or as part of gmake lint.

6.4. Version-aware annotations via git

Git blame surfaces the commit that last touched each line. Combined with a :CUSTOM_ID:-keyed annotation, you can ask: was this annotation created before or after the commit that rewrote the target paragraph?

# File-level history, following renames; per-line attribution needs git blame -L <start>,<end>
git log --follow --oneline -- site/research/annotation-systems/index.org | head -5

This is not automatic in any current tool. It requires correlating annotation :DATE: fields with commit timestamps. The tooling gap is real: no org-mode annotation system currently integrates with git blame output. That is the research gap worth closing.
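A sketch of that correlation. It assumes the caller already knows the line range of the target heading and that :DATE: holds an ISO date; the function names are hypothetical. It relies only on git blame --line-porcelain, whose per-line headers include a committer-time field in epoch seconds.

```python
import subprocess
from datetime import date, datetime, timezone


def blame_range(path, first_line, last_line):
    """Run git blame over a line range (requires a git checkout)."""
    return subprocess.run(
        ["git", "blame", "--line-porcelain",
         "-L", f"{first_line},{last_line}", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout


def last_touched(porcelain):
    """Latest committer date (UTC) among the blamed lines."""
    # Header lines start at column 0; content lines are tab-prefixed,
    # so document text can never be mistaken for a header.
    stamps = [int(line.split()[1])
              for line in porcelain.splitlines()
              if line.startswith("committer-time ")]
    if not stamps:
        raise ValueError("no committer-time fields found")
    return datetime.fromtimestamp(max(stamps), tz=timezone.utc).date()


def predates_rewrite(annotation_date, porcelain):
    """True if the target lines changed after the annotation was written."""
    return last_touched(porcelain) > annotation_date
```

The comparison is at day granularity because :DATE: carries no time, so an annotation and a rewrite on the same day are reported as not stale. A timestamped property would remove that ambiguity.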

7. Recommendation

For org-mode research documents that evolve over months:

  1. Add :CUSTOM_ID: to every heading at creation. Use a Yasnippet or a Makefile target to enforce this on gmake new-note.
  2. Use property drawers (:REVIEW:, :STATUS:, :CITATION-NEEDED:) for author-facing annotation that belongs to the document's editorial lifecycle. These are invisible on export and searchable with grep.
  3. Use #+begin_comment blocks for extended review notes that need paragraph-level context. These sit inline and survive heading renames.
  4. Maintain a companion annotations.org for annotations with their own lifecycle (open / addressed / closed). Set each entry's :TARGET_ID: to the :CUSTOM_ID: of the heading it annotates.
  5. Run orphan detection before every deploy. Flag missing targets as build warnings, not errors — a reorganization may have intentionally deleted an obsolete section.
  6. For published documents with stable URLs, consider Hypothesis groups for external reader commentary. Keep internal author annotations in the source tree; keep external reader annotations in Hypothesis. Do not conflate the two.

The W3C standard is the right format for external annotations on published content. :CUSTOM_ID: plus a companion annotations.org is the right approach for internal annotations on evolving source documents. The boundary between them is the git commit that marks a document as published.

8. LLM Verification as Annotation

REVIEW: 2026-06-07 — 3 research agents, findings integrated

When an LLM writes a research document and then runs verification agents against it, the verification trail is a kind of annotation. Tonight's session ran seven skeptic agents against the reversible pipeline transforms article. Four errors were corrected. The findings live in git commits. The process — what was checked, by whom, what the verdict was — lived only in the conversation transcript until we annotated the article with a Verification Audit section.

Three annotation strategies apply to this problem.

8.1. Property drawers as verification metadata

The zero-infrastructure option. An agent adds :VERIFIED_AT:, :VERIFIED_BY:, and :VERDICT: properties directly to the heading it reviewed:

* Janus: A Reversible Programming Language
:PROPERTIES:
:CUSTOM_ID: janus-lang
:VERIFIED_AT: 2026-06-07T22:00Z
:VERIFIED_BY: skeptic-agent
:VERDICT: corrected
:FINDING: date was 1982, corrected to 1986
:END:

Properties are invisible on HTML export, greppable, and tied to the heading. They survive renames because :CUSTOM_ID: is the stable anchor. Multi-line values use the + append syntax: :FINDING+: second observation.

The concern is clutter: a heading with eight review properties is visually heavy in the raw source. Emacs folds drawers by default, so the clutter is hidden in the editor, but it remains visible in git diffs and can feel noisy there.

8.2. Beads as verification annotations

The project already uses bd for issue tracking. A verification bead references a heading by :CUSTOM_ID: and carries the verdict:

bd create --title="verify: Janus date" \
  --description="Heading janus-lang. Original: 1982. Correct: 1986." \
  --type=task --priority=1
bd close <id> --reason="corrected inline, commit abc1234"

Clean-check beads record what was checked and found correct — the thing git blame cannot tell you:

bd create --title="verify: Toffoli gate semantics" \
  --description="Heading toffoli-gates. Verified correct." \
  --type=task --priority=3
bd close <id> --reason="verified correct, no changes"

The hybrid approach: property drawers for per-heading verdicts, beads for session-level scope ("7 agents, 4 corrections, 12 claims in scope").

8.3. Hypothesis API as external annotation layer

The Hypothesis annotation platform (hypothesis/h, Python, BSD-2) implements the W3C Web Annotation spec with a REST API. An LLM agent can POST annotations programmatically:

POST https://hypothes.is/api/annotations
Authorization: Bearer <token>

{
  "uri": "https://wal.sh/research/2026-reversible-pipeline-transforms/",
  "text": "Verified: reverse is anti-homomorphism, not homomorphism.",
  "tags": ["skeptic-review", "verdict:corrected", "agent:claude-opus-4-6"],
  "target": [{
    "selector": [{
      "type": "TextQuoteSelector",
      "exact": "reverse is a monoid homomorphism",
      "prefix": "Examples: upper, lower, rot13, ",
      "suffix": ". Non-homomorphism"
    }]
  }]
}

This stores the verification trail outside the document, queryable via GET /api/search?uri=<url>&tag=skeptic-review. No self-hosting required — use the public service with a private group.
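A sketch of how an agent might assemble that request with Python's standard library. The payload mirrors the fields shown above; the function names and the API_ROOT constant are assumptions, and nothing is sent until the request is passed to urllib.request.urlopen.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://hypothes.is/api"  # taken from the request shown above


def build_annotation(uri, exact, prefix, suffix, text, tags):
    """Assemble the JSON payload for one verification annotation."""
    return {
        "uri": uri,
        "text": text,
        "tags": list(tags),
        "target": [{
            "selector": [{
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            }],
        }],
    }


def annotation_request(payload, token):
    """Prepare (but do not send) the authenticated POST request."""
    return urllib.request.Request(
        f"{API_ROOT}/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def search_url(uri, tag):
    """URL for retrieving annotations on a page filtered by tag."""
    query = urllib.parse.urlencode({"uri": uri, "tag": tag})
    return f"{API_ROOT}/search?{query}"
```

Separating payload construction from sending lets the agent log or dry-run the annotation before anything reaches the service.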

The gap: the conversation transcript itself. What the agent was told, what searches it ran, what it rejected — none of this is captured by any annotation system. PaperTrail (Martin-Boyle et al. 2026) decomposes claims and maps verdicts at claim granularity, but found that showing provenance reduced trust because it revealed what was NOT checked.

8.4. Comparison for agent workflows

| Approach                 | Source disruption | Agent complexity | Queryable    | Survives rewrites     |
|--------------------------+-------------------+------------------+--------------+-----------------------|
| Property drawers         | Low (drawer only) | Trivial (regexp) | grep, org-ql | Yes (CUSTOM_ID)       |
| Beads                    | None              | Low (bd CLI)     | bd search    | Yes (CUSTOM_ID)       |
| Hypothesis API           | None              | Medium (HTTP)    | API search   | Partial (TextQuote)   |
| Separate annotations.org | None              | Low (file write) | grep         | Yes (CUSTOM_ID)       |
| org-remark marginalia    | None              | Medium (offsets) | org-ql       | Partial (text search) |

8.5. Prior art

  • ProvBook (Samuel and König-Ries 2018): Jupyter extension embedding W3C PROV-O provenance in cell metadata. Direct analogy to org property drawers.
  • GenProve: LLMs generating sentence-level provenance triples during text generation, not post-hoc.
  • PaperTrail (Martin-Boyle et al. 2026): claim-level verification with three verdicts (supported, unsupported, omitted). Published CHI 2026.
  • ai-blame: extends git blame with AI session provenance for code files.
  • Racket contracts (Findler and Felleisen 2002): blame-tracking contracts enforce pre/post conditions at module boundaries. A contract on a codec's decode function checks the round-trip property on every call, so the contract is itself the verification annotation, enforced at runtime rather than recorded after the fact. The Racket implementation of the transform tool uses involution/c and bijection/c contract combinators to catch the same bugs that skeptic agents found post-hoc.

9. References

Findler, Robert Bruce, and Matthias Felleisen. 2002. “Contracts for Higher-Order Functions.” In ICFP ’02: Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming. ACM. https://doi.org/10.1145/581478.581484.
Kalir, Remi H., and Antero Garcia. 2021. Annotation. The MIT Press Essential Knowledge Series. Cambridge, MA: The MIT Press.
Martin-Boyle, Anna, Cara A.C. Leckey, Martha C. Brown, and Harmanpreet Kaur. 2026. “PaperTrail: A Claim-Evidence Interface for Provenance in Scholarly Q&A.” In CHI ’26: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM. https://doi.org/10.1145/3772318.3791101.
Samuel, Sheeba, and Birgitta König-Ries. 2018. “ProvBook: Provenance-Based Semantic Enrichment of Interactive Notebooks for Reproducibility.” In ISWC 2018. Springer.
Sanderson, Robert, Paolo Ciccarese, and Benjamin Young. 2017. “Web Annotation Data Model.” W3C Recommendation. World Wide Web Consortium. https://www.w3.org/TR/annotation-model/.