Annotation Systems for Evolving Documents
Org-mode research documents reorganize: headings rename, sections split, files restructure. Review notes, citation flags, and verification metadata attached to those headings are lost unless their anchors survive the reorganization.
This note evaluates anchoring strategies for that problem.
1. The Anchor Problem
Five anchor families, ordered from most to least brittle.
1.1. Byte offsets
The simplest anchor: character position 4,273. Dies on the first edit upstream of the annotation. Not used in practice except in binary formats.
1.2. Structural selectors
XPath (//section[3]/p[1]), CSS selectors, heading paths
(Introduction > Background > Prior Work). Survive whitespace changes. Break on
any structural reorganization. The W3C Web Annotation Data Model
(Sanderson, Ciccarese, and Young 2017) hedges against this by letting a
target carry several selectors for the same content (for example an
XPathSelector, a TextPositionSelector, and a TextQuoteSelector) and by
letting one selector narrow another via refinedBy, so at least one
selector may still resolve when others fail.
1.3. Text quote selectors
Store the exact text (exact) plus a prefix and suffix for context. On
re-anchoring, search for the exact string; if missing, fuzzy-match against
candidates weighted by context overlap. The Hypothesis annotation client uses
this approach. It survives minor rewrites. It does not survive paragraph
deletion or heavy paraphrase.
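The re-anchoring steps above can be sketched in a few lines. This is a minimal illustration, not the Hypothesis client's algorithm: the function name `reanchor`, the 0.6 similarity threshold, and the use of `difflib` for fuzzy matching are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def _similarity(a, b):
    # Ratio in [0, 1]; two empty strings count as identical.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reanchor(text, exact, prefix="", suffix="", threshold=0.6):
    """Return the (start, end) span that best matches the selector, or None."""
    # Step 1: collect every exact match and score it by context overlap.
    candidates = []
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        before = text[max(0, start - len(prefix)):start]
        after = text[end:end + len(suffix)]
        score = _similarity(before, prefix) + _similarity(after, suffix)
        candidates.append((score, -start, end))  # -start: earliest wins ties
        start = text.find(exact, start + 1)
    if candidates:
        _, neg_start, end = max(candidates)
        return -neg_start, end

    # Step 2: no exact match. Slide a window the width of the quote over
    # the text and keep the most similar window (hypothetical threshold).
    width = len(exact)
    best_score, best_start = 0.0, None
    for i in range(max(1, len(text) - width + 1)):
        score = _similarity(text[i:i + width], exact)
        if score > best_score:
            best_score, best_start = score, i
    if best_start is None or best_score < threshold:
        return None
    return best_start, best_start + width
```

A `None` result is the paragraph-deletion or heavy-paraphrase case: the caller should report the annotation as orphaned rather than guess a position.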
1.4. Content-addressed anchors
Hash the paragraph's canonical text (strip trailing whitespace, normalize
Unicode). Store sha256:a3f7.... An annotation indexed this way survives file
moves and heading renames, as long as the paragraph text itself is stable. Dies
on any edit to the target paragraph. Useful for locking a "needs citation" flag
to a specific claim.
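A minimal sketch of such an anchor, assuming NFC as the Unicode normalization form and per-line trailing-whitespace stripping as the canonicalization (the text above does not fix either choice; `paragraph_anchor` is a hypothetical name):

```python
import hashlib
import unicodedata


def paragraph_anchor(paragraph):
    """Return a 'sha256:<hex>' anchor for the paragraph's canonical text."""
    # Assumption: NFC normalization, so composed and decomposed forms
    # of the same character hash identically.
    normalized = unicodedata.normalize("NFC", paragraph)
    # Strip trailing whitespace on every line, plus leading/trailing blank space.
    canonical = "\n".join(line.rstrip() for line in normalized.splitlines()).strip()
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Whitespace-only edits leave the anchor unchanged; any change to a word produces a different hash, which is exactly the "dies on any edit" behavior described above.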
1.5. Semantic anchors
Reference a stable heading ID (:CUSTOM_ID: background-prior-work) rather than
the heading text. In org-mode, the :CUSTOM_ID: property is set by the author
and survives heading renames. It does not survive heading deletion, but deletion
is detectable: a script can scan for orphaned annotation references.
The trade-off table:
| Anchor type | Survives rename | Survives reorder | Survives text edit | Survives deletion |
|---|---|---|---|---|
| Byte offset | No | No | No | No |
| XPath / heading path | No | No | Yes | No |
| Text quote + context | Yes | Yes | Partial | No |
| Content hash | Yes | Yes | No | No |
| :CUSTOM_ID: | Yes | Yes | Yes | No (detectable) |
No anchor survives deletion. The best you can do is detect it.
2. Org-Mode Native Approaches
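This section covers five annotation mechanisms. Comment blocks, property drawers, and inline footnotes ship with org-mode; org-annotate-file and org-noter are add-on packages. Each makes a different trade-off between visibility, structure, and survival rate.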
2.1. #+begin_comment blocks
#+begin_comment
REVIEW 2026-06-07 jwalsh: Claims about XPath brittleness need a citation.
Hypothesis client source code would work. Or the W3C spec itself.
#+end_comment
Invisible on HTML/PDF export. Inline with the text they annotate. Version controlled. The weakness: they live inside the section they comment on, so heading renames do not break them, but section deletion takes them with it. Good for author-to-author notes that belong to a specific passage.
2.2. Property drawers
* Background
:PROPERTIES:
:CUSTOM_ID: background
:REVIEW: 2026-06-07 jwalsh: Needs citation for Kalir & Garcia claim
:STATUS: needs-citation
:END:
Every heading can carry arbitrary key-value metadata. :REVIEW: and :STATUS:
are author-defined. These survive heading renames — they sit below the heading
text. They survive file reorganization. They do not survive heading deletion.
A useful pattern: combine :CUSTOM_ID: with :STATUS: needs-citation so that
a search across the repo can find all outstanding annotation tasks:
grep -rn ':STATUS:.*needs-citation' site/research/
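The grep above reports matching lines but not which heading owns them. A small parser can pair each :STATUS: with its :CUSTOM_ID:. The sketch below is an illustration only: the function names are hypothetical, and it assumes upper-case :PROPERTIES: / :END: markers and ignores tags, TODO keywords, and other org syntax.

```python
import re

HEADING = re.compile(r"^\*+ +(.*)$")
PROPERTY = re.compile(r"^\s*:([A-Za-z0-9_+-]+):\s*(.*)$")


def headings_with_properties(org_text):
    """Yield (heading_title, properties) for each heading in org_text."""
    title, props, in_drawer = None, {}, False
    for line in org_text.splitlines():
        heading = HEADING.match(line)
        if heading:
            if title is not None:
                yield title, props
            title, props, in_drawer = heading.group(1).strip(), {}, False
            continue
        stripped = line.strip()
        if stripped == ":PROPERTIES:":
            in_drawer = True
        elif stripped == ":END:":
            in_drawer = False
        elif in_drawer:
            prop = PROPERTY.match(line)
            if prop:
                props[prop.group(1)] = prop.group(2).strip()
    if title is not None:
        yield title, props


def open_tasks(org_text, status="needs-citation"):
    """Return [(custom_id, heading_title)] for headings with the given :STATUS:."""
    return [
        (props.get("CUSTOM_ID"), title)
        for title, props in headings_with_properties(org_text)
        if props.get("STATUS") == status
    ]
```

A heading without a :CUSTOM_ID: shows up with `None` as its ID, which doubles as a check that every flagged heading has a stable anchor.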
2.3. org-annotate-file
VERIFIED_AT: 2026-06-07T22:50Z VERDICT: corrected FINDING: no package called org-annotate exists on MELPA; correct name is org-annotate-file (EmacsWiki) or annotate.el (bastibe/annotate.el, MELPA)
org-annotate-file (EmacsWiki, not MELPA) maintains a separate annotations
file. Each entry records the file path and heading text of its target.
annotate.el (bastibe/annotate.el, MELPA) is a more actively maintained
alternative that stores annotations keyed to file and line number.
Decoupled: the annotation file lives in a private overlay repo, separate from
the document source. But both key on heading text or line position, not
:CUSTOM_ID:. A heading rename breaks the link
silently.
2.4. org-noter
VERIFIED_AT: 2026-06-07T22:50Z VERDICT: correct FINDING: exists on MELPA (org-noter/org-noter fork), description accurate
org-noter is designed for annotating PDFs and EPUBs via a parallel note
buffer synchronized by position. It works for org-mode documents too, with
the document in one window and the notes in another. Position sync is by
heading for org files.
Works for reading sessions: open document, open notes buffer, write as you read. Does not persist review flags across editing sessions.
2.5. Inline footnotes as marginal commentary
The anchor problem has three sub-problems[fn:: Position, survival, and discoverability. See also the W3C selector chain design.]: position encoding, survival under edits, and discoverability.
Inline footnotes ([fn:: ...]) are rendered as footnotes on export and are
searchable in the source. They sit immediately adjacent to the text they
annotate. Not suitable for structured metadata, but useful for brief
clarifications that belong in the published document.
3. Git Notes
Git notes (git notes add -m "...") attach metadata to commits without
amending them. Notes live in a separate ref (refs/notes/commits) and
appear in git log --show-notes.
# Annotate a commit after the fact
git notes add -m "verified by skeptic-agent: 4 corrections" abc1234
# View notes inline with log
git log --show-notes --oneline
# Push notes to remote (not default!); quote the refspec so the shell does not expand the *
git push origin 'refs/notes/*'
The appeal: annotations are decoupled from the commit message, addable retroactively, and don't rewrite history. An agent could verify a commit's claims days later and attach findings without amending.
The problems are severe for this use case:
- Temporal, not spatial. Notes key to commits, not to content. A note on commit abc1234 says nothing about heading jwt-hs256-claim; it says something about a point in time. If the heading moved across three commits, you need git blame to trace backward through each, checking for notes at every step. Property drawers move with the heading; git notes do not.
- Invisible by default. git push does not push notes. git fetch does not fetch them. You must explicitly configure refs/notes/* in both directions. Most teams never do.
- Lost on rebase. Notes reference commit SHAs. A rebase rewrites SHAs. Unless notes.rewriteRef is configured to copy them across, notes on the old commits become orphans pointing at objects that no longer exist in the branch history.
- No structure. A note is a free-text blob. No schema, no queryable fields, no lifecycle. You cannot grep for "all headings with verdict 'corrected'" across notes the way you can across property drawers.
Git notes solve a real problem (retroactive annotation of history) but fail the spatial anchoring requirement. For a living document workflow where headings move, split, and rename, property drawers and beads (issues in the bd tracker, section 8.2) are stronger primitives. Git notes are best for annotating releases or deployments: temporal events that don't move.
4. Web Annotation (W3C Standard)
VERIFIED_AT: 2026-06-07T22:50Z VERDICT: correct FINDING: hypothesis/h confirmed (Python/Pyramid, BSD-2, 3.2k stars, active). API description accurate.
The W3C Web Annotation Data Model (Sanderson, Ciccarese, and Young 2017) defines a JSON-LD format for attaching annotations to any web resource. The core structure:
{
"@context": "http://www.w3.org/ns/anno.jsonld",
"type": "Annotation",
"target": {
"source": "https://wal.sh/research/annotation-systems/",
"selector": {
"type": "TextQuoteSelector",
"exact": "No anchor survives deletion.",
"prefix": "table:\n\n",
"suffix": " The best you can do"
}
},
"body": {
"type": "TextualBody",
"value": "This claim should be qualified — content-addressed anchors detect deletion indirectly via missing hash."
}
}
Hypothesis (hypothes.is) implements this standard with a browser extension and public/private annotation groups. It stores annotations server-side, decoupled from the document. The text quote selector adds prefix and suffix so that re-anchoring can fuzzy-match when exact text is unavailable.
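Generating the structure above from a document and a quoted passage is mechanical. A minimal sketch, assuming a 32-character context window (the standard does not fix a length) and a hypothetical helper name:

```python
def text_quote_annotation(source_url, document_text, exact, note, context=32):
    """Build a Web Annotation dict whose target is a TextQuoteSelector."""
    # Uses the first occurrence of the quote; raises ValueError if it is absent.
    start = document_text.index(exact)
    end = start + len(exact)
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "target": {
            "source": source_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                # Assumption: up to `context` characters on each side.
                "prefix": document_text[max(0, start - context):start],
                "suffix": document_text[end:end + context],
            },
        },
        "body": {"type": "TextualBody", "value": note},
    }
```

Serialize the result with `json.dumps` to get a document shaped like the example above. A longer context window disambiguates repeated phrases better but makes the selector more sensitive to edits in neighboring text.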
Strengths of the W3C approach:
- Decoupled from the document source. Annotations survive file moves if the canonical URL is stable.
- Shareable. Multiple readers can annotate the same published URL.
- Standard format. Tooling can consume annotations.json files.
Weaknesses:
- Brittle to content rewrites. The TextQuoteSelector stops resolving once the quoted string is edited beyond what fuzzy matching on prefix and suffix can recover.
- Requires a stable published URL. Draft org-mode docs have no such URL.
- No integration with git history. You cannot ask "what did this annotation refer to at commit d1bb4e1?"
5. Scholarly Annotation Tradition
Kalir and Garcia (2021) map annotation's intellectual history across marginalia, classroom practice, and digital reading environments. They identify five annotation functions:
| Function | Description | Research workflow equivalent |
|---|---|---|
| Enabling | Comprehension aids, definitions, glosses | Inline footnotes, #+begin_comment |
| Summarizing | Condensing key points | Property :SUMMARY: on sections |
| Assessing | Quality judgments, flagging weak arguments | :STATUS: needs-citation |
| Bridging | Connecting to external sources | [cite:@key] cross-references |
| Problematizing | Raising questions, flagging contradictions | :REVIEW: drawers, comment blocks |
The mapping is not perfect. Scholarly marginalia are often ephemeral; code review annotations are often task-tracking artifacts that want a lifecycle (open, addressed, closed). The bridging function maps cleanly onto citations. The problematizing function maps onto the "needs citation" and "check this claim" flags that appear throughout long-form research notes.
An annotation system for research documents needs two lifecycles. Enabling and summarizing annotations are editorial; they stay permanently. Assessing and problematizing annotations are transient task flags; they should be closeable without deleting the surrounding prose.
6. Design for Resilience
Four design choices determine how well an annotation system survives document evolution.
6.1. Stable identifiers for anchors
Assign :CUSTOM_ID: properties to every heading at creation time. Use
kebab-case slugs derived from the heading text, but do not update them when
the heading text changes. The ID is the stable handle; the heading text is the
display label.
* The Anchor Problem
:PROPERTIES:
:CUSTOM_ID: anchor-problem
:END:
An annotation referencing anchor-problem survives renaming "The Anchor
Problem" to "Anchoring Strategies" because the :CUSTOM_ID: does not change.
6.2. Separate annotations file
Keep annotations out of the document source when they are transient. A
companion annotations.org file in the same directory can hold review notes
keyed by :CUSTOM_ID::
* Annotation Index
** anchor-problem
:PROPERTIES:
:TARGET_ID: anchor-problem
:DATE: 2026-06-07
:AUTHOR: jwalsh
:STATUS: open
:END:
The table on anchor survival rates should note that fuzzy matching degrades
gracefully rather than failing hard. Add a note about Levenshtein distance
thresholds in the Hypothesis implementation.
** scholarly-annotation-tradition
:PROPERTIES:
:TARGET_ID: scholarly-annotation-tradition
:DATE: 2026-06-07
:AUTHOR: jwalsh
:STATUS: closed
:CLOSED: 2026-06-08
:END:
Cite Kalir & Garcia 2021. Done.
6.3. Orphan detection
When a heading is deleted, annotations referencing its :CUSTOM_ID: become
orphans. A short script detects them:
# Extract all CUSTOM_IDs from a document
IDS=$(grep ':CUSTOM_ID:' site/research/annotation-systems/index.org \
  | awk '{print $2}')
# Extract all TARGET_IDs from the annotations file
TARGETS=$(grep ':TARGET_ID:' site/research/annotation-systems/annotations.org \
  | awk '{print $2}')
# Report targets with no matching ID
for t in $TARGETS; do
  # -x matches the whole line; -F treats the ID as a literal string, not a regex
  if ! echo "$IDS" | grep -qxF -- "$t"; then
    echo "ORPHAN: $t"
  fi
done
Run this as a pre-publish hook or as part of gmake lint.
6.4. Version-aware annotations via git
Git blame surfaces the commit that last touched each line. Combined with a
:CUSTOM_ID:-keyed annotation, you can ask: was this annotation created before
or after the commit that rewrote the target paragraph?
git log --follow --oneline -- site/research/annotation-systems/index.org | head -5
This is not automatic in any current tool. It requires correlating annotation
:DATE: fields with commit timestamps. The tooling gap is real: no org-mode
annotation system currently integrates with git blame output. That is the
research gap worth closing.
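The correlation itself is small. The sketch below is a hypothetical starting point, not an existing tool: it works at file granularity (narrowing to one heading would need something like git log -L), it compares calendar days only because :DATE: carries no time, and both function names are invented for the example.

```python
import subprocess
from datetime import date, datetime


def last_commit_time(path, repo="."):
    """Committer date of the newest commit touching `path`, or None."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%cI", "--", path],
        capture_output=True, text=True, check=True,
    )
    stamp = result.stdout.strip()
    # %cI is strict ISO 8601, e.g. 2026-06-07T22:00:00+00:00
    return datetime.fromisoformat(stamp) if stamp else None


def annotation_is_stale(annotation_date, commit_time):
    """True when the target changed on a later day than the annotation's :DATE:."""
    if commit_time is None:
        return False  # file has no history yet, so nothing postdates the note
    # Day granularity: a same-day rewrite after the note goes undetected.
    return commit_time.date() > date.fromisoformat(annotation_date)
```

An annotation flagged as stale is not necessarily wrong; it only means the text it referred to may have changed since the note was written and deserves a second look.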
7. Recommendation
For org-mode research documents that evolve over months:
- Add :CUSTOM_ID: to every heading at creation. Use a Yasnippet or a Makefile target to enforce this on gmake new-note.
- Use property drawers (:REVIEW:, :STATUS:, :CITATION-NEEDED:) for author-facing annotation that belongs to the document's editorial lifecycle. These are invisible on export and searchable with grep.
- Use #+begin_comment blocks for extended review notes that need paragraph-level context. These sit inline and survive heading renames.
- Maintain a companion annotations.org for annotations with their own lifecycle (open / addressed / closed). Set each :TARGET_ID: to the :CUSTOM_ID: value of the heading it targets.
- Run orphan detection before every deploy. Flag missing targets as build warnings, not errors; a reorganization may have intentionally deleted an obsolete section.
- For published documents with stable URLs, consider Hypothesis groups for external reader commentary. Keep internal author annotations in the source tree; keep external reader annotations in Hypothesis. Do not conflate the two.
The W3C standard is the right format for external annotations on published
content. :CUSTOM_ID: plus a companion annotations.org is the right approach
for internal annotations on evolving source documents. The boundary between
them is the git commit that marks a document as published.
8. LLM Verification as Annotation
REVIEW: 2026-06-07 — 3 research agents, findings integrated
When an LLM writes a research document and then runs verification agents against it, the verification trail is a kind of annotation. Tonight's session ran seven skeptic agents against the reversible pipeline transforms article. Four errors were corrected. The findings live in git commits. The process — what was checked, by whom, what the verdict was — lived only in the conversation transcript until we annotated the article with a Verification Audit section.
Three annotation strategies apply to this problem.
8.1. Property drawers as verification metadata
The zero-infrastructure option. An agent adds :VERIFIED_AT:,
:VERIFIED_BY:, and :VERDICT: properties directly to the heading it
reviewed:
* Janus: A Reversible Programming Language
:PROPERTIES:
:CUSTOM_ID: janus-lang
:VERIFIED_AT: 2026-06-07T22:00Z
:VERIFIED_BY: skeptic-agent
:VERDICT: corrected
:FINDING: date was 1982, corrected to 1986
:END:
Properties are invisible on HTML export, greppable, and tied to the heading.
They survive renames because :CUSTOM_ID: is the stable anchor. Multi-line
values use the + append syntax: :FINDING+: second observation.
The concern is clutter — a heading with eight review properties is visually heavy in the raw source. Emacs folds drawers by default, so it is invisible in the editor. In git diffs it is visible and can feel noisy.
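For an agent, writing these properties is a plain text edit. A minimal sketch, assuming the target heading already has a drawer containing its :CUSTOM_ID: and that markers are upper-case; `set_properties` is a hypothetical helper, and new lines are written without indentation.

```python
def set_properties(org_text, custom_id, new_props):
    """Return org_text with new_props merged into the drawer for custom_id."""
    lines = org_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        out.append(lines[i])
        if lines[i].strip() == ":PROPERTIES:":
            # Locate the :END: that closes this drawer.
            end = i + 1
            while end < len(lines) and lines[end].strip() != ":END:":
                end += 1
            drawer = lines[i + 1:end]
            if any(l.split() == [":CUSTOM_ID:", custom_id] for l in drawer):
                # Drop stale values for the keys being set, then append new ones.
                keys = {f":{key}:" for key in new_props}
                drawer = [l for l in drawer
                          if not (l.split() and l.split()[0] in keys)]
                drawer += [f":{key}: {value}" for key, value in new_props.items()]
            out.extend(drawer)
            i = end  # the next iteration copies the :END: line itself
            continue
        i += 1
    return "\n".join(out) + "\n"
```

Because existing values for the same keys are replaced rather than duplicated, re-running a verification pass keeps the drawer at one verdict per heading, which limits the clutter described above.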
8.2. Beads as verification annotations
The project already uses bd for issue tracking. A verification bead
references a heading by :CUSTOM_ID: and carries the verdict:
bd create --title="verify: Janus date" \
  --description="Heading janus-lang. Original: 1982. Correct: 1986." \
  --type=task --priority=1
bd close <id> --reason="corrected inline, commit abc1234"
Clean-check beads record what was checked and found correct — the thing git blame cannot tell you:
bd create --title="verify: Toffoli gate semantics" \
  --description="Heading toffoli-gates. Verified correct." \
  --type=task --priority=3
bd close <id> --reason="verified correct, no changes"
The hybrid approach: property drawers for per-heading verdicts, beads for session-level scope ("7 agents, 4 corrections, 12 claims in scope").
8.3. Hypothesis API as external annotation layer
The Hypothesis annotation platform (hypothesis/h, Python, BSD-2) implements
the W3C Web Annotation spec with a REST API. An LLM agent can POST
annotations programmatically:
POST https://hypothes.is/api/annotations
Authorization: Bearer <token>
{
"uri": "https://wal.sh/research/2026-reversible-pipeline-transforms/",
"text": "Verified: reverse is anti-homomorphism, not homomorphism.",
"tags": ["skeptic-review", "verdict:corrected", "agent:claude-opus-4-6"],
"target": [{
"selector": [{
"type": "TextQuoteSelector",
"exact": "reverse is a monoid homomorphism",
"prefix": "Examples: upper, lower, rot13, ",
"suffix": ". Non-homomorphism"
}]
}]
}
This stores the verification trail outside the document, queryable via
GET /api/search?uri=<url>&tag=skeptic-review. No self-hosting required
— use the public service with a private group.
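The request above can be assembled with the Python standard library. This sketch only builds the request object; it reuses the endpoint and payload fields shown above, while the function name and the Content-Type header are assumptions added for the example.

```python
import json
import urllib.request

API_URL = "https://hypothes.is/api/annotations"  # endpoint as given above


def build_annotation_request(token, uri, text, tags, exact, prefix="", suffix=""):
    """Return a POST request carrying one TextQuoteSelector annotation."""
    payload = {
        "uri": uri,
        "text": text,
        "tags": list(tags),
        "target": [{
            "selector": [{
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            }]
        }],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption: JSON body
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending lets an agent log or dry-run its verification trail before anything leaves the machine.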
The gap: the conversation transcript itself. What the agent was told, what searches it ran, what it rejected — none of this is captured by any annotation system. PaperTrail (Martin-Boyle et al. 2026) decomposes claims and maps verdicts at claim granularity, but found that showing provenance reduced trust because it revealed what was NOT checked.
8.4. Comparison for agent workflows
| Approach | Source disruption | Agent complexity | Queryable | Survives rewrites |
|---|---|---|---|---|
| Property drawers | Low (drawer only) | Trivial (regexp) | grep, org-ql | Yes (CUSTOM_ID) |
| Beads | None | Low (bd CLI) | bd search | Yes (CUSTOM_ID) |
| Hypothesis API | None | Medium (HTTP) | API search | Partial (TextQuote) |
| Separate annotations.org | None | Low (file write) | grep | Yes (CUSTOM_ID) |
| org-remark marginalia | None | Medium (offsets) | org-ql | Partial (text search) |
8.5. Prior art
- ProvBook (Samuel and König-Ries 2018): Jupyter extension embedding W3C PROV-O provenance in cell metadata. Direct analogy to org property drawers.
- GenProve: LLMs generating sentence-level provenance triples during text generation, not post-hoc.
- PaperTrail (Martin-Boyle et al. 2026): claim-level verification with three verdicts (supported, unsupported, omitted). Published CHI 2026.
- ai-blame: extends git blame with AI session provenance for code files.
- Racket contracts (Findler and Felleisen 2002): blame-tracking contracts enforce pre/post conditions at module boundaries. A contract on a codec's decode function checks the round-trip property on every call. The contract IS the verification annotation, enforced at runtime rather than recorded after the fact. The Racket implementation of the transform tool uses involution/c and bijection/c contract combinators to catch the same bugs that skeptic agents found post-hoc.
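The contract idea carries over to other languages. The following is a Python analogy, not the Racket code referred to above: a hypothetical `involution` decorator that checks f(f(x)) == x on every call, so a transform that is not its own inverse fails at the call site.

```python
import codecs
import functools


class ContractViolation(Exception):
    """Raised when a wrapped function breaks its declared property."""


def involution(func):
    """Wrap func so each call verifies that applying it twice returns the input."""
    @functools.wraps(func)
    def checked(x):
        result = func(x)
        if func(result) != x:
            raise ContractViolation(
                f"{func.__name__} is not an involution for input {x!r}"
            )
        return result
    return checked


@involution
def rot13(s):
    # ROT13 is its own inverse, so the check passes for every input.
    return codecs.encode(s, "rot_13")


@involution
def upper(s):
    # Not an involution: upper(upper("abc")) is "ABC", not "abc".
    return s.upper()
```

Calling `rot13("hello")` returns normally, while `upper("abc")` raises `ContractViolation`. The runtime check costs one extra call per invocation, which is the price of verifying on every call instead of recording a verdict once.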