Annotation Systems for Evolving Documents

1. The Anchor Problem
2. Org-Mode Native Approaches
3. Git Notes
4. Web Annotation (W3C Standard)
5. Scholarly Annotation Tradition
6. Design for Resilience
7. Recommendation
8. LLM Verification as Annotation
9. Operational Specification
10. References

Org-mode research documents reorganize: headings rename, sections split, files restructure. Review notes, citation flags, and verification metadata attached to those headings have to survive each reorganization; an anchor that cannot follow the change loses them.

This note evaluates anchoring strategies for that problem.

1. The Anchor Problem

Five anchor families, roughly from most to least brittle.

1.1. Byte offsets

The simplest anchor: character position 4,273. Dies on the first edit upstream of the annotation. Not used in practice except in binary formats.
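The failure is easy to see. This is a POSIX shell sketch: it records the byte offset of a phrase, prepends one line, and looks again. `offset_of` is a hypothetical helper. `grep -b -o` is assumed to report the offset of the match itself, as GNU grep does.

```shell
# Hypothetical helper: 0-based byte offset of the first match of a fixed string.
# With -o, grep -b reports the offset of the match itself, not of its line.
offset_of() {
  grep -boF -- "$1" "$2" | head -n 1 | cut -d: -f1
}

doc=$(mktemp)
printf 'Intro line.\nNo anchor survives deletion.\n' > "$doc"
before=$(offset_of 'No anchor survives' "$doc")

# Simulate an edit upstream of the annotation: prepend one line.
{ printf 'New first line.\n'; cat "$doc"; } > "$doc.new" && mv "$doc.new" "$doc"
after=$(offset_of 'No anchor survives' "$doc")

echo "before=$before after=$after"   # before=12 after=28
rm -f "$doc"
```

A stored offset of 12 now points at the "n" of "line" in the inserted text, and nothing flags the drift.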

1.2. Structural selectors

XPath (//section[3]/p[1]), CSS selectors, heading paths (Introduction > Background > Prior Work). Survive whitespace changes. Break on any structural reorganization. The W3C Web Annotation Data Model (Sanderson, Ciccarese, and Young 2017) softens this by letting a target carry several selectors for the same segment. Hypothesis, for example, stores a RangeSelector (built from XPath-style container paths), a TextPositionSelector, and a TextQuoteSelector side by side, so at least one may still resolve when the others fail.

1.3. Text quote selectors

Store the exact text (exact) plus a prefix and suffix for context. On re-anchoring, search for the exact string; if missing, fuzzy-match against candidates weighted by context overlap. The Hypothesis annotation client uses this approach. It survives minor rewrites. It does not survive paragraph deletion or heavy paraphrase.
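The exact-match half of that procedure can be sketched in shell with awk. The sketch finds every occurrence of the stored quote, scores each by whether the stored prefix and suffix sit next to it, and keeps the best.

- It is an illustration, not Hypothesis's code.
- The fuzzy-matching fallback is omitted.
- `reanchor` is a hypothetical name.
- Arguments pass through `awk -v`, which interprets backslash escapes, so quotes containing backslashes would need escaping first.

```shell
# Hypothetical helper: reanchor EXACT PREFIX SUFFIX FILE
# Prints the 0-based character offset of the best exact match, or -1.
# Exact matching only: a real client falls back to fuzzy matching.
reanchor() (
  exact=$1 prefix=$2 suffix=$3 file=$4
  awk -v exact="$exact" -v prefix="$prefix" -v suffix="$suffix" '
    { text = (NR == 1) ? $0 : text "\n" $0 }   # whole file as one string
    END {
      best = -1; best_score = -1; start = 1
      while ((i = index(substr(text, start), exact)) > 0) {
        pos = start + i - 1                      # 1-based position in text
        score = 0                                # +1 per matching context side
        if (prefix != "" && pos > length(prefix) &&
            substr(text, pos - length(prefix), length(prefix)) == prefix) score++
        if (suffix != "" &&
            substr(text, pos + length(exact), length(suffix)) == suffix) score++
        if (score > best_score) { best_score = score; best = pos - 1 }
        start = pos + 1                          # continue after this match
      }
      print best
    }
  ' "$file"
)
```

With the text `alpha beta gamma. alpha beta delta.`, `reanchor 'alpha beta' 'gamma. ' ' delta' FILE` picks the second occurrence (offset 18) because both context strings match there. Without context, the first occurrence wins.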

1.4. Content-addressed anchors

Hash the paragraph's canonical text (strip trailing whitespace, normalize Unicode). Store sha256:a3f7.... An annotation indexed this way survives file moves and heading renames, as long as the paragraph text itself is stable. Dies on any edit to the target paragraph. Useful for locking a "needs citation" flag to a specific claim.
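Here is a minimal sketch of the hash step in POSIX shell.

- It canonicalizes only whitespace: trailing spaces on each line, and trailing blank lines.
- The Unicode normalization mentioned above is left out, because it needs an extra tool (ICU's `uconv`, for example).
- `content_anchor` is a hypothetical name.
- `sha256sum` is assumed to be GNU coreutils; on macOS, `shasum -a 256` is the usual substitute.

```shell
# Hypothetical helper: read one paragraph on stdin, print its content anchor.
content_anchor() (
  # Strip trailing whitespace on every line; $( ) also drops trailing
  # newlines, so trailing blank lines do not change the hash.
  canonical=$(sed 's/[[:space:]]*$//')
  # NOTE: no Unicode normalization here (NFC would need e.g. uconv).
  printf '%s\n' "$canonical" | sha256sum | awk '{ print "sha256:" $1 }'
)
```

Whitespace-only edits leave the anchor unchanged. Any change to the words produces a new hash, which is the "dies on any edit" behaviour described above.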

1.5. Semantic anchors

Reference a stable heading ID (:CUSTOM_ID: background-prior-work) rather than the heading text. In org-mode, the :CUSTOM_ID: property is set by the author and survives heading renames. It does not survive heading deletion, but deletion is detectable: a script can scan for orphaned annotation references.
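Resolving a semantic anchor is a lookup, not a search. This shell/awk sketch prints the line and current text of the heading that owns a given :CUSTOM_ID:. It exits non-zero when the ID is gone, which is the detectable-deletion case.

- The name `heading_for_id` is hypothetical.
- It assumes the ID sits in the heading's property drawer as `:CUSTOM_ID: value`.

```shell
# Hypothetical helper: heading_for_id ID FILE
# Prints "LINE<TAB>HEADING TEXT" for the heading that owns :CUSTOM_ID: ID.
# Exits 1 if no heading carries the ID (the annotation target was deleted).
heading_for_id() {
  awk -v id="$1" '
    /^\*+ / { h = $0; sub(/^\*+ +/, "", h); hl = NR }   # latest heading seen
    $1 == ":CUSTOM_ID:" && $2 == id { print hl "\t" h; found = 1; exit }
    END { if (!found) exit 1 }
  ' "$2"
}
```

Suppose the heading is renamed from "The Anchor Problem" to "Anchoring Strategies". The lookup still succeeds, because only the display text changed.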

The trade-off table:

Anchor type	Survives rename	Survives reorder	Survives text edit	Survives deletion
Byte offset	No	No	No	No
XPath / heading path	No	No	Yes	No
Text quote + context	Yes	Yes	Partial	No
Content hash	Yes	Yes	No	No
:CUSTOM_ID:	Yes	Yes	Yes	No (detectable)

No anchor survives deletion. The best you can do is detect it.

2. Org-Mode Native Approaches

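This section covers five annotation mechanisms: three built into org-mode (comment blocks, property drawers, and inline footnotes) and two third-party packages (org-annotate-file or annotate.el, and org-noter). Each makes a different trade-off between export visibility, data structure, and survival under reorganization.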

2.1. `#+begin_comment` blocks

#+begin_comment
REVIEW 2026-06-07 jwalsh: Claims about XPath brittleness need a citation.
Hypothesis client source code would work. Or the W3C spec itself.
#+end_comment

Invisible on HTML/PDF export. Inline with the text they annotate. Version controlled. Because they live inside the section they comment on, heading renames do not break them; the weakness is that section deletion takes them with it. Good for author-to-author notes that belong to a specific passage.
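Because comment blocks are plain text, a short awk pass can list them across files with their location. This is a sketch with a hypothetical helper, `list_comments`. Org block keywords are case-insensitive, so it matches both `#+begin_comment` and `#+BEGIN_COMMENT`.

```shell
# Hypothetical helper: list_comments FILE...
# Prints "FILE:LINE:" for each comment block, then its body indented.
list_comments() {
  awk '
    tolower($0) ~ /^[ \t]*#\+begin_comment/ { inside = 1; print FILENAME ":" FNR ":"; next }
    tolower($0) ~ /^[ \t]*#\+end_comment/   { inside = 0; next }
    inside                                  { print "    " $0 }
  ' "$@"
}
```

For example, `list_comments site/research/*/index.org` gives a review queue drawn from every note.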

2.2. Property drawers

* Background
:PROPERTIES:
:CUSTOM_ID: background
:REVIEW:   2026-06-07 jwalsh: Needs citation for Kalir & Garcia claim
:STATUS:   needs-citation
:END:

Every heading can carry arbitrary key-value metadata. :REVIEW: and :STATUS: are author-defined. These survive heading renames – they sit below the heading text. They survive file reorganization. They do not survive heading deletion.

A useful pattern: combine :CUSTOM_ID: with :STATUS: needs-citation so that a search across the repo can find all outstanding annotation tasks:

grep -rn ':STATUS:.*needs-citation' site/research/
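The grep above reports matching lines but not which heading they belong to. A slightly longer awk sketch prints the owning heading and its :CUSTOM_ID: for every drawer where a property has an exact value.

- The helper, `find_by_property`, is hypothetical.
- It assumes the drawer layout shown above: `:KEY: value` lines, closed by `:END:`.

```shell
# Hypothetical helper: find_by_property KEY VALUE FILE...
# Prints "FILE:LINE<TAB>CUSTOM_ID<TAB>HEADING" for each matching drawer.
find_by_property() (
  key=$1 value=$2
  shift 2
  awk -v key=":$key:" -v value="$value" '
    FNR == 1  { heading = ""; id = ""; hit = 0 }             # new file
    /^\*+ /   { heading = $0; sub(/^\*+ +/, "", heading); hline = FNR; id = ""; hit = 0 }
    $1 == ":CUSTOM_ID:" { id = $2 }
    $1 == key {
      v = $0
      sub(/^[ \t]*:[^:]+:[ \t]*/, "", v)                    # strip ":KEY:   "
      sub(/[ \t]+$/, "", v)                                 # strip trailing blanks
      if (v == value) hit = 1
    }
    $1 == ":END:" && hit { print FILENAME ":" hline "\t" id "\t" heading; hit = 0 }
  ' "$@"
)
```

The same helper answers the "all headings with a given :VERDICT:" query, for example `find_by_property VERDICT corrected site/research/*/index.org`.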

2.3. org-annotate-file

VERIFIED_AT: 2026-06-07T22:50Z
VERDICT: corrected
FINDING: no package called org-annotate exists on MELPA; correct name is org-annotate-file (EmacsWiki) or annotate.el (bastibe/annotate.el, MELPA)

org-annotate-file (EmacsWiki, not MELPA) maintains a separate annotations file. Each entry records the file path and heading text of its target. annotate.el (bastibe/annotate.el, MELPA) is a more actively maintained alternative that stores annotations keyed to file and line number.

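Decoupled: the annotation file can live in a private overlay repo, separate from the document source. But both packages key on heading text or line position, not :CUSTOM_ID:. Any edit that changes the key breaks the link silently: a heading rename for heading-text keys, or lines inserted upstream for line-position keys.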

2.4. org-noter

VERIFIED_AT: 2026-06-12T00:00Z
VERDICT: corrected
FINDING: org-noter annotates RENDERED formats (PDF/EPUB/DJVU via DocView, PDF Tools, nov.el); the org file holds the notes, not a document org-noter annotates. Prior "works for org-mode documents, sync by heading" was wrong.

org-noter annotates rendered documents – PDF, EPUB, and similar formats via DocView, PDF Tools, or nov.el – keeping notes in a parallel org buffer synchronized by position in the document. The org file is where the notes live; it is not itself a document org-noter annotates. There is no mode for annotating an org-mode research note against itself.

Works for reading sessions over a PDF or EPUB: open the document, open the notes buffer, write as you read. It is not a mechanism for review flags on an org document.

2.5. Inline footnotes as marginal commentary

The anchor problem has three sub-problems[fn:: Position, survival, and
discoverability. See also the W3C selector chain design.]: position encoding,
survival under edits, and discoverability.

Inline footnotes ([fn:: ...]) are rendered as footnotes on export and are searchable in the source. They sit immediately adjacent to the text they annotate. Not suitable for structured metadata, but useful for brief clarifications that belong in the published document.

3. Git Notes

Git notes (git notes add -m "...") attach metadata to commits without amending them. Notes live in a separate ref (refs/notes/commits) and appear in git log --show-notes.

# Annotate a commit after the fact
git notes add -m "verified by skeptic-agent: 4 corrections" abc1234

# View notes inline with log
git log --show-notes --oneline

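# Push notes to the remote (not done by default); quote the refspec so
# the shell does not try to glob-expand it
git push origin 'refs/notes/*'

# Fetch notes from the remote (also not done by default)
git fetch origin 'refs/notes/*:refs/notes/*'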

The appeal: annotations are decoupled from the commit message, addable retroactively, and don't rewrite history. An agent could verify a commit's claims days later and attach findings without amending.

The problems are severe for this use case:

Temporal, not spatial. Notes key to commits, not to content. A note on commit abc1234 says nothing about heading jwt-hs256-claim – it says something about a point in time. If the heading moved across three commits, you need git blame to trace backward through each, checking for notes at every step. Property drawers move with the heading; git notes do not.
Invisible by default. git push does not push notes. git fetch does not fetch them. You must explicitly configure refs/notes/* in both directions. Most teams never do.
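Lost on rebase by default. Notes reference commit SHAs, and a rebase rewrites SHAs. Unless notes.rewriteRef is configured (for example to refs/notes/commits), git does not copy notes to the rewritten commits. Notes on the old commits then become orphans pointing at objects that no longer exist in the branch history.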
No structure. A note is a free-text blob. No schema, no queryable fields, no lifecycle. You cannot grep for "all headings with verdict 'corrected'" across notes the way you can across property drawers.
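The rebase problem is easy to reproduce with an amend, which rewrites the SHA in the same way. This sketch runs in a throwaway repository and makes two points:

- By default, a note added before `git commit --amend` does not follow the new commit.
- It does follow once `notes.rewriteRef` names the notes ref to copy.

The identity values are placeholders.

```shell
# Placeholder identity so commits and notes can be created non-interactively.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo" && git init -q
echo 'claim' > doc.org && git add doc.org && git commit -q -m 'add claim'

# Default: the note stays on the old SHA; the amended commit has none.
git notes add -m 'verified by skeptic-agent' HEAD
git commit -q --amend -m 'add claim (reworded)'
default=$(git notes show HEAD 2>/dev/null || echo '(no note)')

# With notes.rewriteRef set, amend (and rebase) copy the note forward.
git notes add -m 'verified again' HEAD
git -c notes.rewriteRef=refs/notes/commits commit -q --amend -m 'add claim (final)'
configured=$(git notes show HEAD 2>/dev/null || echo '(no note)')

echo "default: $default"          # default: (no note)
echo "configured: $configured"    # configured: verified again
```

Running `git config notes.rewriteRef refs/notes/commits` in a repository makes the copy the default there. It does nothing for notes that were already orphaned.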

Git notes solve a real problem (retroactive annotation of history) but fail the spatial anchoring requirement. For a living document workflow where headings move, split, and rename, property drawers and beads (the bd issue tracker, section 8.2) are stronger primitives. Git notes are best for annotating releases or deployments: temporal events that don't move.

4. Web Annotation (W3C Standard)

VERIFIED_AT: 2026-06-07T22:50Z
VERDICT: correct
FINDING: hypothesis/h confirmed (Python/Pyramid, BSD-2, 3.2k stars, active). API description accurate.

The W3C Web Annotation Data Model (Sanderson, Ciccarese, and Young 2017) defines a JSON-LD format for attaching annotations to any web resource. The core structure:

{
  "@context": "http://www.w3.org/ns/anno.jsonld",
  "type": "Annotation",
  "target": {
    "source": "https://wal.sh/research/annotation-systems/",
    "selector": {
      "type": "TextQuoteSelector",
      "exact": "No anchor survives deletion.",
      "prefix": "table:\n\n",
      "suffix": " The best you can do"
    }
  },
  "body": {
    "type": "TextualBody",
    "value": "This claim should be qualified -- content-addressed anchors detect deletion indirectly via missing hash."
  }
}

Hypothesis (hypothes.is) implements this standard with a browser extension and public/private annotation groups. It stores annotations server-side, decoupled from the document. The text quote selector adds prefix and suffix so that re-anchoring can fuzzy-match when exact text is unavailable.
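The selector can be built from the source text itself. This shell/awk sketch takes up to N characters on each side of the first occurrence of the quote and prints a TextQuoteSelector object as JSON.

- The helper, `quote_selector`, is hypothetical.
- N defaults to 32. That is an arbitrary choice for this sketch, not a value from the W3C model.
- Characters are escaped by hand, so the output stays valid JSON when the context spans newlines.
- As with any `awk -v` argument, backslashes in the quote are interpreted as escapes.

```shell
# Hypothetical helper: quote_selector EXACT FILE [N]
# Prints {"type": "TextQuoteSelector", "exact": ..., "prefix": ..., "suffix": ...}
# using up to N characters of context on each side of the first match.
quote_selector() {
  awk -v exact="$1" -v n="${3:-32}" '
    function esc(s,   out, i, c) {            # minimal JSON string escaping
      out = ""
      for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        if (c == "\\")      out = out "\\\\"
        else if (c == "\"") out = out "\\\""
        else if (c == "\n") out = out "\\n"
        else if (c == "\t") out = out "\\t"
        else                out = out c
      }
      return out
    }
    { text = (NR == 1) ? $0 : text "\n" $0 }  # whole file as one string
    END {
      pos = index(text, exact)
      if (pos == 0) exit 1                    # quote not found
      start = pos - n; if (start < 1) start = 1
      printf "{\"type\": \"TextQuoteSelector\", \"exact\": \"%s\", \"prefix\": \"%s\", \"suffix\": \"%s\"}\n",
        esc(exact), esc(substr(text, start, pos - start)), esc(substr(text, pos + length(exact), n))
    }
  ' "$2"
}
```

The output has the shape of the `selector` object in the annotation shown above.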

Strengths of the W3C approach:

Decoupled from the document source. Annotations survive file moves if the canonical URL is stable.
Shareable. Multiple readers can annotate the same published URL.
Standard format. Tooling can consume annotations.json files.

Weaknesses:

Brittle to content rewrites. An exact TextQuoteSelector match fails once the quoted string is edited; fuzzy re-anchoring recovers small edits but not heavy paraphrase.
Requires a stable published URL. Draft org-mode docs have no such URL.
No integration with git history. You cannot ask "what did this annotation refer to at commit d1bb4e1?"

5. Scholarly Annotation Tradition

Kalir and Garcia (2021) map annotation's intellectual history across marginalia, classroom practice, and digital reading environments. They identify five annotation functions:

Function	Description	Research workflow equivalent
Enabling	Comprehension aids, definitions, glosses	Inline footnotes, `#+begin_comment`
Summarizing	Condensing key points	Property `:SUMMARY:` on sections
Assessing	Quality judgments, flagging weak arguments	`:STATUS: needs-citation`
Bridging	Connecting to external sources	`[cite:@key]` cross-references
Problematizing	Raising questions, flagging contradictions	`:REVIEW:` drawers, comment blocks

The mapping is not perfect. Scholarly marginalia are often ephemeral; code review annotations are often task-tracking artifacts that want a lifecycle (open, addressed, closed). The bridging function maps cleanly onto citations. The assessing and problematizing functions map onto the "needs citation" and "check this claim" flags that appear throughout long-form research notes.

Enabling and summarizing annotations are editorial and permanent. Assessing and problematizing annotations are transient task flags: they should be closeable without deleting the surrounding prose.

6. Design for Resilience

Four design choices determine how well an annotation system survives document evolution.

6.1. Stable identifiers for anchors

Assign :CUSTOM_ID: properties to every heading at creation time. Use kebab-case slugs derived from the heading text, but do not update them when the heading text changes. The ID is the stable handle; the heading text is the display label.

* The Anchor Problem
:PROPERTIES:
:CUSTOM_ID: anchor-problem
:END:

An annotation referencing anchor-problem survives renaming "The Anchor Problem" to "Anchoring Strategies" because the :CUSTOM_ID: does not change.
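Here is a sketch of the creation-time slug in POSIX shell, using a hypothetical `slugify` helper.

- It is ASCII-only, so accented letters become separators.
- It keeps leading articles, so "The Anchor Problem" yields `the-anchor-problem`.
- The example above uses a hand-shortened `anchor-problem`, which works just as well as long as it never changes afterwards.

```shell
# Hypothetical helper: slugify "Heading Text" -> heading-text
# ASCII-only: each run of characters outside a-z/0-9 becomes a single '-'.
slugify() {
  printf '%s\n' "$1" |
    LC_ALL=C tr '[:upper:]' '[:lower:]' |
    LC_ALL=C sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}
```

For example, `slugify 'Web Annotation (W3C Standard)'` gives `web-annotation-w3c-standard`.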

6.2. Separate annotations file

Keep annotations out of the document source when they are transient. A companion annotations.org file in the same directory can hold review notes keyed by the target heading's :CUSTOM_ID: value:

* Annotation Index
** anchor-problem
:PROPERTIES:
:TARGET_ID: anchor-problem
:DATE:      2026-06-07
:AUTHOR:    jwalsh
:STATUS:    open
:END:

The table on anchor survival rates should note that fuzzy matching degrades
gracefully rather than failing hard. Add a note about Levenshtein distance
thresholds in the Hypothesis implementation.

** scholarly-annotation-tradition
:PROPERTIES:
:TARGET_ID: scholarly-annotation-tradition
:DATE:      2026-06-07
:AUTHOR:    jwalsh
:STATUS:    closed
:CLOSED:    2026-06-08
:END:

Cite Kalir & Garcia 2021. Done.
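With the layout above, the open queue takes one awk pass. This sketch, using a hypothetical `open_annotations` helper, reads the :TARGET_ID:, :DATE:, :AUTHOR: and :STATUS: keys from each entry's drawer. It prints the entries whose status is `open`.

```shell
# Hypothetical helper: open_annotations FILE
# Prints "TARGET_ID<TAB>DATE<TAB>AUTHOR" for every entry whose :STATUS: is open.
open_annotations() {
  awk '
    /^\*+ /             { target = ""; date = ""; author = ""; status = "" }
    $1 == ":TARGET_ID:" { target = $2 }
    $1 == ":DATE:"      { date = $2 }
    $1 == ":AUTHOR:"    { author = $2 }
    $1 == ":STATUS:"    { status = $2 }
    $1 == ":END:" && status == "open" { print target "\t" date "\t" author; status = "" }
  ' "$1"
}
```

Run against the example file, it reports only the anchor-problem entry; the closed one is skipped.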

6.3. Orphan detection

When a heading is deleted, annotations referencing its :CUSTOM_ID: become orphans. A short script detects them:

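# Paths for this note; adjust per document
DOC=site/research/annotation-systems/index.org
ANN=site/research/annotation-systems/annotations.org

# Extract all CUSTOM_IDs from the document. Anchoring the pattern to the
# start of the line skips prose that merely mentions :CUSTOM_ID:
IDS=$(grep -E '^[[:space:]]*:CUSTOM_ID:' "$DOC" | awk '{print $2}')

# Extract all TARGET_IDs from the annotations file
TARGETS=$(grep -E '^[[:space:]]*:TARGET_ID:' "$ANN" | awk '{print $2}')

# Report targets with no matching ID (-x: whole line, -F: no regex)
for t in $TARGETS; do
  if ! printf '%s\n' "$IDS" | grep -qxF -- "$t"; then
    echo "ORPHAN: $t"
  fi
done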

Run this as a pre-publish hook or as part of gmake lint.

6.4. Version-aware annotations via git

Git blame surfaces the commit that last touched each line. Combined with a :CUSTOM_ID:-keyed annotation, you can ask: was this annotation created before or after the commit that rewrote the target paragraph?

git log --follow --oneline -- site/research/annotation-systems/index.org | head -5

This is not automatic in any current tool. It requires correlating annotation :DATE: fields with commit timestamps. No org-mode annotation system currently integrates with git blame output.
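The correlation can be approximated with git blame. This is a sketch, not an existing tool, in three steps:

1. Find the line range of the section that owns a :CUSTOM_ID:, up to the next heading of any level.
2. Take the newest committer time git blame reports for that range.
3. Compare it with the annotation's :DATE:.

The function names are hypothetical. `date -d` assumes GNU date, and the file must be tracked in the current repository.

```shell
# Hypothetical helper: section_range FILE ID -> "START,END" (1-based lines)
section_range() {
  awk -v id="$2" '
    /^\*+ / { if (found) { print start "," (NR - 1); done = 1; exit }; h = NR }
    $1 == ":CUSTOM_ID:" && $2 == id { found = 1; start = h }
    END { if (found && !done) print start "," NR }
  ' "$1"
}

# Hypothetical helper: newest committer time (epoch seconds) in the section.
section_last_changed() (
  range=$(section_range "$1" "$2")
  [ -n "$range" ] || exit 1                  # ID not found: an orphan
  git blame --line-porcelain -L "$range" -- "$1" |
    awk '$1 == "committer-time" { print $2 }' | sort -n | tail -n 1
)

# Hypothetical helper: exit 0 if the section changed after DATE (YYYY-MM-DD, UTC).
annotation_is_stale() (
  changed=$(section_last_changed "$1" "$2") || exit 1
  annotated=$(date -u -d "$3" +%s) || exit 1
  [ "$changed" -gt "$annotated" ]
)
```

For example, `annotation_is_stale index.org anchor-problem 2026-06-07` succeeds only if a commit after that date touched the section.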

7. Recommendation

For org-mode research documents that evolve over months:

Add :CUSTOM_ID: to every heading at creation. Use a Yasnippet or a Makefile target to enforce this on gmake new-note.
Use property drawers (:REVIEW:, :STATUS:, :CITATION-NEEDED:) for author-facing annotation that belongs to the document's editorial lifecycle. These are invisible on export and searchable with grep.
Use #+begin_comment blocks for extended review notes that need paragraph-level context. These sit inline and survive heading renames.
Maintain a companion annotations.org for annotations with their own lifecycle (open / addressed / closed). Set each entry's :TARGET_ID: to the :CUSTOM_ID: of the heading it annotates.
Run orphan detection before every deploy. Flag missing targets as build warnings, not errors – a reorganization may have intentionally deleted an obsolete section.
For published documents with stable URLs, consider Hypothesis groups for external reader commentary. Keep internal author annotations in the source tree; keep external reader annotations in Hypothesis. Do not conflate the two.
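The first recommendation is easy to enforce mechanically. This lint sketch, using a hypothetical `missing_custom_ids` helper, lists headings with no :CUSTOM_ID: beneath them. It exits non-zero, so a Makefile target can fail on it. Any `:CUSTOM_ID:` line before the next heading counts as belonging to that heading.

```shell
# Hypothetical helper: missing_custom_ids FILE...
# Prints "FILE:LINE: HEADING" for headings without a :CUSTOM_ID: and exits 1
# if there are any.
missing_custom_ids() {
  awk '
    function report() {
      if (hline && !has_id) { print hfile ":" hline ": " heading; bad = 1 }
    }
    FNR == 1 { report(); hline = 0 }          # flush the previous file
    /^\*+ /  { report(); heading = $0; hfile = FILENAME; hline = FNR; has_id = 0; next }
    $1 == ":CUSTOM_ID:" { has_id = 1 }
    END { report(); exit bad }
  ' "$@"
}
```

Wired into a `lint` target, the non-zero exit stops the build until every heading has an ID.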

The W3C standard is the right format for external annotations on published content. :CUSTOM_ID: plus a companion annotations.org is the right approach for internal annotations on evolving source documents. The boundary between them is the git commit that marks a document as published.

8. LLM Verification as Annotation

REVIEW: 2026-06-07 -- 3 research agents, findings integrated

When an LLM writes a research document and then runs verification agents against it, the verification trail is a kind of annotation. One session ran seven skeptic agents against the reversible pipeline transforms article, and four errors were corrected. The findings live in git commits. The process (what was checked, by whom, and with what verdict) lived only in the conversation transcript until the article gained a Verification Audit section.

Three annotation strategies apply to this problem.

8.1. Property drawers as verification metadata

The zero-infrastructure option. An agent adds :VERIFIED_AT:, :VERIFIED_BY:, and :VERDICT: properties directly to the heading it reviewed:

* Janus: A Reversible Programming Language
:PROPERTIES:
:CUSTOM_ID: janus-lang
:VERIFIED_AT: 2026-06-07T22:00Z
:VERIFIED_BY: skeptic-agent
:VERDICT: corrected
:FINDING: date was 1982, corrected to 1986
:END:

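Properties are invisible on HTML export, greppable, and tied to the heading. The drawer moves with the heading, and :CUSTOM_ID: gives outside references a key that survives renames. Accumulating values use the + append syntax (:FINDING+: second observation). Org joins the appended text to the existing value with a space; it does not start a new line.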

The cost is clutter: a heading with eight review properties is visually heavy in the raw source. Org folds drawers by default, so the clutter stays hidden in the editor, but git diffs show all of it and can feel noisy.
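For an agent, writing such a drawer entry takes a single awk pass. This sketch uses a hypothetical `add_property` helper. It appends `:KEY: value` just before the `:END:` of the drawer holding a given :CUSTOM_ID: and prints the result to stdout.

- It does not replace an existing key.
- It assumes upper-case `:PROPERTIES:` and `:END:` markers.
- Values go through `awk -v`, which interprets backslash escapes.

```shell
# Hypothetical helper: add_property FILE CUSTOM_ID KEY VALUE  (writes to stdout)
add_property() {
  awk -v id="$2" -v key="$3" -v val="$4" '
    $1 == ":PROPERTIES:"                         { in_drawer = 1; hit = 0 }
    in_drawer && $1 == ":CUSTOM_ID:" && $2 == id { hit = 1 }
    in_drawer && $1 == ":END:" {
      if (hit) printf ":%s: %s\n", key, val      # insert just before :END:
      in_drawer = 0
    }
    { print }
  ' "$1"
}
```

Typical use writes to a temporary file and swaps it in: `add_property index.org janus-lang VERDICT corrected > index.org.new && mv index.org.new index.org`.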

8.2. Beads as verification annotations

The project already uses bd for issue tracking. A verification bead references a heading by :CUSTOM_ID: and carries the verdict:

bd create --title="verify: Janus date" \
  --description="Heading janus-lang. Original: 1982. Correct: 1986." \
  --type=task --priority=1
bd close <id> --reason="corrected inline, commit abc1234"

Clean-check beads record what was checked and found correct – the thing git blame cannot tell you:

bd create --title="verify: Toffoli gate semantics" \
  --description="Heading toffoli-gates. Verified correct." \
  --type=task --priority=3
bd close <id> --reason="verified correct, no changes"

The hybrid approach: property drawers for per-heading verdicts, beads for session-level scope ("7 agents, 4 corrections, 12 claims in scope").

8.3. Hypothesis API as external annotation layer

The Hypothesis annotation platform (hypothesis/h, Python, BSD-2) implements the W3C Web Annotation spec with a REST API. An LLM agent can POST annotations programmatically:

POST https://hypothes.is/api/annotations
Authorization: Bearer <token>

{
  "uri": "https://wal.sh/research/2026-reversible-pipeline-transforms/",
  "text": "Verified: reverse is anti-homomorphism, not homomorphism.",
  "tags": ["skeptic-review", "verdict:corrected", "agent:claude-opus-4-6"],
  "target": [{
    "selector": [{
      "type": "TextQuoteSelector",
      "exact": "reverse is a monoid homomorphism",
      "prefix": "Examples: upper, lower, rot13, ",
      "suffix": ". Non-homomorphism"
    }]
  }]
}

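This stores the verification trail outside the document, queryable via GET /api/search?uri=<url>&tag=skeptic-review. No self-hosting is required: use the public service with a private group. Pass that group's ID in the request's "group" field; an annotation posted without one lands in the public group.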
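Building the search URL needs percent-encoding for the page URI. This shell sketch encodes everything outside the RFC 3986 unreserved set.

- The helpers `urlencode` and `hypothesis_search_url` are hypothetical.
- The endpoint and the `uri`/`tag` parameters are the ones quoted above.
- It encodes byte by byte and assumes the input contains no backslashes, since `awk -v` interprets escapes.

```shell
# Hypothetical helper: percent-encode every byte outside A-Z a-z 0-9 - . _ ~
urlencode() {
  LC_ALL=C awk -v s="$1" 'BEGIN {
    for (i = 1; i < 256; i++) ord[sprintf("%c", i)] = i   # byte -> code
    for (i = 1; i <= length(s); i++) {
      c = substr(s, i, 1)
      if (c ~ /[A-Za-z0-9._~-]/) printf "%s", c
      else                        printf "%%%02X", ord[c]
    }
  }'
}

# Hypothetical helper: search URL for annotations on URI carrying TAG.
hypothesis_search_url() {
  printf 'https://hypothes.is/api/search?uri=%s&tag=%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")"
}
```

Passing the result to `curl -s -H "Authorization: Bearer $HYPOTHESIS_TOKEN"` fetches the trail. The token variable name is a placeholder; the token is what lets the API return annotations from a private group.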

The gap: the conversation transcript itself. What the agent was told, what searches it ran, what it rejected – none of this is captured by any annotation system. PaperTrail (Martin-Boyle et al. 2026) decomposes claims and maps verdicts at claim granularity, but found that showing provenance reduced trust because it revealed what was not checked.

8.4. Comparison for agent workflows

Approach	Source disruption	Agent complexity	Queryable	Survives rewrites
Property drawers	Low (drawer only)	Trivial (regexp)	grep, org-ql	Yes (CUSTOM_ID)
Beads	None	Low (bd CLI)	bd search	Yes (CUSTOM_ID)
Hypothesis API	None	Medium (HTTP)	API search	Partial (TextQuote)
Separate annotations.org	None	Low (file write)	grep	Yes (CUSTOM_ID)
org-remark marginalia	None	Medium (offsets)	org-ql	Partial (text search)

8.5. Prior art

ProvBook (Samuel and König-Ries 2018): Jupyter extension embedding W3C PROV-O provenance in cell metadata. Direct analogy to org property drawers.
GenProve: LLMs generating sentence-level provenance triples during text generation, not post-hoc.
PaperTrail (Martin-Boyle et al. 2026): claim-level verification with three verdicts (supported, unsupported, omitted). Published CHI 2026.
ai-blame: extends git blame with AI session provenance for code files.
Racket contracts (Findler and Felleisen 2002): Findler and Felleisen's blame-tracking contracts enforce pre/post conditions at module boundaries. A contract on a codec's decode function checks the round-trip property on every call – an analogous role to verification annotations, but enforced continuously at runtime rather than recorded after the fact. The Racket implementation of the transform tool uses involution/c and bijection/c contract combinators to catch the same class of bugs that skeptic agents found post-hoc. The analogy is imperfect: contracts reject violations at module boundaries; annotations describe findings on document headings.
Contracts as annotations (You, Dimoulas, and Findler 2025): the authors recast higher-order contract systems as transition systems. In the process they observe that abstract annotations can represent both contracts and property-related information. One annotation mechanism, propagated through evaluation, carries the runtime check and the proof-meta together, composed "a la carte"; their example is Dimoulas' notion of ownership. This formally closes the gap the previous bullet calls "imperfect". A property drawer that carries both :VERDICT: (prose review) and :EXEC_VERDICT: (execution) on one heading makes the same move, one annotation carrier holding several verdict kinds, at the document layer rather than the value layer. The work is peripheral to a document-annotation note (it is PL metatheory, mechanized in Agda). But it names the duality this section gestures at, and it is the modern descendant of the Findler–Felleisen contracts cited above.

9. Operational Specification

The full specification – verdict vocabulary, drawer conventions, tooling, workflow, and gap proposals – is maintained separately:

10. References

Findler, Robert Bruce, and Matthias Felleisen. 2002. “Contracts for Higher-Order Functions.” In ICFP ’02: Proceedings of the Seventh ACM SIGPLAN International Conference on Functional Programming. ACM. https://doi.org/10.1145/581478.581484.

Kalir, Remi H., and Antero Garcia. 2021. Annotation. The MIT Press Essential Knowledge Series. Cambridge, MA: The MIT Press.

Martin-Boyle, Anna, Cara A.C. Leckey, Martha C. Brown, and Harmanpreet Kaur. 2026. “PaperTrail: A Claim-Evidence Interface for Grounding Provenance in LLM-Based Scholarly Q&A.” In CHI ’26: Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems. ACM. https://doi.org/10.1145/3772318.3791101.

Samuel, Sheeba, and Birgitta König-Ries. 2018. “ProvBook: Provenance-Based Semantic Enrichment of Interactive Notebooks for Reproducibility.” In ISWC 2018 Posters & Demonstrations. CEUR-WS.

Sanderson, Robert, Paolo Ciccarese, and Benjamin Young. 2017. “Web Annotation Data Model.” W3C Recommendation. World Wide Web Consortium. https://www.w3.org/TR/annotation-model/.

You, Shu-Hung, Christos Dimoulas, and Robert Bruce Findler. 2025. “Contract System Metatheories à la Carte: A Transition-System View of Contracts.” Proceedings of the ACM on Programming Languages 9 (OOPSLA2). https://doi.org/10.1145/3764861.