REPL-Driven Schema Archaeology: Claude Code Transcript Drift Across 38 Versions

1. Overview

The fork note shows you can fork a Claude Code conversation at a checkpoint: copy the JSONL prefix, mint a new id, claude -r it. This note answers the question that makes forking safe: when a fork is resumed under the current client, the prefix is replayed and validated against the current schema — so any structural change a fork introduces must be one the current version accepts.

The method is the same as the fork note — a Clojure REPL reading raw JSONL, no new infrastructure — applied to a wider corpus: a multi-host audit archive spanning 38 client versions, not one host on one release. The mess is the subject.

2. The corpus

  • Source: local SSD audit snapshot (2026-06-13)
  • 7,909 session files, 3.2 GB, from three agent sandbox hosts (the analysis ran in sandbox mode on the macOS host, which had the largest corpus; host attribution is not material here)
  • 320,115 versioned entries across 38 distinct 2.1.x versions (2.1.2 .. 2.1.170)
  • A full version-frequency scan reads in ~3 s warm (chat/read-session over all files)
:versions 38 :entries 320115
top: [2.1.89 184913] [2.1.19 53360] [2.1.29 11690] [2.1.3 10219]
     [2.1.2 9915] [2.1.150 8512] [2.1.4 5687] [2.1.159 5592] ...
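
The frequency scan above can be approximated with core functions alone. This is a minimal sketch, not the chat/read-session implementation: it assumes entries are already parsed into maps with a :version key, and sample-entries is hypothetical stand-in data.

```clojure
;; Hypothetical stand-in for parsed JSONL entries; only :version matters here.
(def sample-entries
  [{:version "2.1.89" :type "user"}
   {:version "2.1.89" :type "assistant"}
   {:version "2.1.19" :type "user"}
   {:type "summary"}                       ; unversioned entries are skipped
   {:version "2.1.89" :type "user"}])

(defn version-frequencies
  "Count entries per :version, most frequent first."
  [entries]
  (->> entries
       (keep :version)                     ; drop entries without a version
       frequencies                         ; {version count}
       (sort-by (comp - val))              ; descending by count
       (mapv (fn [[v n]] [v n]))))

(version-frequencies sample-entries)
;;=> [["2.1.89" 3] ["2.1.19" 1]]
```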

Contrast: the fork note's worked example ran in a uniform 2.1.161 environment. The real archive is 38 versions deep — which is exactly why a fork needs a contract, not a uniform assumption.

3. Seed findings

Validating every version's sampled entries against the newest version's inferred closed malli schema (dev/schema_check.clj, scope = the SSD archive):

  • Hard breakpoint at 2.1.78. Every version up to and including 2.1.63 fails the newest closed schema 100% of the time; every version from 2.1.78 on passes (2.1.168 is the one partial exception, listed below). The entry schema stabilized at 2.1.78.
  • Discriminating key: :entrypoint. Old user entries lack it; the closed map rejects them. The newest schema carries ~19 top-level keys absent from 2.1.2 (:hookCount, :permissionMode, :promptSource, :toolUseID, :isMeta, …).
  • Lost keys. 2.1.2 had :agentId / :slug that later disappeared.
  • Content-shape drift. thinking blocks appear at 2.1.19, vanish, then return as standard from 2.1.138 on; document blocks appear only at 2.1.19; the oldest versions (< 2.1.5) emitted combined ["text" "tool_use"] entries that later split.
  • Anomaly: 2.1.168. 4/85 samples fail even the recent schema — a late variant worth explaining.
  • The fix. A union schema inferred across all 38 versions accepts every entry (0 fails). That is the data-derived artifact shipped at schemas/claude-jsonl-2.1.json.
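
The difference between a single-version closed schema and the union schema can be illustrated without malli. The sketch below is a deliberately simplified stand-in for dev/schema_check.clj: it tracks only top-level keys, treats keys seen in every sample as required and keys seen in any sample as allowed, and uses hypothetical entries and values (the "cli" entrypoint value is invented for illustration).

```clojure
(require '[clojure.set :as set])

(defn infer-closed
  "Infer a closed key schema from samples: keys present in every sample
   are required, keys present in any sample are allowed."
  [samples]
  (let [key-sets (map (comp set keys) samples)]
    {:required (apply set/intersection key-sets)
     :allowed  (apply set/union key-sets)}))

(defn explain
  "Return nil when the entry fits the schema, else the offending keys."
  [{:keys [required allowed]} entry]
  (let [ks      (set (keys entry))
        missing (set/difference required ks)
        extra   (set/difference ks allowed)]
    (when (or (seq missing) (seq extra))
      {:missing missing :extra extra})))

;; Hypothetical entries: an old-style one and a current-style one.
(def old-entry {:type "user" :uuid "a" :slug "s"})
(def new-entry {:type "user" :uuid "b" :entrypoint "cli"})

(def newest-schema (infer-closed [new-entry]))
(def union-schema  (infer-closed [old-entry new-entry]))

(explain newest-schema old-entry)
;;=> {:missing #{:entrypoint}, :extra #{:slug}}
(explain union-schema old-entry)
;;=> nil
```

The union schema accepts the old entry because :entrypoint is no longer required (it is absent from at least one sample) and :slug is now allowed.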

4. The version timeline

chat.clj/version-drift reads the archive and, per version, records the union of top-level keys and message-shapes plus the delta against the previous version. version-timeline-dot renders it: each node is a version, with its entry count below it.

Era                Versions         Keys missing  Key transitions                         Fork cost
Pre-stabilization  2.1.2–2.1.5      -19           baseline                                ~19 keys to backfill
                   2.1.9–2.1.29     -18           +data, +toolUseID                       ~18 keys
                   2.1.63           -18           -error                                  ~18 keys
Breakpoint         2.1.78           -18           +entrypoint, +promptId                  schema stabilizes here
Stabilization      2.1.89–2.1.104   -4 to -17     +hooks, +perms (+11 fields)             4–17 keys
Convergence        2.1.138–2.1.161  -3 to -1      +attachment, +mcpServer, +promptSource  1–3 keys
Current            2.1.168–2.1.170  0             +hookContext, -compact                  reference schema

The per-version key-set is the union over sampled entries, so the missing-vs-newest count is non-monotonic — a version that sampled hook or system entries shows more keys than one that sampled only plain turns. The trend (early ~18–19, recent 0–2) is the signal.
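
A per-version drift table of this kind can be sketched as follows. This is an illustration under stated assumptions, not the chat.clj/version-drift source: it covers only top-level keys (no message-shapes), assumes every version string has three numeric parts, and uses hypothetical entries. Note the numeric version ordering; a plain string sort would put 2.1.170 before 2.1.19.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

(defn version-key
  "Parse \"2.1.19\" into [2 1 19] so versions sort numerically."
  [v]
  (mapv #(Long/parseLong %) (str/split v #"\.")))

(defn version-drift
  "Per version: entry count, key delta against the previous version,
   and the number of the newest version's keys it lacks."
  [entries]
  (let [by-version (group-by :version (filter :version entries))
        versions   (sort-by version-key (keys by-version))
        key-sets   (map #(into #{} (mapcat keys) (by-version %)) versions)
        newest     (last key-sets)]
    (mapv (fn [v ks prev]
            {:version           v
             :entries           (count (by-version v))
             :added             (set/difference ks prev)
             :removed           (set/difference prev ks)
             :missing-vs-newest (count (set/difference newest ks))})
          versions
          key-sets
          (cons #{} key-sets))))           ; the first version diffs against nothing

;; Hypothetical entries, one per version.
(def drift
  (version-drift
   [{:version "2.1.170" :type "user" :toolUseID "t" :entrypoint "cli"}
    {:version "2.1.2"   :type "user" :slug "x"}
    {:version "2.1.19"  :type "user" :toolUseID "t"}]))

(mapv (juxt :version :missing-vs-newest) drift)
;;=> [["2.1.2" 2] ["2.1.19" 1] ["2.1.170" 0]]
```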

5. The 2.1.168 anomaly: rare subtypes and the limit of single-version inference

One result resists the tidy "newer accepts older" story: 2.1.168, a recent version, has 4 of 85 sampled entries fail the newest closed schema. Filtering the full data down to those 4 entries explains it.

All four are attachment entries of subtype file, whose :attachment/:content is a single map {:type "text" :file ...}. The error lands at [:attachment :content 0] / [:attachment :content 1] — the closed schema expects a different shape there.

(frequencies (map :type bad))                       ;=> {"attachment" 4}
(get-in (first bad) [:attachment :type])            ;=> "file"
(type (get-in (first bad) [:attachment :content]))  ;=> PersistentArrayMap

;; the newest version's sampled attachments — no :file subtype at all:
(frequencies (map #(get-in % [:attachment :type]) attachments-2-1-170))
;=> {"deferred_tools_delta" 3, "task_reminder" 3, "async_hook_response" 3,
;    "mcp_instructions_delta" 1, "skill_listing" 1}

The newest version's 11 sampled attachments are all non-file subtypes (tool deltas, task reminders, hook responses) whose content is a string, a vector, or empty. The inferred newest schema therefore never saw a file attachment, so it rejects the one 2.1.168 still emits. This is not a regression — it is a coverage gap in single-version inference: a closed schema built from a finite sample under-covers rare entry subtypes. The union schema (inferred across all 38 versions) accepts these entries with zero failures, which is exactly why the shipped artifact is the union and not the newest release alone.
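
The coverage gap can be reproduced in miniature by inferring only the coarse shape of :content. This is a sketch under assumptions: the shape categories and the sample values are hypothetical, and the real schema inference is richer than a set of shape tags.

```clojure
(defn shape
  "Coarse shape tag for an attachment's :content value."
  [x]
  (cond (nil? x)        :nil
        (string? x)     :string
        (map? x)        :map
        (sequential? x) :vector
        :else           :other))

(defn infer-shapes
  "Set of :content shapes observed in a sample of attachments."
  [attachments]
  (into #{} (map (comp shape :content)) attachments))

;; Hypothetical sample from the newest version: no file attachment.
(def newest-sample
  [{:type "task_reminder"        :content "reminder text"}
   {:type "deferred_tools_delta" :content ["tool-a" "tool-b"]}
   {:type "skill_listing"}])

;; Hypothetical file attachment with map-shaped :content.
(def file-attachment
  {:type "file" :content {:type "text" :file {}}})

(infer-shapes newest-sample)
;;=> #{:string :vector :nil}
(contains? (infer-shapes newest-sample)
           (shape (:content file-attachment)))
;;=> false
(contains? (infer-shapes (conj newest-sample file-attachment))
           (shape (:content file-attachment)))
;;=> true
```

A schema inferred from newest-sample alone has never seen :map and rejects the file attachment; adding one sample of the rare subtype is enough to close the gap.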

The Claude Code changelog for 2.1.166 .. 2.1.170 mentions no attachment or transcript-format change (only a 2.1.170 fix for sessions not saving transcripts when launched from VS Code). The :content shape difference is an undocumented internal detail — drift below the changelog, visible only in the data. That is the case for reading the transcripts rather than the release notes.

6. Why this gates the fork

A fork that crosses a version boundary is a forward migration, not just a branch. Forking a 2.1.63 conversation and resuming it under 2.1.170 replays a prefix that the current client's closed schema rejects (it lacks :entrypoint and the later keys). The lesson for the fork tool is conservative: do not break the current Claude Code version. A safe fork

  • adds only fields the current schema already tolerates (the union schema names them),
  • preserves the keys the current closed schema requires, and
  • keeps provenance in a sidecar (<id>.fork.edn) so the resumable .jsonl stays within the contract the live client validates.
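
One way to honor the sidecar rule is to split each forked entry into the part the live client sees and the part that records provenance. The sketch below is hypothetical throughout: split-provenance, the :forkedFrom and :forkedAt keys, and the tolerated key set are invented for illustration, and writing the sidecar string to <id>.fork.edn is left to the caller.

```clojure
(require '[clojure.edn :as edn])

(defn split-provenance
  "Keep only keys the current schema tolerates in the resumable entry;
   everything else goes to the sidecar map."
  [entry tolerated-keys]
  {:entry   (select-keys entry tolerated-keys)
   :sidecar (apply dissoc entry tolerated-keys)})

;; Hypothetical tolerated keys and a forked entry carrying provenance.
(def tolerated #{:type :uuid :entrypoint})

(def forked-entry
  {:type "user" :uuid "b" :entrypoint "cli"
   :forkedFrom "a" :forkedAt "checkpoint-3"})

(def split (split-provenance forked-entry tolerated))

(:entry split)
;;=> {:type "user", :uuid "b", :entrypoint "cli"}

;; The sidecar serializes as EDN and reads back unchanged.
(edn/read-string (pr-str (:sidecar split)))
;;=> {:forkedFrom "a", :forkedAt "checkpoint-3"}
```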

The version archaeology is what makes "what may a fork change?" a data-informed question rather than a guess. chat.clj/fork-migration quantifies it: for a fork whose prefix was minted by version from, resumed under the current client to, :backfill = (to-keys - from-keys) is the set of fields the replayed prefix must gain for the current client to accept it.
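
The :backfill computation is a plain set difference. A minimal sketch, assuming each version is summarized by its set of top-level keys; the key sets here are hypothetical and far smaller than the real ones, and the :obsolete direction is included only for symmetry.

```clojure
(require '[clojure.set :as set])

(defn fork-migration
  "Keys a prefix minted under `from-keys` must gain (:backfill) and the
   keys it carries that the target no longer has (:obsolete)."
  [from-keys to-keys]
  {:backfill (set/difference to-keys from-keys)
   :obsolete (set/difference from-keys to-keys)})

;; Hypothetical key sets for an old source and the current client.
(def from-keys #{:type :uuid :slug :agentId})
(def to-keys   #{:type :uuid :entrypoint :promptId})

(fork-migration from-keys to-keys)
;;=> {:backfill #{:entrypoint :promptId}, :obsolete #{:slug :agentId}}
```

Forking within the same version yields an empty :backfill, which is the zero-cost case the fork tool should prefer.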

Fan-in diagram: five source-era versions forking forward into the current 2.1.170 client, edge labels showing backfill key counts shrinking from +19 to +1

Figure 1: Forking forward into the current client (2.1.170), by source era, from chat.clj/fork-migration. A pre-stabilization fork (2.1.2 / 2.1.63) must backfill ~18–19 keys (entrypoint, the hook/permission/prompt fields); a recent fork (2.1.161) backfills one (hookAdditionalContext). Migration cost shrinks as the source approaches current — the data-informed argument for forking from the freshest checkpoint that serves the goal. (obsolete counts are sample-sensitive; backfill is the material direction.)

The backfilled fields are not arbitrary: :entrypoint, :promptId, the hook* and permissionMode keys are exactly the ones the current closed schema requires. A fork tool that mints these (or refuses to fork across the 2.1.78 boundary without them) stays inside the contract the live client validates.

To be clear about scope: this is research, not a workflow. We would not fork a 2.1.2 conversation forward — in practice a fork starts from a recent checkpoint in the current line, where the backfill is one or two cosmetic keys. The old-version migrations are here to characterize how the structure changed, so the practical rule (mint what the current schema requires; don't reach across the stabilization boundary) is grounded in the full history rather than a single release. The interesting object is the drift; the deep migrations are the probe.

7. Open questions / TODO

  • [X] Data-derived version-timeline dot in chat.clj (version-drift + version-timeline-dot; color nodes by missing-vs-newest — the 2.1.78 boundary emerges from the data, malli-free so it stays in src). Rendered to version-timeline.png.
  • [X] fork-migration dot: forking old -> current client as a forward migration (fork-migration + fork-migration-dot). Backfill shrinks +19 (2.1.2) -> +1 (2.1.161); rendered to fork-migration.png.
  • [X] Explain the 2.1.168 anomaly: 4 attachment entries of subtype file with map-shaped :content; the newest sample has no file attachment, so its closed schema rejects them. A coverage gap in single-version inference, not a regression; the union schema accepts them. Changelog is silent.
  • [ ] Map version -> approximate release date (entry timestamps) for a real timeline.
  • [ ] Skeptic pass on every claim above; record verdicts in property drawers + the validation ledger.
  • [ ] Banner (gmake banner skill) before this goes fully public.

9. See also