REPL-Driven Schema Archaeology: Claude Code Transcript Drift Across 38 Versions
1. Overview
The fork note shows you can fork a Claude Code conversation at a checkpoint:
copy the JSONL prefix, mint a new id, claude -r it. This note answers the
question that makes forking safe: what may a fork change? When a fork is resumed
under the current client, the prefix is replayed and validated against the
current schema, so any structural change a fork introduces must be one the
current version accepts.
The method is the same as the fork note — a Clojure REPL reading raw JSONL, no new infrastructure — applied to a wider corpus: a multi-host audit archive spanning 38 client versions, not one host on one release. The mess is the subject.
2. The corpus
- Source: local SSD audit snapshot (2026-06-13)
- 7,909 session files, 3.2 GB, three agent sandbox hosts (analysis ran on the macOS host, which had the largest corpus; run in sandbox mode; host attribution is not material here)
- 320,115 versioned entries across 38 distinct 2.1.x versions (2.1.2..2.1.170)
- A full version-frequency scan reads in ~3 s warm (chat/read-session over all files)
:versions 38 :entries 320115
top: [2.1.89 184913] [2.1.19 53360] [2.1.29 11690] [2.1.3 10219]
[2.1.2 9915] [2.1.150 8512] [2.1.4 5687] [2.1.159 5592] ...
Contrast: the fork note's worked example ran in a uniform 2.1.161 environment.
The real archive is 38 versions deep — which is exactly why a fork needs a
contract, not a uniform assumption.
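The version-frequency scan above reduces to counting entries per :version once the JSONL lines are parsed into maps. The following is a minimal sketch under that assumption, not the note's chat/read-session: parse-version, version-frequencies, and the sample entries are hypothetical names and data introduced here for illustration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: "2.1.89" -> [2 1 89], so versions compare numerically.
(defn parse-version [s]
  (mapv #(Long/parseLong %) (str/split s #"\.")))

;; Count entries per :version, skipping entries that carry no version.
;; Returns [version count] pairs, most frequent first; ties go to the older version.
(defn version-frequencies [entries]
  (->> entries
       (keep :version)
       frequencies
       (sort-by (fn [[v n]] [(- n) (parse-version v)]))
       vec))

;; Made-up sample, standing in for parsed transcript entries.
(def sample-entries
  [{:version "2.1.89" :type "user"}
   {:version "2.1.89" :type "assistant"}
   {:version "2.1.19" :type "user"}
   {:type "summary"}                      ; unversioned, ignored
   {:version "2.1.2" :type "user"}])

(version-frequencies sample-entries)
;;=> [["2.1.89" 2] ["2.1.2" 1] ["2.1.19" 1]]
```

Sorting on the parsed vector rather than the raw string keeps 2.1.2 ahead of 2.1.19, which a plain string sort would not.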
3. Seed findings
Validating every version's sampled entries against the newest version's inferred
closed malli schema (dev/schema_check.clj, scope = the SSD archive):
- Hard breakpoint at 2.1.78. Every version <2.1.63 fails the newest closed schema 100%; every version >2.1.78 passes. The entry schema stabilized at 2.1.78.
- Discriminating key: :entrypoint. Old user entries lack it; the closed map rejects them. The newest schema carries ~19 top-level keys absent from 2.1.2 (:hookCount, :permissionMode, :promptSource, :toolUseID, :isMeta, …).
- Lost keys. 2.1.2 had :agentId/:slug that later disappeared.
- Content-shape drift. thinking blocks appear at 2.1.19, vanish, then return as standard from 2.1.138 on; document blocks appear only at 2.1.19; the oldest block (<2.1.5) emitted combined ["text" "tool_use"] entries that later split.
- Anomaly: 2.1.168. 4/85 samples fail even the recent schema, a late variant worth explaining.
- The fix. A union schema inferred across all 38 versions accepts every entry
(0 fails). That is the data-derived artifact shipped at
schemas/claude-jsonl-2.1.json.
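Why the union schema reaches zero failures can be seen with a simplified stand-in for the closed-map check. The sketch below models only top-level key presence with plain sets; the note's validator uses malli and also checks value shapes, so infer-closed and valid? are illustrative assumptions, and the entry values are made up (only the key names come from the findings above).

```clojure
(require '[clojure.set :as set])

;; Infer a closed key-set "schema" from a non-empty sample:
;; :required = keys present in every entry, :allowed = keys seen in any entry.
(defn infer-closed [entries]
  (let [key-sets (map (comp set keys) entries)]
    {:required (apply set/intersection key-sets)
     :allowed  (apply set/union key-sets)}))

;; An entry passes when it has every required key and no key outside :allowed.
(defn valid? [{:keys [required allowed]} entry]
  (let [ks (set (keys entry))]
    (and (set/subset? required ks)
         (set/subset? ks allowed))))

;; Toy entries: an old-style one and a new-style one.
(def old-entry {:type "user" :uuid "a" :agentId "x" :slug "s"})
(def new-entry {:type "user" :uuid "b" :entrypoint "cli" :permissionMode "default"})

(def newest-schema (infer-closed [new-entry]))
(def union-schema  (infer-closed [old-entry new-entry]))

(valid? newest-schema old-entry) ;=> false (no :entrypoint, extra :agentId/:slug)
(valid? union-schema old-entry)  ;=> true
(valid? union-schema new-entry)  ;=> true
```

By construction, a schema inferred over a set of entries accepts every entry in that set, which is the property the union artifact relies on.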
4. The version timeline
chat.clj/version-drift reads the archive and, per version, records the union of
top-level keys and message-shapes plus the delta against the previous version.
version-timeline-dot renders it: each node is a version (entry count below it), colored by how many keys it is missing versus the newest version.
| Era | Versions | Keys missing | Key transitions | Fork cost |
|---|---|---|---|---|
| Pre-stabilization | 2.1.2–2.1.5 | -19 | baseline | ~19 keys to backfill |
| Pre-stabilization | 2.1.9–2.1.29 | -18 | +data, +toolUseID | ~18 keys |
| Pre-stabilization | 2.1.63 | -18 | -error | ~18 keys |
| Breakpoint | 2.1.78 | -18 | +entrypoint, +promptId | schema stabilizes here |
| Stabilization | 2.1.89–2.1.104 | -4 to -17 | +hooks, +perms (+11 fields) | 4–17 keys |
| Convergence | 2.1.138–2.1.161 | -3 to -1 | +attachment, +mcpServer, +promptSource | 1–3 keys |
| Current | 2.1.168–2.1.170 | 0 | +hookContext, -compact | reference schema |
The per-version key-set is the union over sampled entries, so the missing-vs-newest count is non-monotonic — a version that sampled hook or system entries shows more keys than one that sampled only plain turns. The trend (early ~18–19, recent 0–2) is the signal.
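The per-version record described in this section (union of top-level keys, delta against the previous version, missing-vs-newest count) can be sketched in a few lines. This is an illustration under stated assumptions, not the body of chat.clj/version-drift: key-drift and parse-version are hypothetical names, message-shapes are left out, and the sample entries are made up.

```clojure
(require '[clojure.set :as set]
         '[clojure.string :as str])

;; Hypothetical helper: "2.1.78" -> [2 1 78], so versions sort numerically.
(defn parse-version [s]
  (mapv #(Long/parseLong %) (str/split s #"\.")))

;; Per version, oldest first: the union of top-level keys seen in its entries,
;; the delta against the previous version, and the number of keys it lacks
;; relative to the newest version. The first version is the baseline, so its
;; :added is its whole key set.
(defn key-drift [entries]
  (let [key-sets (->> entries
                      (filter :version)
                      (group-by :version)
                      (sort-by (comp parse-version key))
                      (map (fn [[v es]] [v (into #{} (mapcat keys) es)])))
        newest   (second (last key-sets))]
    (mapv (fn [[_ prev] [v ks]]
            {:version v
             :keys    ks
             :added   (set/difference ks prev)
             :removed (set/difference prev ks)
             :missing (count (set/difference newest ks))})
          (cons [nil #{}] key-sets)
          key-sets)))

;; Made-up sample entries.
(def sample
  [{:version "2.1.2"  :type "user" :agentId "x"}
   {:version "2.1.78" :type "user" :entrypoint "cli"}
   {:version "2.1.78" :type "assistant" :promptId "p"}])

(mapv #(dissoc % :keys) (key-drift sample))
;;=> [{:version "2.1.2",  :added #{:version :type :agentId}, :removed #{}, :missing 2}
;;    {:version "2.1.78", :added #{:entrypoint :promptId}, :removed #{:agentId}, :missing 0}]
```

Because :keys is a union over whatever entries were sampled, :missing inherits the sample sensitivity noted above: a version whose sample happens to include hook or system entries reports fewer missing keys.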
5. The 2.1.168 anomaly: rare subtypes and the limit of single-version inference
One result resists the tidy "newer accepts older" story: 2.1.168 — a recent
version — has 4 of 85 sampled entries fail the newest closed schema. Forking
the full data down to those 4 entries explains it.
All four are attachment entries of subtype file, whose :attachment/:content
is a single map {:type "text" :file ...}. The error lands at
[:attachment :content 0] / [:attachment :content 1] — the closed schema
expects a different shape there.
(frequencies (map :type bad)) ;=> {"attachment" 4}
(get-in (first bad) [:attachment :type]) ;=> "file"
(type (get-in (first bad) [:attachment :content])) ;=> PersistentArrayMap
;; the newest version's sampled attachments — no :file subtype at all:
(frequencies (map #(get-in % [:attachment :type]) attachments-2-1-170))
;=> {"deferred_tools_delta" 3, "task_reminder" 3, "async_hook_response" 3,
; "mcp_instructions_delta" 1, "skill_listing" 1}
The newest version's 11 sampled attachments are all non-file subtypes (tool
deltas, task reminders, hook responses) whose content is a string, a vector, or
empty. The inferred newest schema therefore never saw a file attachment, so it
rejects the one 2.1.168 still emits. This is not a regression — it is a
coverage gap in single-version inference: a closed schema built from a finite
sample under-covers rare entry subtypes. The union schema (inferred across all
38 versions) accepts these entries with zero failures, which is exactly why the
shipped artifact is the union and not the newest release alone.
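The coverage gap can be reproduced in miniature. The sketch below infers the set of :content shapes seen in a sample and rejects anything else. It is a simplified stand-in for closed-schema inference, with hypothetical function names; the subtype names come from the REPL output above, while the content values are invented and the elided :file value is replaced by an empty map as a placeholder.

```clojure
;; Classify the shape of an attachment's :content value.
(defn content-shape [entry]
  (let [c (get-in entry [:attachment :content])]
    (cond
      (nil? c)        :empty
      (string? c)     :string
      (map? c)        :map
      (sequential? c) :vector
      :else           :other)))

;; "Infer" the accepted shapes from a sample: only shapes that were seen pass.
(defn infer-shapes [entries]
  (into #{} (map content-shape) entries))

(defn accepts? [shapes entry]
  (contains? shapes (content-shape entry)))

;; Toy sample for the newest version: string, vector, and empty content only.
(def newest-sample
  [{:attachment {:type "task_reminder" :content "remember the tests"}}
   {:attachment {:type "deferred_tools_delta" :content ["tool-a"]}}
   {:attachment {:type "skill_listing"}}])

;; A file attachment with map-shaped content (:file value is a placeholder).
(def file-entry
  {:attachment {:type "file" :content {:type "text" :file {}}}})

(accepts? (infer-shapes newest-sample) file-entry)                   ;=> false
(accepts? (infer-shapes (conj newest-sample file-entry)) file-entry) ;=> true
```

The rejected entry is well-formed; the inferred schema simply never saw its shape, which is the sense in which this is a sampling artifact and not a regression.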
The Claude Code changelog for 2.1.166 .. 2.1.170 mentions no attachment or
transcript-format change (only a 2.1.170 fix for sessions not saving
transcripts when launched from VS Code). The :content shape difference is an
undocumented internal detail — drift below the changelog, visible only in the
data. That is the case for reading the transcripts rather than the release notes.
6. Why this gates the fork
A fork that crosses a version boundary is a forward migration, not just a branch.
Forking a 2.1.63 conversation and resuming it under 2.1.170 replays a prefix
that the current client's closed schema rejects (it lacks :entrypoint and the
later keys). The lesson for the fork tool is conservative: do not break the
current Claude Code version. A safe fork
- adds only fields the current schema already tolerates (the union schema names them),
- preserves the keys the current closed schema requires, and
- keeps provenance in a sidecar (<id>.fork.edn) so the resumable .jsonl stays within the contract the live client validates.
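One way to read the three rules above is as a split: keys the schema allows stay in the entry written to the .jsonl, and everything fork-specific moves to the sidecar. The following is a minimal sketch of that idea, not the fork tool's code. split-fork-entry, sidecar-name, the :forkedFrom and :forkNote keys, and the allowed set are all hypothetical; only the <id>.fork.edn naming comes from the note.

```clojure
(require '[clojure.edn :as edn])

;; Split a forked entry into the part that stays in the resumable .jsonl
;; (keys the schema allows) and the provenance that goes to the sidecar.
(defn split-fork-entry [allowed-keys entry]
  {:jsonl   (select-keys entry allowed-keys)
   :sidecar (apply dissoc entry allowed-keys)})

;; Sidecar file name for a fork id, following the <id>.fork.edn convention.
(defn sidecar-name [id]
  (str id ".fork.edn"))

;; Hypothetical allowed key set and forked entry.
(def allowed #{:type :uuid :entrypoint})
(def forked-entry
  {:type "user" :uuid "u1" :entrypoint "cli"
   :forkedFrom "session-0" :forkNote "try a different plan"})

(def parts (split-fork-entry allowed forked-entry))

(:jsonl parts)          ;=> {:type "user", :uuid "u1", :entrypoint "cli"}
(:sidecar parts)        ;=> {:forkedFrom "session-0", :forkNote "try a different plan"}
(sidecar-name "abc123") ;=> "abc123.fork.edn"

;; The sidecar map survives an EDN round trip, so pr-str is enough to persist it.
(= (:sidecar parts) (edn/read-string (pr-str (:sidecar parts)))) ;=> true
```

In a real tool the allowed set would presumably be read from the union schema; here it is hard-coded to keep the example self-contained.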
The version archaeology is what makes "what may a fork change?" a data-informed
question rather than a guess. chat.clj/fork-migration quantifies it: for a fork
whose prefix was minted by version from, resumed under the current client
to, :backfill = (to-keys - from-keys) is the set of fields the replayed
prefix must gain for the current client to accept it.
Figure 1: Forking forward into the current client (2.1.170), by source era, from chat.clj/fork-migration. A pre-stabilization fork (2.1.2 / 2.1.63) must backfill ~18–19 keys (entrypoint, the hook/permission/prompt fields); a recent fork (2.1.161) backfills one (hookAdditionalContext). Migration cost shrinks as the source approaches current — the data-informed argument for forking from the freshest checkpoint that serves the goal. (obsolete counts are sample-sensitive; backfill is the material direction.)
The backfilled fields are not arbitrary: :entrypoint, :promptId, the hook*
and permissionMode keys are exactly the ones the current closed schema
requires. A fork tool that mints these (or refuses to fork across the 2.1.78
boundary without them) stays inside the contract the live client validates.
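The :backfill = (to-keys - from-keys) definition is a plain set difference, and the obsolete count mentioned in the Figure 1 caption is the difference taken the other way. The sketch below is an assumption about the shape of that computation, not the source of chat.clj/fork-migration; the function name and the toy key sets are illustrative.

```clojure
(require '[clojure.set :as set])

;; Forward migration from the version that minted the prefix (from-keys)
;; to the current client (to-keys).
(defn fork-migration-sketch [from-keys to-keys]
  (let [backfill (set/difference to-keys from-keys)   ; keys the prefix must gain
        obsolete (set/difference from-keys to-keys)]  ; keys the current schema lacks
    {:backfill backfill
     :obsolete obsolete
     :cost     (count backfill)}))

;; Toy key sets; the key names appear in the note, the sets are abbreviated.
(def old-keys #{:type :uuid :agentId :slug})
(def current-keys #{:type :uuid :entrypoint :promptId :permissionMode})

(fork-migration-sketch old-keys current-keys)
;;=> {:backfill #{:entrypoint :promptId :permissionMode},
;;    :obsolete #{:agentId :slug},
;;    :cost 3}

;; Forking within the current version needs no backfill.
(:cost (fork-migration-sketch current-keys current-keys)) ;=> 0
```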
To be clear about scope: this is research, not a workflow. We would not fork a
2.1.2 conversation forward — in practice a fork starts from a recent
checkpoint in the current line, where the backfill is one or two cosmetic keys.
The old-version migrations are here to characterize how the structure changed,
so the practical rule (mint what the current schema requires; don't reach across
the stabilization boundary) is grounded in the full history rather than a single
release. The interesting object is the drift; the deep migrations are the probe.
7. Open questions / TODO
- [X] Data-derived version-timeline dot in chat.clj (version-drift + version-timeline-dot; color nodes by missing-vs-newest; the 2.1.78 boundary emerges from the data, malli-free so it stays in src). Rendered to version-timeline.png.
- [X] fork-migration dot: forking old -> current client as a forward migration (fork-migration + fork-migration-dot). Backfill shrinks +19 (2.1.2) -> +1 (2.1.161); rendered to fork-migration.png.
- [X] Explain the 2.1.168 anomaly: 4 attachment entries of subtype file with map-shaped :content; the newest sample has no file attachment, so its closed schema rejects them. A coverage gap in single-version inference, not a regression; the union schema accepts them. Changelog is silent.
- [ ] Map version -> approximate release date (entry timestamps) for a real timeline.
- [ ] Skeptic pass on every claim above; record verdicts in property drawers + the validation ledger.
- [ ] Banner (gmakebanner skill) before this goes fully public.
9. See also
- Time-Travel Chat: Forking Conversation Trees from a Checkpoint (the fork thesis)
- dev/schema_check.clj, dev/gen_schema.clj, dev/chat_explore.clj (the REPL validators)
- schemas/claude-jsonl-2.1.json (the union schema over all 38 versions)