How We Contain Claude: Mapping Against the Stack

1. Thesis
2. Correspondence
3. The Two Egress Incidents = the Secret-Custody Empty Cell
4. Where the Corpus is Already Ahead
5. Related Work

1. Thesis

The post is not adjacent work; it is an independent rediscovery of two structural primitives already in the stack (the L4 provenance layer, and blast radius as the deployment risk function), plus one direct refutation condition aimed at the custom egress components, saproxy and netax. Its two most consequential incidents are both egress through a permitted path, which is precisely the secret-custody empty cell named one level up in Agent Sandbox Architectures, published on the same date.

Source: Anthropic Engineering, "How We Contain Claude"

Refutation condition for the thesis: it is wrong if any of the post's structural incidents reduces to a model-layer failure (something a classifier could have caught) rather than a provenance or egress boundary failure. Both headline incidents, the api.anthropic.com exfil and the employee phish, are invisible to the model layer by construction: there is nothing anomalous for a classifier to catch. On the evidence presented, the thesis holds.
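
The refutation condition is an existential check over the incident list. A minimal sketch, assuming a hypothetical `Incident` record: the `classifier_visible` flags encode the claim made above (nothing anomalous to catch), not telemetry from the post, and `boundary` is descriptive only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    name: str
    # Hypothetical flag: did a model-layer classifier have an anomalous signal to act on?
    classifier_visible: bool
    # Descriptive only: the boundary this section argues had to hold.
    boundary: str


def thesis_refuted(incidents: list[Incident]) -> bool:
    """The thesis is wrong if ANY structural incident reduces to a model-layer failure."""
    return any(incident.classifier_visible for incident in incidents)


headline_incidents = [
    Incident("api.anthropic.com exfil", classifier_visible=False, boundary="secret custody"),
    Incident("employee phish", classifier_visible=False, boundary="egress + filesystem custody"),
]

print(thesis_refuted(headline_incidents))  # False: the thesis survives
```

A single classifier-visible incident added to the list would flip the result, which is what makes the condition falsifiable rather than rhetorical.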

2. Correspondence

Post primitive	Stack artifact	Relation
blast radius = P(fail)·damage(fail)	reliability-lab six-gate; elenctic-spec L0-L3	Same decomposition, operationalized as deployment risk function
three components: model / environment / external content	Seven Concerns L1-L7	Their triad coarsens the stack. env = L1; external content = L4. No analogue to L5-L7
probabilistic vs deterministic controls	Agent Permission Guardrails	Exact match. "telling != enforcing" is the guardrails piece verbatim
OS sandbox (Seatbelt/bubblewrap), egress-denied-by-default	Bastille jails (SEFACA); netax divert(4)	Convergent. netax is the homegrown egress interceptor; their primitives are the battle-tested form
two-isolation requirement (fs AND network)	Agent Sandbox Architectures	Already the reference axiom in the corpus
allowlist = capability grant, not destination filter	provenance-laundering taxonomy	The api.anthropic.com exfil is a textbook laundering case
in-VM MITM proxy "only the VM knows provenance"	Digital Shapeshifting L4; saproxy four-phase filter	Trust enforced where provenance is legible
direct prompt injection via user (phish)	role-boundary detection	Their finding (classifiers anchor on user intent) is the role-boundary failure
MCP authz optional; stdio excluded	Agent Permission Guardrails	"Every function reachable through an allowed domain" = no per-tool OAuth scope
persistent memory poisoning	JITIR Against the Field	Direct exposure. Falsification conditions already scoped; no session-startup classifier yet
multi-agent trust escalation	Agentic Q1 2026	Epistemic labels prevent sub-agent output from being promoted to higher trust
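
The first row's decomposition can be worked through directly. One reading, which is an assumption of this sketch rather than a claim from the post, is that probabilistic controls (classifiers) act on the P(fail) term while deterministic controls (sandbox, default-deny egress) cap the damage(fail) term. The function name and all numbers below are hypothetical.

```python
def blast_radius(p_fail: float, damage: float) -> float:
    """Deployment risk as expected damage: P(fail) * damage(fail)."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability in [0, 1]")
    if damage < 0.0:
        raise ValueError("damage must be non-negative")
    return p_fail * damage


# Hypothetical deployment: a classifier misses 1 in 8 attacks, and a miss
# can reach 80 units of damage (e.g. every credential the agent can touch).
classifier_only = blast_radius(p_fail=0.125, damage=80.0)

# Same classifier, but a deterministic boundary caps what a miss can reach.
with_sandbox = blast_radius(p_fail=0.125, damage=8.0)

print(classifier_only, with_sandbox)  # 10.0 1.0
```

Under this reading, a better classifier moves only the first factor, while a boundary that holds regardless of model behavior moves the second.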

3. The Two Egress Incidents = the Secret-Custody Empty Cell

The sandbox-systems decomposition names four isolations (compute, filesystem custody, network egress, secret custody) and identifies secret custody as the axis the field is still building: the empty cell. Vendors sell the compute boundary and leave egress and credential custody to a config the operator may never write.
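
The four-isolation decomposition reads naturally as a checklist over a deployment's configuration. A minimal sketch, assuming a hypothetical `Isolation` enum and a deployment described as the set of isolations it actually enforces. The example deployment is an assumption chosen to mirror the incident described next: compute, filesystem, and egress controls in place, secret custody absent.

```python
from enum import Enum


class Isolation(Enum):
    COMPUTE = "compute"
    FILESYSTEM_CUSTODY = "filesystem custody"
    NETWORK_EGRESS = "network egress"
    SECRET_CUSTODY = "secret custody"


def empty_cells(enforced: set[Isolation]) -> list[Isolation]:
    """Isolations the deployment leaves unenforced, in declaration order."""
    return [axis for axis in Isolation if axis not in enforced]


# Hypothetical deployment: the sandbox and egress proxy exist, credential custody does not.
deployment = {Isolation.COMPUTE, Isolation.FILESYSTEM_CUSTODY, Isolation.NETWORK_EGRESS}

print([axis.value for axis in empty_cells(deployment)])  # ['secret custody']
```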

The post's headline incident is that empty cell being hit in production. A malicious workspace file carried an attacker-controlled API key; Claude called the Files API with it; the egress proxy saw api.anthropic.com, an approved destination, and passed the request; the files landed in the attacker's account. The sandbox worked; the secret-custody axis did not exist. Their fix, an in-VM proxy that passes only the VM's provisioned session token and rejects any embedded key, is exactly secret custody as a first-class boundary, enforced where provenance is legible.
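
The difference between the two checks fits in a few lines. This sketch assumes a hypothetical `Request` shape and models the in-VM fix as a credential comparison; the post's actual proxy mechanism is not reproduced here. All host and token values are placeholders.

```python
import hmac
from dataclasses import dataclass
from typing import Callable

ALLOWED_HOSTS = frozenset({"api.anthropic.com"})


@dataclass(frozen=True)
class Request:
    host: str
    api_key: str  # credential carried by the outbound request


def destination_filter(req: Request) -> bool:
    """Egress allowlist: checks only WHERE the request goes, not whose credential it carries."""
    return req.host in ALLOWED_HOSTS


def make_custody_proxy(session_token: str) -> Callable[[Request], bool]:
    """In-VM check: allowed destination AND the VM's own provisioned token."""
    expected = session_token.encode()

    def allow(req: Request) -> bool:
        # Constant-time comparison so the check does not leak the token through timing.
        return destination_filter(req) and hmac.compare_digest(req.api_key.encode(), expected)

    return allow


proxy = make_custody_proxy(session_token="vm-session-token")
exfil = Request(host="api.anthropic.com", api_key="attacker-key")

print(destination_filter(exfil))  # True: the allowlist passes it
print(proxy(exfil))               # False: the embedded key is rejected
```

The allowlist answers "is this destination approved?"; the exfil needed the question "is this our credential?", which is the capability-grant reading of the allowlist from the correspondence table.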

This is the strongest single connection in the corpus: the post is the empirical falsification that the sandbox-systems piece predicted on the same day.

The phish is the same shape on a different vector: a user-delivered payload that reads ~/.aws/credentials and POSTs it out. The model layer is blind (the user typed the request); only egress and filesystem custody hold. Two incidents, one axis.
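
The two checks that still hold can be sketched as independent deterministic predicates. The deny-list paths, the empty allowlist, and `attacker.example` are hypothetical policy values; in a real deployment these checks live in the OS sandbox and egress proxy, not in Python, and this sketch does not resolve symlinks.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical policy: credential directories are outside filesystem custody,
# and egress is denied by default (empty allowlist).
DENY_READ = (PurePosixPath("~/.aws"), PurePosixPath("~/.ssh"))
EGRESS_ALLOWLIST: frozenset[str] = frozenset()


def can_read(path: str) -> bool:
    # Collapse "." and ".." so "~/x/../.aws/credentials" cannot slip past the prefix check.
    p = PurePosixPath(posixpath.normpath(path))
    return not any(p.is_relative_to(denied) for denied in DENY_READ)


def can_send(host: str) -> bool:
    return host in EGRESS_ALLOWLIST


def phish_blocked(read_path: str, dest_host: str) -> dict[str, bool]:
    """Which deterministic layer stops a read-then-POST payload (True = blocked)."""
    return {
        "filesystem custody": not can_read(read_path),
        "egress": not can_send(dest_host),
    }


print(phish_blocked("~/.aws/credentials", "attacker.example"))
# {'filesystem custody': True, 'egress': True}
```

Neither predicate asks what the user intended, which is why both still hold when the model layer does not: the payload is stopped by where it reads and where it sends, not by how it was phrased.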

4. Where the Corpus is Already Ahead

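Agent identity. The post leaves the identity question open ("own principal vs inherited user permissions, probably a blend"). The governance tuple [persona:agent:reviewer@env(project:workspace)] already resolves it: the agent is a principal in its own right (persona:agent:reviewer), with its authority scoped by the environment it runs in (env(project:workspace)).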
Adversarial review. The post's multi-agent trust-escalation warning is the security framing of structured challenge/response between agents. CPRR's refutation step is the mechanism.
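
To make the governance tuple concrete, here is a minimal parser plus one possible authorization rule. The grammar is inferred from the single example above and is an assumption, as are the `Principal` field names and `may_act`; the rule shows one way to read "a blend": the agent's own role grants, intersected with the environment it is bound to.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar inferred from one example: [persona:<kind>:<role>@env(<project>:<workspace>)]
TUPLE_RE = re.compile(
    r"^\[persona:(?P<kind>[^:@\[\]]+):(?P<role>[^:@\[\]]+)"
    r"@env\((?P<project>[^:()]+):(?P<workspace>[^:()]+)\)\]$"
)


@dataclass(frozen=True)
class Principal:
    kind: str       # "agent": a principal in its own right, not the user
    role: str       # role whose grants the agent holds
    project: str    # environment scope the role is bound to
    workspace: str


def parse_governance_tuple(text: str) -> Principal:
    match = TUPLE_RE.match(text)
    if match is None:
        raise ValueError(f"not a governance tuple: {text!r}")
    return Principal(**match.groupdict())


def may_act(p: Principal, role_grants: dict[str, set[str]], action: str, project: str) -> bool:
    """Own-principal grants, but only inside the environment the tuple binds."""
    return p.project == project and action in role_grants.get(p.role, set())


reviewer = parse_governance_tuple("[persona:agent:reviewer@env(project:workspace)]")
grants = {"reviewer": {"comment"}}  # hypothetical grant table
print(may_act(reviewer, grants, "comment", "project"))  # True
print(may_act(reviewer, grants, "merge", "project"))    # False
```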

5. Related Work

Agent Sandbox Architectures – secret-custody empty cell; this post is its empirical falsification
Agent Permission Guardrails – probabilistic vs deterministic; MCP authz gap; wire-level PDP/PEP
Agent Memory Architectures – persistent memory poisoning exposure
Agentic Systems Q1 2026 – identity/governance, adversarial review, MCP security
Agent Isolation with FreeBSD Jails – our implementation of the two-isolation requirement
anthropic-experimental/sandbox-runtime – their open-source sandbox runtime