Agent Sandbox Architectures: Where the Boundary Sits, and Who Holds the Key
A decomposition exercise: Cloudflare Sandbox SDK, Docker Sandboxes (sbx), Deno Sandbox, the browser, and what the local homelab still lacks
Table of Contents
- 1. What "sandbox" names
- 2. Reference axiom: both isolations, or neither
- 3. The field, decomposed
- 4. The axis the field is still building: secret custody
- 5. The empty cell
- 6. The egress proxy: one shape under four boundaries
- 7. Integration surface: wrap a harness, or be the SDK
- 8. The product layer: pricing, gating, lock-in
- 9. What composes, what complects
- 10. Field notes — installing sbx on mini
- 11. Related
- 12. Anchors
1. What "sandbox" names
"Sandbox" is one word for at least four distinct isolations.
1. Compute boundary: the kernel/VM line untrusted code cannot cross.
2. Filesystem custody: what the code can read and write.
3. Network egress: what hosts the code can reach.
4. Secret custody: credentials the code uses but must not exfiltrate.
A fifth concern — lifecycle (ephemeral, snapshot, volume) — is orthogonal and rides on top of the four.
The four have different threat models and different enforcement points. Most products that present a single "sandbox" guarantee are quietly conflating two or more of them, and the conflation is usually between (1) and the other three: vendors sell the compute boundary — "microVM isolation, hard security boundary" — and leave (3) and (4) to a config file the operator may never write. The interesting question is not "which sandbox is most isolated" but "where does each draw the four boundaries, and what gets complected as a result?"
The threat model has shifted, and the shift is the whole story. We are no longer sandboxing untrusted plugins. We are sandboxing untrusted code that we ourselves generated, that runs without review, that carries real credentials, and that an attacker can steer by prompt injection. Ryan Dahl's framing of Deno Sandbox names it exactly: "LLM-generated code, calling external APIs with real credentials, without human review. Sandboxing the compute isn't enough."
2. Reference axiom: both isolations, or neither
Anthropic's sandbox-runtime README states the invariant this
document tests against:
"Both filesystem and network isolation are required for effective sandboxing. Without file isolation, a compromised process could exfiltrate SSH keys or other sensitive files. Without network isolation, a process could escape the sandbox and gain unrestricted network access."
Read it as a falsification condition. A system that isolates the
filesystem but lets the process open arbitrary sockets has not built
a sandbox; it has built a launchpad. A system that blocks egress but
mounts ~/.ssh has built a different launchpad. The axiom is the
conjunction, and the conjunction is where most of the field is still
weak — not because compute isolation is hard (microVMs solved that)
but because egress and secret custody are policy problems, and policy
defaults to permissive.
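The conjunction can be written down as a predicate. The sketch below is illustrative only: the `Enforcement` labels, the `SandboxProfile` shape, and the example profile are assumptions made for this document, not part of sandbox-runtime or any product's API.

```typescript
// The four isolations from section 1, each marked by how it is enforced.
// These labels are hypothetical, chosen for illustration.
type Enforcement = "structural" | "policy" | "absent";

interface SandboxProfile {
  label: string;
  compute: Enforcement;
  filesystem: Enforcement;
  network: Enforcement;
  secrets: Enforcement;
}

// The reference axiom is a conjunction: filesystem AND network isolation.
// "absent" on either side fails, however strong the compute wall is.
function satisfiesAxiom(p: SandboxProfile): boolean {
  return p.filesystem !== "absent" && p.network !== "absent";
}

// Stricter reading: satisfied by construction only when neither side
// depends on configuration the operator may never write.
function satisfiedByConstruction(p: SandboxProfile): boolean {
  return p.filesystem === "structural" && p.network === "structural";
}

// Hypothetical profile: hard microVM, mounted workspace, no egress rules.
const hardBoxOpenEgress: SandboxProfile = {
  label: "microVM with default-open egress",
  compute: "structural",
  filesystem: "structural",
  network: "absent",
  secrets: "policy",
};

console.log(satisfiesAxiom(hardBoxOpenEgress)); // false
```

A strong compute boundary does not move the result: the predicate never reads `compute`, which is the point of the axiom.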
The contract has three properties worth naming:
- The boundary is structural, the policy is declared. The microVM or container gives you (1) for free. Whether you get (3) and (4) depends on configuration the operator supplies. Capability is structural; safety is declarative.
- The chokepoint is the egress proxy. Every system that controls network egress does it at one point — an outbound proxy the sandbox cannot bypass. coder/httpjail and the Worker request proxy are the same shape: one place where policy is enforced, because one place is the only place you can enforce it.
- Secrets are the asymmetric risk. A blocked egress is an annoyance; an exfiltrated long-lived credential is a breach that outlives the sandbox. Secret custody is the axis where the cost of getting it wrong is unbounded, and it is the axis the field has decomposed least.
That third property is the one this document organizes around.
3. The field, decomposed
3.1. Docker Sandboxes (sbx) — microVM for local agents
The compute boundary is a dedicated microVM per agent; the host stays
untouched. Filesystem custody is "only your project workspace mounted
in" — a sharp, defensible default. Network and filesystem controls
are "controls you define," which is the honest admission that (3) is
policy, not structure. The headline is that --dangerously-skip-permissions
is the default: YOLO mode is safe precisely because the box around
it is hard.
What is decomposed: the host from the agent. Cleanly. The agent can install packages, rewrite configs, even spin up its own Docker containers, and none of it touches the host.
What is complected: autonomy with configured policy. The microVM is
structural; the network allowlist is not. Out of the box you get a
hard wall around a workspace with broadly open egress — fine for
"run the test suite," load-bearing-but-empty for "don't let
prompt-injected code POST my repo to evil.com." Secret custody exists
(sbx secret) but injects into the sandbox environment; the secret
is present, and present means exfiltratable.
3.2. Cloudflare Sandbox SDK — credential custody as a separate tier
The compute boundary is a Container on Workers. Filesystem is a full Linux environment inside the container. Lifecycle composes cleanly with persistence: ephemeral by default, with R2/S3 buckets mounted as local filesystems for state that outlives the box.
The decomposition that matters is secret custody. Cloudflare's request-proxying pattern keeps credentials in the Worker and never in the sandbox: "A Worker proxy validates short-lived JWT tokens from the sandbox and injects real credentials at request time." The sandbox holds a capability (a short-lived JWT), not the credential. Compromise of the sandbox yields a token that expires, not a key that persists.
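A minimal sketch of that trade, under loud assumptions: the token here is a plain object with an expiry rather than a signed JWT, and `forwardWithCredential`, `SandboxToken`, and the key map are hypothetical names, not the Sandbox SDK's API. A real Worker would also verify the token's signature before trusting any field in it.

```typescript
// What the sandbox holds: a short-lived capability, not the credential.
interface SandboxToken {
  sandboxId: string;
  expiresAt: number; // epoch milliseconds
}

interface OutboundRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical stand-in for the trusted tier. Signature verification is
// deliberately omitted; only the expiry check and the injection are shown.
function forwardWithCredential(
  token: SandboxToken,
  req: OutboundRequest,
  now: number,
  realKeys: Map<string, string>,
): OutboundRequest {
  if (now >= token.expiresAt) {
    throw new Error("token expired");
  }
  const key = realKeys.get(token.sandboxId);
  if (key === undefined) {
    throw new Error("no credential bound to this sandbox");
  }
  // The real credential is attached only on the proxy side of the boundary;
  // the request object the sandbox built is left untouched.
  return { url: req.url, headers: { ...req.headers, Authorization: `Bearer ${key}` } };
}

const vaultKeys = new Map<string, string>([["user-123", "sk-real-key"]]);
const demoToken: SandboxToken = { sandboxId: "user-123", expiresAt: 1000 };
const demoRequest: OutboundRequest = { url: "https://api.example.com/v1", headers: {} };

const forwarded = forwardWithCredential(demoToken, demoRequest, 500, vaultKeys);
console.log(forwarded.headers["Authorization"]); // Bearer sk-real-key
```

An attacker who dumps the sandbox's memory finds `demoToken`, which stops working once `now` reaches `expiresAt`; the value in `vaultKeys` never crossed over.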
What is complected: the sandbox lifecycle with the Workers request
model. The box is a Durable-Object-backed thing reached through
getSandbox(env.Sandbox, 'user-123'); it is excellent if your
control plane is already a Worker, and an impedance mismatch if it is
a homelab launchd job on a Mac Mini.
3.3. Deno Sandbox — the secret that is never present
Sub-second-boot microVMs; volumes and snapshots for persistence; a 30-minute maximum lifetime. The technical envelope is modest (2 vCPU, up to 4 GB) and explicitly aimed at "AI agents executing code."
The move that occupies an empty cell: secrets never enter the
environment. Code sees DENO_SECRET_PLACEHOLDER_b14043a2...; the real
key materializes only when the sandbox makes an outbound request to a
pre-approved host.
    await using sandbox = await Sandbox.create({
      secrets: {
        OPENAI_API_KEY: {
          hosts: ["api.openai.com"],
          value: process.env.OPENAI_API_KEY,
        },
      },
      allowNet: ["api.openai.com", "*.anthropic.com"],
    });
Prompt-injected code that exfiltrates the placeholder to evil.com exfiltrates a string with no value. The key binds to a destination, not to a process. This is the cleanest cut on the secret-custody axis in the field: it separates the secret's value from the secret's presence. The enforcement point is, again, an outbound proxy (Deno's docs cite coder/httpjail as the model), which is the same chokepoint everyone converges on.
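The binding of key to destination can be sketched as a substitution step at the proxy. This is an illustration of the idea, not Deno's implementation: `materializeSecrets`, `SecretBinding`, and the placeholder string are hypothetical, and the sketch ignores `allowNet`, which would normally refuse the second request outright.

```typescript
interface SecretBinding {
  placeholder: string; // what code inside the sandbox sees
  value: string; // the real key, held outside the sandbox
  hosts: string[]; // destinations where the key may materialize
}

interface EgressRequest {
  host: string;
  headers: Record<string, string>;
}

// Substitute the real value only when the destination is pre-approved for
// that secret. Everywhere else the placeholder passes through unchanged.
function materializeSecrets(req: EgressRequest, bindings: SecretBinding[]): EgressRequest {
  const headers: Record<string, string> = {};
  for (const headerName of Object.keys(req.headers)) {
    let out = req.headers[headerName];
    if (out === undefined) continue;
    for (const b of bindings) {
      if (b.hosts.includes(req.host)) {
        out = out.split(b.placeholder).join(b.value);
      }
    }
    headers[headerName] = out;
  }
  return { host: req.host, headers };
}

const demoBindings: SecretBinding[] = [
  { placeholder: "PLACEHOLDER_demo", value: "sk-real-key", hosts: ["api.openai.com"] },
];
const sandboxHeaders = { Authorization: "Bearer PLACEHOLDER_demo" };

const toApproved = materializeSecrets({ host: "api.openai.com", headers: sandboxHeaders }, demoBindings);
const toAttacker = materializeSecrets({ host: "evil.com", headers: sandboxHeaders }, demoBindings);

console.log(toApproved.headers["Authorization"]); // Bearer sk-real-key
console.log(toAttacker.headers["Authorization"]); // Bearer PLACEHOLDER_demo
```

The same header leaves the sandbox in both cases; only the destination decides whether it becomes a credential.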
What is complected: the production path. sandbox.deploy() is frictionless
precisely because it ties the sandbox to Deno Deploy — sandbox and
serverless host are the same vendor, so dev-to-prod is one call. That
is composition for a Deno shop and lock-in for everyone else.
3.4. The browser — the 30-year-old sandbox, reused
Paul Kinlan's "the browser is the sandbox" tests the hypothesis that the origin sandbox — built to run hostile untrusted code the instant you tap a link — is good enough for agentic file work. The decomposition is honest about where it holds and where it does not.
- Compute boundary: the origin sandbox. Mature, battle-tested, free.
- Filesystem: the File System Access API gives a chroot-like handle to one user-selected directory — read/write within, no access to siblings or parents. Layer it with the origin-private filesystem and you can edit a copy while leaving the original intact.
- Network egress: this is where the model breaks. "Unless you have an entirely client-side LLM, you can't" fully control egress — the data must leave to reach the model. CSP is "our friend" here, but it is partial. The classic aporia: an <img> whose URL encodes sensitive file contents is an expected web behavior and an exfiltration channel at once. You cannot separate "render an image" from "send data to the image's host" without giving up the first.
What is complected: display and egress. The browser's whole purpose is to fetch and render from anywhere; that purpose is an egress channel. The filesystem story is genuinely good; the network story is the reference axiom's second clause, unmet, and CSP only narrows it.
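To make "CSP only narrows it" concrete, here is a small sketch that assembles a deny-by-default policy. The directive names are standard CSP; the helper `buildCsp` and the origin `https://api.example.com` are hypothetical, and the particular directive set is one plausible choice rather than what Co-do ships.

```typescript
// Build a Content-Security-Policy value that denies by default, then opens
// only the model endpoint for fetch/XHR (connect-src) and same-origin or
// blob: sources for images (img-src).
function buildCsp(modelOrigin: string): string {
  const directives: Array<[string, string[]]> = [
    ["default-src", ["'none'"]],
    ["script-src", ["'self'"]],
    ["connect-src", [modelOrigin]],
    ["img-src", ["'self'", "blob:"]],
  ];
  return directives
    .map(([directive, sources]) => `${directive} ${sources.join(" ")}`)
    .join("; ");
}

const demoCsp = buildCsp("https://api.example.com");
console.log(demoCsp);
// default-src 'none'; script-src 'self'; connect-src https://api.example.com; img-src 'self' blob:
```

With this policy an `<img>` pointing at a third-party host is refused, which closes the image channel at the price of never rendering remote images. The `connect-src` entry remains: file contents still travel to the model origin, so the policy shrinks the set of destinations without removing egress.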
3.5. macOS sandbox-exec / Seatbelt — the profile as the contract
The substrate Anthropic's experiment and Co-do both gesture at:
sandbox-exec applies a Scheme-syntax profile (Seatbelt) to a
process, allow/deny on file paths and network operations. The boundary
is the host kernel, not a VM. It is the lightest-weight option and the
one with the worst ergonomics — the profile language is undocumented,
deprecated-but-present, and unforgiving. It decomposes (2) and (3)
into one declarative profile; it complects nothing because it is
nothing but policy. Its cost is that you author that policy in a
language Apple stopped documenting a decade ago.
4. The axis the field is still building: secret custody
Memory systems had a reactive/proactive axis with an empty substrate-initiated cell. Sandboxes have an analogous axis, and it is where the secret lives. Three positions, increasing in safety:
| Position | Where the credential is | Compromise yields | Occupied by |
|---|---|---|---|
| Secret in environment | env var inside the box | the live, long-lived key | sbx, sandbox-exec, most local setups |
| Secret in trusted proxy | the Worker; box holds a JWT | a short-lived token | Cloudflare request proxying |
| Secret never present | placeholder; binds at egress | a useless string | Deno Sandbox |
The distinction is what an attacker who fully owns the sandbox walks away with. Position one hands them the key. Position two hands them a token that expires. Position three hands them a placeholder bound to a host they do not control.
5. The empty cell
Cross the secret-custody axis with where the sandbox runs and a cell is empty:
| Where it runs | Secret in env | Secret never present |
|---|---|---|
| Cloud (vendor edge) | common | Deno Sandbox |
| Local / self-hosted | sbx, Seatbelt | (no production tool) |
The bottom-right cell — a locally-run agent sandbox that materializes secrets only at approved-host egress — is unoccupied. sbx runs locally and has a secret store, but the secret is injected into the box; it is present. Deno does the placeholder trick but only in Deno's cloud. For a LAN-only homelab running local coding agents, the option that protects a long-lived API key from prompt-injected exfiltration without shipping the workload to a vendor edge does not yet exist as a general product. It is buildable — coder/httpjail plus a local chokepoint is the whole recipe — and, as the next section argues, LiteLLM already ships this exact shape for one class of host while nothing ships it for arbitrary egress.
6. The egress proxy: one shape under four boundaries
Position two of the secret-custody axis — credential in a trusted proxy, injected at the boundary — is not a Cloudflare quirk. It is the convergent shape, and naming it is the point.
The pattern: an agent in a sandbox tries to git push. It has no SSH
key and no token. The egress proxy intercepts the connection, looks up
the identity the sandbox was started with, and injects the credential
at the network boundary. From the agent's perspective it "just worked";
from the security perspective the credential never crossed into the
code-execution environment. The full sequence — credentialless request
→ cred fetch from vault → token injection → proxied call → stripped
response — is diagrammed in
Cloudflare Agents Week, §The egress proxy.
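The five steps can be sketched as one function. Everything named here is hypothetical: `egressProxy`, the `Vault` and `Upstream` callbacks, and in particular the reading of "stripped response" as removing `Set-Cookie` and `Authorization` headers, which is an assumption about what stripping means, not a description of Cloudflare's proxy.

```typescript
interface ProxiedRequest {
  host: string;
  path: string;
  headers: Record<string, string>;
}

interface ProxiedResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical collaborators, injected so the sketch stays self-contained.
type Vault = (identity: string, host: string) => string | undefined;
type Upstream = (req: ProxiedRequest) => ProxiedResponse;

function egressProxy(identity: string, req: ProxiedRequest, vault: Vault, upstream: Upstream): ProxiedResponse {
  // 1. Credentialless request: drop any Authorization the sandbox set itself.
  const headers: Record<string, string> = {};
  for (const key of Object.keys(req.headers)) {
    const value = req.headers[key];
    if (value !== undefined && key.toLowerCase() !== "authorization") headers[key] = value;
  }
  // 2. Fetch the credential bound to this sandbox identity and destination.
  const token = vault(identity, req.host);
  if (token === undefined) {
    return { status: 403, headers: {}, body: "no credential for this identity and host" };
  }
  // 3. Inject the token at the boundary, then 4. make the proxied call.
  headers["Authorization"] = `Bearer ${token}`;
  const res = upstream({ host: req.host, path: req.path, headers });
  // 5. Strip response headers that could carry credential material back in.
  const safe: Record<string, string> = {};
  for (const key of Object.keys(res.headers)) {
    const value = res.headers[key];
    const lower = key.toLowerCase();
    if (value !== undefined && lower !== "set-cookie" && lower !== "authorization") safe[key] = value;
  }
  return { status: res.status, headers: safe, body: res.body };
}

// Demo wiring: one identity, one host, an upstream that records what it saw.
let authSeenUpstream = "";
const demoVault: Vault = (identity, host) =>
  identity === "sandbox-a" && host === "github.com" ? "tok-123" : undefined;
const demoUpstream: Upstream = (r) => {
  authSeenUpstream = r.headers["Authorization"] || "";
  return { status: 200, headers: { "Set-Cookie": "session=abc", "Content-Type": "text/plain" }, body: "ok" };
};

const pushResult = egressProxy(
  "sandbox-a",
  { host: "github.com", path: "/repo.git", headers: { Authorization: "Bearer forged" } },
  demoVault,
  demoUpstream,
);
console.log(pushResult.status, authSeenUpstream); // 200 Bearer tok-123
```

The lookup is keyed on the identity the sandbox was started with, not on anything the sandbox sends, so a forged header buys nothing.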
This inverts the standard model. The standard model hands the agent a credential and asks it to be careful; prompt injection turns "careful" into "exfiltrate." The egress-proxy model never lets the authority exist as an artifact the agent can read. On the credential-provenance axis it is strictly better. On operator control, LAN latency, and vendor lock-in the comparison runs the other way — which is exactly why the boundary's location is the whole question for a homelab.
Three boundaries, one shape:
| Boundary owner | Holds the credential | Injects at |
|---|---|---|
| Cloudflare Sandbox egress | the Worker | edge egress proxy |
| Anthropic managed agents | vault + proxy | managed egress |
| Deno Sandbox | host process (value) | approved-host egress |
Deno is the limiting case: the proxy holds the value and the sandbox holds only a placeholder, so even the reference is inert. The other two hold a real credential in the proxy and inject it; Deno keeps the value out of the sandbox's address space entirely.
6.1. LiteLLM is the local instance — for one class of host
mini already runs the pattern, narrowly. The LiteLLM proxy holds the
model-provider keys; local agents call it with no key of their own, and
the proxy injects the real credential toward api.anthropic.com /
api.openai.com. That is egress-proxy credential injection for model
endpoints — the same shape, scoped to one class of host. The guardrail
layer on top (PII custom_code guards firing per request) is the
inspect-and-modify hook Deno's docs promise and most proxies lack.
What LiteLLM does not do is the git push case: arbitrary-host egress
with identity-bound credentials. That is the unoccupied cell from the
previous section, restated precisely — mini has the proxy shape for
model traffic and nothing for general egress. The economy side of the
same problem — who may spend which credential, and how it is metered —
is worked out in Agent Token Exchange.
7. Integration surface: wrap a harness, or be the SDK
The four isolations say nothing about who writes the code that runs inside the box. That is a separate axis, and it is the one that decides which sandbox a given harness can even use.
| Surface | You supply | The sandbox runs | Systems |
|---|---|---|---|
| Wrap-a-harness | an agent CLI | the agent, unmodified | Docker sbx |
| SDK primitive | your own agent | code you wrote | Cloudflare Sandbox, Deno Sandbox |
| Build-in-page | a web app | browser-resident code | browser / Co-do |
| OS profile | any process | a profiled command | sandbox-exec, Anthropic sandbox-runtime |
Docker sbx is the only one that treats the harness as the unit. sbx
run takes an existing agent — Claude Code, Gemini CLI, Copilot CLI,
Codex, OpenCode, Kiro — and runs it unmodified inside the microVM. You
write no integration code; the sandbox is harness-agnostic because it
wraps the process, not the logic. This is why its default is
--dangerously-skip-permissions: the agent it wraps is one that
otherwise stops to ask, and the box is what makes "skip the prompts"
safe.
Cloudflare and Deno are the inverse. There is no agent to wrap; you
call getSandbox() or Sandbox.create() from code you wrote, and the
sandbox is the execution tier of an agent you are building. The
integration is total because you author both sides.
The distinction is not cosmetic. A wrap-a-harness sandbox adopts a new coding agent the day it ships, because it never modeled the agent's internals. An SDK sandbox cannot run Claude Code for you at all — it is a primitive, not a host. For mini, where the agent is Claude Code, only the wrap-a-harness surface (sbx) and the OS-profile surface (sandbox-exec, the sandbox-runtime shape) are candidates at all.
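The selection argument reduces to a filter over the table. The sketch below encodes the table's rows; the `Surface` labels and the `canHostExistingCli` rule are this document's reasoning restated as code, not a property any vendor publishes.

```typescript
type Surface = "wrap-a-harness" | "sdk-primitive" | "build-in-page" | "os-profile";

interface SandboxOption {
  system: string;
  surface: Surface;
}

// Rows of the integration-surface table.
const sandboxOptions: SandboxOption[] = [
  { system: "Docker sbx", surface: "wrap-a-harness" },
  { system: "Cloudflare Sandbox", surface: "sdk-primitive" },
  { system: "Deno Sandbox", surface: "sdk-primitive" },
  { system: "browser / Co-do", surface: "build-in-page" },
  { system: "sandbox-exec", surface: "os-profile" },
  { system: "Anthropic sandbox-runtime", surface: "os-profile" },
];

// An existing agent CLI can be hosted only by a surface that runs a
// process unmodified: wrapping the harness or profiling the command.
function canHostExistingCli(surface: Surface): boolean {
  return surface === "wrap-a-harness" || surface === "os-profile";
}

const harnessCandidates = sandboxOptions
  .filter((o) => canHostExistingCli(o.surface))
  .map((o) => o.system);

console.log(harnessCandidates);
// [ 'Docker sbx', 'sandbox-exec', 'Anthropic sandbox-runtime' ]
```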
8. The product layer: pricing, gating, lock-in
Where the boundary runs is also where the bill is. The commercial shape is a property of the sandbox as much as its isolation model, and it correlates with the secret-custody axis in a way worth naming.
| System | Cost shape | Gating | Lock-in vector |
|---|---|---|---|
| Docker sbx | free core; team network/FS policy via sales | none for core | low — wraps any harness, runs local |
| Cloudflare Sandbox | Workers Paid plan; bucket mount needs production deploy | Workers Paid | high — Durable Objects, R2, Workers |
| Deno Sandbox | usage-based on Deno Deploy (≈$0.05/h CPU, $0.016/GB-h memory, $0.20/GiB-mo volume; Pro includes 40h / 1000 GB-h / 5 GiB) | Deno Deploy account | high — sandbox.deploy() binds to Deno Deploy |
| browser / Co-do | free (web-platform APIs) | none | none — and no managed boundary either |
| sandbox-exec | free (built into macOS) | none | none — but macOS-only, deprecated |
Two observations:
- Lock-in tracks the boundary's owner. The systems with the strongest secret custody — Cloudflare's Worker proxy, Deno's placeholder — are exactly the ones whose credential boundary is the vendor's edge. You buy provenance with dependence. sbx and sandbox-exec carry no lock-in precisely because they offer no managed credential boundary; that is the empty cell from earlier, seen from the billing side.
- Free-to-start is not free-to-run-unattended. sbx installs free, but the network and filesystem policies that make YOLO mode genuinely safe for a team are the paid, talk-to-sales tier. The core product is the hard box; the governance is the upsell. For a single-operator homelab the free core is the whole product; for a team it is the loss leader.
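For a sense of scale, the Deno row of the table can be turned into arithmetic. The rates and included quantities are the ones quoted above; the billing rule is an assumption (usage beyond the Pro allowance billed at the listed rate), the Pro subscription fee itself is left out, and `overageUsd` is a hypothetical helper.

```typescript
// Rates and Pro allowances as quoted in the pricing table.
const RATES = { cpuPerHour: 0.05, memPerGbHour: 0.016, volumePerGiBMonth: 0.2 };
const PRO_INCLUDED = { cpuHours: 40, memGbHours: 1000, volumeGiB: 5 };

interface MonthlyUsage {
  cpuHours: number;
  memGbHours: number;
  volumeGiB: number;
}

// Assumption: only usage above the included allowance is billed.
function overageUsd(u: MonthlyUsage): number {
  const over = (used: number, included: number): number => Math.max(0, used - included);
  return (
    over(u.cpuHours, PRO_INCLUDED.cpuHours) * RATES.cpuPerHour +
    over(u.memGbHours, PRO_INCLUDED.memGbHours) * RATES.memPerGbHour +
    over(u.volumeGiB, PRO_INCLUDED.volumeGiB) * RATES.volumePerGiBMonth
  );
}

const busyMonth: MonthlyUsage = { cpuHours: 100, memGbHours: 1500, volumeGiB: 10 };
console.log(overageUsd(busyMonth).toFixed(2)); // 12.00
```

Under that assumption the example month works out to 60 CPU-hours, 500 GB-hours, and 5 GiB over the allowance: 3 + 8 + 1 = 12 USD.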
9. What composes, what complects
| System | Decomposed | Complected |
|---|---|---|
| Docker Sandboxes | host from agent (microVM, workspace) | autonomy with operator-set policy |
| Cloudflare Sandbox | credential custody from execution | lifecycle with Workers request model |
| Deno Sandbox | secret value from secret presence | dev-to-prod with Deno Deploy |
| Browser (Co-do) | filesystem custody (FS Access chroot) | display with network egress |
| sandbox-exec | filesystem and network into one profile | profile authored in a dead language |
The systems differ less in their compute boundary — microVMs are a commodity now — than in which of the four isolations they treat as structural and which they leave to policy. The reference axiom is satisfied by construction only where both (2) and (3) are structural; everywhere else, "sandboxed" means "isolated compute plus a config file you were trusted to write."
10. Field notes — installing sbx on mini
The experience-report part, since the claim should carry its own provenance.
- brew install docker/tap/sbx failed first run: an HTTP/2 PROTOCOL_ERROR mid-download of the cask asset, compounded by a Homebrew auto-update that could not refresh formula.jws.json. The failure was transient, not a bad URL — the release asset resolves fine.
- Re-running with auto-update disabled (brew install --cask docker/tap/sbx) succeeded. Binary links to /opt/homebrew/bin/sbx, with bash/fish/zsh completions.
- sbx version → client v0.30.0; server "Unavailable (daemon not running — use 'sbx daemon start')." The CLI is a client to a daemon that brokers the microVMs; nothing runs until the daemon does, and Docker Desktop is explicitly not required.
- The subcommand surface (create, run, exec, policy, secret, ports, template, kit) confirms the decomposition above: policy is where (3) lives, secret is where (4) lives, and both are opt-in. The default is the hard box with permissive policy — exactly the shape this document warns about.
For the homelab the fit is narrow: sbx is built for unattended agent
runs, and mini's directive is that Gatus owns restarts and all
telemetry goes to nexus. An sbx-wrapped agent would need its OTLP
egress (192.168.86.100:4317) added to the network policy explicitly,
or the hard box silently swallows the traces. That is the reference
axiom biting in the friendly direction: egress control that is doing
its job will block your observability too, until you declare it.
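A pre-flight check for that failure mode might look like the sketch below. The `host:port` allowlist format and the functions `isAllowed` and `blockedEndpoints` are hypothetical; this is not sbx's policy syntax, only an illustration of checking required endpoints against a default-deny list before an unattended run.

```typescript
interface Endpoint {
  host: string;
  port: number;
}

// Hypothetical allowlist entry format: "host:port", where a leading "*."
// matches any subdomain of the rest.
function entryMatches(entry: string, target: Endpoint): boolean {
  const idx = entry.lastIndexOf(":");
  if (idx < 0) return false;
  const entryHost = entry.slice(0, idx);
  const entryPort = Number(entry.slice(idx + 1));
  if (entryPort !== target.port) return false;
  if (entryHost.startsWith("*.")) return target.host.endsWith(entryHost.slice(1));
  return entryHost === target.host;
}

function isAllowed(allowlist: string[], target: Endpoint): boolean {
  return allowlist.some((entry) => entryMatches(entry, target));
}

// Report which endpoints the agent needs that a default-deny policy drops.
function blockedEndpoints(allowlist: string[], required: Endpoint[]): string[] {
  return required
    .filter((target) => !isAllowed(allowlist, target))
    .map((target) => `${target.host}:${target.port}`);
}

const draftAllowlist = ["*.anthropic.com:443"];
const requiredByAgent: Endpoint[] = [
  { host: "api.anthropic.com", port: 443 },
  { host: "192.168.86.100", port: 4317 }, // OTLP collector on nexus
];

console.log(blockedEndpoints(draftAllowlist, requiredByAgent));
// [ '192.168.86.100:4317' ]
```

Running a check like this before the agent starts turns a silent loss of traces into a visible configuration error.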
11. Related
- Agent Memory Architectures — the companion decomposition; same compose-vs-complect lens, the reactive/proactive axis there mirrors the secret-custody axis here
- Cloudflare Agents Week 2026 — §The egress proxy; the credential-injection sequence diagrammed in full
- Agent Token Exchange — the economy side of credential management: who may spend which authority, and how it is metered
- Agentic Systems Q4 2024 — MCE pattern; sandboxes are the execution tier of that architecture
12. Anchors
- Anthropic. sandbox-runtime. https://github.com/anthropics/sandbox-runtime — the two-isolation axiom (filesystem ∧ network)
- Cloudflare. Sandbox SDK. https://developers.cloudflare.com/sandbox/ — container on Workers; request proxying keeps credentials in the Worker
- Docker. Docker Sandboxes. https://www.docker.com/products/docker-sandboxes/ — microVM per agent; brew install docker/tap/sbx
- Dahl, R. Introducing Deno Sandbox. Deno blog, 3 Feb 2026. https://deno.com/blog/introducing-deno-sandbox — secret placeholders, allowNet, sub-second boot
- Kinlan, P. the browser is the sandbox. AI Focus, 25 Jan 2026. https://aifoc.us/the-browser-is-the-sandbox/ — File System Access chroot, CSP as partial egress control
- Willison, S. sandboxing. https://simonwillison.net/search/?q=sandbox — running coverage of the Anthropic sandbox experiment and sandbox-exec
- coder. httpjail. https://github.com/coder/httpjail — the egress-proxy chokepoint both Deno and Cloudflare converge on
- Apple. sandbox-exec(1) / Seatbelt — host-kernel profile sandboxing on macOS