AI Engineer World's Fair 2026
Event notes (not attended — remote scan of the schedule + Online Track)

Logistics

  • When: June 29 – July 2, 2026 (Day 1 workshops → Days 2–4 sessions)
  • Where: San Francisco, CA
  • Organizer: AI Engineer (swyx & team)
  • Attended: no. This note is a structured remote scan of the schedule + Online Track videos, filtered against corpus research themes.

Sources

Three canonical sources plus one raw-data snapshot. The three canonical URLs are the linkable references; the sources/ files are the exact bytes I filtered.

  • worldsfair/schedule.pdf — official schedule PDF
  • YouTube playlist — AI Engineer World's Fair Online Track 2026 (82 videos; only the "Online Track" subset is public)
  • worldsfair/2026#schedule — official schedule page
  • sources/ (this dir):
    • sessions.json (548 KB, schedule v4927, 556 sessions with {title, day, time, room, track, type, status, speakers, description})
    • speakers.json (977 KB, 630 speakers with bios)
    • session-details.json (672 KB, per-session descriptions + speaker bios indexed by numeric id 0..560)
    • youtube-playlist.json (89 KB, yt-dlp --flat-playlist --dump-single-json output for the 82 Online Track videos)
    • gen-tables.jq — reproducibly emits the org tables from sessions.json (jq -rf gen-tables.jq sessions.json > session-tables.org)
    • session-tables.org — 579-line generated table of all 556 sessions grouped by day (time · track · title · speakers · type)
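The day-grouped table that gen-tables.jq produces can be sketched in Python for readers who don't use jq. This is a minimal sketch and not the actual gen-tables.jq logic. It assumes that sessions.json is either a bare list or a {"sessions": [...]} wrapper, and that each speaker is either a plain string or an object with a "name" key. The helper names load_sessions, speaker_names, and org_tables are hypothetical.

```python
import json
from typing import Any

# Hypothetical Python mirror of gen-tables.jq. Field names follow the sessions.json
# description (title, day, time, track, type, speakers); the JSON shape is assumed.


def load_sessions(path: str) -> list[dict]:
    """Load sessions.json, assuming either a bare list or a {"sessions": [...]} wrapper."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return data["sessions"] if isinstance(data, dict) else data


def speaker_names(speakers: Any) -> str:
    """Join speaker names; accepts plain strings or {"name": ...} objects (assumed shapes)."""
    names = [s.get("name", "") if isinstance(s, dict) else str(s) for s in speakers or []]
    return ", ".join(n for n in names if n)


def org_tables(sessions: list[dict]) -> str:
    """Emit one org table per day: Time | Track | Title | Speakers | Type."""
    by_day: dict[str, list[dict]] = {}
    for s in sessions:
        # dicts keep insertion order, so days and slots stay in source order
        by_day.setdefault(str(s.get("day", "?")), []).append(s)
    out: list[str] = []
    for day, rows in by_day.items():
        out += [f"* {day}", "| Time | Track | Title | Speakers | Type |", "|-"]
        for s in rows:
            cells = [s.get("time") or "", s.get("track") or "-", s.get("title") or "",
                     speaker_names(s.get("speakers")), s.get("type") or ""]
            # a literal "|" inside a cell would split the org column, so swap it for "/"
            out.append("| " + " | ".join(str(c).replace("|", "/") for c in cells) + " |")
    return "\n".join(out)
```

Usage would be roughly `print(org_tables(load_sessions("sources/sessions.json")))`. The sketch keeps the source order rather than sorting, because a string sort would place 12-hour slots like 12:10 ahead of 9:00.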

Overview

The Fair runs four days, with ~556 confirmed sessions across ~15 named tracks plus workshop / sponsor / expo slots, and ~630 speakers. Day 1 is workshop-heavy (58 sessions); Days 2–4 run ~166 sessions each. The Fair has dedicated tracks for three of the corpus's active research threads:

  • Harness Engineering (13 sessions on Days 2–4) — the closest thing to a first-class venue for "harness = everything except the model" as an operational discipline. Overlaps directly with Aune's natea/harness-eval methodology and the CPRR frame.
  • Sandbox & Platform Engineering (Day 3, 11+ sessions) — isolation primitives, agent runtimes, blast-radius containment. Overlaps with the sandbox + containment corpus and Kumar's secret-brokering thread.
  • Memory & Continual Learning (Day 3) + Evals (Days 2–3) — memory as first-class harness surface; eval as the attribution instrument. Overlaps with the memory-systems + telemetry corpus and Poliakov's DBOS durability angle.
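The per-day and per-track counts in the overview can be re-checked against the snapshot. The following is a minimal sketch. It assumes that sessions.json loads as a list of dicts with the "day" and "track" fields named under Sources, and the helper name overview_counts is hypothetical.

```python
from collections import Counter

# Hypothetical check of the overview numbers; "day" and "track" are the sessions.json
# fields named in Sources, and the list-of-dicts shape is an assumption.


def overview_counts(sessions: list[dict]) -> tuple[Counter, set]:
    """Count sessions per day and collect the distinct non-empty track values."""
    per_day = Counter(str(s.get("day", "?")) for s in sessions)
    tracks = {s["track"] for s in sessions if s.get("track")}
    return per_day, tracks
```

The distinct-track set will probably come out larger than "~15 named tracks". The cluster tables below show that workshop, sponsor, and expo labels (Workshops, Sponsor, Expo Stage 2) share the same track column, so those labels would need to be filtered out first.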

The Online Track (public YouTube) is 82 of the 556 sessions. Harness engineering dominates the online cut; memory + observability are present but thinner; secret custody, permission guardrails, and team topologies are essentially absent from the online set.

Sessions relevant to current research

Three clusters, each anchored to a research thread in the wal.sh corpus. Sessions were filtered by four parallel agents against the sources/session-tables.org dataset (556 sessions). Overlap between clusters is noted inline (e.g. "Total Recall: Agent Memory and Harness Engineering" hits both memory and harness).

Each cluster is a curated list (10–30 sessions per cluster from the 556-session dataset), not the full track. The full per-day table lives at sources/session-tables.org if a broader scan is wanted.

Harness engineering + coding-agent eval + attribution

Anchor thread: CPRR methodology · elenctic vibe code review · Building AI Agent Tool Systems · CLI Coding Agents Q2 2026 · 2026 Q2 Claude Code Features · 2026 Q2 Skills

Adjacent event: AI Tinkerers Boston 2026-06-29 — Aune on natea/harness-eval (3-trial ablation, blind judge, framework attribution) is the closest 2026-Q2 methodological analog.

Fair track shape: dedicated Harness Engineering track (13 sessions on Days 2–4). Gap vs Aune's methodology: light on explicit blind-judge ablation / three-trial attribution formalization; Varun Krovvidi's "6 Pillars" and Rustem Feyzkhanov's trace-to-simulation progression are the closest structural equivalents.

Day · Time · Track · Title · Speakers · Why relevant
Day 1 · 9:00–11:00 · Workshops · Total Recall: Agent Memory and Harness Engineering · Ignacio Martinez · memory-as-harness (also cluster 2)
Day 1 · 12:10–1:10 · Workshops · Evals in AI: A Deep Dive · Tejas Kumar · rubric / benchmark design as harness instrumentation
Day 1 · 12:10–1:10 · Workshops · From Zero to Leaderboard: Building an End-to-End AI Agent Evaluation Pipeline · Wolfram Ravenwolf · end-to-end eval pipeline
Day 2 · 11:10–11:30 · Claws & Personal Agents · Your Agent Didn't Fail. Your Harness Did. · Vinoth Govindarajan · direct hit on harness-vs-model attribution thesis
Day 2 · 11:10–11:30 · AI Architects · Your Agent Evolved. Your Evals Didn't. · Ameya Bhatawdekar · eval drift as harness failure signal
Day 2 · 1:55–2:15 · AI-Native Enterprises · AI Evals Platform for Cross-Functional Teams at Scale · Nachiket Paranjape, Swaroop Chitlur Haridas · platform-level eval infra
Day 2 · 2:50–3:10 · — · 6 Pillars of an Agentic Harness That Fixes Production Incidents · Varun Krovvidi · direct harness-methodology decomposition
Day 2 · 3:20–3:40 · Claws & Personal Agents · Every Harness Will Become A Claw · Sam Bhagwat · harness architecture evolution
Day 2 · 4:30–4:50 · Software Factories · Harness Engineering is not Enough: Why Software Factories Fail · Dex Horthy · meta-critique of harness sufficiency
Day 2 · 4:50–5:10 · Harness Engineering · In Code They Act, In Proof We Trust · Erik Meijer · formal-semantics harness (links to CPRR)
Day 3 · 10:45–11:05 · Evals · Vending-Bench: Long-Horizon Agent Evals for a Simulated Vending Business · Lukas Petersson · long-horizon eval design
Day 3 · 11:10–11:30 · Evals · From Signal to PR: Anatomy of a Self-Improving Agent · Jason Lopatecki · harness-to-PR attribution loop
Day 3 · 11:40–12:00 · Evals · Building Closed-Loop Evals for a Multimodal Agent at Uber Scale · Soumya Gupta, Jai Chopra · production harness at scale
Day 3 · 12:05–12:25 · Evals · From Agent Traces to Agent Simulations · Rustem Feyzkhanov · trace-to-simulation methodology (elenctic-adjacent)
Day 3 · 3:20–3:40 · Evals · Don't Ship Skills Without Evals · Philipp Schmid · skills-layer eval (links to Q2 skills)
Day 4 · 9:40–10:00 · Harness Engineering · The Unreasonable Effectiveness of Separating the Task from the Model · Maxime Rivest, Isaac Miller · task↔model↔harness seam
Day 4 · 10:00–10:20 · Harness Engineering · How Anthropic Builds: Lessons from Labs · Mike Krieger · industrial harness methodology from the frontier lab
Day 4 · 10:45–11:05 · Harness Engineering · Tokens Should Have Jobs · Katelyn Lesse, Angela Jiang · token-level harness instrumentation
Day 4 · 12:05–12:25 · Agentic Engineering · Benchmarking Coding Agents on New vs Legacy Code bases · Denys Linkov · coding-agent benchmark design
Day 4 · 12:05–12:25 · Harness Engineering · Harness Engineering: Building the Production Cage for Powerful Domain Agents · Mike Chambers · production harness patterns
Day 4 · 1:30–1:50 · Harness Engineering · Loophole — Adversarial Agents To Stress Test Your Morality · Brendan Rappazzo · adversarial harness / robustness
Day 4 · 1:55–2:15 · Harness Engineering · Agent Frameworks Considered Harmful · Rémi Louf · meta-critique of framework abstractions
Day 4 · 2:25–2:45 · Harness Engineering · We let an AI agent execute Bash and lived to talk about it · Sarah Sanders · sandbox harness for shell (also cluster 3)
Day 4 · 2:50–3:10 · Harness Engineering · No Memory, No Harness: Why the Database Is the Last Line of Defense · Kay Malcolm · DB-backed durable harness (also cluster 2)
Day 4 · 3:45–4:05 · Harness Engineering · Agents Without Code: How Skills, YAML, and Filesystems Replaced Python · Philipp Schmid · harness-as-filesystem, declarative tool spec

Agent memory + durable execution + observability + telemetry

Anchor thread: Agent Memory Systems · Agent Memory as Institutional Knowledge · The Agent Context Thread · Agent Telemetry Systems · Team Topologies for the Agentic Platform

Adjacent event: AI Tinkerers Boston 2026-06-29 — Poliakov on "Durable and Observable AI Agents" (DBOS, Postgres, replayable workflows) is the closest 2026-Q2 substrate analog.

Fair track shape: dedicated Memory & Continual Learning track (Day 3), with scattered observability sessions across Days 2–4. Kay Malcolm's DB-as-harness talk and Viren Baraiya's durable-runtime session hit the DBOS-adjacent substrate directly.

Day · Time · Track · Title · Speakers · Why relevant
Day 1 · 9:00–11:00 · Workshops · Total Recall: Agent Memory and Harness Engineering · Ignacio Martinez · memory decomposition + harness engineering substrate
Day 1 · 9:00–11:00 · Session · Advanced workshop: Mastering AI Observability · Doug Guthrie · telemetry / observability fundamentals
Day 1 · 2:20–4:20 · Sponsor · Context Engineering in 2026: Compaction, Memory & Cost · Louis-François Bouchard, Samridhi Vaid, Omar Solano · memory layers + token-spend accounting
Day 2 · 2:25–2:45 · Session · Beyond Golden Signals: Monitoring in the Age of GenAI · Marina Petzel · observability signals for agents
Day 2 · 3:20–3:40 · Session · From Context to Memory: Your Agents Need a Real Memory Layer · Anders Swanson · memory-layer readonly/writable/searchable semantics
Day 2 · 3:45–4:05 · Session · Unlock Agent Autonomy: The Runtime for AI-Native Systems · Tushar Jain · runtime substrate
Day 3 · 11:00–12:00 · CTO Circle · Tokenomics: From AI Spend to AI Value · Martin Harrysson, Matt Linderman, Prakhar Dixit · token-spend accounting
Day 3 · 11:10–11:30 · Session · Harnessing Agents: The Durable Runtime for Dynamic Workflows · Viren Baraiya · durable execution (DBOS adjacency)
Day 3 · 11:40–12:00 · Session · Memory Harnesses for Long-Running Research Agents · Stefania Druga · durable memory for long-horizon tasks
Day 3 · 3:20–3:40 · Session · Lessons from Studying Every Memory System · Shlok Khemani · memory-systems taxonomy
Day 3 · 3:45–4:05 · Session · LLM Knowledge Bases: a practical guide · Ben Holmes · persistent knowledge layer (survives model upgrade)
Day 4 · 11:10–11:30 · Sponsor · Tracing and debugging agents across systems with OpenTelemetry · Chang Liu · distributed tracing / OTel substrate
Day 4 · 2:50–3:10 · Session · No Memory, No Harness: Why the Database Is the Last Line of Defense · Kay Malcolm · DB-backed durability (DBOS-aligned)

Sandbox + secret custody + permission guardrails + MCP + deployment

Anchor thread: Agent Sandbox Architectures · Practical Sandbox Configs · How We Contain Claude: Mapping Against the Stack · Agent Permission Guardrails · Agent Deployment Systems · The Four-Boundary Spec Mapping · Code Search and Code Graph MCP Servers

Adjacent event: AI Tinkerers Boston 2026-06-29 — Kumar on "Protecting Secrets from Long-running Agents" (OPA + AWS Secrets Manager, time-boxed approval, secret-broker) is the closest 2026-Q2 governance analog.

Fair track shape: dedicated Sandbox & Platform Engineering track (Day 3, 11+ sessions) + Context Engineering / MCP sessions (Day 3) + scattered auth + deployment talks. Gap vs. Kumar's pattern: there are fewer explicit OPA-in-front-of-Secrets-Manager sessions, and the secret-broker pattern is implied (e.g. Gopal's "leak stopping") but not formalized.

Day · Time · Track · Title · Speakers · Why relevant
Day 1 · 11:05–12:05 · Workshops · How I learned to stop worrying and love the sandbox · Matt Brockman · foundational sandbox mechanics
Day 1 · 4:30–5:30 · Workshops · Agent Auth · Bereket Habtemeskel, Paola Estefania · permission / auth patterns for agents
Day 2 · 11:10–11:30 · AI-Native Enterprises · Building the engine while flying the plane — launching the Figma MCP server · Jesse Lumarie · MCP production deployment
Day 2 · 1:55–2:15 · — · Who Approved That MCP Server? Governing the Tool Layer · Jim Clark · MCP governance + approval boundaries (Kumar-adjacent)
Day 2 · 1:55–2:15 · Security · Agentic Security: Permissions, Provenance, and the Agent Supply Chain · Steve Yegge · three-layer identity / session-scope / per-op model
Day 2 · 2:50–3:10 · Claws & Personal Agents · Your company brain will leak secrets. How we stopped it for big banks · Tanmai Gopal · secret custody at scale
Day 2 · 3:45–4:05 · Expo · How We Built the Airbyte Agent MCP Server and CLI · Pedro Lopez · MCP deployment + tool-system architecture
Day 3 · 10:45–11:05 · Sandbox & Platform Engineering · Don't build agents, build environments · Adam Azzam · environment-as-boundary philosophy
Day 3 · 11:40–12:00 · Sandbox & Platform Engineering · Kubernetes Is Not Your Sandbox · Ivan Burazin · container vs. true sandbox boundary
Day 3 · 12:05–12:25 · Sandbox & Platform Engineering · Your agent needs a sandbox, not a desert · Samuel Colvin · containment vs. usability tradeoffs
Day 3 · 1:30–2:15 · Sandbox & Platform Engineering · From fork() to Fleet: Designing an Agent Sandbox Cloud (Pt 1 + 2) · Abhishek Bhardwaj · sandbox infra + fleet
Day 3 · 1:55–2:15 · Expo Stage 2 · Edge-Native AI: Building Ultra-Fast Agents and MCP Servers with Spin · Thorsten Hans · MCP on edge
Day 3 · 2:25–2:45 · Sandbox & Platform Engineering · 1,000 Agent Tasks in a Sandbox: What Breaks When LLMs Write and Run Code · Kevin Orellana · runtime isolation under stress
Day 3 · 2:25–2:45 · Context Engineering · MCP Apps — Extending the frontier · Liad Yosef, Ido Salomon · MCP architecture / extensibility
Day 3 · 2:50–3:10 · Context Engineering · MCP Apps: Give the Model Data, Give the User a UI · Dustin Mihalik · MCP as context broker
Day 3 · 3:20–3:40 · Context Engineering · MCP Tasks (async) — Why the heck aren't any agents supporting MCP tasks/async? · Cornelia Davis · async task / permission boundaries
Day 3 · 3:20–3:40 · Sandbox & Platform Engineering · Sandboxes Aren't Optional: Runtime Isolation Patterns for Coding Agents at Scale · Robert Brennan · isolation taxonomy
Day 4 · 11:10–11:30 · Agentic Engineering · MCPs, CLIs, and Skills: Choosing the Right Tooling Layer for Agentic Development · Nikita Kothari · tool-layer decision framework
Day 4 · 11:40–12:00 · Agentic Engineering · Auth for Agents: Unblock Autonomous AI with auth.md · Michael Grinich · standardized auth for agent tool access
Day 4 · 1:30–1:50 · — · YOLO Mode, Safely: microVM Sandboxes for Any Agent · Rowan Christmas · microVM isolation + blast radius
Day 4 · 1:55–2:15 · Track M · Blast Radius Zero: One-Command OpenClaw Sandboxes in the Cloud · Arun Sekhar · managed sandbox-as-a-service

Online Track (YouTube, publicly available)

82 videos out of the 556 sessions made the public "Online Track" playlist. This subset skews heavily toward harness engineering, eval, and observability — the sandbox / MCP / permission cluster is essentially absent from the online cut.

Curated below: the 17 videos with the closest bridge to corpus research. Full playlist metadata in sources/youtube-playlist.json.

# · Duration · Title · Speaker / Org · Theme · Corpus link
1 · 24m · Recursive Coding Agents · Raymond Weitekamp, OpenProse · harness · CPRR
2 · 16m · Your Coding Agent Is Creating Review Debt · Sachin Gupta · harness / eval · CPRR
3 · 9m · Production Evals For Agentic AI Systems · Nishant Gupta, Meta · eval · CPRR
4 · 13m · SWE-Marathon: Evaluating Coding Agents at Billion-Token Scale · Rishi Desai, Abundant AI · eval · CPRR
5 · 27m · What if the harness mattered more than the model? · Aditya Bhargava, Etsy · harness · CLI Agents
6 · 22m · Beyond the Harness: A Journey Towards Adaptive Engineering · Rajiv Chandegra, Annicha Labs · harness · CLI Agents
7 · 27m · When Agents Meet Physical Data: The Other Physics of Agent Harnesses · Dmitry Petrov, DataChain · harness · CLI Agents
8 · 35m · Stop AI Agent Hallucinations: 5 Techniques + Production Patterns · Elizabeth Fuentes, AWS · harness · CPRR
9 · 20m · A Genius With Amnesia · Victor Savkin, Nx · memory · Memory
10 · 40m · Turn 10,994 Notes Into Memory · Paul Iusztin, Louis-François Bouchard · memory · Memory
11 · 23m · Continual Learning for AI Agents: From Failures to Durable Improvements · Soheil Feizi, RELAI · memory / durability · Memory
12 · 29m · Your Agent Failed in Prod. Good Luck Reproducing It. · Tisha Chawla, Susheem Koul, Microsoft · observability · Telemetry
13 · 26m · Why Your Agent Disagrees With Itself · Diane Lin, Datadog · observability · Telemetry
14 · 20m · The Log Is The Agent · Ishaan Sehgal, Omnara · observability · Inst. Knowledge
15 · 7m · Deterministic Infra for Non-Deterministic AI Agents · Nishant Gupta, Meta · determinism · CLI Agents
16 · 29m · Every Solo Agent Builder Eventually Reinvents a Worse Version of CI/CD · Sumaiya Shrabony · deployment / CI · Deploy
17 · 29m · MCP Apps: Primitives, discovery, and the Future of Software · Pietro Zullo, Manufact · sandbox / MCP · MCP
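The Duration column can be re-derived from the playlist snapshot. The following is a minimal sketch. It assumes that the yt-dlp flat-playlist JSON has a top-level "entries" list and that each entry carries "title" and "duration" (in seconds), which flat mode may leave empty. The helper names load_playlist and duration_rows are hypothetical.

```python
import json

# Hypothetical helper over sources/youtube-playlist.json. Assumes yt-dlp's flat-playlist
# JSON: a top-level "entries" list whose items carry "title" and "duration" (seconds).


def load_playlist(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def duration_rows(playlist: dict) -> list[tuple[str, str]]:
    """Return (duration, title) pairs, rounding seconds to whole minutes."""
    rows = []
    for entry in playlist.get("entries") or []:
        secs = entry.get("duration")
        # flat entries can lack a duration; print "?" instead of guessing
        dur = f"{round(secs / 60)}m" if secs else "?"
        rows.append((dur, entry.get("title") or ""))
    return rows
```

If the snapshot matches the count above, `len(load_playlist("sources/youtube-playlist.json")["entries"])` should come back as 82. The minute values in the table may use a different rounding rule than Python's `round()`.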

Adjacencies to current work

  • Poliakov (AI Tinkerers Boston, 2026-06-29) ↔ World's Fair (WF): Kay Malcolm (Day 4) "No Memory, No Harness: Why the Database Is the Last Line of Defense" is the closest DBOS-adjacent substrate talk. Viren Baraiya (Day 3) "Harnessing Agents: The Durable Runtime for Dynamic Workflows" complements it from the runtime side.
  • Kumar (AI Tinkerers Boston, 2026-06-29) ↔ WF — Steve Yegge (Day 2) "Agentic Security: Permissions, Provenance, and the Agent Supply Chain" and Jim Clark (Day 2) "Who Approved That MCP Server?" are the governance analogs. Kumar's specific OPA-in-front-of-Secrets-Manager pattern is not directly matched.
  • Aune (AI Tinkerers Boston, 2026-06-29) ↔ WF — Varun Krovvidi's "6 Pillars of an Agentic Harness" (Day 2), Rustem Feyzkhanov's "From Agent Traces to Agent Simulations" (Day 3), and Mike Krieger's "How Anthropic Builds" (Day 4) span the harness-methodology axis Aune measured empirically.
  • Zhu (AI Tinkerers Boston, 2026-06-29) ↔ WF — the Fair is thin on Zhu's specific "deterministic code + structured memory" GTM-agent pattern; Nishant Gupta's "Deterministic Infra for Non-Deterministic AI Agents" (Online Track, 7m) is the closest at the substrate level.
  • Team Topologies for the Agentic Platform ↔ WF — the Fair's track structure (Harness / Sandbox / Memory / Evals as first-class venues) is the four-hats model made external: the corpus argument that "boundaries survive teams" gets a concrete reflection in the Fair organizing its tracks around the same boundaries.

Open questions / TODO

  • [ ] Watch the Online Track picks in the table above; capture per-video notes as includes/<slug>.org for the ones worth verbatim citation.
  • [ ] Add sources/schedule.pdf ingest if the PDF gains info not in sessions.json (e.g. room-map, ticket-tier gating).
  • [ ] Cross-check the "Fair has 150 Anthropic-employees-in-title sessions" claim from Twitter buzz against sources/speakers.json (grep for Anthropic in the affiliation field, then count).
  • [ ] If natea/harness-eval methodology gets a WF write-up (Aune wasn't a WF speaker per the source data), link from the Boston Aune notes.
  • [ ] Wire this note into site/events/index.org Upcoming/Past list.
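The Anthropic cross-check in the TODO list could start from the sketch below. It is only a minimal version. The "affiliation" field comes from the TODO item itself, but the "name" key, the list-of-dicts shapes, and the helper names anthropic_speakers and sessions_with_speakers are assumptions. It counts sessions rather than speakers, because the claim is about sessions.

```python
# Hypothetical cross-check for the "150 Anthropic sessions" claim. The "affiliation"
# field comes from the TODO; "name" and the list shapes are assumptions.


def anthropic_speakers(speakers: list[dict], field: str = "affiliation") -> list[str]:
    """Names of speakers whose affiliation mentions Anthropic (case-insensitive)."""
    return [sp.get("name", "?") for sp in speakers
            if "anthropic" in str(sp.get(field) or "").lower()]


def sessions_with_speakers(sessions: list[dict], names: list[str]) -> int:
    """Count sessions listing at least one of the given speaker names."""
    wanted = set(names)
    count = 0
    for s in sessions:
        # speakers may be plain strings or {"name": ...} objects (assumed)
        listed = {x.get("name") if isinstance(x, dict) else x for x in s.get("speakers") or []}
        if wanted & listed:
            count += 1
    return count
```

Joining on exact names can miss spelling variants between speakers.json and sessions.json, so a low count is only a lower bound.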

Provenance

Note assembled 2026-07-02 from a fixed dataset (sources/ files captured at scheduleVersion=4927 on 2026-07-02). Relevance filtering performed by four parallel scan agents against session-tables.org; the curation is this note's editorial judgment, not the Fair's.

Not attended; every session listed is a scheduled talk, not a witnessed one. Verdicts are attributed (sourced from the schedule payload) until a talk is actually watched from the Online Track, at which point the corresponding include should carry a :VERDICT: reproduced drawer.