AI Engineer World's Fair 2026
Event notes (not attended — remote scan of the schedule + Online Track)
Logistics
- When: June 29 – July 2, 2026 (Day 1 workshops → Days 2–4 sessions)
- Where: San Francisco, CA
- Organizer: AI Engineer (swyx & team)
- Attended: no. This note is a structured remote scan of the schedule + Online Track videos, filtered against corpus research themes.
Sources
Three canonical sources plus one raw data snapshot. The three canonical
URLs are the linkable references; the sources/ files are the exact bytes
I filtered.
- worldsfair/schedule.pdf — official schedule PDF
- YouTube playlist — AI Engineer World's Fair Online Track 2026 (82 videos; only the "Online Track" subset is public)
- worldsfair/2026#schedule — official schedule page
sources/ (this dir):
- sessions.json (548 KB, schedule v4927, 556 sessions with {title, day, time, room, track, type, status, speakers, description})
- speakers.json (977 KB, 630 speakers with bios)
- session-details.json (672 KB, per-session descriptions + speaker bios indexed by numeric id 0..560)
- youtube-playlist.json (89 KB, yt-dlp --flat-playlist --dump-single-json output for the 82 Online Track videos)
- gen-tables.jq — reproducibly emits the org tables from sessions.json (jq -rf gen-tables.jq sessions.json > session-tables.org)
- session-tables.org — 579-line generated table of all 556 sessions grouped by day (time · track · title · speakers · type)
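Per-day session counts can be re-derived from sessions.json with a few lines of Python. This is a minimal sketch, not the note's actual tooling (that is gen-tables.jq). It assumes the file is a top-level JSON array of session objects and that `day` holds a label such as "Day 1". The field names come from the file list above; the array layout, the label format, and the inline sample data are assumptions.

```python
import json
from collections import Counter

# Hypothetical sample standing in for sources/sessions.json.
# The real file's top-level layout may differ (assumption).
SAMPLE = json.loads("""
[
  {"title": "A", "day": "Day 1", "track": "Workshops", "status": "confirmed"},
  {"title": "B", "day": "Day 2", "track": "Evals", "status": "confirmed"},
  {"title": "C", "day": "Day 2", "track": "Security", "status": "confirmed"}
]
""")


def sessions_per_day(sessions):
    """Count sessions by their `day` field; missing days go under 'unknown'."""
    return Counter(s.get("day", "unknown") for s in sessions)


counts = sessions_per_day(SAMPLE)
for day in sorted(counts):
    print(day, counts[day])
```

To run it against the real snapshot, replace SAMPLE with the result of json.load on sources/sessions.json, adjusting the access path if the sessions turn out to be nested under a key.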
Overview
The Fair is four days, ~556 confirmed sessions across ~15 named tracks
plus workshop / sponsor / expo slots, and ~630 speakers. Day 1 is
workshop-heavy (58 sessions); Days 2–4 run ~166 sessions each. The Fair has dedicated tracks for three of the corpus's active research threads:
- Harness Engineering (13 sessions on Days 2–4) — the closest thing to a first-class venue for "harness = everything except the model" as an operational discipline. Overlaps directly with Aune's natea/harness-eval methodology and the CPRR frame.
- Sandbox & Platform Engineering (Day 3, 11+ sessions) — isolation primitives, agent runtimes, blast-radius containment. Overlaps with the sandbox + containment corpus and Kumar's secret-brokering thread.
- Memory & Continual Learning (Day 3) + Evals (Days 2–3) — memory as first-class harness surface; eval as the attribution instrument. Overlaps with the memory-systems + telemetry corpus and Poliakov's DBOS durability angle.
The Online Track (public YouTube) is 82 of the 556 sessions. Harness engineering dominates the online cut; memory + observability are present but thinner; secret custody, permission guardrails, and team topologies are essentially absent from the online set.
Sessions relevant to current research
Three clusters, each anchored to a research thread in the wal.sh
corpus. Sessions were filtered by four parallel agents against the
sources/session-tables.org dataset (556 sessions). Overlap between
clusters is noted inline (e.g. "Total Recall: Agent Memory and Harness
Engineering" hits both memory and harness).
Each cluster is a curated list (10–30 sessions per cluster from the
556-session dataset), not the full track. The full per-day table lives
at sources/session-tables.org if a broader scan is wanted.
Harness engineering + coding-agent eval + attribution
Anchor thread: CPRR methodology · elenctic vibe code review · Building AI Agent Tool Systems · CLI Coding Agents Q2 2026 · 2026 Q2 Claude Code Features · 2026 Q2 Skills
Adjacent event: AI Tinkerers Boston 2026-06-29 — Aune on
natea/harness-eval (3-trial ablation, blind judge, framework
attribution) is the closest 2026-Q2 methodological analog.
Fair track shape: dedicated Harness Engineering track (13 sessions on Days 2–4). Gap vs Aune's methodology: light on explicit blind-judge ablation / three-trial attribution formalization; Varun Krovvidi's "6 Pillars" and Rustem Feyzkhanov's trace-to-simulation progression are the closest structural equivalents.
| Day / Time | Track | Title | Speakers | Why relevant |
|---|---|---|---|---|
| Day 1 · 9:00–11:00 | Workshops | Total Recall: Agent Memory and Harness Engineering | Ignacio Martinez | memory-as-harness (also cluster 2) |
| Day 1 · 12:10–1:10 | Workshops | Evals in AI: A Deep Dive | Tejas Kumar | rubric / benchmark design as harness instrumentation |
| Day 1 · 12:10–1:10 | Workshops | From Zero to Leaderboard: Building an End-to-End AI Agent Evaluation Pipeline | Wolfram Ravenwolf | end-to-end eval pipeline |
| Day 2 · 11:10–11:30 | Claws & Personal Agents | Your Agent Didn't Fail. Your Harness Did. | Vinoth Govindarajan | direct hit on harness-vs-model attribution thesis |
| Day 2 · 11:10–11:30 | AI Architects | Your Agent Evolved. Your Evals Didn't. | Ameya Bhatawdekar | eval drift as harness failure signal |
| Day 2 · 1:55–2:15 | AI-Native Enterprises | AI Evals Platform for Cross-Functional Teams at Scale | Nachiket Paranjape, Swaroop Chitlur Haridas | platform-level eval infra |
| Day 2 · 2:50–3:10 | — | 6 Pillars of an Agentic Harness That Fixes Production Incidents | Varun Krovvidi | direct harness-methodology decomposition |
| Day 2 · 3:20–3:40 | Claws & Personal Agents | Every Harness Will Become A Claw | Sam Bhagwat | harness architecture evolution |
| Day 2 · 4:30–4:50 | Software Factories | Harness Engineering is not Enough: Why Software Factories Fail | Dex Horthy | meta-critique of harness sufficiency |
| Day 2 · 4:50–5:10 | Harness Engineering | In Code They Act, In Proof We Trust | Erik Meijer | formal-semantics harness (links to CPRR) |
| Day 3 · 10:45–11:05 | Evals | Vending-Bench: Long-Horizon Agent Evals for a Simulated Vending Business | Lukas Petersson | long-horizon eval design |
| Day 3 · 11:10–11:30 | Evals | From Signal to PR: Anatomy of a Self-Improving Agent | Jason Lopatecki | harness-to-PR attribution loop |
| Day 3 · 11:40–12:00 | Evals | Building Closed-Loop Evals for a Multimodal Agent at Uber Scale | Soumya Gupta, Jai Chopra | production harness at scale |
| Day 3 · 12:05–12:25 | Evals | From Agent Traces to Agent Simulations | Rustem Feyzkhanov | trace-to-simulation methodology (elenctic-adjacent) |
| Day 3 · 3:20–3:40 | Evals | Don't Ship Skills Without Evals | Philipp Schmid | skills-layer eval (links to Q2 skills) |
| Day 4 · 9:40–10:00 | Harness Engineering | The Unreasonable Effectiveness of Separating the Task from the Model | Maxime Rivest, Isaac Miller | task↔model↔harness seam |
| Day 4 · 10:00–10:20 | Harness Engineering | How Anthropic Builds: Lessons from Labs | Mike Krieger | industrial harness methodology from the frontier lab |
| Day 4 · 10:45–11:05 | Harness Engineering | Tokens Should Have Jobs | Katelyn Lesse, Angela Jiang | token-level harness instrumentation |
| Day 4 · 12:05–12:25 | Agentic Engineering | Benchmarking Coding Agents on New vs Legacy Code bases | Denys Linkov | coding-agent benchmark design |
| Day 4 · 12:05–12:25 | Harness Engineering | Harness Engineering: Building the Production Cage for Powerful Domain Agents | Mike Chambers | production harness patterns |
| Day 4 · 1:30–1:50 | Harness Engineering | Loophole — Adversarial Agents To Stress Test Your Morality | Brendan Rappazzo | adversarial harness / robustness |
| Day 4 · 1:55–2:15 | Harness Engineering | Agent Frameworks Considered Harmful | Rémi Louf | meta-critique of framework abstractions |
| Day 4 · 2:25–2:45 | Harness Engineering | We let an AI agent execute Bash and lived to talk about it | Sarah Sanders | sandbox harness for shell (cluster 3 also) |
| Day 4 · 2:50–3:10 | Harness Engineering | No Memory, No Harness: Why the Database Is the Last Line of Defense | Kay Malcolm | DB-backed durable harness (cluster 2 also) |
| Day 4 · 3:45–4:05 | Harness Engineering | Agents Without Code: How Skills, YAML, and Filesystems Replaced Python | Philipp Schmid | harness-as-filesystem, declarative tool spec |
Agent memory + durable execution + observability + telemetry
Anchor thread: Agent Memory Systems · Agent Memory as Institutional Knowledge · The Agent Context Thread · Agent Telemetry Systems · Team Topologies for the Agentic Platform
Adjacent event: AI Tinkerers Boston 2026-06-29 — Poliakov on "Durable and Observable AI Agents" (DBOS, Postgres, replayable workflows) is the closest 2026-Q2 substrate analog.
Fair track shape: dedicated Memory & Continual Learning track (Day 3), with scattered observability sessions across Days 2–4. Kay Malcolm's DB-as-harness talk and Viren Baraiya's durable-runtime sessions hit the DBOS-adjacent substrate directly.
| Day / Time | Track | Title | Speakers | Why relevant |
|---|---|---|---|---|
| Day 1 · 9:00–11:00 | Workshops | Total Recall: Agent Memory and Harness Engineering | Ignacio Martinez | memory decomposition + harness engineering substrate |
| Day 1 · 9:00–11:00 | Session | Advanced workshop: Mastering AI Observability | Doug Guthrie | telemetry / observability fundamentals |
| Day 1 · 2:20–4:20 | Sponsor | Context Engineering in 2026: Compaction, Memory & Cost | Louis-François Bouchard, Samridhi Vaid, Omar Solano | memory layers + token-spend accounting |
| Day 2 · 2:25–2:45 | Session | Beyond Golden Signals: Monitoring in the Age of GenAI | Marina Petzel | observability signals for agents |
| Day 2 · 3:20–3:40 | Session | From Context to Memory: Your Agents Need a Real Memory Layer | Anders Swanson | memory-layer readonly/writable/searchable semantics |
| Day 2 · 3:45–4:05 | Session | Unlock Agent Autonomy: The Runtime for AI-Native Systems | Tushar Jain | runtime substrate |
| Day 3 · 11:00–12:00 | CTO Circle | Tokenomics: From AI Spend to AI Value | Martin Harrysson, Matt Linderman, Prakhar Dixit | token-spend accounting |
| Day 3 · 11:10–11:30 | Session | Harnessing Agents: The Durable Runtime for Dynamic Workflows | Viren Baraiya | durable execution (DBOS adjacency) |
| Day 3 · 11:40–12:00 | Session | Memory Harnesses for Long-Running Research Agents | Stefania Druga | durable memory for long-horizon tasks |
| Day 3 · 3:20–3:40 | Session | Lessons from Studying Every Memory System | Shlok Khemani | memory-systems taxonomy |
| Day 3 · 3:45–4:05 | Session | LLM Knowledge Bases: a practical guide | Ben Holmes | persistent knowledge layer (survives model upgrade) |
| Day 4 · 11:10–11:30 | Sponsor | Tracing and debugging agents across systems with OpenTelemetry | Chang Liu | distributed tracing / OTel substrate |
| Day 4 · 2:50–3:10 | Session | No Memory, No Harness: Why the Database Is the Last Line of Defense | Kay Malcolm | DB-backed durability (DBOS-aligned) |
Sandbox + secret custody + permission guardrails + MCP + deployment
Anchor thread: Agent Sandbox Architectures · Practical Sandbox Configs · How We Contain Claude: Mapping Against the Stack · Agent Permission Guardrails · Agent Deployment Systems · The Four-Boundary Spec Mapping · Code Search and Code Graph MCP Servers
Adjacent event: AI Tinkerers Boston 2026-06-29 — Kumar on "Protecting Secrets from Long-running Agents" (OPA + AWS Secrets Manager, time-boxed approval, secret-broker) is the closest 2026-Q2 governance analog.
Fair track shape: dedicated Sandbox & Platform Engineering track (Day 3, 11+ sessions) + Context Engineering / MCP sessions (Day 3) + scattered auth + deployment talks. Gap vs Kumar's pattern: fewer explicit OPA-in-front-of-Secrets-Manager sessions; the secret-broker pattern is implied (e.g. Gopal's "leak stopping") but not formalized.
| Day / Time | Track | Title | Speakers | Why relevant |
|---|---|---|---|---|
| Day 1 · 11:05–12:05 | Workshops | How I learned to stop worrying and love the sandbox | Matt Brockman | foundational sandbox mechanics |
| Day 1 · 4:30–5:30 | Workshops | Agent Auth | Bereket Habtemeskel, Paola Estefania | permission / auth patterns for agents |
| Day 2 · 11:10–11:30 | AI-Native Enterprises | Building the engine while flying the plane — launching the Figma MCP server | Jesse Lumarie | MCP production deployment |
| Day 2 · 1:55–2:15 | — | Who Approved That MCP Server? Governing the Tool Layer | Jim Clark | MCP governance + approval boundaries (Kumar-adjacent) |
| Day 2 · 1:55–2:15 | Security | Agentic Security: Permissions, Provenance, and the Agent Supply Chain | Steve Yegge | three-layer identity / session-scope / per-op model |
| Day 2 · 2:50–3:10 | Claws & Personal Agents | Your company brain will leak secrets. How we stopped it for big banks | Tanmai Gopal | secret custody at scale |
| Day 2 · 3:45–4:05 | Expo | How We Built the Airbyte Agent MCP Server and CLI | Pedro Lopez | MCP deployment + tool-system architecture |
| Day 3 · 10:45–11:05 | Sandbox & Platform Engineering | Don't build agents, build environments | Adam Azzam | environment-as-boundary philosophy |
| Day 3 · 11:40–12:00 | Sandbox & Platform Engineering | Kubernetes Is Not Your Sandbox | Ivan Burazin | container vs. true sandbox boundary |
| Day 3 · 12:05–12:25 | Sandbox & Platform Engineering | Your agent needs a sandbox, not a desert | Samuel Colvin | containment vs. usability tradeoffs |
| Day 3 · 1:30–2:15 | Sandbox & Platform Engineering | From fork() to Fleet: Designing an Agent Sandbox Cloud (Pt 1 + 2) | Abhishek Bhardwaj | sandbox infra + fleet |
| Day 3 · 1:55–2:15 | Expo Stage 2 | Edge-Native AI: Building Ultra-Fast Agents and MCP Servers with Spin | Thorsten Hans | MCP on edge |
| Day 3 · 2:25–2:45 | Sandbox & Platform Engineering | 1,000 Agent Tasks in a Sandbox: What Breaks When LLMs Write and Run Code | Kevin Orellana | runtime isolation under stress |
| Day 3 · 2:25–2:45 | Context Engineering | MCP Apps — Extending the frontier | Liad Yosef, Ido Salomon | MCP architecture / extensibility |
| Day 3 · 2:50–3:10 | Context Engineering | MCP Apps: Give the Model Data, Give the User a UI | Dustin Mihalik | MCP as context broker |
| Day 3 · 3:20–3:40 | Context Engineering | MCP Tasks (async) — Why the heck aren't any agents supporting MCP tasks/async? | Cornelia Davis | async task / permission boundaries |
| Day 3 · 3:20–3:40 | Sandbox & Platform Engineering | Sandboxes Aren't Optional: Runtime Isolation Patterns for Coding Agents at Scale | Robert Brennan | isolation taxonomy |
| Day 4 · 11:10–11:30 | Agentic Engineering | MCPs, CLIs, and Skills: Choosing the Right Tooling Layer for Agentic Development | Nikita Kothari | tool-layer decision framework |
| Day 4 · 11:40–12:00 | Agentic Engineering | Auth for Agents: Unblock Autonomous AI with auth.md | Michael Grinich | standardized auth for agent tool access |
| Day 4 · 1:30–1:50 | — | YOLO Mode, Safely: microVM Sandboxes for Any Agent | Rowan Christmas | microVM isolation + blast radius |
| Day 4 · 1:55–2:15 | Track M | Blast Radius Zero: One-Command OpenClaw Sandboxes in the Cloud | Arun Sekhar | managed sandbox-as-a-service |
Online Track (YouTube, publicly available)
82 videos out of the 556 sessions made the public "Online Track" playlist. This subset skews heavily toward harness engineering, eval, and observability — the sandbox / MCP / permission cluster is essentially absent from the online cut.
Curated below: the 17 videos with the closest bridge to corpus
research. Full playlist metadata in sources/youtube-playlist.json.
| # | Duration | Title | Speaker / Org | Theme | Corpus link | Watch |
|---|---|---|---|---|---|---|
| 1 | 24m | Recursive Coding Agents | Raymond Weitekamp, OpenProse | harness | CPRR | ▶ |
| 2 | 16m | Your Coding Agent Is Creating Review Debt | Sachin Gupta | harness / eval | CPRR | ▶ |
| 3 | 9m | Production Evals For Agentic AI Systems | Nishant Gupta, Meta | eval | CPRR | ▶ |
| 4 | 13m | SWE-Marathon: Evaluating Coding Agents at Billion-Token Scale | Rishi Desai, Abundant AI | eval | CPRR | ▶ |
| 5 | 27m | What if the harness mattered more than the model? | Aditya Bhargava, Etsy | harness | CLI Agents | ▶ |
| 6 | 22m | Beyond the Harness: A Journey Towards Adaptive Engineering | Rajiv Chandegra, Annicha Labs | harness | CLI Agents | ▶ |
| 7 | 27m | When Agents Meet Physical Data: The Other Physics of Agent Harnesses | Dmitry Petrov, DataChain | harness | CLI Agents | ▶ |
| 8 | 35m | Stop AI Agent Hallucinations: 5 Techniques + Production Patterns | Elizabeth Fuentes, AWS | harness | CPRR | ▶ |
| 9 | 20m | A Genius With Amnesia | Victor Savkin, Nx | memory | Memory | ▶ |
| 10 | 40m | Turn 10,994 Notes Into Memory | Paul Iusztin, Louis-François Bouchard | memory | Memory | ▶ |
| 11 | 23m | Continual Learning for AI Agents: From Failures to Durable Improvements | Soheil Feizi, RELAI | memory / durability | Memory | ▶ |
| 12 | 29m | Your Agent Failed in Prod. Good Luck Reproducing It. | Tisha Chawla, Susheem Koul, Microsoft | observability | Telemetry | ▶ |
| 13 | 26m | Why Your Agent Disagrees With Itself | Diane Lin, Datadog | observability | Telemetry | ▶ |
| 14 | 20m | The Log Is The Agent | Ishaan Sehgal, Omnara | observability | Inst. Knowledge | ▶ |
| 15 | 7m | Deterministic Infra for Non-Deterministic AI Agents | Nishant Gupta, Meta | determinism | CLI Agents | ▶ |
| 16 | 29m | Every Solo Agent Builder Eventually Reinvents a Worse Version of CI/CD | Sumaiya Shrabony | deployment / CI | Deploy | ▶ |
| 17 | 29m | MCP Apps: Primitives, discovery, and the Future of Software | Pietro Zullo, Manufact | sandbox / MCP | MCP | ▶ |
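The Duration column above could be regenerated from sources/youtube-playlist.json along these lines. This is a hedged sketch, not how the table was actually produced. It relies on the yt-dlp single-JSON playlist dump carrying an `entries` list whose items have `title` and a `duration` in seconds; in flat-playlist mode `duration` can be missing or null, so the code tolerates that. The sample entries and the `duration_rows` helper are hypothetical.

```python
import json

# Hypothetical excerpt shaped like `yt-dlp --flat-playlist --dump-single-json`
# output; ids, titles, and durations here are made up for illustration.
SAMPLE = json.loads("""
{"title": "Online Track", "entries": [
  {"id": "aaa", "title": "Talk one", "duration": 1440},
  {"id": "bbb", "title": "Talk two", "duration": 425},
  {"id": "ccc", "title": "Talk three", "duration": null}
]}
""")


def duration_rows(playlist):
    """Return (minutes_label, title) pairs; '?' when duration is absent."""
    rows = []
    for entry in playlist.get("entries", []):
        seconds = entry.get("duration")
        label = f"{round(seconds / 60)}m" if seconds else "?"
        rows.append((label, entry["title"]))
    return rows


for label, title in duration_rows(SAMPLE):
    print(f"| {label} | {title} |")
```

Rounding to whole minutes matches the granularity of the table; entries with no duration are flagged rather than guessed.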
Adjacencies to current work
- Poliakov (AI Tinkerers Boston, 2026-06-29) ↔ WF — Kay Malcolm (Day 4) "No Memory, No Harness: Why the Database Is the Last Line of Defense" is the closest DBOS-adjacent substrate talk. Viren Baraiya (Day 3) "Harnessing Agents: The Durable Runtime" complements it from the runtime side.
- Kumar (AI Tinkerers Boston, 2026-06-29) ↔ WF — Steve Yegge (Day 2) "Agentic Security: Permissions, Provenance, and the Agent Supply Chain" and Jim Clark (Day 2) "Who Approved That MCP Server?" are the governance analogs. Kumar's specific OPA-in-front-of-Secrets-Manager pattern is not directly matched.
- Aune (AI Tinkerers Boston, 2026-06-29) ↔ WF — Varun Krovvidi's "6 Pillars of an Agentic Harness" (Day 2), Rustem Feyzkhanov's "From Agent Traces to Agent Simulations" (Day 3), and Mike Krieger's "How Anthropic Builds" (Day 4) span the harness-methodology axis Aune measured empirically.
- Zhu (AI Tinkerers Boston, 2026-06-29) ↔ WF — the Fair is thin on Zhu's specific "deterministic code + structured memory" GTM-agent pattern; Nishant Gupta's "Deterministic Infra for Non-Deterministic AI Agents" (Online Track, 7m) is the closest at the substrate level.
- Team Topologies for the Agentic Platform ↔ WF — the Fair's track structure (Harness / Sandbox / Memory / Evals as first-class venues) is the four-hats model made external: the corpus argument that "boundaries survive teams" gets a concrete reflection in the Fair organizing its tracks around the same boundaries.
Open questions / TODO
- [ ] Watch the ▶-linked Online Track picks in the table above; capture per-video notes as includes/<slug>.org for the ones worth verbatim citation.
- [ ] Add sources/schedule.pdf ingest if the PDF gains info not in sessions.json (e.g. room-map, ticket-tier gating).
- [ ] Cross-check the "Fair has 150 Anthropic-employees-in-title sessions" claim from Twitter buzz against sources/speakers.json (grep for Anthropic in the affiliation field, count).
- [ ] If natea/harness-eval methodology gets a WF write-up (Aune wasn't a WF speaker per the source data), link from the Boston Aune notes.
- [ ] Wire this note into site/events/index.org Upcoming/Past list.
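The Anthropic cross-check in the list above could start from something like the following. It is a minimal sketch under assumptions: speakers.json is taken to be a JSON array of speaker objects with an `affiliation` string, where only the field name comes from the TODO and the layout is a guess. The sample records and the `count_affiliation` helper are hypothetical. It counts speakers, not sessions, so checking the per-session claim would still need a join against the speakers field in sessions.json.

```python
import json

# Hypothetical sample standing in for sources/speakers.json;
# the real file's layout may differ (assumption).
SAMPLE = json.loads("""
[
  {"name": "Speaker A", "affiliation": "Anthropic"},
  {"name": "Speaker B", "affiliation": "Example Corp"},
  {"name": "Speaker C", "affiliation": "anthropic (research)"},
  {"name": "Speaker D"}
]
""")


def count_affiliation(speakers, needle):
    """Count speakers whose `affiliation` contains `needle`, ignoring case."""
    needle = needle.casefold()
    return sum(
        1
        for s in speakers
        if needle in (s.get("affiliation") or "").casefold()
    )


print(count_affiliation(SAMPLE, "Anthropic"))
```

Substring matching is deliberately loose, so it will also catch affiliations that merely mention the name; a stricter comparison may be needed before quoting a number.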
Provenance
Note assembled 2026-07-02 from a fixed dataset (sources/ files
captured at scheduleVersion=4927 on 2026-07-02). Relevance filtering
performed by four parallel scan agents against session-tables.org;
the curation is this note's editorial judgment, not the Fair's.
Not attended; every session listed is a scheduled talk, not a
witnessed one. Verdicts are attributed (sourced from the schedule
payload) until a talk is actually watched from the Online Track, at
which point the corresponding include should carry a
:VERDICT: reproduced drawer.