REPL-Driven Flight Tracking: Agentic Development with Contracts and Guardrails
Abstract
This research explores REPL-driven development as the primary interface for agentic software systems, using multi-source flight tracking as the application substrate. The system correlates FlightAware's AeroAPI, local ADS-B receivers, and web scrapers — surfacing divergences between data paths as the primary output, not flight status.
The deliverable is two-layered: the first layer is a working flight tracking system; the second, transferable layer is a measured account of how agents safely operate against metered APIs with cost constraints, contract verification across independent sources, and schema evolution under failure injection.
The best interface between an agent and a complex system is not a CLI that returns exit codes. It is a REPL that lets the agent think out loud, inspect intermediate state, and build understanding incrementally — bounded by fixtures so it can't accidentally spend money or break things.
System Architecture
The system integrates three independent data sources, each with different cost profiles, latency characteristics, and failure modes:
| Source | Protocol | Cost | Latency | Reliability |
|---|---|---|---|---|
| AeroAPI v4 | REST/JSON | $0.005/call | 200-500ms | High (SLA) |
| ADS-B Receivers | SBS/TCP | Free | Real-time | Depends on line-of-sight |
| FA Web Scraper | HTTP/HTML | Free | 1-3s | Low (brittle, cookie-dependent) |
The architectural insight is that divergence between sources is the signal. Three sources agreeing is uninteresting. Three sources disagreeing tells you where the contract boundary is.
The Fixtures-First Safety Property
All development defaults to AEROAPI_MODE=fixtures. In this mode:
- Every API call reads from recorded fixture files
- Zero network calls are made
- Zero API credit is consumed
- Results are deterministic
Switching to live requires explicit human opt-in: AEROAPI_MODE=live.
This is not a convenience — it is a safety property that bounds the
blast radius of autonomous agent exploration.
Without fixtures-first: agent loops 200× → spends $1.00 → learns nothing new
With fixtures-first: agent loops 200× → costs $0.00 → learns from recorded data
The fixture is the sandbox. Live is the deployment.
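A minimal sketch of how such a mode gate could look, assuming the mode is read from the AEROAPI_MODE environment variable. The names `fixtures`, `resolve-mode`, `fetch-flight`, and `live-fn` are hypothetical and the fixture content is invented; the project's actual namespaces may differ.

```clojure
;; Hypothetical in-memory stand-in for the recorded fixture files.
(def fixtures
  {"BAW48" {:ident "BAW48" :status "En Route"}})

;; USD per call, taken from the cost table above.
(def cost-per-call 0.005)

(defn resolve-mode
  "Returns :live only when the value is exactly \"live\". Anything else,
  including a missing or misspelled value, falls back to :fixtures."
  [env-value]
  (if (= "live" env-value) :live :fixtures))

(defn fetch-flight
  "Fetches a flight according to mode. live-fn is a hypothetical function
  that performs the real, billable API call; it is never invoked in
  :fixtures mode."
  [mode live-fn ident]
  (case mode
    :fixtures {:source :fixture :cost 0.0 :data (get fixtures ident)}
    :live     {:source :live :cost cost-per-call :data (live-fn ident)}))

(defn current-mode
  "Reads the mode from the environment; unset means :fixtures."
  []
  (resolve-mode (System/getenv "AEROAPI_MODE")))
```

The design choice worth noting is that the fallback is the safe mode: a typo such as AEROAPI_MODE=Live leaves the agent in fixtures rather than spending credit.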
Contract Verification
The contract verifier runs multiple adapters against the same flight and diffs their canonical output:
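    (def contract-keys
      [:ident :registration :status :gate_destination :estimated_in])

    ;; IATA -> ICAO airline designators for the carriers handled so far.
    (def iata->icao {"BA" "BAW" "AA" "AAL" "DL" "DAL"})

    ;; Callsign normalization: BA48 ≡ BAW48 ≡ BAW48Q
    ;; Maps a known IATA prefix to ICAO and drops a trailing letter suffix.
    ;; Unknown prefixes pass through unchanged instead of being dropped.
    (defn normalize-ident [s]
      (if-let [[_ prefix number] (re-matches #"([A-Z]{2,3})(\d+)[A-Z]*" s)]
        (str (get iata->icao prefix prefix) number)
        s))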
A key finding: the AeroAPI and the FlightAware website disagree on
field availability at different lifecycle stages. The gate_origin field
is visible on the website before it appears in the API — a real
contract divergence that naive consumers would miss.
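The diffing step might be sketched as follows. This is an illustration under assumptions, not the project's verifier: `divergences` is a hypothetical name, the input is assumed to be a map from source name to an already-canonicalized record, and the sample values are invented.

```clojure
;; Same contract keys as defined in the text above.
(def contract-keys
  [:ident :registration :status :gate_destination :estimated_in])

(defn divergences
  "Takes a map of source name -> canonical record and returns, for each
  contract key on which the sources disagree, a map of source -> value.
  A missing field is reported as nil, so a field that is absent in one
  source and present in another counts as a divergence."
  [records-by-source]
  (into {}
        (for [k contract-keys
              :let [values (into {}
                                 (for [[source record] records-by-source]
                                   [source (get record k)]))]
              :when (> (count (set (vals values))) 1)]
          [k values])))

;; Invented sample: the website already shows a gate, the API does not.
(def sample
  {:aeroapi {:ident "BAW48" :status "Scheduled" :gate_destination nil}
   :website {:ident "BAW48" :status "Scheduled" :gate_destination "B36"}})

(divergences sample)
;; => {:gate_destination {:aeroapi nil, :website "B36"}}
```

Returning only the disagreeing keys keeps the output empty when all sources agree, which matches the idea that agreement is the uninteresting case.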
Case Study: BAW48Q (Speedbird 48 Quebec Heavy)
British Airways 48, Boeing 777-300ER, KSEA → EGLL. Daily transatlantic service.
Route Analysis
Filed route: KSEA ALPSE YDC → oceanic fixes → JELCO GRIBS KETLA → Greenland crossing → NUGRA2H STAR → EGLL
| Metric | Value |
|---|---|
| Direct distance | 4,791 nm |
| Filed distance | 4,948 nm |
| Dogleg | 157 nm |
| Peak latitude | 63°N (Greenland) |
| Filed speed | 459 kt |
| Block time | ~9h 15m |
| Aircraft | B77W (777-300ER) |
| STAR | NUGRA2H into Heathrow |
ADS-B Coverage Transition
The route passes through Davis Strait (~60-63°N) where terrestrial
ADS-B coverage transitions to space-based surveillance (Aireon). The
update_type field in AeroAPI track positions encodes this:
| Code | Source | Coverage |
|---|---|---|
| A | ADS-B terrestrial | Ground station range |
| S | Space-based (Aireon) | Global/polar |
| Z | ATC radar | Controlled airspace |
| P | Projected | Everywhere (estimated) |
A naive tracker reports a "gap" when the surveillance source changes. A correct tracker reports a "handoff."
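One way to separate a handoff from a real gap is to look at both the source code and the time between consecutive positions. The sketch below assumes each position is a map with `:update_type` and a simplified `:ts` in epoch seconds; `:ts`, `classify-step`, `classify-track`, and the `max-gap-s` threshold are all hypothetical and not part of AeroAPI.

```clojure
(defn classify-step
  "Classifies the step between two consecutive track positions.
  A long silence is a :gap regardless of source; a source change
  without a long silence is a :handoff."
  [max-gap-s prev curr]
  (let [dt       (- (:ts curr) (:ts prev))
        changed? (not= (:update_type prev) (:update_type curr))]
    (cond
      (> dt max-gap-s) :gap
      changed?         :handoff
      :else            :continuous)))

(defn classify-track
  "Walks a time-ordered vector of positions pairwise and labels each step."
  [max-gap-s positions]
  (mapv (fn [[prev curr]]
          {:from (:update_type prev)
           :to   (:update_type curr)
           :kind (classify-step max-gap-s prev curr)})
        (partition 2 1 positions)))

;; Invented track: terrestrial, terrestrial, space-based, then a long silence.
(def track
  [{:ts 0    :update_type "A"}
   {:ts 30   :update_type "A"}
   {:ts 60   :update_type "S"}
   {:ts 4000 :update_type "S"}])

(mapv :kind (classify-track 300 track))
;; => [:continuous :handoff :gap]
```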
Finding: No A→S Transition Observed
Analysis of 658 track positions from the 2026-05-17 departure showed
all positions as type A (ADS-B) or Z (radar) — no S (space-based).
This suggests either continuous terrestrial ADS-B coverage on this
routing or that AeroAPI's Personal tier does not surface the S
distinction. Which explanation holds remains an open question for further investigation.
Babashka as Agent Runtime
All tooling is implemented in Babashka (bb), a fast-starting Clojure scripting runtime (~10ms startup vs ~5s for JVM Clojure).
bb tasks # 14 available tasks
| Task | Purpose |
|---|---|
| bb status BAW48 | Query flight, emit canonical JSON |
| bb track BAW48 --dry-run | Plan polling window without cost |
| bb verify BAW48 | Cross-source contract verification |
| bb adsb:local --table | Live aircraft from ADS-B receivers |
| bb scan | Discover ADS-B receivers on LAN |
| bb jepsen | Redis Streams correctness tests |
| bb streams:status | Stream lengths, consumer groups |
| bb test:unit | 11 tests, 68 assertions |
Every task is idempotent, mode-aware (respects AEROAPI_MODE), and
REPL-accessible — the same namespace that powers the CLI is available
for interactive exploration.
Redis Streams as Kafka-Lite
Redis Streams provide Kafka-like operational patterns on FreeBSD without the JVM dependency:
| Kafka Concept | Redis Equivalent | Fidelity |
|---|---|---|
| Topic | Stream key | Full |
| Consumer Group | XREADGROUP | Full |
| Offset | Message ID (ts-seq) | Full |
| Commit | XACK | Full |
| Retention | MAXLEN / MINID | Full |
| Schema Registry | Redis keys + JSON Schema | Partial |
| Exactly-once | XACK + idempotent consumer | Partial |
| Dead Letter Queue | Separate stream + XCLAIM | Full |
Outbox Pattern
The producer writes poll results to a local NDJSON file before
attempting XADD to Redis. On Redis failure, data accumulates locally
and drains on reconnect. This is the transactional outbox pattern — the
NDJSON file is the outbox.
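The write-then-send flow could be sketched roughly as below. This is an assumption-laden illustration: `send-fn` is a hypothetical wrapper around XADD that returns true on success, lines are assumed to be already-serialized JSON, and the function names are invented for this example.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn append-to-outbox!
  "Appends one already-serialized JSON line to the outbox file."
  [outbox-path line]
  (spit outbox-path (str line "\n") :append true))

(defn pending-lines
  "Returns the unsent lines in file order, or an empty seq if no file exists."
  [outbox-path]
  (if (.exists (io/file outbox-path))
    (remove str/blank? (str/split-lines (slurp outbox-path)))
    []))

(defn drain-outbox!
  "Sends pending lines in file order via send-fn. Stops at the first
  failure and rewrites the file with whatever is still unsent.
  Returns the number of lines sent."
  [outbox-path send-fn]
  (let [lines     (vec (pending-lines outbox-path))
        sent      (loop [n 0]
                    (if (and (< n (count lines)) (send-fn (nth lines n)))
                      (recur (inc n))
                      n))
        remaining (subvec lines sent)]
    (spit outbox-path (apply str (map #(str % "\n") remaining)))
    sent))
```

A crash between a successful send and the file rewrite would resend those lines on the next drain, so this sketch gives at-least-once delivery and relies on an idempotent consumer downstream.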
Schema Registry
JSON Schemas are stored in Redis under keys of the form
schemas:{subject}:v{version}, with compatibility checking (backward,
forward, full). Every record is validated against its schema before XADD.
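As a rough illustration of the key layout and of what a backward check involves, the sketch below builds the Redis key and applies a deliberately simplified rule: a new version is treated as backward compatible if it requires no field that the old version did not already require. Real JSON Schema compatibility also has to consider types, enums, and nested schemas; the function names and the "flight-status" subject are hypothetical.

```clojure
(require '[clojure.set :as set])

(defn schema-key
  "Builds the Redis key for a subject and version,
  following the schemas:{subject}:v{version} pattern."
  [subject version]
  (str "schemas:" subject ":v" version))

(defn backward-compatible?
  "Simplified check on parsed schemas (Clojure maps with a :required
  vector): records that were valid under old-schema must not be rejected
  by new-schema for lacking a newly required field."
  [old-schema new-schema]
  (set/subset? (set (:required new-schema))
               (set (:required old-schema))))

(schema-key "flight-status" 2)
;; => "schemas:flight-status:v2"
```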
Jepsen-Lite Correctness Tests
Five property-based tests validate the Redis Streams pipeline under failure injection:
- Linearizable XADD — all acknowledged writes appear in XRANGE
- Consumer group exactly-once — each message ack'd exactly once
- Outbox drain ordering — no data lost after Redis kill + restart
- Schema registry race — concurrent schema registration is safe
- DLQ completeness — all failed messages reach the dead letter queue
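The first two properties reduce to simple checks over a recorded history once the run is finished. The following is a sketch of such checkers, assuming the test harness has collected the acknowledged IDs, the IDs read back, and a log of consumer acks; the function names and the ID values are invented.

```clojure
(require '[clojure.set :as set])

(defn lost-writes
  "Returns the IDs the producer saw acknowledged that are absent from the
  final read. An empty set means no acknowledged write was lost."
  [acked-ids read-ids]
  (set/difference (set acked-ids) (set read-ids)))

(defn ack-violations
  "Returns a map of message ID -> ack count for every delivered message
  that was not acknowledged exactly once (0 = never, 2+ = duplicated)."
  [delivered-ids ack-log]
  (let [counts (frequencies ack-log)]
    (into {}
          (for [id delivered-ids
                :let [n (get counts id 0)]
                :when (not= 1 n)]
            [id n]))))

(lost-writes ["1-0" "2-0" "3-0"] ["1-0" "2-0"])
;; => #{"3-0"}
```

Keeping the checkers as pure functions over a history means they can be exercised without Redis, while the failure injection itself stays in the harness.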
Observability
All components emit structured NDJSON logs and OTLP metrics:
    Script → log-event → events.ndjson → otel-tail → OTLP → Prometheus → Grafana
      │                                                         │
      └── emit-counter ────────────────── OTLP ─────────────────┘
The agent and the human see the same dashboard. Anomaly detection is not special — it's a Prometheus alert.
Design Decisions
| Decision | Rationale |
|---|---|
| Babashka over JVM Clojure | 10ms startup — agents can't wait for JVMs |
| Fixtures-first as default | Cost bounding is a safety property |
| Multiple adapters, one contract | Divergence detection requires independence |
| NDJSON (not structured DB) | Append-only, greppable, tail -F works |
| Redis Streams (not Kafka) | FreeBSD host, no JVM, same operational surface |
| Emacs ft.el + bb CLI | Same semantics, two ergonomics |
| OpenAPI spec as source of truth | Prism mocks it, tests validate it |
| Monit supervision | FreeBSD, GitOps config, explicit dependencies |
Key Takeaways
- REPL-driven = the agent explores safely with sub-second feedback
- Fixtures-first = the agent can't overspend ($0.00 for exploration)
- Contract verification = the agent knows when sources disagree
- Observability = the human sees everything the agent did
- Schema evolution = payload changes are explicit and validated
- Jepsen-lite = correctness properties verified under failure
Resources
- Source code (GitHub)
- Babashka — fast Clojure scripting
- FlightAware AeroAPI — flight data API
- ADS-B — automatic dependent surveillance
- Redis Streams — event streaming
- Jepsen — distributed systems testing
