REPL-Driven Flight Tracking: Agentic Development with Contracts and Guardrails

Abstract

This research explores REPL-driven development as the primary interface for agentic software systems, using multi-source flight tracking as the application substrate. The system correlates FlightAware's AeroAPI, local ADS-B receivers, and web scrapers — surfacing divergences between data paths as the primary output, not flight status.

The deliverable is two-layered: the first layer is a working flight tracking system; the second, transferable layer is a measured account of how agents safely operate against metered APIs with cost constraints, contract verification across independent sources, and schema evolution under failure injection.

The best interface between an agent and a complex system is not a CLI that returns exit codes. It is a REPL that lets the agent think out loud, inspect intermediate state, and build understanding incrementally — bounded by fixtures so it can't accidentally spend money or break things.

System Architecture

Figure: C4 context diagram of the flight tracking system.

The system integrates three independent data sources, each with different cost profiles, latency characteristics, and failure modes:

Source           Protocol   Cost         Latency    Reliability
---------------  ---------  -----------  ---------  -------------------------------
AeroAPI v4       REST/JSON  $0.005/call  200-500ms  High (SLA)
ADS-B Receivers  SBS/TCP    Free         Real-time  Depends on line-of-sight
FA Web Scraper   HTTP/HTML  Free         1-3s       Low (brittle, cookie-dependent)

The architectural insight is that divergence between sources is the signal. Three sources agreeing is uninteresting. Three sources disagreeing tells you where the contract boundary is.

Figure: Component architecture diagram.

The Fixtures-First Safety Property

All development defaults to AEROAPI_MODE=fixtures. In this mode:

  • Every API call reads from recorded fixture files
  • Zero network calls are made
  • Zero API credit is consumed
  • Results are deterministic

Switching to live requires explicit human opt-in: AEROAPI_MODE=live. This is not a convenience — it is a safety property that bounds the blast radius of autonomous agent exploration.

Without fixtures-first:  agent loops 200× → spends $1.00 → learns nothing new
With fixtures-first:     agent loops 200× → costs $0.00 → learns from recorded data

The fixture is the sandbox. Live is the deployment.
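
A minimal sketch of the mode gate, assuming the mode is read from the AEROAPI_MODE environment variable as described above. The function names (resolve-mode, fetch-flight, loop-cost-mills) and the caller-supplied read-fixture and call-api functions are hypothetical, not the project's actual code. Cost is kept in thousandths of a dollar so the arithmetic stays in integers.

```clojure
;; Hypothetical sketch of fixtures-first dispatch; all names are illustrative.
(def cost-per-call-mills 5) ; $0.005 per AeroAPI call, in thousandths of a dollar

(defn resolve-mode
  "Return :live only when AEROAPI_MODE is exactly \"live\".
  Anything else, including an unset variable, falls back to :fixtures."
  [env]
  (if (= "live" (get env "AEROAPI_MODE")) :live :fixtures))

(defn fetch-flight
  "Dispatch on mode. read-fixture and call-api are supplied by the caller,
  so the network path is unreachable unless the mode is :live."
  [env ident {:keys [read-fixture call-api]}]
  (case (resolve-mode env)
    :fixtures (read-fixture ident)
    :live     (call-api ident)))

(defn loop-cost-mills
  "Cost of n calls under the given mode, in thousandths of a dollar."
  [mode n]
  (if (= mode :live) (* n cost-per-call-mills) 0))

(loop-cost-mills :live 200)     ; => 1000, i.e. $1.00
(loop-cost-mills :fixtures 200) ; => 0
```

In a script, (resolve-mode (System/getenv)) would read the real environment. Defaulting to :fixtures on any unexpected value means a typo in the variable cannot switch the agent to live calls.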

Contract Verification

The contract verifier runs multiple adapters against the same flight and diffs their canonical output:

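(def contract-keys [:ident :registration :status :gate_destination :estimated_in])

;; IATA to ICAO airline designators (partial map)
(def iata->icao {"BA" "BAW" "AA" "AAL" "DL" "DAL"})

;; Callsign normalization: BA48 ≡ BAW48 ≡ BAW48Q
(defn normalize-ident [s]
  (if-let [[_ iata number] (re-matches #"([A-Z]{2})(\d+)[A-Z]*" s)]
    ;; IATA prefix: map to ICAO; keep the prefix when it is not in the map
    (str (get iata->icao iata iata) number)
    ;; Otherwise drop a trailing letter suffix from an ICAO ident (BAW48Q → BAW48)
    (or (second (re-matches #"([A-Z]{3}\d+)[A-Z]*" s)) s)))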

A key finding: the AeroAPI and the FlightAware website disagree on field availability at different lifecycle stages. The gate_origin field is visible on the website before it appears in the API — a real contract divergence that naive consumers would miss.
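
The diff step can be sketched as a pure function over canonical records. This is an illustration under assumptions, not the verifier's actual code: the divergences function, the adapter names, and the sample gate value are all made up for the example. A key missing from one adapter is treated as nil, so a field-availability gap like the gate_origin case shows up as a divergence.

```clojure
;; Hypothetical sketch: report the keys on which adapters disagree.
(defn divergences
  "records is a map of adapter name -> canonical record.
  Returns a map of key -> {adapter value} for every key in ks
  where the adapters do not all return the same value."
  [ks records]
  (into {}
        (for [k ks
              :let [by-adapter (into {} (map (fn [[a r]] [a (get r k)]) records))]
              :when (> (count (set (vals by-adapter))) 1)]
          [k by-adapter])))

;; Illustrative values only, not recorded data.
(divergences [:ident :gate_origin]
             {:aeroapi {:ident "BAW48" :gate_origin nil}
              :website {:ident "BAW48" :gate_origin "S10"}})
;; => {:gate_origin {:aeroapi nil, :website "S10"}}
```

An empty result means the sources agree on every checked key, which by the argument above is the uninteresting case.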

Case Study: BAW48Q (Speedbird 48 Quebec Heavy)

British Airways 48, Boeing 777-300ER, KSEA → EGLL. Daily transatlantic service.

Figure: BAW48 route from Seattle to London, following the great circle over Greenland.

Route Analysis

Filed route: KSEA ALPSE YDC → oceanic fixes → JELCO GRIBS KETLA → Greenland crossing → NUGRA2H STAR → EGLL

Metric           Value
---------------  ---------------------
Direct distance  4,791 mi (statute)
Filed distance   4,948 mi (statute)
Dogleg           157 mi
Peak latitude    63°N (Greenland)
Filed speed      459 kt
Block time       ~9h 15m
Aircraft         B77W (777-300ER)
STAR             NUGRA2H into Heathrow
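
The dogleg figure is the filed distance minus the direct distance. A small sketch of that arithmetic, using the two distances from the table in the table's own units; the function names are illustrative.

```clojure
;; Hypothetical helpers; inputs are the direct and filed distances from the table.
(defn dogleg
  "Extra distance flown on the filed route compared with the direct route."
  [direct filed]
  (- filed direct))

(defn dogleg-pct
  "Dogleg as a percentage of the direct distance."
  [direct filed]
  (* 100.0 (/ (- filed direct) direct)))

(dogleg 4791 4948) ; => 157
```

For these inputs dogleg-pct comes out at roughly 3.3 percent of the direct distance.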

ADS-B Coverage Transition

The route passes through Davis Strait (~60-63°N) where terrestrial ADS-B coverage transitions to space-based surveillance (Aireon). The update_type field in AeroAPI track positions encodes this:

Code  Source                Coverage
----  --------------------  ----------------------
A     ADS-B terrestrial     Ground station range
S     Space-based (Aireon)  Global/polar
Z     ATC radar             Controlled airspace
P     Projected             Everywhere (estimated)

A naive tracker reports "gap" when surveillance source changes. A correct tracker reports "handoff."
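
One way to draw the handoff/gap distinction is to label a change of update_type as a handoff as long as positions keep arriving, and to reserve "gap" for real silence. The sketch below assumes positions are maps with :timestamp already parsed to epoch seconds and :update_type as one of the codes above; the function name and the 300-second silence threshold are assumptions for illustration.

```clojure
;; Hypothetical sketch. Positions must be in time order.
(def max-silence-s 300) ; assumed threshold, tune per route

(defn classify-transitions
  "Walk consecutive positions and label each notable step.
  :handoff = surveillance source changed but updates kept flowing.
  :gap     = no update for longer than max-silence-s, whatever the source."
  [positions]
  (for [[a b] (partition 2 1 positions)
        :let [dt      (- (:timestamp b) (:timestamp a))
              changed (not= (:update_type a) (:update_type b))]
        :when (or changed (> dt max-silence-s))]
    {:at   (:timestamp b)
     :from (:update_type a)
     :to   (:update_type b)
     :kind (if (> dt max-silence-s) :gap :handoff)}))

;; Illustrative track, not recorded data.
(classify-transitions
 [{:timestamp 0   :update_type "A"}
  {:timestamp 60  :update_type "A"}
  {:timestamp 120 :update_type "S"}
  {:timestamp 900 :update_type "S"}])
;; => ({:at 120, :from "A", :to "S", :kind :handoff}
;;     {:at 900, :from "S", :to "S", :kind :gap})
```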

Finding: No A→S Transition Observed

Analysis of 658 track positions from the 2026-05-17 departure showed every position as type A (ADS-B) or Z (radar), with none marked S (space-based). This suggests either continuous terrestrial ADS-B coverage on this routing or that AeroAPI's Personal tier does not surface the S distinction. Which explanation holds is an open question for further investigation.
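
A check like this reduces to counting positions by update_type. A minimal sketch, assuming each track position is a map with an :update_type key; the function names and the sample data are illustrative and do not reproduce the recorded track.

```clojure
;; Hypothetical sketch: tally surveillance sources in a track.
(defn update-type-counts
  "Map of update_type code -> number of positions with that code."
  [positions]
  (frequencies (map :update_type positions)))

(defn space-based-seen?
  "True when at least one position is marked S (space-based)."
  [positions]
  (contains? (update-type-counts positions) "S"))

;; Illustrative data only.
(update-type-counts [{:update_type "A"} {:update_type "A"} {:update_type "Z"}])
;; => {"A" 2, "Z" 1}
```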

Babashka as Agent Runtime

All tooling is implemented in Babashka (bb), a fast-starting Clojure scripting runtime (~10ms startup vs ~5s for JVM Clojure).

bb tasks    # 14 available tasks

Task                      Purpose
------------------------  ----------------------------------
bb status BAW48           Query flight, emit canonical JSON
bb track BAW48 --dry-run  Plan polling window without cost
bb verify BAW48           Cross-source contract verification
bb adsb:local --table     Live aircraft from ADS-B receivers
bb scan                   Discover ADS-B receivers on LAN
bb jepsen                 Redis Streams correctness tests
bb streams:status         Stream lengths, consumer groups
bb test:unit              11 tests, 68 assertions

Every task is idempotent, mode-aware (respects AEROAPI_MODE), and REPL-accessible — the same namespace that powers the CLI is available for interactive exploration.
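
The source does not show how the --dry-run planner works, so the following is only a sketch of the idea: compute the poll schedule and its projected cost without making a call. The plan-polls function, its arguments, and the fixed polling interval are assumptions; the per-call price is the $0.005 figure from the source table, held in thousandths of a dollar.

```clojure
;; Hypothetical sketch of a dry-run planner; no network access involved.
(def cost-per-call-mills 5) ; $0.005 per call, in thousandths of a dollar

(defn plan-polls
  "Plan a polling window. start-s and end-s are epoch seconds (inclusive),
  interval-s is the polling period in seconds."
  [start-s end-s interval-s]
  (let [times (vec (range start-s (inc end-s) interval-s))]
    {:polls      (count times)
     :first-poll (first times)
     :last-poll  (peek times)
     :cost-mills (* (count times) cost-per-call-mills)}))

;; One hour at five-minute intervals.
(plan-polls 0 3600 300)
;; => {:polls 13, :first-poll 0, :last-poll 3600, :cost-mills 65}
```

Because the function is pure, the same call works from the CLI task and from the REPL, and the agent can compare plans before anyone opts in to live mode.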

Redis Streams as Kafka-Lite

Redis Streams provide Kafka-like operational patterns on FreeBSD without the JVM dependency:

Kafka Concept      Redis Equivalent            Fidelity
-----------------  --------------------------  --------
Topic              Stream key                  Full
Consumer Group     XREADGROUP                  Full
Offset             Message ID (ts-seq)         Full
Commit             XACK                        Full
Retention          MAXLEN / MINID              Full
Schema Registry    Redis keys + JSON Schema    Partial
Exactly-once       XACK + idempotent consumer  Partial
Dead Letter Queue  Separate stream + XCLAIM    Full

Outbox Pattern

The producer writes poll results to a local NDJSON file before attempting XADD to Redis. On Redis failure, data accumulates locally and drains on reconnect. This is the transactional outbox pattern — the NDJSON file is the outbox.
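
A minimal sketch of the write-then-drain flow, assuming the outbox is a plain file of already-serialized JSON lines. The function names are hypothetical, and send! is a stand-in for whatever wraps XADD: it should return a truthy value on success and either return falsy or throw on failure.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical sketch of an NDJSON outbox; send! stands in for an XADD wrapper.

(defn outbox-append!
  "Append one already-serialized JSON line to the outbox file."
  [path line]
  (spit path (str line "\n") :append true))

(defn outbox-drain!
  "Send pending lines in file order. Stop at the first failure and keep
  the unsent tail on disk. Returns the number of lines sent."
  [path send!]
  (let [f       (io/file path)
        pending (if (.exists f)
                  (remove str/blank? (str/split-lines (slurp f)))
                  [])
        sent    (count (take-while
                        #(try (boolean (send! %)) (catch Exception _ false))
                        pending))]
    ;; Rewrite the file with only the lines that were not sent.
    (spit path (apply str (map #(str % "\n") (drop sent pending))))
    sent))

(defn publish!
  "Write to the outbox first, then try to drain. During a Redis outage the
  record stays on disk and goes out with the next successful drain."
  [path send! line]
  (outbox-append! path line)
  (outbox-drain! path send!))
```

This gives at-least-once delivery: a crash after a send but before the file is rewritten will resend those lines, which is why the consumer side needs to be idempotent.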

Schema Registry

JSON Schemas are stored in Redis under keys of the form schemas:{subject}:v{version}, with compatibility checking (backward, forward, full) between versions. Every record is validated against its schema before XADD.
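
The key layout and the three compatibility modes can be sketched as follows. This is a deliberately simplified model, not the registry's actual rules: it compares only the "required" lists of two JSON Schemas (as parsed maps with string keys) and ignores type changes. The function names and the "flight-status" subject are hypothetical.

```clojure
(require '[clojure.set :as set])

;; Hypothetical sketch; compatibility is reduced to required-field sets.

(defn schema-key
  "Redis key for a schema version, following schemas:{subject}:v{version}."
  [subject version]
  (str "schemas:" subject ":v" version))

(defn required-fields [schema]
  (set (get schema "required")))

(defn backward-compatible?
  "The new schema can read old data if it requires nothing the old schema
  did not already require."
  [old new]
  (set/subset? (required-fields new) (required-fields old)))

(defn forward-compatible?
  "The old schema can read new data if everything it requires is still
  required by the new schema."
  [old new]
  (set/subset? (required-fields old) (required-fields new)))

(defn full-compatible? [old new]
  (and (backward-compatible? old new) (forward-compatible? old new)))

(schema-key "flight-status" 2) ; => "schemas:flight-status:v2"
```

Under this model, adding an optional field passes all three checks, while adding a required field is forward compatible but not backward compatible.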

Jepsen-Lite Correctness Tests

Five property-based tests validate the Redis Streams pipeline under failure injection:

  1. Linearizable XADD — all acknowledged writes appear in XRANGE
  2. Consumer group exactly-once — each message ack'd exactly once
  3. Outbox drain ordering — no data lost after Redis kill + restart
  4. Schema registry race — concurrent schema registration is safe
  5. DLQ completeness — all failed messages reach the dead letter queue
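
The first two properties can be expressed as checkers over a recorded history, independent of how the failures were injected. The sketch below assumes the test harness has already collected the relevant message IDs; the function names and the sample IDs are illustrative, and the real tests may be structured differently.

```clojure
(require '[clojure.set :as set])

;; Hypothetical history checkers for properties 1 and 2.

(defn lost-writes
  "Property 1: IDs that XADD acknowledged but a final XRANGE did not return.
  An empty set means no acknowledged write was lost."
  [acked-ids xrange-ids]
  (set/difference (set acked-ids) (set xrange-ids)))

(defn ack-violations
  "Property 2: map of message ID -> ack count for every delivered message
  that was not acknowledged exactly once. ack-log is the sequence of IDs
  passed to XACK, in any order."
  [delivered-ids ack-log]
  (let [counts (frequencies ack-log)]
    (into {}
          (for [id delivered-ids
                :let [n (get counts id 0)]
                :when (not= n 1)]
            [id n]))))

;; Illustrative IDs only.
(lost-writes ["1-0" "2-0" "3-0"] ["1-0" "3-0"])          ; => #{"2-0"}
(ack-violations ["1-0" "2-0" "3-0"] ["1-0" "3-0" "3-0"]) ; => {"2-0" 0, "3-0" 2}
```

Both checkers return the offending IDs rather than a boolean, so a failing run points directly at the messages to inspect.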

Observability

All components emit structured NDJSON logs and OTLP metrics:

Script → log-event → events.ndjson → otel-tail → OTLP → Prometheus → Grafana
    │                                                              │
    └── emit-counter ──────────────────────── OTLP ────────────────┘

The agent and the human see the same dashboard. Anomaly detection is not special — it's a Prometheus alert.

Design Decisions

Decision                         Rationale
-------------------------------  ----------------------------------------------
Babashka over JVM Clojure        10ms startup; agents can't wait for JVMs
Fixtures-first as default        Cost bounding is a safety property
Multiple adapters, one contract  Divergence detection requires independence
NDJSON (not structured DB)       Append-only, greppable, tail -F works
Redis Streams (not Kafka)        FreeBSD host, no JVM, same operational surface
Emacs ft.el + bb CLI             Same semantics, two ergonomics
OpenAPI spec as source of truth  Prism mocks it, tests validate it
Monit supervision                FreeBSD, GitOps config, explicit dependencies

Key Takeaways

  1. REPL-driven = the agent explores safely with sub-second feedback
  2. Fixtures-first = the agent can't overspend ($0.00 for exploration)
  3. Contract verification = the agent knows when sources disagree
  4. Observability = the human sees everything the agent did
  5. Schema evolution = payload changes are explicit and validated
  6. Jepsen-lite = correctness properties verified under failure

Author: Jason Walsh

j@wal.sh

Last Updated: 2026-05-17 22:54:24
