REPL-Driven Compliance: Multi-Language Gate Pipeline Development

The flight-tracking note demonstrated REPL-driven development against external data sources. This note extends the methodology to a different problem: implementing the same protocol compliance spec in 29 languages, verified against a single production access log.

The system under test is an RSS reader with plumbing. Every implementation does the same thing: fetch a URL politely. The differentiation is not what data you get — it is how you get it compliantly.

Update (2026-05-25): the corpus reached 29 languages (see the experience report), adding the spec-stressors Haskell, Zig, Erlang and C alongside a libcurl-FFI tier (Chez, SBCL, Janet) for languages with no native HTTP/TLS. A human-directed --spider mode exercises R9 (structured-format link extraction) live and stays compliant — same gate pipeline, R13-tagged, refuses to scrape HTML.

1. The stack

elfeed (Elisp)          ─┐
tech-crawler (Clojure)  ─┤── same feeds, same Walsh-Research/1.2 UA
@walsh/research (TS)    ─┤── same robots.txt compliance (RFC 9309)
compliance-harness (29) ─┘── same gate pipeline, verified via access log

One spec (walsh-research-compliance/v1.3). One production access log. 29 implementations that must all produce the same observable behavior.

2. Gate pipeline

The pre-request gate pipeline is the core invariant. Order is normative and non-commutative:

R3 (blocklist) → R2 (robots.txt) → R4 (throttle) → fetch (R5 backoff)

First gate to DENY stops the request. No HTTP request is made for denied URLs. This ordering is the same across all 29 implementations.
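
The ordering can be sketched in a few lines of Python. This is a minimal illustration, not the harness's code: `run_gates`, `Verdict`, and the injected `robots_allows`, `throttle`, and `fetch` callables are hypothetical names, and the blocklist check here is an exact host match only. The point it shows is that a DENY returns before any later gate runs or any request is made.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlsplit


@dataclass
class Verdict:
    kind: str               # "FETCH", "R3-DENY" or "R2-DENY"
    status: Optional[int]   # HTTP status, or None when no request was made


def run_gates(
    url: str,
    blocked_hosts: set[str],
    robots_allows: Callable[[str], bool],
    throttle: Callable[[str], None],
    fetch: Callable[[str], int],
) -> Verdict:
    """Apply R3 -> R2 -> R4 -> fetch in the normative order."""
    host = urlsplit(url).hostname or ""
    if host in blocked_hosts:            # R3: operator blocklist (exact match only)
        return Verdict("R3-DENY", None)
    if not robots_allows(url):           # R2: robots.txt
        return Verdict("R2-DENY", None)
    throttle(host)                       # R4: per-host delay
    return Verdict("FETCH", fetch(url))  # R5 backoff would live inside fetch
```

Passing the gates in as callables keeps the order testable without a network: a fake `fetch` that records its calls can confirm that a denied URL never reaches it.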

3. REPL-first development

For each language with a genuine interactive evaluator, the implementation grew from a working 10-line REPL session:

Language  REPL       First expression
Python    ipython    urllib.request.urlopen("https://wal.sh/robots.txt").read()
Clojure   clj -M -r  (http/get "https://wal.sh/robots.txt")
Ruby      irb        Net::HTTP.get(URI("https://wal.sh/robots.txt"))
Guile     guile      (http-get (string->uri "https://wal.sh/robots.txt"))
Elisp     ielm       (url-retrieve-synchronously "https://wal.sh/robots.txt")
Elixir    iex        :httpc.request(:get, {~c"https://wal.sh/robots.txt", []}, [], [])
Racket    racket -i  (port->string (get-pure-port (string->url "https://wal.sh/robots.txt")))
Chez      scheme     (libcurl-get "https://wal.sh/robots.txt")
Perl      perl -de0  HTTP::Tiny->new->get("https://wal.sh/robots.txt")
Tcl       tclsh      ::http::geturl "https://wal.sh/robots.txt"
Deno      deno       await fetch("https://wal.sh/robots.txt")

For compiled languages (Go, Rust, Java, Swift), the REPL is the compile-run cycle. The methodology still applies: write the smallest thing that fetches a URL, observe the result, then grow the gate pipeline around it.

4. Verification: the access log as oracle

Every implementation tags its compliance fixture requests with R13:

GET /bot/?impl=walsh/compliance-harness/python&sha=7a01aea&spec=walsh-research-compliance/v1.3

The production access log is the single source of truth. After gmake test-all, gmake logs shows which languages checked in:

c chez clojure d deno elisp elixir erlang go guile haskell janet
java julia kotlin lua nim perl php python racket ruby rust sbcl
scala swift tcl typescript zig

No mocks. No local assertions about what the server saw. The real server saw it or it did not.
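
As a rough sketch of both halves of this check, the Python below builds an R13-tagged URL in the shape shown above and then recovers the language names from log lines. The function names are hypothetical, and the log format is an assumption (an Apache/nginx-style quoted request line); the real `gmake logs` target may work differently.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: each log line contains a quoted request line such as
# "GET /bot/?impl=... HTTP/1.1".
REQUEST_LINE = re.compile(r'"GET (\S+) HTTP/[0-9.]+"')
IMPL_PREFIX = "walsh/compliance-harness/"


def tagged_url(lang: str, sha: str, spec: str, base: str = "https://wal.sh/bot/") -> str:
    """Build the R13-tagged fixture URL for one implementation."""
    params = {"impl": IMPL_PREFIX + lang, "sha": sha, "spec": spec}
    # safe="/" keeps the slashes unescaped, matching the example request.
    return f"{base}?{urlencode(params, safe='/')}"


def languages_seen(log_lines) -> list[str]:
    """Return the sorted languages whose tagged request appears in the log."""
    seen = set()
    for line in log_lines:
        match = REQUEST_LINE.search(line)
        if not match:
            continue
        query = parse_qs(urlsplit(match.group(1)).query)
        impl = query.get("impl", [""])[0]
        if impl.startswith(IMPL_PREFIX):
            seen.add(impl[len(IMPL_PREFIX):])
    return sorted(seen)
```

Comparing the result of `languages_seen` against the expected language list gives the pass/fail answer without any local assertion about what the client believes it sent.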

5. CPRR cycle per language

Each implementation follows the CPRR cycle:

  1. Conjecture: open a REPL, fetch /robots.txt, parse it
  2. Proof: write the gate pipeline (<200 lines), commit
  3. Refinement: run against production, check access log
  4. Refutation: if it fails, git notes add documents why

29 progressive feat(lang) commits (one per language) plus scaffold/docs; 30 git notes as the failure ledger. Bugs found by *running*, not reading:

  • Tcl: SHA float coercion (6062e32 → 6.062e+35), and a return inside catch that swallowed the blocklist.
  • Erlang: a fold that dropped the final (Walsh-Research) group, and illegal regex escapes.
  • Zig: a use-after-free where robots patterns pointed into a freed body.
  • --spider path: three bugs found live (finditer shape, robots-at-target-scheme, and transport-error fail-open; see clarification C5).

6. What the REPL teaches that tests do not

  • Tcl coerces 6062e32 to 6.062e+35 (scientific notation). No unit test catches this because the test input would need to be a real git SHA that happens to look like a float.
  • Chez Scheme has no native HTTP/TLS. The REPL session reveals this in 10 seconds. A CI pipeline reveals it in 10 minutes.
  • Guile's (web request) module requires GnuTLS, which has a different CA path on macOS vs FreeBSD. The REPL shows the TLS handshake error immediately.

The REPL compresses the feedback loop from "write 200 lines, run tests, read error" to "evaluate one expression, observe, adjust."
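
Once the Tcl coercion is known, the class of input that triggers it can be described precisely: an abbreviated hex SHA made only of digits, optionally with a single "e" between two digit runs. The Python sketch below flags such SHAs and shows what a float round trip does to them. It is an illustration only; the helper names are hypothetical and Python's float parsing stands in for Tcl's.

```python
import re

# Digits only, or digits + "e" + digits: both parse as numbers in many languages.
NUMERIC_LOOKING = re.compile(r"[0-9]+(e[0-9]+)?")


def looks_numeric(sha: str) -> bool:
    """True when an abbreviated SHA could be mistaken for a number."""
    return NUMERIC_LOOKING.fullmatch(sha.lower()) is not None


def coerced_form(sha: str) -> str:
    """Show what a float round trip would do to the SHA (unchanged if safe)."""
    return repr(float(sha)) if looks_numeric(sha) else sha


print(coerced_form("6062e32"))  # 6.062e+35
print(coerced_form("7a01aea"))  # 7a01aea
```

A fixture list seeded with a few SHAs for which `looks_numeric` is true would turn this REPL discovery into a regression test for any language that treats strings as numbers opportunistically.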

7. Stretch goals

Three implementations that stress the spec's assumptions about what "native HTTP" means:

Realized in the 29-language corpus:

  • libcurl-FFI tier: Chez, SBCL, Janet have no native HTTP/TLS, so they link libcurl through FFI (a 7-line C shim once, for the variadic curl_easy_setopt ABI). Not a subprocess: the honest answer to "is FFI-to-libcurl still native?"
  • No runtime safety net: Zig and C. Manual memory exposed a use-after-free the GC'd languages hid. Zig's 0.16 Io-based std.http.Client was the most churn.
  • Different control model: Erlang/Elixir on OTP (gates kept deliberately sequential, not a supervision tree); Haskell's monadic IO makes the DENY short-circuit structurally safe (no early-return footgun).

Still genuinely future: WASM/WASI-HTTP (fetch from the runtime), Make (the gate pipeline as a dependency graph), and a post-fetch X-Robots-Tag/Content-Signal gate (R14); see the harness's spec-clarifications.org.

8. Related

9. Lisp-family REPL exercise: sbcl, ecl, clisp, fennel (2026-05-25)

Four Lisp-family implementations exercised in parallel tmux sessions (repl-sbcl, repl-ecl, repl-clisp, repl-fennel) against the v1.3 harness on FreeBSD 15.0 / nexus.lan.

9.1. SBCL — existing impl, live redefinition

The SBCL implementation required one porting change for FreeBSD: the shared-library extension changes from .dylib to .so, and -dynamiclib becomes -shared -fPIC in the compiler invocation. Everything else compiled unchanged.

Setup at the REPL: sbcl --load /tmp/sbcl-repl-setup.lisp

=== SBCL FreeBSD REPL setup loaded ===
    libcurl FFI shim: /tmp/whc-sbcl-curlshim.so
    spec: walsh-research-compliance/v1.3

Positive case:

(multiple-value-bind (status body hdrs)
    (http-get "https://wal.sh/bot/")
  (format t "status=~a body-length=~a~%" status (length body)))
;; status=200 body-length=6721

Full gate pipeline — named group found with crawl-delay=2 and 3 rules:

group-named=T delay=2 rules=3
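
For comparison, the same three facts (a named group exists, its crawl-delay, its rules) can be read with Python's standard-library parser. The robots.txt text below is hypothetical, written only to have one named group with a delay of 2 and three rules; it is not the real wal.sh file. Note also that, to my understanding, `urllib.robotparser` has historically applied the first matching rule rather than RFC 9309's longest match, so it is a convenience here and not a conformance reference.

```python
from urllib.robotparser import RobotFileParser

UA = "Walsh-Research/1.2"

# Hypothetical robots.txt, shaped to match the summary line above:
# one named group, crawl-delay 2, three rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /research/bots/dogfood-walsh-only

User-agent: Walsh-Research
Crawl-delay: 2
Allow: /research/bots/dogfood-allow
Allow: /research/bots/dogfood-walsh-only
Disallow: /research/bots/dogfood-disallow
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print("delay:", rules.crawl_delay(UA))
print("allow /bot/:", rules.can_fetch(UA, "https://wal.sh/bot/"))
print("allow disallowed path:",
      rules.can_fetch(UA, "https://wal.sh/research/bots/dogfood-disallow"))
```

The named group takes precedence over the `*` group for the Walsh-Research agent, which is why a path denied to everyone else can still be allowed for it.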

All 5 compliance targets passed:

  [R5] FETCH https://wal.sh/bot/ -> 200
PASS  allow FETCH    https://wal.sh/bot/
  [R4] wal.sh: wait 1.89s (interval 2.0s)
  [R5] FETCH https://wal.sh/research/bots/dogfood-allow -> 200
PASS  allow FETCH    https://wal.sh/research/bots/dogfood-allow
  [R4] wal.sh: wait 1.86s (interval 2.0s)
  [R5] FETCH https://wal.sh/research/bots/dogfood-walsh-only -> 200
PASS  allow FETCH    https://wal.sh/research/bots/dogfood-walsh-only
  [R2] DENY https://wal.sh/research/bots/dogfood-disallow (robots)
PASS  deny  R2-DENY  https://wal.sh/research/bots/dogfood-disallow
  [R3] DENY https://example.com/ (operator blocklist)
PASS  deny  R3-DENY  https://example.com/
T
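
The [R4] lines are consistent with a per-host throttle that subtracts the time already elapsed from the interval: a wait of 1.89s against a 2.0s interval suggests about 0.11s had passed since the previous request to wal.sh. A minimal Python sketch of that calculation follows. `HostThrottle` is a hypothetical name, and the injectable clock and sleep functions are a testing convenience rather than anything taken from the harness.

```python
import time


class HostThrottle:
    """Enforce a minimum interval between requests to the same host (R4)."""

    def __init__(self, interval: float, clock=time.monotonic, sleep=time.sleep):
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last = {}  # host -> time at which the previous request was released

    def wait(self, host: str) -> float:
        """Sleep if needed and return the number of seconds waited."""
        now = self._clock()
        last = self._last.get(host)
        # First request to a host is free; later ones wait out the remainder.
        delay = 0.0 if last is None else max(0.0, self.interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        self._last[host] = now + delay
        return delay
```

Using a monotonic clock avoids negative or inflated waits when the wall clock is adjusted mid-run.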

9.1.1. Live redefinition: deny-all mode

;; Override host-blocked? to always return T
(defun host-blocked? (host blocked)
  (declare (ignore host blocked))
  (format *error-output* "  [R3-OVERRIDE] deny-all active~%")
  t)
;; WARNING: redefining COMMON-LISP-USER::HOST-BLOCKED? in DEFUN

SBCL emits a style warning but the redefinition takes effect immediately. No restart required. The gate now denies every URL including /bot/:

  [R3-OVERRIDE] deny-all active
  [R3] DENY https://wal.sh/bot/ (operator blocklist)
kind=R3-DENY status=NIL

Restoring the original is an identical defun. SBCL's debugger never fired during normal compliance testing — the condition system is quiet for non-exceptional operations. The style warning on redefinition is the only condition raised.

9.1.2. New function added at the REPL

(defun count-gates-passed (url blocked rules)
  "Count gates a URL clears before deny or fetch."
  ...)

Results:

URL                                            gates-cleared  log
https://wal.sh/bot/                            4              (R3-PASS R2-PASS R4-throttle R5-fetch)
https://example.com/                           0              (R3-BLOCKED)
https://wal.sh/research/bots/dogfood-disallow  1              (R3-PASS R2-BLOCKED)

9.2. ECL — no existing impl, exploration

ECL 24.5.10 has TCP sockets via the built-in SB-BSD-SOCKETS module:

(require 'sockets)  ; => ("SB-BSD-SOCKETS" "SOCKETS")

DNS resolution works:

(sb-bsd-sockets:host-ent-address
  (sb-bsd-sockets:get-host-by-name "wal.sh"))
;; => #(173 236 197 209)

Plain TCP connection and HTTP/1.0 GET succeed, but wal.sh rejects HTTP/1.0 with a 400. The target requires HTTPS. The sb-bsd-sockets module provides no TLS layer.

Path to compliance: ECL has ffi:load-foreign-library. Loading libcurl succeeded:

(ffi:load-foreign-library "/usr/local/lib/libcurl.so.4")
;; => #<codeblock "/usr/local/lib/libcurl.so.4" 0x833e10d80>

The same shim strategy used by SBCL applies — compile a 7-line C shim for the variadic curl_easy_setopt ABI, load it via ffi:load-foreign-library. All gate logic (R2/R3/R4/R5) is pure Lisp: no obstacles.

Condition system note: interrupting the blocking socket-connect call produced an INTERACTIVE-INTERRUPT with numbered restarts:

Condition of type: INTERACTIVE-INTERRUPT
Console interrupt.
1. (CONTINUE) CONTINUE
2. (RESTART-TOPLEVEL) Go back to Top-Level REPL.

ECL and SBCL share the CL restart protocol for async interrupts.

9.3. CLISP — no existing impl, exploration

GNU CLISP 2.49.95+ bundles a SOCKET package directly in its image — require fails but the package is already present. TCP via socket:socket-connect:

(defvar *cs* (socket:socket-connect 80 "wal.sh"))
;; *CS* — returns a bidirectional stream, not a socket object

As under ECL, a plain HTTP/1.0 GET over this stream returned 400, for the same reason (HTTPS required).

CLISP's FFI confirmed libcurl is loadable and curl_easy_init returns a valid handle:

(ffi:default-foreign-library "/usr/local/lib/libcurl.so.4")
(ffi:def-call-out curl-easy-init
  (:name "curl_easy_init")
  (:library "/usr/local/lib/libcurl.so.4")
  (:return-type ffi:c-pointer)
  (:arguments))
(curl-easy-init)
;; => #<FOREIGN-ADDRESS #x00003A7F906DD800>

CLISP gotcha: in the format string "~10 lines", the tilde opens a directive and 10 is read as its field-width parameter, not as literal text. Put an explicit "~%" before the number or escape the tilde as "~~". SBCL also signals an error for (format t "~10 foo"), but with a different message.

9.4. Fennel/LuaJIT — no existing impl, exploration

Fennel 1.6.0 on PUC Lua 5.4 has neither socket (luasocket not installed) nor ffi (that is a LuaJIT extension). However, Fennel can target LuaJIT:

fennel --lua luajit
Welcome to Fennel 1.6.0 on LuaJIT 2.1.1765228720 BSD/x64!

On LuaJIT, (require :ffi) works. libcurl loaded and curl_easy_init returned a valid pointer:

(local ffi (require :ffi))
(local lc (ffi.load "/usr/local/lib/libcurl.so.4"))
(local h (lc.curl_easy_init))
(print (tostring h))
;; cdata<void *>: 0x0a9c41095000

Full HTTPS GET of /bot/ at the REPL, using the same pre-compiled shim:

;; Assumes the ffi.cdef declarations for the curl, shim and libc functions
;; used below were already evaluated earlier in this session.
(do
  (local ffi (require :ffi))
  (local lc  (ffi.load "/usr/local/lib/libcurl.so.4"))
  (local sh  (ffi.load "/tmp/whc-sbcl-curlshim.so"))
  (local lbc (ffi.load "/lib/libc.so.7"))
  (lc.curl_global_init 3)                      ; 3 = CURL_GLOBAL_ALL
  (local h (lc.curl_easy_init))
  (local fp (lbc.fopen "/tmp/body.txt" "wb"))
  ;; 10002 = CURLOPT_URL, 10018 = CURLOPT_USERAGENT
  (sh.shim_setopt_str  h 10002 "https://wal.sh/bot/?impl=walsh/compliance-harness/fennel")
  (sh.shim_setopt_str  h 10018 "Mozilla/5.0 (compatible; Walsh-Research/1.2; +https://wal.sh/bot/)")
  ;; 52 = CURLOPT_FOLLOWLOCATION, 13 = CURLOPT_TIMEOUT (seconds)
  (sh.shim_setopt_long h 52 1) (sh.shim_setopt_long h 13 30)
  ;; 10001 = CURLOPT_WRITEDATA: the response body is written to fp
  (sh.shim_setopt_ptr  h 10001 fp)
  (local rc (lc.curl_easy_perform h))
  (lbc.fclose fp)
  ;; 2097154 = CURLINFO_RESPONSE_CODE
  (local cb (ffi.new "long[1]"))
  (sh.shim_getinfo_long h 2097154 cb)
  (local status (tonumber (. cb 0)))
  (lc.curl_easy_cleanup h)
  (print (string.format "[fennel/repl] rc=%d status=%d" rc status)))
;; [fennel/repl] rc=0 status=200

R3 gate sketched in Fennel:

(fn host-blocked? [host blocked]
  (var found false)
  (each [_ d (ipairs blocked)]
    (when (or (= host d)
              (= (string.sub host (- (+ 1 (length d)))) (.. "." d)))
      (set found true)))
  found)

(print (host-blocked? "example.com" ["example.com" "badactor.net"]))
;; true
(print (host-blocked? "wal.sh" ["example.com" "badactor.net"]))
;; false
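
The same suffix rule in Python, for readers comparing against the Python implementation's shape. This is a sketch under assumptions: `host_blocked` is a hypothetical name, blocklist entries are assumed to be lowercase, and the lowercasing and trailing-dot handling are additions that the Fennel sketch does not have. The leading "." in the suffix test is what keeps "notexample.com" from matching "example.com".

```python
def host_blocked(host: str, blocked: list[str]) -> bool:
    """True when host equals a blocked domain or is a subdomain of one."""
    host = host.lower().rstrip(".")  # normalisation not present in the Fennel version
    return any(host == d or host.endswith("." + d) for d in blocked)


BLOCKED = ["example.com", "badactor.net"]
print(host_blocked("example.com", BLOCKED))      # True
print(host_blocked("www.example.com", BLOCKED))  # True
print(host_blocked("notexample.com", BLOCKED))   # False
print(host_blocked("wal.sh", BLOCKED))           # False
```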

9.5. Summary: libcurl shim as a compliance primitive

All four Lisp environments can load /usr/local/lib/libcurl.so.4 via their respective FFI mechanisms. The variadic ABI shim (7 lines of C, compiled once at startup to /tmp/whc-*-curlshim.so) is the only platform-specific artifact. Gate logic is portable across CL dialects and Fennel with minimal adaptation.

Impl    Runtime         FFI                       libcurl            HTTPS fetch  Gate logic
sbcl    SBCL 2.5.7      sb-alien                  WORKS              WORKS (200)  all gates
ecl     ECL 24.5.10     ffi:load-foreign-library  LOADS              shim needed  all gates
clisp   CLISP 2.49.95+  ffi:def-call-out          WORKS (handle ok)  shim needed  all gates
fennel  LuaJIT 2.1      ffi (LuaJIT)              WORKS              WORKS (200)  R3 sketched

The SBCL condition/restart system was silent during normal compliance runs. The only conditions raised were style warnings on defun redefinition — non-fatal, non-interactive, proceed automatically. ECL's INTERACTIVE-INTERRUPT on Ctrl-C is the same protocol with numbered restarts: the CL restart vocabulary is consistent across implementations.