Bombadil for SPAs: Property-Based Testing Best Practices

1. Overview

Bombadil (Antithesis, successor to Quickstrom) tests a web UI by exploring it like a fuzzer and checking LTL properties — always, eventually, now ... implies ... within — against state extracted from the live DOM. This note tracks what actually works when the target is a single-page app, using wal.sh's own pocket-es search surface as the running example. It is a living document; entries are added as the eldest-spec suite evolves.

The recurring theme: a property-based suite is only as good as its extractors. Most early "violations" turn out to be bugs in the spec, not defects in the site.
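
To make the operator vocabulary concrete, here is a toy evaluator over a finite trace of extracted states. It covers always, eventually and a bounded implies ... within. It is not Bombadil's spec API. The State shape, the helper names and the step-counted bound are assumptions for illustration; Bombadil's own within may be bounded differently, for example in time. The sketch also treats a window that is cut off by the end of the trace as a failure, which is only one reasonable finite-trace choice.

```typescript
// Hypothetical extracted state for one step of a run on a search page.
type State = { query: string; resultCount: number };

// A formula is judged at position i of a finite trace of states.
type Formula<S> = (trace: readonly S[], i: number) => boolean;

// now: a plain predicate on the state at position i.
function now<S>(p: (s: S) => boolean): Formula<S> {
  return (trace, i) => i < trace.length && p(trace[i]);
}

// always f: f holds at every position from i to the end of the trace.
function always<S>(f: Formula<S>): Formula<S> {
  return (trace, i) => {
    for (let j = i; j < trace.length; j++) if (!f(trace, j)) return false;
    return true;
  };
}

// eventually f: f holds at some position from i to the end of the trace.
function eventually<S>(f: Formula<S>): Formula<S> {
  return (trace, i) => {
    for (let j = i; j < trace.length; j++) if (f(trace, j)) return true;
    return false;
  };
}

// p implies q within n: wherever p holds, q must hold at that step or one of the
// next n steps. A window truncated by the end of the trace counts as a failure.
function impliesWithin<S>(p: Formula<S>, q: Formula<S>, n: number): Formula<S> {
  return (trace, i) => {
    if (!p(trace, i)) return true;
    for (let j = i; j <= i + n && j < trace.length; j++) if (q(trace, j)) return true;
    return false;
  };
}

const typedQuery = now<State>((s) => s.query.length > 0);
const hasResults = now<State>((s) => s.resultCount > 0);
// "Always, a non-empty query implies results within 2 steps."
const property = always(impliesWithin(typedQuery, hasResults, 2));

const trace: State[] = [
  { query: "", resultCount: 0 },
  { query: "b", resultCount: 0 },
  { query: "bo", resultCount: 3 },
];
console.log(property(trace, 0)); // true
```

Every verdict depends only on what the extractor put into State. A wrong selector therefore produces a confident, wrong answer.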

2. Installation and environment

  • The CLI is a Rust binary, not the npm package. @antithesishq/bombadil on npm ships TypeScript types only — no executable. Install the binary separately (release asset bombadil-<arch>-<os>, cargo, or Nix) and keep it on PATH. CI that only runs npm install will fail with command-not-found.
  • Bombadil launches its own Chromium. bombadil test looks for a chromium executable on PATH. If you only have Google Chrome, point a chromium shim at it. On a CI runner, install chromium-browser explicitly.
  • Drive an existing browser with test-external. When the managed-Chromium path is broken or you want to watch the run, launch Chrome with --remote-debugging-port=9992 and use bombadil test-external --remote-debugger http://localhost:9992 --create-target.
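
Running npm install alone provides neither executable. A CI job can therefore fail fast with a small preflight that looks up both names on PATH before any test runs. This is a sketch in plain Node, not part of Bombadil. The binary names bombadil and chromium come from the notes above; which and missingBinaries are hypothetical helpers. Windows PATHEXT handling is left out.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Return the first executable regular file called `name` on the given PATH, or null.
// Windows PATHEXT (.exe, .cmd) handling is deliberately left out.
function which(name: string, pathVar: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathVar.split(delimiter)) {
    if (dir === "") continue; // skip empty PATH entries
    const candidate = join(dir, name);
    try {
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK); // throws if not executable
        return candidate;
      }
    } catch {
      // Missing or not executable: keep searching.
    }
  }
  return null;
}

// Names from `required` that cannot be found on PATH.
function missingBinaries(required: readonly string[], pathVar?: string): string[] {
  return required.filter((name) => which(name, pathVar) === null);
}

const missing = missingBinaries(["bombadil", "chromium"]);
console.log(missing.length === 0 ? "preflight ok" : `missing on PATH: ${missing.join(", ")}`);
```

In CI, turn a non-empty list into a non-zero exit, for example by setting process.exitCode = 1. For a manual check, bombadil --version from the checklist remains quicker.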

3. SPA-specific behavior

  • The URL stays constant; exploration happens in-page. For a client-side surface like pocket-es, the whole run can sit on /search/ — Bombadil types into inputs and clicks results without a navigation. Do not assume the trace will show many distinct URLs; assert on DOM state, not on route changes.
  • Use the origin as a boundary. The first argument to test is both the start URL and the boundary — Bombadil will not navigate off it. Point it at the exact surface you mean to exercise.
  • Prefer quiescence over fixed waits. Bombadil 0.5.0 replaced fixed timeouts with quiescence timers (#176), which cope with SPA re-renders far better. Bound the run with --time-limit rather than a step count.
  • Watch for navigation stalls. A single slow or never-settling route can trip a "navigation timed out ... during Loading" error. Keep the boundary tight and the time limit modest while iterating.
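
The difference between a fixed wait and a quiescence timer can be modeled as a pure function over the times (in ms after an action) at which the page showed activity. This is a conceptual sketch only. The notes above do not say which signals Bombadil 0.5.0 actually watches, or what its default quiet period and limits are. settleTime, quietMs and limitMs are therefore hypothetical.

The timer re-arms on every event and fires once the page has been quiet for quietMs. A page that never goes quiet hits the limit instead, which is the shape of the navigation stall described above.

```typescript
// Outcome of waiting for a page to settle after an action at t = 0.
interface Settle {
  at: number; // ms after the action when the snapshot would be taken
  timedOut: boolean; // true if activity never stopped before limitMs
}

// Model a resettable quiescence timer over activity timestamps (ms after the action).
function settleTime(eventTimes: readonly number[], quietMs: number, limitMs: number): Settle {
  let deadline = quietMs; // timer armed as soon as the action completes
  for (const t of [...eventTimes].sort((a, b) => a - b)) {
    if (t >= deadline) break; // timer already fired: the page was quiet long enough
    deadline = t + quietMs; // new activity re-arms the timer
  }
  return deadline > limitMs
    ? { at: limitMs, timedOut: true }
    : { at: deadline, timedOut: false };
}

// A burst of re-renders ending at 160 ms settles at 210 ms with a 50 ms quiet period;
// a fixed 100 ms wait would have snapshotted mid-burst.
console.log(settleTime([10, 60, 110, 160], 50, 1000)); // { at: 210, timedOut: false }
```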

4. Extractors must match the real DOM

Verify every selector against the live DOM in a real browser before trusting a refutation. Three false positives from the wal.sh suite, all spec bugs:

  • Over-broad selectors capture the wrong element. ul:first-of-type > li > a matched the page table-of-contents, not <nav>, so the nav check was polluted with article titles. Scope to nav a.
  • Encode the site's actual structure, not an assumed one. The conjecture assumed a "Home" nav item; wal.sh has Research/Events/Current/Search and a wordmark that is the home link. A property asserting something the site never had fails forever and teaches you nothing.
  • Logos are not always images. a:has(img[alt*="wal.sh"]) matched nothing: the home link is an <a class="wordmark"> text logo. The extractor silently returned null.
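
The three fixes above can be folded into one extractor with three properties. It is scoped to nav. It selects the wordmark by class rather than through an assumed <img>. It reports which selector came up empty instead of returning a bare null. This is a sketch under stated assumptions. QueryRoot is a hypothetical minimal interface, and the real document should satisfy it structurally. The Extracted result type is an illustrative convention, not something Bombadil requires.

```typescript
// Minimal stand-in for the DOM calls the extractor makes, so the sketch also runs
// outside a browser. Hypothetical; the real `document` should satisfy it structurally.
interface ElementLike {
  textContent: string | null;
}
interface QueryRoot {
  querySelector(selector: string): ElementLike | null;
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

// Either a value or the selector that matched nothing.
type Extracted<T> = { ok: true; value: T } | { ok: false; missing: string };

interface NavState {
  labels: string[]; // visible text of each nav link
  home: string; // text of the wordmark home link
}

function extractNav(root: QueryRoot): Extracted<NavState> {
  // Scoped to <nav>; a bare `ul:first-of-type > li > a` also matches the page TOC.
  const links = Array.from(root.querySelectorAll("nav a"));
  if (links.length === 0) return { ok: false, missing: "nav a" };

  // The home link is a text wordmark, not an <img>, so select it by class.
  const wordmark = root.querySelector("a.wordmark");
  if (wordmark === null) return { ok: false, missing: "a.wordmark" };

  const text = (el: ElementLike) => (el.textContent ?? "").trim();
  return { ok: true, value: { labels: links.map(text), home: text(wordmark) } };
}
```

Logging missing alongside a refutation makes the silent-null case visible on the first run. The notes do not say whether the wordmark sits inside <nav>, so labels may or may not include it.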

5. Headless vs. real-browser divergence

  • Cross-origin resources flake under headless. An external badge (static.fsf.org/...) reported naturalWidth == 0 under headless but loaded fine (182×45) in a real browser. A "no broken images" property fired on a network artifact, not a defect. Restrict such checks to same-origin resources, or wait for load before asserting.
  • Always confirm a refutation in a headed browser before filing it as a site bug. The cheapest debugging step is opening the witness URL yourself.
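
A "no broken images" property that follows both recommendations can filter on origin and on load state before judging naturalWidth. This is a sketch only. ImageInfo is a hypothetical snapshot whose fields mirror HTMLImageElement's src, complete and naturalWidth. The badge and logo URLs in the example are made up. An image whose complete flag is still false is skipped rather than failed, so a check repeated on later states can catch it once it has loaded.

```typescript
// Snapshot of one <img>; field names mirror HTMLImageElement.
interface ImageInfo {
  src: string;
  complete: boolean; // true once loading finished, successfully or not
  naturalWidth: number; // 0 when no image data is available
}

// Same-origin images that finished loading but have no intrinsic width.
function brokenSameOriginImages(images: readonly ImageInfo[], pageUrl: string): ImageInfo[] {
  const pageOrigin = new URL(pageUrl).origin;
  return images.filter((img) => {
    let origin: string;
    try {
      origin = new URL(img.src, pageUrl).origin;
    } catch {
      return false; // unparseable src: "no result", not an exception
    }
    // Skip cross-origin resources and anything still loading.
    return origin === pageOrigin && img.complete && img.naturalWidth === 0;
  });
}

const broken = brokenSameOriginImages(
  [
    { src: "https://cdn.example.org/badge.png", complete: true, naturalWidth: 0 }, // cross-origin: ignored
    { src: "/img/logo.png", complete: true, naturalWidth: 0 }, // same-origin and broken
    { src: "/img/hero.png", complete: false, naturalWidth: 0 }, // still loading: not judged yet
  ],
  "https://wal.sh/search/",
);
console.log(broken.map((i) => i.src)); // [ '/img/logo.png' ]
```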

6. Guarding extractors against their own crashes

An extractor that throws aborts the entire run, costing you every other property. Defend the DOM calls:

  • querySelector(href) where href == "#" throws (SyntaxError: '#' is not a valid selector). Exclude the bare "#" (a[href^="#"]:not([href="#"])) and wrap the lookup in try/catch.
  • Treat any malformed-input path as "no result," never as an exception.
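
Both rules fit in two small helpers. One resolves an in-page link's target and returns null for anything malformed; the other runs any extractor with a fallback. Both are hypothetical names. SelectorRoot is a minimal stand-in that document should fit, because one of its querySelector overloads takes a plain string.

```typescript
// Minimal stand-in for a DOM root (hypothetical); `document` should fit it.
interface SelectorRoot<E> {
  querySelector(selector: string): E | null;
}

// Resolve an in-page link target. Every malformed input yields null, never an exception.
function fragmentTarget<E>(root: SelectorRoot<E>, href: string | null): E | null {
  // Only real fragments are looked up; the bare "#" has no target.
  if (href === null || !href.startsWith("#") || href === "#") return null;
  try {
    return root.querySelector(href);
  } catch {
    // Some valid fragments are invalid selectors, e.g. "#1-intro" (ID starts with a digit).
    return null;
  }
}

// General guard: run an extractor and fall back instead of aborting the run.
function guarded<T>(extract: () => T, fallback: T): T {
  try {
    return extract();
  } catch {
    return fallback;
  }
}
```

When a link only ever targets an ID, document.getElementById(href.slice(1)) avoids selector parsing altogether. Percent-encoded fragments would then need decoding, and decodeURIComponent can itself throw on malformed input.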

7. Determinism and reproduction

  • Bombadil 0.5.0 has no --seed flag. Determinism comes from --reproduce <TRACE_FILE>, which replays a recorded trace exactly (#177). Do not design a seed-keyed reproduction scheme around a flag that does not exist.
  • Key artifacts to the build. Write traces and screenshots under runs/<build-sha>/ so a refutation is tied to the commit that produced it. The trace runs/<sha>/trace.jsonl plus --reproduce is the witness.
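
Keying artifacts to the build is easiest when one helper owns the layout. This sketch uses hypothetical names: runArtifacts and reproduceFlag are not Bombadil APIs, and the screenshots subdirectory name is an assumption. Only runs/<sha>/trace.jsonl and the --reproduce flag come from the notes above.

```typescript
import { join } from "node:path";

// Hypothetical layout under runs/<build-sha>/.
interface RunArtifacts {
  dir: string;
  trace: string; // the witness replayed with --reproduce
  screenshots: string; // assumed subdirectory name
}

function runArtifacts(buildSha: string, root = "runs"): RunArtifacts {
  // Accept only hex commit ids (short or full, SHA-1 or SHA-256) so a branch
  // name or typo cannot silently detach a run from its build.
  if (!/^[0-9a-f]{7,64}$/.test(buildSha)) {
    throw new Error(`not a commit sha: ${buildSha}`);
  }
  const dir = join(root, buildSha);
  return { dir, trace: join(dir, "trace.jsonl"), screenshots: join(dir, "screenshots") };
}

// Flag pair that replays the recorded trace for a given build.
function reproduceFlag(buildSha: string): [string, string] {
  return ["--reproduce", runArtifacts(buildSha).trace];
}

console.log(reproduceFlag("3f9c2ab")); // [ '--reproduce', 'runs/3f9c2ab/trace.jsonl' ] on POSIX
```

In CI the sha would typically come from git rev-parse HEAD. Confirm against the CLI's help output which Bombadil subcommand accepts --reproduce in your version, rather than assuming.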

8. Checklist

  1. Binary on PATH (bombadil --version); Chromium resolvable.
  2. tsc --noEmit clean (add DOM.Iterable to lib; enable skipLibCheck to work around type bugs in upstream declarations).
  3. Every selector verified against the live DOM in a real browser.
  4. Same-origin guards on resource-loading properties.
  5. Extractors wrapped against querySelector / parse exceptions.
  6. Run bounded by --time-limit, output under runs/<sha>/.
  7. Each refutation reproduced with --reproduce before it is believed.