Why This Is Hard
The unified-contract idea has failed before. What's different this time.

Table of Contents

The obvious critique

"Goldberry is just another linter."

This document is the defense against that critique. It names the reasons the unified-contract idea has failed before and explains what goldberry does differently.

The 12-layer stack

The web's contract surface is absurdly large. Here is the full stack, with the verification surface that exists at each layer:

layer invariant verifiable tools that try who pays
code type/shape/effect TypeScript, Flow, ReScript, CLJS the developer
linting style + simple safety ESLint, biome, oxlint the team
unit functional behavior Jest, Vitest, deftest the team
coverage path exercise nyc, c8, cloverage team / management
build composition + size + tree-shake Vite, esbuild, shadow-cljs, Closure the team
split per-route boundary Webpack chunks, shadow-cljs modules the team
fingerprinting cache integrity, version pin webpack contenthash, immutable hdrs ops
deployment atomicity, rollback, canary Spinnaker, Argo, Vercel, CF Pages ops + management
UA execution cross-browser feature support caniuse, browserslist, polyfills the team
render+time layout, paint, jank Lighthouse, Core Web Vitals ops + product
CSS computed-style invariants stylelint, computed-style asserts nobody really
interaction gesture, focus, a11y Playwright, axe-core, Testing Lib the team

Twelve layers. Each has tools.

None of the tools at any layer reference the invariants from any other layer. That is the actual disease.

Why so many layers

Each layer exists because the previous layer's contract enforcement was insufficient. The layers are load-bearing:

  • Linting exists because runtime errors are too late
  • Coverage exists because tests lie about completeness
  • Code splitting exists because bundles got too large
  • Fingerprinting exists because CDNs cached stale code
  • Browser detection exists because the platform is fragmented
  • Lighthouse exists because perf is invisible to developers
  • A11y tooling exists because lawsuits

Each layer is a refutation in goldberry's vocabulary. The web stack is a Lakatos research programme that never settled.

Why the web never converged

Unlike databases (which converged on a few patterns) or kernels (which converged on POSIX-ish), the web stayed in active refinement because the platform itself never stopped changing.

The web is the only software target that has to ship into a hostile, mutable, multi-vendor execution environment for an audience that includes everyone alive:

  • You don't control the runtime
  • The runtime updates without your consent
  • The runtime is shipped by your competitors
  • The runtime varies across platforms you don't control
  • Accessibility law applies
  • Performance budgets are device-tier-dependent
  • The network is adversarial
  • The user can install extensions that modify your DOM
  • The user's grandparent might be using it

No other software target has this combination. Games have a fixed runtime. Mobile apps have a curated runtime. Servers control their runtime. Embedded controls their runtime.

Twelve layers of contract enforcement is not too many. It's barely enough.

Three projects that tried "single place"

Closure (Google, 2009-present)

JS compiler + linter + type checker + dependency manager + module system + minifier in one binary. Used internally for Gmail, Maps, Docs. Billions of users run on Closure with whole-program optimization.

Google has a single source-of-truth contract because Closure forces it. The reason it didn't conquer outside Google: ergonomic cost. JSDoc annotations with custom types is uglier than TypeScript. Google paid that cost; nobody else would.

Bazel + rulesnodejs

Extends the contract upward to build, deploy, test, depend. Hermetic. Reproducible. Goldberry-shaped at every layer.

Adoption: Google, parts of Meta, parts of Stripe, niche elsewhere. The cost is identical to Closure: ergonomics. The win is real.

Phoenix + LiveView + Surface (Elixir, 2020+)

Closest extant single-place stack. The same Elixir code generates server logic, view, client interaction contract, deployment artifact. The type model spans all of them.

LiveView's invariant claim: there is no separate frontend. The contract is one contract, server-authored, push-based. Reaches a superset of what the 12-layer table lists except CSS-as-invariant.

CSS remains adversarial.

Why the unified-contract idea keeps failing

1. Conway's law for tooling

Linting, build, test, deploy each emerged from different specialist communities (style, perf, QA, ops). Each community owns a tool. Unifying tools means cross-community political work that no individual contributor can do.

2. Cost of expressing each layer's invariant in shared formalism

formalism lines for a React component
TypeScript ~10
Closure JSDoc ~30
Coq ~300

Each step up in formalism costs 3x. The unified-contract substrate has to choose its level: pick low (TypeScript) and you lose the lower-level guarantees; pick high (Coq) and you lose the developers.

3. Web is adversarial at the UA boundary

Server-side stacks can assume the runtime cooperates. Web stacks cannot. The browser is a platform you don't control, that updates without you, that gets installed in 19 different vendor variants, on hardware ranging from $50 Android phones to $4000 workstations.

Any unified contract has to terminate at the UA and accept that the UA may violate the contract.

4. CSS is genuinely hostile to formalization

  • Cascade is global
  • Specificity is non-monotonic
  • New rules can change rendering of parent components
  • Custom properties at runtime defeat static analysis
  • :has() defeats it harder

The invariant "this component looks the way the design specifies" has no proof obligation that doesn't bottom out in pixel diffing against a reference render.

This is why @layer (cascade layers, 2022) is in goldberry's foundation.org - the spec finally added a primitive to make cascade locally reasonable, sixteen years after the problem was named.

5. The "right" formalism for each layer disagrees

concern formalism needed
Code type theory
Build graph reachability
CSS constraint propagation
Layout geometry
Time temporal logic (LTL/CTL)
Interaction state machines
Network TLA+ (temporal-distributed)
Deployment transactional (atomicity)
A11y axiomatic (WCAG criteria)

You cannot pick one substrate that handles all of these. The closest thing is dependent types (Idris, Lean 4, Agda, Dafny) but the ergonomic cost is prohibitive for working frontend developers.

What actually works: shared schema across tools

SARIF (Static Analysis Results Interchange Format) lets ESLint + Stylelint + axe + TypeScript + Lighthouse emit findings into a common JSON shape that GitHub / Sonar / etc consume.

It's not formal verification but it's the closest thing to a unified contract surface that has shipped.

The goldberry-shaped extension: a SARIF-like schema where each finding cites the invariant (with :source: and :cluster: properties), not just a rule ID. Lint plugins, build assertions, runtime checkers all reference the same invariant catalog.

The catalog doesn't replace the tools. It gives them a shared vocabulary.

What goldberry does differently

The catalog is the single place

The invariants exist. Tools verify them separately. Goldberry names them, catalogs them, cross-references them across paradigms and eras.

Corpus-based, not formalism-based

The 1990s tried to prove correctness mathematically. Goldberry tracks the refinement history. Each defect entry is evidence that the invariant hypothesis was insufficient. The corpus documents how the industry got it right, in pieces, over decades.

Minimal notation

Two-sentence :invariantstatement: with prose rationale. No Z notation, no CSP, no Petri nets. Readable by any senior engineer without formal training.

Uses existing tools

Git, GitHub, issue trackers, standard repos. No new IDE. No UIMS. No CTTE. The tools already exist; goldberry provides the vocabulary they share.

Defects-as-evidence, not code generation

Each DEFECT-NNNN entry links to real code (GitHub issues, PRs, commits). The catalog is built FROM code, not separately from code. The specification gap that killed 1990s work doesn't form.

Phases with explicit gates

Draft -> Eval -> Staging -> Promoted. No completeness tax. Ship early and often. Phase 1 -> Phase 2 is a real deliverable.

Era tags

Same invariant, different technologies. A focus-trap bug in a 1995 dialog has the same invariant violation as a 2024 React modal. The :era: tag captures this.

The Devcards corollary

Bruce Hauman's Devcards (~2015) proves corpus-as-development- environment works. The key insight:

A component is incompletely specified by its source code. Its full specification is the cartesian product of (component x props x state x context). Render that product as cards, and the contract becomes inspectable, sharable, version-controlled.

Cards ARE the test cases ARE the documentation ARE the demo ARE the regression suite. Same artifact, four uses.

The ClojureScript/Devcards stack achieves this but requires the discipline to be designed in from day one. You can't retrofit Devcards onto a React codebase.

This is the same asymmetry as FoundationDB vs Antithesis. FDB built deterministic from day one. Antithesis added it via hypervisor for arbitrary dockerized software.

Goldberry's move: extract the invariants the CLJS stack enforces by construction, catalog them, let other paradigms verify against the catalog post-hoc.

The reason for the hate

People hate the web because the contract surface is so large that no individual engineer can hold it in their head, and the tooling pretends each layer is independent when in practice they pathologically interact.

The fix is not "fewer tools" - every tool is doing real work.

The fix is naming the invariants such that the interactions become visible. That's goldberry's bet.

What's different this time

1990s failure mode goldberry approach
notation barrier minimal YAML + prose
code generation: all or nothing defects-as-evidence
completeness tax phases with explicit gates
hiring barrier readable by any senior engineer
tooling friction uses git/GitHub/standard repos
specs diverged from code corpus built FROM code
technology moved too fast era tags track invariants across eras

The bet: by building a Lakatos-refined corpus of defects with falsifiable invariants, cross-referenced across implementations and eras, goldberry captures what 1990s formal methods got right (the abstractions) without the adoption barriers that killed them.

What goldberry is NOT

  • Not the perfect frontend testing framework
  • Not a best-practices guide
  • Not a React tutorial
  • Not a religion
  • Not yet another linter

A catalog of refuted conjectures hardened into invariants, with provenance, paradigm-cross-referenced, ready to feed a DST harness.

The thing it competes with is institutional memory in senior engineers' heads. The thing it cannot replace is judgment about which entries matter most for which application.

The stretch goal (P7+)

Corpus-as-rendered-UI: a future state where the goldberry corpus itself ships as live, navigable cards. Each DEFECT-NNNN renders as a card showing the broken version, the fixed version, and the assertion that distinguishes them.

Hauman's idea is the right ancestor. That's where this goes.

Author: goldberry research

jwalsh@nexus

Last Updated: 2026-05-17 23:10:42

build: 2026-05-20 03:36 | sha: 12ce5fe