Related Work: UI Invariant Specification Landscape

Table of Contents

Purpose

Survey of existing approaches to UI invariant specification, behavioral contracts, and formal methods for interactive systems. Organized by: what they cover, what they miss, and what goldberry learns from each.

Standards and Specifications

WCAG 2.1 / 2.2

The Web Content Accessibility Guidelines are the closest thing to codified UI invariants that achieved industry adoption.

Coverage by goldberry cluster

cluster WCAG coverage quality notes
a11ycontract Perceivable + Robust strong WCAG's primary domain
focusstate Operable 2.1, 2.4 good 2.2 added focus visibility criteria
devicecontract Operable 2.5 partial gesture simplification only
temporality Operable 2.2 weak time limits only; excludes races
totality Understandable 3.3 partial error prevention for high-stakes
eventgraph none - outside WCAG scope
lifecycle none - framework-era specific
layoutcontract Perceivable 1.3.1 weak source order only
boundaries Perceivable 1.4.10 weak reflow only
securitycontract Understandable 3.3.8 tangential authentication only

Key invariants WCAG codifies

  1. Accessibility tree invariant: every interactive element has programmatically determinable name, role, and state
  2. Focus visibility invariant: keyboard users can always identify which element receives next keystroke
  3. Keyboard operable invariant: all pointer functions available via keyboard
  4. Time limit adjustability: user-facing time limits can be disabled or extended 10x
  5. Error prevention (high-stakes): reversibility, validation, or confirmation before finalization

What WCAG does NOT cover

  • Race conditions and async ordering
  • Hydration mismatch
  • Frame budget and jank
  • Event graph correctness (capture/bubble)
  • Layout correctness (float clearing, grid alignment)
  • Backend-side constraints
  • Behavioral correctness beyond accessibility tree

Goldberry implication

a11ycontract cluster has WCAG as authoritative reference. Other clusters must cite framework docs, issue trackers, and observed defects since WCAG doesn't cover them.

ARIA Authoring Practices Guide (APG)

W3C's reference implementations for accessible widget patterns.

Structure per pattern

Each APG pattern documents:

  • Keyboard interactions (many marked "optional")
  • Focus management rules
  • Required ARIA roles and attributes
  • Behavioral requirements (mutual exclusivity, modal semantics)

Behavioral encoding mechanisms

  1. Explicit prescriptions: "Tab and Shift+Tab do not move focus outside the dialog"
  2. Implicit constraints in examples
  3. ARIA semantics as behavioral specification (aria-modal="true")
  4. "Not applicable" as silent specification (alerts must NOT consume keyboard input)

Gaps APG leaves implicit

  • Animation timing and easing
  • Focus restoration after async operations
  • Autocomplete filtering algorithms
  • Nesting and composition rules
  • Dynamic content loading strategies

Goldberry implication

APG strongly specifies roles and attributes; weakly specifies timing, composition, and dynamic content. Goldberry fills the behavioral gaps.

Component Libraries

Radix UI

Uncontrolled-first architecture with WAI-ARIA pattern compliance.

Approach

  • Granular part access for customization
  • Roving tabindex for complex components
  • Behavior wiring handled internally
  • Explicit ARIA synthesis

Testing

  • Component-specific accessibility assertions
  • Multi-browser focus/keyboard validation
  • Jest-based with custom a11y utilities

Headless UI

Render props exposing focus state; Tailwind's unstyled components.

Approach

  • data-focus and activeOption for tracking
  • Modal focus trap with scroll locking and inert
  • Framework variants (React, Vue) with contract parity

Testing

  • Focus trap and restoration tests
  • ARIA pattern validation
  • Interaction module helpers

React Aria (Adobe)

Behavior hooks encapsulating focus, keyboard, ARIA management.

Approach

  • useDialog, useButton, useMenu etc.
  • Developers don't need ARIA knowledge
  • Portable across frameworks
  • TypeScript contracts

Testing

  • Automated jest-axe integration
  • Screen reader testing (NVDA, JAWS, VoiceOver)
  • Complete keyboard flow tests

a11y-dialog (Kitty Giraudel)

Minimal focus trap for modal dialogs.

Approach

  • Single invariant: focus cannot escape without closing
  • Framework agnostic
  • Lightweight, no dependencies

Key debate

Focus trap strictness is a spectrum. GitHub issues document edge cases (datepicker within dialog) where strict trapping breaks UX.

Reach UI

forwardRef on every component; developer responsibility model.

Approach

  • Components behave like native HTML elements
  • Warns when focus management incomplete
  • as prop for semantic customization

Philosophy

"In most cases, developers don't need to know anything about ARIA to use Reach UI" - accessibility built in, not bolted on.

State Machine Approaches

XState and Harel Statecharts

David Khourshid's work: UI bugs as state machine violations.

Core thesis

  • "States that should be unreachable but are"
  • "Transitions that should fire but don't"
  • Multiple conflicting state representations (boolean flags)

Harel statecharts (1987)

  • Hierarchy: nested states with OR semantics
  • Orthogonality: parallel regions with AND semantics
  • Broadcast: events propagate to listening regions
  • Guards: preconditions on transitions
  • Actions: postconditions (entry/exit/transition)

XState approach

  • Explicit state enumeration
  • Guards as preconditions
  • Actions as postconditions
  • TypeScript integration: impossible states as type errors
  • Model-based testing with state/transition coverage

What statecharts don't model

  • Layout and visual geometry
  • Accessibility semantics
  • Timing and performance
  • Continuous values
  • Rendering and DOM state

Goldberry cluster mapping

statechart concept goldberry cluster example
state reachability lifecycle mount/unmount cleanup
guard precondition a11ycontract modal accept enabled if form valid
transition coupling temporality race resolution
orthogonal regions eventgraph capture/bubble independence
forbidden transition focusstate cannot leave modal without focus restore

Devcards + ClojureScript Stack

Bruce Hauman, ~2015. The development environment as UI for a corpus of UI states.

Core thesis

A component is incompletely specified by its source code. Its full specification is the cartesian product of (component x props x state x context). Render that product as cards, and the contract becomes inspectable, sharable, version-controlled.

Cards ARE the test cases ARE the documentation ARE the demo ARE the regression suite. Same artifact, four uses.

The stack (contract enforcement layers)

layer what it does goldberry relevance
ClojureScript immutable by default forbids temporality.raceresolution
Closure Compiler (adv) tree shake + dead code elimination unused contract paths removed at build
Code splitting per-route, per-card boundaries contract surface bounded by build
Figwheel / shadow-cljs hot reload preserving state lifecycle.cleanup visible at dev time
re-frame coeffects declared side effects totality.terminalstate made explicit
Devcards corpus-as-rendered-UI the catalog IS the artifact

Why it matters

Five layers of contract enforcement, none framed as "formal methods" but collectively producing the strongest UI-correctness story in the wild. Each layer assumes previous layer's invariants hold.

Why it didn't conquer

ClojureScript's adoption ceiling is a hiring-pool problem. The stack works. There aren't enough engineers willing to write parens.

Goldberry implication

Devcards proves corpus-as-development-environment works. The invariants the CLJS stack enforces by construction can be extracted, cataloged, and verified post-hoc in other paradigms. That's the wedge.

Related CLJS tools

  • shadow-cljs manifest-based code splitting
  • Helix: React wrapper preserving CLJS invariants
  • Reagent ratoms: closure-stale-closure defect class structurally cannot exist
  • Reframe-10x: event/coeffect/effect flow inspectable as corpus
  • Electric Clojure: bombadil:goldberry split collapses; same form executes server and client

Accessibility Testing Tools

Static analysis coverage

tool coverage best for
axe-core ~57% continuous violation detection
Pa11y 25-40% CI/CD gate blocking
Lighthouse 30-40% dev audit in browser
Storybook ~57% component-level snapshot

What static analysis catches

  • Missing attributes (alt, labels, roles)
  • Color contrast violations
  • Heading hierarchy breaks
  • ARIA attribute presence/syntax

What static analysis cannot catch

  • Relevance of alternative text
  • Complex dynamic interactions
  • Logical form field grouping
  • Contextual accessibility decisions

Behavioral testing

tool contract verification best for
Testing Library yes (via failure) contract enforcement in tests
Playwright ARIA yes (via snapshots) structural invariant validation

Testing Library insight

Native HTML elements have implicit accessibility contracts. A native <button> has keyboard support, tabindex, screen reader role, hover/pressed states by default. A <div role="button"> lacks these.

Tests using getByRole() fail if an element lacks accessible name or role, forcing developers to fix accessibility before the test passes. This is behavioral contract validation.

Playwright 2026

ARIA snapshot testing captures page as accessibility hierarchy. When snapshot changes unexpectedly, test fails. Validates structural invariants without vision models.

Design Systems

Documentation approaches

system behavioral specs keyboard contract a11y contract
Material semi-formal implicit narrative only
Carbon a11y-centric formal WCAG 2.1 AA
Polaris patterns formalized built-in + ARIA
Primer prose prose rules WCAG 2.1 AA
Atlassian templates guaranteed per-component

What design systems specify

  • Keyboard navigation behavior
  • WCAG 2.1 AA compliance
  • Component state matrices
  • Multi-component interaction patterns
  • Content rules

What design systems leave implicit

  • Animation timing
  • State transition sequences
  • Error recovery
  • Responsive breakpoint rules
  • Platform adaptation

Key finding

No design system enforces behavioral contracts like formal methods. Accessibility is the most formalized (WCAG criteria are testable). Manual testing still required for interaction feel and edge cases.

1990s HCI Formal Methods (The Graveyard)

Why this matters

The 1990s and early 2000s HCI community tried repeatedly to formally specify UI behavior. The failure wasn't intellectual - it was about adoption barriers, tooling gaps, and academic-industry disconnect.

Key formalisms attempted

Z Notation (Oxford/York)

  • Mathematical set theory for dialog specifications
  • 200 lines to specify a button's behavior
  • No path from spec to running code
  • Correct invariant notation, unusable ergonomics

VDM (Vienna Development Method)

  • Applied to safety-critical UI (avionics, ATC)
  • State machines as core abstraction
  • Rare skill set, niche tools

Petri Nets

  • Place-transition graphs for concurrent events
  • Explosion problem: 10 actions x 10 states = unreadable diagrams
  • Correct about concurrency; wrong abstraction boundary

CSP (Communicating Sequential Processes)

  • UI components as processes with channels
  • Close to modern component architecture
  • No code generation; alien notation

UAN (User Action Notation)

  • Structured test scenarios
  • Artifact-as-spec approach
  • Lost to prototyping tools

CTT (ConcurTaskTrees)

  • Hierarchical task decomposition
  • Temporal sequencing
  • Tool existed (CTTE); adoption stayed academic

Interactors (York/Glasgow)

  • Object-oriented abstract UI components
  • Closest to modern component frameworks
  • Two decades too early

Why they failed

  1. Notation barrier: domain experts couldn't read specs
  2. Code generation: all or nothing; never good enough
  3. Completeness tax: specs had to be complete before testing
  4. Hiring barrier: required specialists
  5. Tooling friction: specs lived in separate tools
  6. Technology moved faster than specs could keep up

What they got RIGHT

  1. Invariants as primary abstraction
  2. Separation of concerns (Seeheim model)
  3. Falsifiability as requirement
  4. Temporal reasoning for async
  5. Platform independence

The specification gap

The distance between "what the spec said" and "what shipped." Specs became historical documentation, not contracts. Goldberry avoids this by building corpus FROM code (mining GitHub), not separately from code.

Goldberry inheritance

  • Invariants as core abstraction
  • Minimal notation (YAML + prose)
  • Uses git/GitHub (existing tools)
  • Defects-as-evidence (not code generation)
  • Phases with explicit gates (no completeness tax)
  • Readable by any senior engineer (no hiring barrier)

The Unified Contract Problem

Three projects that tried "single place"

Closure (Google, 2009-present)

JS compiler + linter + type checker + dependency manager + module system + minifier. Used for Gmail, Maps, Docs. Single source-of-truth because Closure forces it. Didn't conquer outside Google: ergonomic cost (JSDoc annotations uglier than TypeScript).

Bazel + rulesnodejs

Extends contract to build, deploy, test, depend. Hermetic, reproducible. Goldberry-shaped at every layer. Adoption: Google, parts of Meta/Stripe. Cost: ergonomics.

Phoenix + LiveView (Elixir)

Closest single-place stack. Same Elixir code generates server logic, view, client interaction contract, deployment artifact. Invariant claim: there is no separate frontend. Reaches superset of most concerns except CSS-as-invariant.

Why unified-contract keeps failing

  1. Conway's law for tooling: lint/build/test/deploy from different communities, each owns a tool
  2. Cost of shared formalism: TypeScript ~10 lines, Coq ~300 lines
  3. Web is adversarial at UA boundary: browser updates without you
  4. CSS is hostile to formalization: cascade is global, specificity non-monotonic, :has() defeats static analysis
  5. Different formalisms needed per layer

What actually works: shared schema across tools

SARIF (Static Analysis Results Interchange Format) lets ESLint + Stylelint + axe + TypeScript + Lighthouse emit findings into common JSON shape. Not formal verification but closest to unified contract.

Goldberry extension: SARIF-like schema where each finding cites the invariant (with :source: and :cluster: properties). Lint plugins, build assertions, runtime checkers reference the same catalog.

Goldberry's Position

What no existing tool does

The matrix of (invariant x paradigm x era x refutation source).

What Devcards/CLJS gets closest to

Corpus-as-development-environment with build-verified contracts. Failure mode: must be designed in from day one (like FoundationDB DST). Can't retrofit onto React codebase.

Goldberry's move

Extract invariants the CLJS stack enforces by construction, catalog them, let other paradigms verify against the catalog post-hoc. The catalog doesn't replace tools; it gives them shared vocabulary.

Stretch goal (P7+)

Corpus-as-rendered-UI: each DEFECT-NNNN renders as a card showing broken version, fixed version, and assertion that distinguishes them. Hauman's idea is the right ancestor.

Sources

State Machines

  • XState: https://stately.ai/
  • Harel 1987: "Statecharts: A Visual Formalism for Complex Systems"

Historical

  • PaternĂ², "Model-Based Design and Evaluation" (2000)
  • Dix et al., "Developing User-Centred Requirements" (1989)
  • Myers et al., "UIMS retrospective" (1992)

Author: goldberry research

jwalsh@nexus

Last Updated: 2026-05-17 23:10:42

build: 2026-05-20 03:36 | sha: 12ce5fe