Related Work: UI Invariant Specification Landscape
Table of Contents
- Purpose
- Standards and Specifications
- Component Libraries
- State Machine Approaches
- Accessibility Testing Tools
- Design Systems
- 1990s HCI Formal Methods (The Graveyard)
- The Unified Contract Problem
- Goldberry's Position
- Sources
Purpose
Survey of existing approaches to UI invariant specification, behavioral contracts, and formal methods for interactive systems. Organized by: what they cover, what they miss, and what goldberry learns from each.
Standards and Specifications
WCAG 2.1 / 2.2
The Web Content Accessibility Guidelines are the closest thing to codified UI invariants that achieved industry adoption.
Coverage by goldberry cluster
| cluster | WCAG coverage | quality | notes |
|---|---|---|---|
| a11ycontract | Perceivable + Robust | strong | WCAG's primary domain |
| focusstate | Operable 2.1, 2.4 | good | 2.2 added focus visibility criteria |
| devicecontract | Operable 2.5 | partial | gesture simplification only |
| temporality | Operable 2.2 | weak | time limits only; excludes races |
| totality | Understandable 3.3 | partial | error prevention for high-stakes |
| eventgraph | none | - | outside WCAG scope |
| lifecycle | none | - | framework-era specific |
| layoutcontract | Perceivable 1.3.1 | weak | source order only |
| boundaries | Perceivable 1.4.10 | weak | reflow only |
| securitycontract | Understandable 3.3.8 | tangential | authentication only |
Key invariants WCAG codifies
- Accessibility tree invariant: every interactive element has programmatically determinable name, role, and state
- Focus visibility invariant: keyboard users can always identify which element receives next keystroke
- Keyboard operable invariant: all pointer functions available via keyboard
- Time limit adjustability: user-facing time limits can be disabled or extended 10x
- Error prevention (high-stakes): reversibility, validation, or confirmation before finalization
What WCAG does NOT cover
- Race conditions and async ordering
- Hydration mismatch
- Frame budget and jank
- Event graph correctness (capture/bubble)
- Layout correctness (float clearing, grid alignment)
- Backend-side constraints
- Behavioral correctness beyond accessibility tree
Goldberry implication
a11ycontract cluster has WCAG as authoritative reference. Other clusters must cite framework docs, issue trackers, and observed defects since WCAG doesn't cover them.
ARIA Authoring Practices Guide (APG)
W3C's reference implementations for accessible widget patterns.
Structure per pattern
Each APG pattern documents:
- Keyboard interactions (many marked "optional")
- Focus management rules
- Required ARIA roles and attributes
- Behavioral requirements (mutual exclusivity, modal semantics)
Behavioral encoding mechanisms
- Explicit prescriptions: "Tab and Shift+Tab do not move focus outside the dialog"
- Implicit constraints in examples
- ARIA semantics as behavioral specification (aria-modal="true")
- "Not applicable" as silent specification (alerts must NOT consume keyboard input)
Gaps APG leaves implicit
- Animation timing and easing
- Focus restoration after async operations
- Autocomplete filtering algorithms
- Nesting and composition rules
- Dynamic content loading strategies
Goldberry implication
APG strongly specifies roles and attributes; weakly specifies timing, composition, and dynamic content. Goldberry fills the behavioral gaps.
Component Libraries
Radix UI
Uncontrolled-first architecture with WAI-ARIA pattern compliance.
Approach
- Granular part access for customization
- Roving tabindex for complex components
- Behavior wiring handled internally
- Explicit ARIA synthesis
Testing
- Component-specific accessibility assertions
- Multi-browser focus/keyboard validation
- Jest-based with custom a11y utilities
Headless UI
Render props exposing focus state; Tailwind's unstyled components.
Approach
- data-focus and activeOption for tracking
- Modal focus trap with scroll locking and inert
- Framework variants (React, Vue) with contract parity
Testing
- Focus trap and restoration tests
- ARIA pattern validation
- Interaction module helpers
React Aria (Adobe)
Behavior hooks encapsulating focus, keyboard, ARIA management.
Approach
- useDialog, useButton, useMenu etc.
- Developers don't need ARIA knowledge
- Portable across frameworks
- TypeScript contracts
Testing
- Automated jest-axe integration
- Screen reader testing (NVDA, JAWS, VoiceOver)
- Complete keyboard flow tests
a11y-dialog (Kitty Giraudel)
Minimal focus trap for modal dialogs.
Approach
- Single invariant: focus cannot escape without closing
- Framework agnostic
- Lightweight, no dependencies
Key debate
Focus trap strictness is a spectrum. GitHub issues document edge cases (datepicker within dialog) where strict trapping breaks UX.
Reach UI
forwardRef on every component; developer responsibility model.
Approach
- Components behave like native HTML elements
- Warns when focus management incomplete
- as prop for semantic customization
Philosophy
"In most cases, developers don't need to know anything about ARIA to use Reach UI" - accessibility built in, not bolted on.
State Machine Approaches
XState and Harel Statecharts
David Khourshid's work: UI bugs as state machine violations.
Core thesis
- "States that should be unreachable but are"
- "Transitions that should fire but don't"
- Multiple conflicting state representations (boolean flags)
Harel statecharts (1987)
- Hierarchy: nested states with OR semantics
- Orthogonality: parallel regions with AND semantics
- Broadcast: events propagate to listening regions
- Guards: preconditions on transitions
- Actions: postconditions (entry/exit/transition)
XState approach
- Explicit state enumeration
- Guards as preconditions
- Actions as postconditions
- TypeScript integration: impossible states as type errors
- Model-based testing with state/transition coverage
What statecharts don't model
- Layout and visual geometry
- Accessibility semantics
- Timing and performance
- Continuous values
- Rendering and DOM state
Goldberry cluster mapping
| statechart concept | goldberry cluster | example |
|---|---|---|
| state reachability | lifecycle | mount/unmount cleanup |
| guard precondition | a11ycontract | modal accept enabled if form valid |
| transition coupling | temporality | race resolution |
| orthogonal regions | eventgraph | capture/bubble independence |
| forbidden transition | focusstate | cannot leave modal without focus restore |
Devcards + ClojureScript Stack
Bruce Hauman, ~2015. The development environment as UI for a corpus of UI states.
Core thesis
A component is incompletely specified by its source code. Its full specification is the cartesian product of (component x props x state x context). Render that product as cards, and the contract becomes inspectable, sharable, version-controlled.
Cards ARE the test cases ARE the documentation ARE the demo ARE the regression suite. Same artifact, four uses.
The stack (contract enforcement layers)
| layer | what it does | goldberry relevance |
|---|---|---|
| ClojureScript | immutable by default | forbids temporality.raceresolution |
| Closure Compiler (adv) | tree shake + dead code elimination | unused contract paths removed at build |
| Code splitting | per-route, per-card boundaries | contract surface bounded by build |
| Figwheel / shadow-cljs | hot reload preserving state | lifecycle.cleanup visible at dev time |
| re-frame coeffects | declared side effects | totality.terminalstate made explicit |
| Devcards | corpus-as-rendered-UI | the catalog IS the artifact |
Why it matters
Five layers of contract enforcement, none framed as "formal methods" but collectively producing the strongest UI-correctness story in the wild. Each layer assumes previous layer's invariants hold.
Why it didn't conquer
ClojureScript's adoption ceiling is a hiring-pool problem. The stack works. There aren't enough engineers willing to write parens.
Goldberry implication
Devcards proves corpus-as-development-environment works. The invariants the CLJS stack enforces by construction can be extracted, cataloged, and verified post-hoc in other paradigms. That's the wedge.
Related CLJS tools
- shadow-cljs manifest-based code splitting
- Helix: React wrapper preserving CLJS invariants
- Reagent ratoms: closure-stale-closure defect class structurally cannot exist
- Reframe-10x: event/coeffect/effect flow inspectable as corpus
- Electric Clojure: bombadil:goldberry split collapses; same form executes server and client
Accessibility Testing Tools
Static analysis coverage
| tool | coverage | best for |
|---|---|---|
| axe-core | ~57% | continuous violation detection |
| Pa11y | 25-40% | CI/CD gate blocking |
| Lighthouse | 30-40% | dev audit in browser |
| Storybook | ~57% | component-level snapshot |
What static analysis catches
- Missing attributes (alt, labels, roles)
- Color contrast violations
- Heading hierarchy breaks
- ARIA attribute presence/syntax
What static analysis cannot catch
- Relevance of alternative text
- Complex dynamic interactions
- Logical form field grouping
- Contextual accessibility decisions
Behavioral testing
| tool | contract verification | best for |
|---|---|---|
| Testing Library | yes (via failure) | contract enforcement in tests |
| Playwright ARIA | yes (via snapshots) | structural invariant validation |
Testing Library insight
Native HTML elements have implicit accessibility contracts. A native <button> has keyboard support, tabindex, screen reader role, hover/pressed states by default. A <div role="button"> lacks these.
Tests using getByRole() fail if an element lacks accessible name or role, forcing developers to fix accessibility before the test passes. This is behavioral contract validation.
Playwright 2026
ARIA snapshot testing captures page as accessibility hierarchy. When snapshot changes unexpectedly, test fails. Validates structural invariants without vision models.
Design Systems
Documentation approaches
| system | behavioral specs | keyboard contract | a11y contract |
|---|---|---|---|
| Material | semi-formal | implicit | narrative only |
| Carbon | a11y-centric | formal | WCAG 2.1 AA |
| Polaris | patterns | formalized | built-in + ARIA |
| Primer | prose | prose rules | WCAG 2.1 AA |
| Atlassian | templates | guaranteed | per-component |
What design systems specify
- Keyboard navigation behavior
- WCAG 2.1 AA compliance
- Component state matrices
- Multi-component interaction patterns
- Content rules
What design systems leave implicit
- Animation timing
- State transition sequences
- Error recovery
- Responsive breakpoint rules
- Platform adaptation
Key finding
No design system enforces behavioral contracts like formal methods. Accessibility is the most formalized (WCAG criteria are testable). Manual testing still required for interaction feel and edge cases.
1990s HCI Formal Methods (The Graveyard)
Why this matters
The 1990s and early 2000s HCI community tried repeatedly to formally specify UI behavior. The failure wasn't intellectual - it was about adoption barriers, tooling gaps, and academic-industry disconnect.
Key formalisms attempted
Z Notation (Oxford/York)
- Mathematical set theory for dialog specifications
- 200 lines to specify a button's behavior
- No path from spec to running code
- Correct invariant notation, unusable ergonomics
VDM (Vienna Development Method)
- Applied to safety-critical UI (avionics, ATC)
- State machines as core abstraction
- Rare skill set, niche tools
Petri Nets
- Place-transition graphs for concurrent events
- Explosion problem: 10 actions x 10 states = unreadable diagrams
- Correct about concurrency; wrong abstraction boundary
CSP (Communicating Sequential Processes)
- UI components as processes with channels
- Close to modern component architecture
- No code generation; alien notation
UAN (User Action Notation)
- Structured test scenarios
- Artifact-as-spec approach
- Lost to prototyping tools
CTT (ConcurTaskTrees)
- Hierarchical task decomposition
- Temporal sequencing
- Tool existed (CTTE); adoption stayed academic
Interactors (York/Glasgow)
- Object-oriented abstract UI components
- Closest to modern component frameworks
- Two decades too early
Why they failed
- Notation barrier: domain experts couldn't read specs
- Code generation: all or nothing; never good enough
- Completeness tax: specs had to be complete before testing
- Hiring barrier: required specialists
- Tooling friction: specs lived in separate tools
- Technology moved faster than specs could keep up
What they got RIGHT
- Invariants as primary abstraction
- Separation of concerns (Seeheim model)
- Falsifiability as requirement
- Temporal reasoning for async
- Platform independence
The specification gap
The distance between "what the spec said" and "what shipped." Specs became historical documentation, not contracts. Goldberry avoids this by building corpus FROM code (mining GitHub), not separately from code.
Goldberry inheritance
- Invariants as core abstraction
- Minimal notation (YAML + prose)
- Uses git/GitHub (existing tools)
- Defects-as-evidence (not code generation)
- Phases with explicit gates (no completeness tax)
- Readable by any senior engineer (no hiring barrier)
The Unified Contract Problem
Three projects that tried "single place"
Closure (Google, 2009-present)
JS compiler + linter + type checker + dependency manager + module system + minifier. Used for Gmail, Maps, Docs. Single source-of-truth because Closure forces it. Didn't conquer outside Google: ergonomic cost (JSDoc annotations uglier than TypeScript).
Bazel + rulesnodejs
Extends contract to build, deploy, test, depend. Hermetic, reproducible. Goldberry-shaped at every layer. Adoption: Google, parts of Meta/Stripe. Cost: ergonomics.
Phoenix + LiveView (Elixir)
Closest single-place stack. Same Elixir code generates server logic, view, client interaction contract, deployment artifact. Invariant claim: there is no separate frontend. Reaches superset of most concerns except CSS-as-invariant.
Why unified-contract keeps failing
- Conway's law for tooling: lint/build/test/deploy from different communities, each owns a tool
- Cost of shared formalism: TypeScript ~10 lines, Coq ~300 lines
- Web is adversarial at UA boundary: browser updates without you
- CSS is hostile to formalization: cascade is global, specificity non-monotonic, :has() defeats static analysis
- Different formalisms needed per layer
What actually works: shared schema across tools
SARIF (Static Analysis Results Interchange Format) lets ESLint + Stylelint + axe + TypeScript + Lighthouse emit findings into common JSON shape. Not formal verification but closest to unified contract.
Goldberry extension: SARIF-like schema where each finding cites the invariant (with :source: and :cluster: properties). Lint plugins, build assertions, runtime checkers reference the same catalog.
Goldberry's Position
What no existing tool does
The matrix of (invariant x paradigm x era x refutation source).
What Devcards/CLJS gets closest to
Corpus-as-development-environment with build-verified contracts. Failure mode: must be designed in from day one (like FoundationDB DST). Can't retrofit onto React codebase.
Goldberry's move
Extract invariants the CLJS stack enforces by construction, catalog them, let other paradigms verify against the catalog post-hoc. The catalog doesn't replace tools; it gives them shared vocabulary.
Stretch goal (P7+)
Corpus-as-rendered-UI: each DEFECT-NNNN renders as a card showing broken version, fixed version, and assertion that distinguishes them. Hauman's idea is the right ancestor.
Sources
Standards
- WCAG 2.2: https://www.w3.org/TR/WCAG22/
- ARIA APG: https://www.w3.org/WAI/ARIA/apg/
Component Libraries
- Radix: https://www.radix-ui.com/
- Headless UI: https://headlessui.com/
- React Aria: https://react-spectrum.adobe.com/react-aria/
- a11y-dialog: https://github.com/KittyGiraudel/a11y-dialog
State Machines
- XState: https://stately.ai/
- Harel 1987: "Statecharts: A Visual Formalism for Complex Systems"
ClojureScript
- Devcards: https://github.com/bhauman/devcards
- re-frame: https://github.com/day8/re-frame
- shadow-cljs: https://github.com/thheller/shadow-cljs
Testing
- Testing Library: https://testing-library.com/
- Playwright: https://playwright.dev/
- axe-core: https://www.deque.com/axe/
Design Systems
- Material Design: https://m3.material.io/
- Carbon: https://carbondesignsystem.com/
- Polaris: https://polaris.shopify.com/
- Primer: https://primer.style/
Historical
- PaternĂ², "Model-Based Design and Evaluation" (2000)
- Dix et al., "Developing User-Centred Requirements" (1989)
- Myers et al., "UIMS retrospective" (1992)
