Reversible Pipeline Transforms
Intellectual lineage of stacked, composable, invertible operations across visual programming, stack machines, mathematics, and information theory

1. Thesis
2. The Pipeline in One Example
3. The Taxonomy: Injections, Surjections, Bijections
- 3.1. Mapping to the Tool
- 3.2. Lossy Normalization and Confusability
4. Structural Properties Under Composition
5. Visual and Dataflow Programming / Stack Machines
6. Functional Composition
7. Mathematics and Type Theory
8. Relational Algebra and Information Theory
9. Reversible Computing
10. Data Pipeline Tools
11. Topology, Geometry, and the Covering Space Analogy
12. Canonical Examples: Reversible Ciphers and Irreversible Projections
13. Design Implications for the Pipeline Tool
- 13.1. Enforcement Spectrum: Who Guarantees the Inverse?
14. Verification Audit
15. References

Companion to the transform pipeline tool.

1. Thesis

A pipeline of composable transforms – where each step takes a value, produces a value, and intermediate results are visible at every arrow – is one of the oldest computational ideas. Clojure's threading macro, Unix pipes, Forth words, and HP RPN stacks are all the same abstraction wearing different syntactic clothing.

The interesting design question is not composition. It is invertibility. Some transforms are bijections: apply base64/encode, then base64/decode, and you recover the original. Others are projections: apply sha256, and the preimage is gone. The pipeline tool's job is to know which is which, propagate that information through the chain, and surface it visually at every step.

The pipeline problem has roots in stack machines, category theory, and information geometry.

2. The Pipeline in One Example

One pipeline. Four steps. The reversibility indicator changes color at step 3.

(-> "alice@example.com"
    rot13              ;; "nyvpr@rknzcyr.pbz"  ← involution (green)
    base64/encode      ;; "bnlwckBya3..."      ← bijection (green)
    identity           ;; "bnlwckBya3..."      ← id (∀ x, id x = x)
    sort               ;; "...BLYbkpw"         ← surjection (RED)
    reverse)           ;; "wpkBYL..."          ← involution (green)

Steps 1, 2, and 4 are reversible. Step 3 (sort) destroys the character ordering – the pipeline goes red and stays red. Click ⇌ reverse pipeline and the tool flips each step's direction and reverses the order. But step 3 has no inverse, so the round-trip fails there.

Four kinds of transform, three verbs:

Kind	Property	Verb	Color
Identity	`f(x) = x`	apply	green
Involution	`f(f(x)) = x`	apply	green
Bijection	`decode(encode(x)) = x`	encode / decode	green
Surjection	no inverse	apply	red

The rest of this document explains why.

3. The Taxonomy: Injections, Surjections, Bijections

Every function between sets falls into one of four categories. Lean 4 (Mathlib) provides the type-level definitions.

Lean 4	Function type	Property
`Function.Involutive`	Involution	`f(f(x)) = x` – the function is its own inverse
`Function.Bijective`	Bijection	both injective and surjective – a perfect pairing
`Function.Injective`	Injection	distinct inputs → distinct outputs (left-invertible)
`Function.Surjective`	Surjection	every output has a preimage (right-invertible, lossy)

Figure 1: Function-type lattice for pipeline operations

An involution is a bijection that is its own inverse: reverse, rot13, atbash, jump5. Apply it twice and you recover the original. There is no separate "encode" and "decode" – just one function.

A bijection with distinct forward and backward functions: base64, hex, A1Z26. The forward function (encode) and backward function (decode) are different, but their composition is the identity.

An injection maps distinct inputs to distinct outputs, but may not cover the entire codomain. gzip compress is injective: no two byte strings compress to the same output. But not every byte string is valid gzip output, so gzip decompress is partial – defined only on the image of compress.

A surjection maps onto the entire codomain, but multiple inputs may map to the same output. sort, upper, lower – information is destroyed.

3.1. Mapping to the Tool

The transform tool maps these categories to registry flags and UI verbs:

Function type	Tool flag	Verb	Color
Identity	--	apply	green
Involution	`:involution? true`, `:fn`	apply	green
Bijection	`:inverse? true`, `:encode` / `:decode`	encode / decode	green
Injection	(not yet implemented)	--	amber
Surjection	`:inverse? false`	apply	red

Involutions use a single :fn field – the direction dropdown is irrelevant. Bijections use separate :encode and :decode functions. Surjections offer only "apply" – there is no decode.

Note: this document uses "surjection" informally to mean "non-injective, information-destroying." The rebuild specification provides formal domain predicates and clarifies where the practitioner's vocabulary diverges from the mathematical definitions.

3.2. Lossy Normalization and Confusability

Some surjections destroy information that humans do not miss. upper maps "Hello" and "HELLO" to the same output; a reader considers them the same word. morse maps "FLAG{r3v3rs1bl3}" to a sequence where braces become ?; a CTF player reconstructs the braces from context. Formally these are non-injective: distinct inputs produce identical outputs. In Unicode's terminology, they are confusable (UTS #39).

This gap between formal distinctness and human equivalence appears throughout text processing:

Operation	What is lost	Human perception
`upper` / `lower`	case	same word
`morse` encode	characters outside ITU alphabet	recognizable pattern
Unicode NFKC	compatibility decomposition	same glyph
Accent stripping	diacritics	same base letter
Leet decode (`from-leet`)	substitution ambiguity	readable word
Backspace rendering (`^H`)	erased characters	final visible text
Homoglyph collapse	distinct code points (Cyrillic а vs Latin a)	identical appearance

Unicode normalization formalizes four levels of this collapse: NFC, NFD, NFKC, NFKD. Each is a surjection; each preserves a different equivalence class. NFKC is the most aggressive, collapsing compatibility characters (ligatures, width variants) into their canonical forms.

The pipeline tool does not normalize Unicode. Two strings that render identically but differ in code points (e.g. e + combining accent vs precomposed e) produce different hex, base64, and char-code outputs. This is correct: the tool operates on code points, not glyphs. Confusability is the human's problem, not the pipeline's.

The DOS Alt-255 trick (CP437 byte 0xFF maps to U+00A0 NBSP, visually identical to regular space 0x20) and the ^H backspace convention are historical examples. The Jargon File gives: "Be nice to this fool^H^H^H^Hgentleman, he's visiting from corporate HQ." On printing terminals, backspace keystrokes echoed as ^H, making the erased text visible. The stored bytes and the intended display disagree; a formal system sees the full sequence, a human reads only the correction.

4. Structural Properties Under Composition

The taxonomy above classifies transforms by invertibility. A cross-cutting property is whether a transform respects the structure of its input under composition.

4.1. Monoid Homomorphisms: Compositional Transforms

Strings form a monoid under concatenation (str in Clojure, ++ in Haskell, cat in Unix) with the empty string as identity. A transform is a monoid homomorphism when it commutes with concatenation:

upper("foo") ++ upper("bar")  =  upper("foo" ++ "bar")    ✓ compositional
sort("foo")  ++ sort("bar")   ≠  sort("foo"  ++ "bar")    ✗ global

upper processes each character independently – it is map toUpper on a list. Character at position i in the output depends only on character at position i in the input.

sort needs to see the whole string before producing output – every output position depends on every input character.

;; upper respects concatenation
(= (str/upper-case (str "Hello" " World"))
   (str (str/upper-case "Hello") (str/upper-case " World")))

;; sort does NOT respect concatenation
(= (apply str (sort (str "ba" "ab")))        ;; "aabb"
   (str (apply str (sort "ba"))
        (apply str (sort "ab"))))             ;; "abab"

A monoid homomorphism can be streamed, chunked, and parallelized – the behavior on parts tells you the behavior on the whole. A non-homomorphism requires global visibility.

4.2. Fibers and Information Loss

Not all surjections lose the same amount of information. The fiber of a function over an output value is the set of all inputs that map to it.

Transform	Loses	Fiber size	Homomorphism?	Has section?
`upper`	case bits (1 bit per alpha char)	2^k	yes	yes (split epimorphism)
`lower`	case bits	2^k	yes	yes
`sort`	character ordering (permutation)	n!/∏(freq!)	no	no

upper has a section: lower is a right inverse on the image of upper. If the input was all-lowercase, lower(upper(x)) = x. Formally, upper is a split epimorphism – a surjection with a one-sided inverse.

sort has no section. There is no function g such that sort(g(x)) = x for all sorted strings, because there is no principled way to pick one permutation from each fiber.

4.3. Nomenclature

Term	Meaning	Examples
Monoid homomorphism	`f(a ++ b) = f(a) ++ f(b)`	upper, lower, rot13
Anti-homomorphism	`f(a ++ b) = f(b) ++ f(a)`	reverse
Non-homomorphism	`f(a ++ b) ≠ f(a) ++ f(b)`	sort, sha256
Split epimorphism	surjection with a section	upper (section: lower)
Fiber	preimage of one output value	`upper("HELLO")` has 2^5 fibers
Involution	`f(f(x)) = x`	rot13, reverse, atbash, jump5
Idempotent	`f(f(x)) = f(x)`	sort, upper, lower

5. Visual and Dataflow Programming / Stack Machines

5.1. Yahoo Pipes (2007–2015)

Yahoo Pipes was the first mainstream consumer-facing dataflow editor. Users connected rectangular boxes – RSS feed sources, filters, regex transforms, sort and truncate steps – with drawn wires. The output of each box was visible by clicking it. The model was explicit: boxes are functions, wires are composition, intermediate state is inspectable.

Pipes was surjective by default. A filter box discards items that do not match; the discarded items are gone. There was no unfilter operator and no reversibility indicator. Users did not miss it because the domain (RSS aggregation) had no notion of round-tripping. The pipeline was purely feedforward.

5.2. Scratch and Blockly

MIT Scratch uses snap-together blocks whose shapes encode type compatibility: a rounded slot accepts a reporter block, a C-shaped slot accepts a sequence. This is a visual type system. The shape constraint prevents certain composition errors structurally, before evaluation.

Figure 2: Scratch 1.4 on FreeBSD – snap-together blocks as a visual type system

Blockly (Google, 2012) generalized Scratch's block grammar into a toolkit for building domain-specific visual languages. Both systems are one-directional: a block that computes a value does not carry its inverse. There is no standard mechanism for "undo a block's effect on the data stream."

5.3. Max/MSP and Pure Data

Max/MSP (Cycling '74, 1980s–present) and Pure Data (Miller Puckette) (Puckette 1996) model audio and multimedia signal flow as a directed graph of objects connected by patch cables. Every object is a function from inlet values to outlet values, evaluated on a scheduler tick.

The signal-flow metaphor is close to the pipeline tool's model. Key difference: audio transforms are typically not invertible in practice even when mathematically invertible. A low-pass filter has a transfer function with a computable inverse, but no one patches that inverse in because the use case does not require round-tripping the audio signal.

5.4. LabVIEW

LabVIEW (National Instruments, 1986; acquired by Emerson Electric, 2023) introduced dataflow programming for instrument control. A VI (Virtual Instrument) executes when sufficient data is available on its required inputs. The language is inherently parallel: nodes without data dependencies execute concurrently.

LabVIEW's contribution to the lineage is the notion that a node's execution model is determined by its data dependencies, not by statement order. The pipeline tool is simpler (sequential, not concurrent) but inherits the idea that the graph topology is the program.

5.5. Node-RED

Node-RED (Nick O'Leary and Dave Conway-Jones at IBM, 2013; now OpenJS Foundation) wires together event-driven JavaScript nodes over HTTP, MQTT, and other message buses. A node receives a message object, may mutate or replace its payload field, and emits a new message. The mutation-in-place model differs from a pure functional pipeline: nodes can be stateful.

For reversibility purposes, stateful nodes are problematic. If a node's output depends on accumulated state (a counter, a cache hit), the inverse function would need to reconstruct that state. Node-RED does not model this; neither does any deployed version of such a tool the author knows of.

5.6. Unreal Blueprints

Unreal Engine's Blueprint visual scripting system is a dataflow graph for game logic. It handles both control flow (execution pins, shown as white arrows) and data flow (typed value pins, shown in colors by type). The dual-pin model is an explicit separation of concerns: "when to run" versus "what to compute."

The pipeline tool collapses this duality: there is no separate control flow. Every step runs on each input. Blueprint's value-typed pins are the closer analogy – a string output pin can only wire to a string input pin, enforcing type compatibility before execution.

5.7. Forth

Charles Moore's Forth (Moore 1974) compiles user-defined words into sequences of machine operations. A Forth program is a sequence of words applied to a shared stack. The composition model is concatenation: putting two word sequences next to each other composes them. There is no function application syntax because juxtaposition is application.

: celsius-to-fahrenheit  ( n -- n )   9 * 5 / 32 + ;
: fahrenheit-to-celsius  ( n -- n )   32 - 5 * 9 / ;

100 celsius-to-fahrenheit .    \ prints 212
212 fahrenheit-to-celsius .    \ prints 100

Forth words are explicit inverses when the programmer writes them. The language offers no built-in mechanism to derive or verify the inverse. The stack comment notation ( n -- n ) documents the stack effect; a type system for reversibility would need to track whether f and g satisfy g . f = id.

5.8. HP-48 RPN Calculator

The HP-48 and its descendants implement a visual stack: the screen shows the top four or more stack levels labeled 1 through 4 (and higher). Every operation consumes arguments from the stack and pushes results. The user sees the entire stack state after each keystroke.

This is the closest consumer-device analogue to the pipeline tool's intermediate-value display. The HP-48's UNDO operation restores the stack to its state before the last command – a single-step inverse. It does not compose inverses; it only reverses the most recent operation by saving state. The information is preserved in the saved snapshot, not derived from the operation's mathematical inverse.

5.9. PostScript

PostScript (Adobe, 1982) is a stack-based page description language. Operators pop arguments and push results. A PostScript program is a sequence of tokens that transforms a graphics state. The graphics state is not invertible in PostScript's execution model: fill paints pixels and discards the path. There is no unfill. PostScript is a canonical example of a stack-machine language where most operations are irreversible in practice.

5.10. Factor and Joy

Factor (Slava Pestov, 2003) and Joy (Manfred von Thun, 1990s) are concatenative languages that make the stack model explicit in the type system. Joy's insight: a Joy program is a function from stack to stack. Composition is concatenation. Quotations (anonymous functions) are first-class stack values.

Factor adds a static stack-effect type system: every word declares its stack effect, and the compiler verifies consistency. This is one step toward a reversibility type system: a word whose stack effect is ( a -- b ) could additionally declare a companion word with effect ( b -- a ), and the compiler could verify their composition equals the identity on the domain.

No mainstream concatenative language implements this today. It remains a research target.

6. Functional Composition

6.1. Clojure Threading Macros

Figure 3: Threading macro with intermediate values at every step

Clojure's -> macro threads a value through a sequence of forms, inserting it as the first argument of each successive call:

(-> "hello world"
    str/upper-case
    (str/replace #"\s+" "-")
    (str/split #"")
    count)
;; => 11

The pipeline tool is this macro made visual. Each arrow in the UI corresponds to a -> insertion point. The intermediate value at each arrow is the return value of that position's form. The key insight: Clojure's macro is a syntactic rewrite that does not evaluate left-to-right at parse time; the visual tool evaluates eagerly and shows the result at each step.

->> threads as the last argument, appropriate for sequence-processing functions where the collection is the final parameter. The visual tool would need to model this distinction (first-arg threading vs last-arg threading) if it supports both modes.

6.2. Unix Pipes

Ken Thompson's pipe (1973) composes programs by connecting stdout of one to stdin of the next. The type system is bytes: every program reads bytes and writes bytes. Type correctness is entirely semantic and runtime-checked (or never checked at all).

cat /var/log/auth.log | grep "Failed" | awk '{print $11}' | sort | uniq -c | sort -rn

Unix pipes are the original horizontal pipeline visualization. Shell history is a sequence of transformations; the user sees only the final output unless they insert tee to bifurcate the stream. The pipeline tool's intermediate display is what adding | tee step-N.txt | at every step would give you.

Unix pipes are irreversible: grep discards non-matching lines permanently. There is no ungrep. This is the surjection problem: the function is not injective, so no left inverse exists.

6.3. Haskell Arrows and Lenses

Haskell's Arrow typeclass abstracts over computations that transform a type b to a type c. Arrows compose with >>>. An arrow with both a forward and backward direction is an ArrowLoop or, more pertinently, a lens.

A Haskell lens (van Laarhoven / profunctor formulation) (van Laarhoven 2009) is precisely a pair of functions: a getter s -> a and a setter s -> a -> s. A lens on a data structure lets you read a field and write it back. The composition of two lenses is a lens: the getter composes forward, the setter composes backward automatically.

Foster, Greenwald, Moore, Pierce, and Schmitt state the laws a lens obeys (Foster et al. 2007). A lens carries a getter get : s -> a and a putter put : s -> a -> s. It is well-behaved when two laws hold: get-put – putting back an unchanged view leaves the source intact, put s (get s) = s – and put-get – reading back a written view returns exactly what was written, get (put s a) = a. A lens is very-well-behaved when it additionally satisfies put-put, where a second write overwrites the first. The classes nest: very-well-behaved ⊐ well-behaved ⊐ partial. The encode tool's green/amber/red indicator is this lattice. Green is a total lens (an Iso, satisfying every law); amber is a partial lens whose putter is defined only on a subset; red is a chain that no longer round-trips. Recent work refines the amber boundary directly: contract lenses constrain partial lenses with a pair of predicates so they still compose (Zhang et al. 2023), and a characterization of partial well-behaved lenses shows the laws survive when put alone is partial but need a new law once get is partial too (Hashiba et al. 2025).

Lenses are the type-theoretic answer to the pipeline reversibility problem for data structure access. They are not sufficient for arbitrary byte transforms (base64, gzip, encryption) but they establish the pattern: a reversible operation is a pair of functions, and composition of pairs is defined component-wise.

The Iso type in lens libraries (an isomorphism) is the purest case: both the forward and backward functions are total and compose to identity. The encode tool's "fully reversible" indicator corresponds to a chain of Iso values. The moment a Prism (partial lens, possibly failing) or Fold (many-to-one) enters the chain, reversibility is lost.

Boomerang is a formally specified string-to-string bidirectional pipeline language (Bohannon et al. 2008). Its combinators mirror the regular operations – union, concatenation, Kleene star – and read left to right as an encoder, right to left as an update translator: the encode tool's problem stated as a type system. The backward direction is a partial inverse, total only on the subset of outputs the forward direction produces, and Boomerang's types track that subset with regular expressions rather than failing at run time.

6.4. The Magrittr and Pipe Operators

R's magrittr %>% pipe (Stefan Milton Bache, 2014), the native |> in R 4.1+, Elixir's |>, F#'s |>, and Hack's |> all implement the same threading semantics. Each inserts the left-hand value into the right-hand function call. The diversity of pipe operators across languages with no mutual influence is evidence that the pattern solves a real syntactic friction point.

Elixir's pipeline idiom is pervasive in idiomatic code:

"hello world"
|> String.upcase()
|> String.split(" ")
|> Enum.map(&String.length/1)
|> Enum.sum()

None of these languages track reversibility in their pipe type. The type of each step is the return type of that function; there is no annotation for "this function has a documented inverse."

6.5. Ramda and Transducers

Ramda (R.pipe, R.compose) implements point-free functional composition (Backus 1978) in JavaScript. Clojure transducers (Rich Hickey) (Hickey 2014) compose transformations independently of their application context: the same transducer works over sequences, channels, and observables.

Transducers are compositional but not reversible. A filtering transducer discards elements; a mapping transducer applies a function that may not be injective. The transducer model does not carry inverse information.

6.5.1. Transducers as Context-Free Composition

The standard sequence pipeline (->> coll (map f) (filter p) (take n)) builds intermediate collections at each step. Transducers eliminate this: they compose the transform independently of the source. A transducer is a function from one reducing function to another:

;; transducer: (rf → rf)
(def xf
  (comp
    (map str/upper-case)
    (filter #(> (count %) 3))
    (take 5)))

;; Apply to a sequence
(into [] xf ["hi" "hello" "world" "yo" "great" "ok" "fine" "super"])
;; => ["HELLO" "WORLD" "GREAT" "FINE" "SUPER"]

;; Same transducer, different context -- a core.async channel
(chan 10 xf)

The composition order is inverted relative to ->> threading: (comp (map f) (filter p)) applies map first, then filter. This matches the natural reading order ("map, then filter") because transducers compose inside-out – each wraps the next reducing function.

For the pipeline tool, the transducer model clarifies a key distinction:

Threading macros compose applications (value flows through functions).
Transducers compose transformations (functions combine before any value flows).

The pipeline tool works in the threading model (eager evaluation, intermediate values visible at each step). The transducer model would pre-compose the chain into a single function and then apply it – efficient but invisible: no intermediate values, no step-by-step trace. The tool deliberately chooses threading over transducers because the trace is the value.

6.5.2. jQuery Method Chaining: The OOP Threading Macro

jQuery's fluent API (John Resig, 2006) is the object-oriented counterpart of Clojure's threading macro. Each method returns this (the jQuery object), making the chain composable:

$('#message')
  .text('Hello, World!')     // set content
  .css('color', 'red')       // style it
  .show()                    // make visible
  .animate({opacity: 0.5});  // fade

This is structurally identical to:

(-> ($ "#message")
    (.text "Hello, World!")
    (.css "color" "red")
    (.show)
    (.animate #js {:opacity 0.5}))

The key difference: jQuery methods mutate and return this (stateful); Clojure threading macros produce new values (stateless). For reversibility: .show() has an inverse (.hide()), .css('color','red') is reversible if the original value is saved, but .remove() is irreversible – the element is gone from the DOM.

CSS selectors are themselves a pipeline. $('div.foo > p.bar:first-child') reads left to right as a filter chain: start with all div.foo elements, narrow to direct p.bar children, take the first. Each selector step is a surjection – it narrows the matched set.

See also: jQuery Conference Boston 2011 and jQuery Conference San Francisco 2011 for conference notes from the era when this pattern was being formalized.

6.5.3. Composition Order: map/filter/reduce Pipelines

The order of map, filter, and reduce in a pipeline is not freely exchangeable. Each has structural constraints:

Operation	Type	Reversible?	Commutes with filter?
map f	a → b (per element)	Iff f bijective	No – filter may depend on mapped value
mapcat f	a → [b] (one-to-many, flatten)	Iff f injective and boundaries recoverable	No – same as map
filter p	[a] → [a] (subset)	No (surjection)	No – order matters
reduce f	[a] → b (fold)	Rarely	N/A (terminal)
take n	[a] → [a] (prefix)	No (surjection)	No – filter changes which n
sort	[a] → [a] (reorder)	No (surjection on permutations)	Yes (sort . filter = filter . sort if p is order-independent)

The non-commutativity of filter and map is the source of pipeline bugs: (map inc (filter even? xs)) is not (filter even? (map inc xs)). The first filters then increments; the second increments then filters. They produce different results on the same input.

mapcat is the one-to-many step: it maps then concatenates, flattening one level. The flattening discards the boundaries between elements, so mapcat is reversible only when f is injective and each element's output has a recoverable extent – a fixed width, a delimiter, or a length prefix. (mapcat seq ["ab" "c"]) yields (\a \b \c) and cannot be split back into the original two strings; (mapcat #(vector % %) s) (duplicate each element) is reversible because the expansion is fixed-width. It is the sequence-level analog of an expanding codec: bijective only when the inverse can find the seams.

In the encode tool's domain (string → string transforms), every step is both map and reduce (the entire string is the single element). The commutativity question becomes: does (rot13 (base64 s)) equal (base64 (rot13 s))? In general no – the functions do not commute, and their non-commutativity is observable at every step in the trace.

6.5.4. Racket Contracts and Pre/Post Conditions

Racket's contract system (Findler and Felleisen 2002) attaches behavioral specifications to function boundaries. A contract is a first-class predicate checked at runtime:

(define/contract (divide x y)
  (-> number? (and/c number? (not/c zero?)) number?)
  (/ x y))

The contract (-> number? (and/c number? (not/c zero?)) number?) declares: first argument must be a number, second must be a non-zero number, result is a number. Blame is assigned to the caller if the precondition fails, and to the function if the postcondition fails.

For pipeline composition, contracts propagate: if step 1 guarantees its output is a hex string, and step 2 requires a hex string as input, the composition is well-typed at the contract level. A contract violation at step N blames step N-1's output.

The pipeline tool's domain restrictions (Section 10, property 3) are contracts: ipv4→int requires a dotted-quad string; hex/decode requires an even-length hex string. The tool should check these before execution and produce blame annotations: "step 3 failed because step 2 output is not valid hex."

Eiffel's Design by Contract (Meyer 1992) predates Racket's system and introduced require (precondition) and ensure (postcondition) as language keywords. The Hoare triple {P} S {Q} is the mathematical foundation: if precondition P holds before statement S executes, postcondition Q holds after.

Robby Findler (the contracts paper's first author) presented at RacketCon 2023. Contracts for protocol specification were explored at RacketCon 2022, and the relationship between defensive programming and contracts at RacketCon 2020. The RacketCon 2024 notes include a worked define/contract example (organized by Findler and held at UW).

6.5.5. Property-Based Testing and Pipeline Invariants

test.check (Draper 2013) and QuickCheck (Claessen and Hughes 2000) generate random inputs and verify properties. The pipeline tool's defining property is the round-trip:

(defspec round-trip-prop 100
  (prop/for-all [input gen/string-ascii
                 steps gen-reversible-pipeline]
    (let [forward  (thread input steps)
          encoded  (-> forward last :value)
          backward (thread encoded (reverse-pipeline steps))]
      (= input (-> backward last :value)))))

This is the universal property: for any input and any fully-reversible pipeline, threading forward then backward recovers the original. The test generator (gen-reversible-pipeline) produces random chains of bijective steps; the property asserts the round-trip.

Stronger properties per codec:

Codec	Property	Generator constraint
rot13	`(rot13 (rot13 s)) = s` (involution)	any string
base64	`(decode (encode s)) = s`	any string
hex	`(decode (encode s)) = s`	any string
xor-k	`(xor-k (xor-k s)) = s` (involution)	any string
ipv4→int	`(int→ipv4 (ipv4→int ip)) = ip`	gen-valid-ipv4
sort	`(sort s) = (sort (sort s))` (idempotent)	any string
a1z26	`(decode (encode s)) = s`	gen-alpha-only

The idempotence of sort is a weaker property than invertibility: applying it twice gives the same result, but it does not round-trip. This is the distinction between a retraction (idempotent surjection) and an isomorphism (bijection).

6.5.6. Named Parameters and Argument-Order Conventions

The ordering of named parameters in function calls is itself a composition problem. Clojure's threading macros demand that the "threaded" argument appear in a fixed position (first for ->, last for ->>). Functions that place their transformable argument in the wrong position break the threading chain:

;; Works: str/upper-case takes string as first arg
(-> "hello" str/upper-case)

;; Breaks: assoc takes map as first arg, not the value being threaded
;; Must wrap in anonymous fn or use as->
(as-> "hello" $
  (str/upper-case $)
  (assoc {} :greeting $))

as-> is the general solution: it binds the threaded value to a named symbol, allowing it to appear in any argument position. This is more powerful but less readable – the pipeline's visual clarity comes from the convention that the threaded value is always in the same position.

The encode tool sidesteps this: every codec has the signature (string → string) with the input as the sole argument. There is no argument-position ambiguity. This is a design constraint that enables the visual pipeline to work: unary functions compose cleanly. The moment a codec requires a second argument (a key for XOR, a salt for PBKDF2), the threading model must extend to carry configuration alongside the value.

This is precisely the Reader monad pattern: the pipeline threads a value while each step reads from a shared environment (the key/config). Transducers achieve the same effect differently: the configuration is closed over at transducer creation time, before composition.

7. Mathematics and Type Theory

See also: 6 | Information Theory | 9 | Topology

7.1. Group Theory: Invertibility as Structure

A group is a set G with a binary operation · satisfying closure, associativity, identity, and inverses. Every element g has a unique inverse g⁻¹ such that g · g⁻¹ = g⁻¹ · g = e.

The invertible transforms over a domain form a group under composition. The non-invertible transforms (functions that lose information) do not: there is no inverse to compose with, so the identity law fails. The pipeline tool's reversibility indicator is answering: does this operation belong to the group of invertible transforms on the data domain, or does it fall outside?

Group actions formalize how a group element acts on a set. An encoding operation that is invertible acts on byte strings as a group element acts on its domain: there is always an undo.

7.2. Godel, Escher, Bach: Isomorphisms as Theme

Hofstadter's Godel, Escher, Bach (Hofstadter 1979) is organized around isomorphisms between formal systems. Three threads are directly relevant to the pipeline model:

Godel numbering is a bijection from mathematical statements to integers. Each well-formed formula maps to a unique number; each number decodes to at most one formula. This is structurally identical to the encode tool's codecs: encode maps statements to integers, decode maps integers back. The encoding is reversible because the mapping is injective and the domain is well-defined.

A simplified Godel encoding in Clojure – each symbol maps to a prime base, and the sequence is encoded as a product of prime powers:

(def symbols {"M" 2 "I" 3 "U" 5})

(defn godel-encode [s]
  "Encode a MIU string as a Godel number (product of prime powers)."
  (->> s
       (map-indexed (fn [i ch]
                      (let [base (symbols (str ch))]
                        (long (Math/pow base (inc i))))))
       (reduce *)))

(defn godel-decode [n]
  "Decode a Godel number back to a MIU string."
  (let [primes [2 3 5]
        sym-for {2 "M" 3 "I" 5 "U"}]
    (loop [remaining n pos 1 result []]
      (if (= remaining 1) (apply str result)
        (let [[p sym] (first (for [p primes
                                   :let [exp (loop [e 0 v remaining]
                                              (if (zero? (mod v p))
                                                (recur (inc e) (/ v p)) e))]
                                   :when (= exp pos)]
                               [p (sym-for p)]))]
          (recur (long (/ remaining (Math/pow p pos)))
                 (inc pos)
                 (conj result sym)))))))

(godel-encode "MI")   ;; => 18  (2^1 * 3^2)
(godel-decode 18)     ;; => "MI"
(godel-encode "MIU")  ;; => 2250  (2^1 * 3^2 * 5^3)
(godel-decode 2250)   ;; => "MIU"

The encoding is a bijection: every MIU string maps to a unique integer, and every such integer decodes back. This is the same round-trip property as base64 or hex in the encode tool – but operating on a formal language rather than byte strings. See BugBash 2026 for Nada Amin's talk on dependent types and metaprogramming – the type-theoretic machinery that makes these encodings provably bijective.

Bach's Crab Canon (Canon Cancrizans) plays a melody simultaneously forward and backward. The musical structure is an involution: the composition is its own reverse. Hofstadter's dialogue chapter "Crab Canon" mirrors this – it reads the same in both directions. This is (reverse (reverse s)) = s performed as literary form, and it is the musical ancestor of reverse-pipeline.

The MU-puzzle asks which strings are reachable from MI under four transformation rules. Rule I (xI → xIU) is injective (appends, no information lost). Rule II (Mx → Mxx) is injective (doubles). Rule III (xIIIy → xUy) is non-injective – it replaces three characters with one, and the position of the replaced III is not recoverable from the output. Rule IV (xUUy → xy) is non-injective – the position of the deleted UU is lost. The puzzle asks: can you derive MU from MI? The answer is no. The invariant: the count of I's is never divisible by 3 under any rule, and MU requires zero I's. The puzzle is about reachability under a mix of injective and non-injective rules – exactly the pipeline model's design question.

The MIU rules as Clojure transforms – each is a function string → set-of-strings (nondeterministic, since Rule III can apply at multiple positions):

(defn rule-i [s]
  (when (.endsWith s "I") #{(str s "U")}))

(defn rule-ii [s]
  (when (.startsWith s "M") #{(str "M" (subs s 1) (subs s 1))}))

(defn rule-iii [s]
  (->> (range (- (count s) 2))
       (filter #(= "III" (subs s % (+ % 3))))
       (map #(str (subs s 0 %) "U" (subs s (+ % 3))))
       set))

(defn rule-iv [s]
  (->> (range (- (count s) 1))
       (filter #(= "UU" (subs s % (+ % 2))))
       (map #(str (subs s 0 %) (subs s (+ % 2))))
       set))

;; One step: apply all rules, collect all reachable strings
(defn miu-step [strings]
  (into strings (mapcat #(concat (rule-i %) (rule-ii %)
                                 (rule-iii %) (rule-iv %))
                        strings)))

;; Explore reachability from "MI" -- BFS over the rewrite system
(-> #{"MI"} miu-step miu-step miu-step miu-step)
;; => #{"MI" "MIU" "MIUIU" "MII" "MIIII" "MIIIIIIII" "MUII" ...}
;; "MU" never appears -- the I-count mod 3 invariant holds.
;;
;; The set is the memo: miu-step is a monotone function on
;; a lattice of string sets.  Repeated application reaches a
;; fixed point (Datalog-style bottom-up saturation).  The mod-3
;; invariant proves the fixed point excludes "MU".

Rules I and II are injective (the output uniquely determines the input). Rules III and IV are non-injective (multiple inputs can produce the same output). The miu-step function is a monotone endofunction on the lattice of string sets under ⊆: it can only grow the set, never shrink it. The threading macro (-> #{\"MI\"} miu-step miu-step ...) is a pipeline of set-valued transforms – each step widens the reachable frontier.

7.3. Conway's Look-and-Say: A Reversible Codec That Explodes

Conway's look-and-say sequence describes the previous term: count consecutive runs of each digit, then emit the count followed by the digit.

(defn look-and-say [s]
  (->> (partition-by identity s)
       (mapcat (fn [run] [(count run) (first run)]))
       (apply str)))

(defn look-and-say-decode [s]
  (->> (partition 2 s)
       (mapcat (fn [[n ch]] (repeat (Character/digit ^char n 10) ch)))
       (apply str)))

(-> "1"
    look-and-say        ;; "11"        ← bijection
    look-and-say        ;; "21"        ← bijection
    look-and-say        ;; "1211"      ← bijection
    look-and-say        ;; "111221"    ← bijection
    look-and-say)       ;; "312211"    ← bijection

;; Every step is reversible:
(-> "312211"
    look-and-say-decode ;; "111221"
    look-and-say-decode ;; "1211"
    look-and-say-decode ;; "21"
    look-and-say-decode ;; "11"
    look-and-say-decode);; "1"

The pipeline is fully reversible – every step is a bijection. But the output length grows by Conway's constant λ ≈ 1.303577… per step. After 70 iterations, a single "1" becomes a string of over 10^23 characters. This is a pipeline where reversibility is preserved but the data volume explodes – the opposite of a hash (which compresses to fixed width and loses information).

Conway proved that λ is the unique positive real root of a degree-71 polynomial. The look-and-say sequence is the pipeline tool's canonical example of a transform that is bijective yet computationally expensive to reverse at scale – not because information is lost, but because the intermediate values grow without bound.

A related sequence: the Collatz conjecture (3n+1 problem). The forward step is deterministic – (if (even? n) (/ n 2) (inc (* 3 n))) – but the reverse branches: given 16, the predecessor is either 32 (even path) or 5 (odd path). The reverse tree explodes exponentially, and whether every starting value eventually reaches 1 remains an open problem. Collatz is the pipeline model's hardest case: a simple deterministic step whose reachability is undecidable.

These three systems – MIU, look-and-say, Collatz – are all examples of iterated function systems that may or may not reach a fixed point (attractor). Kaprekar's routine always reaches 6174 (a fixed point). Collatz conjecturally always reaches 1 (an attractor). MIU never reaches "MU" (an unreachable state). Look-and-say diverges (no fixed point, unbounded growth).

Peter Alvaro's CALM theorem (Consistency As Logical Monotonicity) at BugBash 2026 formalizes this: a monotone function on a lattice is guaranteed to converge to a fixed point without coordination. The miu-step function is monotone on the lattice of string sets under ⊆. Datalog evaluation, CRDTs, and the pipeline tool's reachability analysis are all instances of the same pattern: saturate a monotone function until it stabilizes.

7.4. Fixed Points and Contraction Mappings

Iterated cos converges to the Dottie number (0.739085...) -- the unique fixed point where ~cos(x) = x. Iterated sin converges to 0. Both are contraction mappings: each application brings the value closer to the attractor regardless of starting point.

Iterated cos (10 steps – still converging toward Dottie 0.7391…):

(-> 1.0 Math/cos Math/cos Math/cos Math/cos Math/cos
         Math/cos Math/cos Math/cos Math/cos Math/cos)

Iterated sin (10 steps – converging toward 0, slower):

(-> 1.0 Math/sin Math/sin Math/sin Math/sin Math/sin
         Math/sin Math/sin Math/sin Math/sin Math/sin)

Constant-function pipeline (20× cos + format – always "0.74" regardless of input):

(-> (System/currentTimeMillis)
    double
    Math/cos Math/cos Math/cos Math/cos Math/cos
    Math/cos Math/cos Math/cos Math/cos Math/cos
    Math/cos Math/cos Math/cos Math/cos Math/cos
    Math/cos Math/cos Math/cos Math/cos Math/cos
    #(format "%.2f" %))

The last pipeline is a constant function: the attractor erases all input information before toFixed even runs. Every epoch value, every random number, every string-converted-to-double – all produce "0.74".

toFixed(n) is the fractional analog of mod N. Where mod 256 clamps integers to a byte, toFixed(2) clamps reals to two decimal places. Both are surjections that discard precision:

Operation	Domain	Discards	Analog
mod N	integers	quotient (winding number)	Moduli table
toFixed(n)	reals	digits beyond position n	decimal truncation
subs(0,k)	strings	characters beyond position k	string truncation
quantize	audio/signals	amplitude below threshold	ADC resolution

All four are in the same family: projections onto a finite-precision domain. The pipeline indicator turns red at any of them.

The RPN calculator connection: the HP-48 stack display shows exactly this convergence. Mash the COS key repeatedly and watch the stack settle to 0.7390855263. The 4-level stack IS the threading macro's intermediate values, rendered in hardware since 1990.

This is SICP §1.3.3's fixed-point procedure: keep applying f until |f(x) - x| < tolerance. Newton's method is fixed-point applied to x → x - f(x)/f'(x). The Y-combinator in lambda calculus is the same concept at a higher level: Y f = f (Y f) – the fixed point of a function on functions. See Lambda Calculus in Scheme, Register Machines, and Scheme for the SICP lineage on this site.

7.5. Category Theory: Isomorphisms and Functors

In category theory, a morphism f: A -> B is an isomorphism if there exists g: B -> A such that g ∘ f = id_A and f ∘ g = id_B. In the category Set, isomorphisms coincide with bijections.

A pipeline step is an isomorphism precisely when its transform is bijective and the inverse function is computable. Base64 encode is an isomorphism (over the domain of byte strings, mapping to base64 character strings). SHA-256 is not: it is a morphism with no right inverse.

The category of bijective functions between finite sets is a groupoid: every morphism is an isomorphism. The pipeline tool lives partly in this category (reversible steps) and partly in a larger category where non-invertible morphisms exist (hash, truncate, compress-lossy).

A functor maps objects and morphisms between categories while preserving structure. map lifts a function f: A → B to List A → List B, preserving composition and identity. Functor composition corresponds to pipeline composition. The pipeline tool is, informally, a UI for constructing a particular functor (Awodey 2010; Coecke and Kissinger 2017).

The category-theory thread here, the unitary gates of §6.4, and the bijections of §8.3 share a common structure. A dagger category (Selinger 2007) equips every morphism with an adjoint, but requires the dagger to be total – which fails for non-invertible pipeline steps. For a Set-valued pipeline, the invertible fragment forms a groupoid; the full pipeline lives in a larger category where some morphisms have no inverse.

7.6. Galois Connections

A Galois connection between two ordered sets (P, ≤) and (Q, ≤) is a pair of monotone functions f: P -> Q and g: Q -> P such that f(p) ≤ q ⟺ p ≤ g(q) for all p ∈ P, q ∈ Q.

Galois connections appear in abstract interpretation, type inference, and security lattice reasoning. They are not isomorphisms: the round-trips g(f(p)) ≥ p and f(g(q)) ≤ q hold, but with inequality, not equality. The connection captures an adjoint relationship, not a perfect inverse.

The encode pipeline's lossy operations sit in Galois connection territory: lossy compression maps a detailed representation p to a smaller one f(p), and g(f(p)) recovers something that "approximates" p but is not identical to it. Lossless compression is a true isomorphism; lossy compression is a Galois connection at best, and often just a surjection.

This is the setting of abstract interpretation (Cousot and Cousot 1977): the amber/lossy step is the abstraction f, the recovered value g(f(p)) over-approximates p, and the Galois connection is exactly the soundness condition relating concrete and abstract domains.

7.7. Lean 4 Tactics as Rewrite Operations

In Lean 4's tactic mode, rw [h] rewrites the goal using equation h, replacing one side with the other. simp applies a set of rewrite lemmas repeatedly, searching for a normal form. exact closes a goal by providing a term directly.

Tactics are goal transformations. The tactic state before and after each step is visible in the infoview – this is precisely the intermediate-value display the pipeline tool shows. The reversibility question in tactic proofs: can a tactic step be undone? In Lean's interactive mode, Ctrl-Z in the editor undoes the text of the last tactic, restoring the proof state. This is environment-level undo (snapshot rollback), not mathematical inverse.

The interesting parallel: some rewrites are symmetric (if a = b, then rw [h] and rw [← h] are inverses), and some are not (simplification may contract a term to a normal form that cannot be expanded back). The pipeline tool's reversibility annotation maps to this distinction in the rewrite calculus.

Lean 4's dependent types can express pipeline value constraints directly. Temperature in Kelvin requires T > 0; modular arithmetic operates on Fin n (the type of natural numbers less than n); integers and rationals behave differently under mod (rationals produce values in [0, n), integers produce ℤ/nℤ). These constraints are the formal version of the Racket contracts from the composition section, but enforced at compile time rather than runtime. See Formal Transformer Verification for a worked example of dimension-indexed dependent types in Rocq/Coq, and Lean 4 on FreeBSD for the local theorem-proving toolchain.

7.8. Specifying Invertibility as a Dependent Record

The taxonomy names the Lean predicates that classify a transform – Function.Involutive, Function.Bijective, Function.Injective, Function.Surjective. A predicate answers a yes/no question. The schema of an invertible transform is the dependent record that carries both directions together with the proof that they compose to the identity. Mathlib's Equiv is that record (toFun, invFun, left_inv, right_inv); the one-sided version the tool needs to run a chain backwards is smaller (de Moura and Ullrich 2021):

structure Codec (α β : Type) where
  encode    : α → β
  decode    : β → α
  roundtrip : ∀ a, decode (encode a) = a

The registry flags are the erased shadow of this record. :inverse? true asserts a Codec exists; :involution? true is the special case decode = encode (Mathlib's Function.Involutive), so the registry stores one :fn instead of two; :inverse? false asserts no Codec can be built. The schema makes that last claim precise – a Codec forces encode injective, so a surjection cannot be the encode of any codec:

theorem Codec.encode_inj (c : Codec α β) {a b}
    (h : c.encode a = c.encode b) : a = b := by
  have hd := congrArg c.decode h
  rwa [c.roundtrip, c.roundtrip] at hd

upper, lower, and sort map distinct inputs to the same output, so this proof cannot be discharged for them. That impossibility – not a flag the author sets by hand – is the type-level meaning of the red indicator.

Between green and red sits the injection – the amber tier the taxonomy marks "not yet implemented" in the tool. gzip compress is the exemplar: injective but not surjective, so decompress is partial – defined only on valid compressed data. In the schema this is a one-sided Codec: encode injective, decode a left inverse, but encode (decode b) = b only for b in the image. Restricted to the image the injection is a genuine bijection – spec/transform-codecs.lean proves this as section_on_image. The amber tier is green on the image.

Domain restriction (the third design property) is the dependent part. ipv4→int inverts only on dotted quads, char-codes only on space-separated integers; outside those sets the ClojureScript decode throws. The schema replaces the runtime guard with a refined domain – a subtype {s : String // DottedQuad s} carrying the predicate, so decode is total by construction (Freeman and Pfenning 1991). The same move makes c→K a codec on {t : ℝ // t ≥ -273.15} rather than on all reals.

These are not illustrations. spec/transform-codecs.lean (Mathlib-free) defines the Codec record, proves encode_inj, proves the jump5 digit cipher an involution by decide over Fin 10, and instantiates a refined-domain codec. spec/codecproofs/ then upgrades jump5 to a two-sided Mathlib Equiv (The mathlib Community 2020) – both inverse directions plus symm = self – pinned to mathlib v4.30.0. It also formalizes the thesis: a pipeline is Equiv.trans of its steps (a chain of bijections is a bijection), and reverse-pipeline is the contravariant law (e₁.trans e₂).symm = e₂.symm.trans e₁.symm – flip each step and reverse the order – which holds here by rfl. Base64's bijective core is verified too: the 64-symbol alphabet round-trips (decide), and the 3-byte↔4-sextet regrouping is an Equiv on BitVec 24 proven by bitblasting (bv_decide); only length/padding bookkeeping separates this from full base64. Both files check with lean~/~lake, no sorry.

There is no runtime binding between Lean and the ClojureScript tool, and there does not need to be. The binding is a spec correspondence keyed by codec id, in three layers: Lean proves the Codec total for all inputs; the CLJS registry carries the executable :encode~/:decode~ and the domain predicate; test.check samples the round-trip (= s (decode (encode s))) on generated inputs. The middle and sampler layers live in wal-sh.tools.transform.spec – malli schemas key each refinement domain (e.g. :ipv4-int to a dotted-quad predicate) to a codec id, and test.check samples the round-trip against them, separately from the tool itself. Lean is the proof, the registry the executable shadow, the property test the sampler – one fact in three representations.

7.9. Two prover projections, and what each settles

The same claims project onto two provers under different lenses; neither is the spec, both are checks of it. The terminology each fixes is the authoritative reading – "involution", "bijection", "Equiv", "transducer" are not loose words here.

Claim	Lean (`spec/codecproofs/`)	Dafny (lab repo)	Runtime
`reverse∘reverse = id`	--	proved over a parametric alphabet `seq<T>`	test.check
`rot13` / `jump5` involution	`Function.Involutive`, `decide` over `Fin`	`Shift(13,Shift(13,c))=c` over `Letter`	test.check
`decode∘encode = id`	`Codec` left-inverse; `Equiv` both-sided	`Shift(26-n,Shift(n,c))=c`	test.check
chain reversible / `reverse-pipeline`	`Equiv.trans` + `symm` contravariance	`ComposeReversible` (decoders reversed)	--
base64 core	alphabet `decide` + `BitVec 24` `bv_decide`	--	test.check

The lenses differ in what they make easy. Lean's Equiv packages both inverse directions and composes as a group (the dihedral structure of the involutive codecs); Dafny's finite-state-transducer reading makes the alphabet explicit, so the UTF-16-vs-code-point bug for reverse becomes the choice of the type T – a named refinement boundary rather than a footnote (see the Dafny 2026 lab and the FST note in docs/transform-tool-automata.org).

Completeness, stated honestly: proved for all inputs are the involutions, the round-trips, and chain composition; bounded proofs cover the finite cases (decide over Fin n, bv_decide over BitVec 24); sampled by test.check is everything the registry ships; open are full-string base64 padding and the surrogate-pair boundary (both left as Dafny exercises). No projection claims more than it discharges.

8. Relational Algebra and Information Theory

See also: 6 | Mathematics | Topology | Examples

8.1. Relational Algebra Operations

Relational algebra defines five primitive operations over relations (tables): selection (σ), projection (π), Cartesian product (×), union (∪), and set difference (−). Of these, projection is the canonical lossy operation.

π_{name}(Employee) keeps only the name column. The full row – salary, department, start date – is discarded. No inverse operator exists. Selection (σ_{dept='Eng'}(Employee)) also discards rows: the non-engineering rows are gone.

Natural join is information-combining, not information-losing, but it is not injective: two relations may join to produce the same result from different inputs (if foreign keys are not unique). Relational algebra over lossless-join decompositions preserves information; arbitrary joins do not.

The pipeline tool's one-way indicator maps exactly to this classification. Project and select are one-way. Rename is bijective (it is a bijection on attribute names). Extend (adding a computed column) is injective but not surjective onto the output type.

Projection's missing inverse is the database view-update problem: given an updated view, which update to the base relation produced it? The answer is not unique. A projection equipped with a principled update policy – a choice of inverse – is a relational lens (Bohannon, Pierce, and Vaughan 2006), where each expression reads left-to-right as a view definition and right-to-left as an update translation. The view-update problem, lenses, and model transformation are one problem under different names (Czarnecki et al. 2009).

8.2. Hashing as Projection

MD5, SHA-1, SHA-256 and all cryptographic hash functions map an arbitrary-length byte string to a fixed-length digest. This is a projection from a high-dimensional space (all byte strings) to a lower-dimensional one (2^128 or 2^256 values). The function is:

Surjective (every digest has at least one preimage, by the pigeonhole argument: the domain is infinite, the codomain finite, so every output is hit)
Not injective (collisions exist; for MD5 they are computationally found)
Therefore not bijective, therefore not invertible

Shannon's information theory (Shannon 1948) quantifies this: a hash compresses the source entropy into a fixed-width channel. When the source entropy exceeds the channel capacity, information is irreversibly lost.

The fingerprint metaphor is precise: a fingerprint reduces a human body to ten ink impressions. You can match a suspect to a fingerprint (surjection in the forward direction), but you cannot reconstruct the body from the prints (no preimage computation). SHA-256 is a more discriminating fingerprint – collision probability is negligible for practical inputs – but the structure is the same.

The pipeline tool must mark hash steps as one-way. The reversibility indicator for the chain becomes false the moment any hash enters it.

A concrete example – email anonymization as a mixed pipeline:

;; Verified via bb (Babashka) -- real values, not fabricated
(import java.security.MessageDigest)

(defn sha256-hex [s]
  (let [md (MessageDigest/getInstance "SHA-256")
        bytes (.digest md (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) bytes))))

(-> "alice@example.com"
    .toLowerCase         ;; "alice@example.com"   ← surjection (but input is already lowercase)
    sha256-hex           ;; "ff8d9819fc0e12bf..." ← non-injective (RED, many-to-one)
    (subs 0 8)           ;; "ff8d9819"            ← surjection (RED, truncation)
    (Long/parseLong 16)  ;; 4287469593            ← injection (hex→long is 1-to-1)
    (mod 256))           ;; 25                    ← surjection (RED, quotient)

The pipeline goes green → RED at hash-sha256 and stays red. Each subsequent surjection (subs, mod) compounds the information loss. The final emoji-lookup is bijective on its domain [0,255] → emoji pool, but the chain is irreversible because the hash destroyed the preimage. This is the canonical pattern for anonymization: project through a one-way function, then clamp into a finite domain.

8.3. The Webring Torus and Modular Projection

The wwn (webring) torus illustrates a different kind of projection. Sites on the ring are indexed by position modulo N. Navigation is (pos + 1) mod N forward and (pos - 1 + N) mod N backward. The map ℤ -> ℤ/Nℤ (integers to integers mod N) is a quotient map: surjective, not injective. Many integers map to each residue class.

This is the mathematical structure of modular arithmetic as a covering space. The covering map ℝ -> S¹ (real line to circle) wraps the line around the circle infinitely often, with every point on the circle having infinitely many preimages. The webring wraps a linear ordering of sites into a cycle.

For the pipeline tool: any operation that reduces a value modulo N, truncates to a fixed width, or wraps around a domain is a quotient map – surjective, not injective, therefore one-way in the pipeline. The tool should classify mod, truncate, substr(0, n), and bitfield-mask operations as irreversible.

8.3.1. Semantically Meaningful Moduli

The modular projection n → n mod M is a quotient map ℤ → ℤ/Mℤ. The modulus M determines the size of the output ring. Certain moduli recur across computing, mathematics, and physical systems because the ring size has natural meaning:

M	Domain	Semantics
2	{0, 1}	Boolean / parity bit / binary decision
3	{0, 1, 2}	Ternary logic / traffic light states (red, amber, green)
4	{0, 1, 2, 3}	Cardinal directions (N=0, E=1, S=2, W=3) / quadrants
7	{0..6}	Days of the week (0-indexed: Sunday=0 or Monday=0 depending on convention)
8	{0..7}	Octal digit / 8-directional compass (N/NE/E/SE/S/SW/W/NW)
10	{0..9}	Decimal digit / check digit (Luhn, ISBN)
12	{0..11}	Months / hours on a clock face / musical pitch classes
16	{0..15}	Hex nibble / color channel (low nibble)
24	{0..23}	Hours in a day
26	{0..25}	Latin alphabet index (A=0..Z=25) / Caesar shift space
32	{0..31}	Base32 alphabet
36	{0..35}	Alphanumeric (0-9 + A-Z) / license plates
60	{0..59}	Minutes / seconds (Babylonian sexagesimal)
64	{0..63}	Base64 alphabet index
128	{0..127}	ASCII character range
256	{0..255}	Byte / octet / IPv4 octet
360	{0..359}	Degrees (angle normalization / compass bearing)
65536	{0..65535}	uint16 / TCP/UDP port number

8.3.2. Modular Arithmetic as Information Loss

Every modular reduction is a surjection: n and n + M map to the same residue. The winding number (how many times you wrapped around) is lost. This is the same information loss as the webring torus – the position on the ring is known, but the total distance traveled is not.

For the pipeline tool, mod M with any of the above moduli turns the reversibility indicator red. The lost information is the quotient: given n mod 256 = 42, the original n could be 42, 298, 554, or any 42 + 256k.

8.3.3. Turtle Geometry: mod 4 as Direction Ring

Figure 4: Logo turtle: repeat 4 { forward right } = identity

The Scratch cat starts at the origin facing East (Logo lineage: Seymour Papert, MIT, 1967). Direction is an element of ℤ/4ℤ: N=0, E=1, S=2, W=3. Turning right is (mod (+ dir 1) 4). Moving forward advances one step in the current direction.

(def directions
  {0 [0 1]    ;; N: y+1
   1 [1 0]    ;; E: x+1
   2 [0 -1]   ;; S: y-1
   3 [-1 0]}) ;; W: x-1

(defn right [{:keys [dir] :as state}]
  (assoc state :dir (mod (+ dir 1) 4)))

(defn left [{:keys [dir] :as state}]
  (assoc state :dir (mod (+ 4 (- dir 1)) 4)))  ;; subtract 1, keep positive

(defn forward [{:keys [x y dir] :as state}]
  (let [[dx dy] (directions dir)]
    (assoc state :x (+ x dx) :y (+ y dy))))

;; Cat at origin, facing East
(def cat {:x 0 :y 0 :dir 1})

(-> cat right forward right right forward right)
;; => {:x 0, :y 0, :dir 1}
;;
;; The cat is back where it started, facing the same direction.
;; The composition of these 6 steps is the identity morphism.
;;
;; Trace:
;; cat          {:x 0  :y 0  :dir 1}  E  (start)
;; right        {:x 0  :y 0  :dir 2}  S
;; forward      {:x 0  :y -1 :dir 2}  S  moved south
;; right        {:x 0  :y -1 :dir 3}  W
;; right        {:x 0  :y -1 :dir 0}  N
;; forward      {:x 0  :y 0  :dir 0}  N  moved north (back to origin)
;; right        {:x 0  :y 0  :dir 1}  E  (= start)

The direction ring is ℤ/4ℤ. right and left are inverses ((left (right state)) = state). forward is invertible only if you know the direction – it depends on state, like the Futurama AL2 cipher.

The 6-step sequence is a closed loop: its composition is the identity. This is the same structure as the Crab Canon – a path that returns to its starting point, the group-theoretic definition of a trivial loop in the fundamental group.

8.3.4. Clamping vs. Modular Projection

Clamping (max(lo, min(hi, n))) and modular projection (n mod M) are both surjections onto a bounded range, but they lose information differently:

Mod: wraps around. Values outside the range re-enter from the other side. Information loss is the winding number.
Clamp: saturates. Values outside the range stick at the boundary. Information loss is the distance past the boundary.

Both are irreversible. Both are common in data pipelines (clamping for sensor data, mod for cyclic indices). The pipeline tool should eventually support both as parameterized one-way codecs.

8.3.5. Notable Compound Moduli

Some useful projections compose two moduli. When b | a (b divides a), (n mod a) mod b = n mod b holds by the division algorithm: n = qa + r with 0 ≤ r < a, and since a is a multiple of b, r mod b = n mod b. Mathematical correctness does not imply semantic correctness.

;; Claim 1: (n mod 256) mod 16 = n mod 16
;; 16 divides 256 (256 = 16 × 16) → composition law holds.
;; Semantics: extracts the low nibble of a byte.
;; The mod-256 stage is a no-op for byte-range input (0..255).
;; Mathematically valid but the "two-stage" framing is misleading.

(every? (fn [n] (= (mod (mod n 256) 16) (mod n 16)))
        (range 512))
;; => true

;; Claim 2: (n mod 360) mod 90 = n mod 90
;; 90 divides 360 (360 = 90 × 4) → composition law holds.
;; Semantics: FAILS. mod 90 collapses all four quadrants to [0,89].
;; N=45 (quadrant N) and E=135 (quadrant E) both yield 45 after mod 90.
;; Correct quadrant requires (quot (mod n 360) 90), not a second mod.

(let [quadrant (fn [n] (quot (mod n 360) 90))
     quadrant-names {0 :N 1 :E 2 :S 3 :W}]
  (into {} (map (fn [b] [b (quadrant-names (quadrant b))])
                [0 45 90 135 180 225 270 315])))
;; => {0 :N, 45 :N, 90 :E, 135 :E, 180 :S, 225 :S, 270 :W, 315 :W}

(= (mod (mod 45 360) 90)
   (mod (mod 135 360) 90))
;; => true  -- both yield 45, quadrant identity destroyed

;; Claim 3: (month mod 12) mod 4 = month mod 4
;; 4 divides 12 → composition law holds.
;; Semantics: FAILS. Standard quarter is ceil(month/3), not month mod 4.

(let [months (range 1 13)
     standard-quarter (fn [m] (inc (quot (dec m) 3)))
     mod-quarter      (fn [m] (mod m 4))]
  (map (fn [m] {:month m :Q (standard-quarter m) :mod4 (mod-quarter m)})
       months))
;; month  1  2  3  4  5  6  7  8  9 10 11 12
;; Q      1  1  1  2  2  2  3  3  3  4  4  4
;; mod4   1  2  3  0  1  2  3  0  1  2  3  0
;; mod4 produces 0 for April and August, not valid quarter values.

Claim	b divides a?	Math correct?	Semantic correct?	Verdict
mod 256 → mod 16	16 \| 256	yes	yes (low nibble)	valid but trivial
mod 360 → mod 90	90 \| 360	yes	no (collapses quadrants)	label is wrong
mod 12 → mod 4	4 \| 12	yes	no (not quarters)	label is wrong

For bearing quadrants, use (quot (mod n 360) 90). For month quarters, use (inc (quot (dec month) 3)). The divisibility shortcut is correct; the semantic interpretation attached to claims 2 and 3 is not.

Every surjective projection above – hashing, modular reduction, clamping – discards information. Landauer's principle quantifies the physical cost.

9. Reversible Computing

9.1. Landauer's Principle

Rolf Landauer (Landauer 1961) proved that logically irreversible computation – erasing one bit of information – requires dissipating at least kT ln 2 joules of energy as heat (k is Boltzmann's constant, T is temperature in Kelvin).

The implication: information loss is physically costly. A computation that discards bits generates heat. A computation that preserves all information (bijective, reversible) can in principle be performed with zero energy dissipation. This is not merely philosophical: it is the theoretical lower bound on computation energy, and it is why reversible computing is a hardware research topic. Bennett (Bennett 1973) extended Landauer's result, showing that any computation can be made logically reversible at the cost of additional space (ancilla bits).

For the pipeline tool, Landauer's principle gives physical grounding to the reversibility indicator. A red indicator (irreversible step) marks a point where, in principle, information is destroyed – not just computationally inconvenient to recover, but physically gone from the system's entropy budget. This is the physical manifestation of the surjective projection that hash functions perform.

9.2. Toffoli Gates and Reversible Logic

The Toffoli gate (Tommaso Toffoli) (Toffoli 1980) is a universal reversible logic gate. It takes three bits (a, b, c) as input and outputs (a, b, c XOR (a AND b)). The output uniquely determines the input (the gate is its own inverse applied twice). Any classical computation can be expressed as a circuit of Toffoli gates.

Reversible circuits are circuits composed entirely of invertible gates. They compute a bijection on their input space. The ancilla bits (extra scratch bits initialized to zero) carry the information that would otherwise be discarded.

The pattern transfers directly to the pipeline tool's design space. A "garbage output" ancilla corresponds to carrying intermediate results alongside the primary value – the tool already does this by showing intermediate values. If every step in the pipeline retained its full intermediate state, the pipeline would be reversible in the Toffoli sense.

9.3. Error-Correcting Codes: Making Corruption Reversible

Corruption – flipping a bit, changing a digit – is normally irreversible. The corrupted value does not carry enough information to recover the original. Error-correcting codes change this by adding redundancy before the corruption occurs, making the corruption itself reversible.

A Hamming(7,4) code encodes 4 data bits into 7 bits by adding 3 parity bits. The parity bits are the encoding analog of Toffoli ancilla bits – extra information that makes an otherwise irreversible operation reversible.

(-> "4111111111111111"
    hamming              ;; add parity bits        ← injection (green)
    corrupt-1-bit        ;; flip one bit            ← REVERSIBLE (with parity!)
    hamming/decode)      ;; correct the flip        ← recovers original

The decode step computes a syndrome – a parity check that identifies which bit was flipped. The syndrome is the inverse function's input: it tells you exactly what to undo.

Corruption	Detectable?	Correctable?	Pipeline color
0 bits	n/a	n/a	green
1 bit	yes	yes	green (with encode)
2 bits	yes	no	amber (detected, not fixed)
3+ bits	no	no	red (silent corruption)

Without the checksum, corruption is a one-way operation like shuffle. With it, single-bit corruption is a round-trip. This is the only case in the pipeline model where damage itself is reversible – not just the encoding, but the corruption.

The same structure appears in RAID (parity stripe recovers a failed disk), TCP (checksums detect corruption, retransmission recovers it), and git (SHA-1 detects corruption, reflog recovers the object).

9.4. Quantum Computing

All quantum gates are unitary operators: they are bijections on the Hilbert space of quantum states. Unitarity preserves the inner product, which means quantum information is never destroyed during gate operations. The composition of unitary operators is unitary.

Measurement is the exception: projecting a quantum state onto a basis collapses the superposition irreversibly. Measurement is the only irreversible operation in quantum computation.

The pipeline analogy: quantum gates are the bijective transform steps; quantum measurement is the hash or projection step that collapses the intermediate representation. After measurement, the pipeline cannot be run backward. Before measurement, every step is in principle reversible.

A Clojure model of unitary gates on a two-element state vector makes the bijection/projection distinction concrete. The Pauli-X gate (bit flip) is its own inverse. The Hadamard gate maps basis states to superpositions and back – composing it with itself recovers the input. Measurement collapses the superposition: the pipeline turns red.

;; Pauli-X (NOT gate): |0⟩ ↔ |1⟩.  Self-inverse (involution).
(defn pauli-x [[a b]] [b a])

(-> [1.0 0.0]               ; |0⟩
    pauli-x                  ;;=> [0.0 1.0]  -- |1⟩
    pauli-x)                 ;;=> [1.0 0.0]  -- |0⟩ recovered

;; Hadamard gate: bijection on amplitudes.
;; H·H = I -- applying Hadamard twice recovers the original state.
(def inv-sqrt2 (/ 1.0 (Math/sqrt 2)))

(defn hadamard [[a b]]
  [(* inv-sqrt2 (+ a b))
   (* inv-sqrt2 (- a b))])

(-> [1.0 0.0]               ; |0⟩
    hadamard                 ;;=> [0.707 0.707]  -- superposition
    hadamard)                ;;=> [1.0 0.0]      -- |0⟩ recovered

;; Measurement: project to basis state.  Irreversible -- collapses the
;; superposition.  The pipeline turns red here.
(defn measure [[a b]]
  (if (> (* a a) (rand)) [1.0 0.0] [0.0 1.0]))

(-> [1.0 0.0]
    hadamard                 ; green -- bijective
    measure                  ; RED -- projection, information destroyed
    hadamard)                ; still runs, but original is unrecoverable

The Toffoli gate (Toffoli 1980) extends this to three bits and is also self-inverse: (a, b, c) → (a, b, c ⊕ (a ∧ b)). Applied twice, the XOR cancels and the original state is recovered. The pipeline tool's xor 0x42 codec is a simplified version of this pattern.

9.5. Janus: A Reversible Programming Language

Janus (Lutz and Derby, 1986; rediscovered by Yokoyama and Glück) (Yokoyama and Glück 2007) is a programming language where every statement has a defined inverse statement. Assignment is replaced by augmented assignment (x += e, invertible as x -= e). Conditionals require both a forward condition and an assertion that holds at the exit (needed to reverse the branch). Loops require an invariant that distinguishes the entry and exit states.

A Janus program is guaranteed to be invertible: the language's syntax enforces it. The inverse program is mechanically derived. This is the strongest version of the pipeline tool's reversibility guarantee: not an annotation or indicator, but a structural property enforced by the type system.

The encode tool's pipeline is not written in Janus, but the Janus model clarifies what full reversibility would require: every step must carry enough information in its output to reconstruct its input. Steps that use external state (a secret key, a random nonce, a timestamp) cannot satisfy this without also threading that state through the pipeline.

Glück and Yokoyama state the general principle (Glück and Yokoyama 2023): every reversible step carries enough in its output to reconstruct its input; where a step does not, forcing reversibility stores the missing information in ancilla or garbage. The Toffoli ancilla bit and the Janus threaded state are the same mechanism. Injectivity can be enforced as a language-level property rather than checked after the fact (Mu, Hu, and Takeichi 2004): a language admitting only injective definitions makes irreversible steps unwritable. The five properties specialize this principle to a user interface – per-step invertibility classification propagated through the chain – not a claim that the chain is a well-behaved lens.

A Janus-style codec in Clojure makes the structural guarantee concrete. Augmented assignment (x += k) has a mechanically-derived inverse (x -= k): the encode and decode functions are mirror images, and the round-trip property is provable by inspection.

;; Janus-style augmented-assignment codec.
;; x += k is invertible as x -= k -- the key `k` must travel with the value.
(defn janus-add-encode
  "Augmented assignment: add key to each char code."
  [k s]
  (apply str (map #(char (+ (int %) k)) s)))

(defn janus-add-decode
  "Mechanical inverse: subtract the same key."
  [k s]
  (apply str (map #(char (- (int %) k)) s)))

(-> "HELLO"
    (janus-add-encode 3)     ;;=> "KHOOR"  -- Caesar shift by 3
    (janus-add-decode 3))    ;;=> "HELLO"  -- exact recovery

The round-trip holds iff the key k is available at decode time. This is the ancilla requirement from Toffoli: the key is the extra information that makes the step reversible. In the pipeline tool, the XOR codec (xor 0x42) is exactly this pattern – self-inverse only because the key is hardcoded. A parameterized XOR would need to thread the key alongside the value, which is the v2 contract requirement.

9.5.1. Uiua: derived inverses as language primitives

Uiua (Schmidt 2023), a stack-based array language descended from BQN, provides three modifiers that formalize the same inverse structure the pipeline tool annotates manually:

° (un): apply the mechanically-derived inverse of a function. °√ squares, °⊟ un-couples, °utf decodes UTF-8 bytes back to a string. The inverse is derived, not hand-written – the Janus principle as a combinator.
⍜ (under): apply a function, operate on the result, then undo the outer function. ⍜⊢(×10) multiplies only the first element by 10 and restores the surrounding structure. This is the lens / bracket pattern: modify under a reversible context.
⌅ (obverse): specify a custom inverse pair for user-defined functions. This is exactly the pipeline tool's :encode / :decode pair, expressed as a language modifier.

# Reverse is an involution: °⇌ = ⇌
⍤"reverse involution" ≍ [1 2 3] °⇌⇌ [1 2 3]

# Under: modify first element, restore structure
⍤"under lens" ≍ [100 2 3] ⍜⊢(×100) [1 2 3]

Functions without a defined inverse error at compile time when wrapped in °. The language rejects the pipeline as non-reversible – the same judgment the encode tool's red indicator conveys at runtime, but enforced statically.

A revealing edge case: °⍆ (un-sort) is documented as producing a shuffle. Since sorting is a surjection that discards ordering information (the anagram problem), its "inverse" can only produce a permutation, not the original. Uiua's designers made this explicit rather than erroring – an honest affordance that acknowledges the information loss.

See Uiua research note for stack combinators, invariant assertions, and the ° / ⍜ / ⌅ modifier reference.

10. Data Pipeline Tools

10.1. CyberChef

GCHQ's CyberChef (2016, open source) (GCHQ 2016) is the direct predecessor of the tool described here. It implements exactly the model: a visual recipe of named operations applied in sequence, with the output of each operation displayed. Operations include encoding (base64, hex, URL), encryption (AES, DES), hashing (MD5, SHA), compression, and many more.

CyberChef does not annotate operations as reversible or irreversible. Every operation in the recipe list is equal in status. The user must know that SHA-256 is a one-way function and AES-encrypt is invertible (given the key). Adding an explicit reversibility indicator is a direct improvement over CyberChef's model.

CyberChef does implement a "Magic" operation that attempts to detect the input encoding and suggest a recipe – an inverse-finding heuristic. It is not mathematically guaranteed; it is pattern matching on output entropy and structure.

10.2. Apache Beam and Flink

Apache Beam defines transforms as PTransform objects that take a PCollection (a distributed collection of elements) and produce another PCollection. A pipeline is a directed acyclic graph of transforms applied to a root collection.

Beam transforms are functions on collections, not on individual elements (though ParDo applies an element-wise function). Most transforms are irreversible: Filter, GroupByKey, and aggregations discard or combine information. Map is as reversible as the function it applies. Beam does not model or annotate reversibility.

10.3. Apache Airflow

Airflow (Airbnb, 2014; Apache, 2016) defines pipelines as DAGs of tasks, where each task is an operator (PythonOperator, BashOperator, etc.) that runs a function. The DAG is the pipeline; edges are data dependencies. Airflow's contribution is scheduling and retry semantics – it models when tasks run, not what they compute.

Airflow tasks are opaque to the framework: the operator executes arbitrary code. The DAG structure does not encode whether a task is reversible. There is no built-in concept of "undo this task" – the scheduler reruns or skips, but does not invert. The closest mechanism is on_failure_callback, which is cleanup, not mathematical inverse.

10.4. AWS Step Functions and Glue

AWS Step Functions (2016) model workflows as state machines in Amazon States Language (JSON). Each state is a task, choice, parallel, or wait. The state machine is a DAG with explicit branching and error handling.

Step Functions do not model reversibility. A Task state invokes a Lambda function; the framework has no knowledge of whether the Lambda's effect is invertible. AWS Glue ETL jobs are Spark-based transform pipelines (read → transform → write) where each transform is a DynamicFrame method. Glue's visual editor renders the DAG but does not annotate transforms as lossy or lossless.

The pattern across all three (Beam, Airflow, Step Functions): the orchestration framework models task dependencies and scheduling but is agnostic to transform invertibility. The reversibility annotation lives nowhere in the deployed data engineering stack.

10.5. dbt and SQL Transform Chains

dbt (data build tool) defines a DAG of SQL models: each model is a SELECT statement that transforms upstream tables. The lineage graph is explicit and visualized. Each node materializes as a table or view.

SQL models are surjective by default (SELECT discards columns not in the projection list). The dbt lineage graph is the data engineering community's version of the pipeline tool's visual model, operating at table granularity rather than byte-string granularity.

The dbt lineage graph, relational lenses, and the database view-update problem are the same bidirectional-transformation problem at different granularities – table, view, and base relation (Czarnecki et al. 2009).

10.6. Pandas Method Chaining

Pandas DataFrames support method chaining: df.dropna().rename(...).assign(...) is a pipeline of transforms on a DataFrame. The intermediate value at each . is a new DataFrame (or a mutated one, depending on the operation).

result = (
    df
    .dropna(subset=['email'])
    .rename(columns={'email': 'address'})
    .assign(domain=lambda x: x['address'].str.split('@').str[1])
    .groupby('domain')
    .size()
    .reset_index(name='count')
    .sort_values('count', ascending=False)
)

dropna is irreversible (rows are gone). rename is bijective (a relabeling). assign is injective (adds a column; no information lost from the source columns). groupby/size is a many-to-one aggregation – strongly irreversible.

10.7. jq Pipelines

jq composes JSON transforms with |. Each filter takes a JSON value and produces a JSON value (or a stream of values). The pipeline is sequential, and each intermediate value is the output of the preceding filter.

curl -s https://api.example.com/users | jq '.[] | {name: .name, email: .email} | select(.email != null)'

.[] is injective (each element of an array is output separately, but no data is lost). Object construction {name: .name} is a projection (other fields discarded). select is a filter (non-matching values discarded). jq does not type-annotate filters as bijections or surjections.

The engineering tools above are uniformly agnostic to invertibility. The reason has a topological explanation.

11. Topology, Geometry, and the Covering Space Analogy

11.1. Covering Spaces

A covering space p: E -> B is a continuous surjective map such that every point in B has an open neighborhood U whose preimage is a disjoint union of open sets in E, each mapped homeomorphically to U (Hatcher 2002). The map ℝ -> S¹ defined by p(t) = (cos(2πt), sin(2πt)) is the canonical example: the real line covers the circle, with each point on the circle having countably infinite preimages.

The webring torus is a discrete covering: integer positions ℤ map to residue classes ℤ/Nℤ, with each class having infinitely many preimages in ℤ (namely, all integers congruent to that residue mod N). The map is the quotient map q: ℤ -> ℤ/Nℤ.

For the pipeline tool: any operation that wraps a value around a cyclic domain (modular arithmetic, ring buffer indexing, color values that wrap at 255) is a covering map. The output does not determine the input uniquely; the operation is irreversible without additional information (how many times did we wrap?).

11.2. The Fundamental Group

The fundamental group π₁(X, x₀) captures the topology of loops in a space based at x₀. For the circle S¹, π₁(S¹) = ℤ: loops are classified by their winding number. Two loops with different winding numbers are not homotopic (cannot be continuously deformed into each other).

The webring's cyclic navigation generates ℤ/Nℤ (a finite cyclic group) as its "reachability group from any position." The fundamental group distinction clarifies why modular operations are irreversible: the winding number (how many times you passed position 0) is lost when you reduce mod N.

11.3. Injections, Surjections, Bijections

See the taxonomy section for the full classification table and lattice diagram. The topological contribution: the covering space structure explains why modular operations are surjections – the winding number is the lost information.

12. Canonical Examples: Reversible Ciphers and Irreversible Projections

The pipeline tool is most instructive when it mixes reversible and irreversible steps, because the reversibility indicator changes mid-chain. The examples below are drawn from cryptographic history, pager culture, combinatorics, television puzzle design, and audio steganography.

12.1. Reversible: A1Z26 (Alphabetic Position Cipher)

A=1, B=2, … Z=26. Spaces pass through. The mapping is bijective on the uppercase Latin alphabet: every letter has exactly one number, every number in [1,26] maps to exactly one letter.

(-> "HELLO"
    a1z26)
⇒ "8 5 12 12 15"

(-> "8 5 12 12 15"
    a1z26/decode)
⇒ "HELLO"

A1Z26 is the cipher behind the Little Orphan Annie decoder ring (National Security Agency 1943) in A Christmas Story (Clark 1983) ("Be sure to drink your Ovaltine"). The actual 1934 badge is more sophisticated than simple A=1: it has a permuted alphabet on the outer ring and a numbered rotating dial. The weekly radio broadcast announced the key offset. The decoder pin is literally the inverse function cast in brass.

The actual 1934 mechanism in Clojure:

(def annie-ring "AGPTBHMCSDEZVLNEJYKUWROXFIQ")

(defn annie-encode [s key]
  (->> s
       (filter #(>= (.indexOf annie-ring (str (Character/toUpperCase %))) 0))
       (map #(let [pos (.indexOf annie-ring (str (Character/toUpperCase %)))]
               (str (inc (mod (+ pos (dec key)) 26)))))
       (interpose " ")
       (apply str)))

(defn annie-decode [s key]
  (->> (clojure.string/split s #" ")
       (remove clojure.string/blank?)
       (map #(let [n (Integer/parseInt %)]
               (str (.charAt annie-ring (mod (+ (dec n) (- 26 (dec key))) 26)))))
       (apply str)))

(-> "BE SURE TO DRINK YOUR OVALTINE"
    (annie-encode 3)
    (annie-decode 3))
;; => "BESURETODRINKYOUROVALTINE"

Singh (Singh 1999) and Kahn (Kahn 1996) cover the history of substitution ciphers in this lineage.

12.2. Reversible: Reverse (The Shining)

String reversal is its own inverse: (reverse (reverse s)) = s. In The Shining (Kubrick 1980), Danny writes REDRUM on a door; the mirror reveals MURDER. The mirror is the inverse function.

(-> "REDRUM"
    reverse)
⇒ "MURDER"

Try it: reverse REDRUM.

In American Dad! ("Dungeons and Wagons," S2E5), the characters seek Castle Roodpart – only to realize too late that reverse("Roodpart") is trapdooR. The show doubles down: saying "Rohtaga" (reverse("Agathor")) kills Steve's character in-game, and Barry plays a wizard named Fladnag (reverse("Gandalf")). Try: reverse Roodpart, reverse Rohtaga, and reverse Fladnag.

12.3. Reversible: Jump-the-Five (Pager Code)

A substitution cipher on digits used by pagers and phone phreaks in the 1990s (“Jump the Five” 1990). Each digit maps to its complement across the "5 barrier" on a telephone keypad:

Input	0	1	2	3	4	5	6	7	8	9
Output	5	6	7	8	9	0	1	2	3	4

The mapping is its own inverse (an involution), like ROT13 for digits. Non-digit characters pass through unchanged.

(-> "867-5309"
    jump5)
⇒ "312-0854"

(-> "312-0854"
    jump5)
⇒ "867-5309"

12.4. Reversible (with escalation): Futurama Alien Languages

Futurama (Cohen and Groening 1999) shipped two alien writing systems. Alien Language 1 (AL1) is a straight substitution cipher: each alien glyph maps to exactly one English letter. Fans cracked it within days of the pilot airing by frequency analysis on background signage. It is a bijection – a lookup table.

The writers responded with Alien Language 2 (AL2), a modular-arithmetic cipher: each glyph's value depends on the sum of all preceding glyph values, modulo a fixed base. AL2 requires state (the running sum) to decode, making it resistant to naive single-character frequency analysis. It is still bijective given the initial state, but the state dependency is the arms race escalation.

The progression AL1 → AL2 maps directly to the pipeline tool's property taxonomy: AL1 is a stateless bijection (green indicator, trivially reversible). AL2 is a key-dependent bijection (amber indicator – reversible only if the initial accumulator value is known). The joke is that the writers upgraded from a morphism in Set to a morphism in a fibered category over state.

12.5. Reversible (escalating): Gravity Falls End-Credits Cryptograms

Gravity Falls (Disney, Alex Hirsch, 2012–2016) (Hirsch 2012) embeds a cryptogram in the end credits of every episode. The show uses four cipher systems, introduced sequentially with increasing cryptanalytic difficulty — a deliberate pedagogical ladder:

Cipher	Episodes	Type	Class	Self-inverse?
Caesar (ROT-3)	S1 E1–6	Monoalphabetic shift	Bijection on Z/26Z	No
Atbash	S1 E7–13	Monoalphabetic mirror	Involution on Z/26Z	Yes
A1Z26	S1 E14–19	Ordinal encoding	Bijection, alpha → {1..26}	Yes
Vigenere	S1 E20+, S2	Polyalphabetic shift	Key-parameterized bijection	No

All four are mathematically reversible. The puzzle design requires bijections — fans must be able to decode. The escalation pattern is the same as Futurama (AL1 → AL2) but with four steps instead of two, and the final step (Vigenere) introduces key dependency: the keyword is hidden within each episode, making it a state-dependent bijection.

The antagonist is named Bill Cipher — a two-register pun (dollar bill + cryptographic cipher), visually based on the Eye of Providence triangle. His name is itself an encoded clue to the show's central mechanic.

Episode 20 chains three ciphers in sequence: A1Z26 → Atbash → Caesar. This is exactly the pipeline tool's model: a composite recipe where each step is individually reversible, and the chain is reversible because every step is a bijection.

;; Gravity Falls S1E20 pipeline (all bijective -- green)
(-> ciphertext
    a1z26/decode       ; numbers → letters
    atbash             ; mirror the alphabet
    rot-3/decode)      ; shift back by 3
⇒ plaintext

Try a Gravity Falls–style pipeline: rot13 → hex → base64.

12.6. Domain-Crossing: Steganography as Pipeline

The preceding examples encode information within the same domain (text → text, digits → digits). Steganography crosses domains: the payload is embedded in a carrier signal, and the "decode" step is a domain transform (e.g. audio → image).

Aphex Twin's "Windowlicker" (James 1999) and the Selected Ambient Works Volume II track "∆Mᵢ⁻¹=−α∑Dᵢ[n]" embed images in the audio frequency spectrum. The "cipher" is not a key — it is a domain transform: apply an FFT (or spectrogram) to the audio signal, and the image appears as amplitude patterns in the frequency-time plane. On the "Equation" B-side, the embedded image is Richard D. James' own face.

;; Aphex Twin spectrogram pipeline
(-> audio-signal
    fft                ; time domain → frequency domain  (bijective!)
    spectrogram        ; complex → magnitude (phase lost -- surjection)
    render-image)      ; 2D magnitude array → pixels

The FFT itself is a bijection — the inverse FFT recovers the original signal exactly. But the spectrogram discards phase information, keeping only magnitude. The pipeline is reversible through the FFT step, then irreversible at the spectrogram: you can see the face, but you cannot reconstruct the audio from the image alone.

Nine Inch Nails' Year Zero ARG (Reznor 2007) used the same technique with a different payload model. Fans spectrogram-analyzed leaked tracks expecting hidden lore and instead found a phone number — a callback to a real-world game system. The pipeline is identical (audio → FFT → spectrogram → image), but the payload is an out-of-band reference: the decoded content is a pointer, not a message. This is steganographic indirection: the pipeline's output is a key into another pipeline.

12.7. Irreversible: Sort (The Anagram Problem)

Sorting the characters of a string is a surjection: "hello" and "holle" both sort to "ehllo". The original ordering is gone. Recovering it is the anagram problem – given a sorted multiset of characters, enumerate all permutations that form valid words.

(-> "listen"
    sort)
⇒ "eilnst"

(-> "silent"
    sort)
⇒ "eilnst"

Both inputs collapse to the same output. The pipeline indicator turns red: the chain is now irreversible. sort is a projection from the symmetric group $S_n$ onto the identity permutation – it forgets the permutation that produced the input.

12.8. Irreversible: Shuffle (Random Permutation)

Shuffling a string applies a random permutation. It is injective on a given random seed (different inputs produce different outputs) but the seed is not carried in the output, so the inverse is not computable without external state. This is the "state dependency" from Section 10's property list.

12.9. Composite Recipes

The most instructive pipelines mix reversible and irreversible steps. The reversibility indicator propagates through the chain:

;; Fully reversible -- green all the way
(-> "192.168.1.1"
    ipv4→int          ; "3232235777"
    hex                ; "c0a80101"
    base64)            ; "YzBhODAxMDE="

;; Reversible until sort -- then red
(-> "Be sure to drink your Ovaltine"
    a1z26              ; "2-5 19-21-18-5 ..."     ← reversible
    sort)              ; "- -   . . . 1 1 ..."   ← irreversible

;; The Shining, encoded
(-> "MURDER"
    reverse            ; "REDRUM"                 ← reversible
    rot13              ; "ERQEHZ"                 ← reversible
    base64)            ; "RVJRRVHA"               ← reversible

13. Design Implications for the Pipeline Tool

The preceding sections converge on five properties that the pipeline tool must track per step and propagate through the chain:

Invertibility class: bijection, injection, surjection, or general map. Derived from the mathematical properties of the transform function.
Key dependency: does the inverse require an external value (decryption key, initialization vector, salt) not present in the forward output? AES-encrypt is bijective given the key, but the key is not carried in the ciphertext by default.
Domain restriction: is the inverse defined on all outputs of the forward function, or only on a subset? gzip decompress is defined only on valid compressed data.
State dependency: does the operation depend on mutable external state (a counter, a timestamp, a random nonce)? If so, the inverse requires reconstructing that state.
Chain reversibility: the chain is reversible if and only if every step is bijective and none introduce external dependencies that are not threaded through the pipeline.

The visual representation should propagate a traffic-light model: green for bijective steps, amber for injective or key-dependent steps, red for surjective or hash steps. The chain's overall indicator is the minimum (most pessimistic) of its step indicators. A single MD5 in a chain of otherwise bijective operations makes the whole chain irreversible.

13.1. Enforcement Spectrum: Who Guarantees the Inverse?

The encode tool's :inverse? true flag is a claim, not a proof. The question is where in the stack the guarantee lives. Languages differ in how strongly they enforce that decode ∘ encode = id:

Language	Mechanism	Enforcement
Uiua	`°` (un) operator (Schmidt 2023)	Compiler derives the inverse from language semantics
Janus	Every statement has a defined inverse (Yokoyama and Glück 2007)	Syntax enforces reversibility – if it compiles, it's invertible
Haskell	`Iso` type in lens libraries (van Laarhoven 2009)	Type-level – `Iso s t a b` is a pair the type checker verifies
Lean 4	Dependent types (`Equiv`)	Proven for all inputs – `spec/codecproofs/` (two-sided `Equiv`)
Dafny	Auto-active `ensures` + SMT (Z3)	Proven for all inputs – involution/round-trip as one `ensures` Z3 closes
Racket	Contracts (Findler and Felleisen 2002)	Runtime-checked pre/post conditions with blame
Clojure	PBT (test.check) (Claessen and Hughes 2000)	Empirical – 100 random inputs, not a proof

At the top of this spectrum (Uiua, Janus), the language knows the inverse and can derive it mechanically. At the bottom (Clojure, JavaScript), the programmer asserts it and PBT searches for counterexamples.

Three of these points are now built, not hypothetical, for the same codecs: Lean (spec/codecproofs/, a two-sided Equiv), Dafny (the transducer lab, auto-active ensures Z3 discharges), and Racket (transform-tool-racket, contracts with Findler–Felleisen blame) – the same decode ∘ encode = id claim guaranteed at three different points in the stack. The Lean/Dafny split is itself the Dafny 2026 question: interactive dependent types vs. auto-active SMT, same theorem.

The encode tool operates at the Clojure level: the :inverse? flag is a manual annotation, and the property-based tests verify it empirically. The green indicator means "annotated as bijective and PBT hasn't found a counterexample" – not "proven bijective." This is the same epistemological status as any PBT-verified property: high confidence, not certainty.

The practical consequence: some inputs might work even when the codec is marked irreversible. upper/decode (lowercase) applied to an already-lowercase string is a perfect round-trip. sort/decode on a string that was already sorted is the identity. The irreversibility flag is about the worst case over the entire input domain, not about any specific input. A codec that fails for even one input is not a bijection – but it may be bijective on a subdomain, and the tool does not yet track domain restrictions (property 3 above).

CyberChef is the closest deployed tool. The improvement is the explicit reversibility annotation at every step, derived from the operation's mathematical classification, not from the user's knowledge. The intellectual lineage – from Forth to group theory to Janus to quantum gates – all points toward the same abstraction: a pipeline step is a morphism, and the pipeline's reversibility is a property of the morphism's category.

Formal verification of these properties is tractable. TLA+ specifications can model the state machine of encode/decode transitions with invariants about information preservation. Alloy can enforce structural constraints on stage composition. See PLDI 2026 (PAgE workshop on agentic engineering foundations) and BugBash 2026 (Nada Amin on dependent types, Gabriela Moreira on Quint/TLA+) for venues where this work connects to active research.

14. Verification Audit

Seven parallel skeptic agents audited this document on 2026-06-07. Each agent reviewed one or more include files against the codec registry, mathematical definitions, and historical sources.

Section	Agent	Verdict	Finding
taxonomy.org	skeptic	Clean	No errors. upper/lower correctly classified as surjections.
composition-structure.org	skeptic	Corrected	`reverse` was listed as monoid homomorphism. Fixed: anti-homomorphism (`f(ab) = f(b)f(a)`).
information-theory.org	skeptic	Corrected	"nearly all digests are reached" contradicted surjectivity claim. Fixed: pigeonhole on infinite domain.
math-type-theory.org	skeptic	Acceptable	"dagger to be total" is imprecise but not false. Low severity.
examples.org	skeptic	Corrected	A1Z26 space shown as `0` not `/`. Fixed to match codec implementation.
reversible-computing.org	skeptic	Corrected	Janus creation date 1982→1986 (Lutz and Derby).
topology.org	skeptic	Clean	Covering space and fundamental group claims verified.
design-implications.org	skeptic	Clean	Five properties and enforcement spectrum correct.
visual-dataflow.org	skeptic	Clean	Historical dates and attributions verified.
references.bib	skeptic	Deferred	`moore1974` journal attribution under review (tracked issue).

A content-value assessment rated each section as ADDS KNOWLEDGE, MAKES EXPLICIT, or FILLER. Strongest sections: Composition Structure, Reversible Computing, Design Implications. Weakest: Visual/Dataflow (museum tour), Data Pipeline Tools (catalog without insight).

The rebuild specification was separately validated by a TypeScript agent (aygp-dr/transform-tool) that filed 9 spec-gap issues while building a complete implementation (40/40 tests passing). One fabricated test vector (VXJ5Yg== → VXJ5eWI=) was caught and corrected.

15. References

Awodey, Steve. 2010. Category Theory. 2nd ed. Oxford University Press.

Backus, John. 1978. “Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs.” Communications of the Acm 21 (8): 613–41. https://doi.org/10.1145/359576.359579.

Bennett, Charles H. 1973. “Logical Reversibility of Computation.” Ibm Journal of Research and Development 17 (6): 525–32. https://doi.org/10.1147/rd.176.0525.

Bohannon, Aaron, J. Nathan Foster, Benjamin C. Pierce, Alexandre Pilkiewicz, and Alan Schmitt. 2008. “Boomerang: Resourceful Lenses for String Data.” In Proceedings of the 35th Annual Acm Sigplan-Sigact Symposium on Principles of Programming Languages, 407–19. Popl ’08. ACM. https://doi.org/10.1145/1328438.1328487.

Bohannon, Aaron, Benjamin C. Pierce, and Jeffrey A. Vaughan. 2006. “Relational Lenses: A Language for Updatable Views.” In Proceedings of the 25th Acm Sigmod-Sigact-Sigart Symposium on Principles of Database Systems (Pods), 338–47. ACM. https://doi.org/10.1145/1142351.1142399.

Claessen, Koen, and John Hughes. 2000. “QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs.” In Proceedings of the Fifth Acm Sigplan International Conference on Functional Programming, 268–79. Icfp ’00. ACM. https://doi.org/10.1145/351240.351266.

Clark, Bob. 1983. “A Christmas Story.”

Coecke, Bob, and Aleks Kissinger. 2017. Picturing Quantum Processes: A First Course in Quantum Theory and Diagrammatic Reasoning. Cambridge University Press.

Cohen, David X., and Matt Groening. 1999. “Futurama Alien Languages.”

Cousot, Patrick, and Radhia Cousot. 1977. “Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints.” In Conference Record of the Fourth Acm Symposium on Principles of Programming Languages (Popl ’77), 238–52. ACM. https://doi.org/10.1145/512950.512973.

Czarnecki, Krzysztof, J. Nathan Foster, Zhenjiang Hu, Ralf Lämmel, Andy Schürr, and James F. Terwilliger. 2009. “Bidirectional Transformations: A Cross-Discipline Perspective.” In Theory and Practice of Model Transformations (Icmt 2009), 5563:260–83. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-642-02408-5_19.

Draper, Reid. 2013. “Test.Check: Quickcheck for Clojure.” https://github.com/clojure/test.check.

Findler, Robert Bruce, and Matthias Felleisen. 2002. “Contracts for Higher-Order Functions.” In Proceedings of the Seventh Acm Sigplan International Conference on Functional Programming, 48–59. Icfp ’02. ACM. https://doi.org/10.1145/581478.581484.

Foster, J. Nathan, Michael B. Greenwald, Jonathan T. Moore, Benjamin C. Pierce, and Alan Schmitt. 2007. “Combinators for Bidirectional Tree Transformations: A Linguistic Approach to the View-Update Problem.” Acm Transactions on Programming Languages and Systems 29 (3): 17–es. https://doi.org/10.1145/1232420.1232424.

Freeman, Tim, and Frank Pfenning. 1991. “Refinement Types for ML.” In Proceedings of the Acm Sigplan 1991 Conference on Programming Language Design and Implementation (Pldi), 268–77. ACM. https://doi.org/10.1145/113445.113468.

GCHQ. 2016. “CyberChef.” GitHub.

Glück, Robert, and Tetsuo Yokoyama. 2023. “Reversible Computing from a Programming Language Perspective.” Theoretical Computer Science 953: 113429. https://doi.org/10.1016/j.tcs.2022.06.010.

Hashiba, Keishi, Keisuke Nakano, Kazuyuki Asada, and Kentaro Kikuchi. 2025. “Characterizations of Partial Well-Behaved Lenses.” In Proceedings of the 2025 Acm Sigplan International Workshop on Partial Evaluation and Program Manipulation (Pepm ’25), 43–53. ACM. https://doi.org/10.1145/3704253.3706139.

Hatcher, Allen. 2002. Algebraic Topology. Cambridge University Press.

Hickey, Rich. 2014. “Transducers.” Strange Loop conference talk.

Hirsch, Alex. 2012. “Gravity Falls End-Credits Cryptograms.”

Hofstadter, Douglas R. 1979. Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books.

James, Richard D. 1999. “Windowlicker / ∆Mᵢ⁻¹=−Α∑Dᵢ[N].” Warp Records.

Kahn, David. 1996. The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet. 2nd ed. Scribner.

Kubrick, Stanley. 1980. “The Shining.”

Laarhoven, Twan van. 2009. “CPS Based Functional References.” Blog post.

Landauer, Rolf. 1961. “Irreversibility and Heat Generation in the Computing Process.” Ibm Journal of Research and Development 5 (3): 183–91. https://doi.org/10.1147/rd.53.0183.

Meyer, Bertrand. 1992. “Applying ``Design by Contract’’.” Computer 25 (10): 40–51. https://doi.org/10.1109/2.161279.

Moore, Charles H. 1974. “Forth: A New Way to Program a Minicomputer.” Astronomy and Astrophysics Supplement Series 15: 497.

Moura, Leonardo de, and Sebastian Ullrich. 2021. “The Lean 4 Theorem Prover and Programming Language.” In Automated Deduction – Cade 28, 12699:625–35. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-030-79876-5_37.

Mu, Shin-Cheng, Zhenjiang Hu, and Masato Takeichi. 2004. “An Injective Language for Reversible Computation.” In Mathematics of Program Construction (Mpc 2004), 3125:289–313. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-540-27764-4_16.

National Security Agency. 1943. “A1z26 Cipher.” Elementary cipher in cryptanalytic training materials.

Puckette, Miller. 1996. “Pure Data.” In Proceedings of the International Computer Music Conference. Icmc ’96.

Reznor, Trent. 2007. “Year Zero Arg.” Interscope Records / 42 Entertainment.

Schmidt, Kai. 2023. “Uiua: A Tacit Stack-Based Array Programming Language.” https://www.uiua.org.

Selinger, Peter. 2007. “Dagger Compact Closed Categories and Completely Positive Maps.” Electronic Notes in Theoretical Computer Science 170: 139–63. https://doi.org/10.1016/j.entcs.2006.12.018.

Shannon, Claude E. 1948. “A Mathematical Theory of Communication.” Bell System Technical Journal 27 (3): 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.

Singh, Simon. 1999. The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography. Anchor Books.

The mathlib Community. 2020. “The Lean Mathematical Library.” In Proceedings of the 9th Acm Sigplan International Conference on Certified Programs and Proofs (Cpp), 367–81. ACM. https://doi.org/10.1145/3372885.3373824.

Toffoli, Tommaso. 1980. “Reversible Computing.” In Automata, Languages and Programming, 85:632–44. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/3-540-10003-2_104.

Yokoyama, Tetsuo, and Robert Glück. 2007. “A Reversible Programming Language and Its Invertible Self-Interpreter.” In Proceedings of the 2007 Acm Sigplan Symposium on Partial Evaluation and Semantics-Based Program Manipulation, 144–53. Pepm ’07. ACM. https://doi.org/10.1145/1244381.1244404.

Zhang, Hanliang, Wenhao Tang, Ruifeng Xie, Meng Wang, and Zhenjiang Hu. 2023. “Contract Lenses: Reasoning About Bidirectional Programs via Calculation.” Journal of Functional Programming 33: e10. https://doi.org/10.1017/S0956796823000059.

“Jump the Five.” 1990. Pager and telephone culture, 1990s.

Reversible Pipeline Transforms Intellectual lineage of stacked, composable, invertible operations across visual programming, stack machines, mathematics, and information theory

Table of Contents