Testing pocket-es from the Outside

1. The blind-builder / sighted-tester split
2. The publish-then-reindex contract
3. URL form is a contract too
4. Feature composition, not features
5. Two verified, one asserted
6. Instruments lie
7. Claim vs. calibration
8. Reusable checklist
9. See also

The builder's lessons page is written from inside the pipeline. This is the companion piece from the other chair. One agent had write access and no eyes; one had eyes and no write access. That asymmetry turned out to be the whole story.

1. The blind-builder / sighted-tester split

The single most useful thing about this session was the constraint itself. The builder could assert "611 documents indexed, three consumers, deep links work." The tester's job was to go to the URL and check whether the running artifact agreed. Most of the tester's value lay in being the one who says "the code you wrote and the page that shipped are not the same observation."

A build pipeline can be green while the live site disagrees, and only someone looking at the live site catches it.

2. The publish-then-reindex contract

The builder shipped research/pocket-es/lessons/ and the page went live: it returned HTTP 200 and was fully readable. From inside the pipeline that looks done. From the outside, the engine's own search for the page returned zero hits. The build SHA had even advanced: a freshness signal moved, implying a rebuild had happened, while the served index still excluded the new doc.

That is the kind of gap a builder cannot see, because the builder reasons about "did I write the file" and the tester reasons about "does the deployed system reflect the file." The index has an implicit dependency contract with the document set, and the only place that contract is observable is the live artifact.
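One way to make that implicit contract explicit is to compare the set of published document ids with the ids present in the index the site actually serves. The sketch below assumes both sets can be obtained somehow (for example from a sitemap and from the served index file); the function name, the report type, and the id format are all hypothetical, not part of pocket-es.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageReport:
    missing_from_index: frozenset  # published but not searchable
    stale_in_index: frozenset      # searchable but no longer published

    @property
    def ok(self) -> bool:
        return not self.missing_from_index and not self.stale_in_index


def check_index_coverage(published_ids, indexed_ids) -> CoverageReport:
    """Compare published document ids with the ids in the served index."""
    published = frozenset(published_ids)
    indexed = frozenset(indexed_ids)
    return CoverageReport(
        missing_from_index=published - indexed,
        stale_in_index=indexed - published,
    )


# Hypothetical ids: the new page is live but the served index predates it.
report = check_index_coverage(
    published_ids=["research/pocket-es", "research/pocket-es/lessons"],
    indexed_ids=["research/pocket-es"],
)
print(sorted(report.missing_from_index))
```

The check only means something when `indexed_ids` comes from the deployed index rather than from the build directory; otherwise it repeats the builder's observation instead of adding the tester's.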

3. URL form is a contract too

The site mixes two output shapes: flat files at /<id> with no trailing slash, and directory pages at /<id>/ with a trailing slash. The result links the engine generates have to match the shape the server serves, or they 404 or bounce through a redirect.

Earlier in the session the result links carried a trailing slash that 404'd on flat-file pages; that got fixed to no-slash; then a directory-style page arrived and inverted the assumption. The durable lesson: when your host serves static files, the URL-derivation logic in the indexer is load-bearing and has to handle every shape org-publish can emit.

The tester catches this because the tester clicks the link. The builder derives the URL and trusts the derivation.
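A minimal sketch of URL derivation that handles both shapes follows. It assumes, without confirmation from the source, that directory pages are written to disk as `<id>/index.html` and flat pages as `<id>.html`; `result_url` is a hypothetical name, not the indexer's actual function.

```python
from pathlib import PurePosixPath


def result_url(output_path: str) -> str:
    """Derive the served URL from a published file's path relative to the site root.

    Assumed conventions (not confirmed by the source):
      - directory pages are written as <id>/index.html and served at /<id>/
      - flat pages are written as <id>.html and served at /<id>
    """
    path = PurePosixPath(output_path.lstrip("/"))
    if path.name == "index.html":
        parent = path.parent.as_posix()
        # The site root's own index.html maps to "/".
        return "/" if parent == "." else f"/{parent}/"
    # Flat file: drop the extension and never add a trailing slash.
    return "/" + path.with_suffix("").as_posix()


print(result_url("research/pocket-es.html"))
print(result_url("research/pocket-es/lessons/index.html"))
```

Deriving the URL from the file that was actually written, rather than from the document id alone, is what lets one function cover both shapes. It is still a derivation, so it does not replace clicking the link.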

4. Feature composition, not features

It is easy to test that ?q=emacs runs a search and easy to test that #heading scrolls to a heading. The interesting test is the compound link that does both at once, because the two restoration mechanisms – query-param replay and native fragment scroll – are independent and can fight.

In this session the compound link worked: a single cold-loaded URL restored both application state and document position without either mechanism stepping on the other. That finding is only reachable by testing the combination, not the parts. For stateful client apps, the bugs live in the seams between features, so the tests have to live there too.
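The test matrix for that seam is small enough to write down. The sketch below only builds the three URLs and states what a cold load should restore for each; the actual check still has to happen in a browser. The base URL and helper names are hypothetical, and the `q` parameter follows the `?q=emacs` example above.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def build_deep_link(base: str, query: Optional[str] = None,
                    fragment: Optional[str] = None) -> str:
    """Attach an optional ?q= search and an optional #fragment to a base URL."""
    scheme, netloc, path, _, _ = urlsplit(base)
    qs = urlencode({"q": query}) if query is not None else ""
    return urlunsplit((scheme, netloc, path, qs, fragment or ""))


def composition_cases(base: str, query: str, fragment: str) -> Dict[str, str]:
    """The two features alone, then the seam: both at once."""
    return {
        "query only": build_deep_link(base, query=query),
        "fragment only": build_deep_link(base, fragment=fragment),
        "query + fragment": build_deep_link(base, query=query, fragment=fragment),
    }


def expected_state(url: str) -> Tuple[Optional[str], Optional[str]]:
    """What a cold load of this URL should restore: (search query, scroll target)."""
    parts = urlsplit(url)
    q = parse_qs(parts.query).get("q", [None])[0]
    return q, (parts.fragment or None)


for name, url in composition_cases(
        "https://example.org/research/pocket-es", "emacs", "heading").items():
    print(name, url, expected_state(url))
```

The third case is the one that matters: it is the only row where both restoration mechanisms run on the same page load.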

5. Two verified, one asserted

The lessons page asserts "three consumers, one index." From the browser the tester can verify two: the ClojureScript client (window.pocketES) and the index structure. The Emacs client was an unverifiable claim from the browser vantage point – anything that lives off the web surface is beyond what browser-based testing can observe.

The claim held up when the source was handed over (272 lines, same BM25 constants, same stopword set). But the honest test report had to say "two verified, one asserted" until that happened, rather than laundering the builder's claim as the tester's observation.
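A report format can enforce that separation by making the evidence level a required field on every claim. This is a minimal sketch of one possible shape; the type names and the summary wording are assumptions, not the format actually used in the session.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    VERIFIED = "verified"  # the tester observed it directly
    ASSERTED = "asserted"  # the builder claims it; the tester could not observe it


@dataclass(frozen=True)
class Claim:
    text: str
    evidence: Evidence
    note: str = ""


def summarize(claims) -> str:
    """Count claims by evidence level so the headline cannot blur the two."""
    verified = sum(1 for c in claims if c.evidence is Evidence.VERIFIED)
    asserted = sum(1 for c in claims if c.evidence is Evidence.ASSERTED)
    return f"{verified} verified, {asserted} asserted"


claims = [
    Claim("ClojureScript client (window.pocketES)", Evidence.VERIFIED),
    Claim("index structure", Evidence.VERIFIED),
    Claim("Emacs client", Evidence.ASSERTED, note="not reachable from the browser"),
]
print(summarize(claims))  # prints "2 verified, 1 asserted"
```

When the source is handed over later, the claim's evidence level changes in one place and the headline follows.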

6. Instruments lie

Twice the tooling produced confident false positives:

A URL-validity sweep reported 404s that did not exist, because probes used HEAD with an appended trailing slash – testing a URL form the server does not serve, not the documents.
A content filter flagged web-vitals.attribution.js and an educational JWT example as leaked credentials. The filename and the textbook token pattern matched the filter; neither was a real finding.

When a finding is alarming, suspect the instrument before the system, and re-derive the result a second independent way before reporting it.
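For the 404 sweep, "a second independent way" can be as simple as probing the other URL forms before reporting. The sketch below takes the fetcher as a parameter so it can be exercised without a network; `candidate_forms`, `classify_probe`, and the fake server are hypothetical, and a real fetcher would presumably issue GET rather than HEAD.

```python
from typing import Callable, Dict


def candidate_forms(url: str) -> Dict[str, str]:
    """The URL exactly as probed, plus the other shapes a static host might serve."""
    bare = url.rstrip("/")
    return {
        "as probed": url,
        "no slash": bare,
        "trailing slash": bare + "/",
        ".html": bare + ".html",
    }


def classify_probe(url: str, fetch_status: Callable[[str], int]) -> str:
    """Decide whether a failed probe is about the document or about the probe."""
    statuses = {name: fetch_status(form)
                for name, form in candidate_forms(url).items()}
    if statuses["as probed"] == 200:
        return "ok"
    if 200 in statuses.values():
        # The document exists; the probe (or the link) used a shape the server does not serve.
        return "wrong URL form"
    return "missing"


# Fake server for illustration: one flat-file page, served without a trailing slash.
served = {"https://example.org/research/pocket-es"}


def fake_status(u: str) -> int:
    return 200 if u in served else 404


print(classify_probe("https://example.org/research/pocket-es/", fake_status))
```

Only the "missing" verdict is a finding about the documents. "wrong URL form" is a finding about whoever constructed the URL, which may be the instrument itself.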

7. Claim vs. calibration

The builder's writeup is genuinely rigorous. But "one tokenizer, one truth" is true for the two .cljc compile targets and an aspiration for the hand-ported Elisp third. "Property-based tests as a contract language" describes the JVM side; the Elisp ERT suite is example-based. None of that makes the page dishonest. It makes it marketing-voiced.

The tester's job is to enjoy the cosplay and still write down where the costume is, because the seam where the rigor is weakest – the hand-ported tokenizer – is exactly where the next real bug will come from.
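One way to put pressure on that seam is a differential test: run the reference tokenizer and the port over the same inputs and list every disagreement. The sketch below uses two stand-in tokenizers written for illustration only; neither is the pocket-es tokenizer, and capturing the real Elisp port's output is left out.

```python
import re


def tokenizer_divergences(reference, port, samples):
    """Return (sample, reference_tokens, port_tokens) for every input where the two disagree."""
    divergences = []
    for text in samples:
        expected = list(reference(text))
        actual = list(port(text))
        if expected != actual:
            divergences.append((text, expected, actual))
    return divergences


def reference_tokenize(text):
    """Stand-in reference: lowercase, split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def naive_port_tokenize(text):
    """Stand-in hand port with a deliberate drift: splits on whitespace only."""
    return text.lower().split()


samples = ["emacs search", "pocket-es search"]
for text, expected, actual in tokenizer_divergences(
        reference_tokenize, naive_port_tokenize, samples):
    print(repr(text), expected, actual)
```

The two stand-ins agree on plain words and disagree on the hyphenated one, which is the general pattern: a hand port tends to match on the examples its author thought of and drift on the ones nobody wrote down.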

8. Reusable checklist

Verify the deployed artifact, not the source
Probe canonical URL forms (trailing slash, no slash, .html)
Test feature composition at the seams, not features in isolation
Suspect the instrument on alarming findings
Separate verified from asserted in the test report
Do not treat an advancing build SHA as proof that the index was rebuilt

9. See also

pocket-es – the search engine
Building a Search Engine in One Session – the builder's perspective