pocket-es: Client-Side BM25 Search for org-mode Sites
The search box above is a BM25 search engine for this org-mode site. No server,
no daemon, no JVM at runtime. One fetch(), one JSON file, scoring in
ClojureScript.
The index was built at deploy time from every .org file on this site. The
client loads it, parses it, and exposes a Lucene-style query DSL:
match, term, bool, multi_match, prefix, plus suggest for
autocompletion.
1. Architecture
1.1. Build pipeline
The indexer is JVM Clojure (indexer.clj). It shares a tokenizer
(token.cljc) with the ClojureScript client — one tokenization contract,
no asymmetries between build and query time. The indexer extracts #+TITLE,
#+KEYWORDS, #+DESCRIPTION, #+DATE from every org file, tokenizes the
body, computes term frequencies (top 50 per doc), then IDF across the corpus,
and finally writes a single JSON file.
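The term-frequency and IDF steps above can be sketched in JavaScript. This is an illustration, not the code in indexer.clj: the function names topTermFrequencies and inverseDocumentFrequencies are hypothetical, and the IDF variant used here, ln(1 + (N - df + 0.5) / (df + 0.5)), is an assumption, since the text does not say which variant the indexer computes.

```javascript
// Hypothetical sketch of the build steps; names and IDF variant are assumptions.

// Count term occurrences in a token list and keep the `limit` most frequent.
function topTermFrequencies(tokens, limit = 50) {
  const counts = new Map();
  for (const t of tokens) counts.set(t, (counts.get(t) || 0) + 1);
  const top = [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // ties: alphabetical
    .slice(0, limit);
  return Object.fromEntries(top);
}

// Compute one IDF value per term from the per-document term maps.
function inverseDocumentFrequencies(docs) {
  const df = new Map(); // term -> number of documents containing it
  for (const doc of docs) {
    for (const term of Object.keys(doc.terms)) {
      df.set(term, (df.get(term) || 0) + 1);
    }
  }
  const n = docs.length;
  const idf = {};
  for (const [term, count] of df) {
    idf[term] = Math.log(1 + (n - count + 0.5) / (count + 0.5));
  }
  return idf;
}

// Two tiny made-up documents, already tokenized.
const docs = [
  { terms: topTermFrequencies(["bm25", "search", "search"]) },
  { terms: topTermFrequencies(["search", "index"]) },
];
const idf = inverseDocumentFrequencies(docs);
```

In this toy corpus "search" appears in both documents and "bm25" in only one, so "bm25" receives the higher IDF.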
1.2. Client and wire sizes
The client is 142KB of Closure-compiled ClojureScript (32KB gzipped). It loads the JSON index, builds an in-memory search structure, and exposes both a visual UI (self-injecting DOM) and a console API.
| Asset | Raw | Gzipped | Ratio |
|---|---|---|---|
| pocket-es.js | 142KB | 32KB | 4.4x |
| search-index.json | 824KB | 255KB | 3.2x |
| Total | 966KB | 287KB | 3.4x |
2. Interfaces
Five consumers share one JSON index and one BM25 scoring contract (k1=1.2,
b=0.75). Source lives at the project root in src/pocket_es/.
2.1. Browser
The <script> tag loads pocket-es.js, which self-injects into
#pocket-es-root or #content. The console API is available as
pocketES.search(...). See Query Examples below.
2.2. Emacs
pocket-es.el loads the JSON index, tokenizes queries with the same contract
as the ClojureScript client, and scores results with BM25. Results appear in a
*pocket-es* buffer with clickable paths.
;; Interactive search
M-x pocket-es-search RET lambda calculus RET

;; Programmatic access
(pocket-es-search "bedrock rag")
(pocket-es-suggest "clo")
2.3. Node CLI
The shadow-cljs :node-library target exports search, suggest, and
parseIndex as CommonJS functions.
$ npm run search -- "tla+" --size 3
3 results (614 docs indexed)
1. [1441.00] TLA+ for System Design
   2026-01-10 — /research/tla-plus-system-design
2. [1375.00] TLA+ Traffic Lights and Communication Protocol
   2024-08-11 — /research/tla+
3. [1133.00] E-Commerce Order State Machine in TLA+
   2026-01-11 — /research/tla-plus-system-design/ecommerce-order-states

$ npm run search -- "agen" --suggest
agent
agent architecture
agent coordination
agent framework
2.4. Babashka REPL
pocket-es.cli uses the shared token.cljc tokenizer and the BM25 formula
directly — no JSON string matching, no subprocess. search returns data.
$ bb -cp src -m pocket-es.cli "graphql federation" 3
3 results for "graphql federation" (614 docs indexed)
[15.1] Clojure + GraphQL Integration
2019-12 — /research/clojure-graphql
[15.0] GraphQL: Schema, Operations, and Federation
2024-08-11 — /research/graphql
[14.4] GraphQLConf 2025
2025-09-07 — /events/graphqlconf-2025
For REPL or property-based testing, search returns maps:
$ bb -cp src -e '
(require (quote [pocket-es.cli :as cli]))
(let [idx (cli/load-index "site/static/search-index.json")]
(map :title (cli/search idx "agent sandbox" :size 3)))
'
("Agent Sandbox Architectures" "Sandboxing AI Coding Agents with FreeBSD Jails"
"CLI Coding Agents — 2026 Q2 Comparison")
2.5. Console API
Available in the browser DevTools after the page loads:
pocketES.search({ query: { match: { _all: "crdt" } } })
pocketES.suggest({ text: "clo", size: 8 })
pocketES.cluster.health()
3. Query DSL
| Query type | Behavior |
|---|---|
| match | Tokenize, BM25 score per term, sum |
| term | Exact match on keyword/field array |
| bool | must/should/filter/must_not, intersect/union/filter |
| multi_match | Match across fields with boost weights |
| match_all | Return everything, score 1.0 |
| prefix | Prefix scan on string/array fields |
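As an illustration of the prefix row, a prefix scan over a field that may hold either a string or an array could look like the sketch below. The helper name prefixMatches, the case-insensitive comparison, and the sample documents are all assumptions made for the example; the actual implementation in core.cljs may differ.

```javascript
// Hypothetical prefix scan. A document matches when the field value (a string,
// or any element of an array) starts with the prefix, ignoring case.
function prefixMatches(docs, field, prefix) {
  const p = prefix.toLowerCase();
  return docs.filter((doc) => {
    const value = doc[field];
    const values = Array.isArray(value) ? value : value == null ? [] : [value];
    return values.some((v) => String(v).toLowerCase().startsWith(p));
  });
}

// Made-up sample documents, not taken from the real index.
const sampleDocs = [
  { path: "/a", title: "Clojure notes", keywords: ["clojure", "repl"] },
  { path: "/b", title: "Search internals", keywords: ["bm25", "closure-compiler"] },
  { path: "/c", title: "Untagged page" },
];

const byKeyword = prefixMatches(sampleDocs, "keywords", "clo").map((d) => d.path);
const byTitle = prefixMatches(sampleDocs, "title", "sea").map((d) => d.path);
```

Here byKeyword contains /a and /b (both have a keyword starting with "clo"), while byTitle contains only /b. Documents without the field simply never match.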
4. Scoring
BM25 with k1=1.2, b=0.75. The entire scoring function:
(defn bm25-term [term doc idf avg-dl]
  (let [tf    (get-in doc [:terms term] 0)
        dl    (:doc_len doc 1)
        idf-v (get idf term 0)
        numer (* tf (+ k1 1))
        denom (+ tf (* k1 (+ (- 1 b) (* b (/ dl avg-dl)))))]
    (if (zero? denom)
      0
      (* idf-v (/ numer denom)))))
IDF is precomputed at build time and looked up per term. Term frequency lives
in the per-document terms map.
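To make the arithmetic concrete, here is a JavaScript port of the scoring function above with one worked example. The port and the names K1, B, and bm25Term are for illustration only, and the IDF value of 2.0 and the document lengths are made-up inputs rather than values from the real index.

```javascript
// Illustrative port of bm25-term; constants match the stated k1 and b.
const K1 = 1.2;
const B = 0.75;

function bm25Term(term, doc, idf, avgDl) {
  const tf = (doc.terms && doc.terms[term]) || 0; // term frequency, default 0
  const dl = doc.doc_len ?? 1;                    // document length, default 1
  const idfV = idf[term] ?? 0;                    // precomputed IDF, default 0
  const numer = tf * (K1 + 1);
  const denom = tf + K1 * ((1 - B) + B * (dl / avgDl));
  return denom === 0 ? 0 : idfV * (numer / denom);
}

// Worked example: tf = 3 in a document of exactly average length.
//   numer = 3 * 2.2 = 6.6
//   denom = 3 + 1.2 * (0.25 + 0.75 * 1) = 4.2
//   score = 2.0 * 6.6 / 4.2, roughly 3.14
const exampleDoc = { terms: { clojure: 3 }, doc_len: 100 };
const exampleScore = bm25Term("clojure", exampleDoc, { clojure: 2.0 }, 100);

// A term that is absent from both the document and the IDF table scores 0.
const missingScore = bm25Term("haskell", exampleDoc, { clojure: 2.0 }, 100);
```

A longer-than-average document raises the denominator and lowers the score for the same term frequency, which is the length normalization that b controls.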
5. Query Examples
Click any block to execute it. Results appear inline and are logged to the DevTools console.
5.1. match — tokenize + BM25 score
pocketES.search({ query: { match: { title: "clojure" } } })
5.2. term — exact match on keyword field
pocketES.search({ query: { term: { keywords: "crdt" } } })
5.3. multi_match — across fields with boosts
pocketES.search({ query: { multi_match: {
query: "agent isolation",
fields: ["title^3", "description", "headings^2"]
} } })
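The "field^boost" notation in the fields array can be read as a per-field weight. The sketch below parses that notation and combines per-field scores as a boosted sum. The helper names parseFieldBoost and combineFieldScores are hypothetical, and summing across fields is an assumption: the text does not say whether pocket-es sums the field scores or takes the best one.

```javascript
// Hypothetical helper: split "title^3" into { field: "title", boost: 3 }.
// A field without a caret, or with an unusable boost, gets boost 1.
function parseFieldBoost(spec) {
  const caret = spec.lastIndexOf("^");
  if (caret === -1) return { field: spec, boost: 1 };
  const boost = Number(spec.slice(caret + 1));
  return {
    field: spec.slice(0, caret),
    boost: Number.isFinite(boost) && boost > 0 ? boost : 1,
  };
}

// Assumed combination rule: sum of boost * per-field score.
function combineFieldScores(fieldSpecs, scoreForField) {
  return fieldSpecs
    .map(parseFieldBoost)
    .reduce((sum, { field, boost }) => sum + boost * scoreForField(field), 0);
}

// Made-up per-field scores for a single document.
const fieldSpecs = ["title^3", "description", "headings^2"];
const perFieldScore = { title: 1.5, description: 0.5, headings: 1.0 };
const combined = combineFieldScores(fieldSpecs, (f) => perFieldScore[f] || 0);
// 3 * 1.5 + 1 * 0.5 + 2 * 1.0 = 7
```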
5.4. bool — must / should / must_not
pocketES.search({ query: { bool: {
must: [{ match: { title: "freebsd" } }],
should: [{ match: { title: "security" } }],
must_not: [{ term: { keywords: "python" } }]
} } })
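One way to picture how a bool query combines its clauses is as set operations over per-clause result maps. The following is a sketch under assumptions, not the code in core.cljs: the function name evalBool, the Map-of-docId-to-score shape, and the rule that should clauses only add score once a must or filter clause is present are all illustrative choices.

```javascript
// Each clause result is a Map of docId -> score (hypothetical shape).
function evalBool({ must = [], should = [], filter = [], must_not: mustNot = [] }) {
  const scores = new Map();
  let candidates = null;

  // must and filter clauses intersect the candidate set; only must adds score.
  for (const [clauses, scored] of [[must, true], [filter, false]]) {
    for (const result of clauses) {
      candidates = candidates === null
        ? new Set(result.keys())
        : new Set([...candidates].filter((id) => result.has(id)));
      if (scored) {
        for (const [id, s] of result) scores.set(id, (scores.get(id) || 0) + s);
      }
    }
  }

  // With no must/filter clauses, the union of should matches is the candidate set.
  if (candidates === null) {
    candidates = new Set(should.flatMap((r) => [...r.keys()]));
  }

  // should clauses add score to the documents they match.
  for (const result of should) {
    for (const [id, s] of result) scores.set(id, (scores.get(id) || 0) + s);
  }

  // must_not clauses remove documents outright.
  for (const result of mustNot) {
    for (const id of result.keys()) candidates.delete(id);
  }

  return [...candidates]
    .map((id) => ({ id, score: scores.get(id) || 0 }))
    .sort((a, b) => b.score - a.score);
}

// Made-up clause results for four documents a, b, c, d.
const boolHits = evalBool({
  must: [new Map([["a", 2], ["b", 1], ["c", 1.5]])],
  should: [new Map([["b", 3], ["d", 5]])],
  must_not: [new Map([["c", 0]])],
});
```

In this example d matches only the should clause and is excluded because it fails must, c is removed by must_not, and b outranks a thanks to the extra should score: the result is b (4) followed by a (2).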
6. State Machine
Nine user actions, five atoms, URL sync. Try/keyword clicks reset everything;
typing keeps the date filter; pagination keeps both. Full diagram source in
state-machine.dot.
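The three reset rules can be sketched as a small reducer. Everything named here is hypothetical: the state fields (query, dateFilter, page), the action types, and the choice to return to the first page when the query text changes are assumptions, and only three of the nine actions are covered. The real atoms and transitions are defined in ui.cljs and state-machine.dot.

```javascript
// Hypothetical state shape; the real UI uses five atoms.
const initialState = { query: "", dateFilter: null, page: 0 };

function reduceSearchState(state, action) {
  switch (action.type) {
    case "keyword-click": // also stands in for "Try" clicks: reset everything
      return { ...initialState, query: action.query };
    case "type": // new query text: keep the date filter, go back to page 0
      return { ...state, query: action.query, page: 0 };
    case "paginate": // keep both the query and the date filter
      return { ...state, page: action.page };
    default:
      return state;
  }
}

// Walk through the three rules with made-up values.
let uiState = { query: "crdt", dateFilter: "2025", page: 2 };
uiState = reduceSearchState(uiState, { type: "paginate", page: 3 });
const afterPaginate = uiState;
uiState = reduceSearchState(uiState, { type: "type", query: "crdt merge" });
const afterTyping = uiState;
uiState = reduceSearchState(uiState, { type: "keyword-click", query: "freebsd" });
const afterKeyword = uiState;
```

After pagination the query and date filter are unchanged, after typing only the date filter survives alongside the new query, and after a keyword click the date filter and page are both cleared.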
7. Stack
- Shared tokenizer: src/pocket_es/token.cljc (compiles to CLJ, CLJS, and bb)
- Indexer: src/pocket_es/indexer.clj (JVM Clojure; parses org files, computes BM25 IDF, emits JSON)
- Client: src/pocket_es/core.cljs (BM25 scoring, Lucene-style query DSL, index loading)
- UI: src/pocket_es/ui.cljs (self-injecting DOM, debounced input, date filters)
- CLI: src/pocket_es/cli.clj (bb/JVM entry point, data-returning search function)
- Tests: test/pocket_es/token_test.cljc (25 assertions including 300 property-based test iterations)