1. Overview

An org-mode site has one corpus of .org files and one parser (wal-sh.site.org/read-all). Six tools project this corpus into six artifacts for six audiences. The parser extracts #+TITLE, #+AUTHOR, #+DATE, #+DESCRIPTION, #+KEYWORDS, headings, and property drawers. Each tool consumes the same parse output and discards what it does not need.

2. Six projections, one parser

Tool	Output	Consumer	What it reads
`pocket-es.indexer`	`search-index.json`	Browser (BM25 search)	title, description, keywords, headings, body terms
`wal-sh.site.feed`	`site.atom`	Feed readers (elfeed, NetNewsWire)	title, date, description, path
`scripts/generate_sitemap.py`	`sitemap.xml`	Search engines (Googlebot)	path, date (lastmod)
`wal-sh.site.check-headers`	lint report	Developer (CI / gmake lint)	title, author, date, keywords presence
`wal-sh.site.annotations`	verification report	Auditor (daily-publish, REPL)	property drawers (:VERIFIED_AT:, :VERDICT:)
`wal-sh.site.provenance`	provenance mapping	Agent (ProvenanceGuard)	property drawers (:SOURCE:, :VERIFIED_BY:)

The parser is the invariant. The projections vary. Adding a seventh tool (say, a citation graph extractor) requires only a new consumer of read-all, not a new parser.
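The "one parser, many projections" idea can be sketched in a few lines of Python. This is an illustration only: the real tools are separate programs, and `RECORDS`, `project_feed`, `project_sitemap`, and `run_all` are hypothetical names standing in for the output of read-all and two of its consumers.

```python
from typing import Callable

# Hypothetical stand-in for the output of wal-sh.site.org/read-all:
# one dict per .org file, using the field names from the table above.
RECORDS = [
    {
        "path": "site/current/2026-06-19.org",
        "title": "Morning Brief: Thursday, June 19",
        "date": "2026-06-19",
        "description": "Research brief",
        "keywords": ["fable-arc", "agent-trust"],
        "headings": ["Top (5-7 min)", "Themes this week"],
    }
]


def project_feed(record: dict) -> dict:
    """Keep only the fields the feed generator reads."""
    return {k: record[k] for k in ("title", "date", "description", "path")}


def project_sitemap(record: dict) -> dict:
    """Keep only path and date; the date becomes lastmod."""
    return {"loc": record["path"], "lastmod": record["date"]}


# A seventh tool is one more entry in this table, not a new parser.
PROJECTIONS: dict[str, Callable[[dict], dict]] = {
    "feed": project_feed,
    "sitemap": project_sitemap,
}


def run_all(records: list[dict]) -> dict[str, list[dict]]:
    """Apply every projection to the same parse output."""
    return {name: [fn(r) for r in records] for name, fn in PROJECTIONS.items()}


OUTPUT = run_all(RECORDS)
```

Each projection ignores the fields it does not need, so adding a field to the parse output does not disturb existing consumers.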

3. Architecture

site/**/*.org
     │
     └── wal-sh.site.org/read-all ──┬──────────┬──────────┬──────────┬──────────┬────────────┐
                                     │          │          │          │          │            │
                                     ▼          ▼          ▼          ▼          ▼            ▼
                                  indexer     feed     sitemap    headers   annotations  provenance
                                     │          │          │          │          │            │
                                     ▼          ▼          ▼          ▼          ▼            ▼
                               index.json  site.atom  sitemap.xml  lint     drawers        mapping

4. The three published feeds

Feed	URL	What it contains	Generated by
`site.atom`	`https://wal.sh/site.atom` (`/feed` redirects here)	Our published content (last 14 days)	`wal-sh.site.feed`
`research.atom`	`https://wal.sh/current/research.atom`	Outbound links the crawler curated	tech-crawler `/www-sync`
Pinboard RSS	`https://feeds.pinboard.in/rss/u:jwalsh/`	Phone/browser bookmarks	Pinboard (external)

Three feeds, three kinds of content: what we publish, what we read, and what we bookmark.
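The 14-day window on site.atom could be applied along these lines. This is a sketch under assumptions: `recent_entries` is a hypothetical helper, the cutoff is treated as inclusive, and entries are ordered newest first; the actual behavior of wal-sh.site.feed may differ on all three points.

```python
from datetime import date, timedelta


def recent_entries(records, today, days=14):
    """Return records dated within the last `days` days, newest first."""
    cutoff = today - timedelta(days=days)
    kept = [r for r in records if date.fromisoformat(r["date"]) >= cutoff]
    # ISO 8601 date strings sort chronologically as plain strings.
    return sorted(kept, key=lambda r: r["date"], reverse=True)
```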

5. Shared parser contract

Every tool depends on the same map shape from org/read-all:

{:path     "site/current/2026-06-19.org"
 :title    "Morning Brief: Thursday, June 19"
 :date     "2026-06-19"
 :author   "Jason Walsh"
 :keywords ["fable-arc" "agent-trust"]
 :description "Research brief: ..."
 :headings ["Top (5-7 min)" "Themes this week" ...]
 :content  "raw org text..."
 ;; property drawers (from annotations scanner):
 :drawers  [{:heading "Top" :verified_at "..." :verdict "correct"}]}

If this shape changes, all six tools break. The parser is the contract. Property-based tests in pocket-es.indexer and wal-sh.site.check-headers verify the shape holds across the full corpus.
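A shape check of this kind can be sketched in Python. It is not the project's own test code (those tests live in the tools named above); `REQUIRED` and `shape_errors` are hypothetical names, and treating every key as mandatory is an assumption drawn from the example map.

```python
# Expected keys and value types, taken from the example map above.
# Assumption: every key is mandatory for every record.
REQUIRED = {
    "path": str,
    "title": str,
    "date": str,
    "author": str,
    "keywords": list,
    "description": str,
    "headings": list,
    "content": str,
    "drawers": list,
}


def shape_errors(record: dict) -> list[str]:
    """Return one message per missing or wrongly typed key."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in record:
            errors.append(f"missing key: {key}")
        elif not isinstance(record[key], expected):
            actual = type(record[key]).__name__
            errors.append(f"{key}: expected {expected.__name__}, got {actual}")
    return errors
```

Running `shape_errors` over every record in the corpus and asserting that each result is empty gives a corpus-wide contract check.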

6. Feed validation

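Both site-generated feeds are Atom 1.0 (RFC 4287). xmllint --noout confirms that each file is well-formed XML; it does not check the Atom-specific requirements listed below:

xmllint --noout site/site.atom && echo "site.atom: well-formed"
xmllint --noout site/current/research.atom && echo "research.atom: well-formed"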

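Feed-level elements (RFC 4287 Section 4.1.1): <feed> must contain <title>, <id>, and <updated>, plus <author> unless every entry carries its own. <link rel="self"> is recommended (SHOULD) rather than required.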
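The element list above can be checked with the Python standard library. A minimal sketch, assuming the feed declares the standard Atom namespace; `missing_feed_elements` and `SAMPLE` are hypothetical, and the check looks only at feed-level children, so it does not handle the RFC 4287 case where authors appear on entries instead of on the feed.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def missing_feed_elements(xml_text: str) -> list[str]:
    """Return the names of expected feed-level elements that are absent."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{ATOM}feed":
        return ["feed"]
    missing = [
        name
        for name in ("title", "id", "updated", "author")
        if root.find(f"{ATOM}{name}") is None
    ]
    links = root.findall(f"{ATOM}link")
    if not any(link.get("rel") == "self" for link in links):
        missing.append('link rel="self"')
    return missing


# Illustrative feed, not the real contents of site.atom.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>wal.sh</title>
  <id>https://wal.sh/</id>
  <updated>2026-06-19T00:00:00Z</updated>
  <author><name>Jason Walsh</name></author>
  <link rel="self" href="https://wal.sh/site.atom"/>
</feed>"""
```

An empty list from `missing_feed_elements(SAMPLE)` means every listed element is present.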

The <generator> tag carries the version: <generator uri="https://wal.sh/research/pocket-es/" version="1.0.0">wal-sh.site.feed</generator>

7. Staleness

The feed, index, and sitemap all go stale when an org file changes. gmake publish-daily regenerates all three:

gmake publish-daily
# Runs: publish-current → publish-today → index → deploy-index → sitemap

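The Makefile uses gmake's own timestamp comparison for the org publish step (.published/current/research.org stamp file). The index and sitemap are still phony targets, so they always rebuild. Making them non-phony with $(wildcard site/**/*.org) dependencies is tracked in a separate bead; note that GNU make's $(wildcard) does not treat ** as a recursive glob, so nested directories would need something like $(shell find site -name '*.org').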
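The timestamp comparison that a non-phony target would rely on can be sketched in Python. This is an illustration of the staleness rule, not part of the Makefile; `is_stale` is a hypothetical helper, and the paths passed to it are up to the caller.

```python
from pathlib import Path


def is_stale(artifact: Path, source_root: Path) -> bool:
    """True if the artifact is missing or older than any .org file below source_root."""
    if not artifact.exists():
        return True
    built = artifact.stat().st_mtime
    # rglob walks subdirectories, so nested .org files are included.
    return any(src.stat().st_mtime > built for src in source_root.rglob("*.org"))
```

For example, `is_stale(Path("site/sitemap.xml"), Path("site"))` would report whether any org file changed after the sitemap was last written.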

8. Related

pocket-es – the search engine that shares the same parser
Query Surface Spec – the JSON DSL for date-range queries (overlaps with feed purpose)
Annotation Systems – property drawer conventions the annotations scanner reads
Bot Compliance Spec – how we crawl the feeds we consume
Crawler Sources – the 48+ feeds that produce research.atom

Org-Mode Feed Generation: Atom and RSS from Static Sites
