Org-Mode Feed Generation: Atom and RSS from Static Sites
1. Overview
An org-mode site has one corpus of .org files and one parser
(wal-sh.site.org/read-all). Six tools project this corpus into six
artifacts for six audiences. The parser extracts #+TITLE, #+DATE,
#+DESCRIPTION, #+KEYWORDS, headings, and property drawers. Each
tool consumes the same parse output and discards what it doesn't need.
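As a rough illustration of the extraction step, the sketch below pulls the header keywords and headings out of one org file. It is a Python stand-in, not the actual read-all implementation: the function name parse_org, the regular expressions, and the whitespace-separated treatment of #+KEYWORDS are all assumptions, and property drawers are left out for brevity.

```python
import re

# Hypothetical stand-in for the header/heading part of read-all.
HEADER_RE = re.compile(r"^#\+(TITLE|DATE|DESCRIPTION|KEYWORDS):\s*(.*)$", re.IGNORECASE)
HEADING_RE = re.compile(r"^\*+\s+(.*)$")


def parse_org(text, path):
    """Return one record per file: header keywords, headings, raw content."""
    record = {"path": path, "headings": [], "content": text}
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            key = header.group(1).lower()
            value = header.group(2).strip()
            # Assumption: keywords are separated by whitespace.
            record[key] = value.split() if key == "keywords" else value
            continue
        heading = HEADING_RE.match(line)
        if heading:
            record["headings"].append(heading.group(1).strip())
    return record


sample = """#+TITLE: Morning Brief
#+DATE: 2026-06-19
#+KEYWORDS: fable-arc agent-trust
* Top
** Themes this week
"""
record = parse_org(sample, "site/current/2026-06-19.org")
```

Each downstream tool would then pick the keys it needs from a record like this and ignore the rest.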
2. Six projections, one parser
| Tool | Output | Consumer | What it reads |
|---|---|---|---|
| pocket-es.indexer | search-index.json | Browser (BM25 search) | title, description, keywords, headings, body terms |
| wal-sh.site.feed | site.atom | Feed readers (elfeed, NetNewsWire) | title, date, description, path |
| scripts/generate_sitemap.py | sitemap.xml | Search engines (Googlebot) | path, date (lastmod) |
| wal-sh.site.check-headers | lint report | Developer (CI / gmake lint) | title, author, date, keywords presence |
| wal-sh.site.annotations | verification report | Auditor (daily-publish, REPL) | property drawers (:VERIFIED_AT:, :VERDICT:) |
| wal-sh.site.provenance | provenance mapping | Agent (ProvenanceGuard) | property drawers (:SOURCE:, :VERIFIED_BY:) |
The parser is the invariant. The projections vary. Adding a seventh
tool (say, a citation graph extractor) requires only a new consumer
of read-all, not a new parser.
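The "one parser, many projections" idea can be sketched as a table of small functions over the same record. This is an illustrative Python sketch only: the function names, the PROJECTIONS registry, and the output shapes are assumptions, not the real tools' interfaces.

```python
def feed_projection(rec):
    """Fields the feed reads: title, date, description, path."""
    return {key: rec.get(key) for key in ("title", "date", "description", "path")}


def sitemap_projection(rec):
    """Fields the sitemap reads: path and date (as lastmod)."""
    return {"loc": rec["path"], "lastmod": rec.get("date")}


def missing_headers(rec):
    """Header keys the lint step expects but the record lacks."""
    return [key for key in ("title", "author", "date", "keywords") if not rec.get(key)]


# Hypothetical registry: each consumer is one function over the parse output.
PROJECTIONS = {
    "feed": feed_projection,
    "sitemap": sitemap_projection,
    "check-headers": missing_headers,
}


def project_all(records, projections=PROJECTIONS):
    """Run every projection over the same list of parsed records."""
    return {name: [fn(rec) for rec in records] for name, fn in projections.items()}


records = [{
    "path": "site/current/2026-06-19.org",
    "title": "Morning Brief: Thursday, June 19",
    "date": "2026-06-19",
    "keywords": ["fable-arc", "agent-trust"],
    "description": "Research brief: ...",
    "headings": ["Top (5-7 min)", "Themes this week"],
}]
artifacts = project_all(records)
```

Under this layout a seventh tool is one more entry in the registry; nothing about the parse step changes.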
3. Architecture
site/**/*.org
    │
    └── wal-sh.site.org/read-all ──┬──────────┬──────────┬──────────┬──────────┐
                                   │          │          │          │          │
                                   ▼          ▼          ▼          ▼          ▼
                                indexer     feed      sitemap    headers  annotations
                                   │          │          │          │          │
                                   ▼          ▼          ▼          ▼          ▼
                              index.json  site.atom sitemap.xml   lint      drawers
4. The three published feeds
| Feed | URL | What it contains | Generated by |
|---|---|---|---|
| site.atom | https://wal.sh/site.atom (/feed redirects here) | Our published content (last 14 days) | wal-sh.site.feed |
| research.atom | https://wal.sh/current/research.atom | Outbound links the crawler curated | tech-crawler /www-sync |
| Pinboard RSS | https://feeds.pinboard.in/rss/u:jwalsh/ | Phone/browser bookmarks | Pinboard (external) |
Three audiences: what we publish, what we read, what we bookmark.
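The "last 14 days" window on site.atom amounts to a date filter over the parsed records. The sketch below shows one way to express it in Python; the function name recent_entries, the inclusive boundaries, and the newest-first ordering are assumptions, since the actual rule used by wal-sh.site.feed is not shown here.

```python
from datetime import date, timedelta


def recent_entries(records, today, days=14):
    """Keep records dated within the last `days` days, newest first.

    Assumption: both ends of the window are inclusive and undated
    records are dropped.
    """
    cutoff = today - timedelta(days=days)
    kept = [
        rec for rec in records
        if rec.get("date") and cutoff <= date.fromisoformat(rec["date"]) <= today
    ]
    # ISO dates sort correctly as plain strings.
    return sorted(kept, key=lambda rec: rec["date"], reverse=True)


records = [
    {"path": "site/current/2026-06-05.org", "date": "2026-06-05"},
    {"path": "site/current/2026-06-19.org", "date": "2026-06-19"},
    {"path": "site/current/2026-06-04.org", "date": "2026-06-04"},
    {"path": "site/about.org"},
]
window = recent_entries(records, today=date(2026, 6, 19))
```

Passing today explicitly, rather than calling date.today() inside the function, keeps the filter deterministic and easy to test.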
5. Shared parser contract
Every tool depends on the same map shape from org/read-all:
{:path        "site/current/2026-06-19.org"
 :title       "Morning Brief: Thursday, June 19"
 :date        "2026-06-19"
 :author      "Jason Walsh"
 :keywords    ["fable-arc" "agent-trust"]
 :description "Research brief: ..."
 :headings    ["Top (5-7 min)" "Themes this week" ...]
 :content     "raw org text..."
 ;; property drawers (from annotations scanner):
 :drawers     [{:heading "Top" :verified_at "..." :verdict "correct"}]}
If this shape changes, all six tools break. The parser is the
contract. Property-based tests in pocket-es.indexer and
wal-sh.site.check-headers verify the shape holds across the
full corpus.
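A plain per-record shape check gives a feel for what such tests guard. The Python sketch below is simpler than a property-based test and is not taken from either tool: the REQUIRED table and the choice to treat every key as mandatory are assumptions made for illustration.

```python
# Assumed contract: key name -> expected Python type. Which keys are truly
# mandatory is an assumption here, not something the parser guarantees.
REQUIRED = {
    "path": str,
    "title": str,
    "date": str,
    "author": str,
    "keywords": list,
    "description": str,
    "headings": list,
    "content": str,
}


def shape_errors(rec):
    """Return a list of human-readable contract violations for one record."""
    errors = []
    for key, expected in REQUIRED.items():
        if key not in rec:
            errors.append(f"missing {key}")
        elif not isinstance(rec[key], expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(rec[key]).__name__}"
            )
    return errors


good = {
    "path": "site/current/2026-06-19.org",
    "title": "Morning Brief: Thursday, June 19",
    "date": "2026-06-19",
    "author": "Jason Walsh",
    "keywords": ["fable-arc", "agent-trust"],
    "description": "Research brief: ...",
    "headings": ["Top (5-7 min)", "Themes this week"],
    "content": "raw org text...",
}
# A drifted record: keywords became a string and content was dropped.
bad = {k: v for k, v in good.items() if k != "content"}
bad["keywords"] = "fable-arc agent-trust"
```

Running shape_errors over every record in the corpus and failing on the first non-empty result would surface a contract change before any consumer sees it.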
6. Feed validation
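Both site-generated feeds are checked with xmllint. Without a schema,
--noout confirms only that the XML is well-formed, which is a
precondition for RFC 4287 (Atom 1.0) validity rather than proof of it:
xmllint --noout site/site.atom && echo "site.atom: well-formed"
xmllint --noout site/current/research.atom && echo "research.atom: well-formed"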
Feed-level elements (RFC 4287 Section 4.1.1): <feed> must contain
<title>, <id>, and <updated>; <author> is required unless every
entry carries its own; <link rel="self"> is recommended (SHOULD).
The <generator> tag carries the version:
<generator uri="https://wal.sh/research/pocket-es/" version="1.0.0">wal-sh.site.feed</generator>
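A structural check for the feed-level elements can be written with Python's standard xml.etree.ElementTree. This is a minimal sketch, not part of the site's tooling: the function name missing_feed_elements is hypothetical, the sample feed values are made up for the example, and the check covers only the RFC 4287 Section 4.1.1 rules for title, id, updated, author, and the self link.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def missing_feed_elements(xml_text):
    """List feed-level Atom elements that are absent from the document."""
    root = ET.fromstring(xml_text)
    if root.tag != ATOM + "feed":
        return ["feed"]
    missing = [name for name in ("title", "id", "updated")
               if root.find(ATOM + name) is None]
    # atom:author is required on the feed unless every entry has its own.
    entries = root.findall(ATOM + "entry")
    feed_author = root.find(ATOM + "author") is not None
    entry_authors = all(e.find(ATOM + "author") is not None for e in entries)
    if not feed_author and not entry_authors:
        missing.append("author")
    # A rel="self" link is a SHOULD in the RFC; reported here as well.
    if not any(link.get("rel") == "self" for link in root.findall(ATOM + "link")):
        missing.append('link rel="self"')
    return missing


# Sample documents for illustration only.
good_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>wal.sh</title>
  <id>https://wal.sh/</id>
  <updated>2026-06-19T00:00:00Z</updated>
  <author><name>Jason Walsh</name></author>
  <link rel="self" href="https://wal.sh/site.atom"/>
</feed>"""

bad_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>wal.sh</title>
  <id>https://wal.sh/</id>
  <entry><title>No author here</title></entry>
</feed>"""
```

For files on disk, ET.parse(path).getroot() would replace the ET.fromstring call. A check like this complements xmllint, which on its own only reports whether the XML parses.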
7. Staleness
The feed, index, and sitemap all go stale when an org file changes.
gmake publish-daily regenerates all three:
gmake publish-daily
# Runs: publish-current → publish-today → index → deploy-index → sitemap
The Makefile uses gmake's own timestamp comparison for the org publish
step (.published/current/research.org stamp file). The index and
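sitemap are still phony targets (always rebuild). Making them
non-phony with dependencies on the org files is tracked in a separate
bead. Note that $(wildcard site/**/*.org) would not be enough on its
own: GNU make's wildcard treats ** like *, so it matches only one
directory level.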
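The timestamp comparison that a non-phony target would rely on can be sketched outside of make. The Python below is an illustration under assumptions: is_stale and org_sources are hypothetical helper names, and the rule shown (rebuild when the artifact is missing or any source is newer) mirrors make's usual behaviour rather than anything specific to this Makefile.

```python
from pathlib import Path


def org_sources(root="site"):
    """All .org files under root, at any depth."""
    return sorted(Path(root).rglob("*.org"))


def is_stale(artifact, sources):
    """True if the artifact is missing or older than any source file."""
    artifact = Path(artifact)
    if not artifact.exists():
        return True
    built = artifact.stat().st_mtime
    return any(Path(src).stat().st_mtime > built for src in sources)


if __name__ == "__main__":
    # Example: report which generated files would need a rebuild.
    for target in ("site/site.atom", "site/sitemap.xml"):
        state = "stale" if is_stale(target, org_sources()) else "fresh"
        print(f"{target}: {state}")
```

Path.rglob walks the whole tree, which is the recursive behaviour the ** pattern is meant to express.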
8. Related
- pocket-es – the search engine that shares the same parser
- Query Surface Spec – the JSON DSL for date-range queries (overlaps with feed purpose)
- Annotation Systems – property drawer conventions the annotations scanner reads
- Bot Compliance Spec – how we crawl the feeds we consume
- Crawler Sources – the 48+ feeds that produce research.atom