Research Ecosystem: Morning Brief
Two-week window across the tracked feeds (now 50+, including surveillance, BSD, Clojure/Scheme, distsys, and interp-evals), scored against active research threads. Metadata only: titles, links, dates. Read the source for substance. (what we track, how we crawl)
Top (5-7 min)
Did Google's AI agents really build an operating system for $916?
AI Snake Oil, 2026-05-22. Empirical pushback on an agent-build headline. The critique thread meets the agent-harness thread head on, which is exactly the scrutiny the brief lacked while no critique feeds were wired in.
Frontier Risk Report (February to March 2026)
METR, 2026-05-19. Capability elicitation and risk evaluation from a named eval org. Primary material for the eval-under-constraints thread.
The Case for Evaluating Model Behaviors
Alignment Forum, 2026-05-20. Argues for behavior evaluation over benchmark scores. Anchors this week's eval-realism theme below.
Agentic software development hypothesis
Marc Brooker, 2026-05-20. The agentic-SDLC claim stated as something falsifiable. Sits against Hillel Wayne's "Assumptions weaken properties" from the same week.
It is easier for Californians to escape data brokers after a Markup investigation
The Markup, 2026-05-22. Methodology-driven accountability reporting with a concrete policy outcome. The surveillance thread, finally crawled.
Themes this week
Eval realism is the open problem
Alignment Forum makes the case for evaluating model behaviors and notes that the safe-to-dangerous shift breaks eval realism; METR ships a frontier risk report. Three items converge on the same shift: from leaderboard deltas to whether an evaluation measures the behavior that matters under deployment.
Agent-build claims meet scrutiny
Latent Space declares every model lab an agent lab; AI Snake Oil asks whether Google's agents really built an OS for $916; Brooker frames agentic development as a hypothesis. The discourse is splitting cleanly into booster framing, empirical pushback, and falsifiable claims.
Scan (15 min)
Agents and LLM tooling
- New AI infra unicorns: Exa, Modal, TurboPuffer, Latent Space, 05-22
- Datasette Agent, Simon Willison, 05-21
- FTC settles with Cox Media over "Active Listening" AI marketing, Simon Willison, 05-22
AI labs
- Speed-of-light text generation with Nemotron-Labs diffusion LMs, Hugging Face, 05-23
- OpenAI named a Leader in enterprise coding agents (Gartner), OpenAI, 05-22
Formal methods, distributed systems, correctness
- Assumptions weaken properties, Hillel Wayne, 05-20
- Chess invariants, Murat Demirbas, 05-21
- What's Easy Now? What's Hard Now?, Marc Brooker, 05-18
Mechanistic interpretability and evals
- Risk reports need to address deployment-time spread of misalignment, Alignment Forum, 05-15
- Open model bonanza: Gemma 4, DeepSeek V4, Kimi K2.6, GLM-5.1, Interconnects, 05-16
Surveillance and critique
- CISA Security Leak, Schneier on Security, 05-22
- macOS Kernel Memory Corruption Exploit, Schneier on Security, 05-21
- How Deepfakes Tore a High School Apart, 404 Media, 05-21
- Shopping isn't politics, Pluralistic, 05-21
Systems, BSD, hardware
- A large set of stable kernel updates, LWN, 05-23
- 664: No one misses SPARC, BSD Now, 05-21
Aviation
- The 787: when the system that makes change-incorporation possible was dismantled, Leeham News, 05-18
- Airbus' 27-year march to a new airplane, Leeham News, 05-21
Clojure and Scheme
- Finding a file's MIME type with Apache Tika, in Clojure, Planet Clojure, 05-22
Tail
- How Virgin Atlantic ships faster with Codex, OpenAI, 05-22
- Specialization Beats Scale, Hugging Face, 05-22
- Behind the Blog: The Attention Wars, 404 Media, 05-22
- Unitree GO-M8018-6 motor reverse engineering, Hackaday, 05-23
- The memory shortage is repricing consumer electronics, Simon Willison, 05-22
Feed silences (diagnostic)
- arxiv-cs-ai and Cloudflare: fetch error this run (slow large feeds hit the 8s probe timeout). Both crawl fine normally; transient, not down.
- Logic Magazine: 404, genuinely broken. Candidate to repair or demote.
- transformer-circuits.pub (403), Anthropic research and news (RSS dropped): still demoted. Anthropic shows up via Cloudflare and HN instead.
- Dan Luu returned 200 with 0 parsed items; worth checking the Atom shape.
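The triage above can be sketched as a small classifier over probe outcomes. This is a minimal illustration, not the crawler's actual code: `ProbeResult`, `classify`, and the label strings are hypothetical names, and the mapping from status to label is an assumption inferred from the notes above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProbeResult:
    """One feed probe outcome (hypothetical shape, not the crawler's own type)."""
    feed: str
    status: Optional[int]      # HTTP status, or None if the fetch never completed
    timed_out: bool = False    # True when the probe hit its timeout (8s in the brief)
    parsed_items: int = 0      # items the parser extracted from the response body


def classify(r: ProbeResult) -> str:
    """Map a probe outcome to a diagnostic label (labels are assumptions)."""
    if r.timed_out or r.status is None:
        return "transient"     # fetch error this run; retry before demoting
    if r.status == 404:
        return "broken"        # candidate to repair or demote
    if r.status in (401, 403):
        return "blocked"       # server refuses the crawler; stays demoted
    if r.status == 200 and r.parsed_items == 0:
        return "parse-check"   # fetched fine but nothing parsed; inspect the feed shape
    if 200 <= r.status < 300:
        return "ok"
    return "unknown"


results = [
    ProbeResult("arxiv-cs-ai", None, timed_out=True),
    ProbeResult("Logic Magazine", 404),
    ProbeResult("transformer-circuits.pub", 403),
    ProbeResult("Dan Luu", 200, parsed_items=0),
]
labels = {r.feed: classify(r) for r in results}
```

Keeping "transient" separate from "broken" is the point of the sketch: a timeout on a large feed should trigger a retry, not a demotion.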
Build provenance
build: 2026-05-23 | crawler-sha: 37a15fc | feeds: 51 crawled | items-considered: 252 (14d) | published: 26
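For anyone consuming the provenance line programmatically, a minimal parsing sketch follows. It assumes the format is always pipe-separated `key: value` pairs as shown above; `parse_provenance` is a hypothetical helper, not part of the crawler.

```python
def parse_provenance(line: str) -> dict[str, str]:
    """Split a 'key: value | key: value' provenance line into a dict of strings."""
    fields: dict[str, str] = {}
    for part in line.split("|"):
        # Split on the first colon only, so values may contain further colons.
        key, _, value = part.partition(":")
        fields[key.strip()] = value.strip()
    return fields


line = ("build: 2026-05-23 | crawler-sha: 37a15fc | feeds: 51 crawled | "
        "items-considered: 252 (14d) | published: 26")
prov = parse_provenance(line)
```

Values are left as raw strings (for example `"51 crawled"`), since the line mixes counts with annotations and any numeric conversion would be a further assumption.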