Research Ecosystem: Morning Brief

Two-week window across the tracked feeds (now 50+, including surveillance, BSD, Clojure/Scheme, distsys, and interp-evals), scored against active research threads. Metadata only: titles, links, dates. Read the source for substance.

Top (5-7 min)

Did Google's AI agents really build an operating system for $916?

AI Snake Oil, 2026-05-22. Empirical pushback on an agent-build headline. Here the critique thread meets the agent-harness thread head-on, which is exactly the scrutiny the brief lacked before any critique feeds were wired in.

Frontier Risk Report (February to March 2026)

METR, 2026-05-19. Capability elicitation and risk evaluation from a named eval org. Primary material for the eval-under-constraints thread.

The Case for Evaluating Model Behaviors

Alignment Forum, 2026-05-20. Argues for behavior evaluation over benchmark scores. Anchors this week's eval-realism theme below.

Agentic software development hypothesis

Marc Brooker, 2026-05-20. States the agentic-SDLC claim as something falsifiable. Read it alongside Hillel Wayne's post on assumption-weakening from the same week.

It is easier for Californians to escape data brokers after a Markup investigation

The Markup, 2026-05-22. Methodology-driven accountability reporting with a concrete policy outcome. The surveillance thread, finally crawled.

Themes this week

Eval realism is the open problem

Alignment Forum makes the case for evaluating model behaviors and notes that the safe-to-dangerous shift breaks eval realism; METR ships a frontier risk report. Three sources converge on the same shift: away from leaderboard deltas and toward whether an evaluation measures the behavior that matters in deployment.

Agent-build claims meet scrutiny

Latent Space declares every model lab an agent lab; AI Snake Oil asks whether Google's agents really built an OS for $916; Brooker frames agentic development as a hypothesis. The discourse is splitting cleanly into booster framing, empirical pushback, and falsifiable claims.

Scan (15 min)

Surveillance and critique

Tail

Feed silences (diagnostic)

  • arxiv-cs-ai and Cloudflare: fetch error this run (both are large, slow feeds and hit the 8s probe timeout). Both normally crawl fine, so this is transient, not an outage.
  • Logic Magazine: 404, genuinely broken. Candidate to repair or demote.
  • transformer-circuits.pub (403), Anthropic research and news (RSS dropped): still demoted. Anthropic shows up via Cloudflare and HN instead.
  • Dan Luu returned 200 with 0 parsed items; worth checking the Atom shape.
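A minimal sketch of the triage rules in the bullets above, assuming a probe that reports an HTTP status (None when the request times out, as with the 8s limit) plus a count of parsed items. The function names and labels are hypothetical, not the crawler's actual code. It also shows one possible cause of a 200 with 0 parsed items: Atom entries live in the Atom XML namespace, so a parser that looks only for bare RSS-style tags finds nothing in a perfectly healthy Atom feed.

```python
import xml.etree.ElementTree as ET

# Atom elements are namespaced; ElementTree spells them "{namespace}tag".
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def count_entries(xml_text: str) -> int:
    """Count feed entries, accepting RSS 2.0 <item> or Atom <entry> (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # RSS 2.0 has no namespace: <rss><channel><item>...; iter() searches all depths.
    rss_items = list(root.iter("item"))
    if rss_items:
        return len(rss_items)
    # Atom: a bare "entry" lookup would never match, so use the namespaced tag.
    return len(list(root.iter(ATOM_NS + "entry")))


def triage(status, item_count: int) -> str:
    """Map one probe outcome to a diagnostic label (labels are illustrative)."""
    if status is None:
        return "transient: timeout"          # e.g. slow large feed hit the probe limit
    if status == 404:
        return "broken: repair or demote"
    if status == 403:
        return "blocked: demote"
    if status == 200 and item_count == 0:
        return "suspect: check feed shape"   # fetched fine, but nothing parsed
    if status == 200:
        return "ok"
    return f"unexpected: HTTP {status}"


atom = '<feed xmlns="http://www.w3.org/2005/Atom"><entry><title>a</title></entry></feed>'
naive = len(ET.fromstring(atom).findall(".//entry"))  # namespace-blind lookup
print(naive, count_entries(atom), triage(200, naive))
```

The namespace-blind lookup reports zero entries for the one-entry Atom document, which is exactly the "200 with 0 parsed items" pattern, while the namespace-aware count finds it. RSS 1.0 (RDF) feeds use yet another namespace and are not handled by this sketch.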

Build provenance

build: 2026-05-23 | crawler-sha: 37a15fc | feeds: 51 crawled | items-considered: 252 (14d) | published: 26