Research Ecosystem: Morning Brief

Two-week window across the tracked feeds (60 core feeds this run: agents, evals, interp, formal methods, surveillance, BSD, Clojure/Scheme, SDR, aviation), scored against active research threads. Metadata only: titles, links, dates. Read the source for substance. (what we track, how we crawl)

The arXiv cs.AI firehose is back after the weekend lull (654 items in window, the largest single source); yesterday's brief shed it as Sunday-quiet. The four generated gap-fill feeds (DeepMind, Cursor, Claude Code, Anthropic) remain in the crawl, with second-hand provenance flagged inline, though DeepMind has now aged out of the 14-day window (last post 05-01).

Top (5-7 min)

No honor among (ad-tech) thieves: Pluralistic, 2026-05-25. Doctorow on ad-tech firms defrauding each other, the surveillance-economy underbelly that the adversarial ad-tech thread tracks. Fresh today and the cleanest statement of how the incentives rot from the inside.
Claude Compliance API support with Cloudflare CASB: Cloudflare, 2026-05-21. Agent governance shipped as a CASB product, paired with Claude Managed Agents. The commercial mirror of the Walsh-Research compliance contract, from the infra seat rather than the spec seat.
Agentic software development hypothesis: Marc Brooker, 2026-05-20. States the agentic-SDLC claim as something falsifiable rather than assumed. The CPRR move applied to the loudest claim in the field this quarter.
Frontier Risk Report (February to March 2026): METR, 2026-05-19. Capability elicitation and risk evaluation from a named eval org. Primary material for the eval-under-constraints thread, the empirical counterweight to lab self-reporting.
Vega: zero-knowledge proofs for digital identity in the age of AI: Microsoft Research, 2026-05-21. ZK identity proofs pitched as the answer to agent-era verification. The constructive counterpart to the age-verification critique below; worth holding the two side by side.

Themes this week

Every model lab is now an agent lab: Latent Space says it outright; OpenAI's feed is wall-to-wall Codex (Gartner leader, Codex from anywhere); Cursor ships Composer 2.5; Microsoft ships MagenticLite and Fara. The product surface has converged on agents, which puts the weight on the harness, sandbox, and compliance layers rather than the model.
Identity and surveillance get an infrastructure layer: the same week brings Doctorow on ad-tech fraud, an FTC active-listening settlement, a Markup win on data brokers, and Microsoft's ZK identity proposal. The critique and the build-out are now arguing over the same substrate.
Eval realism is the contested ground: Alignment Forum argues for evaluating model behaviors and that risk reports must address deployment-time spread; METR ships a frontier risk report; Microsoft adds SocialReasoning-Bench. The move is from leaderboard deltas to whether an evaluation measures behavior that shows up under deployment.
Agent-build claims meet scrutiny: booster framing (Latent Space, OpenAI) meets empirical pushback (AI Snake Oil on the $916 OS, Torvalds on kernel bugs) and falsifiable claims (Brooker's hypothesis).

Scan (15 min)

Agents and harnesses
- All Model Labs are now Agent Labs, Latent Space, 05-23, the thesis stated outright
- Giving Agents Computers – Ivan Burazin, Daytona, Latent Space, 05-21, sandbox-as-substrate
- Railway: The Agent-Native Cloud – Jake Cooper, Latent Space, 05-20, infra repositioned around agents
- Datasette Agent, Simon Willison, 05-21, a concrete local-first agent over SQLite
- The Open Agent Leaderboard, Hugging Face, 05-18, an open eval surface for agents
- Expert Clojure Workflows for AI Agents: Four Skills, Planet Clojure, 05-14, agent skills in a Lisp shop
- Composer 2.5, Cursor, 05-18, Cursor's in-house agent model (generated feed)
- Claude Code v2.1.150, Claude Code releases, 05-23, the harness shipping multiple builds a day (generated feed)
AI labs
- OpenAI named a Leader in enterprise coding agents (Gartner), OpenAI, 05-22
- An OpenAI model disproved a central conjecture in discrete geometry, OpenAI, 05-20
- Speed-of-light text generation with Nemotron-Labs diffusion LMs, Hugging Face, 05-23
- Open model bonanza: Gemma 4, DeepSeek V4, Kimi K2.6, GLM-5.1, Interconnects, 05-16
- Empirical Research Assistance (ERA): from Nature publication to computational discovery, Google Research, 05-19
- Project Glasswing: an initial update, Anthropic, 05-22, the lab side of the Cloudflare red-team writeup (generated feed)
Eval, interpretability, safety
- The Case for Evaluating Model Behaviors, Alignment Forum, 05-20
- Risk reports need to address deployment-time spread of misalignment, Alignment Forum, 05-15
- The safe-to-dangerous shift is a fundamental problem for eval realism, Alignment Forum, 05-14
- SocialReasoning-Bench: whether agents act in users' best interests, Microsoft Research, 05-11
- Further Notes on AI Delegation and Long-Horizon Reliability, Microsoft Research, 05-15
- On AI Security, Schneier, 05-20
Formal methods, distsys, correctness
- When did the bug start?, Antithesis, 05-11, causality analysis on deterministic traces
- Assumptions weaken properties, Hillel Wayne, 05-20, the cost of every assumption in a spec
- What's Easy Now? What's Hard Now?, Marc Brooker, 05-18
- Chess invariants, Murat Demirbas, 05-21, invariants as a teaching device
- How we 40x'd the performance of a class of temporal queries, XTDB, 05-11
Surveillance and critique
- No honor among (ad-tech) thieves, Pluralistic, 05-25, the ad-tech economy defrauding itself
- FTC settlement over the "active listening" AI marketing claim, Simon Willison, 05-22
- Californians can more easily escape data brokers after a Markup investigation, The Markup, 05-22
- Bypassing On-Camera Age-Verification Checks, Schneier, 05-15, the technical case against age-gating
- Signal warns it would pull out of Canada over the lawful-access bill, Citizen Lab, 05-14
- The FBI wants to buy nationwide access to license plate readers, 404 Media, 05-18
Cloudflare and bot infrastructure
- Claude Compliance API support with Cloudflare CASB, Cloudflare, 05-21, agent governance as a product
- Announcing Claude Managed Agents on Cloudflare, Cloudflare, 05-19, a managed control plane for third-party agents
- Project Glasswing: what Mythos showed us, Cloudflare, 05-18, frontier-model red-team writeup
- When idle is not idle: a Linux kernel optimization became a QUIC bug, Cloudflare, 05-12
Systems, BSD, homelab
- OpenBSD 7.9 released, LWN, 05-21
- FreeBSD Foundation Executive Director tries daily-driving FreeBSD on a laptop, Phoronix, 05-24
- 664: No one misses SPARC, BSD Now, 05-21
- Which ZFS Storage Metrics Matter for Database Performance, Klara Systems, 05-13
- BPF support in GCC 16 and beyond, LWN, 05-21
SDR, RF, aviation
- GopherTrunk: a pure-Go trunked radio scanner (P25, DMR, TETRA, NXDN), RTL-SDR, 05-19
- MHI RJ to retrofit CRJ fleet with traffic-awareness (ADS-B In), The Air Current, 05-22
- ACAS X is already in the cockpit: the AI-and-ATC debate is three years too late, Leeham, 05-24
- Air France, Airbus guilty of corporate manslaughter in the 2009 AF447 crash, Slashdot, 05-23
Clojure and Scheme
- Clojure Deref (May 19, 2026), Planet Clojure, 05-19
- Clojure 1.12.5, Planet Clojure, 05-12
- Hoot 0.9.0 released, Spritely Institute, 05-13, Scheme to WebAssembly
- soot, solar, sedimentation, sin, and 'centers, Andy Wingo, 05-16
- Agent-Ready Stack, Planet Clojure, 05-11, a Lisp shop's agent tooling baseline

Tail

Pope Leo XIV's first encyclical says AI must serve humanity, not the powerful few, Hacker News, 05-25
California executive order directs agencies to prepare for AI-driven workforce disruption, Slashdot, 05-25
IBM spins off the first pure-play quantum chip foundry, Hacker News, 05-25
Jira Is Turing-Complete, Hacker News, 05-25
Microsoft open-sources the earliest DOS source code discovered to date, Hacker News, 05-24
The last six months in LLMs in five minutes, Simon Willison, 05-19

Feed silences (diagnostic)

arxiv-cs-ai: recovered. 276 items this run, 654 in the 14-day window, the single largest source. Yesterday's brief shed it as Sunday-quiet; the weekday announcement batch is back.
deepmind-blog (generated): aged out of the window. Latest post is 05-01, now outside 14 days, so the DeepMind gap-fill contributed nothing this run. Monthly cadence; lean on first-party labs until the next post.
Dan Luu, James Bornholt, Netflix Tech Blog: errors this run (XML parse error deep in the archive feed; host-side connection error; TLS path error). The Bornholt and Netflix errors look transient, so both feeds stay in place for a re-check; Dan Luu stays demoted.
harvard-seas: 0 items again (the Localist API returned no events this run).
Logic Magazine: still demoted (feed URL serves a Netlify 404 page). No longer crawled.
Generated feeds: cursor-blog, claude-code-releases, and anthropic-generated are current (HTTP 304 Not Modified); deepmind-blog is stale (see above). These are third-party, LLM-scraped RSS feeds for publishers with no first-party feed; provenance is second-hand, and they are on a C-004 stability watch.
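
The age-out call on deepmind-blog comes down to a date comparison against the build date. A minimal sketch, not the crawler's actual code: the name in_window, the inclusive boundary, and the reading of 05-01 as 2026-05-01 are assumptions made for illustration. The inclusive boundary is at least consistent with this brief, which still lists 05-11 items on a 05-25 build.

```python
from datetime import date, timedelta

WINDOW_DAYS = 14  # window length stated in the brief


def in_window(latest_post: date, build: date, window_days: int = WINDOW_DAYS) -> bool:
    """Return True if latest_post is at most window_days before build.

    Hypothetical helper: the inclusive boundary is an assumption,
    not something the crawler is confirmed to do.
    """
    return build - latest_post <= timedelta(days=window_days)


build = date(2026, 5, 25)  # build date from the provenance line

# deepmind-blog: latest post 05-01 (year assumed), 24 days before the build
print(in_window(date(2026, 5, 1), build))

# an item dated 05-11 sits exactly 14 days before the build
print(in_window(date(2026, 5, 11), build))
```

Under these assumptions the first check fails and the second passes, which matches the diagnostic above: the DeepMind gap-fill contributes nothing until its next post lands inside the window.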

Build provenance

build: 2026-05-25 | crawler-sha: 44e3db1 (Walsh-Research/1.2, compliance v1.2) | feeds: 60 core (incl. 4 generated gap-fill, 1 aged out) | items-considered: 1059 (14d, incl. 654 arXiv) | published: 49 | note: arXiv firehose recovered post-weekend; DeepMind generated feed aged out of window; 3 feed errors (Dan Luu/Bornholt/Netflix)