Morning Brief: Tuesday, June 17

Two-week window across 48 tracked feeds, scored against active research threads. Metadata only: titles, links, dates. Read the source for substance.

Vercel goes all-in on agents with three launches in one day: Eve (open-source agent framework), the Agent Stack, and Vercel Connect. GLM-5.2 lands via Hugging Face, explicitly targeting long-horizon agentic tasks. The Fable arc enters day 10 with the first red-team study of Fable 5 and Opus 4.8 appearing on arXiv. Meanwhile, a position paper argues coding benchmarks are fundamentally misaligned with agentic software engineering, and a separate paper finds "oracle signals" hiding in agent-authored test code. OpenAI publishes on predicting model behavior before release by simulating deployment. 297 arXiv cs.AI papers today.

Top (5-7 min)

Introducing Eve: an open-source agent framework
Vercel, 2026-06-17. Vercel ships Eve alongside the Agent Stack and Vercel Connect. The three coordinated launches signal that agent hosting is now a first-class platform concern, not an afterthought.
GLM-5.2: Built for Long-Horizon Tasks
Hugging Face Blog, 2026-06-17. Zhipu AI releases GLM-5.2, which targets multi-step agentic workflows. Latent Space calls it the top frontend coding model in the world.
A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models
arXiv, 2026-06-17. Fable arc day 10: academic red-teaming arrives. The models that triggered export controls and cybersecurity protests now have a structured adversarial evaluation on the record.
Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering
arXiv, 2026-06-17. Argues that current coding evaluations measure the wrong things given how agents actually write software. Pairs with the "oracle signals" paper below, which examines the test code agents write.
Predicting model behavior before release by simulating deployment
OpenAI, 2026-06-16. OpenAI's approach to pre-release safety testing via deployment simulation. Cross-posted to Alignment Forum.

Themes this week

Fable/Mythos arc (day 10)
The arc so far: launch (06-09), invisible guardrails (06-10), Anthropic apologizes (06-11), proactivity bias (06-12), government suspension (06-13), geopolitical fallout (06-14), D.C. cleanup (06-15), cyber defense protest (06-16), red-team study on arXiv (06-17). With the red-team paper, the story crosses from industry and policy reaction into academic evaluation: it is the first structured adversarial assessment of the models at the center of the storm.
Agent frameworks go mainstream
Vercel's triple launch (Eve, Agent Stack, Connect) lands alongside arXiv work on Distributed General-Purpose Agent Networks, PreAct (agents that get faster on repeated tasks), A Framework for Evaluating Agentic Skills at Scale, and SEAGym (an evaluation environment for self-evolving agents). The tooling layer is maturing fast.
Agent safety and deception
All Smoke, No Alarm: Oracle Signals in Agent-Authored Test Code; Rift: A Conflict Signature for Deception in Language Models; Decoding Hidden Deception in Reasoning LLMs; ProvenanceGuard: Source-Aware Factuality for MCP-Based Agents; SkillJect: Prompt Injection for Skill-Enabled Agents; PseudoBench: How Agentic Auto-Research Fuels Pseudoscience; and Towards Understanding and Measuring Cognitive Atrophy in LLMs. A rich cluster. The "oracle signals" finding is particularly sharp: agent-generated tests can embed information that makes them pass without actually testing anything.
Coding agents under scrutiny
Coding Benchmarks Misaligned with Agentic SE; Software Delegation Contracts: Measuring Reviewability; Unlocking LLM Code Correction with Iterative Feedback; and LoopCoder-v2: Only Loop Once for Efficient Test-Time Scaling. The question shifts from "can agents code?" to "can we review what they produce?"

Scan (15 min)

Tail

Feed silences (diagnostic)

  • arxiv-cs-ai: 297 items on 06-17 (3102 in window), back to weekday cadence after Monday's 600-paper dump.
  • anthropic-generated: last item 06-12.
  • claude-code-releases: v2.1.179 (06-16), new release since last brief.
  • Apple ML Research: last item 06-08.
  • deepmind-blog: silent since 06-01.
  • Ink & Switch: 1 item in window (06-05).
  • Microsoft Research: last item 06-12.
  • AI Snake Oil: last item 06-11.

Build provenance

build: 2026-06-17 | crawler-sha: 508e4ab (Walsh-Research/1.2, compliance v1.3) | feeds: 48 core | items-considered: 4586 (14d, incl. 3102 arXiv) | warehouse: 15798 items | published: 35 | note: Vercel triple agent launch (Eve/Stack/Connect); GLM-5.2 long-horizon; Fable arc day 10 red-team on arXiv; agent deception cluster (oracle signals, Rift, ProvenanceGuard); coding benchmarks vs agentic SE position paper; OpenAI deployment simulation