PAgE 2026 Keynote — Can Coding Agents Write Verifiably Correct Software?

1. Talk
2. Link set
3. Core content
4. Paper lineage (Lu group, LLM-for-Verus)
5. Verified-systems projects (the VeruSAGE-Bench targets)
6. Related researchers
7. Lineage graph
8. References

1. Talk

Thesis: have LLM agents emit formal correctness proofs alongside code, discharged by Verus (SMT-based verifier for Rust). Reported result is a frontier-model-gated yes on repository-level systems code.

Scope of the claim (refutation boundary): the demonstrated capability is proof synthesis against fixed, human-authored specifications. The specification remains the trusted head of the TCB; the agent supplies the proof, not the spec. Claim is refuted as an end-to-end correctness result if the spec is also agent-authored and unvalidated.

2. Link set

Talk / program: https://pldi26.sigplan.org/home/page-2026
Live stream (PAgE, Meadows B): https://www.youtube.com/watch?v=V0wjNL6gAG8
Project umbrella (MSR, Practical System Verification): https://www.microsoft.com/en-us/research/project/practical-system-verification/
Verus (verifier): https://github.com/verus-lang/verus
LLM-for-Verus code + benchmarks: https://github.com/microsoft/verus-proof-synthesis
Leaderboard: https://microsoft.github.io/verus-proof-synthesis/
Verus publications & projects index: https://verus-lang.github.io/verus/publications-and-projects/

3. Core content

3.1. Verus (substrate)

SMT-based verifier for Rust (Lattuada et al. 2024). Specs and proofs are written in Rust itself; linear ghost types and the borrow checker carry ownership and separation into proofs. TCB = top-level specification + Verus verifier + SMT solver (Z3) + Rust compiler. Semantic foundation for the proof-oriented type extensions: VerusBelt (Hance et al. 2026).

3.2. Two systems, two benchmarks

System	Target	Benchmark	Tasks
AutoVerus	small algorithmic code	Verus-Bench	150
VeruSAGE	code in large real systems projects	VeruSAGE-Bench	849

AutoVerus (Yang, Li, et al. 2025) pipeline: inference -> refinement -> repair (few-shot candidates, targeted refinement, verification-error debugging). VeruSAGE (Yang, Neamtu, et al. 2025): agentic observation-reasoning-action loop.

3.2.1. Verus-Bench (150, algorithm-level)

Source	Tasks	Description
CloverBench	11	Classic CS examples
MBPP	78	Formal specification problems
Diffy	38	Array / loop programs
Misc	23	Verus tutorial examples

3.2.2. VeruSAGE-Bench (849, repository-level)

Project	Tasks	Domain
Anvil (AL)	104	Distributed systems
Anvil-Advanced	63	Distributed systems
IronKV (IR)	118	Key-value store
Memory Allocator	89	Systems
Node Replication	29	Distributed systems
NRKernel (NR)	204	OS kernel
ATMO (OS)	157	Microkernel
Storage (ST)	63	Storage systems
Vest (VE)	22	Serialization

3.3. Cost / capability findings (from the lineage)

Proof-generation capability is held by expensive frontier models (e.g. Claude Sonnet 4.5), not cheap models (e.g. o4-mini).
VeruSyn (Di et al. 2026) responds to this with data synthesis: ~6.9M verified Rust programs (each spec + proof) to lift a small model.

4. Paper lineage (Lu group, LLM-for-Verus)

AutoVerus — OOPSLA'25 — arXiv:2409.13082 / https://doi.org/10.1145/3763174
Automated Proof Generation via Self-Evolution — ICLR'25 — arXiv:2410.15756
VeriStruct (data-structure modules) — arXiv:2510.25015
What's in a Proof? (expert proof-writing in F* and Verus) — arXiv:2508.02733
VeruSAGE — Dec'25 — arXiv:2512.18436
Reducing Costs of Proof Synthesis (VeruSyn) — Feb'26 — arXiv:2602.04910
ExVerus (Yang et al. 2026) (proof repair via counterexample reasoning) — ICML'26 — arXiv:2603.25810

Adjacent track (spec/code co-synthesis, trust anchor moves):

AlphaVerus (bootstrapping verified code generation) — ICML'25
Vericoding benchmark — arXiv:2509.22908

5. Verified-systems projects (the VeruSAGE-Bench targets)

Project	Repo / source	Venue
Anvil	github.com/vmware-research/verifiable-controllers	OSDI'24 (best)
IronKV	github.com/verus-lang/verified-ironkv (ex IronFleet)	—
Memory Allocator	github.com/verus-lang/verified-memory-allocator	(mimalloc-based)
Node Replication	github.com/verus-lang/verified-node-replication	—
NRKernel	github.com/utaal/verified-nrkernel	—
Atmosphere (ATMO)	github.com/mars-research/atmosphere	SOSP'25
Storage (PoWER)	github.com/microsoft/verified-storage	OSDI'25 (dist.)
Vest	github.com/secure-foundations/vest	USENIX Sec'25
VeriSMo (related)	github.com/microsoft/verismo	OSDI'24 (best)

All targets are published, peer-reviewed artifacts with human-authored specs (e.g. Anvil (Sun et al. 2024)).

6. Related researchers

6.1. Speaker / LLM-for-Verus

Shan Lu (Microsoft; U Chicago) — speaker, group lead
Chenyuan Yang (UIUC) — AutoVerus / VeruSAGE first author
Shuvendu K. Lahiri (MSR) — intent formalization
Shuai Lu, Yeyun Gong, Peng Cheng (MSR) — data synthesis
Shraddha Barke (MSR) — PAgE chair; expert proof-writing study

6.2. Verus core / systems

Bryan Parno (CMU, secure-foundations) — PI
Andrea Lattuada (MPI-SWS) — Verus lead
Travis Hance (CMU) — Verus, VerusBelt
Chris Hawblitzel, Jacob R. Lorch (MSR) — IronFleet -> Verus systems lineage
Jon Howell, Oded Padon, Tej Chajed — Verus foundations
Derek Dreyer, Yusuke Matsushita (MPI-SWS, Iris) — VerusBelt semantic foundation
Hayley LeBlanc — verified storage (PoWER)
Anton Burtsev, Vikram Narayanan (mars-research) — Atmosphere

6.3. Backfill seam (overlaps with BugBash 2026 notes)

IronFleet lineage (Hawblitzel, Lorch) -> distributed-systems correctness; cross-reference correct-by-construction and Jepsen/Antithesis as the empirical-testing complement to mechanized proof.

7. Lineage graph

8. References

Di, Nongyu, Tianyu Chen, Shan Lu, Shuai Lu, Yeyun Gong, Peng Cheng, Jacob R. Lorch, Yuan Yao, and Xiaoxing Ma. 2026. “Reducing the Costs of Proof Synthesis on Rust Systems by Scaling up a Seed Training Set.”

Hance, Travis, Laila Elbeheiry, Yusuke Matsushita, and Derek Dreyer. 2026. “Verusbelt: A Semantic Foundation for Verus’s Proof-Oriented Extensions to the Rust Type System.” In Proc. Acm Pldi.

Lattuada, Andrea, Travis Hance, Jay Bosamiya, Matthias Brun, Chanhee Cho, Hayley LeBlanc, Pranav Srinivasan, et al. 2024. “Verus: A Practical Foundation for Systems Verification.” In Proceedings of the Acm Sigops 30th Symposium on Operating Systems Principles (Sosp ’24). https://doi.org/10.1145/3694715.3695952.

Sun, Xudong, Wenjie Ma, Jiawei Tyler Gu, Zicheng Ma, Tej Chajed, Jon Howell, Andrea Lattuada, et al. 2024. “Anvil: Verifying Liveness of Cluster Management Controllers.” In Proc. Usenix Osdi.

Yang, Chenyuan, Xuheng Li, Md Rakib Hossain Misu, Jianan Yao, Weidong Cui, Yeyun Gong, Chris Hawblitzel, Shuvendu K. Lahiri, et al. 2025. “Autoverus: Automated Proof Generation for Rust Code.” Proc. Acm Program. Lang. 9 (OOPSLA2). https://doi.org/10.1145/3763174.

Yang, Chenyuan, Natalie Neamtu, Chris Hawblitzel, Jacob R. Lorch, and Shan Lu. 2025. “Verusage: A Study of Agent-Based Verification for Rust Systems.”

Yang, Jun, Yuechun Sun, Yi Wu, Rodrigo Caridad, Yongwei Yuan, Jianan Yao, Shan Lu, and Kexin Pei. 2026. “Exverus: Verus Proof Repair via Counterexample Reasoning.”