Agentic Proving for Program Verification
The paper examines how far agentic systems, already strong at automated theorem proving in formal mathematics, carry over to program verification. It evaluates Claude Code within an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation. The results show high success rates in generating valid specifications and certifying implementations, and point to a need for better evaluation methodologies in program verification.
- Agentic systems are emerging as effective tools for automated theorem proving in formal mathematics.
- Claude Code achieved a 98.8% validity rate for specifications and an 87.5% certification rate against correct specifications.
- The study emphasizes the mismatch between current program verification benchmarks and the capabilities of modern agentic provers.
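To make "specification" and "certification" concrete, the following is a minimal Lean 4 sketch of the general pattern: a specification stated as a proposition over candidate implementations, an implementation, and a proof that the implementation meets the specification. It is an illustration only. The names `myMax`, `maxSpec`, and `myMax_correct` are hypothetical, and the example is not taken from the paper or from the CLEVER benchmark, whose actual problem format may differ.

```lean
-- Hypothetical illustration; not a CLEVER problem.

-- Implementation: the larger of two natural numbers.
def myMax (a b : Nat) : Nat := if a ≤ b then b else a

-- Specification: the result bounds both inputs and equals one of them.
def maxSpec (impl : Nat → Nat → Nat) : Prop :=
  ∀ a b, a ≤ impl a b ∧ b ≤ impl a b ∧ (impl a b = a ∨ impl a b = b)

-- Certification: a machine-checked proof that the implementation
-- satisfies the specification.
theorem myMax_correct : maxSpec myMax := by
  unfold maxSpec
  intro a b
  unfold myMax
  -- Case-split on the `if` condition, then close each case
  -- with linear arithmetic over Nat.
  split <;> omega
```

In this framing, a specification is only useful if it actually captures the intended behavior, and an implementation counts as certified only once a proof like `myMax_correct` is accepted by the Lean checker.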
arXiv:2605.23772 (cs), submitted on 22 May 2026
Title: Agentic Proving for Program Verification
Authors: Alessandro Sosso, Akhil Arora, Bas Spitters
Abstract: Agentic systems have recently emerged as state-of-the-art approaches for automated theorem proving in formal mathematics. To assess how far these capabilities extend to program verification, we evaluate Claude Code in an agentic proving framework on CLEVER, a Lean 4 benchmark for verifiable code generation.
…