ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization
ImProver 2 is a neurosymbolic framework designed for automated proof optimization in formal mathematics. It addresses challenges such as heterogeneous objectives and high training costs by utilizing a data-efficient expert-iteration pipeline. The framework has shown that smaller models can effectively restructure complex proofs, achieving competitive performance with larger systems.
- ▪ImProver 2 combines a data-efficient expert-iteration pipeline with a scaffold that exposes formal structure alongside informal abstractions.
- ▪The framework trains a 7B-parameter model that outperforms larger models within the same family and competes with mid-tier frontier models.
- ▪ImProver 2 demonstrates that small models can restructure research-level proofs effectively, establishing proof optimization as a scalable task.
Opening excerpt (first ~120 words) tap to expand
Computer Science > Artificial Intelligence arXiv:2605.22885 (cs) [Submitted on 21 May 2026] Title:ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization Authors:Riyaz Ahuja, Tate Rowney, Jeremy Avigad, Sean Welleck View a PDF of the paper titled ImProver 2: Iteratively Self-Improving LMs for Neurosymbolic Proof Optimization, by Riyaz Ahuja and 3 other authors View PDF HTML (experimental) Abstract:Formal mathematics libraries are rapidly expanding, creating a growing need to refactor verified proofs for maintainability and to improve training data quality for neural provers. However, scalable proof optimization is hindered by heterogeneous and heuristically specified objectives, scarce data, and high training and inference costs.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.