Anchor: Mitigating Artifact Drift in Agent Benchmark Generation
The paper introduces Anchor, a task-generation pipeline designed to mitigate artifact drift in AI agent benchmark generation. Anchor formalizes business workflow specifications into constraint optimization programs, from which it produces consistent, verifiable environments for training and evaluating AI agents. The authors also present ERP-Bench, a benchmark of 300 tasks for enterprise resource planning (ERP) systems, to demonstrate the effectiveness of the approach.
- Anchor mitigates artifact drift by formalizing specifications into constraint optimization programs.
- The pipeline generates natural-language instructions, environment configurations, and verifiable solutions.
- ERP-Bench consists of 300 long-horizon tasks covering procurement and manufacturing workflows.
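To make the idea of one formal specification yielding three artifacts (instruction, environment configuration, verifiable solution) concrete, here is a minimal Python sketch on a toy procurement task. It is not the authors' implementation: `Supplier`, `ProcurementSpec`, `plan_cost`, `solve`, `instruction`, `environment_config`, `verify`, and the example data are all hypothetical, and exhaustive search stands in for whatever solver Anchor actually uses. Because every artifact is rendered from the same spec object, the task text, the seeded environment, and the reference solution cannot disagree with one another.

```python
from dataclasses import dataclass, asdict
from itertools import product
from typing import Tuple


@dataclass(frozen=True)
class Supplier:
    name: str
    unit_price: int  # integer prices avoid floating-point comparison issues
    capacity: int    # maximum units this supplier can deliver


@dataclass(frozen=True)
class ProcurementSpec:
    # Hypothetical formal spec for one procurement task (not from the paper).
    part: str
    demand: int                      # units that must be procured exactly
    budget: int                      # hard spending limit
    suppliers: Tuple[Supplier, ...]


def plan_cost(spec, plan):
    """Return the plan's total cost, or None if it violates a hard constraint."""
    by_name = {s.name: s for s in spec.suppliers}
    if any(name not in by_name for name in plan):
        return None  # unknown supplier
    if any(q < 0 or q > by_name[name].capacity for name, q in plan.items()):
        return None  # negative order or over capacity
    if sum(plan.values()) != spec.demand:
        return None  # demand not met exactly
    cost = sum(q * by_name[name].unit_price for name, q in plan.items())
    return cost if cost <= spec.budget else None


def solve(spec):
    """Exhaustively search order quantities; return (best_plan, best_cost)."""
    best_plan, best_cost = None, None
    for qty in product(*(range(s.capacity + 1) for s in spec.suppliers)):
        plan = {s.name: q for s, q in zip(spec.suppliers, qty) if q > 0}
        cost = plan_cost(spec, plan)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_plan, best_cost = plan, cost
    return best_plan, best_cost


def instruction(spec):
    # Natural-language task text rendered from the spec.
    return (f"Procure {spec.demand} units of {spec.part} at the lowest total "
            f"cost without exceeding a budget of {spec.budget}.")


def environment_config(spec):
    # Seed data for a simulated ERP environment, taken from the same spec.
    return {"part": spec.part, "suppliers": [asdict(s) for s in spec.suppliers]}


def verify(spec, agent_plan):
    # Accept any feasible plan whose cost equals the optimum (ties allowed).
    _, best_cost = solve(spec)
    return best_cost is not None and plan_cost(spec, agent_plan) == best_cost


# Hypothetical toy spec; not taken from the paper or ERP-Bench.
example = ProcurementSpec(
    part="steel bracket",
    demand=10,
    budget=100,
    suppliers=(Supplier("A", 8, 6), Supplier("B", 9, 10), Supplier("C", 12, 10)),
)
```

In this toy instance the cheapest feasible plan buys 6 units from supplier A and 4 from B, for a total cost of 84. The verifier compares cost against the optimum rather than requiring an exact plan, so any tied optimal plan would be accepted. Exhaustive search only scales to toy sizes; a realistic pipeline would presumably hand the program to a dedicated constraint or optimization solver.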
Opening excerpt (first ~120 words):
arXiv:2605.26321 [cs.AI], submitted on 25 May 2026
Title: Anchor: Mitigating Artifact Drift in Agent Benchmark Generation
Authors: Maksim Ivanov, Abhijay Rana
Abstract: AI agents are beginning to complete valuable, long-horizon business operations tasks, but training and evaluation environments for enterprise work still struggle to balance realism, verifiability, and scale.
…
Excerpt limited to ~120 words for fair-use compliance. The full paper is available on arXiv (cs.AI).