JobBench: Aligning Agent Work With Human Will
The paper introduces JobBench, a new benchmark that evaluates AI agents against human needs rather than economic value. It covers 130 tasks across 35 occupations and focuses on work that enhances human capabilities rather than replacing human workers. The authors hope to shift the AI community's focus towards building agents that do the work humans actually want to delegate.
- JobBench evaluates AI agents on high-priority workflows identified by experts.
- The benchmark includes 130 agentic tasks across 35 different occupations.
- Outputs from the AI agents are graded using an average of 35.6 binary criteria per task.
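To make the grading idea concrete, here is a minimal sketch of how binary criteria could be turned into a score. The summary only states that each task is graded against binary criteria (35.6 on average); the `Criterion` class, the function names, and the choice to report the fraction of criteria passed and then average across tasks are all assumptions made for illustration, not the paper's documented scoring procedure.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One binary rubric item (hypothetical structure, not from the paper)."""
    description: str
    passed: bool


def task_score(criteria: list[Criterion]) -> float:
    """Fraction of binary criteria an agent's output satisfies (assumed aggregation)."""
    if not criteria:
        raise ValueError("a task needs at least one criterion")
    return sum(c.passed for c in criteria) / len(criteria)


def benchmark_score(tasks: dict[str, list[Criterion]]) -> float:
    """Unweighted mean of per-task scores (assumed; the paper may weight tasks differently)."""
    if not tasks:
        raise ValueError("no tasks to score")
    return sum(task_score(c) for c in tasks.values()) / len(tasks)


# Hypothetical example data: two tasks with made-up criteria.
example = {
    "task_a": [
        Criterion("cites the requested source", True),
        Criterion("output is in the requested format", True),
        Criterion("totals are arithmetically correct", False),
        Criterion("deadline is stated", True),
    ],
    "task_b": [
        Criterion("summary covers all sections", True),
        Criterion("no unsupported claims", False),
    ],
}
```

Under these assumptions, `task_score(example["task_a"])` is the share of the four criteria that passed, and `benchmark_score(example)` averages the two per-task shares.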
arXiv:2605.26329 (Computer Science > Artificial Intelligence), submitted on 25 May 2026
Title: JobBench: Aligning Agent Work With Human Will
Authors: Yuetai Li, Yichen Feng, Zhangchen Xu, Zixian Ma, Kaiyuan Zheng, Fengqing Jiang, Xinghua Sun, Rulin Shao, Zichen Chen, Yue Huang, Xinyang Han, Brian Lee, Kayla Xu, Shenglai Zeng, Hang Hua, Xiangliang Zhang, Basel Alomair, Ranjay Krishna, Luke Zettlemoyer, Pang Wei Koh, Bhaskar Ramasubramanian, Luyao Niu, Xiang Yue, Radha Poovendran
Abstract: Current benchmarks for occupational AI agents are scoped primarily by economic values, telling a replacement story.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.