Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work
The article describes an educational approach to teaching AI through benchmark construction, built around a benchmark called QuestBench. Rather than treating AI only as a productivity tool, students write expert-level questions and use them to evaluate AI systems, which deepens their understanding of AI's role in knowledge work. The evaluation shows that the AI systems tested pass only a small fraction of these questions, which underscores the importance of critical evaluation in AI education.
- QuestBench consists of 256 questions across 14 humanities and social-science domains.
- Across thirteen AI systems, the mean question-level pass rate on QuestBench was only 16.85%.
- The best-performing system, GPT-5.5, achieved a pass rate of 57.58%, indicating significant room for improvement.
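The gap between the mean pass rate (16.85%) and the best system's rate (57.58%) means most systems score far below the leader. The minimal sketch below shows how a per-system pass rate, a mean across systems, and a best system can be derived from a pass/fail matrix. It assumes each (system, question) pair is graded simply as pass or fail; the paper's actual grading protocol is not described in this summary, and the system names and results are hypothetical toy values, not QuestBench data.

```python
from statistics import mean

# Hypothetical toy data (NOT QuestBench results):
# pass_matrix[system][i] is True if that system passed question i.
pass_matrix = {
    "system_a": [True, False, False, True],
    "system_b": [False, False, False, True],
    "system_c": [False, False, False, False],
}


def per_system_pass_rate(matrix):
    """Return the fraction of questions each system passed."""
    # sum() counts True as 1 and False as 0.
    return {name: sum(results) / len(results) for name, results in matrix.items()}


def mean_pass_rate(matrix):
    """Average the per-system pass rates.

    When every system is graded on the same number of questions, this equals
    the pooled pass rate over all (system, question) pairs.
    """
    return mean(per_system_pass_rate(matrix).values())


def best_system(matrix):
    """Return (name, pass_rate) for the highest-scoring system."""
    rates = per_system_pass_rate(matrix)
    name = max(rates, key=rates.get)
    return name, rates[name]


if __name__ == "__main__":
    for name, rate in per_system_pass_rate(pass_matrix).items():
        print(f"{name}: {rate:.2%}")
    print(f"mean: {mean_pass_rate(pass_matrix):.2%}")
    print("best:", best_system(pass_matrix))
```

In this toy matrix, one system passes half the questions while the mean stays at a quarter, which mirrors in miniature how a single stronger system can sit well above the average.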
Opening excerpt (first ~120 words)
arXiv:2605.21413 (cs) [Submitted on 20 May 2026 (v1), last revised 21 May 2026 (this version, v2)]
Authors: Haiyang Shen, Jiuzheng Wang, Taian Guo, Mugeng Liu, Wenchun Jing, Chongyang Pan, Siqi Zhong, Zhiyang Chen, Weichen Bi, Yudong Han, Xiaoying Bai, Yun Ma
Abstract: As AI becomes part of everyday learning, many courses teach students to use it mainly as a productivity tool: how to prompt, search, summarize, write, code,…
Excerpt limited to ~120 words for fair-use compliance. The full article is available on arXiv (cs.AI).