BehaviorBench: Modeling Real-World User Decisions from Behavioral Traces
The paper introduces BehaviorBench, a benchmark for evaluating personalized decision modeling on real-world behavioral traces. It addresses a limitation of existing benchmarks, which often rely on simulated user behavior. The study reports that personalization improves belief prediction more than trade prediction across the evaluation metrics.
- BehaviorBench reconstructs wallet-level decision histories from public prediction-market and on-chain records.
- The benchmark includes 141,445 belief instances and 1,485,972 trade instances across 2,000 evaluation wallets.
- Personalization improves belief prediction consistently, while model rankings vary across task layers and metrics.
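To make the idea of a wallet-level decision history concrete, here is a minimal Python sketch. It is not the paper's pipeline: the record fields (`wallet`, `timestamp`, `market`, `side`) and both helper functions are hypothetical, chosen only to illustrate grouping flat trade records by wallet, ordering them in time, and pairing each decision with the decisions that preceded it.

```python
from collections import defaultdict


def build_wallet_histories(records):
    """Group flat trade records by wallet and order each group by time.

    The record schema here is an assumption for illustration only.
    """
    histories = defaultdict(list)
    for rec in records:
        histories[rec["wallet"]].append(rec)
    for trades in histories.values():
        trades.sort(key=lambda r: r["timestamp"])
    return dict(histories)


def make_instances(history, min_context=1):
    """Pair each decision with the wallet's earlier decisions as context."""
    return [
        (history[:i], history[i])
        for i in range(min_context, len(history))
    ]


# Toy records (made up); real traces would come from public market data.
records = [
    {"wallet": "0xA", "timestamp": 3, "market": "m2", "side": "sell"},
    {"wallet": "0xB", "timestamp": 1, "market": "m1", "side": "buy"},
    {"wallet": "0xA", "timestamp": 1, "market": "m1", "side": "buy"},
    {"wallet": "0xA", "timestamp": 2, "market": "m1", "side": "buy"},
]

histories = build_wallet_histories(records)
instances = make_instances(histories["0xA"])
```

Each element of `instances` is a `(context, target)` pair, so a personalized model would see only a wallet's own past decisions when predicting its next one. A wallet with a single record, such as `0xB` here, yields no instances under `min_context=1`.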
Opening excerpt (first ~120 words)
arXiv:2606.02798 [cs.AI], submitted on 1 Jun 2026
Authors: Liangwei Yang, Jielin Qiu, Zixiang Chen, Ming Zhu, Juntao Tan, Zhiwei Liu, Wenting Zhao, Zhujun Lan, Akshara Prabhakar, Silvio Savarese, Huan Wang, Shelby Heinecke
Abstract: Many decision-support settings require systems that adapt to individual users, but evaluation data for this problem remain limited.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.