OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling
The paper introduces OmniToM, a benchmark that evaluates Theory of Mind (ToM) capabilities in large language models (LLMs) through explicit belief modeling. It highlights a limitation of current evaluation methods: they focus solely on end-point question answering, which may not accurately reflect a model's reasoning abilities. By requiring models to explicitly model belief structures, OmniToM aims to provide a more comprehensive assessment of LLMs' understanding of social dynamics.
- OmniToM evaluates the ability of LLMs to infer knowledge, intentions, and emotions of actors within narratives.
- The benchmark consists of two stages: Belief Extraction and Belief Labeling, assessing how well models track beliefs.
- Current LLMs face challenges in transforming narrative facts into actors' beliefs and shared mental states.
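
To make the two stages named above more concrete, here is a minimal Python sketch of what scoring explicit beliefs could look like. This summary does not describe the benchmark's actual data format, label set, or metrics, so everything here is an assumption for illustration only: the `Belief` record, the `"true"`/`"false"` labels, and the `extraction_f1` and `labeling_accuracy` functions are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Hypothetical record of what one actor holds to be true (not the paper's schema)."""
    holder: str        # actor who holds the belief
    proposition: str   # content of the belief, as normalized text


def extraction_f1(predicted: set[Belief], gold: set[Belief]) -> float:
    """Set-overlap F1 between extracted and reference beliefs (assumed metric)."""
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def labeling_accuracy(predicted: dict[Belief, str], gold: dict[Belief, str]) -> float:
    """Fraction of reference beliefs whose predicted label matches (assumed metric)."""
    if not gold:
        return 0.0
    correct = sum(1 for belief, label in gold.items() if predicted.get(belief) == label)
    return correct / len(gold)


# Toy false-belief story: Sally puts the ball in the basket and leaves;
# Anne then moves it to the box while Sally is away.
gold_beliefs = {
    Belief("Sally", "the ball is in the basket"),
    Belief("Anne", "the ball is in the box"),
}
predicted_beliefs = {
    # Error case: the narrative fact is copied into Sally's belief unchanged.
    Belief("Sally", "the ball is in the box"),
    Belief("Anne", "the ball is in the box"),
}

# Hypothetical labels: does each belief match the true state of the world?
gold_labels = {
    Belief("Sally", "the ball is in the basket"): "false",
    Belief("Anne", "the ball is in the box"): "true",
}
predicted_labels = {
    Belief("Sally", "the ball is in the basket"): "true",
    Belief("Anne", "the ball is in the box"): "true",
}

f1 = extraction_f1(predicted_beliefs, gold_beliefs)
acc = labeling_accuracy(predicted_labels, gold_labels)
print(f"extraction F1: {f1:.2f}, labeling accuracy: {acc:.2f}")
```

In this toy case a model that answers a final question about Anne correctly would still be penalized at both stages for mishandling Sally's outdated belief, which is the kind of gap that end-point question answering alone may hide.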
Opening excerpt (first ~120 words)
arXiv:2605.26322 (cs) [Submitted on 25 May 2026]
Title: OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling
Authors: Adam Bawatneh, Sagar Sapkota, Amrit Singh Bedi, Santu Karmaker, Mubarak Shah
Abstract: Theory of Mind (ToM), the ability to infer others' knowledge, intentions, and emotions, is commonly evaluated in large language models (LLMs) using end-point question answering, where performance is judged solely by the final answer to a social reasoning query.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.