QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

May 19, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #language models #benchmarking

TL;DR · WeSearch summary

Researchers have introduced QSTRBench, a new benchmark designed to evaluate the reasoning capabilities of language models in qualitative spatial and temporal contexts. The benchmark includes a variety of reasoning tasks across different calculi, revealing that while models perform better than random guessing, they struggle with consistent accuracy. The study highlights significant variations in performance depending on the complexity of the calculus used.

Key facts

▪ QSTRBench evaluates large language models' reasoning in qualitative spatial and temporal contexts.
▪ The benchmark includes various reasoning tasks across multiple calculi, such as Point Algebra and Allen's Interval Algebra.
▪ Results show that while models outperform random guessing, none can consistently answer all questions correctly.
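For readers unfamiliar with these calculi, here is a minimal sketch of composition in Point Algebra, the simplest calculus named above. It is illustrative only and is not taken from the benchmark: the names `BASE`, `COMPOSE`, and `compose` are hypothetical, and the summary does not say which task types QSTRBench actually uses. Given a relation between x and y and another between y and z, composition yields the set of relations that may hold between x and z.

```python
from itertools import product

# The three base relations of Point Algebra between two points on a line.
BASE = frozenset({"<", "=", ">"})

# Composition table for base relations: if x a y and y b z,
# COMPOSE[(a, b)] is the set of relations possible between x and z.
COMPOSE = {
    ("<", "<"): {"<"}, ("<", "="): {"<"}, ("<", ">"): set(BASE),
    ("=", "<"): {"<"}, ("=", "="): {"="}, ("=", ">"): {">"},
    (">", "<"): set(BASE), (">", "="): {">"}, (">", ">"): {">"},
}

def compose(r1, r2):
    """Compose two (possibly disjunctive) relations given as sets of base relations."""
    result = set()
    for a, b in product(r1, r2):
        result |= COMPOSE[(a, b)]
    return result

# x < y and y <= z forces x < z.
print(sorted(compose({"<"}, {"<", "="})))  # ['<']
# x < y and y > z leaves the relation between x and z undetermined.
print(sorted(compose({"<"}, {">"})))       # ['<', '=', '>']
```

Allen's Interval Algebra follows the same pattern with 13 base relations instead of 3, so its composition table is much larger. This gives a rough sense of how complexity can differ from one calculus to another.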

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.18380 (cs) · Submitted on 18 May 2026
Title: QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi
Authors: Anthony G. Cohn, Robert E. Blackwell
Abstract: We introduce an extensive qualitative spatial and temporal reasoning (QSTR) benchmark for evaluating large language models (LLMs).

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
