
QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi

#artificial intelligence #language models #benchmarking
⚡ TL;DR · AI summary

Researchers have introduced QSTRBench, a benchmark for evaluating how well language models reason with qualitative spatial and temporal calculi. The benchmark covers a variety of reasoning tasks across different calculi. Models perform better than random guessing but are not consistently accurate, and performance varies significantly with the complexity of the calculus used.
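
This summary does not describe the benchmark's task formats, so the following is only an illustrative sketch of the kind of inference a qualitative calculus supports. It uses the point algebra, one of the simplest qualitative temporal calculi, in which two time points are related by `<`, `=`, or `>`. The names `COMPOSE`, `compose`, and `is_correct` are hypothetical and are not taken from the paper.

```python
# Illustrative only: composition in the point algebra (relations between time points).
# This is NOT QSTRBench's code; all names here are assumptions made for the example.

BASE = frozenset({"<", "=", ">"})

# COMPOSE[(r1, r2)] lists every relation that may hold between a and c,
# given that "a r1 b" and "b r2 c" hold.
COMPOSE = {
    ("<", "<"): {"<"},
    ("<", "="): {"<"},
    ("<", ">"): set(BASE),   # a before b, b after c: a and c are unconstrained
    ("=", "<"): {"<"},
    ("=", "="): {"="},
    ("=", ">"): {">"},
    (">", "<"): set(BASE),   # a after b, b before c: a and c are unconstrained
    (">", "="): {">"},
    (">", ">"): {">"},
}

def compose(r1, r2):
    """Return the set of possible relations between a and c."""
    return frozenset(COMPOSE[(r1, r2)])

def is_correct(predicted, r1, r2):
    """Exact-match check of a predicted relation set against the composition table."""
    return frozenset(predicted) == compose(r1, r2)

if __name__ == "__main__":
    print(sorted(compose("<", "<")))
    print(sorted(compose("<", ">")))
    print(is_correct({"<"}, "=", "<"))
```

A question of this kind asks, for example, "if a is before b and b is before c, how can a relate to c?" Richer calculi have more base relations and correspondingly larger composition tables, which is one way a calculus can be more complex.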

Original article
arXiv cs.AI
Opening excerpt (first ~120 words)

Computer Science > Artificial Intelligence
arXiv:2605.18380 (cs)
Submitted on 18 May 2026
Title: QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi
Authors: Anthony G. Cohn, Robert E. Blackwell
Abstract: We introduce an extensive qualitative spatial and temporal reasoning (QSTR) benchmark for evaluating large language models (LLMs).

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
