PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning

May 20, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #video generation #evaluation metrics

TL;DR · WeSearch summary

The article introduces PRISM, a new benchmark for evaluating programmatic spatial-temporal reasoning in video generation. It consists of over 10,000 instruction-code pairs and aims to address the challenges of assessing spatial coherence in outputs from language models. The findings highlight a significant gap between code execution success and spatial correctness, emphasizing the need for comprehensive evaluation metrics.

Key facts

▪ PRISM includes 10,372 human-calibrated instruction-code pairs, making it 20 times larger than previous benchmarks.
▪ The evaluation framework features four metrics: Code-Level Reliability, Spatial Reasoning, Prompt-Aware Dynamic Visual Complexity, and Temporal Density.
▪ A study of seven mainstream language models revealed a 41% average drop from execution success to spatial pass rate.
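
To make the last key fact concrete, here is a minimal sketch of how an average drop from execution success to spatial pass rate could be computed. The summary does not say whether the 41% figure is an absolute difference in percentage points or a relative drop, so this sketch assumes the absolute reading. The function names and the per-model numbers are hypothetical placeholders and are not taken from the paper.

```python
from statistics import mean


def pass_rate_gap(exec_success: float, spatial_pass: float) -> float:
    """Absolute drop, in percentage points, from execution success
    to spatial pass rate for one model (assumed definition)."""
    return exec_success - spatial_pass


def average_gap(results: dict[str, tuple[float, float]]) -> float:
    """Mean of the per-model gaps across all evaluated models."""
    return mean(pass_rate_gap(e, s) for e, s in results.values())


# Hypothetical placeholder values, NOT results from the paper:
# model name -> (execution success %, spatial pass rate %)
example = {
    "model_a": (90.0, 50.0),
    "model_b": (80.0, 40.0),
}

print(average_gap(example))  # 40.0
```

Under the relative reading, each gap would instead be divided by that model's execution success rate before averaging.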

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

arXiv:2605.19382 (cs) [Submitted on 19 May 2026]

Title: PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning

Authors: Qiran Zhang, Yuheng Wang, Runde Yang, Lin Wu, Jingru Fan, Shu Yao, Jie Zhang, Tianle Zhou, Huatao Li, Ruijie Shi, Yihan Li, Chen Qian

Abstract: Programmatic video generation through code offers geometric precision and temporal coherence beyond pixel-level diffusion models, yet rigorously evaluating whether language models can produce spatially correct animated outputs remains an open problem.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
