
PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning

#artificial intelligence#video generation#evaluation metrics
⚡ TL;DR · AI summary

The article introduces PRISM, a benchmark for evaluating programmatic spatial-temporal reasoning in video generation. It comprises over 10,000 instruction-code pairs and targets the difficulty of assessing whether the animated outputs that language models produce through code are spatially correct. The findings show a significant gap between code execution success and spatial correctness, which underlines the need for evaluation metrics that cover more than whether the code runs.

Original article
arXiv cs.AI
Opening excerpt (first ~120 words)

Computer Science > Artificial Intelligence
arXiv:2605.19382 (cs) [Submitted on 19 May 2026]
Title: PRISM: A Benchmark for Programmatic Spatial-Temporal Reasoning
Authors: Qiran Zhang, Yuheng Wang, Runde Yang, Lin Wu, Jingru Fan, Shu Yao, Jie Zhang, Tianle Zhou, Huatao Li, Ruijie Shi, Yihan Li, Chen Qian
Abstract: Programmatic video generation through code offers geometric precision and temporal coherence beyond pixel-level diffusion models, yet rigorously evaluating whether language models can produce spatially correct animated outputs remains an open problem.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
