WeSearch

GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions

Tags: computer vision, artificial intelligence, social reasoning
⚡ TL;DR · AI summary

The article introduces GRASP, a dataset for grounding social reasoning in multi-person non-verbal interactions. It contains 290,000 question-answer pairs across 46,000 videos, focused on gaze and gesture events. The article also proposes a Social Grounding Reward, intended to improve how well models understand social interactions.

Original article
arXiv cs.AI
Opening excerpt (first ~120 words)

Computer Science > Computer Vision and Pattern Recognition
arXiv:2605.15764 (cs)
[Submitted on 15 May 2026]
Title: GRASP: Learning to Ground Social Reasoning in Multi-Person Non-Verbal Interactions
Authors: Junho Kim, Xu Cao, Houze Yang, Bikram Boote, Ana Jojic, Fiona Ryan, Bolin Lai, Sangmin Lee, James M. Rehg
Abstract: Understanding social interactions requires reasoning over subtle non-verbal cues, yet current multimodal large language models (MLLMs) often fail to identify who interacts with whom in multi-person videos.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
