SUGAR: A Scalable Human-Video-Driven Generalizable Humanoid Loco-Manipulation Learning Framework
The paper introduces SUGAR, a framework for learning humanoid loco-manipulation skills from human videos. It targets the limitations of existing methods (laborious task-specific reward engineering, rigid replay of reference motions, and costly teleoperation) by automatically extracting motion priors from video and refining them into deployable skills. SUGAR is reported to improve performance and scalability in both simulation and real-world experiments.
- SUGAR is a scalable, data-driven framework for learning humanoid loco-manipulation skills.
- The framework automates the extraction of kinematic interaction priors from human videos.
- SUGAR outperforms traditional reference-tracking methods and achieves zero-shot real-world transfer.
arXiv:2605.20373 (Computer Science > Robotics)
Submitted on 19 May 2026
Authors: Tianshu Wu, Xiangqi Kong, Yue Chen, Qize Yu, Hang Ye, Jia Li, Yizhou Wang, Hao Dong
Abstract: Building humanoid robots capable of generalizable whole-body loco-manipulation in the real world remains a fundamental challenge. Existing methods either rely on laborious task-specific reward engineering, rigidly replay reference motions that fail to generalize, or depend on costly teleoperation that limits scalability.
…