The TIME Machine: On The Power of Motion for Efficient Perception
The paper proposes an approach to video representation learning that treats motion as the central modality, addressing limitations of current video models. The authors report that their new embedding, TIME, achieves competitive performance with significantly less training data.
- The proposed method uses motion in videos to improve representation learning.
- This approach reduces the scale of training data needed and bypasses language-dependent training.
- The new embedding, TIME, is trained exclusively on synthetic motion data and performs on par with state-of-the-art models.
arXiv:2605.23045 (cs), Computer Vision and Pattern Recognition. Submitted on 21 May 2026.
Authors: Mantas Skackauskas, Xinyue Hao, Laura Sevilla-Lara
Abstract: Video representation learning has seen tremendous progress in recent years. This has been driven by many factors, including the scale of training and the success of visual models trained contrastively with language.
…