Agentic-VLA: Efficient Online Adaptation for Vision-Language-Action Models
The paper introduces Agentic-VLA, a framework for efficient online adaptation of Vision-Language-Action (VLA) models in robotic manipulation. It targets two limitations of current VLA training: poor generalization to novel environments and low training efficiency that requires extensive demonstrations. The reported evaluation shows gains on long-horizon tasks and in 1-shot learning, which the summary takes as a sign of its potential for adaptive learning in real-world applications.
- Agentic-VLA enhances VLA models by enabling efficient online adaptation.
- The framework includes Adaptive Reward Synthesis, Language-Guided Exploration, and Experience Memory.
- Agentic-VLA achieved a 12.3% improvement on long-horizon tasks and a 28.5% increase in 1-shot learning.
Opening excerpt (first ~120 words)
arXiv:2605.22896 (cs), Computer Science > Robotics. Submitted on 21 May 2026. Authors: Ruofan Jin, Zaixi Zhang.
Abstract: Vision-Language-Action (VLA) models have emerged as a promising paradigm for robotic manipulation by leveraging pre-trained vision-language representations. However, current VLA training methods suffer from two critical limitations: poor generalization to novel environments and low training efficiency requiring extensive demonstrations.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.