Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions
The paper presents POLAR, a framework for personalizing embodied multimodal large language model (MLLM) agents over long-term user interactions. It argues that an agent should draw on the personalized context accumulated in prior interactions when executing a task. The evaluation shows that POLAR significantly improves performance, particularly on complex reasoning tasks and on tracking user-specific context.
- POLAR organizes prior interactions into a multimodal knowledge graph that serves as personalized context.
- The framework improves task execution by retrieving relevant memories from accumulated interactions.
- Results indicate that the memory mechanism improves reasoning across multiple interactions.
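
The idea in the points above, storing facts from past interactions as a graph and retrieving the relevant ones for a new request, can be illustrated with a small sketch. This is not POLAR's implementation, which the excerpt does not describe. The `Memory` and `MemoryGraph` names, the triple-style schema, the `image_ref` field standing in for a visual observation, and the word-overlap scoring are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Memory:
    """One fact observed during an interaction (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    session: int                      # index of the interaction it came from
    image_ref: Optional[str] = None   # placeholder for a visual observation


@dataclass
class MemoryGraph:
    """Toy graph: entity names are nodes, Memory records are labeled edges."""
    edges: List[Memory] = field(default_factory=list)

    def add(self, memory: Memory) -> None:
        self.edges.append(memory)

    def retrieve(self, query: str, k: int = 3) -> List[Memory]:
        """Rank memories by word overlap with the query; newer sessions win ties."""
        words = set(query.lower().split())

        def score(m: Memory) -> int:
            text = f"{m.subject} {m.relation} {m.obj}".lower()
            return len(words & set(text.split()))

        # Keep only memories that share at least one word with the query.
        hits = [m for m in self.edges if score(m) > 0]
        hits.sort(key=lambda m: (score(m), m.session), reverse=True)
        return hits[:k]


# Facts accumulated over three (made-up) interactions with one user.
graph = MemoryGraph()
graph.add(Memory("user", "owns", "blue mug", session=1, image_ref="img_001"))
graph.add(Memory("blue mug", "stored in", "kitchen cabinet", session=2))
graph.add(Memory("user", "prefers", "green tea", session=3))

# A new instruction pulls back only the memories that mention the mug.
results = graph.retrieve("bring my blue mug")
for m in results:
    print(m.session, m.subject, m.relation, m.obj)
```

A real system would presumably replace the word-overlap score with learned text and image embeddings and would follow graph edges to gather related facts. The sketch only shows why accumulated, user-specific memories let an agent resolve a phrase like "my blue mug" that a generic instruction follower could not.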
Opening excerpt (first ~120 words)
arXiv:2605.26256 (cs), Computer Science > Artificial Intelligence. Submitted on 25 May 2026.
Title: Personalizing Embodied Multimodal Large Language Model Agents over Long-term User Interactions
Authors: Jeongeun Lee, Chanyoung Park, Dongha Lee
Abstract: Multimodal large language model (MLLM)-based embodied agents have shown strong potential for solving complex tasks in physical environments. However, personalized assistance requires more than following generic instruction or recognizing object categories.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.