UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems
The article introduces UnityMAS-O, a general reinforcement learning (RL) optimization framework for large language model (LLM)-based multi-agent systems. The framework targets the orchestration of complex tasks: users define the workflows and the structured interactions among agents, and the framework optimizes them. The authors report significant performance improvements across several applications, particularly for smaller models.
- UnityMAS-O treats the complete workflow as the unit of optimization, rather than a single response or a single policy trajectory.
- Users define agents, workflows, model mappings, and rewards without rewriting the optimization infrastructure.
- Results indicate that multi-agent reinforcement learning can significantly improve manually specified workflows.
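
To make the points above concrete, here is a minimal Python sketch of what a declarative workflow specification with a workflow-level reward might look like. It is not UnityMAS-O's API: every name in it (`WorkflowSpec`, `Step`, `run_workflow`, `group_by_model`, the model ids) is hypothetical, the agents are plain functions standing in for LLM calls, and assigning the same final reward to every step is an assumed credit-assignment choice made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names here are hypothetical illustrations, not taken from UnityMAS-O.


@dataclass
class Step:
    agent: str           # role that produced this step
    model: str           # model id the role is mapped to
    prompt: str          # input the role received
    output: str          # text the role produced
    reward: float = 0.0  # filled in once the whole workflow has finished


@dataclass
class WorkflowSpec:
    agents: Dict[str, Callable[[str], str]]  # role name -> stand-in policy
    order: List[str]                         # fixed sequential workflow
    model_map: Dict[str, str]                # role name -> model id
    reward_fn: Callable[[str, str], float]   # (final answer, target) -> scalar


def run_workflow(spec: WorkflowSpec, task: str, target: str) -> List[Step]:
    """Run every role in order, then score the workflow as a whole."""
    steps: List[Step] = []
    message = task
    for role in spec.order:
        output = spec.agents[role](message)
        steps.append(Step(role, spec.model_map[role], message, output))
        message = output  # each role's output feeds the next role
    # Assumption: the single workflow-level reward is copied to every step.
    reward = spec.reward_fn(message, target)
    for step in steps:
        step.reward = reward
    return steps


def group_by_model(steps: List[Step]) -> Dict[str, List[Step]]:
    """Bucket steps by model so each model could be updated on its own data."""
    batches: Dict[str, List[Step]] = {}
    for step in steps:
        batches.setdefault(step.model, []).append(step)
    return batches


# Toy usage: two roles mapped to two placeholder model ids.
spec = WorkflowSpec(
    agents={
        "planner": lambda task: f"plan: {task}",
        "solver": lambda plan: "4",
    },
    order=["planner", "solver"],
    model_map={"planner": "small-model-a", "solver": "small-model-b"},
    reward_fn=lambda answer, target: 1.0 if answer.strip() == target else 0.0,
)
steps = run_workflow(spec, "What is 2 + 2?", "4")
batches = group_by_model(steps)
```

In this sketch, changing the roles, their order, the model mapping, or the reward function only touches the `WorkflowSpec` instance; `run_workflow` and `group_by_model` stay the same, which is the kind of separation the second bullet describes.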
Opening excerpt (first ~120 words)
arXiv:2605.26646 (cs), submitted on 26 May 2026
Authors: Yiqun Chen, Wei Yang, Erhan Zhang, Shijie Wang, Qi Liu, Zechun Niu, Bin Zhang, Haitao Li, Rui Li, Lingyong Yan, Jinyuan Feng, Biqing Qi, Xiaochi Wei, Yan Gao, Yi Wu, Yao Hu, Jiaxin Mao
Abstract: LLM-based multi-agent systems decompose complex tasks into interacting roles, but most remain manually orchestrated by prompts, tools, and control rules, while agents are rarely optimized through a unified reinforcement learning interface.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.