CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
The paper presents CP-MoE, a framework for mitigating catastrophic forgetting in continual learning for large language models (LLMs) and vision-language models (VLMs). It introduces a transient expert mechanism that integrates task-specific updates while preserving important historical parameters. The method is reported to achieve state-of-the-art performance on the SuperNI benchmark and to reduce forgetting on the VQA v2 dataset, while improving knowledge transfer across tasks.
- CP-MoE addresses catastrophic forgetting in continual learning for large language models and vision-language models.
- The framework uses a transient expert to capture task-specific updates and guide their integration into stable experts.
- CP-MoE achieves state-of-the-art performance on the SuperNI benchmark and effectively reduces forgetting on the VQA v2 dataset.
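
The summary does not say how the transient expert's updates are folded into the stable experts. As a toy illustration only, the sketch below assumes each parameter has an importance score in [0, 1] and that merging is a simple interpolation. The function name `merge_transient_into_stable`, the importance scores, and the blending rule are all hypothetical and are not taken from the paper.

```python
def merge_transient_into_stable(stable, transient, importance):
    """Blend a transient expert's weights into a stable expert (hypothetical rule).

    importance[i] lies in [0, 1]:
      1.0 -> keep the historical (stable) value untouched
      0.0 -> fully adopt the transient expert's value
    """
    if not (len(stable) == len(transient) == len(importance)):
        raise ValueError("all vectors must have the same length")

    merged = []
    for s, t, imp in zip(stable, transient, importance):
        if not 0.0 <= imp <= 1.0:
            raise ValueError("importance must lie in [0, 1]")
        # Move toward the transient value, scaled down for important parameters.
        merged.append(s + (1.0 - imp) * (t - s))
    return merged


# Toy weights: the first parameter is fully protected, the second is free
# to change, and the third is partially protected.
stable = [1.0, 2.0, 3.0]
transient = [2.0, 0.0, 3.5]
importance = [1.0, 0.0, 0.5]

merged = merge_transient_into_stable(stable, transient, importance)
print(merged)
```

Under this assumed rule, parameters marked as important for earlier tasks stay close to their historical values, while unimportant ones absorb the new task's update. That is one simple way to read "preserving important historical parameters".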
arXiv:2605.20247 (cs), Computer Science > Machine Learning. Submitted on 18 May 2026.
Authors: Yang Liu, Toan Nguyen, Flora D. Salim
Abstract: Catastrophic forgetting remains a major obstacle to continual learning in large language models (LLMs) and vision-language models (VLMs).
…