Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation
The paper presents Counteraction-Aware Multi-Teacher On-Policy Distillation (CaMOPD), a method for recovering the general capabilities of language models after domain specialization while preserving domain-specific behavior. It targets a weakness of existing methods: scenarios where teacher prompts do not align with training distributions. The authors report stronger general capability recovery than baseline methods across several applications, including dialogue and medical reasoning tasks.
- CaMOPD addresses conflicts between recovery and preservation gradients in multi-teacher distillation.
- The method uses decoupled alternating training and gap-based sample selection to improve model performance.
- CaMOPD has been shown to outperform baseline methods at recovering general capabilities while preserving domain-specific behavior.
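
The two mechanisms named above, gap-based sample selection and decoupled alternating training, can be illustrated with a minimal sketch. The summary does not give the paper's actual definitions, so everything here is an assumption made for illustration: the "gap" is taken to be a per-sample teacher score minus student score, selection keeps the `k` samples with the largest gap, and "alternating" means recovery and preservation updates run in separate, interleaved steps. The names `select_by_gap` and `alternating_schedule` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def select_by_gap(
    teacher_scores: Sequence[float],
    student_scores: Sequence[float],
    k: int,
) -> List[int]:
    """Return indices of the k samples with the largest teacher-minus-student gap.

    Assumption: a larger gap marks a sample where the student lags the
    teacher most. The paper's real selection criterion may differ.
    """
    if len(teacher_scores) != len(student_scores):
        raise ValueError("score lists must have the same length")
    gaps = [t - s for t, s in zip(teacher_scores, student_scores)]
    # Sort sample indices by gap, largest first; ties keep original order.
    order = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return order[:k]


def alternating_schedule(
    num_steps: int,
    recovery_steps: int = 1,
    preservation_steps: int = 1,
) -> List[str]:
    """Label each training step as 'recovery' or 'preservation'.

    Assumption: the two objectives are optimized in separate steps rather
    than summed into one loss, so their gradients are never mixed within
    a single update.
    """
    cycle = ["recovery"] * recovery_steps + ["preservation"] * preservation_steps
    if not cycle:
        raise ValueError("at least one phase must have a positive step count")
    return [cycle[i % len(cycle)] for i in range(num_steps)]


# Example with made-up log-probability scores for three samples.
selected = select_by_gap([-1.0, -2.0, -0.5], [-3.0, -2.5, -0.5], k=2)
phases = alternating_schedule(4)
```

With these inputs, `selected` holds the indices of the two samples where the teacher leads the student by the widest margin, and `phases` alternates one recovery step with one preservation step. Keeping the two objectives in separate steps is one plausible way to avoid gradients that cancel each other; whether CaMOPD schedules them this way is not stated in the summary.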
arXiv:2605.27115 (cs), Computer Science > Artificial Intelligence. Submitted on 26 May 2026.
Authors: Tianlei Chen, Jiao Ou, Ziyuan Liu, Ruiming Tang, Jian Liang, Han Li
Abstract: Domain specialization can improve LLM behavior in vertical domains, but often weakens the general capabilities inherited from the original model.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.