HELLoRA: Hot Experts Layer-Level Low-Rank Adaptation for Mixture-of-Experts Models
The paper introduces HELLoRA, a method for efficiently fine-tuning Mixture-of-Experts (MoE) models with Low-Rank Adaptation (LoRA). The approach focuses on attaching LoRA modules to the most frequently activated ("hot") experts, which reduces the number of trainable parameters while improving accuracy. The authors report that HELLoRA outperforms vanilla LoRA and other strong PEFT baselines at a lower computational cost across a range of tasks.
- HELLoRA reduces trainable parameters to 15.7% of vanilla LoRA while improving accuracy by 9.2%.
- It achieves a 38.7% reduction in adapter FLOPs and 1.9x training throughput compared to standard methods.
- HELLoRA consistently outperforms strong PEFT baselines across three different MoE backbones and task families.
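The idea of adapting only frequently activated experts can be illustrated with a small sketch. The code below is not the authors' implementation: the function names, the top-k-by-count selection rule, and the toy routing trace are all assumptions made for illustration. It counts how often each expert is routed to in each layer, keeps the k most active ones, and reports what share of experts would receive an adapter. The LoRA parameter count uses the standard formula, rank × (d_in + d_out), for one adapted weight matrix.

```python
from collections import Counter


def hot_experts_per_layer(routing_trace, k):
    """Pick the k most frequently activated experts in each layer.

    routing_trace[layer] is a sequence of expert ids, one entry per routed
    token slot (hypothetical format, assumed for this sketch).
    Returns {layer_index: sorted list of selected expert ids}.
    """
    hot = {}
    for layer, expert_ids in enumerate(routing_trace):
        counts = Counter(expert_ids)
        # Rank by descending activation count; break ties by expert id
        # so the selection is deterministic.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        hot[layer] = sorted(expert for expert, _ in ranked[:k])
    return hot


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA adapter on a d_out x d_in matrix."""
    return rank * (d_in + d_out)


def adapter_fraction(hot, num_experts, num_layers):
    """Share of (layer, expert) pairs that receive an adapter."""
    selected = sum(len(experts) for experts in hot.values())
    return selected / (num_experts * num_layers)


# Toy routing trace: 2 layers, 4 experts, 8 routed token slots per layer.
trace = [
    [0, 0, 1, 3, 0, 3, 3, 2],  # layer 0
    [2, 2, 2, 1, 0, 1, 2, 1],  # layer 1
]
hot = hot_experts_per_layer(trace, k=2)
fraction = adapter_fraction(hot, num_experts=4, num_layers=2)
print(hot)       # {0: [0, 3], 1: [1, 2]}
print(fraction)  # 0.5
```

In this toy setting half of the experts receive adapters, so the adapter parameter count is half of what attaching LoRA to every expert would need. That figure is a property of the example only and is unrelated to the 15.7% reported in the paper.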
arXiv:2605.18795 (cs), Machine Learning. Submitted on 11 May 2026.
Authors: Jia Wei, Zhonghao Zhang, Ping Chen, Qianyang li, Yancheng Pan, Shaoxun Wang, Ziyi Qiu, Longxiang Wang
Abstract: Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning of large language models, yet most variants target dense architectures. Mixture-of-Experts (MoE) models scale parameters at near-constant per-token compute, and their sparse activation patterns create untapped opportunities for more efficient adaptation.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.