GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training

May 27, 2026 · 4:00 AM UTC · 2 min read

#machine learning #artificial intelligence #reinforcement learning

TL;DR · WeSearch summary

The paper presents GAC, a noise-aware controller that adaptively mixes the supervised fine-tuning (SFT) and reinforcement learning (RL) signals during hybrid post-training. It targets a limitation of fixed mixing schedules: they cannot adapt when the relative noise of the two training signals changes over time. According to the reported experiments, GAC outperforms fixed and rule-based baselines on benchmarks while adding minimal training overhead.

Key facts

▪ GAC derives an adaptive mixing weight from online estimates of gradient variance and of disagreement between the SFT and RL training signals.
▪ The method incorporates smoothing, prior guidance, and bounded updates, and it reuses existing training tensors.
▪ Experiments show that GAC consistently outperforms strong fixed and rule-based baselines, especially at larger model scales.
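
The key facts above name the ingredients of the controller but not its update rule, which the excerpt does not give. The following is only a minimal sketch of how such a controller could be put together, not the paper's method. Everything in it is an assumption made for illustration: the class name `MixingController`, the inverse-variance target, the cosine-based disagreement measure, and all default values. The gradients and variance estimates are assumed to be quantities the training loop already has.

```python
import math
from dataclasses import dataclass


@dataclass
class MixingController:
    """Hypothetical noise-aware controller; NOT the paper's actual update rule."""
    prior: float = 0.5        # assumed prior weight on the SFT signal
    prior_pull: float = 0.1   # assumed base strength of the pull toward the prior
    ema: float = 0.9          # smoothing factor for the online estimates
    max_step: float = 0.05    # bound on how far the weight may move per update
    eps: float = 1e-8
    weight: float = 0.5       # current weight on the SFT signal
    sft_var: float = 0.0
    rl_var: float = 0.0
    disagreement: float = 0.0

    def update(self, sft_grad, rl_grad, sft_grad_var, rl_grad_var):
        # Smoothing: exponential moving averages of the per-step variance estimates.
        self.sft_var = self.ema * self.sft_var + (1 - self.ema) * sft_grad_var
        self.rl_var = self.ema * self.rl_var + (1 - self.ema) * rl_grad_var

        # Disagreement as 1 - cosine similarity of the two gradients (range 0 to 2).
        dot = sum(a * b for a, b in zip(sft_grad, rl_grad))
        norm = math.sqrt(sum(a * a for a in sft_grad)) * math.sqrt(sum(b * b for b in rl_grad))
        cos = dot / (norm + self.eps)
        self.disagreement = self.ema * self.disagreement + (1 - self.ema) * (1 - cos)

        # Inverse-variance target: the noisier signal receives the smaller weight.
        target = self.rl_var / (self.sft_var + self.rl_var + self.eps)

        # Prior guidance: lean on the prior more when the signals disagree.
        pull = min(1.0, self.prior_pull * (1 + self.disagreement))
        target = (1 - pull) * target + pull * self.prior

        # Bounded update: clip the step, then keep the weight inside [0, 1].
        step = max(-self.max_step, min(self.max_step, target - self.weight))
        self.weight = min(1.0, max(0.0, self.weight + step))
        return self.weight
```

In this sketch the returned weight `w` would combine the two objectives as `w * sft_loss + (1 - w) * rl_loss`. Clipping the step keeps a single noisy estimate from swinging the mix abruptly, which is one plausible reading of "bounded updates".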

Original article

arXiv cs.AI

Opening excerpt (first ~120 words)

Computer Science > Machine Learning
arXiv:2605.26184 (cs)
[Submitted on 25 May 2026]

Title: GAC: Noise-Aware Adaptive Mixing for Hybrid SFT-RL Post-Training
Authors: Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song

Abstract: Hybrid post-training usually combines supervised fine-tuning and reinforcement learning, but fixed mixing schedules cannot adapt when the relative noise of the two signals changes over time. We propose GAC, a noise-aware controller that derives an adaptive mixing weight from online estimates of gradient variance and disagreement between the two training signals.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
