Simply Stabilizing the Loop via Fully Looped Transformer
The paper presents the Fully Looped Transformer, a variant of the Looped Transformer designed to make training more stable and improve performance. It targets two problems that affect the Looped Transformer: gradient oscillation and residual explosion. With the proposed modifications, training remains stable with up to 12 loop iterations, and downstream-task performance improves over baseline looped models.
- The Fully Looped Transformer introduces two parameter-free modifications to stabilize training dynamics.
- It allows the number of loop iterations to be adjusted at inference, trading performance against computational cost.
- The model improves average downstream-task performance by up to 13.2% compared to baseline looped models.
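The looping idea itself (reusing one block several times instead of stacking distinct blocks) can be illustrated with a minimal sketch. This is only a toy under stated assumptions: `make_block`, `looped_forward`, and `count_params` are hypothetical names, the block is a simple residual `tanh` layer rather than a real Transformer block, and the paper's two stabilizing modifications are not included because this summary does not describe them.

```python
import math
import random


def make_block(dim, seed=0):
    """Build one toy residual block (a hypothetical stand-in for a Transformer block)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

    def block(h):
        # Residual update h + tanh(W h); a real block would use attention and an MLP.
        return [
            h_i + math.tanh(sum(w * x for w, x in zip(row, h)))
            for h_i, row in zip(h, weights)
        ]

    return block, weights


def looped_forward(block, h, num_loops):
    """Apply the same block num_loops times; no new parameters are created."""
    for _ in range(num_loops):
        h = block(h)
    return h


def count_params(weights):
    """Count the entries of the single shared weight matrix."""
    return sum(len(row) for row in weights)


block, weights = make_block(dim=4)
x = [1.0, 0.0, -1.0, 0.5]

h2 = looped_forward(block, x, num_loops=2)  # cheaper: two passes through the block
h6 = looped_forward(block, x, num_loops=6)  # more compute, same weights

print(count_params(weights))  # 16, independent of num_loops
```

Because `num_loops` is an ordinary runtime argument in this sketch, the same weights can be run with fewer or more iterations, which is the kind of compute-versus-performance trade-off the second bullet refers to.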
Opening excerpt (first ~120 words)
Computer Science > Machine Learning
arXiv:2605.18797 (cs)
[Submitted on 11 May 2026]
Title: Simply Stabilizing the Loop via Fully Looped Transformer
Authors: Rao Fu, Zixuan Yang, Jiankun Zhang, Jing Ma, Hechang Chen, Yu Li, Yi Chang
Abstract: Scaling model performance typically requires increasing model size. Looped Transformer offers a compelling alternative by iteratively reusing the same Transformer blocks, trading additional computation for improved performance without increasing parameter count or context length.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.