First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation
The article gives a quantitative prediction of grokking delay under the AdamW optimizer. It introduces a closed-form law for the delay, calibrates the law's parameters, and validates it empirically, including through causal interventions. The law is effective for certain architectures, but its applicability to natural-language models remains uncertain.
- The study derives a closed-form law for grokking delay under AdamW, predicting delays with a mean absolute percentage error of 17.7%.
- Calibrating specific parameters allows for accurate predictions across different architectures, with a maximum coefficient of variation of 15%.
- Causal interventions that alter weight decay significantly affect grokking outcomes, indicating the importance of norm separation and angular reachability.
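
The two summary statistics quoted above (mean absolute percentage error and coefficient of variation) can be sketched as follows. This is a minimal illustration of the standard definitions, not the paper's evaluation code: the function names and all numbers are hypothetical, and the choice of population rather than sample standard deviation is an assumption, since the summary does not say which one the authors use.

```python
import statistics


def mape(predicted, observed):
    """Mean absolute percentage error, in percent."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - o) / abs(o) for p, o in zip(predicted, observed))
    return 100.0 * total / len(observed)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean, in percent.

    Assumption: population (not sample) standard deviation.
    """
    return 100.0 * statistics.pstdev(values) / statistics.fmean(values)


# Hypothetical grokking delays in training steps (not taken from the paper).
observed_delay = [1000.0, 2000.0, 4000.0]
predicted_delay = [1100.0, 1800.0, 4400.0]

# Hypothetical values of one calibrated constant across three architectures.
calibrated_constant = [0.9, 1.0, 1.1]

print(f"MAPE: {mape(predicted_delay, observed_delay):.1f}%")
print(f"CV:   {coefficient_of_variation(calibrated_constant):.1f}%")
```

A low MAPE says the predicted delays track the observed ones closely; a low coefficient of variation says a calibrated quantity stays nearly constant across the settings compared.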
Opening excerpt (first ~120 words)
arXiv:2605.18845 (cs), Computer Science > Machine Learning. Submitted on 13 May 2026.
Title: First-Passage Prediction of Grokking Delay: A Calibrated Law under AdamW with Causal Validation
Authors: Truong Xuan Khanh, Truong Quynh Hoa, Luu Duc Trung, Phan Thanh Duc
Abstract: We give the first quantitative prediction of grokking delay under AdamW.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.