Self-Distillation Enables Continual Learning [PDF]

May 17, 2026 · 1:19 AM UTC · 2 min read

#machine learning #artificial intelligence #continual learning #self-distillation #reinforcement learning #Idan Shenfeld #Mehul Damani #Jonas Hübotter #Pulkit Agrawal #arXiv

⚡ TL;DR · AI summary

The paper introduces Self-Distillation Fine-Tuning (SDFT), a method that lets a model keep learning from expert demonstrations without forgetting prior skills. SDFT relies on in-context learning: the model, conditioned on a demonstration, serves as its own teacher and produces on-policy training signals. The authors report that it outperforms supervised fine-tuning (SFT) in both skill acquisition and retention across sequential learning tasks.

Key facts

▪ Self-Distillation Fine-Tuning (SDFT) enables on-policy learning directly from expert demonstrations.
▪ SDFT reduces catastrophic forgetting while improving accuracy on new tasks compared to supervised fine-tuning.
▪ The method leverages in-context learning, using the model as its own teacher to generate training signals.
▪ Experiments show SDFT allows a single model to accumulate multiple skills over time without performance decline.
▪ SDFT establishes on-policy distillation as a practical approach for continual learning from demonstrations.
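
The summary does not spell out the training objective, so the following is only a toy sketch of what "the model as its own teacher" could look like. It assumes the training signal is a per-token KL divergence between the model's own next-token distribution (the student) and the same model's distribution when a demonstration is placed in its context (the teacher), evaluated on tokens sampled from the student. The bigram "model", `next_token_probs`, `self_distillation_loss`, and `demo_strength` are all hypothetical names invented for illustration, not the paper's implementation.

```python
import math
import random

VOCAB_SIZE = 4  # toy vocabulary: token ids 0..3


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def next_token_probs(logit_table, prev_token, demo=None, demo_strength=2.0):
    """Hypothetical toy 'model': a bigram logit table.

    When a demonstration is supplied, tokens that follow prev_token in the
    demonstration get a logit boost. This crudely imitates in-context
    learning; the same logit_table is used with and without the demo.
    """
    logits = list(logit_table[prev_token])
    if demo is not None:
        for a, b in zip(demo, demo[1:]):
            if a == prev_token:
                logits[b] += demo_strength
    return softmax(logits)


def self_distillation_loss(logit_table, start_token, demo, length, rng,
                           demo_strength=2.0):
    """Average per-token KL(student || teacher) along a student rollout."""
    prev = start_token
    total = 0.0
    for _ in range(length):
        student = next_token_probs(logit_table, prev)  # no demo in context
        teacher = next_token_probs(logit_table, prev, demo, demo_strength)
        total += kl_divergence(student, teacher)
        # On-policy: the next token is sampled from the student itself,
        # not copied from the demonstration (as SFT would do).
        prev = rng.choices(range(VOCAB_SIZE), weights=student)[0]
    return total / length


rng = random.Random(0)
logit_table = [[0.0] * VOCAB_SIZE for _ in range(VOCAB_SIZE)]  # uniform model
demo = [0, 1, 2, 3, 0]  # hypothetical expert demonstration
loss = self_distillation_loss(logit_table, start_token=0, demo=demo,
                              length=8, rng=rng)
print(f"self-distillation loss: {loss:.4f}")
```

In this sketch the demonstration only enters through the teacher's context, while the visited tokens come from the student's own samples. That is the on-policy property the key facts contrast with supervised fine-tuning, which trains directly on the demonstration tokens.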

Original article

arXiv.org


Opening excerpt (first ~120 words)

arXiv:2601.19897 (cs) · Computer Science > Machine Learning · Submitted on 27 Jan 2026

Title: Self-Distillation Enables Continual Learning

Authors: Idan Shenfeld, Mehul Damani, Jonas Hübotter, Pulkit Agrawal

Abstract: Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement learning can reduce forgetting, it requires explicit reward functions that are often unavailable. Learning from expert demonstrations, the primary alternative, is dominated by supervised fine-tuning (SFT), which is inherently off-policy.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv.org.
