Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

May 22, 2026 · 4:00 AM UTC · 2 min read

#machine learning #artificial intelligence #data science

TL;DR · WeSearch summary

A recent arXiv paper studies the "small-vs-large gap": repeating a smaller set of samples during training can save compute compared to training on a larger dataset. The paper attributes the speedup to sampling biases, which can help optimization, especially in reasoning tasks.

Key facts

▪ The study investigates the 'small-vs-large gap' in machine learning training.
▪ Repeating on fewer samples can save compute during training compared to using a larger dataset.
▪ The findings suggest that more repetitions of a smaller dataset can benefit optimization.
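
The comparison behind these points can be set up as two training schedules with the same number of optimizer steps, one drawn from a small dataset and one from a large one. The following is a minimal sketch of that setup only, not the paper's experimental code. The function names `make_schedule` and `repetitions`, the dataset sizes, and the batch size are all illustrative assumptions. It shows how a fixed step budget turns into many repetitions per example for the small dataset and a single pass for the large one.

```python
import random
from collections import Counter


def make_schedule(dataset_size, total_steps, batch_size, seed=0):
    """Return `total_steps` batches of example indices (hypothetical helper).

    Each epoch is a fresh shuffle of range(dataset_size). Epochs repeat until
    the step budget is used up, so a smaller dataset is revisited more often.
    """
    rng = random.Random(seed)
    batches = []
    while len(batches) < total_steps:
        order = list(range(dataset_size))
        rng.shuffle(order)
        for start in range(0, dataset_size, batch_size):
            if len(batches) == total_steps:
                break
            batches.append(order[start:start + batch_size])
    return batches


def repetitions(batches):
    """Count how many times each example index appears in the schedule."""
    return Counter(index for batch in batches for index in batch)


# Same budget for both runs: 100 steps of 8 examples (illustrative numbers).
small = make_schedule(dataset_size=80, total_steps=100, batch_size=8)
large = make_schedule(dataset_size=800, total_steps=100, batch_size=8)

# With these sizes, every small-run example is seen 10 times and every
# large-run example is seen once.
small_counts = repetitions(small)
large_counts = repetitions(large)
```

Feeding both schedules to the same model and optimizer and comparing loss at equal step counts is one way to look for the gap. Whether it appears, and how large it is, depends on the task and setup described in the paper.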

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

Computer Science > Machine Learning
arXiv:2605.20314 (cs) [Submitted on 19 May 2026]
Title: Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases
Authors: Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu
Abstract: This work investigates the "small-vs-large gap", where repeating on fewer samples can lead to compute saving during training compared to using a larger dataset. This is observed across algorithmic tasks, architectures and optimizers and cannot be explained using prior theory.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
