WeSearch

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

#machine learning #artificial intelligence #data science
⚡ TL;DR · AI summary

A recent study examines when training on a smaller dataset can beat training on a larger one. The authors report that repeatedly training on fewer samples can save compute during training compared with using a larger dataset, a phenomenon they call the "small-vs-large gap". According to the paper's title, the speedup comes from sampling biases, which can aid optimization, especially in reasoning tasks.

Original article
arXiv cs.AI
Opening excerpt (first ~120 words)

Computer Science > Machine Learning · arXiv:2605.20314 (cs) · Submitted on 19 May 2026

Title: Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

Authors: Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu

Abstract: This work investigates the "small-vs-large gap", where repeating on fewer samples can lead to compute saving during training compared to using a larger dataset. This is observed across algorithmic tasks, architectures and optimizers and cannot be explained using prior theory.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
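The comparison in the abstract is between two data regimes run under the same compute budget: cycling many times over a small dataset versus passing over a larger one. The sketch below is a minimal illustration that only builds those two sampling regimes. It assumes compute is measured as a fixed number of gradient steps at a fixed batch size, and that data is drawn by reshuffling once per epoch. The helper names `epoch_batches` and `repetition_profile` and all sizes are hypothetical and do not come from the paper. The sketch does not reproduce the authors' experiments or their sampling-bias analysis.

```python
import random
from collections import Counter
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def epoch_batches(data: Sequence[T], num_steps: int, batch_size: int,
                  seed: int = 0) -> Iterator[List[T]]:
    """Hypothetical helper: yield exactly `num_steps` minibatches, reshuffling
    and repeating `data` every time it is exhausted (multi-epoch sampling
    without replacement within each epoch)."""
    rng = random.Random(seed)
    order: List[T] = []
    for _ in range(num_steps):
        batch: List[T] = []
        while len(batch) < batch_size:
            if not order:
                # Start a new epoch: take a fresh copy of the data and shuffle it.
                order = list(data)
                rng.shuffle(order)
            batch.append(order.pop())
        yield batch


def repetition_profile(dataset_size: int, num_steps: int,
                       batch_size: int) -> Counter:
    """Hypothetical helper: count how often each example index is drawn
    under a fixed step budget."""
    counts: Counter = Counter()
    for batch in epoch_batches(range(dataset_size), num_steps, batch_size):
        counts.update(batch)
    return counts


if __name__ == "__main__":
    # Same assumed budget for both regimes: 2,000 steps x 32 = 64,000 draws.
    steps, bs = 2_000, 32
    small = repetition_profile(dataset_size=1_000, num_steps=steps, batch_size=bs)
    large = repetition_profile(dataset_size=64_000, num_steps=steps, batch_size=bs)
    print("small: distinct examples", len(small), "max repeats", max(small.values()))
    print("large: distinct examples", len(large), "max repeats", max(large.values()))
```

With these assumed sizes, the small regime sees 1,000 distinct examples 64 times each, while the large regime sees 64,000 distinct examples once each, for the same number of steps. The paper's claim concerns which of these regimes reaches a given result with less compute. The sketch only makes the matched-budget setup concrete.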
