Best Small Language Models on Hugging Face Right Now!
Recent advancements in small language models have led to impressive performance on reasoning benchmarks. Google's Gemma 3 4B and Microsoft's Phi-4-mini are outperforming larger models, challenging the notion that size equates to capability. This article explores the best small models available on Hugging Face, highlighting their strengths and the innovations behind their success.
- Google's Gemma 3 4B scored 89.2% on GSM8K math reasoning, outperforming larger models.
- Microsoft's Phi-4-mini at 3.8B achieved 83.7% on ARC-C, the highest in its size class.
- Small models are now capable of complex reasoning due to better training data and architectural improvements.
# Introduction

Here is something that should shift how you think about AI model size: a 4-billion-parameter model released in early 2025 is now outscoring models that were 7x larger on standard reasoning benchmarks. Google's Gemma 3 4B posts an 89.2% on GSM8K math reasoning. Microsoft's Phi-4-mini at 3.8B hits 83.7% on ARC-C, the highest score in its entire size class. These numbers used to belong to 30B+ models. So the question "do I really need a 70B model for this?" deserves a second look. For the purposes of this article, "small" means under 7 billion parameters: models that can run on a single consumer GPU, a laptop, or even a modern smartphone with the right setup.
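To make the "runs on a single consumer GPU" claim concrete, here is a back-of-the-envelope sketch of how much memory the weights alone would take at a few common precisions. The byte sizes and the `weight_memory_gb` helper are illustrative assumptions, not figures from the article, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: weights only; activations, KV cache, and framework
# overhead add to this and vary by runtime and context length.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the sizes named in the text.
    for label, params in [("3.8B", 3.8e9), ("4B", 4e9), ("7B", 7e9)]:
        sizes = ", ".join(
            f"{dtype}: {weight_memory_gb(params, dtype):.1f} GB"
            for dtype in BYTES_PER_PARAM
        )
        print(f"{label} -> {sizes}")
```

Under these assumptions, a 4B model needs roughly 8 GB for its weights in 16-bit precision and about 2 GB at 4-bit, which is why this size class can plausibly fit on consumer hardware.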
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at KDnuggets.