Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains
JetBrains has introduced Mellum2, a 12B-parameter Mixture-of-Experts model designed for efficient natural language and code processing. The model activates only 2.5B parameters per token, allowing for high-throughput and low-latency inference. Mellum2 is suitable for various applications, including routing, summarization, and private deployments, and is released under the Apache 2.0 license.
- Mellum2 is a 12B-parameter Mixture-of-Experts model trained from scratch on natural language and code.
- The model achieves more than 2x faster inference compared to similar-sized models.
- Mellum2 is optimized for latency-sensitive tasks and can be deployed in self-hosted environments.
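To make the "12B total, 2.5B active per token" figures concrete, here is a minimal sketch of generic top-k expert routing together with the active-parameter ratio implied by those two numbers. It is an illustration under stated assumptions, not Mellum2's actual router: the function names `top_k_route` and `active_fraction`, the four-expert toy logits, and `k=2` are all hypothetical, since the summary does not state the expert count or routing scheme.

```python
import math


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Generic top-k gating sketch; not taken from the Mellum2 implementation.
    """
    # Rank experts by router logit and keep the k best.
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def active_fraction(active_params_b, total_params_b):
    """Share of parameters used per token (both arguments in billions)."""
    return active_params_b / total_params_b


# Hypothetical toy router output for one token over four experts.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(routes)  # only experts 1 and 3 would run for this token

# Figures from the announcement: 2.5B active out of 12B total.
print(f"{active_fraction(2.5, 12.0):.1%}")  # 20.8%
```

Only the selected experts run for a given token, which is why per-token compute tracks the 2.5B active parameters (roughly a fifth of the total) rather than the full 12B.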
Published June 1, 2026, by Nikita Pavlichenko (pavlichenko), JetBrains.
Article sections: Benchmark highlights; Key use cases; Routing and orchestration; RAG pipelines; Sub-agents; Private deployment; Why well-scoped models matter; Getting started with Mellum2.
Mellum2 is a 12B-parameter Mixture-of-Experts model trained from scratch on natural language and code. The model activates only 2.5B parameters per token, making it efficient for high-throughput, low-latency inference. Mellum2 can be used for routing, RAG, summarization, sub-agents, high-throughput coding features, and private deployments. It is released under the Apache 2.0 license.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is on the Hugging Face Blog.