Cerebras Brings Trillion Parameter Inference to Enterprises with Kimi K2.6
Cerebras is now serving Kimi K2.6, a trillion parameter open-weight model, in enterprise customer trials. The model runs at nearly 1,000 tokens per second as measured by Artificial Analysis, a speed aimed at improving developer productivity in agentic coding tasks. K2.6 is the first trillion parameter open-weight model Cerebras has served.
- Kimi K2.6 is the first trillion parameter open-weight model offered by Cerebras.
- It delivers responses at 981 tokens per second, outperforming other models by a significant margin.
- Cerebras is currently offering enterprise trials of K2.6 for various AI workloads.
Opening excerpt (first ~120 words)
May 19, 2026, James Wang
Cerebras is now running Kimi K2.6, the leading trillion parameter open-weight model, in enterprise customer trials. Widely recognized as the leader in fast inference, Cerebras has set benchmarks across numerous open-weight models including GLM-4.7, GPT-OSS-120B, and Qwen 3, while delivering dramatic speedups to customers such as OpenAI and Cognition on agentic coding models. K2.6 is one of the most frequently requested models, and we are excited to bring it to customers. It is the first one trillion parameter open-weight model we have served, achieving performance approaching 1,000 tokens per second as measured by Artificial Analysis.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Cerebras.