Kimi K2.5 runs on RTX 3060 with 768GB Intel Optane memory at 4 tokens per second
A Chinese AI enthusiast demonstrated the Kimi K2.5 model running on an Nvidia RTX 3060 GPU paired with 768GB of Intel Optane memory. The setup generated four tokens per second, showing that a trillion-parameter model can run on consumer hardware. The experiment highlights the potential of legacy components for running advanced AI models that are typically reserved for high-end infrastructure.
- The Kimi K2.5 model has a total of 1 trillion parameters but activates only 32 billion at a time for each token generated.
- The full model size is approximately 630 GB, necessitating the use of 768 GB of Intel Optane Persistent Memory.
- APFrisco's demonstration was notable as it utilized a mid-range GPU designed for gaming rather than AI workloads.
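The figures in the list above can be combined into a quick back-of-envelope check. The sketch below is not from the article: it assumes decimal gigabytes, assumes the 630 GB is spread evenly across all 1 trillion parameters, and assumes every active weight is read from memory once per token. Under those assumptions the bandwidth figure is only a rough estimate, since the article does not say how weights were split between the GPU and Optane memory.

```python
# Figures quoted in the summary; everything derived from them is an estimate.
TOTAL_PARAMS = 1e12       # 1 trillion total parameters
ACTIVE_PARAMS = 32e9      # 32 billion parameters activated per token
MODEL_SIZE_GB = 630       # approximate full model size
MEMORY_GB = 768           # Intel Optane Persistent Memory capacity
TOKENS_PER_SECOND = 4     # reported generation speed

GB = 1e9  # assumption: decimal gigabytes (1 GB = 10^9 bytes)


def bytes_per_param(model_size_gb: float, total_params: float) -> float:
    """Average storage per parameter, assuming the size is spread evenly."""
    return model_size_gb * GB / total_params


# Share of the model that is active for any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Implied average precision of the stored weights.
avg_bytes = bytes_per_param(MODEL_SIZE_GB, TOTAL_PARAMS)
avg_bits = avg_bytes * 8

# Memory left over after loading the weights.
headroom_gb = MEMORY_GB - MODEL_SIZE_GB

# Assumption: each active weight is read from memory once per generated token.
active_gb_per_token = ACTIVE_PARAMS * avg_bytes / GB
implied_read_gb_per_s = active_gb_per_token * TOKENS_PER_SECOND

print(f"Active fraction per token: {active_fraction:.1%}")
print(f"Average bits per parameter: {avg_bits:.2f}")
print(f"Memory headroom: {headroom_gb} GB")
print(f"Active weights per token: {active_gb_per_token:.1f} GB")
print(f"Implied read rate at {TOKENS_PER_SECOND} tok/s: {implied_read_gb_per_s:.1f} GB/s")
```

With the quoted numbers, about 3.2% of the parameters are active per token, the weights average roughly 5 bits per parameter, and the 768 GB of memory leaves about 138 GB of headroom. The per-token working set comes to roughly 20 GB, which at four tokens per second implies on the order of 80 GB/s of weight reads if nothing is cached on the GPU.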
Opening excerpt (first ~120 words)
A Chinese AI enthusiast squeezed a trillion-parameter model onto consumer hardware using second-hand memory DIMMs, and the implications go far beyond the stunt itself. By Editorial Team, May 24, 2026. A trillion-parameter AI model just ran on a graphics card that most gamers would…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Crypto Briefing.