768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second
A Redditor successfully built a workstation using 768GB of second-hand Intel Optane DIMM memory to run a 1-trillion-parameter language model. The setup achieved a performance of approximately 4 tokens per second, utilizing a Xeon CPU and a GPU. Despite the lower latency of Optane compared to SSDs, it is still slower than DRAM, and the discontinuation of Optane products poses challenges for future builds.
- ▪The Redditor used six 128GB Intel Optane PMem sticks to create a workstation capable of running a large language model.
- ▪The performance achieved was around 4 tokens per second, which the builder considers a success given the hardware limitations.
- ▪Optane memory offers lower latency than NVMe SSDs but is still slower than traditional DRAM.
Opening excerpt (first ~120 words) tap to expand
Tech Industry Artificial Intelligence 768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second News By Mark Tyson published 23 May 2026 Redditor found 768GB of affordable Optane sticks second-hand. When you purchase through links on our site, we may earn an affiliate commission. Here’s how it works. (Image credit: Lenovo) Copy link Facebook X Whatsapp Reddit Pinterest Flipboard Email Share this article 0 Join the conversation Follow us Add us as a preferred source on Google Newsletter Subscribe to our newsletter A Redditor has caused a stir by coaxing a workstation build using Optane PMem DIMMs as RAM to run a 1-trillion-parameter LLM.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Tom's Hardware.