768GB of cheap Intel Optane DIMM memory sticks used to run 1-trillion-parameter LLM on a system with a single GPU — local Kimi K2.5 install achieved roughly 4 tokens per second

https://www.tomshardware.com/author/mark-tyson · May 23, 2026 · 11:20 AM UTC · 9 min read

#technology #artificial intelligence #hardware

TL;DR · WeSearch summary

A Redditor built a workstation that uses 768GB of second-hand Intel Optane DIMM memory to run a 1-trillion-parameter language model. The setup, which pairs a Xeon CPU with a single GPU, achieved approximately 4 tokens per second. Optane has lower latency than SSDs but is still slower than DRAM, and the discontinuation of Optane products poses challenges for future builds.

Key facts

▪ The Redditor used six 128GB Intel Optane PMem sticks to create a workstation capable of running a large language model.
▪ The performance achieved was around 4 tokens per second, which the builder considers a success given the hardware limitations.
▪ Optane memory offers lower latency than NVMe SSDs but is still slower than traditional DRAM.
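
The figures above can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes decimal gigabytes (1 GB = 10^9 bytes), and it ignores the GPU's own memory, the KV cache, and operating-system overhead. It does not describe how the builder actually stored or quantized the model.

```python
# Back-of-the-envelope check of the figures quoted in the key facts.
# Assumptions (not from the article): 1 GB = 10**9 bytes, and all of the
# Optane capacity is available for model weights.

def total_capacity_gb(stick_count: int, stick_size_gb: int) -> int:
    """Total capacity of identical memory sticks, in GB."""
    return stick_count * stick_size_gb


def max_bits_per_parameter(capacity_gb: int, parameter_count: int) -> float:
    """Upper bound on average bits per weight that fits in the capacity."""
    capacity_bits = capacity_gb * 10**9 * 8
    return capacity_bits / parameter_count


def seconds_for_tokens(token_count: int, tokens_per_second: float) -> float:
    """Time to generate token_count tokens at a steady rate."""
    return token_count / tokens_per_second


capacity = total_capacity_gb(6, 128)                  # six 128GB sticks
bits = max_bits_per_parameter(capacity, 10**12)       # 1 trillion parameters
wait = seconds_for_tokens(500, 4.0)                   # a 500-token reply

print(f"capacity: {capacity} GB")
print(f"budget: {bits:.3f} bits per parameter")
print(f"500 tokens at 4 tok/s: {wait:.0f} s")
```

Under these assumptions, six sticks give the 768GB in the headline, which leaves a budget of roughly 6 bits per parameter for a 1-trillion-parameter model, so 16-bit weights would not fit in this capacity. At about 4 tokens per second, a 500-token reply would take around two minutes.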

Original article

Tom's Hardware · https://www.tomshardware.com/author/mark-tyson

Opening excerpt (first ~120 words)

By Mark Tyson, published 23 May 2026. Redditor found 768GB of affordable Optane sticks second-hand. A Redditor has caused a stir by coaxing a workstation build using Optane PMem DIMMs as RAM to run a 1-trillion-parameter LLM.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Tom's Hardware.
