Making my tokens Drought Proof
The article describes the author's effort to build a fallback system during the Great Claude Token Drought of early 2026. They set up a self-hosted chat interface that can switch between Claude (via claude -p) and a local Ollama model, summarizes and manages conversations, and tracks performance metrics. The author covers the challenges of optimizing the system and what they learned from logging interactions.
- The author set up a local AI fallback (an Ollama model behind a chat interface) to reduce dependence on large hosted models during a token drought.
- The system includes features for summarizing conversations and tracking performance metrics.
- Challenges included optimizing the model's performance and managing memory and conversation history.
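The summary does not say how history is summarized or which metrics are logged. The following is a minimal sketch, assuming a pluggable backend and summarizer. `ChatSession`, `max_chars` and `keep_recent` are hypothetical names, and character count is a crude stand-in for a real token budget. When the budget is exceeded, older turns are folded into a single summary turn, and each request's latency and sizes are recorded.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical types: a backend maps a prompt to a reply, and a summarizer
# maps a list of (role, text) turns to one summary string.
Backend = Callable[[str], str]
Summarizer = Callable[[List[Tuple[str, str]]], str]


@dataclass
class ChatSession:
    backend: Backend
    summarize: Summarizer
    max_chars: int = 4000   # crude proxy for a token budget (assumption)
    keep_recent: int = 4    # turns kept verbatim when compacting
    history: List[Tuple[str, str]] = field(default_factory=list)
    metrics: List[dict] = field(default_factory=list)

    def _history_chars(self) -> int:
        return sum(len(text) for _, text in self.history)

    def _compact(self) -> None:
        # Fold all but the most recent turns into a single summary turn.
        cut = len(self.history) - self.keep_recent
        if cut <= 0:
            return  # nothing old enough to summarize
        old, recent = self.history[:cut], self.history[cut:]
        self.history = [("summary", self.summarize(old))] + recent

    def ask(self, prompt: str) -> str:
        self.history.append(("user", prompt))
        if self._history_chars() > self.max_chars:
            self._compact()
        context = "\n".join(f"{role}: {text}" for role, text in self.history)
        start = time.perf_counter()
        reply = self.backend(context)
        elapsed = time.perf_counter() - start
        self.history.append(("assistant", reply))
        # Record simple per-request metrics for later inspection.
        self.metrics.append(
            {"prompt_chars": len(context), "reply_chars": len(reply), "seconds": elapsed}
        )
        return reply
```

In practice the summarizer could itself be a call to the local model, which keeps compaction off the paid tokens.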
Opening excerpt (first ~120 words)
May 21, 2026

Routing around the Token Drought of early 2026

In March, through the Great Claude Token Drought™, I realised I needed a fallback. I'm becoming dependent on big players with their SOTA models, and getting used to the velocity they allow for. I needed to make sure I had an alternative that was (mostly) my own.

So I spun up an ai. subdomain, gave it a chat GUI, and let it switch between claude -p and an ollama model. It's intended to be slow and mostly backgroundable. Here's how it went.

This was becoming an all-too-familiar sight:

So I built it. Here's how the pieces fit together.

Talk me through the features!

There's a bunch of Docker containers running on the host (Zen 4 / RDNA 2 box with 8GB VRAM), logically separated and linked by a bridge, so they know about each other.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Bix.
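The excerpt mentions switching between claude -p and an Ollama model but does not show how. The following is a minimal sketch of an automatic fallback, offered as one possible approach rather than the author's code. The author's GUI may switch manually instead. `claude -p` runs Claude Code non-interactively, and `/api/generate` on port 11434 is Ollama's default local endpoint. The function names and the model name `llama3.2` are hypothetical. The sketch also assumes that a rate-limited `claude -p` call exits non-zero. If it instead prints a limit message and exits cleanly, you would need to detect that text.

```python
import json
import subprocess
import urllib.request
from typing import Callable, List, Tuple

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
OLLAMA_MODEL = "llama3.2"  # hypothetical; pick a model that fits in 8GB VRAM


def ask_claude(prompt: str, timeout: float = 300) -> str:
    # Non-interactive Claude Code call; check=True raises on a non-zero exit.
    result = subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    return result.stdout.strip()


def ask_ollama(prompt: str, timeout: float = 600) -> str:
    # Single non-streaming request to the local Ollama server.
    body = json.dumps({"model": OLLAMA_MODEL, "prompt": prompt, "stream": False})
    req = urllib.request.Request(
        OLLAMA_URL, data=body.encode(), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()


def route(prompt: str, backends: List[Tuple[str, Callable[[str], str]]]) -> Tuple[str, str]:
    """Try each backend in order; return (name, reply) from the first that succeeds."""
    errors = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        # OSError covers a missing binary and network errors; SubprocessError
        # covers non-zero exits and timeouts; ValueError/KeyError cover bad JSON.
        except (OSError, subprocess.SubprocessError, ValueError, KeyError) as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Prefer Claude while tokens last, then fall back to the local model.
DEFAULT_BACKENDS = [("claude", ask_claude), ("ollama", ask_ollama)]
```

Calling `route("Summarise this thread", DEFAULT_BACKENDS)` returns which backend answered along with the reply. That makes it easy to log how often the fallback actually carries the load.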