Agentically optimizing LLM prompt cache TTLs for fun and profit
Firetiger has implemented a system to optimize the time-to-live (TTL) settings for prompt caching in its large language model (LLM) agents. Using an automated agent, the company cut the waste associated with overly long TTLs by 77%. The process involved analyzing telemetry data and making iterative adjustments to improve performance and reduce costs.
- Firetiger operates several large language model agents and relies on prompt caching to manage operational costs.
- The company developed an agent called the Prompt Cache Advisor to identify and reduce prompt cache waste.
- Through this optimization process, Firetiger was able to significantly decrease unnecessary spending on cache writes.
Opening excerpt (first ~120 words)
By Rustam Lalkaka, 18 May 2026
A case study on production objective hill climbing
Firetiger runs a few hundred large language model (LLM) agents in production, and prompt caching is a critical tool to manage the cost of running such a workload. Properly setting cache time-to-live (TTL), how long a cached prefix survives before the next request pays full price again, is critical to reaping maximum benefit from prompt caching. The catch: the "right" TTL is a property of the workload, and not something you can intuit up front.
Case in point: we were quietly burning spend on cache writes that cost more to write than they ever saved us on read.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at The Firetiger Blog.
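The excerpt's point that a cache write can cost more than it ever saves on reads can be illustrated with a little arithmetic. The sketch below is not Firetiger's method or the Prompt Cache Advisor's logic. It assumes a hypothetical pricing model in which a longer TTL carries a higher write multiplier, that every cache hit refreshes the TTL, and that costs are expressed relative to one uncached pass over the shared prefix. The multipliers and the names `CachePricing`, `prefix_cost`, `SHORT_TTL`, and `LONG_TTL` are illustrative assumptions, not figures from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    """Cost multipliers relative to one uncached pass over the prefix (assumed)."""
    ttl_s: float   # how long a cached prefix survives without being read
    write: float   # relative cost of writing the prefix into the cache
    read: float    # relative cost of reading the cached prefix


def prefix_cost(gaps_s, pricing):
    """Relative cost of the shared prefix across one sequence of requests.

    gaps_s holds the seconds between consecutive requests. The first request
    always pays a cache write. Each later request pays a cheap read if it
    arrives within the TTL of the previous request, otherwise it pays the
    write price again. Assumes every hit refreshes the TTL.
    """
    cost = pricing.write
    for gap in gaps_s:
        cost += pricing.read if gap <= pricing.ttl_s else pricing.write
    return cost


# Hypothetical tiers: a short TTL with a cheaper write, a long TTL with a pricier one.
SHORT_TTL = CachePricing(ttl_s=300, write=1.25, read=0.1)
LONG_TTL = CachePricing(ttl_s=3600, write=2.0, read=0.1)

# Bursty workload: follow-ups arrive within seconds or hours later, never in between.
bursty = [10, 20, 7200, 15]
# Steady workload: one request every ten minutes.
steady = [600, 600, 600]

for name, gaps in (("bursty", bursty), ("steady", steady)):
    short = prefix_cost(gaps, SHORT_TTL)
    long_ = prefix_cost(gaps, LONG_TTL)
    print(f"{name}: short TTL {short:.2f}, long TTL {long_:.2f}")
```

Under these assumed multipliers, the long TTL costs more on the bursty workload because it buys no extra hits, while on the steady workload it is the cheaper choice. This is one way to read the claim that the "right" TTL is a property of the workload.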