Show HN: Memory for LLM apps that cuts input tokens up to 80% (avg 68%)

May 23, 2026 · 5:37 PM UTC · 5 min read

#technology #artificial intelligence #software

⚡ TL;DR · AI summary

Street AI is a memory layer for LLM applications that sits between an application and the LLM API to reduce input token usage. It stores conversation data as signals, decays old data automatically, and retrieves only what is relevant on each turn, so a short prompt is sent instead of the full conversation history. In the project's 16-turn benchmark, input tokens dropped by 55–80% per turn (average 68%).

Key facts

▪ Street AI's memory layer sits between an application and the LLM API, storing conversation data as signals organized into stacks.
▪ The system decays old data automatically and retrieves only what is relevant on each turn, so the prompt stays small instead of carrying the full conversation history.
▪ In the project's 16-turn benchmark, input tokens fell by 55–80% per turn (average 68%), with savings growing as the conversation lengthened.
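The claim that savings grow with conversation length follows from simple arithmetic: resending the full history costs more input tokens on every turn, while a retrieval-based prompt stays capped. The sketch below illustrates that trend only. The per-turn size, the retrieval budget, and all function names are assumptions chosen for illustration; they are not taken from Street AI and do not reproduce its benchmark.

```python
def full_history_tokens(turn: int, tokens_per_turn: int) -> int:
    """Input tokens when every earlier turn is resent along with the new message."""
    return turn * tokens_per_turn


def memory_prompt_tokens(turn: int, tokens_per_turn: int, retrieved_budget: int) -> int:
    """Input tokens when only a capped amount of retrieved context is sent
    together with the new message (hypothetical model, not Street AI's)."""
    history = (turn - 1) * tokens_per_turn
    return min(history, retrieved_budget) + tokens_per_turn


def savings(turn: int, tokens_per_turn: int = 100, retrieved_budget: int = 200) -> float:
    """Fraction of input tokens saved on a given turn under the assumed numbers."""
    full = full_history_tokens(turn, tokens_per_turn)
    capped = memory_prompt_tokens(turn, tokens_per_turn, retrieved_budget)
    return 1 - capped / full


if __name__ == "__main__":
    for turn in (1, 4, 8, 16):
        print(f"turn {turn:2d}: {savings(turn):.1%} saved")
```

Under these made-up numbers the saving is zero on the first turn and rises on each later turn once the history exceeds the retrieval budget, which matches the direction of the reported result but says nothing about its magnitude.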

Original article

GitHub

Opening excerpt (first ~120 words)

Street AI

Continuously learning memory layer for LLM applications. Your AI's memory grows forever. Your token bill doesn't.

Street AI sits between your application and the LLM API. It stores conversation as signals organized into stacks, decays old data automatically, and retrieves only what's relevant on each turn, so you send a tiny prompt instead of the full conversation history.

In our 16-turn benchmark, input tokens dropped by 55–80% per turn (average 68%), with the savings growing as the conversation lengthens.

Status: Alpha (0.2.0). API will change. Pin a version if you depend on it.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
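The excerpt names three mechanisms (signals grouped into stacks, automatic decay, and per-turn retrieval) but does not show the API. The sketch below is one possible way such a layer could work, not Street AI's implementation: the `Signal` and `MemoryStore` names, the half-life decay, and the word-overlap relevance score are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """One stored piece of conversation (hypothetical structure)."""
    text: str
    turn: int  # turn on which the signal was recorded


class MemoryStore:
    """Toy memory layer: signals live in per-topic stacks and lose weight as they age."""

    def __init__(self, half_life_turns: float = 8.0):
        self.stacks: dict[str, list[Signal]] = {}
        self.half_life = half_life_turns  # assumed decay rate
        self.turn = 0

    def add(self, topic: str, text: str) -> None:
        """Record a new signal on the given topic's stack and advance the turn counter."""
        self.turn += 1
        self.stacks.setdefault(topic, []).append(Signal(text, self.turn))

    def _weight(self, signal: Signal) -> float:
        """Exponential decay: the weight halves every `half_life` turns."""
        age = self.turn - signal.turn
        return 0.5 ** (age / self.half_life)

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        """Return up to top_k signals that share words with the query,
        ranked by word overlap multiplied by the decay weight."""
        query_words = set(query.lower().split())
        scored = []
        for stack in self.stacks.values():
            for signal in stack:
                overlap = len(query_words & set(signal.text.lower().split()))
                if overlap:
                    scored.append((overlap * self._weight(signal), signal.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


store = MemoryStore()
store.add("billing", "invoice total is 40 dollars")
store.add("travel", "flight leaves monday")
store.add("billing", "invoice due friday")

# Only the retrieved signals, not the whole history, would go into the prompt.
context = store.retrieve("when is the invoice due")
```

A real system would likely use embeddings rather than word overlap for relevance. The overlap score is used here only to keep the example free of external dependencies.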
