Show HN: Memory for LLM apps that cuts input tokens up to 80% (avg 68%)
Street AI has introduced a memory layer for LLM applications that reduces input token usage by sending the model only the context relevant to each turn instead of the full conversation history. The layer stores conversation data in a structured form, retrieves what is relevant, and so keeps the prompt sent to the LLM API small. In the project's 16-turn benchmark, input tokens fell by 68% on average.
- Street AI's memory layer sits between applications and LLM APIs, storing conversation data as signals.
- The system automatically decays old data and retrieves only relevant information, leading to substantial token savings.
- In a benchmark test, input tokens were reduced by 55–80% per turn, with greater savings as conversation length increased.
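The store, decay, and retrieve idea summarized above can be illustrated with a toy sketch. This is not Street AI's API or algorithm: the `ToyMemory` class, the `Signal` record, the half-life decay, the word-overlap relevance score, and the whitespace token count are all assumptions made up for illustration. It only shows why a prompt built from a few retrieved items can be much smaller than one carrying the whole history.

```python
import re
from dataclasses import dataclass


@dataclass
class Signal:
    """One stored piece of conversation (hypothetical structure)."""
    text: str
    turn: int  # turn index at which the signal was stored


class ToyMemory:
    """Illustrative stand-in for a memory layer; not Street AI's real API."""

    def __init__(self, half_life_turns: float = 8.0, top_k: int = 2):
        self.half_life_turns = half_life_turns  # assumed decay rate
        self.top_k = top_k                      # max signals returned per turn
        self.signals: list[Signal] = []

    @staticmethod
    def _words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def store(self, text: str, turn: int) -> None:
        self.signals.append(Signal(text, turn))

    def _score(self, signal: Signal, query: str, now: int) -> float:
        # Relevance: number of shared words (a crude proxy for real retrieval).
        overlap = len(self._words(signal.text) & self._words(query))
        # Decay: the weight halves every `half_life_turns` turns of age.
        age = now - signal.turn
        return overlap * 0.5 ** (age / self.half_life_turns)

    def retrieve(self, query: str, now: int) -> list[str]:
        scored = [(self._score(s, query, now), s) for s in self.signals]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [s.text for _, s in relevant[: self.top_k]]


def count_tokens(text: str) -> int:
    """Whitespace word count, used here as a rough stand-in for tokens."""
    return len(text.split())


# Made-up conversation, purely for illustration.
history = [
    "My name is Dana and I live in Lisbon",
    "I prefer vegetarian food",
    "My dog is called Pixel",
    "I work as a nurse on night shifts",
]
memory = ToyMemory()
for turn, message in enumerate(history):
    memory.store(message, turn)

query = "Suggest vegetarian food for dinner"
retrieved = memory.retrieve(query, now=len(history))

full_prompt = " ".join(history + [query])     # resend everything
small_prompt = " ".join(retrieved + [query])  # send only what was retrieved

print("full history tokens:", count_tokens(full_prompt))
print("retrieved-only tokens:", count_tokens(small_prompt))
```

In this sketch the full-history prompt grows with every stored turn, while the retrieved prompt is capped at `top_k` items plus the query. That is one plausible reason the reported savings grow as a conversation lengthens. The numbers the script prints describe only this toy example and say nothing about Street AI's benchmark.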
Opening excerpt (first ~120 words)
Street AI
Continuously learning memory layer for LLM applications.
Your AI's memory grows forever. Your token bill doesn't.
Street AI sits between your application and the LLM API. It stores conversation as signals organized into stacks, decays old data automatically, and retrieves only what's relevant on each turn, so you send a tiny prompt instead of the full conversation history. In our 16-turn benchmark, input tokens dropped by 55–80% per turn (average 68%), with the savings growing as the conversation lengthens.
Status: Alpha (0.2.0). API will change. Pin a version if you depend on it.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.