
Show HN: Memory for LLM apps that cuts input tokens up to 80% (avg 68%)

⚡ TL;DR · AI summary

Street AI is a memory layer for LLM applications that sits between the application and the LLM API to cut input tokens. Instead of resending the full conversation history, it stores the conversation as signals organized into stacks, decays old data automatically, and retrieves only what is relevant on each turn. In the project's 16-turn benchmark, input tokens dropped by 55–80% per turn, averaging 68%, with savings growing as the conversation lengthens.
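The headline figures are ratios of prompt sizes. Below is a minimal sketch of how per-turn and average reductions can be computed, assuming you log, for each turn, the token count of the full history and of the memory-built prompt. The function names are hypothetical, and the excerpt does not say whether "average 68%" is the mean of per-turn percentages or the reduction in total tokens, so the sketch computes both.

```python
from typing import List, Sequence, Tuple


def per_turn_reduction(full_tokens: Sequence[int], memory_tokens: Sequence[int]) -> List[float]:
    """Fraction of input tokens saved on each turn: 1 - memory / full."""
    if len(full_tokens) != len(memory_tokens):
        raise ValueError("need one memory-prompt count per full-history count")
    return [1 - m / f for f, m in zip(full_tokens, memory_tokens)]


def summarize(full_tokens: Sequence[int], memory_tokens: Sequence[int]) -> Tuple[float, float, float, float]:
    """Return (min, max, mean of per-turn reductions, reduction of summed totals)."""
    per_turn = per_turn_reduction(full_tokens, memory_tokens)
    mean_of_turns = sum(per_turn) / len(per_turn)
    overall = 1 - sum(memory_tokens) / sum(full_tokens)
    return min(per_turn), max(per_turn), mean_of_turns, overall


# Illustrative token counts for three turns (hypothetical, not benchmark data).
full = [1000, 2000, 4000]   # tokens if the whole history were resent
memory = [400, 600, 800]    # tokens in the memory-built prompt
lo, hi, mean, overall = summarize(full, memory)
print(f"min {lo:.1%}, max {hi:.1%}, mean {mean:.1%}, overall {overall:.1%}")
# min 60.0%, max 80.0%, mean 70.0%, overall 74.3%
```

With these made-up counts the two averages differ (70.0% vs 74.3%) because later, longer turns weigh more heavily in the ratio of totals, so it matters which definition a benchmark reports.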

Original article: GitHub
Opening excerpt (first ~120 words)

Street AI

Continuously learning memory layer for LLM applications. Your AI's memory grows forever. Your token bill doesn't.

Street AI sits between your application and the LLM API. It stores conversation as signals organized into stacks, decays old data automatically, and retrieves only what's relevant on each turn, so you send a tiny prompt instead of the full conversation history. In our 16-turn benchmark, input tokens dropped by 55–80% per turn (average 68%), with the savings growing as the conversation lengthens.

Status: Alpha (0.2.0). API will change. Pin a version if you depend on it.

Excerpt limited to ~120 words for fair-use compliance. The full article is at GitHub.
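The excerpt describes the pipeline only at a high level: store signals grouped into stacks, decay old ones, retrieve the relevant ones on each turn, and send a small prompt. The sketch below illustrates that flow under stated assumptions; it is not Street AI's API or algorithm. `MemoryLayer`, `Signal`, `half_life_s`, `min_weight`, and `build_prompt` are hypothetical names, exponential half-life decay stands in for whatever decay rule the project uses, and word-overlap (Jaccard) scoring stands in for its relevance ranking.

```python
import re
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


def _words(text: str) -> Set[str]:
    """Lowercased word set used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


@dataclass
class Signal:
    text: str
    created: float  # timestamp in seconds


@dataclass
class MemoryLayer:
    # Hypothetical parameters: a signal's weight halves every half_life_s seconds,
    # and signals whose weight falls below min_weight are discarded.
    half_life_s: float = 3600.0
    min_weight: float = 0.05
    stacks: Dict[str, List[Signal]] = field(default_factory=lambda: defaultdict(list))

    def add(self, stack: str, text: str, now: Optional[float] = None) -> None:
        """Append a signal to the named stack (e.g. one stack per topic)."""
        created = time.time() if now is None else now
        self.stacks[stack].append(Signal(text, created))

    def _weight(self, sig: Signal, now: float) -> float:
        age = max(0.0, now - sig.created)
        return 0.5 ** (age / self.half_life_s)

    def decay(self, now: float) -> None:
        """Drop faded signals, and any stack left empty."""
        for name in list(self.stacks):
            kept = [s for s in self.stacks[name] if self._weight(s, now) >= self.min_weight]
            if kept:
                self.stacks[name] = kept
            else:
                del self.stacks[name]

    def retrieve(self, query: str, k: int = 3, now: Optional[float] = None) -> List[str]:
        """Return the k signals with the best (word overlap x recency weight) score."""
        now = time.time() if now is None else now
        self.decay(now)
        q = _words(query)
        scored = []
        for signals in self.stacks.values():
            for sig in signals:
                w = _words(sig.text)
                union = q | w
                overlap = len(q & w) / len(union) if union else 0.0
                if overlap > 0:
                    scored.append((overlap * self._weight(sig, now), sig.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]

    def build_prompt(self, query: str, k: int = 3, now: Optional[float] = None) -> str:
        """Small prompt: top-k memories plus the new message, not the full history."""
        memories = "\n".join(f"- {t}" for t in self.retrieve(query, k, now))
        return f"Relevant memory:\n{memories}\n\nUser: {query}"


mem = MemoryLayer(half_life_s=1000.0)
mem.add("billing", "user plan is pro billed monthly", now=0.0)
mem.add("prefs", "user prefers dark mode", now=0.0)
mem.add("old", "user mentioned a trip to paris", now=-10000.0)
print(mem.build_prompt("which plan is the user billed on", k=1, now=1000.0))
```

Because `build_prompt` includes at most `k` signals, the prompt stays roughly the same size while the stored history grows, which is one way savings can increase with conversation length. A production system might rank by embedding similarity instead of word overlap; the overlap score only keeps the sketch dependency-free.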
