The Day Treasure Hunt Broke My Caches—And How We Fixed It
The article describes how a treasure hunt engine suffered from a high rate of Redis key deletions, which degraded performance. The team tried several fixes, including sharding and caching, before redesigning the architecture. The final version uses ClickHouse for leaderboards and Kafka for event streaming, which improved performance and availability.
- The treasure hunt engine hit 1.2 million key deletions per minute (about 20,000 per second), causing significant latency and performance problems.
- Initial attempts to scale the Redis cluster by adding shards failed because of connection limits and persistent evictions.
- The final architecture separated leaderboards by player and used Kafka for event handling, which stabilized the system.
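The summary attributes the failed shard increase to connection limits and persistent evictions. A related constraint, not stated in the excerpt, is that Redis Cluster maps every key to exactly one of 16,384 hash slots (the CRC16 of the key, or of its non-empty {hash tag}, modulo 16384), so a single hot key such as hytale:treasure:global:top stays on one shard however many shards are added. The Python sketch below implements that key-to-slot mapping; shard_for_slot and the per-map key pattern are hypothetical illustrations, since real clusters assign slot ranges to nodes explicitly.

```python
# Sketch of Redis Cluster key-to-slot mapping; shard_for_slot is hypothetical.
def crc16_xmodem(data: bytes) -> int:
    """CRC16/XMODEM (poly 0x1021, init 0, no reflection), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot in [0, 16384); if a non-empty {hash tag} exists, only it is hashed."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag between the first { and the next }
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


def shard_for_slot(slot: int, num_shards: int) -> int:
    """Hypothetical: split the 16384 slots into equal contiguous ranges."""
    return slot * num_shards // 16384


if __name__ == "__main__":
    hot_key = "hytale:treasure:global:top"
    hot_slot = key_hash_slot(hot_key)
    for shards in (3, 6, 12):
        # The hot key's slot never changes, so it always lives on exactly one shard.
        print(shards, "shards -> global key on shard", shard_for_slot(hot_slot, shards))
    # Hypothetical per-map keys spread over many slots, and therefore many shards.
    per_map = {key_hash_slot(f"hytale:treasure:map:{i}:top") for i in range(200)}
    print(len(per_map), "distinct slots for 200 per-map keys")
```

Adding shards moves slot ranges between nodes but never splits a slot, which is why spreading load generally means spreading keys.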
Opening excerpt (first ~120 words)
Lillian Dube, posted on May 27
#webdev #programming #architecture #systems
The Problem We Were Actually Solving
The treasure hunt engine used a single Redis sorted set key per map instance: hytale:treasure:global:top. With 200 concurrent maps and 40k concurrent players, each map push operation (a ZADD against hytale:treasure:global:top) triggered an implicit DEL when the key grew past Redis's active-expire threshold.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
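The excerpt has every map pushing scores into one sorted set, hytale:treasure:global:top. The minimal Python sketch below contrasts that layout with a per-map split, using an in-memory stand-in that mimics ZINCRBY and ZREVRANGE semantics rather than calling Redis. The per-map key pattern and the helpers record_find, global_key, and per_map_key are hypothetical; the excerpt mentions ZADD and does not say how points accumulate, so using ZINCRBY-style increments here is an assumption.

```python
from collections import defaultdict


class SortedSetStandIn:
    """In-memory stand-in for one Redis sorted set (member -> score); not a Redis client."""

    def __init__(self) -> None:
        self.scores: dict[str, float] = {}

    def zincrby(self, member: str, increment: float) -> float:
        # Mimics ZINCRBY: a missing member starts at 0; returns the new score.
        self.scores[member] = self.scores.get(member, 0) + increment
        return self.scores[member]

    def top(self, n: int) -> list[tuple[str, float]]:
        # Mimics ZREVRANGE key 0 n-1 WITHSCORES: highest score first,
        # ties broken in reverse lexicographic member order.
        ranked = sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
        return ranked[:n]


def global_key(map_id: int) -> str:
    # Key from the excerpt: every map writes to the same sorted set (map_id is ignored).
    return "hytale:treasure:global:top"


def per_map_key(map_id: int) -> str:
    # Hypothetical per-map key pattern; not taken from the article.
    return f"hytale:treasure:map:{map_id}:top"


def record_find(boards, key: str, player: str, points: float) -> float:
    """Hypothetical helper: credit a player's find on the leaderboard stored at `key`."""
    return boards[key].zincrby(player, points)


if __name__ == "__main__":
    shared = defaultdict(SortedSetStandIn)
    split = defaultdict(SortedSetStandIn)
    for map_id in range(200):
        record_find(shared, global_key(map_id), f"player{map_id}", 1)
        record_find(split, per_map_key(map_id), f"player{map_id}", 1)
    print(len(shared), "hot key vs", len(split), "per-map keys")  # 1 hot key vs 200 per-map keys
```

With the shared layout, all 200 maps write to one ever-growing key; with the split layout, the same writes land on 200 smaller keys. Whether this matches the article's final per-player layout is not stated in the excerpt.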