
War Story: How a Kubernetes 1.32 Node OOM Kill Cascaded Into a 2-Hour Outage for Our Video Streaming Service

Tags: kubernetes, outage, memory management, cloud computing, devops, Ubuntu 22.04 LTS, containerd, istio-proxy, linkerd-proxy, fluentd, ANKUSH CHOUDHARY JOHAL, johal.in
⚡ TL;DR · AI summary

On March 14, 2024, a single Kubernetes 1.32 node OOM kill triggered a cascading 2-hour outage in which a video streaming service lost 92% of its 4.2 million concurrent viewers. The root cause was traced to the kubelet underreporting memory usage by 22% for sidecar containers under cgroups v2, which left pods with too little memory headroom. Setting pod-level memory limits with 15% headroom reduced OOM failures by 94% and avoided significant SLA penalties.
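To make the summary's arithmetic concrete, here is a minimal sketch; it is not the article's actual tooling. It assumes that "underreporting by 22%" means the reported figure is 78% of true usage, and that the 15% headroom is applied on top of the corrected figure. The function name and the 780 MiB input are hypothetical.

```python
MIB = 1024 * 1024


def limit_with_headroom(reported_bytes: int,
                        underreport_pct: int = 22,
                        headroom_pct: int = 15) -> int:
    """Estimate a memory limit (bytes) from a reported usage figure.

    Assumption (not from the article): reported = actual * (100 - underreport_pct) / 100.
    Integer arithmetic with ceiling division keeps the result deterministic.
    """
    if not 0 <= underreport_pct < 100:
        raise ValueError("underreport_pct must be in [0, 100)")
    if headroom_pct < 0:
        raise ValueError("headroom_pct must be non-negative")

    # Correct the reported figure back up to the estimated true usage (rounded up).
    estimated_actual = -(-reported_bytes * 100 // (100 - underreport_pct))
    # Add headroom on top of the corrected figure (rounded up).
    return -(-estimated_actual * (100 + headroom_pct) // 100)


if __name__ == "__main__":
    # A sidecar reported at 780 MiB is estimated at 1000 MiB of true usage,
    # so a limit with 15% headroom comes out at 1150 MiB.
    print(limit_with_headroom(780 * MIB) // MIB)  # 1150
```

Under these assumptions, a limit sized from the raw reported figure alone (780 MiB plus 15%, about 897 MiB) would sit below the estimated true usage, which is the kind of gap the summary describes.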

Opening excerpt (first ~120 words)

ANKUSH CHOUDHARY JOHAL
Posted on May 2 • Originally published at johal.in

#story #kubernetes #node #kill

At 19:42 UTC on March 14, 2024, our video streaming service serving 4.2 million concurrent viewers lost 92% of traffic in 11 minutes, triggered by a single Kubernetes 1.32 node OOM kill that cascaded across 18 availability zones.

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV Community.
