Three memory-leak patterns in long-running scrapers (and how I caught them after 968 Trustpilot runs)

May 18, 2026 · 2:52 AM UTC · 5 min read

TL;DR · WeSearch summary

The article describes three memory-leak patterns common in long-running web scrapers. The leaks do not crash a run; they raise the memory limit and the per-run cost, and often go unnoticed until a compute-unit invoice arrives weeks later. The author draws on 968 Trustpilot scraper runs and offers ways to mitigate each pattern.

Key facts

▪ Memory leaks in scrapers can quietly push the Apify memory limit from 1 GB to 2 GB to 4 GB, doubling the per-run cost.
▪ The most common leak pattern involves an unbounded asyncio queue that grows linearly with runtime.
▪ Dynamic regex patterns can lead to cache misses and increased memory usage during long runs.
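
The queue and regex points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's code: `run_pipeline`, `extract_rating`, `REVIEW_RE`, and the `review-<id>-rating-<n>` text format are all hypothetical. It assumes the queue fix is a `maxsize` bound, so the producer waits when the consumer falls behind. It also assumes the regex fix is one pattern compiled once, rather than a new pattern string built per item.

```python
import asyncio
import re
from typing import Optional

# Assumption: one static pattern compiled at import time. Building a pattern
# per review ID (e.g. with an f-string) would hand the re module a new,
# distinct pattern string on almost every call.
REVIEW_RE = re.compile(r"review-(?P<id>\w+)-rating-(?P<rating>\d)")


def extract_rating(text: str, review_id: str) -> Optional[int]:
    """Return the rating for review_id, or None if it is not in text."""
    for match in REVIEW_RE.finditer(text):
        if match.group("id") == review_id:
            return int(match.group("rating"))
    return None


async def run_pipeline(n_items: int, maxsize: int) -> tuple[int, int]:
    """Push n_items through a bounded queue; return (processed, peak depth)."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    processed = 0
    peak = 0

    async def producer() -> None:
        nonlocal peak
        for i in range(n_items):
            await queue.put(i)  # suspends here while the queue is full
            peak = max(peak, queue.qsize())
        await queue.put(None)  # sentinel: no more work

    async def consumer() -> None:
        nonlocal processed
        while True:
            item = await queue.get()
            if item is None:
                break
            await asyncio.sleep(0)  # stand-in for a slower parse/store step
            processed += 1

    await asyncio.gather(producer(), consumer())
    return processed, peak


if __name__ == "__main__":
    done, depth = asyncio.run(run_pipeline(n_items=1000, maxsize=10))
    print(f"processed={done} peak_queue_depth={depth}")
```

With `maxsize` set, the queue depth is capped no matter how long the run lasts. The default `asyncio.Queue()` has no bound, so a producer that outpaces the consumer keeps adding items for the whole run.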

Original article

DEV.to (Top)

Opening excerpt (first ~120 words)

Alex Spinov • Posted on May 18 • Originally published at blog.spinov.online

#webscraping #python #ai #apify

Memory leaks in scrapers do not crash the run. They quietly bump the Apify Memory limit from 1 GB to 2 GB to 4 GB, double the per-run cost, and only get spotted weeks later on a compute-unit invoice.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
