WeSearch

Your recurring scraper is re-downloading data that didn't change. Here's the 15-line fix (conditional GET)

#webscraping #python #ai #apify
⚡ TL;DR · AI summary

The article discusses a solution for scrapers that repeatedly download unchanged data. It emphasizes the importance of using conditional GET requests to minimize server load and avoid unnecessary data processing. The author shares insights from extensive scraping experience, advocating for a polite approach to web scraping.
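
As an illustration of the conditional GET idea summarized above, here is a minimal Python sketch using only the standard library. It is not the author's 15-line fix, which is not reproduced on this page. The function names `conditional_headers` and `fetch_if_changed`, the shape of the `cache_entry` dict, and the injectable `opener` parameter are all assumptions made for this example. The scraper saves the `ETag` and `Last-Modified` validators from the previous response and sends them back as `If-None-Match` and `If-Modified-Since`; a `304 Not Modified` reply means the body was not re-sent and the cached copy can be reused.

```python
import urllib.error
import urllib.request


def conditional_headers(cache_entry):
    """Build conditional request headers from validators saved on a previous run.

    `cache_entry` is a hypothetical dict with optional "etag" and
    "last_modified" keys; this layout is an assumption of the sketch.
    """
    headers = {}
    if cache_entry.get("etag"):
        headers["If-None-Match"] = cache_entry["etag"]
    if cache_entry.get("last_modified"):
        headers["If-Modified-Since"] = cache_entry["last_modified"]
    return headers


def fetch_if_changed(url, cache_entry, opener=urllib.request.urlopen):
    """Return (body, new_cache_entry); body is None when the server answers 304.

    `opener` is injectable only so the function can be exercised without
    network access; by default it is urllib.request.urlopen.
    """
    request = urllib.request.Request(url, headers=conditional_headers(cache_entry))
    try:
        with opener(request, timeout=30) as response:
            body = response.read()
            new_entry = {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
            }
            return body, new_entry
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; nothing changed,
        # so keep the old validators and skip re-processing.
        if err.code == 304:
            return None, cache_entry
        raise
```

A recurring job would persist the returned cache entry per URL (for example in a JSON file or database row) and skip parsing whenever the body comes back as `None`. Servers that send neither validator will always return a full response, so this only saves work where the target supports it.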

Original article
DEV.to (Top)
Opening excerpt (first ~120 words)

Alex Spinov • Posted on May 26 • Originally published at blog.spinov.online

Note: This is a cross-post. Canonical version (full long-form) lives on my blog: https://blog.spinov.online/blog/ethical-scraping-is-a-rate-limit-question/

TL;DR: The "ethical scraping" debate keeps arguing about robots.txt and ToS.

Excerpt limited to ~120 words for fair-use compliance. The full article is at DEV.to (Top).
