51 stories tagged with #scraping, in publish-time order across the WeSearch catalog. Tag pages update as new stories are ingested.
The smart TV in your living room is a node in the AI scraping economy
In this post we look under the hood of BrightData's SDK and how it turns ordinary consumer TVs into exit nodes of an enormous commercial, residential proxy network leveraged by the…
Twitch Chat Scraper: export any VOD's full chat replay for $1.05/1K
Quick answer: Twitch stores a complete timestamped chat replay for every public VOD but exposes no...…
Threads Reply Scraper: export the full conversation tree of any public post
Quick answer: Meta's official Threads API is gated behind a developer-account review and refuses...…
Steam Regional Price Data: fetch 60 regions in one run for $1.05/1K
Quick answer: Steam publishes regional prices on the public store.steampowered.com/api/appdetails...…
When scraping orchestration is the wrong abstraction for LLM workflows
LLM apps often need structured web data, not a scraping platform. Here's how to choose between orchestration and a simple extraction API.…
How I Built a Google Shopping Scraper with Python & Playwright
Why I Built This: I wanted to compare prices across Google Shopping without clicking through 100 tabs...…
HTTP 200 Is a Lie: A 30-Line Schema Canary for Source Drift
A scraper that returns HTTP 200 is not a scraper that returns good data. Those are two different...…
I Tested Every Web Scraping Tool Against Lazada — Here's What Actually Works (May 2026)
Installing Scrapling on a 4GB VPS: pitfalls, wiring into an AI agent via MCP, browser selection benchmarks, the Camoufox surprise, and a priority ladder validated against Lazada's …
finally launched my tool to find content gaps on google (stop manual serp scraping lol)
AI and the Courts – A Cautionary Tale
The Invisible Digital Footprint: Finding Your Face Without Scraping the Web
Every time you walk through a crowded tourist spot or attend a stadium concert, you become an extra...…
How I scraped the CQC Care Register without hitting the API auth wall
The Care Quality Commission regulates 56,000+ healthcare and social care locations in England — care...…
How I built an Ofsted school data API on Apify (without scraping a single webpage)
Most scraping projects start by finding a website to scrape. This one started from the opposite...…
VerticalScope sues OpenAI, claims AI giant infringed copyright by scraping content to train GPT models - Law Times
How I Built an AI-Powered Google Maps Scraper for Lead Generation
The Problem: Every sales team needs local business leads, but manually searching Google...…
SiteRows example #1:
Hello world! I'm starting this series of examples/use-cases of siterows.com, the new app I recently...…
Data Normalization Across Dublin Rental Portals: How to Make Listings Comparable
Dublin...…
How to Build Token-Efficient Web Scraping Pipelines for AI Agents Using n8n
TL;DR: Building token-efficient scraping pipelines for AI agents requires stripping heavy...…
Best Java Web Scraping Libraries
TL;DR: Pick Java web scraping libraries based on the target page structure, not on...…
Fighting the AI Scraperbot Scourge
There are many challenges involved with running a web site like LWN. Some of them, such as fin [...]…
Optimizing Stealth Browser Fingerprint Integrity and Session Auth
Maintaining execution stealth requires strict alignment between browser fingerprint headers and...…
Quiz: Python Web Scraping
Revisit Requests, Beautiful Soup, Scrapy, and Selenium in this wrap-up quiz covering the Python Web Scraping learning path.…
google news api vs scraping results directly
Why Your Requests + BeautifulSoup Stack Will Fail in Production
TL;DR — requests plus BeautifulSoup is the right tool for tutorials, side projects, and one-off...…
Why Real Browser Automation Is Replacing Simple HTTP Scraping
The production problem: Simple HTTP scraping still works for a lot of pages. If a site returns...…
Your recurring scraper is re-downloading data that didn't change. Here's the 15-line fix (conditional GET)
Note: This is a cross-post. Canonical version (full long-form) lives on my blog:...…
How to know if you actually need mobile proxies (without buying any)
Every scraping project I start, the same question comes up: do I actually need mobile proxies for...…
BeautifulSoup and Requests for Web Scraping With Python: When Simple Still Works
Not every data collection workflow requires browser automation or complex network impersonation. For...…
Open-source Playwright wrapper that passes bot.sannysoft.com, pixelscan, and CreepJS in headless mode
Been scraping for a while and got tired of getting blocked the moment a page loads. Standard...…
The End of Web Scraping: Introducing WebMCP & Chrome DevTools for Agents
A raw, developer-first look at Google’s proposed WebMCP open standard and Chrome DevTools for Agents...…
Stop Fighting the DOM. Selector-First Thinking Will Save Your Scraper.
Most broken scrapers I see have the same shape: someone wrote the extraction logic first and the...…
Advanced Web Scraping with Power Query: Automating Data Extraction for SEO and Analytics
A technical guide to building robust data extraction pipelines using Power Query to automate your SEO auditing and analytics dashboards.…
CodeShot – Web screenshots, scraping, and link previews for AI agents
AI companies use malware proxies to mount DDoS attacks on web sites
Affordable web scraping proxies for AI startups. Cut costs with $1/GB residential IPs. Scalable, no minimums, fast setup with PacketStream.…
[Showoff Saturday] Web scraping for LLMs was driving us insane, so we built our own Search API with native MCP support
Google Maps Scraper: Build Local Data Pipelines That Actually Run
You do not need another CSV export that works once and quietly dies three days later. A Google Maps...…
HTML Tables with Hidden Data: Scraping What You Can't See
The table shows 10 columns. You export it. The CSV has 10 columns. But the page has 15 columns of...…
Trump’s allies in danger of scraping false hope from Maga victory in Kentucky primary
US president, like a cult leader whose commune keeps getting smaller, commands fierce loyalty from a shrinking base. “Thomas Massie caught in a throuple!” screamed the AI-generated …
Linus Torvalds on the continued flood of AI bug reports
Maxun v0.0.40: Support for PDF extraction & parsing in our open-source, self-hostable no-code scraping platform!
Post-quantum TLS rolled out last January and broke most open-source scrapers
Your User-Agent header doesn't matter anymore. JA4 fingerprints classify bots at 98.6% accuracy before headers are even read. Here's what shifted in 2026.…
Show HN: Decentralized compute API on DePIN – scraping, OCR, JavaScript sandbox
Scraping, OCR, code execution and data processing via one API. 3 GB free/week.…
Three memory-leak patterns in long-running scrapers (and how I caught them after 968 Trustpilot runs)
Memory leaks in scrapers do not crash the run. They quietly bump the Apify Memory limit from 1 GB to...…
Scraping dynamic pages with Python, Playwright and AWS Lambda
A practical guide to scraping dynamic JavaScript-heavy pages with Python, Playwright, and AWS Lambda, then saving scheduled parquet snapshots to S3.…
Killswitch: Add per-function short-circuit mitigation primitive
Tell HN: Mindie.dev is scraping emails from profiles to send spam
It's set up, not setup: Scraping GitHub for grammar errors
An exploration of how often the noun 'setup' is used where the verb 'set up' belongs, across 30,000+ public GitHub repositories.…
I built 5 single-platform scrapers. The one that sells fastest is the aggregator that wraps them.
I run a small portfolio of public scrapers on the Apify Store. Most of them are single-platform — one...…
I've been scraping viral image-gen prompts off X for weeks — here's what I learned about why most "copy this prompt" promises fail, and the tool I built to fix it
The Day I Logged 1 In Every 2000 Public IPv4: Visualizing The AI Scraper DDoS
In an attempt to grasp the magnitude of web scraper attacks against my websites, I went the way of visualizing.…