Notes on respectfully getting a personal copy of a website's contents
The article explains why readers of a personal blog may be blocked when their browser version is outdated. High-volume crawlers that present old browser user agents have increased the load on the site, and the anti-crawler measures adopted in response can also catch legitimate visitors. The author offers guidance for users who run into the block and suggests alternatives for archiving content.
- Users with outdated browsers may encounter access restrictions on the blog.
- High-volume crawlers are causing increased load, prompting the author to block certain user agents.
- The author recommends using archive.org for better archival crawling.
Opening excerpt (first ~120 words)
You're using a suspiciously old browser
You're probably reading this page because you've attempted to access some part of my blog (Wandering Thoughts) or CSpace, the wiki thing it's part of. Unfortunately you're using a browser version that my anti-crawler precautions consider suspicious, most often because it's too old (most often this applies to versions of Chrome). Unfortunately, as of early 2025 there's a plague of high volume crawlers (apparently in part to gather data for LLM training) that use a variety of old browser user agents, especially Chrome user agents. To reduce the load on Wandering Thoughts I'm experimenting with (attempting to) block all of them, and you've run into this.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Utoronto.
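The excerpt describes treating old Chrome versions in the User-Agent header as suspicious. The following is a minimal sketch of what such a check could look like, not the author's actual implementation, which the excerpt does not describe. The function names and the cutoff value `MIN_CHROME_MAJOR` are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical cutoff for illustration; the article does not state
# which Chrome versions its precautions treat as too old.
MIN_CHROME_MAJOR = 120

# Matches the "Chrome/<major>." token found in Chrome-style User-Agent strings.
_CHROME_RE = re.compile(r"Chrome/(\d+)\.")


def chrome_major_version(user_agent: str) -> Optional[int]:
    """Return the Chrome major version claimed by a User-Agent, or None."""
    match = _CHROME_RE.search(user_agent)
    return int(match.group(1)) if match else None


def looks_suspiciously_old(user_agent: str,
                           min_major: int = MIN_CHROME_MAJOR) -> bool:
    """True if the User-Agent claims a Chrome version below the cutoff.

    User-Agents that do not claim to be Chrome are not flagged here.
    """
    major = chrome_major_version(user_agent)
    return major is not None and major < min_major
```

A check like this only sees the version the client claims, which is why a genuine visitor on an old browser gets the same treatment as a crawler sending the same string.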