WeSearch

AI Is Starving for PDFs

Michael Kotlikov · 4 min read
⚡ TL;DR · AI summary

Two recent arguments in AI infrastructure point to the same limitation: plain text loses information that matters. Andrej Karpathy and Thariq Shihipar each argue, from opposite ends, for formats that preserve visual context and structure. The article presents PDF as an existing format that serves both needs and keeps a document intact better than plain text does.

Original article
Hacker News (AI / LLM) · Michael Kotlikov
Opening excerpt (first ~120 words)

Your AI Is Starving For PDFs

Karpathy wants images in. Anthropic wants HTML out. One old format does both.

Michael Kotlikov, May 17, 2026

Two recent AI infrastructure arguments point to the same problem: plain text loses information that matters.

Andrej Karpathy’s version came through DeepSeek’s OCR paper, which showed that rendering a page as an image could preserve the original with high fidelity while using far fewer tokens than raw text. Karpathy’s takeaway was not “better OCR.” It was more radical: maybe text tokens are wasteful historical baggage, and the image is a better memory substrate.

Thariq Shihipar, engineering lead for Claude Code at Anthropic, made the opposite-end argument in The Unreasonable Effectiveness of HTML.

Excerpt limited to ~120 words for fair-use compliance. The full article is at Hacker News (AI / LLM).
