AI Is Starving for PDFs
Two recent arguments in AI infrastructure point to the same problem: plain text loses information that matters. Andrej Karpathy wants models to take in page images rather than extracted text. Thariq Shihipar wants them to produce HTML rather than markdown. The article argues that PDF, an old format, serves both needs because it preserves a document's visual context and structure better than plain text does.
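- Citing DeepSeek's OCR paper, Karpathy argues that a page rendered as an image can preserve the original with high fidelity while using far fewer tokens than raw text. He suggests text tokens may be wasteful historical baggage.
- Shihipar, engineering lead for Claude Code at Anthropic, argues in "The Unreasonable Effectiveness of HTML" that HTML conveys complex information better than markdown because it can express richer structure.
- The article presents PDF as a universal format that keeps the author's layout and the document's visual context intact.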
Opening excerpt (first ~120 words)
Your AI Is Starving For PDFs
Karpathy wants images in. Anthropic wants HTML out. One old format does both.
Michael Kotlikov, May 17, 2026
Two recent AI infrastructure arguments point to the same problem: plain text loses information that matters.
Andre Karpathy’s version came through DeepSeek’s OCR paper, which showed that rendering a page as an image could preserve the original with high fidelity while using far fewer tokens than raw text. Karpathy’s takeaway was not “better OCR.” It was more radical: maybe text tokens are wasteful historical baggage, and the image is a better memory substrate.
Thariq Shihipar, engineering lead for Claude Code at Anthropic, made the opposite-end argument in The Unreasonable Effectiveness of HTML.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Hacker News (AI / LLM).