Observations on AI agent token consumption
A new study from researchers at Stanford and other institutions offers a detailed look at how AI agents consume tokens. It finds that agentic coding consumes far more tokens than code-chat or code-reasoning tasks, and its findings highlight the importance of model choice and prompt management in controlling AI costs.
- The study ran eight frontier models across 500 SWE-bench Verified tasks, four runs each, producing granular data on token consumption.
- Agentic coding uses around 1,000 times more tokens than equivalent code-chat tasks, with a high input-to-output ratio.
- Token usage varies significantly between models on the same tasks, which directly affects overall cost.
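To make the input-to-output point concrete, here is a minimal Python sketch of how token counts and cost can add up. It assumes, purely as an illustration and not as the paper's method, an agent loop that resends its full conversation history on every turn with no prompt caching. The function names, token counts and per-million-token prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical per-million-token prices in USD; not taken from the study.
    input_per_m: float
    output_per_m: float


def simulate_agent_tokens(system_prompt: int, turns: int,
                          tool_output: int, model_output: int) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) for a hypothetical agent loop
    that resends the whole history on every turn (assumes no caching)."""
    history = system_prompt
    total_in = 0
    total_out = 0
    for _ in range(turns):
        total_in += history                      # the entire context is billed as input
        total_out += model_output                # the model emits one action
        history += model_output + tool_output    # action and tool result join the context
    return total_in, total_out


def cost_usd(input_tokens: int, output_tokens: int, p: Pricing) -> float:
    """Cost of a run under the hypothetical per-million-token prices."""
    return (input_tokens * p.input_per_m + output_tokens * p.output_per_m) / 1_000_000


if __name__ == "__main__":
    price = Pricing(input_per_m=3.0, output_per_m=15.0)  # hypothetical prices
    chat = simulate_agent_tokens(2_000, 1, 1_000, 200)    # one-shot, chat-like exchange
    agent = simulate_agent_tokens(2_000, 50, 1_000, 200)  # 50-turn agent loop
    print(chat, agent)  # (2000, 200) (1570000, 10000)
    print(f"${cost_usd(*chat, price):.4f} vs ${cost_usd(*agent, price):.2f}")
```

With these made-up numbers, the 50-turn loop sends 785 times the input tokens of a single turn but produces only 50 times the output, so input tokens dominate the bill. Prompt caching, context truncation and the number of turns a particular model takes would all change the result. The study's own figures come from measured trajectories, not from a model like this.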
Opening excerpt (first ~120 words)
16 May 2026
A new paper from researchers at Stanford, Michigan, DeepMind, All Hands, Microsoft AI and MIT is the most detailed open empirical study I’ve seen of how AI agents actually spend tokens at scale[1]. The authors run eight frontier models across 500 SWE-bench Verified tasks with four runs each, capturing full trajectory telemetry decomposed by token type, phase and action. They release the dataset alongside the paper, which is to my knowledge the most granular public corpus of agentic trajectories currently available. The paper is rigorous, careful about what it claims and puts hard numbers on questions that have until now only been answered with anecdotes. I’d recommend reading it in full.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Willhackett.