The last six months in LLMs in five minutes
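Coding agents improved markedly over the past six months. OpenAI and Anthropic spent most of 2025 applying Reinforcement Learning from Verifiable Rewards to raise the quality of the code their models write, especially when paired with their Codex and Claude Code agent harnesses. By November, the agents had crossed a quality threshold: they mostly worked, well enough to serve as a daily driver for real work without most of the time going to fixing their mistakes.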
- OpenAI and Anthropic spent most of 2025 improving their coding models with Reinforcement Learning from Verifiable Rewards.
- The resulting gains in code quality became apparent in November 2025.
- Coding agents went from often working to mostly working, making them reliable enough for daily use.
Opening excerpt (first ~120 words)
It took a little while for this to become clear, but the real news from November was that the coding agents got good. OpenAI and Anthropic had spent most of 2025 running Reinforcement Learning from Verifiable Rewards to increase the quality of code written by their models, especially when paired up with their Codex and Claude Code agent harnesses. In November the results of this work became apparent. Coding agents went from often-work to mostly-work, crossing a quality barrier where you could use them as a daily-driver to get real work done, without needing to spend most of your time fixing their stupid mistakes.
Excerpt limited to ~120 words for fair-use compliance. The full article is at Simon Willison's Weblog.