90% cheaper repo inference with GPT-5.4 nano
Repo inference moved from a gpt-5.4 preset to a gpt-5.4-nano preset, cutting total cost by approximately 89.8% while maintaining repo-selection accuracy. Latency improved for direct repo-inference calls, although other latency metrics showed mixed results.
- The repo inference step is crucial for determining which GitHub repository a task pertains to.
- After switching to the gpt-5.4-nano model, the total cost per call dropped from $0.0429 to $0.00414.
- The implementation change resulted in an estimated annual savings of $229,000 if traffic volume remains consistent.
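The figures above can be sanity-checked with a little arithmetic. The sketch below is not from the article: it assumes the annual savings come entirely from the per-call cost difference and that call volume is constant, and the function names are hypothetical. Under those assumptions the per-call figures work out to a reduction of about 90.3% (slightly different from the 89.8% total-cost figure, whose measurement basis the summary does not state) and imply roughly 5.9 million repo-inference calls per year.

```python
# Figures quoted in the summary (USD).
COST_BEFORE = 0.0429      # per call, gpt-5.4 preset
COST_AFTER = 0.00414      # per call, gpt-5.4-nano preset
ANNUAL_SAVINGS = 229_000  # estimated annual savings


def percent_reduction(before: float, after: float) -> float:
    """Relative cost reduction, as a percentage of the original cost."""
    return (before - after) / before * 100


def implied_annual_calls(savings: float, before: float, after: float) -> float:
    """Call volume implied by the savings figure.

    Assumption (not stated in the source): all savings come from the
    per-call cost difference, and traffic volume stays constant.
    """
    return savings / (before - after)


reduction = percent_reduction(COST_BEFORE, COST_AFTER)
calls = implied_annual_calls(ANNUAL_SAVINGS, COST_BEFORE, COST_AFTER)

print(f"Per-call cost reduction: {reduction:.1f}%")
print(f"Implied calls per year:  {calls:,.0f}")
```

The implied volume is only a back-of-the-envelope estimate; the article's own traffic numbers, if published, take precedence.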
Opening excerpt (first ~120 words)
Most of the visible work in an engineering agent happens after it starts touching code: reading files, proposing changes, running tests, and opening PRs. The less visible cost is the orchestration work around that: deciding what context to fetch, which tool to call, and where the work should happen. Repo inference is one of those steps. When Charlie receives a task, he often needs to decide which customer GitHub repository the task is actually about. The repo-inference step examines the customer’s repo inventory and selects the primary repo for the work. That sounds simple until the signal comes from a Linear comment, a Slack thread, a GitHub webhook, or a request that mentions a product feature rather than a repo name.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Charlie Labs.