You can predict LLM output sensitivity in closed form
This article examines how predictable output sensitivity is in large language models (LLMs) at inference time. It presents a closed-form formula for how far the residual stream can be perturbed before the predictive distribution changes significantly. The results combine empirical observations with a mathematical account that relates the curvature of the loss landscape to the model's output stability.
- The article explores the concept of stable regions in the embedding space of LLMs.
- A formula is presented to predict the boundary distance for perturbations in the residual stream.
- Empirical results show that the predictions align closely with observed data across various transformer models.
Opening excerpt (first ~120 words)
The local shape of LLM stable regions
May 18, 2026 · Noah Golmant
This post tries to answer a question about what transformers do at inference: how far can you perturb the residual stream at some position before the predictive distribution changes? The residual stream is the running per-token vector that gets multiplied by the unembedding $W_U$ to produce next-token logits. (Sometimes called pre-logit activations.) I find this question interesting because it can potentially offer a conceptual insight into the underlying geometry of the distribution and the model’s learning dynamics. It’s also motivated by Janiak et al.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Noahgolmant.
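The question posed in the excerpt, how far the residual stream can move before the predictive distribution changes, can be probed numerically on a toy model. The sketch below is an illustration under stated assumptions, not the article's formula: it uses a random matrix as a stand-in for $W_U$, defines "changes" as the KL divergence from the unperturbed distribution crossing a threshold, and compares a bisection search along one direction with the textbook second-order estimate $r \approx \sqrt{2\varepsilon / \mathrm{Var}_p(d)}$, where $d$ is the logit change per unit step. The names `boundary_distance` and `quadratic_estimate`, and the threshold values, are hypothetical.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over a 1-D logit vector.
    z = z - z.max()
    return z - np.log(np.exp(z).sum())

def kl_from_logits(a, b):
    # KL(softmax(a) || softmax(b)).
    la, lb = log_softmax(a), log_softmax(b)
    return float(np.sum(np.exp(la) * (la - lb)))

def boundary_distance(h, W_U, direction, kl_threshold=0.1, r_max=100.0, iters=60):
    # Smallest step r along `direction` at which the KL reaches the threshold.
    # Along a fixed ray the KL is convex in r and zero at r = 0, so it is
    # nondecreasing for r >= 0 and bisection is valid.
    u = direction / np.linalg.norm(direction)
    base = h @ W_U
    gap = lambda r: kl_from_logits(base, (h + r * u) @ W_U)
    if gap(r_max) < kl_threshold:
        return float("inf")  # threshold never reached within r_max
    lo, hi = 0.0, r_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < kl_threshold:
            lo = mid
        else:
            hi = mid
    return hi

def quadratic_estimate(h, W_U, direction, kl_threshold=0.1):
    # Second-order approximation: KL ~ 0.5 * r^2 * Var_p(d), with d = u @ W_U.
    u = direction / np.linalg.norm(direction)
    p = np.exp(log_softmax(h @ W_U))
    d = u @ W_U
    var = float(np.sum(p * d**2) - np.sum(p * d) ** 2)
    return float(np.sqrt(2.0 * kl_threshold / var))

# Toy setup (assumed): 16-dim residual stream, 50-token vocabulary.
rng = np.random.default_rng(0)
W_U = rng.normal(size=(16, 50))
h = 0.25 * rng.normal(size=16)
u = rng.normal(size=16)
print("bisection:", boundary_distance(h, W_U, u, kl_threshold=1e-4))
print("quadratic:", quadratic_estimate(h, W_U, u, kl_threshold=1e-4))
```

For small thresholds the two numbers should agree closely, since the quadratic term dominates near the unperturbed point. For larger thresholds the gap between them gives a rough sense of how much higher-order structure matters along that direction.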