Why the same LLM gives different answers in different environments
What I found diagnosing a failure mode in my own system, and the moment retrieval turned out to be already shaped before it started
The Environment Rewrites the Question Before I Ask It
John Wade · Apr 28, 2026

I was teaching one of my environments a concept from its own knowledge base. The concept is called Phantom Resolution — a failure mode where a question that hasn't actually been resolved gets treated as if it has. I had built the concept myself, months earlier, while watching a different failure pattern repeat itself across sessions. It was a good concept. It had structure, examples, a clean definition.

The environment I was teaching — I'll call it ide, the one where I write code and run infrastructure — gave back a textbook-correct answer. It identified the gate-check that should have caught the resolution, named the preflight that was missing, flagged the dependency violation. Structurally precise. Nothing I could mark as wrong.

Something felt incomplete. Not wrong — incomplete. I couldn't name what was missing. The explanation looked whole. I did the thing I do when that sense won't resolve inside one head: I opened a second environment — I'll call it desk, the one where I think in prose and work through implications — and asked it a single question.

"What is ide trying to explain here?"

I pasted in ide's explanation and waited.

Desk didn't add information. It added a different dimension of information. It told me why the concept existed — another team had reviewed one of my system's formats and adopted it, which implicitly asked whether my architecture did what I thought it did. That encounter launched an audit. The audit found something. That finding gated five items downstream. One of those items needed a decision about register — technical or narrative. The register decision had been treated as already made. It hadn't been. That was the Phantom Resolution instance that launched the study session in the first place.

The causal chain. The reason the concept mattered. The why.

Ide had given me the what — crisply, completely, in a way that passed all its own quality checks. Desk gave me the why without being asked for it.

I copied desk's answer back into ide. Ide read it and, only then, diagnosed its own failure: the ambient context of the environment had pre-framed the question before retrieval began. Structural sources responded first and looked complete. The retrieval closed before anything narrative or causal was consulted. Ide named the circularity — the configuration producing the structural bias was the same protocol defining how the environment operates, which meant any fix written into that protocol would reinforce the frame rather than counterweight it.

That moment — ide diagnosing a bias in its own retrieval that it could only see after I carried information across the boundary — is what this essay is about. It turned out to be a specific, replicable mechanism. It turned out to have implications I wasn't expecting. And the part I'm least certain about, which is what the operator's cross-environment function actually is, turned out to be the part a literature I hadn't read had been describing for fifty years.

The conversation about what LLMs do with context is mostly architectural — attention mechanisms, context window size, positional encoding, token dropout in long documents. Recent empirical work has shown models attend to the beginning and end of long contexts and systematically miss the middle; that context position…
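The closure pattern ide diagnosed (ambient configuration pre-weighting retrieval sources, then retrieval stopping once the top-ranked sources look complete) is concrete enough to sketch. Below is a minimal Python illustration under assumed conditions: the names ENV_BIAS, closure_threshold, and the source labels are hypothetical stand-ins, not anything from the article, and the author's actual protocol is not described in this excerpt.

```python
# Hypothetical sketch of environment-primed retrieval with early closure.
# Not the author's system: all names and numbers here are illustrative.

from dataclasses import dataclass

@dataclass
class Source:
    name: str        # e.g. "gate-checks", "audit-origin-story"
    kind: str        # "structural" or "causal"
    relevance: float # retriever score in [0, 1]

# Assumed per-environment weighting: ide's protocol favors structural
# sources, desk's favors causal ones.
ENV_BIAS = {
    "ide":  {"structural": 1.0, "causal": 0.4},
    "desk": {"structural": 0.6, "causal": 1.0},
}

def retrieve(env: str, sources: list[Source], budget: int = 3,
             closure_threshold: float = 2.0) -> list[Source]:
    """Rank sources under the environment's bias, then stop as soon as
    the accumulated relevance makes the answer look 'complete'."""
    bias = ENV_BIAS[env]
    ranked = sorted(sources, key=lambda s: s.relevance * bias[s.kind],
                    reverse=True)
    picked, score = [], 0.0
    for s in ranked:
        picked.append(s)
        score += s.relevance * bias[s.kind]
        # Closure: retrieval ends here, before lower-ranked source
        # kinds are ever consulted.
        if score >= closure_threshold or len(picked) >= budget:
            break
    return picked

sources = [
    Source("gate-checks", "structural", 0.9),
    Source("preflight-spec", "structural", 0.8),
    Source("dependency-graph", "structural", 0.7),
    Source("audit-origin-story", "causal", 0.9),
    Source("register-decision-log", "causal", 0.8),
]

for env in ("ide", "desk"):
    print(env, [s.name for s in retrieve(env, sources)])
# ide  -> only structural sources; retrieval closes before any causal one
# desk -> the causal 'why' ranks first under the inverted bias
```

Run under the ide bias, structural sources fill the budget and retrieval closes before any causal source is consulted; under the inverted desk bias, the same corpus surfaces the why first. That divergence from identical inputs is the cross-environment effect the essay is tracing.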
This excerpt is published under fair use for community discussion. Read the full article at Substack.