Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs
The paper explores the conflict between instruction-following and pattern-completion in language models. It examines how models respond to user instructions that conflict with hardcoded patterns. The findings indicate that instruction-following is inconsistent and highly dependent on the model and the nature of the instructions.
- ▪Language models are designed to follow instructions but also complete patterns, leading to potential conflicts.
- ▪The study measured instruction-following rates across 13 models and 16 different instructions, revealing a range from 1% to 99%.
- ▪Robustness against induction pressure varies by model and is influenced by the content of the instructions and the format of the output.
Opening excerpt (first ~120 words) tap to expand
Computer Science > Computation and Language arXiv:2605.20382 (cs) [Submitted on 19 May 2026] Title:Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs Authors:Carolina Camassa, Derek Shiller View a PDF of the paper titled Do as I Say, Not as I Do: Instruction-Induction Conflict in LLMs, by Carolina Camassa and 1 other authors View PDF HTML (experimental) Abstract:Language models are trained to follow instructions, but they are also powerful pattern completers. What happens when these two objectives conflict? We construct conversations in which a user instruction to behave in a target way T (e.g., always output a specific token, answer in a particular language, or adopt a persona) is opposed by N hardcoded assistant turns demonstrating a competing pattern P.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.