Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs
A new study identifies vulnerabilities in large language models (LLMs) that arise from inference optimization techniques. The authors show how compilation can be exploited to implant backdoors in LLMs without modifying the compiler or the hardware. The findings point to a significant security risk in LLM deployment, and the paper also proposes potential defenses against these attacks.
- Inference optimization is crucial for deploying LLMs at scale, and compilation is the most widely adopted optimization technique.
- The study finds that numerical side effects of compilation can be maliciously exploited to create stealthy backdoors in LLMs.
- The authors propose two strategies for optimization-triggered attacks, which achieve an average attack success rate of 90% across various LLMs and tasks.
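The "numerical side effects" in these bullets stem from the fact that floating-point addition is not associative: optimizing compilers may fuse operators or split and reorder reductions, and changing the order of additions can change the rounded result. The Python sketch below is a toy illustration under stated assumptions, not the paper's attack. `reference_sum`, `split_sum`, `decide`, and `crafted_terms` are hypothetical names. The "optimized" kernel is modeled simply as a two-way split reduction, and the magnitudes are exaggerated so the effect appears in float64 (low-precision formats such as float16 or bfloat16 round far more coarsely).

```python
from functools import reduce
import operator


def reference_sum(terms):
    """Left-to-right accumulation, as a straightforward (unoptimized) kernel might do."""
    # reduce() is used instead of built-in sum(): since Python 3.12, sum() on
    # floats uses compensated summation and would hide the rounding effect.
    return reduce(operator.add, terms, 0.0)


def split_sum(terms):
    """Hypothetical 'optimized' kernel: two partial sums combined at the end.

    A stand-in for a compiler that parallelizes or tiles a reduction,
    which changes the order in which floating-point additions happen.
    """
    mid = len(terms) // 2
    left = reduce(operator.add, terms[:mid], 0.0)
    right = reduce(operator.add, terms[mid:], 0.0)
    return left + right


def decide(margin, threshold=0.5):
    """Toy decision rule: a margin above the threshold flips the output."""
    return "triggered" if margin > threshold else "benign"


# Hypothetical crafted contributions to a single logit margin. In exact
# arithmetic they sum to 1.0, but in float64 1.0 + 1e16 rounds back to 1e16.
crafted_terms = [1.0, 1e16, -1e16]

eager = reference_sum(crafted_terms)   # ((1.0 + 1e16) + -1e16) -> 0.0
optimized = split_sum(crafted_terms)   # 1.0 + (1e16 + -1e16)   -> 1.0

print(eager, decide(eager))          # 0.0 benign
print(optimized, decide(optimized))  # 1.0 triggered
```

The same "weights" produce different outputs depending only on how the sum is evaluated. A gap of this kind, between reference and optimized numerics, is what an optimization-triggered backdoor could target; the real attack surface in compiled LLM kernels is far subtler than this three-term example.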
Opening excerpt (first ~120 words)
arXiv:2605.20641, Computer Science > Cryptography and Security. Submitted on 20 May 2026.
Title: Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs
Authors: Yifei Wang, Tianlin Li, Xiaohan Zhang, Yida Yang, Xiaoyu Zhang, Li Pan
Abstract: Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs.
…
Excerpt limited to ~120 words for fair-use compliance. The full paper is available on arXiv (cs.AI).