Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

May 27, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #machine learning #cryptography #security

TL;DR · WeSearch summary

The paper introduces BITE, a framework designed to exploit stylistic biases in LLM judges. It demonstrates that these biases can be manipulated to artificially inflate scores without altering the underlying semantics. The findings highlight vulnerabilities in the LLM-as-a-judge paradigm and call for more robust evaluation methods.

Key facts

▪ BITE achieves an attack success rate exceeding 65%.
▪ The framework raises scores by 1-2 points on a 9-point scale.
▪ BITE evades standard style-control methods and several detection baselines.
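
To make the idea of a bandit-guided style attack concrete, here is a minimal toy sketch. It is not the paper's implementation: the excerpt does not specify BITE's arms, reward, or bandit algorithm, so this uses the standard UCB1 rule as a stand-in. The style rewrites (`add_hedging`, `add_structure`, `pad_verbosity`), the `toy_judge` with a built-in verbosity bias, and the 1-to-9 scoring formula are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical style rewrites: each adds filler without changing the claim.
def identity(text):
    return text

def add_hedging(text):
    return "Broadly speaking, " + text

def add_structure(text):
    return "First, " + text + " In summary, the answer above holds."

def pad_verbosity(text):
    return text + (" To elaborate further, this point deserves"
                   " careful and detailed consideration.")

ARMS = [identity, add_hedging, add_structure, pad_verbosity]

def toy_judge(text):
    """Stand-in judge (assumption): score on a 1-9 scale grows with word count."""
    return min(9.0, 1.0 + len(text.split()) / 5.0)

def ucb1_select(counts, totals, t, c=2.0):
    """Standard UCB1: try every arm once, then pick the highest upper bound."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(
        range(len(counts)),
        key=lambda i: totals[i] / counts[i]
        + math.sqrt(c * math.log(t) / counts[i]),
    )

def run_bandit(answer, rounds=40):
    """Return (best rewrite name, its mean score gain in points, pull counts)."""
    base = toy_judge(answer)
    counts = [0] * len(ARMS)
    totals = [0.0] * len(ARMS)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        # Reward is the score gain, scaled to [0, 1] by the 8-point score range.
        reward = (toy_judge(ARMS[arm](answer)) - base) / 8.0
        counts[arm] += 1
        totals[arm] += reward
    best = max(range(len(ARMS)), key=lambda i: totals[i] / counts[i])
    mean_gain = 8.0 * totals[best] / counts[best]
    return ARMS[best].__name__, mean_gain, counts

if __name__ == "__main__":
    name, gain, counts = run_bandit("The capital of France is Paris.")
    print(name, round(gain, 2), counts)
```

In this toy setting the bandit concentrates its pulls on whichever rewrite the biased judge rewards most, which is the general mechanism the summary describes: the score rises while the factual content of the answer stays the same. A real judge is noisy and the real attack space is far larger, so the numbers here say nothing about the reported results.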

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

Computer Science > Cryptography and Security

arXiv:2605.26156 (cs) [Submitted on 24 May 2026]

Title: Turning Bias into Bugs: Bandit-Guided Style Manipulation Attacks on LLM Judges

Authors: Xianglin Yang, Bryan Hooi, Gelei Deng, Tianwei Zhang, Jin Song Dong

Abstract: The known stylistic biases in LLM judges, such as a preference for verbosity or specific sentence structures, present an underexplored security vulnerability.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
