How Far Will They Go? Red-Teaming Online Influence with Large Language Models
The paper argues that red-teaming large language models (LLMs) is necessary to assess their potential for influencing political discourse. It introduces a framework for evaluating the political expressivity of open-source LLMs and for measuring how far jailbreak techniques expand the range of opinions they will express. The findings reveal significant biases in political content generation and highlight the need for stronger countermeasures against LLM-enabled influence campaigns.
- The study focuses on locally deployed open-source LLMs rather than API-only models, for better alignment with privacy concerns.
- An empirical framework was introduced to measure the political expressivity of LLMs and the effects of jailbreak techniques.
- Results indicate that open-source LLMs tend to generate more left-leaning content and that political expressivity varies significantly across model families.
Opening excerpt (first ~120 words)
arXiv:2605.22880 (cs) [Submitted on 20 May 2026]
Authors: Daniel C. Ruiz, Anna Serbina, Ashwin Rao, Emilio Ferrara, Luca Luceri
Abstract: As large language model (LLM)-based agents increasingly participate in online discourse, red-teaming their capacity to support political influence campaigns is critical for information integrity.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.