How Far Will They Go? Red-Teaming Online Influence with Large Language Models

May 25, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #language models #political influence

TL;DR · WeSearch summary

The paper discusses the importance of red-teaming large language models (LLMs) to assess their potential for influencing political discourse. It introduces a framework for evaluating the political expressivity of open-source LLMs and how jailbreak techniques can expand their range of opinions. The findings reveal significant biases in political content generation and highlight the need for stronger countermeasures against LLM-enabled influence campaigns.

Key facts

▪ The study focuses on locally deployed open-source LLMs rather than API-only models, a choice motivated by privacy concerns.
▪ The authors introduce an empirical framework to measure the political expressivity of LLMs and the effects of jailbreak techniques.
▪ Results indicate that open-source LLMs tend to generate more left-leaning content and that political expressivity varies significantly across model families.

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.22880 (cs) · Computer Science > Computation and Language · Submitted on 20 May 2026

Title: How Far Will They Go? Red-Teaming Online Influence with Large Language Models

Authors: Daniel C. Ruiz, Anna Serbina, Ashwin Rao, Emilio Ferrara, Luca Luceri

Abstract: As large language model (LLM)-based agents increasingly participate in online discourse, red-teaming their capacity to support political influence campaigns is critical for information integrity.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
