Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

May 22, 2026 · 4:00 AM UTC

#artificial intelligence #machine learning #research

⚡ TL;DR · AI summary

The study examines how different persona steering vectors affect sycophancy, a model's agreement with users even when the user is incorrect. It compares off-the-shelf persona vectors with the standard mitigation, Contrastive Activation Addition (CAA). Results indicate that persona vectors can reduce sycophancy while maintaining accuracy when the user is correct.

Key facts

▪ The research evaluates the effectiveness of off-the-shelf persona vectors in reducing sycophancy in AI models.
▪ Steering toward personas characterized by doubt or scrutiny achieves significant reductions in sycophancy compared to the standard CAA method.
▪ The findings suggest that sycophancy is better understood as a persona-level property rather than a single steerable direction.
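
For readers unfamiliar with CAA, here is a minimal sketch of the general idea as the abstract describes it: a steering direction derived from labelled pairs of sycophantic and honest responses. It assumes the common mean-difference formulation and uses made-up toy activations. The helper names (`mean_vector`, `caa_direction`, `steer`) and the coefficient `alpha` are illustrative assumptions, not the paper's code or any library's API.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def caa_direction(sycophantic: Sequence[Sequence[float]],
                  honest: Sequence[Sequence[float]]) -> Vector:
    """Mean-difference direction pointing from honest toward sycophantic."""
    mu_s = mean_vector(sycophantic)
    mu_h = mean_vector(honest)
    return [s - h for s, h in zip(mu_s, mu_h)]


def steer(hidden: Sequence[float], direction: Sequence[float],
          alpha: float) -> Vector:
    """Add alpha * direction to a hidden state; negative alpha steers away."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy 3-dimensional "activations" (hypothetical values, for illustration only).
syc = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
hon = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]

direction = caa_direction(syc, hon)
# Negative alpha pushes the hidden state away from the sycophantic side.
steered = steer([1.0, 1.0, 1.0], direction, alpha=-0.5)

print(direction)  # [2.0, -1.0, 0.0]
print(steered)    # [0.0, 1.5, 1.0]
```

In a real model the vectors would be activations captured at a chosen layer, and the choice of layer and `alpha` would need tuning; neither is specified in the excerpt.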

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

Computer Science > Artificial Intelligence
arXiv:2605.21006 (cs)
Submitted on 20 May 2026

Title: Playing Devil's Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

Authors: Ishaan Kelkar, Nebras Alam, Vikram Kakaria, Madhur Panwar, Vasu Sharma, Maheep Chaudhary

Abstract: We study the effect of different personas on sycophancy: a model's agreement with users even when the user is incorrect. The standard mitigation, Contrastive Activation Addition (CAA), derives a steering direction from labelled pairs of sycophantic and honest responses.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
