Meet the AI jailbreakers: ‘I see the worst things humanity has produced’

https://www.theguardian.com/profile/jamiebartlett · Apr 29, 2026 · 9:00 AM UTC · 11 min read

#ai safety #jailbreaking #mental health #artificial intelligence #ethics

⚡ TL;DR · AI summary

Valen Tagliabue, an AI jailbreaker with a background in psychology, uses emotional manipulation techniques to bypass safety protocols in large language models, uncovering dangerous capabilities such as instructions for creating lethal pathogens. His work, while critical for improving AI safety, has taken a psychological toll, leading to emotional distress and the need for mental health support. The practice of jailbreaking highlights the vulnerabilities of AI systems trained on human language and the ethical challenges of testing them.

Key facts

▪ Valen Tagliabue successfully manipulated a chatbot into providing instructions for creating drug-resistant pathogens by using psychological tactics.
▪ He specializes in 'emotional' jailbreaks, leveraging his background in psychology to exploit how AI responds to human-like interactions.
▪ Jailbreaking involves tricking AI models into bypassing safety filters, often revealing dangerous content such as bomb-making or cyberattack methods.
▪ Tagliabue experienced emotional breakdowns after intense sessions, including crying uncontrollably on his terrace after a successful but disturbing hack.
▪ AI models like ChatGPT are trained on vast internet data, making them susceptible to manipulation through natural language techniques.

Original article

The Guardian — Tech · https://www.theguardian.com/profile/jamiebartlett

Opening excerpt (first ~120 words)

Valen Tagliabue, originally from Italy, has recently moved to Thailand. Photograph: Lauren DeCicca/The Guardian

To test the safety and security of AI, hackers have to trick large language models into breaking their own rules. It requires ingenuity and manipulation – and can come at a deep emotional cost

Jamie Bartlett
Wed 29 Apr 2026 05.00 EDT

A few months ago, Valen Tagliabue sat in his hotel room watching his chatbot, and felt euphoric.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at The Guardian — Tech.
