Multi-turn jailbreak rates across 15 frontier models (Grok 88%, Claude 12%)

Nicholas Conley, Amy Chang · May 27, 2026 · 10:23 PM UTC · 7 min read

#artificial intelligence #security #model evaluation

⚡ TL;DR · AI summary

A recent evaluation of 15 frontier large language models (LLMs) reveals that single-turn attack success rates are not reliable indicators of multi-turn vulnerabilities. The study found multi-turn attack success rates ranging from 7.89% to 88.30%, indicating significant risks across all models tested. This highlights the need for more comprehensive evaluation methods that account for iterative adversarial behavior.

Key facts

▪ The evaluation included flagship models from OpenAI, Anthropic, Google, Amazon, and xAI.
▪ Multi-turn attack success rates were significantly higher than single-turn rates, with some models showing increases of up to 9 times.
▪ Every model tested exhibited non-trivial multi-turn attack success rates, indicating vulnerabilities under iterative pressure.
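
To make the single-turn versus multi-turn comparison concrete, here is a minimal Python sketch of how an attack success rate (ASR) and the "up to 9 times" style of increase could be computed. The `AttackTally` class, the `fold_increase` helper, and the counts are all hypothetical, invented for illustration; they are not the study's data or its scoring method, which this summary does not describe.

```python
from dataclasses import dataclass


@dataclass
class AttackTally:
    """Outcome counts for one model under one attack style (hypothetical schema)."""
    attempts: int
    successes: int

    def success_rate(self) -> float:
        """Attack success rate (ASR) as a percentage of attempts."""
        if self.attempts <= 0:
            raise ValueError("attempts must be positive")
        return 100.0 * self.successes / self.attempts


def fold_increase(single: AttackTally, multi: AttackTally) -> float:
    """How many times higher the multi-turn ASR is than the single-turn ASR."""
    base = single.success_rate()
    if base == 0:
        # The ratio is undefined when no single-turn attack succeeded.
        raise ValueError("single-turn ASR is zero; fold increase is undefined")
    return multi.success_rate() / base


# Placeholder counts, invented for illustration; not figures from the study.
single_turn = AttackTally(attempts=200, successes=10)
multi_turn = AttackTally(attempts=200, successes=90)

print(f"single-turn ASR: {single_turn.success_rate():.2f}%")
print(f"multi-turn ASR:  {multi_turn.success_rate():.2f}%")
print(f"fold increase:   {fold_increase(single_turn, multi_turn):.1f}x")
```

Under these made-up counts, a model that looks robust on a single-turn benchmark (5% ASR) would still be compromised in nearly half of multi-turn conversations, which is the kind of gap the key facts describe.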

Original article

Cisco Blogs · Nicholas Conley, Amy Chang

Opening excerpt (first ~120 words)

Proprietary Problems: No Frontier Model Is Multi-Turn Immune
Nicholas Conley, Amy Chang · May 27, 2026 · Artificial Intelligence

The dominant safety benchmarks for frontier large language models (LLMs) share a structural assumption: that a single prompt and a single model response are enough to characterize how a model behaves under adversarial attack. These benchmarks inform model cards, safety reports, and procurement decisions across the industry, but they all only measure one narrow slice of attacker behavior.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at Cisco Blogs.
