WeSearch

Multi-turn jailbreak rates across 15 frontier models (Grok 88%, Claude 12%)

Nicholas Conley, Amy Chang · 7 min read
#artificial intelligence #security #model evaluation
⚡ TL;DR · AI summary

A recent evaluation of 15 frontier large language models (LLMs) reveals that single-turn attack success rates are not reliable indicators of multi-turn vulnerabilities. The study found multi-turn attack success rates ranging from 7.89% to 88.30%, indicating significant risks across all models tested. This highlights the need for more comprehensive evaluation methods that account for iterative adversarial behavior.

Original article
Cisco Blogs · Nicholas Conley, Amy Chang
Opening excerpt (first ~120 words)

Proprietary Problems: No Frontier Model Is Multi-Turn Immune
Nicholas Conley, Amy Chang · May 27, 2026 · Artificial Intelligence - AI · 6 min read

The dominant safety benchmarks for frontier large language models (LLMs) share a structural assumption: that a single prompt and a single model response are enough to characterize how a model behaves under adversarial attack. These benchmarks inform model cards, safety reports, and procurement decisions across the industry, but they all only measure one narrow slice of attacker behavior.

Excerpt limited to ~120 words for fair-use compliance. The full article is at Cisco Blogs.
