
How Well Do Models Follow Their Constitutions?

TL;DR (AI summary)

The paper examines how well AI models adhere to their specified behavioral guidelines. It introduces a multi-method audit pipeline to evaluate compliance under real-world conditions. Findings indicate that newer model generations show significant improvements in following their respective specifications.

Original article
arXiv cs.AI

arXiv:2605.24229 (cs), Computer Science > Artificial Intelligence. Submitted on 22 May 2026.

Title: How Well Do Models Follow Their Constitutions?

Authors: Arya Jakkli, Senthooran Rajamanoharan, Neel Nanda

Abstract: Frontier AI developers now train models against long written behavioral specifications, such as Anthropic's constitution (Anthropic, 2025a) and OpenAI's Model Spec (OpenAI, 2025a), integrated into post-training via methods like character training (Anthropic, 2024) and deliberative alignment (Guan et al., 2024).

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
