How Well Do Models Follow Their Constitutions?

May 26, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #model evaluation #behavioral specifications

⚡ TL;DR · AI summary

The paper examines how well AI models adhere to their specified behavioral guidelines. It introduces a multi-method audit pipeline to evaluate compliance under real-world conditions. Findings indicate that newer model generations show significant improvements in following their respective specifications.

Key facts

▪ The study evaluates models from Anthropic and OpenAI against their published specifications.
▪ Violation rates fell across model generations: Claude models dropped from 15.0% to 2.0%, and GPT models from 11.7% to 3.6%.
▪ Remaining issues involve AI identity questioning and the generation of misleading quantitative claims.
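To put the two drops above on a common footing, the short sketch below computes the absolute change (in percentage points) and the relative reduction for each model family. It uses only the four rates quoted in the key facts; the names `RATES`, `reduction`, and `summarize` are illustrative, and the summary does not say which specific model versions the "earlier" and "newer" figures refer to.

```python
# Violation rates (percent) quoted in the key facts: (earlier, newer).
# Which model versions these correspond to is not stated in this summary.
RATES = {
    "Claude": (15.0, 2.0),
    "GPT": (11.7, 3.6),
}


def reduction(earlier: float, newer: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = earlier - newer
    relative = 100.0 * absolute / earlier
    return absolute, relative


def summarize(rates=RATES):
    """Map each model family to its (absolute, relative) reduction."""
    return {family: reduction(*pair) for family, pair in rates.items()}


if __name__ == "__main__":
    for family, (abs_pp, rel_pct) in summarize().items():
        print(f"{family}: -{abs_pp:.1f} pp, {rel_pct:.1f}% relative reduction")
```

On these figures the Claude drop is 13.0 percentage points (about 87% relative) and the GPT drop is 8.1 points (about 69% relative). Because each family is audited against its own specification, the two rates are not necessarily directly comparable.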

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.24229 (cs) [Submitted on 22 May 2026]

Title: How Well Do Models Follow Their Constitutions?

Authors: Arya Jakkli, Senthooran Rajamanoharan, Neel Nanda

Abstract: Frontier AI developers now train models against long written behavioral specifications, such as Anthropic's constitution (Anthropic, 2025a) and OpenAI's Model Spec (OpenAI, 2025a), integrated into post-training via methods like character training (Anthropic, 2024) and deliberative alignment (Guan et al., 2024).

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
