When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State
The paper examines the limits of outcome-only evaluation for AI agents, using hotel pricing under hidden competitor state as its setting. It introduces an evaluation paradigm called discipline stability, which requires an agent to maintain deployable behavioral discipline while also meeting its business objectives. The authors report experiments showing that trace-based diagnostics are needed to evaluate and improve policies in competitive environments.
- Outcome-only evaluations can lead to economically unsafe agents.
- The discipline stability paradigm focuses on behavioral discipline in addition to achieving KPIs.
- Experiments show that reward-only PPO variants often fail to align with trace diagnostics.
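The contrast between outcome-only and trace-based evaluation can be illustrated in a few lines of Python. This is a minimal sketch, not the paper's method. The trace schema (`Step`), the discipline rule (stay within ±15% of the competitor's posted rate and never move more than 10% day over day), and the KPI target are all hypothetical choices made for this example. It also assumes the evaluator can read the competitor's rate from logged traces even though the pricing policy itself cannot observe it.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One day of a pricing trace (hypothetical schema for illustration)."""
    our_rate: float         # rate our policy posted
    competitor_rate: float  # competitor's rate: hidden from the policy, logged for the evaluator
    rooms_sold: int
    rooms_available: int


def revpar(trace: list[Step]) -> float:
    """Outcome metric: total room revenue divided by total available room-nights."""
    revenue = sum(s.our_rate * s.rooms_sold for s in trace)
    available = sum(s.rooms_available for s in trace)
    return revenue / available


def discipline_violations(trace: list[Step], band: float = 0.15, max_step: float = 0.10) -> int:
    """Trace metric under a HYPOTHETICAL discipline rule: count days where our rate
    leaves a +/- band around the competitor's rate, or moves more than max_step
    (as a fraction) from the previous day's rate."""
    violations = 0
    for i, s in enumerate(trace):
        out_of_band = abs(s.our_rate - s.competitor_rate) > band * s.competitor_rate
        prev = trace[i - 1].our_rate if i > 0 else s.our_rate
        jumped = abs(s.our_rate - prev) > max_step * prev
        if out_of_band or jumped:
            violations += 1
    return violations


def evaluate(trace: list[Step], kpi_target: float) -> dict:
    """Report the outcome-only verdict next to the trace-based verdict."""
    kpi = revpar(trace)
    v = discipline_violations(trace)
    return {
        "revpar": kpi,
        "outcome_pass": kpi >= kpi_target,          # what an outcome-only check sees
        "violations": v,
        "trace_pass": kpi >= kpi_target and v == 0,  # KPI met AND discipline kept
    }


# Two toy traces with identical RevPAR but very different pricing behavior.
steady = [Step(100.0, 100.0, 80, 100) for _ in range(4)]
erratic = [
    Step(160.0, 100.0, 50, 100),
    Step(80.0, 100.0, 100, 100),
    Step(160.0, 100.0, 50, 100),
    Step(80.0, 100.0, 100, 100),
]

if __name__ == "__main__":
    print(evaluate(steady, kpi_target=75.0))
    print(evaluate(erratic, kpi_target=75.0))
```

Both toy traces reach the same RevPAR and pass the outcome-only check, but only the steady one passes the trace-based check; the erratic trace swings far outside the competitor's rate band on every day. Real discipline criteria would come from the competitor's actual rule set rather than the fixed thresholds assumed here.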
Opening excerpt (first ~120 words)
arXiv:2605.18580 [cs.AI], submitted on 18 May 2026
Title: When Outcome Looks Right But Discipline Fails: Trace-Based Evaluation Under Hidden Competitor State
Authors: Peiying Zhu, Sidi Chang
Abstract: Outcome-only evaluation can certify economically unsafe agents: a policy can hit a business KPI while violating deployable behavioral discipline. In hotel pricing with hidden competitor state, a learner can achieve plausible revenue per available room while failing to preserve the rate discipline of a rule-based revenue-management competitor.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
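The outcome KPI named in the abstract, revenue per available room (RevPAR), is conventionally defined in hotel revenue management as room revenue divided by available room-nights. It factors into average daily rate (ADR) times occupancy. This is the standard industry definition, not one taken from the paper:

```latex
\mathrm{RevPAR}
  = \frac{\text{room revenue}}{\text{rooms available}}
  = \underbrace{\frac{\text{room revenue}}{\text{rooms sold}}}_{\mathrm{ADR}}
    \times
    \underbrace{\frac{\text{rooms sold}}{\text{rooms available}}}_{\text{occupancy}}
```

Because the product trades price against occupancy, very different pricing paths can yield the same RevPAR. A high-rate, low-occupancy day and a low-rate, full-house day can contribute equally. This is why a RevPAR check alone cannot tell a disciplined pricing policy from an erratic one.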