One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents
The paper proposes pcsp, a method that controls non-player characters (NPCs) in life-simulation games with a single shared reinforcement learning (RL) policy. Because that one policy is conditioned on a persona, it can drive many distinct NPCs in real time without training a separate policy for each character. On a 300-persona life-simulation benchmark, the reported results show improvements in persona identification and in behavioral divergence between NPCs in multi-agent environments.
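The core idea, one set of policy weights reused for every NPC with the persona supplied as an extra input, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's architecture: the class name `SharedPersonaPolicy`, the linear-softmax form, fixed-length persona vectors, and concatenation as the conditioning mechanism are all hypothetical choices made here.

```python
import math
import random
from typing import List, Sequence


class SharedPersonaPolicy:
    """Hypothetical persona-conditioned policy: ONE weight matrix shared by all NPCs.

    NPCs differ only in the persona vector fed in alongside the observation.
    Illustrative linear-softmax sketch, not the architecture used by pcsp.
    """

    def __init__(self, obs_dim: int, persona_dim: int, n_actions: int, seed: int = 0):
        rng = random.Random(seed)
        in_dim = obs_dim + persona_dim
        # Shared parameters: one weight row per discrete action.
        self.weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(n_actions)]
        self.bias = [0.0] * n_actions
        self.obs_dim = obs_dim
        self.persona_dim = persona_dim

    def action_probs(self, obs: Sequence[float], persona: Sequence[float]) -> List[float]:
        if len(obs) != self.obs_dim or len(persona) != self.persona_dim:
            raise ValueError("unexpected input size")
        x = list(obs) + list(persona)  # condition on the persona by concatenation
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(self.weights, self.bias)]
        m = max(logits)  # subtract the max logit for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, obs: Sequence[float], persona: Sequence[float]) -> int:
        probs = self.action_probs(obs, persona)
        return max(range(len(probs)), key=probs.__getitem__)  # greedy action index


# Usage: the same policy object serves two NPCs with different (hypothetical) personas.
policy = SharedPersonaPolicy(obs_dim=4, persona_dim=3, n_actions=5, seed=42)
obs = [0.2, -0.1, 0.5, 1.0]
introvert = [1.0, 0.0, 0.0]
socialite = [0.0, 1.0, 0.0]
print(policy.action_probs(obs, introvert))
print(policy.action_probs(obs, socialite))
```

In this sketch, adding another NPC only means supplying another persona vector; the parameter count stays fixed, which is the sense in which one policy can serve arbitrarily many characters.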
- The pcsp method achieves compositional zero-shot persona identification up to 17 times above chance.
- Its semantic-behavioral alignment reaches a Spearman's ρ of approximately 0.73.
- Inference is 22 times faster than with an LLM-as-policy baseline.
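The summary does not say how the 0.73 alignment figure is computed. One plausible reading, assumed here rather than taken from the paper, is to pair each persona pair's semantic distance with its behavioral distance and rank-correlate the two lists. The sketch below implements Spearman's ρ as the Pearson correlation of average ranks, so tied values are handled; the function names and the distance values in the usage lines are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Usage with made-up distances for five persona pairs.
semantic = [0.1, 0.4, 0.35, 0.8, 0.9]      # hypothetical semantic distances
behavioral = [0.05, 0.3, 0.5, 0.7, 0.95]   # hypothetical behavioral distances
print(round(spearman_rho(semantic, behavioral), 3))  # → 0.9
```

Using ranks rather than raw distances means only the ordering of persona pairs matters, so a ρ near 1 would indicate that personas described as more different also tend to behave more differently, regardless of the scale of either distance.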
Opening excerpt (first ~120 words)
arXiv:2605.23652 (cs.AI), submitted on 22 May 2026
Title: One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents
Author: Yoosung Hong
Abstract: On a 300-persona life-simulation benchmark, pcsp achieves compositional zero-shot persona identification up to 17x above chance, Spearman ρ ≈ 0.73 semantic-behavioral alignment, and 22x faster inference than an LLM-as-policy baseline.
…
Excerpt limited to ~120 words for fair-use compliance. The full paper is available on arXiv (cs.AI).