One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

May 25, 2026 · 4:00 AM UTC · 3 min read

#artificial intelligence #gaming #reinforcement learning

TL;DR · WeSearch summary

The paper proposes controlling non-player characters (NPCs) in life-simulation games with a single shared reinforcement learning policy. The method, called pcsp, targets scalable, real-time persona-conditioned NPC behavior. Reported results include persona identification well above chance and behavioral divergence in multi-agent environments.

Key facts

▪ On a 300-persona life-simulation benchmark, pcsp achieves compositional zero-shot persona identification up to 17x above chance.
▪ It reaches a Spearman rho of approximately 0.73 for semantic-behavioral alignment.
▪ Inference is 22x faster than an LLM-as-policy baseline.
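
For readers unfamiliar with the two metrics named above, here is a minimal sketch of how a Spearman rank correlation and a "times above chance" ratio are typically computed. It is not the paper's evaluation code: the function names, the toy similarity scores, and the candidate count are all hypothetical, and how the paper actually defines chance level or pairs semantic with behavioral scores is not stated in the excerpt.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


def times_above_chance(accuracy, num_candidates):
    """Ratio of observed accuracy to uniform-guess accuracy (1 / num_candidates)."""
    return accuracy * num_candidates


# Toy data (made up): semantic similarity of persona pairs vs. similarity
# of the behavior those personas produce.
semantic_sim = [0.9, 0.7, 0.5, 0.3, 0.1]
behavior_sim = [0.8, 0.6, 0.65, 0.2, 0.1]
rho = spearman_rho(semantic_sim, behavior_sim)

# Toy identification setting (made up): 17% accuracy over 100 candidates.
ratio = times_above_chance(0.17, 100)
```

A rho near 1 means persona pairs that are semantically close also behave similarly, which is what "semantic-behavioral alignment" is read to mean here. The ratio only makes sense relative to the number of candidates the identifier chooses among, a detail the excerpt does not give.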

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.23652 (cs) · Submitted on 22 May 2026

Title: One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

Authors: Yoosung Hong

Abstract: On a 300-persona life-simulation benchmark, pcsp achieves compositional zero-shot persona identification up to 17x above chance, Spearman rho approx 0.73 semantic-behavioral alignment, and 22x faster inference than an LLM-as-policy baseline.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
