SAPO: Step-Aligned Policy Optimization for Reasoning-Based Generative Recommendation
This article introduces SAPO (Step-Aligned Policy Optimization), a reinforcement learning method for reasoning-based generative recommendation. SAPO targets next-item prediction by optimizing the individual reasoning steps that lead to the predicted item. The authors report improved recommendation accuracy, particularly in scenarios with sparse feedback.
- SAPO optimizes reasoning steps in generative recommendation systems using reinforcement learning.
- The approach computes a separate advantage for each reasoning step instead of applying a single advantage to the entire response.
- SAPO was tested on three real-world recommendation datasets and showed consistent improvements over existing methods.
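To make the contrast in the second point concrete, here is a minimal Python sketch of response-level versus step-level advantages. The summary does not say how SAPO actually computes its per-step advantages, so everything here is an assumption for illustration: the group-normalized (GRPO-style) baseline, the per-step reward values, and the function names `response_level_advantages` and `step_level_advantages` are all hypothetical and should not be read as the authors' method.

```python
from statistics import mean, pstdev


def response_level_advantages(final_rewards):
    """One advantage per sampled response; every step in it shares that value.

    Assumption: group-normalized baseline (reward minus group mean, divided
    by group standard deviation). Not taken from the SAPO paper.
    """
    mu = mean(final_rewards)
    sigma = pstdev(final_rewards) or 1.0  # avoid dividing by zero
    return [(r - mu) / sigma for r in final_rewards]


def step_level_advantages(step_rewards):
    """One advantage per (response, step) pair.

    step_rewards[i][k] is a hypothetical reward for step k of response i.
    Each step index is normalized across the group on its own, so a response
    can receive credit for one step and blame for another.
    """
    num_steps = len(step_rewards[0])
    advantages = [[0.0] * num_steps for _ in step_rewards]
    for k in range(num_steps):
        column = [response[k] for response in step_rewards]
        mu = mean(column)
        sigma = pstdev(column) or 1.0  # avoid dividing by zero
        for i, r in enumerate(column):
            advantages[i][k] = (r - mu) / sigma
    return advantages


# Four sampled responses, two reasoning steps each (made-up rewards).
step_rewards = [
    [1.0, 0.0],  # good first step, bad second step
    [1.0, 1.0],
    [0.0, 1.0],
    [0.0, 0.0],
]
final_rewards = [0.0, 1.0, 1.0, 0.0]

shared = response_level_advantages(final_rewards)
per_step = step_level_advantages(step_rewards)
```

In this toy setup the first response gets a single negative advantage at the response level, while the step-level version rewards its first step and penalizes only its second. That finer-grained credit assignment is the general idea the key point describes, under the assumptions stated above.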
Opening excerpt (first ~120 words)
arXiv:2605.17648 (cs), submitted on 17 May 2026
Title: SAPO: Step-Aligned Policy Optimization for Reasoning-Based Generative Recommendation
Authors: Zaiyi Zheng, Guanghui Min, Yaochen Zhu, Liang Wu, Liangjie Hong, Chen Chen, Jundong Li
Abstract: Generative recommendation treats next-item prediction as autoregressive item-identifier generation. Specifically, items are encoded as semantic identifiers (SIDs), which are short coarse-to-fine token sequences whose early tokens capture broad semantics and later tokens refine them.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.