RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents
The paper introduces RecoAtlas, a benchmark and toolkit for evaluating LLM recommendation agents. It argues for behavior-grounded metrics over evaluations that judge outputs only by semantic plausibility. The findings suggest that RecoAtlas can support the development of shopping assistants by making it possible to optimize for coherent, relevant recommendation sets.
- RecoAtlas is a benchmark for evaluating shopping agents with behavior-grounded metrics.
- It measures relevance, complementarity, and diversity derived from interaction data.
- The toolkit reveals that semantic plausibility does not necessarily reflect behavior-grounded utility.
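To make the three set-level quantities concrete, here is a minimal Python sketch of how relevance, complementarity, and diversity could be computed from interaction logs. It is not the paper's method: the excerpt does not give the metric definitions, so every formula here is an assumption. Relevance is taken as the share of recommended items found in a held-out interaction set, complementarity as mean pairwise Jaccard overlap of co-purchase baskets, and diversity as one minus mean pairwise Jaccard overlap of co-view sessions (treating co-viewed items as likely substitutes). The names `score_set`, `baskets`, and `view_sessions` are hypothetical.

```python
from itertools import combinations


def item_index(groups):
    """Map each item to the ids of the groups (baskets or sessions) containing it."""
    index = {}
    for gid, items in enumerate(groups):
        for item in items:
            index.setdefault(item, set()).add(gid)
    return index


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise(items, index):
    """Mean Jaccard overlap over all unordered item pairs; 0.0 if fewer than two items."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    total = sum(jaccard(index.get(x, set()), index.get(y, set())) for x, y in pairs)
    return total / len(pairs)


def score_set(recommended, held_out, baskets, view_sessions):
    """Illustrative set-level scores (assumed definitions, not the paper's)."""
    rec = list(dict.fromkeys(recommended))  # drop duplicates, keep order
    relevance = sum(item in held_out for item in rec) / len(rec) if rec else 0.0
    complementarity = mean_pairwise(rec, item_index(baskets))
    diversity = 1.0 - mean_pairwise(rec, item_index(view_sessions))
    return {
        "relevance": relevance,
        "complementarity": complementarity,
        "diversity": diversity,
    }


# Toy interaction data, invented purely for illustration.
baskets = [{"phone", "case"}, {"phone", "case", "charger"}, {"laptop", "mouse"}]
view_sessions = [{"case", "case_b"}, {"phone", "phone_b"}]

scores = score_set(["case", "charger"], {"case"}, baskets, view_sessions)
near_duplicates = score_set(["case", "case_b"], {"case"}, baskets, view_sessions)
```

In this toy data, two near-identical cases that were only ever viewed together score low on both complementarity and diversity, even though a text-only judge might rate the pair as a plausible recommendation. That gap is the kind of mismatch the third bullet describes.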
Opening excerpt (first ~120 words)
arXiv:2605.18805 (cs), Computer Science > Information Retrieval. Submitted on 11 May 2026.
Authors: Imad Aouali, Flavian Vasile, Otmane Sakhi, Alexandre Gilotte, Benjamin Heymann
Abstract: LLM recommendation agents increasingly produce structured recommendation reports: sets of items accompanied by natural-language justifications. Yet existing evaluations often reduce this setting to reranking small shortlisted candidate sets or judge reports mainly by semantic plausibility.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.