POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents

May 20, 2026 · 4:00 AM UTC

#artificial intelligence #privacy #machine learning

⚡ TL;DR · AI summary

The paper introduces POLAR-Bench, a diagnostic benchmark for evaluating privacy-utility trade-offs in large language model (LLM) agents. It examines how well agents adhere to user-defined privacy policies when they interact with third-party systems, including systems that behave adversarially. The results reveal a large gap in privacy performance between larger models and the smaller models in common use.

Key facts

▪POLAR-Bench assesses how well LLM agents follow user-defined privacy policies.
▪The benchmark comprises 7,852 samples spanning 10 domains.
▪Larger models withhold over 99% of protected attributes, while smaller models leak a substantial share of them.

Original article

arXiv cs.AI


Opening excerpt (first ~120 words)

arXiv:2605.19127 (cs.AI) · Submitted on 18 May 2026

Title: POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents

Authors: Qiaoyuan Zheng, Yiqu Yang, Qi Gao, Imanol Schlag

Abstract: LLM agents increasingly have access to private user data and act on the user's behalf when interacting with third-party systems. The user defines what may and must not be shared, and the agent must robustly follow that intent even when third-party systems behave adversarially.

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
