It Takes Two: Complementary Self-Distillation for Contextual Integrity in LLMs
The paper introduces SELFCI, a framework for improving contextual integrity in large language models (LLMs). It targets the tension between privacy and utility that arises when a model decides which information to disclose. Empirical evaluations indicate that SELFCI outperforms existing methods without requiring external supervision.
- SELFCI is a complementary self-distillation framework designed to improve contextual integrity in LLMs.
- The framework separates information suppression from task resolution to optimize privacy and utility.
- Empirical results show that SELFCI consistently outperforms competitive baselines, including online reinforcement learning algorithms.
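The separation of information suppression from task resolution can be pictured with a toy example. The sketch below is only an illustration of that two-stage idea under contextual-integrity norms; it is rule-based, involves no distillation or training, and is not the paper's method. The names `Norm`, `suppress`, and `resolve`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """Hypothetical context norm: which information types may flow to a recipient."""
    recipient: str
    allowed_types: frozenset


def suppress(facts: dict, norm: Norm) -> dict:
    """Stage 1 (suppression): keep only the facts whose type the norm permits."""
    return {k: v for k, v in facts.items() if k in norm.allowed_types}


def resolve(task_fields: list, permitted: dict) -> dict:
    """Stage 2 (task resolution): answer the task using permitted facts only."""
    return {f: permitted.get(f, "[withheld]") for f in task_fields}


# Assumed scenario: an assistant replies to an employer about a medical appointment.
facts = {"name": "Alex", "appointment_time": "3pm", "diagnosis": "flu"}
norm = Norm(recipient="employer",
            allowed_types=frozenset({"name", "appointment_time"}))

permitted = suppress(facts, norm)
reply = resolve(["name", "appointment_time", "diagnosis"], permitted)
print(reply)
```

Keeping the two stages as separate functions means the task step never sees the suppressed fields, so a utility-oriented change to `resolve` cannot leak information that `suppress` has already filtered out.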
Opening excerpt (first ~120 words)
arXiv:2605.20258 (cs), Computer Science > Machine Learning
Submitted on 18 May 2026
Authors: Sangwoo Park, Woongyeong Yeo, Seanie Lee, Yumin Choi, Hyomin Lee, Kangsan Kim, Jinheon Baek, Seong Joon Oh, Sung Ju Hwang
Abstract: Contextual Integrity (CI) defines privacy not merely as keeping information hidden, but as governing information flows according to the norms of a given context. As large language models are increasingly deployed as personal agents handling sensitive workflows, adhering to CI becomes critical.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.