Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
The paper presents a new method for creating interpretable text representations that are both predictive and meaningful. It introduces LLM-assisted Feature Discovery (LFD), which enhances feature clarity and reduces label entanglement. The results demonstrate that LFD achieves high agreement among human annotators and maintains predictive performance across various text classification tasks.
- The proposed method focuses on conceptual clarity and label disentanglement in text representations.
- LLM-assisted Feature Discovery (LFD) screens features for reliability using cross-LLM Cohen's kappa.
- LFD features show higher agreement and are judged to leak less label information than baseline concepts.
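
The cross-LLM reliability screen mentioned above can be illustrated with a minimal sketch. It assumes two LLM annotators that each assign a binary value per document for every candidate feature, and it keeps only the features whose Cohen's kappa reaches a cutoff. The function names, feature names, toy labels, and the 0.6 threshold are all hypothetical; this summary does not state the paper's actual procedure or cutoff.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items where both raters match.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label everywhere; kappa is undefined.
        # Assumption of this sketch: return 0.0 so constant features are dropped.
        return 0.0
    return (observed - expected) / (1.0 - expected)


def screen_features(annotations, threshold=0.6):
    """Keep features whose cross-annotator kappa is at least `threshold`.

    `annotations` maps a feature name to a pair of label sequences,
    one per LLM annotator (hypothetical data layout).
    """
    kept = {}
    for name, (labels_a, labels_b) in annotations.items():
        kappa = cohen_kappa(labels_a, labels_b)
        if kappa >= threshold:
            kept[name] = kappa
    return kept


# Toy annotations from two hypothetical LLM annotators over eight documents.
annotations = {
    "mentions_price": ([1, 1, 0, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 0, 0, 0]),
    "sounds_negative": ([1, 0, 1, 0, 1, 0, 1, 0], [1, 1, 0, 0, 1, 1, 0, 0]),
}
kept = screen_features(annotations)
print(kept)  # {'mentions_price': 0.75}
```

In this toy data the two annotators agree on 7 of 8 documents for `mentions_price` (kappa 0.75), while their agreement on `sounds_negative` is exactly what chance would produce (kappa 0), so only the first feature survives the screen.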
Opening excerpt (first ~120 words)
arXiv:2605.20693 (cs), Computer Science > Computation and Language
[Submitted on 20 May 2026]
Title: Interpretable Discriminative Text Representations via Agreement and Label Disentanglement
Authors: Tong Wang, Yiqing Xu, Leo Yang Yang
Abstract: Interpretable text representations should expose coordinates that are not only predictive, but also meaningful enough for independent auditors to apply.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.