ALDEN: Boosting Private Data Extraction from Retrieval-Augmented Generation Systems via Active Learning and Distribution Estimation
The paper presents ALDEN, a method for enhancing private data extraction from Retrieval-Augmented Generation (RAG) systems. It uses active learning to increase the diversity of malicious queries and a decay-based dynamic algorithm to estimate the topic distribution. The authors report that ALDEN significantly outperforms existing methods in data extraction rate.
- ALDEN is designed to boost private data extraction from RAG systems.
- The method employs active learning to diversify malicious queries.
- A decay-based dynamic algorithm is introduced for estimating topic distribution.
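This summary does not describe how the decay-based algorithm works. Purely to illustrate what a decay-based distribution estimate can look like in general, the sketch below keeps an exponentially decayed weight per topic and normalizes the weights into a distribution. The class name `DecayedTopicEstimator`, the `decay` parameter, the update rule, and the topic labels are all assumptions made for illustration; this is not the authors' algorithm.

```python
from collections import defaultdict


class DecayedTopicEstimator:
    """Hypothetical sketch: exponentially decayed topic weights (not ALDEN's algorithm)."""

    def __init__(self, decay=0.9):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.weights = defaultdict(float)

    def update(self, topic):
        # Shrink every existing weight so older observations count less,
        # then credit the topic that was just observed.
        for key in self.weights:
            self.weights[key] *= self.decay
        self.weights[topic] += 1.0

    def distribution(self):
        # Normalize the decayed weights into probabilities.
        total = sum(self.weights.values())
        if total == 0.0:
            return {}
        return {topic: w / total for topic, w in self.weights.items()}


# Usage with made-up topic labels: the most recent topic gets the most weight.
est = DecayedTopicEstimator(decay=0.5)
for observed in ["sports", "sports", "finance"]:
    est.update(observed)
dist = est.distribution()
```

With `decay` close to 1 the estimate behaves like a plain frequency count; smaller values make it track recent observations more closely.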
Opening excerpt (first ~120 words)
arXiv:2605.18762 (cs), Computer Science > Information Retrieval
Submitted on 10 Apr 2026
Authors: Xingyu Lyu, Jianfeng He, Ning Wang, Yidan Hu, Tao Li, Danjue Chen, Shixiong Li, Yimin Chen
Abstract: Retrieval-Augmented Generation (RAG) is widely used to augment large language models with external knowledge retrieval to improve reliability and generalization.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.