WeSearch

ALDEN: Boosting Private Data Extraction from Retrieval-Augmented Generation Systems via Active Learning and Distribution Estimation

#data privacy · #machine learning · #information retrieval
⚡ TL;DR · AI summary

The paper presents ALDEN, a method for boosting private data extraction from Retrieval-Augmented Generation (RAG) systems. It uses active learning to increase the diversity of the malicious queries it issues, and a decay-based algorithm to estimate the topic distribution more accurately. The authors report that ALDEN achieves significantly higher data extraction rates than existing methods.
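The summary names two mechanisms without describing them. The sketch below illustrates both in generic form, and it is not ALDEN's algorithm, which this page does not describe and which may differ substantially. It shows (1) a decay-weighted topic estimate, where older observations are discounted by a factor `decay` before new ones are added, and (2) greedy farthest-first selection, a common diversity heuristic used here as a stand-in for the paper's active-learning step. The helper names `update_topic_distribution` and `select_diverse`, the decay value, and the toy topics and vectors are all hypothetical.

```python
import math
from collections import defaultdict


def update_topic_distribution(weights, observed_topics, decay=0.9):
    """Hypothetical decay-weighted estimator (not the paper's method).

    Multiplies every existing topic weight by `decay`, adds 1.0 for each
    newly observed topic, and returns (raw_weights, normalized_distribution).
    """
    new_weights = defaultdict(float)
    for topic, w in weights.items():
        new_weights[topic] = w * decay  # older evidence counts for less
    for topic in observed_topics:
        new_weights[topic] += 1.0  # fresh observations at full weight
    total = sum(new_weights.values())
    dist = {t: w / total for t, w in new_weights.items()} if total > 0 else {}
    return dict(new_weights), dist


def select_diverse(candidates, used, k):
    """Greedy farthest-first selection, a stand-in diversity heuristic.

    Picks up to k candidate indices, each time choosing the candidate whose
    minimum Euclidean distance to already-used or already-chosen points is
    largest.
    """
    chosen = []
    pool = list(range(len(candidates)))
    reference = list(used)
    for _ in range(min(k, len(pool))):
        def score(i):
            if not reference:
                return math.inf  # nothing to compare against yet
            return min(math.dist(candidates[i], r) for r in reference)

        best = max(pool, key=score)
        chosen.append(best)
        reference.append(candidates[best])
        pool.remove(best)
    return chosen


# Toy usage with made-up topics and 2-D vectors.
weights, dist = update_topic_distribution({}, ["finance", "finance", "health"])
weights, dist = update_topic_distribution(weights, ["health"], decay=0.5)
print(dist)  # finance ~0.4, health ~0.6
print(select_diverse([(0, 0), (1, 0), (5, 0)], used=[(0, 0)], k=2))  # [2, 1]
```

Exponential decay lets the estimate track recent evidence without storing the full history, and farthest-first selection is a simple way to spread choices out. Whether either matches what the authors actually do would need to be checked against the full paper.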

Key facts
Original article: arXiv cs.AI
Opening excerpt (first ~120 words)

Computer Science > Information Retrieval · arXiv:2605.18762 (cs) · Submitted on 10 Apr 2026

Title: ALDEN: Boosting Private Data Extraction from Retrieval-Augmented Generation Systems via Active Learning and Distribution Estimation

Authors: Xingyu Lyu, Jianfeng He, Ning Wang, Yidan Hu, Tao Li, Danjue Chen, Shixiong Li, Yimin Chen

Abstract: Retrieval-Augmented Generation (RAG) is widely used to augment large language models with external knowledge retrieval to improve reliability and generalization.

Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.
