When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning
The paper examines when multi-agent debate helps data cleaning and when it hurts. Debate degrades generation through critique-induced confusion (CIC), in which the Generator accepts hallucinated Critic feedback, yet it substantially improves error detection. The authors identify the conditions under which debate is beneficial and emphasize the importance of adversarial separation.
- Debate's effect on data cleaning reverses sign: it degrades generation across all four tested models (-1.6 to -15.5pp).
- Debate improves error detection by 27.4pp F1.
- A factorial experiment indicates that self-verification fails, while a configuration with a separate Critic improves performance.
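To make the "pp F1" figures above concrete, here is a minimal sketch of how an F1 score for error detection and a percentage-point difference between two conditions could be computed. The function names, the cell identifiers, and the toy flag sets are all hypothetical and chosen for illustration; they are not the paper's data or evaluation code, and the numbers they produce do not reproduce the reported 27.4pp.

```python
def f1_score(flagged: set, actual: set) -> float:
    """F1 of a set of flagged cells against the set of truly erroneous cells."""
    true_positives = len(flagged & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(flagged)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def pp_delta(baseline: float, treatment: float) -> float:
    """Difference between two scores in [0, 1], expressed in percentage points."""
    return (treatment - baseline) * 100


# Hypothetical toy data: each cell is identified by (row index, column name).
actual_errors = {(0, "zip"), (2, "city"), (3, "phone"), (5, "zip")}
single_agent_flags = {(0, "zip"), (1, "city"), (4, "phone")}
debate_flags = {(0, "zip"), (2, "city"), (3, "phone"), (4, "phone")}

baseline_f1 = f1_score(single_agent_flags, actual_errors)
debate_f1 = f1_score(debate_flags, actual_errors)
print(f"baseline F1 = {baseline_f1:.3f}, debate F1 = {debate_f1:.3f}")
print(f"change = {pp_delta(baseline_f1, debate_f1):+.1f}pp")
```

A percentage-point change is an absolute difference between two scores, so a negative value such as the generation results above means the debate condition scored lower than its baseline.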
Opening excerpt (first ~120 words)
arXiv:2606.02866 (cs), Computer Science > Artificial Intelligence. Submitted on 1 Jun 2026.
Authors: Chirag Parmar, Akshat Mehta, Henglin Wu, Jagadish Ramamurthy, Shweta Medhekar
Abstract: When does multi-agent debate help data cleaning, and when does it hurt? Across three benchmarks, four model families, and over 6,000 task-condition pairs, we find debate's effect reverses sign: it degrades generation across all four models (-1.6 to -15.5pp) through critique-induced confusion (CIC), hallucinated Critic feedback that the Generator accepts…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.