On the Fragility of Data Attribution When Learning Is Distributed
This paper examines vulnerabilities in data attribution within distributed machine learning systems. It shows that a single participant can inflate its own attribution value without degrading global utility. The authors argue that more robust, incentive-compatible attribution mechanisms are needed to close this gap.
- Data attribution is crucial for pricing, auditing, and governance in machine learning.
- The study reveals that a participant can inflate its attribution value while maintaining global utility.
- The authors suggest that current attribution methods create a new attack surface that needs to be addressed.
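
To make the second point concrete, here is a minimal toy sketch of how an attribution score can rise while global utility stays fixed. It is not the paper's attack or setup: the coverage-style utility, the exact Shapley computation, the three participants `A`, `B`, `C`, and their item sets are all assumptions chosen for illustration. Participant `A` simply pads its contribution with an item that `B` and `C` already hold.

```python
from itertools import permutations


def utility(coalition, data):
    """Toy utility (an assumption): number of distinct items the coalition covers."""
    covered = set()
    for player in coalition:
        covered |= data[player]
    return len(covered)


def shapley(data):
    """Exact Shapley values, averaging marginal gains over every join order."""
    players = sorted(data)
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        joined = []
        previous = 0
        for p in order:
            joined.append(p)
            current = utility(joined, data)
            totals[p] += current - previous  # marginal contribution of p
            previous = current
    return {p: total / len(orders) for p, total in totals.items()}


# Hypothetical participants and the items each one reports.
honest = {"A": {1, 2}, "B": {2, 3}, "C": {3, 4}}
# A additionally reports item 3, which B and C already cover.
manipulated = {**honest, "A": {1, 2, 3}}

honest_values = shapley(honest)
manipulated_values = shapley(manipulated)
global_honest = utility(list(honest), honest)
global_manipulated = utility(list(manipulated), manipulated)

print("global utility:", global_honest, "->", global_manipulated)
print("A's attribution:", honest_values["A"], "->", manipulated_values["A"])
```

In this toy game the pooled coverage is the same in both runs, yet `A`'s share grows at the expense of `B` and `C`, because Shapley values always sum to the grand-coalition utility. Real distributed training uses far richer utilities and attribution estimators, so this only illustrates the general shape of the problem.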
arXiv:2605.15520 (cs), Computer Science > Machine Learning. Submitted on 15 May 2026.
Authors: Xian Gao, Bo Hui, Min-Te Sun, Wei-Shinn Ku
Abstract: Data attribution has become an important component of pricing, auditing, and governance in machine learning pipelines, yet most attribution methods implicitly assume that attribution values faithfully reflect participants' contributions.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at arXiv cs.AI.