The “Robust” Data Scientist: Winning with Messy Data and Pingouin

https://www.facebook.com/kdnuggets · May 1, 2026 · 2:00 PM UTC · 5 min read

This article explores the craft of using robust statistics in data science workflows, showing what to do when data fail the tests for standard assumptions such as normality and homoscedasticity.

Original article

KDnuggets · https://www.facebook.com/kdnuggets


# Introduction

A harsh truth to begin with: textbook data science usually becomes a lie in the real world. Concepts and techniques are taught on finely curated, beautifully bell-curved data variables, but as soon as we venture into the wild of real projects, we are hit with lots of outliers, unduly skewed distributions, and indomitable variances. A previous article on building an exploratory data analysis (EDA) pipeline with Pingouin showed how to detect, through tests, cases when the data violates a variety of assumptions like homoscedasticity and normality. But what if the tests fail? Throwing the data away isn't the solution: turning robust is. This article uncovers the craftsmanship of using robust statistics in data science processes.
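As a rough illustration of what "turning robust" can mean, the sketch below compares classical summaries (mean, standard deviation) with robust ones (median, median absolute deviation) on a sample containing one outlier. It is not taken from the article and does not use Pingouin: it relies only on Python's standard library, the `mad` helper is a hypothetical name introduced here, and the sample values are made up for the example.

```python
import statistics


def mad(values):
    """Median absolute deviation: the median of |x - median(x)|."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])


# Made-up measurements (assumption for illustration only).
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
# The same sample with a single extreme value appended.
messy = clean + [55.0]

for label, sample in (("clean", clean), ("messy", messy)):
    print(
        f"{label}: "
        f"mean={statistics.mean(sample):.2f}, "
        f"stdev={statistics.stdev(sample):.2f}, "
        f"median={statistics.median(sample):.2f}, "
        f"mad={mad(sample):.2f}"
    )
```

A single outlier pulls the mean and inflates the standard deviation, while the median and MAD barely move. That stability is the property robust methods trade on when assumption tests fail.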

…

Excerpt limited to ~120 words for fair-use compliance. The full article is at KDnuggets.
