The small sample trap in A/B testing
A/B testing can be misleading when sample sizes are small, because an average on its own does not show how much it could shift from one run of the experiment to the next. Larger samples make the observed averages more reliable, which makes it easier to tell whether a new version is truly better. Understanding the standard error is key to interpreting conversion rates correctly in these tests.
- A/B test results can vary significantly depending on how many visitors take part in the experiment.
- The law of large numbers says that as the sample grows, the observed average tends to get closer to the true average.
- Standard error quantifies how much the observed conversion rate would typically change if the experiment were repeated.
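A minimal sketch of the standard-error idea, assuming the usual normal approximation to the binomial: the standard error of a conversion rate p measured on n visitors is sqrt(p(1 - p)/n), and dividing the gap between two rates by the standard error of their difference gives a z-score. The figures (3% vs 3.5%, with 100 or 1,000 visits per version) come from the article's opening example; the function names are illustrative, and the unpooled two-proportion z-score used here is one common choice, not necessarily the one the article uses.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed conversion rate p measured on n visitors
    (normal approximation to the binomial)."""
    return math.sqrt(p * (1 - p) / n)

def z_score(p_a: float, p_b: float, n_a: int, n_b: int) -> float:
    """Gap between two conversion rates divided by the standard error of that
    gap (unpooled two-proportion z-score; an illustrative choice)."""
    se_diff = math.sqrt(standard_error(p_a, n_a) ** 2 + standard_error(p_b, n_b) ** 2)
    return (p_b - p_a) / se_diff

baseline, variant = 0.03, 0.035  # rates from the article's example
for n in (100, 1_000):  # visits per version
    print(f"n={n:>5}: SE(baseline)={standard_error(baseline, n):.4f} "
          f"SE(variant)={standard_error(variant, n):.4f} "
          f"z={z_score(baseline, variant, n, n):.2f}")
```

With these numbers z comes out at roughly 0.2 for 100 visits per version and roughly 0.6 for 1,000, both well below the ~1.96 commonly used as a two-sided 5% cutoff, so at either size the half-point gap is within what noise alone could produce. At 100 visits, 3% is only about three conversions, where the normal approximation is rough; an exact test would be a safer choice there.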
Opening excerpt (first ~120 words)
Averages lie
The small sample trap in A/B testing
18 May 2026
Suppose you ran an A/B test on the signup page of your app. You wanted to ship a new version, but you weren't sure if it was actually better, so you decided to A/B-test it. You ran the experiment and got these numbers:
- Baseline conversion: 3%
- New version conversion: 3.5%
Things look good. The conversion rate is better, so the new version must be better, right? Well, not necessarily.
Let's say there are three realities of this experiment: three companies, the exact same experiment and result, but a different number of visitors in each experiment. The first had 100 visits per version (the original and the new signup page), 200 visits in total. The second had 1,000 per version.
…
Excerpt limited to ~120 words for fair-use compliance. The full article is at Mustapha Hadid.
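To see the "three realities" idea from the excerpt in action, the sketch below re-runs a hypothetical experiment many times in which the new page is assumed to be no better than the 3% baseline, for the two sample sizes the excerpt gives (100 and 1,000 visits per version). It measures how widely the observed rate scatters and how often chance alone produces 3.5% or more. The trial count, seed, and helper names are illustrative assumptions, not taken from the article.

```python
import random
import statistics

def simulate_rates(true_rate: float, n: int, trials: int, rng: random.Random) -> list[float]:
    """Repeat a hypothetical experiment `trials` times. In each run, n visitors
    each convert independently with probability `true_rate`; return the
    observed conversion rate of every run."""
    rates = []
    for _ in range(trials):
        conversions = sum(1 for _ in range(n) if rng.random() < true_rate)
        rates.append(conversions / n)
    return rates

def share_at_or_above(rates: list[float], threshold: float) -> float:
    """Fraction of runs whose observed rate is at least `threshold`."""
    return sum(r >= threshold for r in rates) / len(rates)

# Assumption: the new page is truly no better than the 3% baseline.
TRUE_RATE = 0.03
OBSERVED_NEW = 0.035  # the "new version" rate from the excerpt
rng = random.Random(42)  # fixed seed so the output is reproducible

for n in (100, 1_000):
    rates = simulate_rates(TRUE_RATE, n, trials=1_000, rng=rng)
    print(f"n={n:>5}: spread of observed rates={statistics.pstdev(rates):.4f}, "
          f"runs at or above 3.5%={share_at_or_above(rates, OBSERVED_NEW):.1%}")
```

The spread of the simulated rates should land close to sqrt(p(1 - p)/n), about 0.017 at 100 visits and about 0.005 at 1,000. Even so, a page that is no better than the baseline reaches 3.5% or more in roughly a third of runs at 100 visits and still in roughly one run in five at 1,000.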