Statistical significance
📖 3 min read · Updated 2026-04-19
Statistical significance tells you how unlikely an observed difference would be if there were no real effect, i.e. if it were due to chance alone. A 95% confidence level is the standard threshold.
Key concepts
- p-value: the probability of seeing a result at least this extreme if there were no real difference (not "the probability the result is random")
- Confidence interval: the range of values the true effect plausibly falls in
- Sample size: larger samples give narrower confidence intervals
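A minimal sketch of how these concepts connect, using a standard two-proportion z-test (the usual test behind A/B conversion comparisons). The function name and the example counts are illustrative, not from the original:

```python
from math import erf, sqrt

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test.

    Answers: if A and B truly converted at the same rate, how often
    would we see a gap at least this large by chance?
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # standard normal CDF via erf
    return 2 * (1 - cdf)                      # two-sided tail probability

# Hypothetical test: 2.0% vs 2.5% conversion on 10k visitors per variant
p = two_proportion_p_value(200, 10_000, 250, 10_000)
print(f"p-value: {p:.4f}")
```

At p below 0.05 this difference clears the 95% confidence bar; a smaller sample with the same rates would not.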
Common mistakes
- Stopping tests early as soon as significance appears (peeking / p-hacking)
- Running too many tests without correction
- Interpreting non-significance as 'no effect'
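To make the "too many tests" mistake concrete, here is a sketch of the family-wise false-positive rate and the simplest fix, a Bonferroni correction (the helper name is mine, not from the original):

```python
def bonferroni_threshold(alpha, num_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / num_tests

# Running 10 uncorrected tests at alpha = 0.05: the chance of at least
# one false positive is 1 - 0.95**10, roughly 40%, not 5%.
family_fp_rate = 1 - 0.95 ** 10
threshold = bonferroni_threshold(0.05, 10)  # require p < 0.005 per test
print(f"family-wise FP rate: {family_fp_rate:.2f}, per-test bar: {threshold}")
```

Bonferroni is conservative; the point is that some correction is needed, not that this particular one is optimal.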
Pragmatic threshold
Use 95% confidence, and compute the minimum sample size for the smallest effect size you actually care about.
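That threshold can be turned into a number with the standard two-proportion sample-size formula. This is a sketch at 95% confidence and 80% power (z values 1.96 and 0.84); the baseline rate and lift below are hypothetical:

```python
from math import ceil

def sample_size_per_variant(base_rate, relative_lift,
                            z_alpha=1.96, z_beta=0.84):
    """Approximate per-variant sample size for a two-proportion test
    at 95% confidence (z_alpha) and 80% power (z_beta)."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    # Variance of each arm's proportion, summed
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical: 5% baseline conversion, targeting a 20% relative lift
n = sample_size_per_variant(0.05, 0.20)
print(f"needed per variant: {n}")
```

A dedicated calculator or statsmodels will give slightly different numbers (continuity corrections, exact methods), but this is the right order of magnitude for planning.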
What to do with this
- Compute the required sample size before starting any test; "we'll see how it goes" produces false positives in 30% of tests
- Target larger effect sizes (20%+ relative lift) when traffic is limited; detecting small effects requires sample sizes you don't have
- Treat 95% confidence as the minimum; anything lower produces too many false positives to trust your roadmap
- Use a proper calculator (VWO, Optimizely, Evan Miller's calculator); don't eyeball it
- At low traffic, prioritize big-change tests over small-optimization tests; the statistical math forces this choice
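The last two bullets in numbers: a quick sketch comparing required sample sizes for a small versus a large lift, using the same standard 95%-confidence / 80%-power approximation (helper name and rates are illustrative):

```python
from math import ceil

def n_per_variant(base, lift, z_total=1.96 + 0.84):
    """Per-variant n at 95% confidence / 80% power (rough approximation)."""
    p1, p2 = base, base * (1 + lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(z_total ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical 5% baseline conversion rate
small = n_per_variant(0.05, 0.05)  # detect a 5% relative lift
big = n_per_variant(0.05, 0.30)    # detect a 30% relative lift
print(f"5% lift needs ~{small} per variant; 30% lift needs ~{big}")
```

The small-lift test needs roughly 30x the traffic of the big-change test, which is why low-traffic sites should bet on bold variants.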