Holdout tests
Updated 2026-04-19
Classical A/B testing has confounds. Holdouts are cleaner: the audience gets the feature or message, except for a randomly chosen holdout group that doesn't. You then compare outcomes between the two groups.
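One common way to pick the holdout group is to hash each user ID into a bucket. Assignment is then effectively random, but a user stays in the same group for the whole test. The sketch below assumes string user IDs and a per-experiment salt. `in_holdout` and the experiment name are hypothetical. The 5% fraction in the usage lines matches the 5-10% range recommended under "What to do with this".

```python
import hashlib

def in_holdout(user_id: str, experiment: str, holdout_pct: float) -> bool:
    """Return True if user_id falls in the holdout for this experiment.

    Hashing "experiment:user_id" gives a stable, roughly uniform bucket in
    [0, 10000). The same user always lands in the same bucket for a given
    experiment, and different experiments get independent splits.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 10_000
    threshold = round(holdout_pct * 10_000)  # e.g. 0.05 -> 500 buckets
    return bucket < threshold

# Usage: a hypothetical retention email with a 5% holdout.
users = [f"user-{i}" for i in range(100_000)]
held_out = sum(in_holdout(u, "retention-email-q2", 0.05) for u in users)
print(held_out / len(users))  # fraction held out; should be near 0.05
```

Salting with the experiment name keeps one test's holdout from always being the same users as another's. That matters if you run holdouts on several channels at once.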
Examples
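- Geo holdout for paid ads: pause ads in a set of regions and compare them with regions where ads keep running
- User holdout for a retention email: a random slice of users never receives the email
- Feature holdout for rollouts: keep a small group on the old experience after the feature ships to everyone else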
Why better than A/B
It measures incrementality: the outcomes that would not have happened without the treatment.
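Here is a minimal sketch of the incrementality calculation. It assumes conversion rate is the outcome, users were randomly assigned, and both groups were measured over the same window. `GroupResult` and `incrementality` are hypothetical names. The counts in the usage lines are made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class GroupResult:
    users: int
    conversions: int

    @property
    def rate(self) -> float:
        return self.conversions / self.users

def incrementality(treated: GroupResult, holdout: GroupResult) -> dict:
    """Compare treated vs. holdout conversion rates.

    The holdout rate estimates what treated users would have done anyway,
    so the difference in rates is the lift caused by the treatment.
    """
    lift = treated.rate - holdout.rate        # absolute lift per user
    incremental = lift * treated.users        # conversions caused by the treatment
    return {
        "absolute_lift": lift,
        "relative_lift": lift / holdout.rate,
        "incremental_conversions": incremental,
        "share_incremental": incremental / treated.conversions,
    }

# Usage with hypothetical numbers: 5% vs. 4% conversion.
r = incrementality(GroupResult(95_000, 4_750), GroupResult(5_000, 200))
print(round(r["incremental_conversions"]))  # → 950
print(round(r["share_incremental"], 2))     # → 0.2
```

In this made-up example the treated group converts at 5% and the holdout at 4%. Only about 950 of the 4,750 treated conversions (20%) are incremental. The other 80% would have happened without the treatment.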
Cost
The holdout group gets a worse experience (no new feature, no message). Keep the holdout running only as long as the measurement needs.
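The cost of a holdout grows with both its size and its duration. Below is a back-of-the-envelope sketch. It assumes the treatment adds a constant number of conversions per user per day and that each conversion has a fixed value. `holdout_cost` and all inputs in the usage lines are hypothetical.

```python
def holdout_cost(population: int, holdout_pct: float, days: int,
                 daily_lift_per_user: float, value_per_conversion: float) -> float:
    """Expected revenue forgone by withholding the treatment.

    Simple linear model (an assumption): the treatment's effect is constant
    per user per day for the whole holdout window.
    """
    held_out_users = population * holdout_pct
    lost_conversions = held_out_users * daily_lift_per_user * days
    return lost_conversions * value_per_conversion

# Usage: 1M users, 5% holdout, +0.0002 conversions/user/day, $40 per conversion.
print(f"${holdout_cost(1_000_000, 0.05, 30, 0.0002, 40.0):,.0f}")  # → $12,000
print(f"${holdout_cost(1_000_000, 0.05, 90, 0.0002, 40.0):,.0f}")  # → $36,000
```

Under this linear assumption, tripling the duration from 30 to 90 days triples the forgone revenue.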
What to do with this
- Use A/B tests for day-to-day optimization and holdouts for quarterly truth; they answer different questions
- Size the holdout at 5-10% of the population: smaller holdouts rarely reach statistical significance, and larger ones cost too much revenue
- Limit holdout duration: longer holdouts produce better data but deprive more users of the treatment, so balance duration against the test's priority
- Run holdouts on channels, features, and campaigns periodically; "does this work?" is a different question from "which variant wins?"
- Document the ethical tradeoff clearly: holdout users get a worse experience, and teams need to agree the learning is worth it
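Why do small holdouts struggle to reach significance? The rough sketch below uses the normal approximation for a two-proportion z-test. It computes the smallest absolute lift detectable at a given holdout fraction. The assumptions are a hypothetical population of 200,000 users, a 4% baseline conversion rate, a two-sided α of 0.05, and 80% power. It also uses the baseline variance for both groups, which is reasonable only when the lift is small. `min_detectable_lift` is an illustrative name, not taken from any particular tool.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_lift(population: int, holdout_pct: float, baseline_rate: float,
                        alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate minimum detectable absolute lift (two-sided z-test).

    MDE ~= (z_{1-alpha/2} + z_{power}) * sqrt(p(1-p) * (1/n_treated + 1/n_holdout))
    """
    n_holdout = population * holdout_pct
    n_treated = population - n_holdout
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    se = sqrt(baseline_rate * (1 - baseline_rate) * (1 / n_treated + 1 / n_holdout))
    return (z_alpha + z_power) * se

# Usage: how sensitivity changes with holdout size (hypothetical inputs).
for pct in (0.01, 0.05, 0.10, 0.50):
    mde = min_detectable_lift(200_000, pct, baseline_rate=0.04)
    print(f"holdout {pct:.0%}: smallest detectable lift ~{mde * 100:.2f} pp")
```

With these inputs, a 1% holdout can only detect a lift of about 1.2 percentage points, roughly a 30% relative lift on a 4% baseline. The detectable lift keeps shrinking as the holdout grows, but with diminishing returns per extra user withheld. The right size depends on your population and the effect you expect.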