Holdout tests

📖 3 min read · Updated 2026-04-19

Classical A/B tests can carry confounds. Holdouts are cleaner: part of the audience gets the feature or message, and a randomly selected holdout group doesn't. Compare the two groups.
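One common way to make the split is a deterministic hash of the user ID, so the same user always lands in the same group. The following is a minimal sketch, not a prescribed method: the function name `in_holdout`, the salt string, and the 10% default are all assumptions chosen for illustration (set `holdout_fraction=0.5` for an even split).

```python
import hashlib


def in_holdout(user_id: str,
               salt: str = "holdout-example",   # hypothetical per-test salt
               holdout_fraction: float = 0.10   # assumed default, not from the article
               ) -> bool:
    """Return True if this user is held out (does NOT get the treatment)."""
    if not 0.0 <= holdout_fraction <= 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    # Hash salt + user ID so assignment is stable across sessions and
    # independent of assignments made under a different salt.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Reduce the first 8 bytes of the hash to one of 10,000 buckets.
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < holdout_fraction * 10_000


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        group = "holdout" if in_holdout(uid) else "treatment"
        print(uid, group)
```

Changing the salt for each new holdout reshuffles the groups, so the same users are not deprived test after test.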

Examples

Holdouts can measure incrementality: what would not have happened without the treatment.
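As a rough illustration, incrementality can be estimated as the gap between the treated and holdout conversion rates, with a two-proportion z-test to check whether the gap is distinguishable from noise. This is a sketch under stated assumptions: the function name `incremental_lift` and all of the counts in the usage example are made up, and a pooled z-test is only one of several reasonable tests.

```python
from math import sqrt
from statistics import NormalDist


def incremental_lift(treated_users: int, treated_conversions: int,
                     holdout_users: int, holdout_conversions: int) -> dict:
    """Estimate incremental lift of treatment over holdout."""
    rate_treated = treated_conversions / treated_users
    rate_holdout = holdout_conversions / holdout_users
    abs_lift = rate_treated - rate_holdout
    rel_lift = abs_lift / rate_holdout if rate_holdout > 0 else float("nan")

    # Pooled two-proportion z-test (two-sided).
    pooled = (treated_conversions + holdout_conversions) / (treated_users + holdout_users)
    se = sqrt(pooled * (1 - pooled) * (1 / treated_users + 1 / holdout_users))
    z = abs_lift / se if se > 0 else 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return {
        "absolute_lift": abs_lift,
        "relative_lift": rel_lift,
        # Conversions among treated users that the holdout rate would not explain.
        "incremental_conversions": abs_lift * treated_users,
        "z": z,
        "p_value": p_value,
    }


if __name__ == "__main__":
    # Illustrative numbers only: 5.0% conversion when treated, 4.0% in the holdout.
    result = incremental_lift(90_000, 4_500, 10_000, 400)
    for key, value in result.items():
        print(f"{key}: {value:.6g}")
```

With these made-up counts the treated group converts one percentage point higher, which works out to roughly 900 conversions that would not have happened without the treatment.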

The holdout group gets a worse experience (no new feature), so keep the duration as short as possible.

Use A/B tests for daily optimization and holdouts for quarterly truth. They answer different questions.
Size the holdout at 5-10% of the population. Smaller holdouts often lack the statistical power to reach significance; larger ones cost too much revenue.
Limit holdout duration. Longer holdouts produce better data but deprive users for longer, so balance the two by test priority.
Run holdouts on channels, features, and campaigns periodically. "Does this work?" is a different question from "Which variant wins?"
Document the ethical tradeoff clearly. Holdout users get a worse experience, and the team needs to agree that the learning is worth it.
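The sizing tradeoff can be made concrete with a back-of-the-envelope power calculation: for a given holdout share, how large must the true lift be before the test is likely to detect it? The sketch below is an approximation, not a full power analysis. It uses the baseline rate for the variance of both groups, and the function name `minimum_detectable_lift`, the 200,000-user population, the 4% baseline rate, and the 5% significance / 80% power defaults are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_lift(population: int, holdout_fraction: float,
                            baseline_rate: float,
                            alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate smallest absolute lift detectable with the given power."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    n_holdout = population * holdout_fraction
    n_treated = population - n_holdout
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_power = NormalDist().inv_cdf(power)
    # Standard error of the difference in rates, using the baseline rate
    # for both groups (an approximation).
    se = sqrt(baseline_rate * (1 - baseline_rate) * (1 / n_treated + 1 / n_holdout))
    return (z_alpha + z_power) * se


if __name__ == "__main__":
    # Hypothetical population and baseline rate, for illustration only.
    for fraction in (0.01, 0.05, 0.10, 0.50):
        mde = minimum_detectable_lift(200_000, fraction, 0.04)
        print(f"{fraction:.0%} holdout: detectable absolute lift of about {mde:.4f}")
```

The detectable lift shrinks as the holdout grows, but with diminishing returns, while the number of users deprived of the feature grows linearly. That is the tension behind keeping the holdout small but not tiny.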