Testing methodology

Cold email testing is the same as any direct response testing: one variable at a time, large enough sample, long enough window, measurement that reflects real outcomes. Most teams A/B test badly, declare winners early, and optimize toward noise.

The basics

The split

50/50 split by default. For risky challengers (new copy that could tank reply rate), 70/30 in favor of the control until you see early signal.
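
A minimal sketch of the assignment logic, assuming prospects arrive as a plain list; the function and seed are illustrative, and most cold email tools do this for you:

```python
import random

def split_prospects(prospects, control_weight=0.5, seed=42):
    """Randomly assign each prospect to control or challenger.

    control_weight=0.5 gives the default 50/50 split; 0.7 protects
    the control when the challenger is risky.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    control, challenger = [], []
    for p in prospects:
        (control if rng.random() < control_weight else challenger).append(p)
    return control, challenger
```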

Running the test

  1. Define hypothesis: "Subject line A will outperform B because [reason]"
  2. Randomly split prospects 50/50 (cold email tools do this automatically)
  3. Keep everything else identical
  4. Send during same window
  5. Track reply rate + positive reply rate for at least 7 days after send
  6. Run a significance calculator (a sketch follows this list)
  7. Declare winner or inconclusive
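
The "significance calculator" in step 6 is, under the hood, a two-proportion z-test. A stdlib-only sketch; the reply and send counts in the example call are made up:

```python
import math

def two_proportion_z_test(replies_a, sends_a, replies_b, sends_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a, p_b = replies_a / sends_a, replies_b / sends_b
    p_pool = (replies_a + replies_b) / (sends_a + sends_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / sends_a + 1 / sends_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p = two_proportion_z_test(22, 500, 15, 500)  # hypothetical counts
print(f"z = {z:.2f}, p = {p:.3f}")  # p is about 0.24 here: inconclusive
```

Note what the example shows: 22 replies vs 15 looks like a clear win, but at 500 sends per variant it doesn't come close to p < 0.05.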

The significance trap

500 emails per variant × 3% reply rate = 15 replies each. A 1-reply difference between variants is noise, not signal. For small-sample cold email tests, you need large differences (50%+ lift) before you can confidently call a winner.
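
To see why, put confidence intervals around those counts. A sketch using the normal approximation (reasonable at these sample sizes):

```python
import math

def reply_rate_ci(replies, sends, z=1.96):
    """Normal-approximation 95% confidence interval for a reply rate."""
    p = replies / sends
    half = z * math.sqrt(p * (1 - p) / sends)
    return p - half, p + half

print(reply_rate_ci(15, 500))  # ~(1.5%, 4.5%)
print(reply_rate_ci(16, 500))  # ~(1.7%, 4.7%): near-total overlap
```

The two intervals almost completely overlap; the extra reply tells you nothing.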

Rule of thumb: if your test produced under 10 replies per variant, you don't have enough data. Keep running or increase volume.
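
To put a number on "keep running or increase volume," here's a standard two-proportion power calculation, sketched with the usual defaults (two-sided alpha of 0.05 and 80% power; 1.96 and 0.84 are the corresponding z values):

```python
import math

def sends_per_variant(p_base, lift, z_alpha=1.96, z_power=0.84):
    """Sends needed per variant to detect a relative lift over p_base."""
    p1, p2 = p_base, p_base * (1 + lift)
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

print(sends_per_variant(0.03, 0.50))  # ~2,500 sends per variant
```

Roughly 2,500 sends per variant to reliably detect even a 50% lift on a 3% baseline; smaller lifts need far more.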

Primary metric: positive reply rate

Overall reply rate includes "unsubscribe," "wrong person," "not interested." These aren't conversions. Positive reply rate (interested, want to talk, send me more info) is the metric that correlates to pipeline.

Track both: overall reply rate tells you about deliverability and subject line effectiveness; positive reply rate tells you about offer and copy quality.
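
A sketch of how the two metrics come apart, assuming replies are already tagged by disposition (the tag names are hypothetical):

```python
POSITIVE_TAGS = {"interested", "wants_call", "send_info"}  # hypothetical tags

def reply_metrics(reply_tags, sends):
    """reply_tags: one disposition tag per reply received."""
    overall = len(reply_tags) / sends
    positive = sum(1 for tag in reply_tags if tag in POSITIVE_TAGS) / sends
    return overall, positive

overall, positive = reply_metrics(
    ["interested", "not_interested", "unsubscribe", "wants_call"], sends=200
)
print(f"overall {overall:.1%}, positive {positive:.1%}")  # 2.0% vs 1.0%
```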

Common testing mistakes

Testing too many variables

"Let's test new subject + new first line + new CTA." Result: you don't know which change moved the needle. Can't scale what worked.

Declaring winners too early

On day 2, variant A has 5% vs variant B's 3%. "A wins!" Run the test to the full sample; the gap often closes or reverses, as the simulation below shows.
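
A quick Monte Carlo makes the point: give two variants an identical true reply rate, crown a "winner" after the first 100 sends, and count how often that lead has closed or reversed by the full sample. All parameters here are illustrative:

```python
import random

def early_leader_fades(true_rate=0.03, early_n=100, full_n=500, trials=10_000):
    """Two identical variants: how often does an early lead not hold up?"""
    rng = random.Random(0)
    faded = decided = 0
    for _ in range(trials):
        a = [rng.random() < true_rate for _ in range(full_n)]
        b = [rng.random() < true_rate for _ in range(full_n)]
        early = sum(a[:early_n]) - sum(b[:early_n])
        final = sum(a) - sum(b)
        if early != 0:              # only count trials with an early "winner"
            decided += 1
            if early * final <= 0:  # lead closed to zero or reversed
                faded += 1
    return faded / decided

print(f"{early_leader_fades():.0%} of early leads close or reverse")
```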

Ignoring sample size

Testing on 100 prospects and acting on the results. You're optimizing toward noise.

Changing mid-test

Tweaking copy partway through invalidates results. Wait for test completion before iterating.

Not running a control

Launching a new campaign without a control to compare against. "It's working" means nothing on its own; working compared to what?

Related

Scientific testing is covered in more depth in the direct response section; the same principles apply to cold email, VSLs, landing pages, and ads.