What to test

Test the things that move the needle. Skip the cosmetic stuff. Here's the rough priority order by impact on reply rates and pipeline.

Tier 1: highest impact (10-50%+ lift possible)

Subject line

Single biggest driver of open rates. Can swing opens by 30-60%. Test one new subject line per week against the control.

First line (personalization)

The make-or-break for reply rate. A truly personal first line vs a generic one can 2-3x replies.

Offer / angle

The specific angle you're pitching. Same product, different framing can double the positive reply rate. E.g., "help your AEs ramp faster" vs "improve forecast accuracy"; different buyers care about different problems.

Sender identity

Who the email is "from." Founder vs SDR vs fictional-persona email can shift reply rates. Test this cautiously; sender reputation builds slowly.

Tier 2: medium impact (5-20%)

Email length

Very short (40 words) vs standard (80-100) vs longer (150+). Typical B2B sweet spot is 70-110 words.

CTA structure

Specific times ("Tue 2pm or Thu 10am") vs open ("What works?") vs content offer ("want the summary?"). Different CTAs serve different prospect states.

Day of week

Tuesday-Thursday mornings work best for most B2B. Monday and Friday underperform. Test within the window.

Time of day

Recipient-local 8-11am vs 1-3pm. Both work; your specific audience may prefer one.

Sequence length

5-touch vs 7-touch. See sequence length.

Tier 3: small impact (under 5%)

Email signature

Minimal vs expanded. Usually doesn't matter much, but heavily marketing-styled signatures hurt.

Send volume per mailbox

30/day vs 50/day. Deliverability-related, not copy-related.

Greeting style

"Hey [name]" vs "Hi [name]" vs "[name],". Rarely meaningful differences.

Formatting

Plain text vs minimal HTML. Plain text wins for cold, but the margin is small.

Don't bother testing

At typical cold email volumes, these changes produce differences smaller than the noise in the data. Testing them is optimization theater.
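To see why small effects drown in noise, a standard minimum-detectable-effect approximation (two arms, 95% confidence, 80% power) can be sketched. The baseline rate and send counts below are illustrative, not from the source:

```python
from math import sqrt

def detectable_lift(p: float, n: int, z_alpha: float = 1.96, z_beta: float = 0.84) -> float:
    """Approximate smallest absolute lift a two-arm test can detect,
    given baseline rate p and n sends per arm (95% conf., 80% power)."""
    se = sqrt(2 * p * (1 - p) / n)  # std. error of the rate difference
    return (z_alpha + z_beta) * se

# Hypothetical numbers: 5% baseline reply rate, 500 sends per arm.
baseline = 0.05
mde = detectable_lift(baseline, 500)
print(f"Smallest detectable lift: {mde:.1%} absolute")
print(f"That's {mde / baseline:.0%} relative; anything smaller is noise")
```

At these volumes only very large relative lifts are distinguishable from chance, which is why sub-5% effects aren't worth a formal test.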

The experiment pipeline

Maintain a running list of hypotheses. Each week or two, test the highest-leverage one. Keep a log of every test: what you changed, the dates, sample sizes, results, and the decision.
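A test log can live in a spreadsheet or a plain CSV. The schema and the entry below are a hypothetical sketch, not a prescribed format; adapt the fields to your own tracking:

```python
import csv
import io

# Hypothetical log schema -- adjust fields to what you actually track.
FIELDS = ["date", "hypothesis", "variable", "control", "variant",
          "sends_per_arm", "control_replies", "variant_replies", "decision"]

log = [{
    "date": "2024-03-04",
    "hypothesis": "Question-style subject lifts opens",
    "variable": "subject line",
    "control": "Quick question",
    "variant": "Worth 15 minutes?",
    "sends_per_arm": 250,
    "control_replies": 9,
    "variant_replies": 14,
    "decision": "promote variant to control",
}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(log)
print(buf.getvalue())
```

One row per test, appended weekly, is enough structure to answer "have we tried this before?" six months later.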

Over 6 months you build a proprietary testing archive worth more than any playbook. The knowledge is specific to your audience, your offer, your voice.

Rapid iteration without full tests

For small-volume campaigns (under 500 sends), you can't A/B test cleanly. Instead, use qualitative iteration:

  1. Send 200 emails of version A
  2. Read the replies (both positive and negative)
  3. Rewrite based on what you learned
  4. Send 200 of version B
  5. Compare directionally, not statistically
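A directional comparison like step 5 can still be kept honest with a quick confidence-interval check. The reply counts below are hypothetical; the point is that at 200 sends per batch the intervals usually overlap, so treat the better version as a lead, not a verdict:

```python
from math import sqrt

def reply_rate_ci(replies: int, sends: int, z: float = 1.96):
    """Reply rate with a normal-approximation 95% confidence interval."""
    p = replies / sends
    half = z * sqrt(p * (1 - p) / sends)
    return p, max(0.0, p - half), p + half

# Hypothetical results from two 200-send batches.
for label, replies, sends in [("A", 6, 200), ("B", 11, 200)]:
    p, lo, hi = reply_rate_ci(replies, sends)
    print(f"Version {label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
# Overlapping intervals: the lift is directional evidence, not proof.
```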

Faster than formal testing. Good for early-stage campaigns where you're still figuring out the pitch.