What to test

Test the things that move the needle. Skip the cosmetic stuff. Here's the rough priority order by impact on reply rates and pipeline.

Tier 1: highest impact (10-50%+ lift possible)

Subject line

Single biggest driver of open rates. Can swing opens by 30-60%. Test one new subject line per week against the control.

First line (personalization)

The make-or-break for reply rate. A truly personal first line vs a generic one can 2-3x replies.

Offer / angle

The specific angle you're pitching. Same product, different framing can double the positive reply rate. E.g., "help your AEs ramp faster" vs "improve forecast accuracy"; different buyers care about different problems.

Sender identity

Who the email is "from." Founder vs SDR vs fictional-persona email can shift reply rates. Test this cautiously; sender reputation builds slowly.

Tier 2: medium impact (5-20%)

Email length

Very short (40 words) vs standard (80-100) vs longer (150+). Typical B2B sweet spot is 70-110 words.

CTA structure

Specific times ("Tue 2pm or Thu 10am") vs open ("What works?") vs content offer ("want the summary?"). Different CTAs serve different prospect states.

Day of week

Tuesday-Thursday mornings work best for most B2B. Monday and Friday underperform. Test within the window.

Time of day

Recipient-local 8-11am vs 1-3pm. Both work; your specific audience may prefer one.

Sequence length

5-touch vs 7-touch. See sequence length.

Tier 3: small impact (under 5%)

Email signature

Minimal vs expanded. Usually doesn't matter much, but heavily marketing-styled signatures hurt.

Send volume per mailbox

30/day vs 50/day. Deliverability-related, not copy-related.

Greeting style

"Hey [name]" vs "Hi [name]" vs "[name],". Rarely meaningful differences.

Formatting

Plain text vs minimal HTML. Plain text wins for cold, but the margin is small.

Don't bother testing

At typical cold email volumes, these changes produce differences smaller than the noise in the data. Testing them is optimization theater.
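To see why small effects drown in noise, a standard minimum-detectable-effect approximation (two arms, 95% confidence, 80% power) can be sketched. The baseline rate and send counts below are illustrative, not from the source:

```python
from math import sqrt

def detectable_lift(p: float, n: int, z_alpha: float = 1.96, z_beta: float = 0.84) -> float:
    """Approximate smallest absolute lift a two-arm test can detect,
    given baseline rate p and n sends per arm (95% conf., 80% power)."""
    se = sqrt(2 * p * (1 - p) / n)  # std. error of the rate difference
    return (z_alpha + z_beta) * se

# Hypothetical numbers: 5% baseline reply rate, 500 sends per arm.
baseline = 0.05
mde = detectable_lift(baseline, 500)
print(f"Smallest detectable lift: {mde:.1%} absolute")
print(f"That's {mde / baseline:.0%} relative; anything smaller is noise")
```

At these volumes only very large relative lifts are distinguishable from chance, which is why sub-5% effects aren't worth a formal test.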

The experiment pipeline

Maintain a running list of hypotheses. Each week or two, test the highest-leverage one. Keep a log of every test: what you changed, the dates, sample sizes, results, and the decision.
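A test log can live in a spreadsheet or a plain CSV. The schema and the entry below are a hypothetical sketch, not a prescribed format; adapt the fields to your own tracking:

```python
import csv
import io

# Hypothetical log schema -- adjust fields to what you actually track.
FIELDS = ["date", "hypothesis", "variable", "control", "variant",
          "sends_per_arm", "control_replies", "variant_replies", "decision"]

log = [{
    "date": "2024-03-04",
    "hypothesis": "Question-style subject lifts opens",
    "variable": "subject line",
    "control": "Quick question",
    "variant": "Worth 15 minutes?",
    "sends_per_arm": 250,
    "control_replies": 9,
    "variant_replies": 14,
    "decision": "promote variant to control",
}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(log)
print(buf.getvalue())
```

One row per test, appended weekly, is enough structure to answer "have we tried this before?" six months later.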

Over 6 months you build a proprietary testing archive worth more than any playbook. The knowledge is specific to your audience, your offer, your voice.

Rapid iteration without full tests

For small-volume campaigns (under 500 sends), you can't A/B test cleanly. Instead, use qualitative iteration:

  1. Send 200 emails of version A
  2. Read the replies (both positive and negative)
  3. Rewrite based on what you learned
  4. Send 200 of version B
  5. Compare directionally, not statistically
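A directional comparison like step 5 can still be kept honest with a quick confidence-interval check. The reply counts below are hypothetical; the point is that at 200 sends per batch the intervals usually overlap, so treat the better version as a lead, not a verdict:

```python
from math import sqrt

def reply_rate_ci(replies: int, sends: int, z: float = 1.96):
    """Reply rate with a normal-approximation 95% confidence interval."""
    p = replies / sends
    half = z * sqrt(p * (1 - p) / sends)
    return p, max(0.0, p - half), p + half

# Hypothetical results from two 200-send batches.
for label, replies, sends in [("A", 6, 200), ("B", 11, 200)]:
    p, lo, hi = reply_rate_ci(replies, sends)
    print(f"Version {label}: {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
# Overlapping intervals: the lift is directional evidence, not proof.
```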

Faster than formal testing. Good for early-stage campaigns where you're still figuring out the pitch.