What to test
Updated 2026-04-18
There are hundreds of things you could test. Most aren't worth the effort. The highest-leverage tests sit at the top of a short list (market, offer, headline) and decay from there. This page is that checklist, ordered by expected impact.
Tier 1. Offer-level tests (5–200% lift possible)
Price
- Higher price vs. lower
- Three-tier vs. single-tier
- Monthly vs. annual pricing
- Pay-in-full vs. installments
Offer structure
- With bonuses vs. without
- One bonus vs. a stack of five
- Digital-only vs. digital + physical component
Guarantee
- 30-day vs. 90-day vs. 1-year money-back
- Standard vs. better-than-money-back
- Conditional (outcome-based) vs. unconditional
Urgency / scarcity
- Hard deadline vs. open enrollment
- Cohort-based vs. rolling
- Bonus expiration vs. price increase as the urgency driver
Tier 2. Headline + hook (5–50% lift)
- Outcome-focused ("How to hit quota without cold calls")
- Specificity variant ("The 7 emails that…" vs. "Emails that…")
- Problem-named vs. benefit-named
- Story-driven vs. claim-driven
- Mechanism-led ("the 14-minute process") vs. benefit-led
- Length variants (short vs. long headlines)
Tier 3. Landing page structure (5–30% lift)
- Long form vs. short form
- Video vs. text hero
- Single-CTA page vs. multiple CTAs throughout
- Above-the-fold with video vs. image vs. animation
- Social proof early vs. late
- FAQ section vs. no FAQ
- Testimonial placement: throughout vs. one dedicated section
Tier 4. Ad creative (paid channels; 10–100% CTR lift possible)
- UGC-style vs. polished production
- Founder face-on-camera vs. no face
- Static vs. video
- Problem-first hook vs. benefit-first
- Testimonial-based vs. brand-voice
- Different opening 3 seconds on video
- Caption / subtitle styles
Tier 5. Email
- Subject line variants
- Plain text vs. HTML
- Length (short vs. long)
- Sent time and day
- From name (founder vs. brand)
- Single CTA vs. multiple CTAs
- Story-led opening vs. benefit-led opening
Tier 6. Audience / targeting (paid)
- Lookalike 1% vs. 5% vs. 10%
- Interest-based vs. lookalike
- Broad targeting (let the algorithm decide) vs. narrow targeting
- Different seed audiences for lookalikes
- Retargeting windows (7-day vs. 30-day)
Tier 7. Form / checkout
- Number of fields
- Single-page vs. multi-step checkout
- Address fields now vs. after purchase
- Mobile form design
- Guest checkout vs. account required
- Payment methods offered
Tier 8. Small design / copy
- Button text
- Button color (usually noise, but sometimes matters)
- Hero image variants
- Font family for body copy
- Sub-headlines
- Pricing display (strike-through vs. none, value anchoring)
The "impact estimate" filter
Before running a test, estimate: if this wins, how much lift would it produce?
- Expected lift > 20%: run it, priority
- Expected lift 10–20%: run it, normal queue
- Expected lift 5–10%: only if cheap to run
- Expected lift < 5%: skip
If you're consistently running tests in the < 5% range, you're optimizing the wrong things. Go back up the tier list.
Test one variable, not three
The temptation: "let's test a new headline, new image, and new button all at once and compare to the old." Result: you learn nothing about which element actually drove the change.
Test one thing at a time. If you've already found a winning combination through isolated tests, then test the stacked combination against the old stack, but even then, understand you're testing systems, not variables.
Sequential vs. concurrent
You can only run one test per page at a time (if you run two, they contaminate each other). But you can run tests on different pages simultaneously. A mature operation runs 3–6 independent tests in parallel across different funnels.
The "we already know what works" trap
The moment a team says "we know our headline is best, no need to test," conversion stops improving. Markets shift. Audiences shift. What won in Q1 often loses in Q4. The control is always subject to being dethroned.
Multi-armed bandit vs. A/B
Advanced: multi-armed bandit algorithms dynamically shift traffic toward winners during the test. Good for long-running optimization where you value ongoing performance over clean A/B comparisons. Most teams are better off with straight A/B until they've exhausted obvious tests.
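To make the bandit idea concrete, here is a minimal Thompson-sampling sketch (one common bandit algorithm; the simulated 3% vs. 6% conversion rates are made up for illustration):

```python
import random

def thompson_pick(successes, failures):
    """Pick the arm with the highest draw from its Beta(s+1, f+1) posterior.

    As evidence accumulates, draws from the better-converting arm win more
    often, so traffic drifts toward the winner mid-test -- unlike a fixed
    50/50 A/B split.
    """
    draws = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])

random.seed(0)  # reproducible simulation
true_rates = [0.03, 0.06]          # hypothetical: variant 1 converts 2x better
successes, failures = [0, 0], [0, 0]
for _ in range(5000):
    arm = thompson_pick(successes, failures)
    if random.random() < true_rates[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

# Most traffic ends up on the better variant
print([successes[i] + failures[i] for i in range(2)])
```

The trade-off is exactly the one stated above: you earn more conversions during the test, but the losing arm gets starved of traffic, so the final comparison is noisier than a clean A/B split.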
Related: Scientific testing · Controls + challengers · Measurement