Measurement

Testing without measurement is guessing with ceremony. Bad measurement is worse than no measurement: it gives you false confidence in false conclusions. Get the measurement right and testing compounds into a serious operational advantage. This page covers the numbers that matter, how to track them, and how to read them without fooling yourself.

The metrics hierarchy

Revenue metrics (the truth)

Conversion metrics (the drivers)

Efficiency metrics (the costs)

Engagement metrics (the signals)

The big lie: "high CTR = good ad"

Engagement metrics are the easiest to track and the most misleading. A clickbait-style ad can have 8% CTR and 0.1% purchase conversion, producing less revenue than a "boring" ad with 2% CTR and 3% conversion. Always roll engagement metrics forward to revenue before declaring a winner.
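Rolling engagement forward to revenue is simple arithmetic. A minimal sketch of the example above, with a hypothetical average order value (the $50 AOV and 100k impressions are assumptions, not figures from this page):

```python
# Hypothetical numbers: roll CTR forward to revenue per impression.
IMPRESSIONS = 100_000
AOV = 50.0  # assumed average order value

def revenue_per_impression(ctr: float, purchase_cvr: float, aov: float = AOV) -> float:
    """Revenue per impression = CTR x purchase conversion rate x average order value."""
    return ctr * purchase_cvr * aov

clickbait = revenue_per_impression(0.08, 0.001)  # 8% CTR, 0.1% conversion
boring = revenue_per_impression(0.02, 0.03)      # 2% CTR, 3% conversion

print(f"clickbait: ${clickbait * IMPRESSIONS:,.0f} per 100k impressions")
print(f"boring:    ${boring * IMPRESSIONS:,.0f} per 100k impressions")
```

At any positive order value, the "boring" ad out-earns the clickbait ad by 7.5x despite a quarter of the CTR.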

Attribution, the 2026 reality

Third-party cookies are dead, and iOS 14.5+ blocks most pixel tracking. Attribution is noisier than it was in 2015. Strategies:

Platform attribution

Each platform (Meta, Google, TikTok) reports its own attribution. Treat as directional, not authoritative. Platforms systematically over-claim their own contribution.

First-party data

UTM parameters on every outbound link. Build your own attribution from the questions you ask customers ("how did you hear about us?") and session tracking.
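Tagging every outbound link is mechanical enough to automate. A sketch using Python's standard library (the URL and campaign names are hypothetical):

```python
from urllib.parse import urlencode, urlparse

def add_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append the standard UTM parameters to an outbound link."""
    sep = "&" if urlparse(url).query else "?"
    params = urlencode({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return f"{url}{sep}{params}"

tagged = add_utm("https://example.com/landing", "meta", "paid_social", "spring_sale")
print(tagged)
```

`urlencode` handles escaping, so campaign names with spaces or special characters stay valid.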

Ground truth

Actual customers, actual revenue, total ad spend. You can't attribute that to individual campaigns, but you can compare it month over month. Total spend went up, total revenue went up: that's a directional fact.
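The month-over-month comparison is just blended ROAS (total revenue over total spend) tracked across periods. A sketch with made-up monthly totals:

```python
# Hypothetical monthly totals: blended ROAS, compared month over month.
months = {
    "2026-01": {"spend": 40_000, "revenue": 120_000},
    "2026-02": {"spend": 55_000, "revenue": 150_000},
}

roas = {m: t["revenue"] / t["spend"] for m, t in months.items()}
for month, r in sorted(roas.items()):
    print(f"{month}: blended ROAS {r:.2f}")
```

Note what this surfaces: revenue went up, but blended ROAS went down, which is exactly the kind of directional fact per-campaign attribution can hide.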

Marketing mix modeling (MMM)

Statistical modeling that infers channel contribution from time-series data. Requires scale (tens of thousands of conversions per month minimum). Becoming more accessible with modern tooling.

Lift tests

Turn off a channel for two weeks. Does total revenue drop? By how much? Messy, but it's ground truth. Do this once a year for each major channel.
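The lift-test arithmetic is a subtraction and a division. A sketch with hypothetical numbers (the $80k baseline, $68k holdout, and $5k spend are assumptions for illustration):

```python
def lift(holdout_revenue: float, baseline_revenue: float, channel_spend: float) -> dict:
    """Estimate a channel's incremental contribution from a holdout period."""
    incremental = baseline_revenue - holdout_revenue
    return {
        "incremental_revenue": incremental,
        "incremental_roas": incremental / channel_spend if channel_spend else 0.0,
    }

# Hypothetical: 2-week baseline $80k, revenue with channel off $68k, channel spend $5k.
result = lift(holdout_revenue=68_000, baseline_revenue=80_000, channel_spend=5_000)
print(result)
```

Incremental ROAS from a holdout is usually lower than the platform's self-reported ROAS, which is the point of running the test.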

The LTV time window

LTV is calculated over a time window: 30 days, 180 days, 1 year, lifetime. The window matters, and the claim means nothing unless the window is stated.

The most common error: claiming a "2.5x LTV/CAC ratio" based on modeled LTV that never materializes. Use observed LTV until you have 18+ months of data.
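Observed LTV at a stated window is straightforward to compute from cohort totals. A sketch with hypothetical cohort numbers:

```python
# Observed LTV over a fixed window (hypothetical cohort numbers).
cohort_revenue_180d = 180_000        # revenue from one signup cohort in its first 180 days
cohort_customers = 1_200
cohort_acquisition_spend = 90_000

ltv_180d = cohort_revenue_180d / cohort_customers   # observed, not modeled
cac = cohort_acquisition_spend / cohort_customers
print(f"180-day LTV/CAC: {ltv_180d / cac:.1f}x")
```

The point is that the ratio is labeled with its window ("180-day LTV/CAC: 2.0x"), so nobody can mistake an observed 180-day number for a modeled lifetime one.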

Cohort tracking

Aggregate numbers lie; cohort numbers don't. Track your customers by the month they signed up.

Cohort analysis exposes trends that aggregated numbers hide: a sudden drop in new-customer quality, a retention improvement in one period, a product change that helped one cohort and not the next.
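Building a cohort table from a purchase log is a two-level grouping. A minimal sketch over a hypothetical log of (signup month, purchase month, amount) records:

```python
from collections import defaultdict

# Hypothetical purchase log: (signup_month, purchase_month, amount).
purchases = [
    ("2026-01", "2026-01", 50), ("2026-01", "2026-02", 40), ("2026-01", "2026-03", 30),
    ("2026-02", "2026-02", 55), ("2026-02", "2026-03", 20),
    ("2026-03", "2026-03", 45),
]

# cohorts[signup_month][purchase_month] -> revenue from that cohort in that month
cohorts = defaultdict(lambda: defaultdict(float))
for signup, month, amount in purchases:
    cohorts[signup][month] += amount

for signup in sorted(cohorts):
    row = ", ".join(f"{m}: ${v:.0f}" for m, v in sorted(cohorts[signup].items()))
    print(f"cohort {signup} -> {row}")
```

Reading down a column compares cohorts at the same calendar month; reading along a row shows one cohort's decay, which is where retention drops and quality shifts become visible.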

The dashboard stack

A mature direct-response dashboard shows:

Weekly

Monthly

Quarterly

Tooling

Most early-stage teams are better off with GA4 + spreadsheet than with a $3K/month attribution tool. Tooling complexity should follow operational complexity, not lead it.

The error bars problem

Small tests produce big error bars. A 3.2% vs. 3.5% conversion rate difference in a 500-visitor test isn't a winner; it's noise. The discipline is to check the error bars before declaring anything.
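A quick way to check whether a difference clears the noise floor is the standard two-proportion z-test. A sketch using counts close to the example above (16 vs. 18 conversions per 500-visitor arm, i.e. 3.2% vs. 3.6%; the exact counts are assumptions for illustration):

```python
from math import sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Z statistic for the difference between two observed conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)       # pooled rate under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Roughly the example's numbers: 16/500 (3.2%) vs 18/500 (3.6%).
z = two_proportion_z(16, 500, 18, 500)
print(f"z = {z:.2f}")  # far below the ~1.96 needed for 95% confidence
```

At this sample size the z statistic is nowhere near 1.96, so the "difference" is exactly what the text calls it: noise.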

The "declare victory too early" trap

You start a test Friday. By Monday, the challenger is "clearly winning" by 40%. Do you declare victory?

No. Small samples produce false early leads. In 30% of A/B tests, the variant that's leading at 25% of sample size ends up losing by the end. Run the test to its full sample size.
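"Full sample size" should be computed before the test starts, not eyeballed. A sketch of the standard two-proportion sample-size approximation at 95% confidence and 80% power (the 3% baseline and 20% relative lift are assumed example inputs):

```python
from math import ceil, sqrt

def sample_size_per_arm(base_rate: float, mde_rel: float,
                        z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate visitors per arm to detect a relative lift `mde_rel`
    at 95% confidence / 80% power (two-proportion approximation)."""
    p1 = base_rate
    p2 = base_rate * (1 + mde_rel)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# Hypothetical: 3% baseline conversion, want to detect a 20% relative lift.
print(sample_size_per_arm(0.03, 0.20))
```

For these inputs the answer is on the order of fourteen thousand visitors per arm, which is why a weekend of data can't settle anything.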

Related: Scientific testing · What to test · Controls + challengers