Testing without measurement is guessing with ceremony. Bad measurement is worse than no measurement: it gives you false confidence in false conclusions. Get the measurement right and testing compounds into a serious operational advantage. This page: the numbers that matter, how to track them, and how to read them without fooling yourself.
Engagement metrics are the easiest to track and the most misleading. A clickbait-style ad can have 8% CTR and 0.1% purchase conversion, producing less revenue than a "boring" ad with 2% CTR and 3% conversion. Always roll engagement metrics forward to revenue before declaring a winner.
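Rolling the two example ads forward is one multiplication chain. A minimal sketch in Python; the $50 average order value is an assumed placeholder, not a figure from the examples:

```python
# Revenue per 1,000 impressions = impressions x CTR x purchase rate x AOV.
# The AOV ($50) is an illustrative assumption.
AOV = 50.0

def revenue_per_1k_impressions(ctr: float, purchase_rate: float) -> float:
    return 1000 * ctr * purchase_rate * AOV

print(revenue_per_1k_impressions(0.08, 0.001))  # clickbait ad: $4.00
print(revenue_per_1k_impressions(0.02, 0.03))   # "boring" ad: $30.00
```

Per thousand impressions, the "boring" ad out-earns the clickbait ad 7.5 to 1.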
Third-party cookies are dead, and iOS 14.5+ blocks most pixel tracking. Attribution is noisier than it was in 2015. Strategies:
- Platform-reported attribution: each platform (Meta, Google, TikTok) reports its own numbers. Treat them as directional, not authoritative; platforms systematically over-claim their own contribution.
- First-party attribution: UTM parameters on every outbound link, plus attribution you build yourself from the questions you ask customers ("how did you hear about us?") and session tracking (see the sketch after this list).
- Blended metrics: actual customers, actual revenue, total ad spend. You can't attribute that to individual campaigns, but you can compare it month over month: total spend went up, total revenue went up, directional fact.
- Media mix modeling: statistical modeling that infers channel contribution from time-series data. Requires scale (tens of thousands of conversions per month, minimum), but it's becoming more accessible with modern tooling.
- Holdout tests: turn off a channel for two weeks. Does total revenue drop? By how much? Messy but ground truth. Do this once a year for each major channel.
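For the first-party route, tagging links is mechanical. A minimal sketch using Python's standard library; the parameter values are illustrative, not a recommended naming scheme:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def with_utm(url: str, source: str, medium: str, campaign: str) -> str:
    """Append UTM parameters to an outbound link, preserving existing query args."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunparse(parts._replace(query=urlencode(query)))

print(with_utm("https://example.com/landing", "meta", "paid_social", "spring_sale"))
# https://example.com/landing?utm_source=meta&utm_medium=paid_social&utm_campaign=spring_sale
```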
LTV is calculated over a time window: 30 days, 180 days, 1 year, lifetime. The window matters: short windows capture only observed revenue, while long windows lean increasingly on projection.
The most common error: claiming a "2.5x LTV/CAC ratio" based on modeled LTV that never materializes. Use observed LTV until you have 18+ months of data.
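Observed LTV is a straightforward windowed sum. A minimal sketch; the record layout (customer, first purchase date, order date, revenue) and the sample rows are hypothetical:

```python
from datetime import date

# (customer_id, first_purchase_date, order_date, revenue) -- hypothetical rows.
orders = [
    ("c1", date(2024, 1, 5), date(2024, 1, 5), 40.0),
    ("c1", date(2024, 1, 5), date(2024, 6, 2), 55.0),
    ("c2", date(2024, 2, 10), date(2024, 2, 10), 30.0),
]

def observed_ltv(orders, window_days: int) -> float:
    """Average revenue per customer, counting only orders inside the window."""
    totals: dict[str, float] = {}
    for customer, first, placed, revenue in orders:
        totals.setdefault(customer, 0.0)
        if (placed - first).days <= window_days:
            totals[customer] += revenue
    return sum(totals.values()) / len(totals)

for window in (30, 180, 365):
    print(f"{window}-day observed LTV: ${observed_ltv(orders, window):.2f}")
```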
Aggregate numbers lie; cohort numbers don't. Track your customers by the month they signed up and follow each cohort's revenue and retention over time.
Cohort analysis exposes trends that aggregated numbers hide: a sudden drop in new-customer quality, a retention improvement in one period, a product change that helped one cohort and not the next.
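A minimal cohort-revenue table in pandas; the orders.csv schema (signup_date, order_date, revenue columns) is an assumption about your export:

```python
import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["signup_date", "order_date"])

# Cohort = the month the customer signed up; measure months elapsed per order.
orders["cohort"] = orders["signup_date"].dt.to_period("M")
orders["months_since_signup"] = (
    orders["order_date"].dt.to_period("M") - orders["cohort"]
).apply(lambda offset: offset.n)

# Rows: signup cohort. Columns: months since signup. Values: summed revenue.
cohort_revenue = orders.pivot_table(
    index="cohort",
    columns="months_since_signup",
    values="revenue",
    aggfunc="sum",
)
print(cohort_revenue)
```

Reading across a row shows one cohort aging; reading down a column compares cohorts at the same age, which is where drops in new-customer quality show up.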
A mature direct-response dashboard shows spend and revenue by channel, blended ROAS month over month, and observed LTV/CAC by cohort.
Most early-stage teams are better off with GA4 and a spreadsheet than with a $3K/month attribution tool. Tooling complexity should follow operational complexity, not lead it.
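The spreadsheet version of the blended metric from the attribution list fits in a few lines; the monthly figures are placeholders:

```python
# Blended ROAS: total revenue / total ad spend, compared month over month.
monthly = {
    "2024-01": {"spend": 10_000, "revenue": 28_000},
    "2024-02": {"spend": 12_000, "revenue": 31_000},
}
for month, m in monthly.items():
    print(f"{month}: blended ROAS = {m['revenue'] / m['spend']:.2f}")
```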
Small tests produce big error bars. A 3.2% vs. 3.5% conversion-rate difference in a 500-visitor test isn't a winner; it's noise (the sketch below quantifies why). A test of discipline:
You start a test Friday. By Monday, the challenger is "clearly winning" by 40%. Do you declare victory?
No. Small samples produce false early leads. In 30% of A/B tests, the variant that's leading at 25% of the sample size ends up losing by the end. Run the test to its full sample size.
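A minimal sketch of the arithmetic behind both points, using the normal approximation and only the standard library; it assumes 500 visitors per variant:

```python
import math

def two_proportion_p_value(p1: float, p2: float, n1: int, n2: int) -> float:
    """Two-sided p-value for a difference in conversion rates (z-test)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p2 - p1) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

def visitors_per_arm(p1: float, p2: float) -> int:
    """Approximate sample size per variant for alpha=0.05, power=0.8."""
    z_alpha, z_beta = 1.96, 0.8416
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(two_proportion_p_value(0.032, 0.035, 500, 500))  # ~0.79: pure noise
print(visitors_per_arm(0.032, 0.035))                  # ~56,000 per variant
```

At those conversion rates, you need tens of thousands of visitors per variant before a 0.3-point lift is distinguishable from noise.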
Related: Scientific testing · What to test · Controls + challengers