Measurement
Testing without measurement is guessing with ceremony. Bad measurement is worse than no measurement: it gives you false confidence in false conclusions. Get the measurement right and testing compounds into serious operational advantage. This page: the numbers that matter, how to track them, and how to read them without fooling yourself.
The metrics hierarchy
Revenue metrics (the truth)
- Revenue per visitor (RPV), the ultimate metric. All other metrics are intermediate (a quick calculation sketch follows this list)
- Average order value (AOV)
- Lifetime value (LTV), what a customer is worth over their entire relationship
- Gross profit, revenue minus COGS (the cost of delivering what was sold)
- Return on ad spend (ROAS)
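A minimal sketch of how the revenue metrics roll up from raw totals; all the figures below are hypothetical.

```python
# Hypothetical monthly totals -- illustration only.
visitors = 40_000
orders = 1_200
revenue = 96_000.00       # gross revenue for the month
cogs = 28_800.00          # cost of delivering what was sold
ad_spend = 24_000.00

rpv = revenue / visitors            # revenue per visitor
aov = revenue / orders              # average order value
gross_profit = revenue - cogs
roas = revenue / ad_spend           # return on ad spend

print(f"RPV ${rpv:.2f} | AOV ${aov:.2f} | gross profit ${gross_profit:,.0f} | ROAS {roas:.1f}x")
```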
Conversion metrics (the drivers)
- Traffic → lead. % of visitors who opt in
- Lead → qualified. % who become qualified prospects
- Qualified → customer. % who buy
- Customer → returning. % who buy again
Efficiency metrics (the costs)
- Cost per click (CPC)
- Cost per lead (CPL)
- Cost per acquisition (CPA)
- Customer acquisition cost (CAC)
- Payback period, time to recoup CAC (see the sketch after this list)
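A sketch of the cost side with hypothetical figures; the payback period here is the simple, undiscounted version (CAC divided by the gross profit a customer contributes per month).

```python
# Hypothetical channel figures -- illustration only.
ad_spend = 24_000.00
clicks = 12_000
leads = 1_800
new_customers = 300
gross_profit_per_customer_per_month = 32.00   # assumed contribution per customer

cpc = ad_spend / clicks            # cost per click
cpl = ad_spend / leads             # cost per lead
cac = ad_spend / new_customers     # customer acquisition cost (paid spend only)

# Months to recoup CAC out of gross profit.
payback_months = cac / gross_profit_per_customer_per_month

print(f"CPC ${cpc:.2f} | CPL ${cpl:.2f} | CAC ${cac:.2f} | payback {payback_months:.1f} months")
```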
Engagement metrics (the signals)
- Click-through rate (CTR)
- Open rate (email)
- Video retention curves
- Scroll depth
- Time on page
The big lie: "high CTR = good ad"
Engagement metrics are the easiest to track and the most misleading. A clickbait-style ad can have 8% CTR and 0.1% purchase conversion, producing less revenue than a "boring" ad with 2% CTR and 3% conversion. Always roll engagement metrics forward to revenue before declaring a winner.
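Rolled forward to revenue, the trap is obvious. A quick sketch using the hypothetical numbers above and an assumed $80 AOV:

```python
# Two hypothetical ads, rolled forward to revenue per 1,000 impressions.
aov = 80.00  # assumed average order value, same for both ads

def revenue_per_mille(ctr: float, purchase_rate: float, aov: float) -> float:
    """Revenue per 1,000 impressions = 1,000 * CTR * purchase rate * AOV."""
    return 1_000 * ctr * purchase_rate * aov

clickbait = revenue_per_mille(ctr=0.08, purchase_rate=0.001, aov=aov)  # 8% CTR, 0.1% conversion
boring = revenue_per_mille(ctr=0.02, purchase_rate=0.03, aov=aov)      # 2% CTR, 3% conversion

print(f"clickbait ad: ${clickbait:.2f} per 1,000 impressions")  # $6.40
print(f"boring ad:    ${boring:.2f} per 1,000 impressions")     # $48.00
```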
Attribution, the 2026 reality
Third-party cookies are dead, and iOS 14.5+ blocks most pixel tracking. Attribution is noisier than it was in 2015. Strategies:
Platform attribution
Each platform (Meta, Google, TikTok) reports its own attribution. Treat as directional, not authoritative. Platforms systematically over-claim their own contribution.
First-party data
UTM parameters on every outbound link. Build your own attribution from the questions you ask customers ("how did you hear about us?") and session tracking.
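A small sketch of consistent UTM tagging, using the standard utm_source / utm_medium / utm_campaign scheme; the values below are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_utm(url: str, source: str, medium: str, campaign: str, content: str = "") -> str:
    """Append UTM parameters to an outbound link, preserving any existing query params."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.update({"utm_source": source, "utm_medium": medium, "utm_campaign": campaign})
    if content:
        query["utm_content"] = content
    return urlunsplit(parts._replace(query=urlencode(query)))

# Hypothetical link for a Meta prospecting campaign.
print(add_utm("https://example.com/offer", "meta", "paid_social", "spring_prospecting", "video_a"))
```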
Ground truth
Actual customers, actual revenue, total ad spend. You can't tie those to individual campaigns, but you can compare them month over month. If total spend went up and total revenue went up, that's a directional fact.
Marketing mix modeling (MMM)
Statistical modeling that infers channel contribution from time-series data. Requires scale (tens of thousands of conversions per month minimum). Becoming more accessible with modern tooling.
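A toy illustration of the idea, not production MMM: regress weekly revenue on weekly spend per channel to get rough contribution coefficients. Real MMM adds adstock and saturation transforms, seasonality, and far more data; the numbers below are fabricated for shape only.

```python
import numpy as np

# Hypothetical weekly data: columns are [meta_spend, google_spend]; target is total revenue.
spend = np.array([
    [8_000, 5_000],
    [9_500, 4_000],
    [7_000, 6_500],
    [11_000, 5_500],
    [6_000, 7_000],
    [10_000, 6_000],
], dtype=float)
revenue = np.array([61_000, 58_000, 63_000, 72_000, 60_000, 70_000], dtype=float)

# Ordinary least squares with an intercept (baseline revenue with zero spend).
X = np.column_stack([np.ones(len(spend)), spend])
coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
baseline, meta_per_dollar, google_per_dollar = coef
print(f"baseline ~${baseline:,.0f}, meta ~${meta_per_dollar:.2f}/$ spent, google ~${google_per_dollar:.2f}/$ spent")
```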
Lift tests
Turn off a channel for 2 weeks. Does total revenue drop? By how much? Messy but ground-truth. Do this once a year for each major channel.
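A back-of-envelope way to read the result, with hypothetical numbers: compare revenue during the pause to a pre-pause baseline, then set the implied incremental revenue against what the platform claimed for itself.

```python
# Hypothetical lift test: one channel paused for 2 weeks.
baseline_weekly_revenue = 120_000.00          # average of the weeks before the pause
paused_weeks_revenue = [104_000.00, 101_000.00]
platform_claimed_weekly_revenue = 35_000.00   # what the platform attributed to itself

observed_drop = baseline_weekly_revenue * len(paused_weeks_revenue) - sum(paused_weeks_revenue)
implied_weekly_lift = observed_drop / len(paused_weeks_revenue)

print(f"implied incremental revenue: ${implied_weekly_lift:,.0f}/week "
      f"vs platform claim ${platform_claimed_weekly_revenue:,.0f}/week")
```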
The LTV time window
LTV is calculated over a time window: 30 days, 180 days, 1 year, or lifetime. The window matters:
- 30-day LTV, useful for fast-feedback decisions, underestimates true value
- 180-day LTV, captures most of the repeat purchase behavior
- 12-month LTV, standard for most subscription businesses
- "Projected" LTV, modeled based on retention curves; use carefully
The most common error: claiming a "2.5x LTV/CAC ratio" based on modeled LTV that never materializes. Use observed LTV until you have 18+ months of data.
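A sketch of computing observed, windowed LTV from a raw order log; the records and field layout below are hypothetical.

```python
from datetime import date

# Hypothetical order log: (customer_id, order_date, order_value).
orders = [
    ("c1", date(2026, 1, 5), 80.0), ("c1", date(2026, 2, 20), 60.0),
    ("c2", date(2026, 1, 9), 40.0), ("c2", date(2026, 6, 1), 90.0),
    ("c3", date(2026, 1, 15), 120.0),
]

# Each customer's first-order date defines the start of their window.
first_order = {}
for cid, order_date, _ in sorted(orders, key=lambda o: o[1]):
    first_order.setdefault(cid, order_date)

def observed_ltv(window_days: int) -> float:
    """Average revenue per customer within `window_days` of their first order."""
    totals = {cid: 0.0 for cid in first_order}
    for cid, order_date, value in orders:
        if (order_date - first_order[cid]).days <= window_days:
            totals[cid] += value
    return sum(totals.values()) / len(totals)

print(f"30-day LTV:  ${observed_ltv(30):.2f}")   # only orders within 30 days count
print(f"180-day LTV: ${observed_ltv(180):.2f}")  # captures more of the repeat purchases
```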
Cohort tracking
Aggregate numbers lie; cohort numbers don't. Track your customers by the month they signed up:
- January cohort: X customers, Y total LTV after 90 days
- February cohort: same measurements
- Compare across cohorts: are newer cohorts performing better or worse?
Cohort analysis exposes trends that aggregated numbers hide: a sudden drop in new-customer quality, a retention improvement in one period, a product change that helped one cohort and not the next.
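A minimal cohort sketch on the same kind of hypothetical order log: group customers by signup month and compare 90-day revenue per customer across cohorts.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (customer_id, signup_date, order_date, order_value).
orders = [
    ("c1", date(2026, 1, 3), date(2026, 1, 3), 80.0),
    ("c1", date(2026, 1, 3), date(2026, 3, 10), 60.0),
    ("c2", date(2026, 2, 7), date(2026, 2, 7), 50.0),
    ("c3", date(2026, 2, 18), date(2026, 2, 18), 40.0),
    ("c3", date(2026, 2, 18), date(2026, 4, 2), 45.0),
]

cohort_revenue = defaultdict(float)
cohort_customers = defaultdict(set)
for cid, signup, order_date, value in orders:
    cohort = signup.strftime("%Y-%m")
    cohort_customers[cohort].add(cid)
    if (order_date - signup).days <= 90:      # only revenue inside the 90-day window
        cohort_revenue[cohort] += value

for cohort in sorted(cohort_customers):
    per_customer = cohort_revenue[cohort] / len(cohort_customers[cohort])
    print(f"{cohort}: {len(cohort_customers[cohort])} customers, "
          f"${per_customer:.2f} revenue per customer after 90 days")
```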
The dashboard stack
A mature direct-response dashboard shows:
Weekly
- Revenue, new customers, churn
- Spend by channel + CAC by channel
- Pipeline / funnel step conversion rates
- Tests running + their status
Monthly
- LTV by cohort, by channel, by segment
- Payback period
- Retention curves
- Test wins / losses
Quarterly
- Margin trends
- Channel mix shifts
- Attribution reconciliation (compare platform claims to ground truth)
- Market sophistication signals
Tooling
- GA4 / Plausible / Fathom, web analytics
- Mixpanel / Amplitude, product analytics
- Triple Whale / Northbeam / Rockerbox, e-commerce attribution
- Hyros, info marketing attribution
- Custom data warehouse (Snowflake + Looker), for mature operations at scale
Most early-stage teams are better off with GA4 + spreadsheet than with a $3K/month attribution tool. Tooling complexity should follow operational complexity, not lead it.
The error bars problem
Small tests produce big error bars. A 3.2% vs. 3.5% conversion rate difference in a 500-visitor test isn't a winner; it's noise. Discipline:
- Know your baseline conversion rate
- Compute required sample size before starting the test (see the sketch after this list)
- Don't call a winner until the sample size is reached
- Don't chase tiny differences unless you have enormous traffic
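A sketch of the standard two-proportion sample-size calculation, using only the standard library; the baseline rate, minimum detectable effect, significance level, and power are inputs you choose.

```python
from statistics import NormalDist

def required_sample_per_arm(baseline: float, mde_relative: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect a relative lift of `mde_relative`
    over a `baseline` conversion rate (two-sided test, normal approximation)."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return round(numerator / (p2 - p1) ** 2)

# A 3.2% baseline with a 10% relative lift needs tens of thousands of visitors per arm,
# which is why a 500-visitor test can't separate 3.2% from 3.5%.
print(required_sample_per_arm(baseline=0.032, mde_relative=0.10))
```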
The "declare victory too early" trap
You start a test Friday. By Monday, the challenger is "clearly winning" by 40%. Do you declare victory?
No. Small samples produce false early leads. In 30% of A/B tests, the variant that's leading at 25% of sample size ends up losing by the end. Run the test to its full sample size.
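A quick simulation of why early leads mislead, under the assumption that both variants share the same true conversion rate (an A/A test); the traffic numbers are made up.

```python
import random

random.seed(7)

def early_lead_flip_rate(true_rate: float = 0.03, n_per_arm: int = 4_000,
                         trials: int = 2_000) -> float:
    """Fraction of simulated tests where the variant leading at 25% of the
    sample is no longer leading at the full sample size."""
    flips = 0
    checkpoint = n_per_arm // 4
    for _ in range(trials):
        a = [random.random() < true_rate for _ in range(n_per_arm)]
        b = [random.random() < true_rate for _ in range(n_per_arm)]
        early_leader_is_b = sum(b[:checkpoint]) > sum(a[:checkpoint])
        final_leader_is_b = sum(b) > sum(a)
        if early_leader_is_b != final_leader_is_b:
            flips += 1
    return flips / trials

print(f"early leader loses the lead in ~{early_lead_flip_rate():.0%} of simulated tests")
```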
What to do with this
- Roll every engagement metric forward to revenue before declaring a winner: CTR and open rate are signals; revenue per visitor is the truth
- Build cohort analysis into your dashboard: aggregate numbers hide trends that destroy businesses; cohort numbers don't
- Run an annual channel lift test: pause one channel for 2 weeks and measure the revenue drop; ground truth beats platform attribution
- Use observed LTV, not projected, until you have 18+ months of data: "projected LTV" is where bad unit economics hide
- Compute the required sample size before starting any test: if you're calling winners on 500 visitors, you're buying lottery tickets
Related: Scientific testing · What to test · Controls + challengers