VSLs
7 min read · Updated 2026-04-18
The Video Sales Letter, or VSL, is the modern incarnation of the long-form direct-response sales letter: 10 to 60 minutes of video selling a single offer. Structurally, it's the sales letter in video form. Done well, it outconverts a text-based equivalent for most high-ticket offers in 2026. Done poorly, it's a long way to lose attention.
Why VSLs work
- Video holds attention longer than text when well-paced
- Faces build trust in a way text can't
- Tone, rhythm, and emphasis carry additional persuasion
- Long-form structure in a medium audiences accept as "a video"
- A locked player (no skipping ahead) forces sequential consumption
The structure, same as a sales letter
- 0:00–0:30 Hook. Pattern interrupt. A surprising claim, a specific scene, a direct question. Earn the next 30 seconds.
- 0:30–2:00 Promise. State the big promise, what they'll learn or be able to do by the end of the video / by buying.
- 2:00–5:00 Problem + agitation. The problem they're facing. Why it's worse than they thought. Consequences.
- 5:00–7:00 Introduction. Who you are, why you're qualified. Brief. Credibility, not autobiography.
- 7:00–15:00 Mechanism. The novel approach. Why this works when others don't. Explain the thing.
- 15:00–22:00 Proof. Case studies. Testimonials. Data. Screenshots. Specific outcomes.
- 22:00–28:00 Benefits / "here's what you get." The offer stack unfolded.
- 28:00–32:00 Price + bonuses. Value anchoring, reveal, bonus stack.
- 32:00–35:00 Guarantee. Risk reversal.
- 35:00–37:00 Urgency. Deadline, reason, consequence of waiting.
- 37:00–40:00 CTA + close. Specific next step. Clear button on the screen.
The times are approximations: a tight VSL might run 18 minutes; a long one, 60. Same sections, different pacing.
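One way to adapt the outline to a different runtime is to stretch or shrink every section by the same factor. The sketch below assumes strictly proportional scaling, which is a simplification and not a rule from this guide; in practice you might hold the hook near 30 seconds and compress the middle instead. The names `TEMPLATE`, `fmt`, and `scale_outline` are made up for illustration.

```python
# Section boundaries from the 40-minute outline above, in minutes.
TEMPLATE = [
    ("Hook", 0.0, 0.5),
    ("Promise", 0.5, 2.0),
    ("Problem + agitation", 2.0, 5.0),
    ("Introduction", 5.0, 7.0),
    ("Mechanism", 7.0, 15.0),
    ("Proof", 15.0, 22.0),
    ("Benefits", 22.0, 28.0),
    ("Price + bonuses", 28.0, 32.0),
    ("Guarantee", 32.0, 35.0),
    ("Urgency", 35.0, 37.0),
    ("CTA + close", 37.0, 40.0),
]


def fmt(minutes: float) -> str:
    """Format fractional minutes as M:SS."""
    total_seconds = round(minutes * 60)
    return f"{total_seconds // 60}:{total_seconds % 60:02d}"


def scale_outline(target_minutes: float, template=TEMPLATE):
    """Scale every section boundary proportionally to a new total length.

    Assumption: all sections scale by the same factor.
    """
    template_total = template[-1][2]  # end of the last section
    factor = target_minutes / template_total
    return [(name, start * factor, end * factor) for name, start, end in template]


if __name__ == "__main__":
    # Print an 18-minute version of the outline.
    for name, start, end in scale_outline(18):
        print(f"{fmt(start)}-{fmt(end)}  {name}")
```

Treat the output as a first-pass time budget for each section; timing the script read aloud is still the real check.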
Pacing rules
- Every 30–60 seconds, a new beat. New claim, new story, new visual. Monotony kills retention.
- Open loops. Promise something now; deliver later. "I'll show you the exact number in a minute." Keeps the viewer watching through the middle.
- B-roll and text overlays. Not decoration, emphasis. When you make a key claim, it also appears on screen.
- Pattern interrupts. Every few minutes, change tone, change setting, change medium. A voice-only segment, then a screen share, then a face-to-camera testimonial.
- No filler words in the script. "Uh," "you know," "basically" all cut in edit.
Hook formulas that work
- "If you [specific situation], this is the most important video you'll watch this year."
- "I'm about to show you [unbelievable specific claim]. And I'm going to prove it to you in the next [X] minutes."
- "Most [audience] are making this specific mistake. Let me show you what it's costing them."
- "Two years ago, I was [specific low point]. Today, [specific win]. Here's exactly what changed."
- "Everything you've been told about [topic] is incomplete. Here's the part nobody explains."
The script
VSLs are scripted. Word-for-word. Not ad-libbed. The reason: every sentence earns the next; there's no room for tangents, dead air, or self-editing in the moment.
Writing process:
- Draft the full script. 8,000–15,000 words for a 30–45 minute VSL
- Read aloud. Time each section.
- Cut anything that doesn't directly move the viewer forward
- Mark in beats (visual cues, b-roll moments, on-screen text)
- Rehearse. 3–5 read-throughs before recording
- Record in takes; splice together
Production quality
The bar has risen. What used to work (slide-based, faceless VSLs) now underperforms in most categories. Current norms:
- Face-on-camera, not just slides
- Good audio (single most important production element)
- Decent lighting (natural light works; you don't need a studio)
- B-roll, screenshots, data visualizations cut in
- Captions / subtitles (many viewers watch muted)
- Logo/brand consistency throughout
"Good enough" production now means: natural light, face visible, clear audio, reasonable cuts. You don't need a film crew; you do need to not look amateur.
Delivery platforms
- Embedded on a landing page: the canonical play. Video + CTA below it.
- Autoplay or click-to-play: click-to-play tends to perform better in 2026 (respects viewer choice)
- Locked or skippable: locked VSLs feel manipulative to sophisticated audiences; skippable + timed CTAs tend to convert better
- On webinar platforms: live and on-demand webinars are essentially scheduled VSLs
The CTA layer
Under the video, clear CTAs:
- Primary CTA: book call / buy / download
- Secondary CTA: "Still have questions? Here's how to reach us"
- Sticky button that appears after the price reveal
Some VSLs have a "lock" where the CTA only appears at a certain timestamp. This forces viewing. It is effective at scale but reads as manipulative; decide based on your audience.
Metrics to track
- View rate. % who start the video
- Retention curve. % viewing at each timestamp
- Drop-off points. Where viewers leave
- CTA click rate. % who click after watching
- Conversion rate. % who complete the next step
- Completed-view conversion. Of those who watch the whole thing, what % buy (a specific sub-metric that tells you whether the problem is the video or the offer)
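As a rough illustration, the metrics above can be computed from per-visit records. This is a minimal sketch, not a description of any analytics platform: the `Session` record shape is hypothetical, the denominators are assumptions (starters for CTA click rate, all visits for conversion rate), and retention assumes viewers watch continuously from the start without seeking.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One landing-page visit (hypothetical record shape)."""
    started: bool           # pressed play
    seconds_watched: float  # furthest point reached, assuming no seeking
    clicked_cta: bool
    converted: bool         # completed the next step (call booked, purchase, ...)


def pct(part: int, whole: int) -> float:
    """Percentage, returning 0.0 when the denominator is zero."""
    return 100.0 * part / whole if whole else 0.0


def vsl_metrics(sessions, video_seconds, step=60):
    starters = [s for s in sessions if s.started]
    finishers = [s for s in starters if s.seconds_watched >= video_seconds]

    # Retention curve: share of starters still watching at each checkpoint.
    checkpoints = range(0, int(video_seconds) + 1, step)
    retention = {
        t: pct(sum(s.seconds_watched >= t for s in starters), len(starters))
        for t in checkpoints
    }

    # Drop between consecutive checkpoints, largest first:
    # (percentage points lost, interval start, interval end).
    times = list(retention)
    drops = sorted(
        ((retention[a] - retention[b], a, b) for a, b in zip(times, times[1:])),
        reverse=True,
    )

    return {
        "view_rate": pct(len(starters), len(sessions)),
        "retention": retention,
        "worst_drop": drops[0] if drops else None,
        # Assumed denominator: viewers who started the video.
        "cta_click_rate": pct(sum(s.clicked_cta for s in starters), len(starters)),
        # Assumed denominator: all visits.
        "conversion_rate": pct(sum(s.converted for s in sessions), len(sessions)),
        "completed_view_conversion": pct(
            sum(s.converted for s in finishers), len(finishers)
        ),
    }
```

Reading the last two numbers together is the point of the sub-metric: if completed-view conversion is healthy but the overall conversion rate is low, the video is losing people before the offer; if both are low, the offer itself is the more likely problem.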
The iteration loop
VSLs aren't one-and-done:
- Ship v1
- Analyze retention curve, major drop-offs
- Rewrite the sections where viewers drop off
- A/B test hook variants
- Test price reveals at different timestamps
- Iterate every 30–60 days
A mature VSL is usually on iteration 5+ before it hits its best conversion.
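When A/B testing hook variants, it helps to check whether a difference in conversion is larger than noise before declaring a winner. The sketch below uses a standard two-sided two-proportion z-test with only the Python standard library. Comparing conversion counts (rather than, say, 30-second retention) is an assumption here, and the function name is illustrative.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a: conversions and viewers for variant A.
    conv_b / n_b: conversions and viewers for variant B.
    Returns (z, p_value). Relies on the normal approximation, so it
    is only reasonable with fairly large samples in both variants.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all: nothing to distinguish
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


if __name__ == "__main__":
    # Made-up counts, purely to show the call.
    z, p = two_proportion_z_test(conv_a=50, n_a=1000, conv_b=80, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the variants really differ; with low traffic, expect to run each variant for a while before the test can tell them apart.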
Related: Sales letter structure · Long form vs short form · Story selling