
SLAs + SLOs

📖 6 min read · Updated 2026-04-18

Every service, whether customer-facing, internal, or vendor-provided, is either explicitly measured or implicitly evaluated on vibes. SLAs and SLOs are how you move from vibes to measurement. Good ones align expectations; bad ones become legal liabilities; missing ones leave everyone disappointed for different reasons.

The vocabulary

SLI (Service Level Indicator): the actual metric. "Request latency p95." "Uptime %." "First-response time."
SLO (Service Level Objective): your internal target. "p95 latency < 300ms." "Uptime ≥ 99.9%." "First-response < 1 business hour."
SLA (Service Level Agreement): your external, contractual commitment. Usually looser than the SLO. "99.5% uptime, with service credits if missed."

SLO is the target you manage to. SLA is the number you're willing to be sued over. Never commit your SLO target as your SLA.

Why the gap matters

SLO: 99.95% uptime internal target.
SLA: 99.5% uptime contractual commitment.

99.95% ≈ 21 minutes downtime/month.
99.5% ≈ 3.6 hours downtime/month.

Gap: ~3 hours of cushion per month. That's your safety margin: you can miss the SLO by that much before the miss becomes a contractual breach.
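The downtime figures above come from simple arithmetic. A minimal sketch, assuming a 30-day month (the function name is illustrative, not from any particular tool):

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime permitted in a window at a given availability target."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


slo_minutes = allowed_downtime_minutes(99.95)  # internal target
sla_minutes = allowed_downtime_minutes(99.5)   # contractual commitment
cushion_hours = (sla_minutes - slo_minutes) / 60

print(f"SLO allows {slo_minutes:.1f} min/month")
print(f"SLA allows {sla_minutes / 60:.1f} h/month")
print(f"Cushion: {cushion_hours:.2f} h/month")
```

Changing `days` to 31 or to a rolling 28-day window shifts the numbers slightly, so a contract should state which window it uses.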

Choosing SLIs

Good SLIs measure what customers experience, not whatever is easiest to measure. For an API:

Availability: % of requests that returned a valid response (not an HTTP error)
Latency: % of requests served below threshold
Correctness: % of responses matching expected output

Don't measure CPU utilization or database connections as an SLI. Those are system metrics, not customer-experience metrics.
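As a rough illustration, the availability and latency SLIs can be computed straight from request records. This is a sketch under stated assumptions: the `Request` record and its fields are hypothetical, only 5xx responses count as failures (whether 4xx responses should count is a choice each team has to make), and the 300 ms threshold is borrowed from the SLO example earlier.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code (hypothetical record shape)
    latency_ms: float  # time taken to serve the request


def availability_sli(requests: list[Request]) -> float:
    """Percent of requests that did not fail with a server error (assumption: 5xx only)."""
    if not requests:
        raise ValueError("no requests in the measurement window")
    good = sum(1 for r in requests if r.status < 500)
    return 100 * good / len(requests)


def latency_sli(requests: list[Request], threshold_ms: float = 300.0) -> float:
    """Percent of requests served faster than the threshold."""
    if not requests:
        raise ValueError("no requests in the measurement window")
    good = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return 100 * good / len(requests)


sample = [
    Request(200, 120.0), Request(200, 250.0), Request(200, 310.0), Request(503, 40.0),
    Request(200, 95.0), Request(404, 80.0), Request(200, 900.0), Request(200, 180.0),
]
print(f"availability: {availability_sli(sample):.1f}%")
print(f"latency < 300 ms: {latency_sli(sample):.1f}%")
```

Both functions count good events over total events, which keeps every SLI on the same 0 to 100% scale and makes it directly comparable with an SLO target.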

For support teams

SLIs that matter:

First response time: initial reply from a human or automation
Resolution time: by severity (P1 < 4 hours, P2 < 1 business day, P3 < 5 business days)
Customer satisfaction on resolution (CSAT)
Escalation rate: % of tickets escalated beyond first-line
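The resolution and escalation SLIs above can be sketched as follows. The `Ticket` record is hypothetical, durations are assumed to be already measured on the clock the target uses, and one business day is assumed to equal 8 business hours; adjust both to match your helpdesk.

```python
from dataclasses import dataclass

# Resolution targets in hours, taken from the severity list above.
# Assumption: one business day = 8 business hours.
RESOLUTION_TARGET_HOURS = {"P1": 4, "P2": 8, "P3": 40}


@dataclass
class Ticket:
    severity: str            # "P1", "P2", or "P3"
    resolution_hours: float  # already measured on the clock the target uses
    escalated: bool          # True if it went beyond first-line support


def resolution_compliance(tickets: list[Ticket]) -> dict[str, float]:
    """Percent of tickets per severity resolved within their target."""
    totals: dict[str, int] = {}
    met: dict[str, int] = {}
    for t in tickets:
        totals[t.severity] = totals.get(t.severity, 0) + 1
        if t.resolution_hours < RESOLUTION_TARGET_HOURS[t.severity]:
            met[t.severity] = met.get(t.severity, 0) + 1
    return {sev: 100 * met.get(sev, 0) / count for sev, count in totals.items()}


def escalation_rate(tickets: list[Ticket]) -> float:
    """Percent of tickets escalated beyond first-line."""
    return 100 * sum(t.escalated for t in tickets) / len(tickets)


tickets = [
    Ticket("P1", 3.0, False),
    Ticket("P1", 5.5, True),
    Ticket("P2", 6.0, False),
    Ticket("P3", 30.0, False),
]
print(resolution_compliance(tickets))
print(f"escalation rate: {escalation_rate(tickets):.1f}%")
```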

For internal services

Internal SLAs (often called operational level agreements, or "OLAs") matter too:

Finance closes the books by day 10 of each month
Recruiting presents first-pass candidates within 5 business days of job opening
IT resolves laptop issues within 4 business hours
Legal reviews standard NDA within 2 business days

Internal teams without SLAs tend to become the bottleneck, because nobody knows what turnaround to expect. Internal SLAs force throughput expectations into the open.

The error budget

If your SLO is 99.9%, you have 0.1% of error budget, about 43 minutes of "downtime" per month. That budget is a resource:

Spend it on planned maintenance
Spend it on deployments (riskier deploys that move the product forward)
Spend it on experimentation (A/B tests that affect performance)

When the budget is exhausted, stop. Freeze non-critical deploys. Focus on reliability until the budget recovers. This is the discipline that keeps engineering from shipping endlessly at the cost of stability.
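A minimal sketch of that budget check, assuming a 30-day window and downtime tracked in minutes (the function names and the returned dictionary are illustrative):

```python
def error_budget_minutes(slo_pct: float, days: int = 30) -> float:
    """Total error budget for the window, in minutes."""
    return days * 24 * 60 * (1 - slo_pct / 100)


def budget_status(slo_pct: float, downtime_minutes: float, days: int = 30) -> dict:
    """Report remaining budget and whether non-critical deploys should freeze."""
    budget = error_budget_minutes(slo_pct, days)
    remaining = budget - downtime_minutes
    return {
        "budget_minutes": budget,
        "remaining_minutes": remaining,
        "freeze_deploys": remaining <= 0,  # budget exhausted: stop spending
    }


print(budget_status(99.9, downtime_minutes=30))  # budget left, keep shipping
print(budget_status(99.9, downtime_minutes=50))  # overspent, freeze
```

Teams that measure availability per request rather than per minute can apply the same idea with failed requests in place of downtime minutes.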

The review cadence

Weekly: engineering reviews SLI performance, reviews incidents, adjusts next week's priorities
Monthly: SLA performance reviewed with account teams; any customer-facing misses escalate
Quarterly: SLO targets reviewed; are they still right for the business stage?

The common mistakes

SLO = SLA. No cushion. First miss becomes a contractual breach.
Too aggressive. "100% uptime" is not a target; it's a fantasy.
Measuring averages. Use percentiles (p95, p99). Averages hide the worst customer experiences.
No consequences for missing. SLOs with no operational consequence are reports, not commitments.
Service credits that never get issued. If you miss the SLA and the customer has to fight to get their credit, you've lost them anyway.
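To see why averages mislead, consider a made-up latency sample in which one request in ten is very slow. The sketch below uses a simple nearest-rank percentile; monitoring tools often interpolate, so their values may differ slightly.

```python
import math
import statistics


def percentile_nearest_rank(values: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative data: 90 fast requests and 10 very slow ones.
latencies_ms = [100.0] * 90 + [2000.0] * 10

mean_ms = statistics.mean(latencies_ms)
p95_ms = percentile_nearest_rank(latencies_ms, 95)

print(f"mean: {mean_ms:.0f} ms")  # 290 ms, looks fine against a 300 ms target
print(f"p95:  {p95_ms:.0f} ms")   # 2000 ms, what the unlucky 10% actually see
```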

What good looks like

Customer-facing services have documented SLAs and internal SLOs
Internal teams have committed response/resolution times for their consumers
Error budgets are tracked, and an exhausted budget triggers a deploy freeze
SLI measurement is automated and visible on a live dashboard
Missed SLAs trigger service credits automatically without customer intervention

What to do with this

Set SLOs tighter than SLAs: internal targets should drive performance above the commitment
Measure against both weekly: if the SLO is breaching, address it before it becomes an SLA breach
Don't set SLAs you can't measure: unverifiable commitments create disputes later
Review SLAs annually: performance capabilities shift, and overly loose SLAs leave money on the table
Communicate SLO breaches to the team without finger-pointing: blame-free learning keeps SLOs accurate
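The weekly check against both numbers can be sketched as a three-way classification. The status labels are illustrative; the targets in the usage lines are the 99.95% and 99.5% figures from the earlier example.

```python
def classify(measured_pct: float, slo_pct: float, sla_pct: float) -> str:
    """Compare a measured SLI against the internal SLO and the contractual SLA."""
    if slo_pct < sla_pct:
        raise ValueError("SLO should be at least as strict as the SLA")
    if measured_pct < sla_pct:
        return "sla_breach"  # contractual miss: service credits apply
    if measured_pct < slo_pct:
        return "slo_breach"  # internal miss: fix before it reaches the SLA
    return "ok"


print(classify(99.97, slo_pct=99.95, sla_pct=99.5))
print(classify(99.80, slo_pct=99.95, sla_pct=99.5))
print(classify(99.20, slo_pct=99.95, sla_pct=99.5))
```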