Service Level Objectives (SLOs) and Service Level Indicators (SLIs) sound like enterprise concepts. But they're actually just structured ways to answer a simple question: "How reliable is my service, and is that good enough?"
Here's how small teams can use SLOs without over-engineering.
The Basics: SLIs, SLOs, and SLAs
SLI (Service Level Indicator)
What you measure. A metric that indicates how well your service is performing.
- Availability (percentage of successful requests)
- Latency (percentage of requests under 200ms)
- Throughput (requests per second)
SLO (Service Level Objective)
Your target. The goal you set for your SLI.
- "99.9% of requests succeed" (availability SLO)
- "95% of requests complete in under 200ms" (latency SLO)
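As a sketch, both SLI types can be computed from raw request records. The sample data and the 200ms threshold below are illustrative, not prescriptive:

```python
# Sketch: computing availability and latency SLIs from raw request
# records, each a (http_status, latency_seconds) pair. Sample data only.
requests = [
    (200, 0.120), (200, 0.450), (500, 0.050),
    (301, 0.080), (200, 0.190), (200, 0.210),
]

total = len(requests)
# Availability: anything that isn't a 5xx counts as a success here.
successes = sum(1 for status, _ in requests if status < 500)
# Latency: share of requests completing under the 200ms threshold.
fast = sum(1 for _, latency in requests if latency < 0.200)

availability_sli = successes / total  # 5/6
latency_sli = fast / total            # 4/6
```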
SLA (Service Level Agreement)
Your promise. What you contractually commit to, usually with consequences.
- "If we drop below 99.5% availability, customers get 10% credit"
Why Bother with SLOs?
Without SLOs, reliability is subjective. "The site feels slow" or "we had some downtime" are vague.
With SLOs, you can answer:
- Are we reliable enough? — Compare actual vs target
- Should we focus on features or reliability? — Check your error budget
- Is this incident a big deal? — How much SLO did it burn?
- Are we getting better over time? — Track SLO trends
Choosing What to Measure (SLIs)
Start with One or Two SLIs
For most services, start with:
| SLI | Definition | Why It Matters |
|---|---|---|
| Availability | Successful requests / Total requests | Is the service working? |
| Latency | % requests under threshold (e.g., 200ms) | Is it fast enough? |
Define What Counts
- Successful request: HTTP 2xx or 3xx response
- Failed request: HTTP 5xx (server errors)
- Don't count: HTTP 4xx (client errors like 404)
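The counting rules above fit in a small helper. This is a sketch; the function name and return labels are ours:

```python
def classify(status: int) -> str:
    """Classify an HTTP status code for SLI counting:
    2xx/3xx succeed, 5xx fail, 4xx are excluded entirely."""
    if 200 <= status < 400:
        return "success"
    if status >= 500:
        return "failure"
    # 4xx: client errors (e.g. 404) don't count for or against the SLO.
    return "excluded"
```

Excluding 4xx matters: a crawler hammering dead URLs should not burn your error budget.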
Setting Realistic SLOs
Start with What You Have
Don't pick 99.99% because it sounds good. Look at your actual performance:
- Measure your current reliability for 2-4 weeks
- Set your initial SLO slightly below current performance
- Gradually tighten as you improve
SLO Benchmarks
| SLO | Downtime/Year | Appropriate For |
|---|---|---|
| 99% | 3.65 days | Internal tools, non-critical services |
| 99.5% | 1.83 days | Standard business applications |
| 99.9% | 8.77 hours | Customer-facing services |
| 99.95% | 4.38 hours | Important services |
| 99.99% | 52.6 minutes | Critical infrastructure |
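The downtime column follows from a one-line conversion (a sketch; it uses 365.25 days per year, matching the figures in the table):

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes

def downtime_per_year(slo_percent: float) -> float:
    """Minutes of downtime per year allowed by a given SLO percentage."""
    return MINUTES_PER_YEAR * (1 - slo_percent / 100)

downtime_per_year(99.99)  # ~52.6 minutes
downtime_per_year(99.9)   # ~526 minutes, i.e. ~8.77 hours
```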
Error Budgets: Making SLOs Useful
An error budget is how much "unreliability" you can afford while still meeting your SLO.
Example: 99.9% Availability SLO
Time window: 30 days
Total minutes: 43,200
Allowed downtime: 43,200 * 0.1% = 43.2 minutes
Current downtime this month: 20 minutes
Remaining error budget: 23.2 minutes
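The same arithmetic in a few lines of Python (the numbers mirror the example above):

```python
# Error budget for a 99.9% availability SLO over a 30-day window.
window_minutes = 30 * 24 * 60           # 43,200 minutes
slo = 0.999
budget = window_minutes * (1 - slo)     # 43.2 minutes of allowed downtime

downtime_so_far = 20                    # minutes of downtime this month
remaining = budget - downtime_so_far    # 23.2 minutes left
```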
Using Error Budgets
- Budget healthy: Focus on features, take risks
- Budget low: Focus on reliability, slow down releases
- Budget exhausted: Freeze features, fix reliability issues
Implementing SLOs (Practical Steps)
Step 1: Choose Your SLI
Start with availability: percentage of successful requests.
Step 2: Set Your SLO
Based on current performance, set a target. Example: 99.5% availability over 30 days.
Step 3: Measure It
```python
# Calculate availability from metrics
successful_requests = requests_total - requests_5xx
availability = successful_requests / requests_total
```

```shell
# Or from logs: share of 2xx/3xx responses among all responses.
# Assumes a combined-format access log with the status code in field 9.
grep "HTTP/1.1" access.log | \
  awk '{print $9}' | \
  sort | uniq -c | \
  awk '{if ($2 ~ /^[23]/) good += $1; total += $1} END {print good/total}'
```
Step 4: Track Over Time
Display SLO performance on a dashboard. Show:
- Current SLO percentage (rolling 30 days)
- Remaining error budget (in minutes)
- Incidents that burned budget
Step 5: Alert on Budget Burn
Don't just alert when SLO is missed. Alert when budget is burning too fast:
- Alert if error rate > 2x normal for 10 minutes
- Alert if error budget will exhaust in 3 days at current rate
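The second rule can be sketched as below, assuming you track the remaining budget in minutes and the recent burn rate in minutes per day (the names and the 3-day threshold are ours):

```python
def days_until_exhaustion(budget_remaining_min: float,
                          burn_rate_min_per_day: float) -> float:
    """Days until the error budget runs out at the current burn rate."""
    if burn_rate_min_per_day <= 0:
        return float("inf")  # not burning: budget never exhausts
    return budget_remaining_min / burn_rate_min_per_day

def should_alert(budget_remaining_min: float,
                 burn_rate_min_per_day: float,
                 threshold_days: float = 3.0) -> bool:
    """Fire when the budget would exhaust within the threshold window."""
    return days_until_exhaustion(
        budget_remaining_min, burn_rate_min_per_day) <= threshold_days
```

With 23.2 minutes of budget left, burning 10 minutes a day trips the alert; burning 2 minutes a day does not.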
Common SLO Mistakes
Mistake 1: Too Many SLOs
Problem: Tracking 10 different SLOs. None are meaningful.
Fix: Start with 1-2 SLOs. Add more only when you have a specific need.
Mistake 2: Unrealistic Targets
Problem: Setting 99.99% when you're at 99%.
Fix: Set achievable targets. Tighten gradually.
Mistake 3: Counting Everything
Problem: Including health checks, monitoring probes, and bot traffic in SLO calculation.
Fix: Count real user traffic only. Filter out synthetic requests.
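One way to apply that fix is a filter at SLI-calculation time. A sketch; the paths and user-agent markers below are hypothetical and should match your own probes:

```python
# Hypothetical markers -- replace with your real health-check paths
# and the user-agent strings your monitoring probes actually send.
SYNTHETIC_PATHS = {"/healthz", "/ping"}
BOT_MARKERS = ("bot", "monitor", "probe")

def is_real_user(path: str, user_agent: str) -> bool:
    """Keep only real user traffic in the SLO calculation."""
    if path in SYNTHETIC_PATHS:
        return False
    ua = user_agent.lower()
    return not any(marker in ua for marker in BOT_MARKERS)
```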
Mistake 4: No Action When Budget Burns
Problem: Tracking SLOs but not changing behavior when budget is low.
Fix: Define what happens at different budget levels. Actually do it.
SLO Checklist for Small Teams
Getting Started
- ☐ Choose 1-2 SLIs (availability, latency)
- ☐ Define how you count (what's a successful request?)
- ☐ Measure current performance for 2-4 weeks
- ☐ Set initial SLO based on reality
- ☐ Calculate error budget
Tracking
- ☐ Dashboard showing SLO performance
- ☐ Error budget remaining
- ☐ Alert when burning budget too fast
Action
- ☐ Define what to do when budget is low
- ☐ Review SLOs quarterly
- ☐ Adjust targets as you improve
Measure Your SLOs with External Monitoring
OpsPulse provides external uptime monitoring to track availability SLOs. Know your actual uptime from the user's perspective.
Start Free Monitoring →
Summary
Setting SLOs for small teams:
- Start simple: 1-2 SLIs (availability, latency)
- Set realistic targets: Based on current performance
- Use error budgets: Guide feature vs reliability tradeoffs
- Alert on budget burn: Not just SLO misses
- Take action: SLOs are useless if you don't change behavior
The goal isn't perfect reliability. The goal is intentional reliability — knowing how reliable you are and deciding if that's good enough.