Setting SLOs and SLIs for Small Teams: A Practical Guide

You don't need a dedicated SRE team to set reliability targets. Here's how to start simple and measure what actually matters.

Published: March 20, 2026 • Reading time: 10 minutes

Service Level Objectives (SLOs) and Service Level Indicators (SLIs) sound like enterprise concepts. But they're actually just structured ways to answer a simple question: "How reliable is my service, and is that good enough?"

Here's how small teams can use SLOs without over-engineering.

The Basics: SLIs, SLOs, and SLAs

SLI (Service Level Indicator)

What you measure. A metric that indicates how well your service is performing.

SLO (Service Level Objective)

Your target. The goal you set for your SLI.

SLA (Service Level Agreement)

Your promise. What you contractually commit to, usually with consequences.

Small teams usually don't need SLAs, which matter once you have customer contracts with legal or financial consequences. Focus on SLOs first.

Why Bother with SLOs?

Without SLOs, reliability is subjective. "The site feels slow" or "we had some downtime" are vague.

With SLOs, you can answer "How reliable is my service, and is that good enough?" with numbers instead of impressions.

Choosing What to Measure (SLIs)

Start with One or Two SLIs

For most services, start with:

SLI | Definition | Why It Matters
Availability | Successful requests / Total requests | Is the service working?
Latency | % of requests under a threshold (e.g., 200ms) | Is it fast enough?
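Both SLIs in the table are simple ratios over a set of requests. Here is a minimal Python sketch that computes them from in-memory request records. The `Request` type and its fields are hypothetical; it assumes "successful" means any non-5xx response and uses the 200ms example threshold.

```python
from dataclasses import dataclass

@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # time taken to serve the request

def availability(requests: list[Request]) -> float:
    """Successful requests / total requests, where successful = not 5xx."""
    if not requests:
        return 1.0  # assumption: no traffic counts as fully available
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)

def latency_sli(requests: list[Request], threshold_ms: float = 200.0) -> float:
    """Fraction of requests served in under threshold_ms."""
    if not requests:
        return 1.0
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return fast / len(requests)

sample = [Request(200, 120), Request(200, 340), Request(503, 15), Request(404, 90)]
print(f"availability: {availability(sample):.2%}")  # 75.00% (3 of 4 are not 5xx)
print(f"latency SLI:  {latency_sli(sample):.2%}")   # 75.00% (3 of 4 under 200ms)
```

Note that the 404 counts as successful here: it is the server correctly answering a bad request, not the service failing.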

Define What Counts

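Be consistent: if you change how you count requests, your SLO numbers can no longer be compared with earlier periods. Document your definitions, such as which status codes count as failures and which traffic is excluded.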

Setting Realistic SLOs

Start with What You Have

Don't pick 99.99% because it sounds good. Look at your actual performance:

  1. Measure your current reliability for 2-4 weeks
  2. Set your initial SLO slightly below current performance
  3. Gradually tighten as you improve
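The three steps above can be sketched as a small Python helper that picks a starting target. It assumes you already have good and total request counts from your 2-4 week measurement period. The candidate targets come from the benchmark table in this guide, and the `margin` that keeps the target slightly below current performance is a hypothetical value you would tune.

```python
# Candidate targets from the SLO benchmark table in this guide.
STANDARD_TARGETS = [0.99, 0.995, 0.999, 0.9995, 0.9999]

def initial_slo(good: int, total: int, margin: float = 0.0005) -> float:
    """Highest standard target at or below (measured availability - margin).

    `margin` is a hypothetical buffer that keeps the first SLO slightly
    below current performance.
    """
    if total <= 0:
        raise ValueError("need at least one measured request")
    floor = good / total - margin
    candidates = [t for t in STANDARD_TARGETS if t <= floor]
    # Below every standard target? Start from the measurement itself.
    return max(candidates) if candidates else round(floor, 4)

print(initial_slo(2_496_100, 2_500_000))  # measured 99.844% -> 0.995
```

With about 99.84% measured, the helper suggests 99.5% rather than 99.9%, which leaves room to tighten once you've improved.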

SLO Benchmarks

SLO | Downtime/Year | Appropriate For
99% | 3.65 days | Internal tools, non-critical services
99.5% | 1.83 days | Standard business applications
99.9% | 8.77 hours | Customer-facing services
99.95% | 4.38 hours | Important services
99.99% | 52.6 minutes | Critical infrastructure

99.9% is a good starting point for most customer-facing services. It's achievable and meaningful, and you can always tighten it later.
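The downtime column is (1 - SLO) multiplied by the length of a year. Using an average year of 365.25 days reproduces the figures in the table; this Python sketch prints the allowed downtime for each target.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length, including leap years

def allowed_downtime_hours(slo: float, window_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime permitted by an availability SLO over a time window."""
    return (1 - slo) * window_hours

for slo in [0.99, 0.995, 0.999, 0.9995, 0.9999]:
    hours = allowed_downtime_hours(slo)
    print(f"{slo:.2%}: {hours / 24:.2f} days = {hours:.2f} hours = {hours * 60:.1f} minutes")
```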

Error Budgets: Making SLOs Useful

An error budget is how much "unreliability" you can afford while still meeting your SLO.

Example: 99.9% Availability SLO

Time window: 30 days
Total minutes: 43,200
Allowed downtime: 43,200 * 0.1% = 43.2 minutes

Current downtime this month: 20 minutes
Remaining error budget: 23.2 minutes
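The same arithmetic as a Python sketch, assuming a time-based SLI measured in minutes of downtime. For a request-based SLI, swap minutes for request counts: the budget becomes total requests times (1 - SLO).

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Total downtime allowed in the window: window length times (1 - SLO)."""
    return window_days * 24 * 60 * (1 - slo)

def remaining_budget(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after the downtime so far (negative means the SLO is missed)."""
    return error_budget_minutes(slo, window_days) - downtime_minutes

budget = error_budget_minutes(0.999)  # 43,200 minutes times 0.1%
left = remaining_budget(0.999, downtime_minutes=20)
print(f"budget {budget:.1f} min, remaining {left:.1f} min")
# prints: budget 43.2 min, remaining 23.2 min
```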

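Using Error Budgets

The remaining budget tells you how much risk you can afford. With plenty left, keep shipping features; when it runs low, shift effort toward reliability work.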

Implementing SLOs (Practical Steps)

Step 1: Choose Your SLI

Start with availability: percentage of successful requests.

Step 2: Set Your SLO

Based on current performance, set a target. Example: 99.5% availability over 30 days.

Step 3: Measure It

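# Calculate availability from request counters in your metrics system
# (requests_total and requests_5xx stand in for your own counters)
successful_requests = requests_total - requests_5xx
availability = successful_requests / requests_total

# Or from an access log in common/combined format, where field 9 is the
# status code. Any non-5xx response counts as successful, matching the
# metric-based formula above (and HTTP/2 requests are no longer skipped).
awk '$9 ~ /^[0-9][0-9][0-9]$/ { total++; if ($9 !~ /^5/) good++ }
     END { if (total > 0) printf "%.4f\n", good / total }' access.log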
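Field positions in `awk` break if your log format differs from the common/combined layout. A somewhat more robust option is to match the status code right after the quoted request line. This Python sketch works under that assumption; `availability_from_log` is a hypothetical helper name, and lines that don't parse are skipped.

```python
import re

# In common/combined log format the status code follows the quoted
# request line, e.g. ... "GET /api HTTP/2.0" 503 0 ...
STATUS_RE = re.compile(r'"[^"]*" (\d{3}) ')

def availability_from_log(lines):
    """Share of parsed requests that were not 5xx; None if nothing parsed."""
    good = total = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # not an access-log entry we recognize
        total += 1
        if not match.group(1).startswith("5"):
            good += 1
    return good / total if total else None

sample_log = [
    '10.0.0.1 - - [20/Mar/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [20/Mar/2026:10:00:01 +0000] "GET /api HTTP/2.0" 503 0 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [20/Mar/2026:10:00:02 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]
print(availability_from_log(sample_log))  # 2 of 3 requests were not 5xx
```

In practice you would pass an open file object instead of a list, since iterating a file yields its lines.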

Step 4: Track Over Time

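Display SLO performance on a dashboard so the team can see how the SLI compares with the target and how much error budget remains in the current window.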
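Dashboards usually plot the SLI over a rolling window rather than per calendar month. A minimal Python sketch, assuming you can export daily (good, total) request counts; both helper names are hypothetical.

```python
from collections import deque

def rolling_availability(daily_counts, window_days=30):
    """Yield availability over a sliding window of (good, total) daily counts."""
    window = deque()
    good_sum = total_sum = 0
    for good, total in daily_counts:
        window.append((good, total))
        good_sum += good
        total_sum += total
        if len(window) > window_days:
            # Drop the oldest day so the window stays window_days long.
            old_good, old_total = window.popleft()
            good_sum -= old_good
            total_sum -= old_total
        yield good_sum / total_sum if total_sum else None

def budget_remaining(good, total, slo):
    """Fraction of the error budget left: 1.0 untouched, 0 or below exhausted."""
    allowed_bad = total * (1 - slo)
    if allowed_bad == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    return 1 - (total - good) / allowed_bad

print(list(rolling_availability([(10, 10), (0, 10), (10, 10)], window_days=2)))
# prints: [1.0, 0.5, 0.5]
print(budget_remaining(9_995, 10_000, 0.999))  # roughly half the budget left
```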

Step 5: Alert on Budget Burn

Don't just alert when the SLO is missed. Alert when the error budget is burning too fast, while there is still time to react.
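One common way to express "burning too fast" is a burn rate: the observed error rate divided by the error rate the SLO allows. A burn rate of 1 spends the budget exactly over the SLO window; higher values exhaust it early. In this Python sketch the paging threshold of 14.4 is an assumed value: over a 30-day window, sustaining it for one hour spends 2% of the budget (14.4 times 1 hour / 720 hours).

```python
def burn_rate(bad: int, total: int, slo: float) -> float:
    """Observed error rate relative to the error rate the SLO allows."""
    if total == 0:
        return 0.0
    error_rate = bad / total
    return error_rate / (1 - slo)

def should_page(bad: int, total: int, slo: float, threshold: float = 14.4) -> bool:
    # Assumed threshold; tune it to how quickly you want to be woken up.
    return burn_rate(bad, total, slo) >= threshold

# Last hour: 200 errors out of 10,000 requests against a 99.9% SLO.
print(burn_rate(200, 10_000, 0.999))    # about 20: budget gone in ~1.5 days
print(should_page(200, 10_000, 0.999))  # True
```

Evaluate it over a short recent window, such as the last hour of requests, so a sudden spike pages you quickly instead of being diluted by a month of good traffic.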

Common SLO Mistakes

Mistake 1: Too Many SLOs

Problem: Tracking 10 different SLOs. None are meaningful.

Fix: Start with 1-2 SLOs. Add more only when you have a specific need.

Mistake 2: Unrealistic Targets

Problem: Setting 99.99% when you're at 99%.

Fix: Set achievable targets. Tighten gradually.

Mistake 3: Counting Everything

Problem: Including health checks, monitoring probes, and bot traffic in SLO calculation.

Fix: Count real user traffic only. Filter out synthetic requests.
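Filtering usually comes down to recognizing synthetic requests by path or user agent. The paths and user-agent substrings below are hypothetical examples, not a complete list; substitute the ones your health checks, probes, and known bots actually use.

```python
# Hypothetical markers for synthetic traffic.
HEALTH_CHECK_PATHS = {"/healthz", "/health", "/ping"}
SYNTHETIC_UA_MARKERS = ("bot", "spider", "crawler", "uptime", "healthcheck")

def is_real_user(path: str, user_agent: str) -> bool:
    """True if the request doesn't look like a health check, probe, or bot."""
    if path in HEALTH_CHECK_PATHS:
        return False
    ua = user_agent.lower()
    return not any(marker in ua for marker in SYNTHETIC_UA_MARKERS)

sample_requests = [
    ("/checkout", "Mozilla/5.0"),
    ("/healthz", "kube-probe/1.29"),
    ("/", "Googlebot/2.1"),
]
real = [r for r in sample_requests if is_real_user(*r)]
print(real)  # [('/checkout', 'Mozilla/5.0')]
```

Apply the filter before computing the SLI so both the numerator and the denominator exclude synthetic traffic.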

Mistake 4: No Action When Budget Burns

Problem: Tracking SLOs but not changing behavior when the budget runs low.

Fix: Define what happens at different budget levels. Actually do it.
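A budget policy can be as simple as a lookup from remaining budget to an agreed action. The thresholds and actions in this Python sketch are hypothetical; the point is to write them down before you need them.

```python
def budget_policy(remaining_fraction: float) -> str:
    """Map remaining error budget (1.0 = untouched, <= 0 = exhausted) to an action.

    Thresholds and actions are hypothetical examples; agree on your own as a team.
    """
    if remaining_fraction <= 0:
        return "freeze feature releases; only reliability fixes ship"
    if remaining_fraction < 0.25:
        return "slow down releases; prioritize reliability work"
    if remaining_fraction < 0.5:
        return "review recent incidents before risky changes"
    return "ship features normally"

print(budget_policy(0.8))  # ship features normally
```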

SLO Checklist for Small Teams

Getting Started

Tracking

Action

Measure Your SLOs with External Monitoring

OpsPulse provides external uptime monitoring to track availability SLOs. Know your actual uptime from the user's perspective.


Summary

Setting SLOs for small teams:

  1. Start simple: 1-2 SLIs (availability, latency)
  2. Set realistic targets: Based on current performance
  3. Use error budgets: Guide feature vs reliability tradeoffs
  4. Alert on budget burn: Not just SLO misses
  5. Take action: SLOs are useless if you don't change behavior

The goal isn't perfect reliability. The goal is intentional reliability — knowing how reliable you are and deciding if that's good enough.
