"Our average response time is 200ms." Sounds great, right? But what if 5% of your users are experiencing 10-second delays? The average would still look fine.
Here's why average response time is a dangerous metric and what you should track instead.
The Problem with Averages
Example: The 200ms Average That Hides a Problem
Consider these 100 requests:
- 95 requests: 100ms
- 5 requests: 2,100ms (2.1 seconds)
Average: 200ms — looks acceptable!
Reality: 5% of users waited over 2 seconds for a response.
Averages hide outliers. In latency monitoring, the outliers are often what matter most — they represent real user pain and often signal deeper problems.
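A quick sketch in plain Python makes this concrete, using 95 fast requests and 5 slow ones as in the example above (the nearest-rank percentile method shown here is one common choice):

```python
import math
import statistics

# 95 fast requests and 5 slow ones, mirroring the example above
latencies_ms = [100] * 95 + [2100] * 5

mean = statistics.mean(latencies_ms)

def percentile(samples, p):
    # Nearest-rank percentile: smallest value with at least p% of samples at or below it
    ranked = sorted(samples)
    k = math.ceil(p / 100 * len(ranked)) - 1
    return ranked[max(k, 0)]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)

print(f"mean={mean:.0f}ms  p50={p50}ms  p99={p99}ms")  # the mean looks fine; p99 exposes the tail
```

The mean comes out to 200ms while p99 comes out to 2,100ms, which is exactly the gap the averages hide.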
Percentiles: The Metrics That Actually Matter
Instead of averages, track percentiles:
| Percentile | What It Means | Use Case |
|---|---|---|
| p50 (median) | 50% of requests are faster than this | Typical user experience |
| p90 | 90% of requests are faster than this | Most users' experience |
| p95 | 95% of requests are faster than this | Edge cases starting to show |
| p99 | 99% of requests are faster than this | Worst-case for most users |
| p99.9 | 99.9% of requests are faster than this | True outliers |
Why p99 Matters More Than p50
Consider two scenarios:
Scenario A: Stable System
- p50: 100ms
- p99: 150ms
Everyone has a good experience. The worst 1% are only 50% slower than the median.
Scenario B: Problem Brewing
- p50: 100ms (same!)
- p99: 3,000ms (30x the median)
The median looks fine, but 1% of users are experiencing 3-second delays. This could be a database issue, a slow dependency, or memory pressure — problems that will likely spread if not addressed.
Latency Targets by Application Type
What's "good" latency depends on what you're building:
| Application Type | p50 Target | p99 Target |
|---|---|---|
| Static website / CDN content | <50ms | <100ms |
| API endpoint (simple) | <100ms | <300ms |
| API endpoint (complex) | <200ms | <500ms |
| Database query (simple) | <10ms | <50ms |
| Database query (complex) | <100ms | <500ms |
| Full page load (backend) | <300ms | <1s |
These are guidelines. The right targets depend on your users, your competition, and your SLAs.
Setting Up Latency Alerts
1. Use Percentile-Based Alerts
Alert on p99, not average:
# Bad: alert on the average, which hides outliers
if avg_response_time_ms > 500: alert()
# Good: alert on p99, which exposes tail latency
if p99_response_time_ms > 1000: alert()
2. Set Multiple Thresholds
| Severity | p99 Threshold | Response |
|---|---|---|
| Warning | >500ms | Investigate during business hours |
| Critical | >1s | Investigate immediately |
| Emergency | >3s | Page on-call, consider rollback |
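The thresholds above can be sketched as a small classifier; the function name is illustrative, and the cutoffs are taken straight from the table:

```python
# Map a p99 reading (in ms) to the alert severities in the table above
def severity(p99_ms):
    if p99_ms > 3000:
        return "emergency"  # page on-call, consider rollback
    if p99_ms > 1000:
        return "critical"   # investigate immediately
    if p99_ms > 500:
        return "warning"    # investigate during business hours
    return None             # within target, no alert
```

A reading of 750ms yields "warning" and 1,500ms yields "critical", so a single p99 sample can feed alert-routing logic directly.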
3. Track Latency by Endpoint
Overall latency metrics hide endpoint-specific problems. Track latency separately for:
- Critical paths (checkout, login, API key endpoints)
- Heavy operations (search, reports, exports)
- Third-party integrations
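One way to implement this, sketched in plain Python with made-up sample data: keep raw (endpoint, latency) pairs and aggregate per endpoint rather than globally.

```python
from collections import defaultdict
from statistics import median

# Illustrative raw samples: (endpoint, latency_ms)
records = [
    ("POST /login", 95), ("POST /login", 110), ("POST /login", 3200),
    ("GET /search", 840), ("GET /search", 910),
]

by_endpoint = defaultdict(list)
for endpoint, ms in records:
    by_endpoint[endpoint].append(ms)

for endpoint, samples in sorted(by_endpoint.items()):
    # A single global median would hide the 3.2s login outlier entirely
    print(endpoint, "p50:", median(samples), "worst:", max(samples))
```

Here a global view would average the slow login request away against the search traffic; the per-endpoint breakdown surfaces it immediately.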
4. Include Context in Alerts
When latency alerts fire, include:
- Which endpoint is slow
- Current p50, p90, p99 values
- How it compares to baseline
- Recent deployments or changes
- Correlated metrics (CPU, memory, database connections)
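As a sketch, an alert payload carrying that context might look like the following; every field name and value is hypothetical, not any particular tool's schema:

```python
# Hypothetical alert payload; all fields and values are illustrative
alert = {
    "endpoint": "POST /checkout",
    "p50_ms": 120,
    "p90_ms": 480,
    "p99_ms": 2900,
    "baseline_p99_ms": 450,  # e.g. same hour last week
    "recent_changes": ["api v2.14.1 deployed 35 min ago"],
    "correlated": {"cpu_pct": 92, "db_connections": "48/50"},
}
```

The point is that whoever receives the page can compare current values to baseline and see the most recent change without opening a dashboard first.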
Common Latency Patterns and What They Mean
Pattern 1: Gradual p99 Increase
Symptoms: p99 slowly rising over days or weeks
Likely causes: Database growth, memory leak, accumulating data
Action: Check query performance, index usage, memory trends
Pattern 2: Sudden p99 Spike
Symptoms: p99 jumps suddenly, often after deployment
Likely causes: Bad deployment, infrastructure change, dependency issue
Action: Check recent changes, consider rollback
Pattern 3: p99 and p50 Diverging
Symptoms: p50 stable, p99 climbing
Likely causes: Intermittent slow operations, tail latency from dependencies
Action: Investigate slow traces, check downstream services
Pattern 4: Periodic Latency Spikes
Symptoms: Regular latency increases at certain times
Likely causes: Scheduled jobs, backup processes, traffic patterns
Action: Reschedule heavy operations, add capacity during peak times
Latency Monitoring Best Practices
1. Measure at the Right Layer
- Application level: Total request time including business logic
- Database level: Query execution time
- External dependencies: Third-party API call times
- Network level: Time spent in transit
2. Use Histograms, Not Just Summaries
Summary metrics (p50, p99) are great for dashboards, but histograms let you explore the full distribution. You can always derive percentiles from histograms, but not vice versa.
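A minimal sketch of that asymmetry, assuming Prometheus-style cumulative buckets with made-up counts: any percentile can be estimated after the fact from the bucket boundaries.

```python
# Cumulative histogram: (upper bound in ms, requests at or below it)
# Bucket bounds and counts are made up for illustration
buckets = [(100, 900), (250, 970), (500, 985), (1000, 995), (float("inf"), 1000)]

def percentile_from_histogram(buckets, p):
    # Return the boundary of the first bucket containing the p-th percentile,
    # i.e. an upper-bound estimate of that percentile
    total = buckets[-1][1]
    threshold = p / 100 * total
    for upper_bound, cumulative in buckets:
        if cumulative >= threshold:
            return upper_bound
    return buckets[-1][0]
```

With these counts, `percentile_from_histogram(buckets, 99)` estimates p99 at 1,000ms even though no p99 summary was recorded at collection time; a stored p99 value, by contrast, can never be turned back into a distribution.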
3. Track Historical Trends
Latency tends to degrade gradually. Compare current percentiles to last week's and last month's values to spot slow creep.
4. Correlate with Other Metrics
High latency often correlates with:
- High CPU usage
- Memory pressure
- Database connection pool exhaustion
- Disk I/O bottlenecks
5. Set SLOs for Latency
Define service level objectives such as "99.9% of requests complete in under 500ms over a rolling 30-day window". This makes latency a measurable commitment, not just a metric.
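One common formulation counts the fraction of requests under the latency target. A minimal sketch, with the target and objective values as assumptions:

```python
# Did at least 99.9% of requests finish under the target latency?
# target_ms and objective are illustrative defaults
def slo_met(latencies_ms, target_ms=500, objective=0.999):
    good = sum(1 for latency in latencies_ms if latency < target_ms)
    return good / len(latencies_ms) >= objective
```

With 1,000 requests, a single slow one still meets a 99.9% objective; two slow ones do not, which is what makes the commitment concrete.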
Common Latency Monitoring Mistakes
Mistake 1: Only Tracking Average
Averages hide the worst user experiences. Always track at least p50 and p99.
Mistake 2: Alerting Too Aggressively on p99
p99 naturally fluctuates. Set thresholds that catch real problems, not normal variance.
Mistake 3: Ignoring p99.9
While p99.9 can be noisy, tracking it helps identify true outliers and potential edge cases.
Mistake 4: Not Breaking Down by Endpoint
Overall latency metrics are useless for debugging. Know which endpoints are slow.
Mistake 5: Not Tracking Latency Over Time
A snapshot tells you nothing about trends. Keep historical data to spot gradual degradation.
Latency Monitoring Checklist
- ☐ Track p50, p90, p99 at minimum
- ☐ Alert on p99, not average
- ☐ Break down latency by endpoint
- ☐ Set multiple alert thresholds (warning, critical, emergency)
- ☐ Include context in alerts
- ☐ Track latency trends over time
- ☐ Correlate latency with resource metrics
- ☐ Define SLOs for critical endpoints
- ☐ Use histograms for detailed analysis
- ☐ Review latency patterns weekly
Monitor Your API Latency with OpsPulse
Track response times for your endpoints and get alerted when latency degrades. Catch slow responses before users complain.
Start Free Monitoring →
Summary
Latency monitoring done right:
- Use percentiles — p99 matters more than average
- Set multiple thresholds — Warning, critical, emergency
- Break down by endpoint — Know what's slow
- Track trends — Catch gradual degradation
- Correlate with resources — Find root causes faster
The goal isn't zero latency — it's predictable, acceptable latency for all users.